Apache Spark

- Overview

Apache Spark is an open-source engine for processing large data sets. It is designed for big data workloads such as stream processing, graph processing, machine learning, and artificial intelligence (AI).

Spark can:

  • Perform processing tasks on very large data sets
  • Distribute data processing tasks across multiple computers
  • Handle both batch and real-time analytics and data processing workloads
  • Utilize in-memory caching
  • Optimize query execution for fast analytic queries against data of any size
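
The idea of distributing a processing task can be sketched on a single machine. The example below is a plain-Python analogy, not Spark's API: the function names are hypothetical, and worker threads stand in for the separate computers a Spark cluster would use. It splits the data into partitions, runs the same task on each partition independently, and then combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into contiguous chunks of nearly equal size."""
    size, remainder = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `remainder` partitions each take one extra element.
        end = start + size + (1 if i < remainder else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def sum_of_squares(partition):
    """Per-partition task; it needs no data from other partitions."""
    return sum(x * x for x in partition)


def distributed_sum_of_squares(data, num_partitions=4):
    """Run the task on every partition in parallel, then combine."""
    partitions = split_into_partitions(data, num_partitions)
    # Assumption for illustration: each thread plays the role of one worker machine.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(sum_of_squares, partitions))
    # A coordinating process merges the partial results into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(10))))
```

Because each partition is processed without reference to the others, the same pattern can scale from threads on one machine to many machines, which is the model Spark builds on.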

 

Spark can run:

  • On Apache Hadoop
  • On Apache Mesos
  • On Kubernetes
  • On its own (standalone mode)
  • In the cloud

In each of these environments, Spark can read from diverse data sources.
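
In practice, the environment Spark runs in is selected through a "master" URL, passed for example to `spark-submit --master`. The sketch below shows the URL shape for each option listed above. The helper `master_url` and the host names are hypothetical and used only for illustration; the URL schemes and default ports are the ones Spark documents, with Hadoop deployments going through YARN and "on its own" corresponding to Spark's standalone mode.

```python
def master_url(cluster_manager, host=None, port=None):
    """Return a Spark master URL string for the given cluster manager.

    `master_url` is an illustrative helper, not part of Spark itself.
    """
    if cluster_manager == "local":
        # Single machine, using all available cores; no cluster needed.
        return "local[*]"
    if cluster_manager == "yarn":
        # Hadoop YARN: the cluster location is read from the Hadoop configuration.
        return "yarn"

    # The remaining managers need an explicit host.
    if host is None:
        raise ValueError(f"{cluster_manager!r} requires a host")

    if cluster_manager == "standalone":
        # Spark's own built-in cluster manager; 7077 is its default port.
        return f"spark://{host}:{port or 7077}"
    if cluster_manager == "mesos":
        # Apache Mesos; 5050 is the default Mesos master port.
        return f"mesos://{host}:{port or 5050}"
    if cluster_manager == "kubernetes":
        # Kubernetes API server; the port is always written out, even 443.
        return f"k8s://https://{host}:{port or 443}"

    raise ValueError(f"unknown cluster manager: {cluster_manager!r}")


if __name__ == "__main__":
    print(master_url("standalone", "master.example.com"))
```

The same application code can then be submitted unchanged to any of these environments by changing only the master URL.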

 

Spark started in 2009 as a research project at the University of California, Berkeley. Thousands of companies, including 80% of the Fortune 500, use Apache Spark.
 

 

 

[More to come ...]

  
