
Big Data Tools

(Figure: CERN's Super Computing Grid, CERN)


Apache Hadoop is designed to support the processing of large data sets in a distributed computing environment. Hadoop can handle big batches of distributed information, but there is often a need for real-time processing of user-generated data such as Twitter or Facebook updates. Financial compliance monitoring is another area where real-time processing is needed, in particular to process market data as it arrives. Social media and market data are two examples of what we call high-velocity data. 

Apache Storm and Apache Spark are two other open-source frameworks that handle such real-time data generated at a fast rate. Both Storm and Spark can integrate with any database or data storage technology. 


  • [Apache Hadoop]: The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. 
  • [Apache Storm]: Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language.
  • [Apache Spark]: Apache Spark is a fast and general engine for large-scale data processing. Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab and open sourced in 2010; it later became a top-level Apache project.
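The key difference between batch processing (Hadoop) and stream processing (Storm, Spark Streaming) is that a stream is unbounded: results must be updated as each record arrives rather than computed once over a complete data set. A minimal conceptual sketch of this idea in plain Python (not using any of the frameworks above; the sample updates are hypothetical) is a running word count over a stream of social-media-style messages:

```python
from collections import Counter

def stream_word_count(stream):
    """Consume an unbounded stream of text updates one at a time,
    maintaining running word counts -- the streaming analogue of a
    batch word count job."""
    counts = Counter()
    for update in stream:              # each update is processed as it arrives
        counts.update(update.lower().split())
        yield dict(counts)             # emit the current running totals

# Hypothetical feed of social-media-style updates.
updates = ["Storm processes streams", "Spark processes batches and streams"]
for snapshot in stream_word_count(updates):
    print(snapshot)                    # totals grow with every new message
```

In a real Storm topology or Spark Streaming job, the loop body would run in parallel across a cluster and the stream would never terminate, but the per-record update pattern is the same.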



[More to come ...]

