Spark: Low-Latency, Massively Parallel Processing Framework

While Hadoop fits most batch processing workloads well and is the primary choice for big data processing today, it is not optimized for other types of workloads because of the following limitations:

Nevertheless, the Map/Reduce processing paradigm is a proven mechanism for dealing with large-scale data. In addition, many of Hadoop’s infrastructure pieces, such as HDFS and HBase, have matured over time.

In this blog post, we’ll look at a different architecture called Spark, which builds on Hadoop’s strengths and improves on a number of its weaknesses to provide a more efficient batch processing framework with much lower latency. Spark has generated a lot of excitement in the big data community and represents a very promising parallel execution stack for big data analytics.  […]