In this post, we'll discuss another important topic of big data processing: real-time stream processing area. This is an area where Hadoop falls short because of its high latency, and another open source framework Storm is developed to cover the need in real-time processing. Unfortunately, Hadoop and Storm provides quite different programming model, resulting in high development and maintenance cost. Continue from my previous post on Spark , which provides a highly efficient parallel processing framework. Spark streaming is a natural extension of its core programming paradigm to provide large-scale, real-time data processing. The biggest benefits of using Spark Streaming is that it is based on a similar programming paradigm of its core and there is no need to develop and maintain a completely different programming paradigm for batch and realtime processing. Spark Core Prog...