Big Data News: Streaming in the Extreme. An evolution in Data Processing and Analytics

Uli Bethke | Apache, Big Data, Data Science, MapR, Technology

Google has submitted a project proposal to open source Dataflow through the Apache Software Foundation, while MapR looks at streaming across data centres.


As another week comes to a close, the wheels of our big data community keep turning in cycles of innovation and progress that, as always, never fail to impress.

Google’s Tyler Akidau took us on a thoughtful whistle-stop tour of streaming data beyond batch processing in part 2 of his two-part series. This lengthy article let us in on Google’s proposal to open source Dataflow through Apache. It then dives into a number of streaming topics, focusing on the features a pipeline needs for unbounded data: windowing, triggers and accumulation. He goes on to explore how Dataflow handles these features in the context of a data processing pipeline that ingests, transforms and processes data sets. The article’s focus on windowing for streaming data is an insightful and articulate account of where streaming needs to go in the short to medium term to maintain the momentum achieved to date.
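To make windowing, triggers and accumulation a little more concrete, here is a minimal sketch in plain Python. It is not the Dataflow SDK: `windowed_sums`, `fixed_window` and the `("data", ...)` / `("watermark", ...)` record shapes are hypothetical names chosen for illustration. It assumes fixed event-time windows, a trigger that fires once when the watermark reaches the end of a window and again for every late element (with no limit on allowed lateness), and a switch between accumulating and discarding panes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shapes used only in this sketch:
#   ("data", event_time, value)  one element of the unbounded input
#   ("watermark", event_time)    "input with earlier event times has (probably) arrived"
Record = Tuple


def fixed_window(event_time: int, size: int) -> int:
    """Return the start of the fixed (tumbling) event-time window containing event_time."""
    return event_time - event_time % size


def _emit(start: int, timing: str, pending: Dict[int, int], totals: Dict[int, int],
          accumulating: bool) -> Tuple[int, str, int]:
    """Build one pane: accumulating panes carry the running total, discarding panes only the delta."""
    value = totals[start] if accumulating else pending[start]
    pending[start] = 0  # everything buffered for this window has now been emitted
    return (start, timing, value)


def windowed_sums(stream: Iterable[Record], size: int, accumulating: bool) -> List[Tuple[int, str, int]]:
    """Sum values per fixed event-time window.

    Trigger (assumed): one on-time pane when the watermark reaches the window end,
    then one late pane per element that arrives after that (no lateness limit).
    """
    pending: Dict[int, int] = defaultdict(int)  # values not yet emitted, per window start
    totals: Dict[int, int] = defaultdict(int)   # all values seen, per window start
    closed: Set[int] = set()                    # windows the watermark has already passed
    watermark = float("-inf")
    panes: List[Tuple[int, str, int]] = []
    for record in stream:
        if record[0] == "data":
            _, event_time, value = record
            start = fixed_window(event_time, size)
            pending[start] += value
            totals[start] += value
            if start + size <= watermark:
                # Late element: its window is already complete, so fire a late pane at once.
                closed.add(start)
                panes.append(_emit(start, "late", pending, totals, accumulating))
        else:
            watermark = max(watermark, record[1])  # watermarks only move forward
            ready = sorted(s for s in pending if s not in closed and s + size <= watermark)
            for start in ready:
                closed.add(start)
                panes.append(_emit(start, "on_time", pending, totals, accumulating))
    return panes


if __name__ == "__main__":
    stream = [("data", 5, 3), ("data", 70, 4), ("data", 20, 2),
              ("watermark", 60), ("data", 30, 1), ("watermark", 120)]
    print(windowed_sums(stream, 60, accumulating=True))
    # [(0, 'on_time', 5), (0, 'late', 6), (60, 'on_time', 4)]
    print(windowed_sums(stream, 60, accumulating=False))
    # [(0, 'on_time', 5), (0, 'late', 1), (60, 'on_time', 4)]
```

With accumulating panes, the late firing for the first window restates its full total (6); with discarding panes it reports only the late delta (1), leaving any downstream consumer to combine the panes itself.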

MapR’s Streaming in the Extreme is a wonderful exploration of streaming across data centres and of what works in terms of models, platforms and applications. It starts with the primary enabler of streaming: messaging. After defining the general use case for streaming at scale, it covers the key modelling decisions for message handling. The “at most once” message model is a tried and tested approach that is quite rightly advocated. MapR’s Jim Scott then explores options for messaging platforms, streaming engines and opportunities for improvement, all of which make a lot of sense when thinking about the throughput, workload and business demands placed on any big data infrastructure.
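The trade-off behind a delivery guarantee such as "at most once" comes down to when a consumer commits its position relative to handling a message. The following is a hypothetical, plain Python simulation, not MapR Streams or Kafka client code: `PartitionLog`, `run_consumer` and the `crash_at` fault-injection parameter are illustrative names. It assumes a single partition and a crash that happens between the commit step and the handling step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionLog:
    """Hypothetical stand-in for one log partition plus a consumer's committed offset."""
    messages: List[str]
    committed: int = 0  # offset the consumer resumes from after a restart


def run_consumer(log: PartitionLog, handled: List[str], commit_first: bool,
                 crash_at: Optional[int] = None) -> bool:
    """Read the log from the committed offset to the end.

    commit_first=True  -> at-most-once: commit the offset, then handle the message.
    commit_first=False -> at-least-once: handle the message, then commit the offset.
    crash_at simulates the consumer dying between those two steps at that offset.
    Returns False if the simulated crash happened, True if the log was fully read.
    """
    while log.committed < len(log.messages):
        offset = log.committed
        message = log.messages[offset]
        if commit_first:
            log.committed = offset + 1
        else:
            handled.append(message)
        if offset == crash_at:
            return False  # crash between the two steps
        if commit_first:
            handled.append(message)
        else:
            log.committed = offset + 1
    return True


if __name__ == "__main__":
    for commit_first in (True, False):
        log, handled = PartitionLog(["a", "b", "c"]), []
        run_consumer(log, handled, commit_first, crash_at=1)  # crash while on "b"
        run_consumer(log, handled, commit_first)              # restart from the committed offset
        print("at-most-once" if commit_first else "at-least-once", handled)
    # at-most-once ['a', 'c']
    # at-least-once ['a', 'b', 'b', 'c']
```

Committing first means the in-flight message "b" is lost after the crash, but nothing is ever handled twice; committing last redelivers "b" on restart. Which trade-off is acceptable depends on whether downstream processing can tolerate gaps or duplicates.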

As the weekend calls once again, we can take great comfort in the progress our big data community is making and the direction it is moving in. The future's looking bright indeed! Have a great weekend, all!

About Sonra

We are a Big Data company based in Ireland. We are experts in data lake implementations, clickstream analytics, real-time analytics, and data warehousing on Hadoop and Spark. We also run the Hadoop User Group (HUG) Ireland. We can help with your Big Data implementation. Get in touch today; we would love to hear from you!