Big Data News: Streaming in the Extreme. An evolution in Data Processing and Analytics

Uli Bethke Apache, Big Data, Data Science, MapR, Technology

Google has submitted a proposal to open source Dataflow through the Apache Software Foundation, and MapR looks at streaming across data centers.


As another week comes to a close, the wheels of our big data community continue to turn in cycles of innovation and progress, which, as always, never fail to impress.

Google’s Tyler Akidau took us on a thoughtful whistle-stop tour of streaming data beyond batch processing in part 2 of a two-part series. This lengthy article let us in on Google’s proposal to open source Dataflow through Apache. It then dives into a number of streaming topics, focusing on features of unbounded data pipelines such as windowing, triggers and accumulation. He goes on to explore how Dataflow handles these features in the context of a data processing pipeline that ingests, transforms and processes data sets. The article’s focus on windowing for streaming data is an insightful and articulate look at where streaming needs to go in the short to medium term to maintain the momentum achieved to date.
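For readers new to these terms, here is a minimal sketch in plain Python of what windowing, triggers and accumulation mean. It is not Dataflow code and does not use the Dataflow API. The function names `fixed_window_sums` and `emit_panes` are made up for illustration, and the sketch assumes integer event times in seconds and a simple "fire every N elements" trigger.

```python
from collections import defaultdict


def fixed_window_sums(events, window_size):
    """Windowing: group (event_time, value) pairs into fixed event-time windows.

    Events may arrive in any order; each one is assigned to the window that
    contains its event time. Returns {window_start: sum_of_values}.
    """
    sums = defaultdict(int)
    for event_time, value in events:
        window_start = (event_time // window_size) * window_size
        sums[window_start] += value
    return dict(sums)


def emit_panes(values, trigger_every, accumulating):
    """Triggers and accumulation for the values of a single window.

    A hypothetical trigger fires after every `trigger_every` elements, and
    once more when the window closes if elements are still pending.
    accumulating=True  -> each pane reports the running total so far.
    accumulating=False -> each pane reports only what arrived since the
                          previous firing (discarding mode).
    """
    panes = []
    total = 0      # everything seen so far in this window
    pending = 0    # what arrived since the last firing
    count = 0
    for value in values:
        total += value
        pending += value
        count += 1
        if count % trigger_every == 0:
            panes.append(total if accumulating else pending)
            pending = 0
    if count % trigger_every != 0:
        # Final firing when the window closes with unreported elements.
        panes.append(total if accumulating else pending)
    return panes


if __name__ == "__main__":
    events = [(1, 5), (12, 3), (7, 2), (25, 4)]  # note the out-of-order arrival
    print(fixed_window_sums(events, window_size=10))
    print(emit_panes([1, 2, 3, 4], trigger_every=2, accumulating=True))
    print(emit_panes([1, 2, 3, 4], trigger_every=2, accumulating=False))
```

The windowing step decides which results belong together, the trigger decides when a result is emitted, and the accumulation mode decides how successive results for the same window relate to each other.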

MapR’s Streaming in the Extreme is a wonderful exploration of streaming across data centres and what works in terms of models, platforms and applications. It starts at the primary enabler of streaming, which is messaging. After defining the general use case for streaming at scale, the article covers important modelling points on message handling. The “at most once” message model is a tried and tested approach that is quite rightly advocated. MapR’s Jim Scott then explores options, messaging platform choices, streaming engines and opportunities for improvement that make a lot of sense when thinking about the throughput, workload and business-level demands placed upon any big data infrastructure.
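To make the "at most once" model concrete, here is a minimal sketch in plain Python. It is not taken from the MapR article and does not use any real messaging platform. `InMemoryLog` and `consume_at_most_once` are hypothetical names, and an in-memory list stands in for the message log. The key idea is that the consumer records its position before processing a message, so a failure can lose that message but can never cause it to be delivered twice.

```python
class InMemoryLog:
    """A stand-in for a message log: a list of messages plus a committed offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message to hand out


def consume_at_most_once(log, handler):
    """Deliver each message to `handler` at most once.

    The offset is committed BEFORE the handler runs. If the handler fails,
    the message is dropped rather than retried.
    Returns the list of messages that were processed successfully.
    """
    processed = []
    while log.committed < len(log.messages):
        message = log.messages[log.committed]
        log.committed += 1  # commit first: this message will never be re-read
        try:
            handler(message)
        except Exception:
            continue  # lost, not redelivered
        processed.append(message)
    return processed


if __name__ == "__main__":
    def flaky_handler(message):
        if message == "b":
            raise RuntimeError("simulated failure while processing")

    log = InMemoryLog(["a", "b", "c"])
    print(consume_at_most_once(log, flaky_handler))
```

The trade-off is the mirror image of committing after processing: no duplicates, but a message can be lost if the consumer fails mid-way.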

As the weekend calls upon us once again, we should take great comfort in seeing the progression and direction our big data community is moving in. The future's looking bright indeed! Have a great weekend all!!

About Sonra

We are a Big Data company based in Ireland. We are experts in data lake implementations, clickstream analytics, real time analytics, and data warehousing on Hadoop and Spark. We also run the Hadoop User Group (HUG) Ireland. We can help with your Big Data implementation. Get in touch today. We would love to hear from you!

About the author

Uli Bethke LinkedIn Profile

Uli has 18 years’ hands-on experience as a consultant, architect, and manager in the data industry. He frequently speaks at conferences. Uli has architected and delivered data warehouses in Europe, North America, and South East Asia. He is a traveler between the worlds of traditional data warehousing and big data technologies.

Uli is a regular contributor to blogs and books, holds an Oracle ACE award, and chairs the Hadoop User Group Ireland. He is also a co-founder and VP of the Irish chapter of DAMA, a not-for-profit global data management organization. He has co-founded the Irish Oracle Big Data User Group.