Big Data News: Convergence with MapR and Faster Stateful Streaming Processes with Spark

Uli Bethke Big Data, DFS, HUG Ireland, MapR, Spark

MapR on impedance mismatch and how convergence is achieved in a layered architecture, along with Databricks on using the new Spark API “mapWithState” for faster stateful Spark Streaming.


As our big data world comes to the end of another week, the team at Sonra have once again been impressed by the week’s highlights in big data.

MapR has shared its insights on big data architecture for data services. They started by reviewing Kafka and Confluent and their inherent problems, along with Kudu as a better version of Kafka. They note, however, that impedance mismatch is still present because of the stacked architecture approach to data services. The clash of programming approaches between data layers is something MapR has apparently thought about and resolved through convergence, taking a container-based approach to its file system as a platform. Message queues, for example, are not layered on top of a file system; they are deployed at the same layer in the stack, converging all data services onto one containerised layer. It is a clever approach to data services that don’t suffer the ill effects of impedance mismatch. That, along with faster processing speeds and better availability and consistency, makes MapR and its products an innovative part of the big data technology space to watch.

Databricks published a great blog article about faster stateful stream processing using the “mapWithState” API introduced in Spark 1.6. They compare this new API with “updateStateByKey” and show how it dramatically improves performance. The claimed 10x performance increase, along with a better ability for developers to express programming logic, makes Spark 1.6 a real step up in feature richness and performance optimization. When it comes to big data architectures with integrated streaming, it would appear that Apache Spark is staying ahead of the pack and shows no sign of slowing down in feature innovation, architectural refinement or performance optimization. If you are in the big data streaming space, then Spark 1.6 is definitely one to check out.
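To give a feel for where the speed-up comes from, here is a toy model in plain Python. It is not Spark code and does not use the Spark API; the function names `update_state_by_key_step` and `map_with_state_step` are made up for this sketch. It assumes the commonly described behaviour that “updateStateByKey” runs the update function for every key that holds state in every batch, while “mapWithState” only runs the mapping function for records that actually arrive in the batch.

```python
from typing import Dict, List, Tuple

Batch = List[Tuple[str, int]]  # (key, value) records arriving in one micro-batch


def update_state_by_key_step(state: Dict[str, int], batch: Batch) -> int:
    """Toy model of one batch in the updateStateByKey style.

    The update logic runs for every key that has state or new data,
    even if the key received nothing in this batch.
    Returns how many times the update logic ran.
    """
    grouped: Dict[str, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    calls = 0
    for key in set(state) | set(grouped):
        calls += 1
        state[key] = state.get(key, 0) + sum(grouped.get(key, []))
    return calls


def map_with_state_step(state: Dict[str, int], batch: Batch) -> int:
    """Toy model of one batch in the mapWithState style.

    The mapping logic runs once per incoming record, so keys with no
    new data are never touched. Returns how many times it ran.
    """
    calls = 0
    for key, value in batch:
        calls += 1
        state[key] = state.get(key, 0) + value
    return calls


if __name__ == "__main__":
    # 1000 keys already hold state, but only two of them get new data.
    old_style = {f"user{i}": 0 for i in range(1000)}
    new_style = dict(old_style)
    batch = [("user1", 1), ("user2", 5), ("user1", 2)]

    print("updateStateByKey-style calls:", update_state_by_key_step(old_style, batch))
    print("mapWithState-style calls:", map_with_state_step(new_style, batch))
    print("same resulting state:", old_style == new_style)
```

Both functions end up with the same running totals; the difference is how much work each batch costs when most keys are idle, which is the situation a long-running streaming job with a lot of state usually finds itself in.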

So as the week comes to a close, an exciting year is opening up with more to delight us. Hadoop User Group (HUG) Ireland’s next event is on February 8th, with two further HUG Ireland meetups leading into Hadoop Summit Dublin on April 13th and 14th. Do note that HUG Ireland’s February 8th event will have a raffle for free tickets to Hadoop Summit, so register today at hugireland.org and RSVP for the event to be in with a chance to win a free pass to a key European big data event in the National Convention Centre on the quays in Dublin 1. Have a great weekend, all!

About Sonra

We are a Big Data company based in Ireland. We are experts in data lake implementations, clickstream analytics, real-time analytics, and data warehousing on Hadoop and Spark. We also run the Hadoop User Group (HUG) Ireland. We can help with your Big Data implementation. Get in touch today; we would love to hear from you!

About the author

Uli Bethke LinkedIn Profile

Uli has 18 years’ hands-on experience as a consultant, architect, and manager in the data industry. He frequently speaks at conferences. Uli has architected and delivered data warehouses in Europe, North America, and South East Asia. He is a traveler between the worlds of traditional data warehousing and big data technologies.

Uli is a regular contributor to blogs and books, holds an Oracle ACE award, and chairs the Hadoop User Group Ireland. He is also a co-founder and VP of the Irish chapter of DAMA, a not-for-profit global data management organization. He has co-founded the Irish Oracle Big Data User Group.