Big Data News: Convergence with MapR and Faster Stateful Stream Processing with Spark

Uli Bethke Big Data, DFS, HUG Ireland, MapR, Spark

MapR on impedance mismatch and how convergence is achieved in a layered architecture, plus Databricks on using the new Spark API “mapWithState” for faster stateful Spark Streaming

As another week in our big data world comes to an end, the team at Sonra has once again been impressed by the week's highlights in big data.

MapR has shared its insights on big data architecture for data services. They start by reviewing Kafka and Confluent and the problems inherent in that approach, then turn to Kudu, which is a columnar storage engine rather than a messaging system like Kafka. They note, however, that impedance mismatch is still present because of the stacked architecture approach to data services. The clash of programming approaches between data layers is something MapR has apparently thought about and resolved through convergence, by taking a container-based approach to its file system as a platform. Message queues, for example, are not layered on top of a file system; they are deployed at the same layer in the stack, converging all data services onto one containerised layer. It is a clever approach to data services that avoids the ill effects of impedance mismatch. That, along with faster processing speeds and better availability and consistency, makes MapR and its products an innovative part of the big data technology space to watch.

Databricks published a great blog article about faster stateful stream processing using the “mapWithState” API introduced in Spark 1.6. They compare the new API with “updateStateByKey” and show how it dramatically improves performance. The claimed 10x performance increase, along with a better ability for developers to express their programming logic, makes Spark 1.6 a real step up in feature richness and performance optimization. When it comes to big data architectures with integrated streaming, Apache Spark appears to be staying ahead of the pack and shows no sign of slowing down in feature innovation, architectural refinement or performance optimization. If you are in the big data streaming space, Spark 1.6 is definitely one to check out.
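To make the difference between the two APIs a little more concrete, here is a rough sketch in plain Python. It is not Spark code and does not use the Spark API. It is a toy model that assumes a running sum per key, with hypothetical function names, and it only illustrates the general idea: an “updateStateByKey”-style update visits every key held in state on each batch, while a “mapWithState”-style update visits only the keys that received new data.

```python
from collections import defaultdict


def update_state_by_key_batch(state, batch):
    """Toy model of an updateStateByKey-style batch (hypothetical helper).

    The update logic runs for every key already in state, plus any new
    key in the batch, even when a key received no new records.
    """
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    new_state = {}
    touched = 0
    for key in set(state) | set(grouped):
        new_state[key] = state.get(key, 0) + sum(grouped.get(key, []))
        touched += 1
    return new_state, touched


def map_with_state_batch(state, batch):
    """Toy model of a mapWithState-style batch (hypothetical helper).

    The mapping logic runs once per incoming record, so only keys that
    appear in the batch are touched. State is updated in place.
    """
    touched_keys = set()
    emitted = []
    for key, value in batch:
        state[key] = state.get(key, 0) + value
        touched_keys.add(key)
        emitted.append((key, state[key]))
    return state, len(touched_keys), emitted


# Assumed workload: 1000 keys already in state, 3 new records for 2 keys.
initial = {f"user{i}": 1 for i in range(1000)}
batch = [("user1", 5), ("user2", 7), ("user1", 1)]

full_state, touched_all = update_state_by_key_batch(dict(initial), batch)
delta_state, touched_few, emitted = map_with_state_batch(dict(initial), batch)

print("keys touched, updateStateByKey-style:", touched_all)
print("keys touched, mapWithState-style:", touched_few)
```

Both approaches end up with the same state in this model. The difference is how much work each batch costs: the first scales with the total number of keys held in state, the second with the size of the incoming batch.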

So as the week comes to a close, an exciting year is opening up with more to delight us. Hadoop User Group (HUG) Ireland's next event is on February 8th, with two further HUG Ireland meetups leading into Hadoop Summit Dublin on April 13th and 14th. Do note that the February 8th event will have a raffle for free tickets to Hadoop Summit, so register today at hugireland.org and RSVP for the event to be in with a chance to win a free pass to a key European big data event in the National Convention Centre on the quays in Dublin 1. Have a great weekend all!

About Sonra

We are a Big Data company based in Ireland. We are experts in data lake implementations, clickstream analytics, real-time analytics, and data warehousing on Hadoop and Spark. We also run the Hadoop User Group (HUG) Ireland. We can help with your Big Data implementation. Get in touch today. We would love to hear from you!