Big Data News: Apache Samza 0.10.0 Release and Dataiku on Predictive Modelling for Healthcare

January 22, 2016

Apache Samza 0.10.0 Release and Dataiku’s Free eBook on How Predictive Modelling Projects Are Done in Healthcare
As the week draws to a close, the team here at Sonra has once again been impressed by recent developments in our industry.
Apache has released Samza 0.10.0. This stream processing framework works closely with Apache Kafka, and this is its third release as an Apache Top Level Project. The attention it has received since becoming a Top Level Project in January of last year has produced the following release highlights for Samza users:

  • HDFS and Elasticsearch producers introduced, allowing Samza to write directly to HDFS and to Elasticsearch
  • Host affinity feature introduced for YARN, making recovery of stateful jobs more efficient
  • Coordinator Stream introduced for configuring large Samza jobs, along with a CLI application for managing that configuration

This upgrade will certainly be good news for anybody using Samza, especially with YARN, Hadoop and Kafka. It is also notable that this Apache TLP is on the rise, with a growing community of contributors and more companies getting on board. These community contributions are a sure sign that Samza is gaining traction as an Apache Top Level Project.
Dataiku have published a blog article introducing their new (free) eBook on deploying a Predictive Modelling project in healthcare. Predictive Analytics is a fascinating area that is far from mature, and Dataiku have been good enough to pair the free eBook with an insightful guide to deploying a Predictive Analytics project. From defining clearly what you want from your project, through data ingestion and data transformation, to data modelling and machine learning, the article maps out “Patient No Shows” as a project and illustrates how big data can help healthcare once the steps to effective Predictive Analytics are taken. The eBook, which I have just downloaded (it is available in eBook and/or PDF format), is free, and it is an enlightening piece of knowledge sharing by the Data Scientists at Dataiku.
There is no doubt that this week has been like every other in our community: great innovations lead to great revelations that delight us and excite our curiosity. Have a great weekend all, and happy trails for the week to come…
About Sonra
We are a Big Data company based in Ireland. We are experts in data lake implementations, clickstream analytics, real-time analytics, and data warehousing on Hadoop and Spark. We also run the Hadoop User Group (HUG) Ireland. We can help with your Big Data implementation. Get in touch today; we would love to hear from you!