Converting ACORD XML to Avro row storage

In this example we will use Flexter to convert an XML file to the Apache Avro format. We then query and analyse the output in the Spark-Shell. Flexter can generate…
Uli Bethke April 26, 2018

Using Apache Airflow to build reusable ETL on AWS Redshift

Building a data pipeline on Apache Airflow to populate AWS Redshift. In this post we will introduce you to the most popular workflow management tool: Apache Airflow. Using Python…
Dorian Beganovic January 1, 2018

Apache Spark Quickstart Packages

We are pleased to announce three Apache Spark Quickstart Packages. The packages are designed for companies that want to explore and evaluate Apache Spark. Example Use Cases: The quickstart packages…
Uli Bethke May 15, 2017

Big Data News: Streaming in the Extreme. An evolution in Data Processing and Analytics

Google has submitted a project proposal to open source Dataflow through the Apache Software Foundation, along with MapR on streaming across data centers. As another week comes to a…
Uli Bethke January 29, 2016

Big Data News: Apache Samza V 0.10.0 Release and Dataiku on great Predictive Modelling for Healthcare

Apache Samza release of V 0.10.0 and Dataiku’s free eBook on how great Predictive Modelling projects are done in Healthcare. As the week draws to a close, the team here…
Uli Bethke January 22, 2016

Big Data News: HUG Ireland’s 1st 2016 Big Data Event, Airbnb’s Predictive Model using NPS and Hive Optimization

Hadoop User Group (HUG) Ireland packed the house with a great evening on Apache Mesos/Myriad and an overview of Airbnb’s Predictive Model. After a restful holiday season, the new year…
Uli Bethke January 15, 2016