Converting ACORD XML to Avro row storage

Uli Bethke Apache, XML

In this example we will use Flexter to convert an XML file to the Apache Avro format. We then query and analyse the output in the Spark-Shell. Flexter can generate a target schema from an XML file or a combination of XML and XML schema (XSD) files. We will use the data from The ACORD RLC Insurance and Reinsurance Service ...

About the author

Uli Bethke LinkedIn Profile

Uli has 18 years’ hands-on experience as a consultant, architect, and manager in the data industry. He frequently speaks at conferences. Uli has architected and delivered data warehouses in Europe, North America, and South East Asia. He is a traveler between the worlds of traditional data warehousing and big data technologies.

Uli is a regular contributor to blogs and books, holds an Oracle ACE award, and chairs the Hadoop User Group Ireland. He is also a co-founder and VP of the Irish chapter of DAMA, a not-for-profit global data management organization. He has co-founded the Irish Oracle Big Data User Group.

Using Apache Airflow to build reusable ETL on AWS Redshift

Dorian Beganovic Apache

Building a data pipeline on Apache Airflow to populate AWS Redshift. In this post we will introduce you to one of the most popular workflow management tools: Apache Airflow. Using Python as our programming language, we will use Airflow to develop reusable and parameterizable ETL processes that ingest data from S3 into Redshift and perform an upsert from a source table ...

Apache Spark Quickstart Packages

Uli Bethke Apache, Spark, SparkSQL

We are pleased to announce three Apache Spark Quickstart Packages. The packages are designed for companies that want to explore and evaluate Apache Spark. Example use cases: the quickstart packages can be used for various scenarios. I have listed some use cases below. You would like to evaluate a certain Spark feature and identify its benefits and limitations. You don’t ...

Big Data News: Streaming in the Extreme. An evolution in Data Processing and Analytics

Uli Bethke Apache, Big Data, Data Science, MapR, Technology

Google has submitted a project proposal to open-source Dataflow through the Apache Software Foundation, along with MapR on streaming across data centers. As another week comes to a close, the wheels of our big data community continue to move in cycles of innovation and progress, which, as always, never fail to impress. Google’s Tyler Akidau brought us through ...

Big Data News: Apache Samza V 0.10.0 Release and Dataiku on great Predictive Modelling for Healthcare

Uli Bethke Apache, Big Data, Business Intelligence, Data Discovery, Data Science

Apache Samza release of V 0.10.0 and Dataiku’s free eBook on how great predictive modelling projects are done in healthcare. As the week draws to a close, the team here at Sonra have once again been impressed by the recent developments our industry has presented to our community. Apache has launched its new release of Samza, V 0.10.0. This big ...
