How to convert XML to Spark Delta Tables and Parquet

The main option for converting XML on Spark to Parquet and Delta Tables is the Spark XML-Library. It is an external library that can be integrated with Spark but does…
Chinmay Sinha January 25, 2018

Advanced Spark Structured Streaming – Aggregations, Joins, Checkpointing

In this post we are going to build a system that ingests real-time data from Twitter, packages it as JSON objects and sends it through a Kafka Producer to…
Dorian Beganovic November 27, 2017

Streaming Tweets to Snowflake Data Warehouse with Spark Structured Streaming and Kafka

Streaming architecture: In this post we will build a system that ingests real-time data from Twitter, packages…
Dorian Beganovic November 20, 2017

Apache Spark Quickstart Packages

We are pleased to announce three Apache Spark Quickstart Packages. The packages are designed for companies that want to explore and evaluate Apache Spark. Example use cases: The quickstart packages…
Uli Bethke May 15, 2017

A brief history of XML – From hype to useful data format

Is XML really dead? When it first became popular about 20 years ago, XML was meant to be the one and only format to serialize, encapsulate, and exchange data. The…
Vadim Mytarev October 18, 2016

Take the pain out of XML processing on Spark.

Note: We have written an updated version of this post that shows XML conversion on Spark to Parquet with code samples. Did you ever have to process XML files? Complex…
Maciek Kocon September 8, 2016