Advanced Spark Structured Streaming - Aggregations, Joins, Checkpointing

Dorian Beganovic Spark

In this post we are going to build a system that ingests real-time data from Twitter, packages it as JSON objects and sends it through a Kafka Producer to a Kafka Cluster. A Spark Streaming application will then parse those tweets in JSON format and perform various transformations on them, including filtering, aggregations and joins. A table in a ...

Streaming Tweets to Snowflake Data Warehouse with Spark Structured Streaming and Kafka

Dorian Beganovic Kafka, Snowflake, Spark

Streaming architecture: In this post we will build a system that ingests real-time data from Twitter, packages it as JSON objects and sends it through a Kafka Producer to a Kafka Cluster. A Spark Streaming application will then consume those tweets in JSON format and stream them ...

Apache Spark Quickstart Packages

Uli Bethke Apache, Spark, SparkSQL

We are pleased to announce three Apache Spark Quickstart Packages. The packages are designed for companies that want to explore and evaluate Apache Spark. Example Use Cases: The quickstart packages can be used for various scenarios. I have listed some use cases below. You would like to evaluate a certain Spark feature and identify its benefits and limitations. You don’t ...

A brief history of XML - From hype to useful data format

Vadim Mytarev Flexter, Hadoop, Spark, XML, XSD

Is XML really dead? When it first became popular about 20 years ago, XML was meant to be the one and only format to serialize, encapsulate, and exchange data. The serialization format to end all serialization formats, so to speak. This was a bold claim. Has it materialised? Over the last couple of years it has become clear that this ...

Take the pain out of XML processing on Spark.

Maciek Kocon Big Data, Spark, XML

Did you ever have to process XML files? Complex and large ones? Lots of them? No matter which processing framework or programming language you use, it is always a pain. It is never easy. You can be sure that it is very time-consuming and error-prone. Unless you have a very simple XML file you are guaranteed to run into ...