Convert XML with Spark to Parquet

Chinmay Sinha Spark, XML

Spark makes it easy to convert XML to Parquet and then query and analyse the output data. As I outlined in a previous post, XML processing can be painful, especially when you need to convert large volumes of complex XML files. Apache Spark has various features that make it a good fit for processing XML ...

Streaming Tweets to Snowflake Data Warehouse with Spark Structured Streaming and Kafka

Dorian Beganovic Kafka, Snowflake, Spark

Streaming architecture: In this post we will build a system that ingests real-time data from Twitter, packages it as JSON objects, and sends it through a Kafka producer to a Kafka cluster. A Spark Streaming application will then consume those tweets in JSON format and stream them ...

Big Data News: Convergence with MapR and Faster Stateful Streaming Processes with Spark

Uli Bethke Big Data, DFS, HUG Ireland, MapR, Spark

MapR on impedance mismatch and how convergence is achieved for a layered architecture, along with Databricks on using the new Spark API “mapWithState” for faster stateful Spark Streaming. As our big data world comes to the end of another week, the team at Sonra have once again been impressed by the week's highlights in big data. MapR has shared its insights ...

Big Data News: Yahoo’s Data Sketching and Apache Spark 1.6

Uli Bethke Apache, Big Data, Community, Data Science, Hadoop, Hive, Open Source Software, Technology

Launching 2016 in style with an exploration of topics ranging from Yahoo’s successful scaling of aggregate computational queries using data sketching libraries to the release of Apache Spark 1.6. Firstly, the team at Sonra would like to wish you and yours every success in 2016. As the arrow of time pushes us forward, our Big Data industry is forging ahead in a cycle of ...