Convert XML with Spark to Parquet

Chinmay Sinha Spark, XML

Using Spark to convert XML to Parquet, and then to query and analyse the output data, can be very easy. As I outlined in a previous post, XML processing can be painful, especially when you need to convert large volumes of complex XML files. Apache Spark has various features that make it a perfect fit for processing XML ...

Window Functions (aka Analytic Functions) in Spark

Uli Bethke analytic functions, Big Data, MapR, Spark, SparkSQL, SQL

As of Spark 1.4.0, SparkSQL supports window functions (aka analytic functions). At Sonra we are heavy users of SparkSQL for data transformations on structured data. We also use it in combination with cached RDDs and Tableau for business intelligence and visual analytics. Spark SQL and Window Functions: The rationale. I am a strong supporter ...

Hadoop User Group Ireland Meetup (24 June 2015): An Introduction to Spark

Uli Bethke Big Data, Hadoop, HUG Ireland, Spark, SparkSQL, SQL, Tableau, Uncategorized

Thanks again to everyone who attended the third Hadoop User Group Ireland meetup, and to Bank of Ireland Grand Canal Square for making the venue available. Participants in the event can send feedback to the venue's Facebook and Twitter accounts: facebook.com/BOIGrandCanalSquare and twitter.com/BOIGrandCanalSQ. Thanks also to Étienne from Idiro and Antonio from HP for their great presentations. We have all of ...

In-memory analytics with Tableau, SparkSQL, and MapR

Uli Bethke Big Data, Hadoop, Hive, MapR, Spark, SparkSQL, Tableau

Last week Tableau released version 9.0 of their data visualisation tool. From a Big Data point of view, the nicest new feature is support for querying cached (in-memory) SchemaRDDs (called DataFrames as of Spark 1.3). In this tutorial I will show you how to connect to Spark 1.2.1 on the MapR 4.1 sandbox with Tableau 9.0. Pre-requisites: - Download the ...