Window Functions (aka Analytic Functions) in Spark.

Uli Bethke analytic functions, Big Data, MapR, Spark, SparkSQL, SQL

As of Spark 1.4.0, SparkSQL supports window functions (aka analytic functions). At Sonra we are heavy users of SparkSQL for data transformations on structured data. We also use it in combination with cached RDDs and Tableau for business intelligence and visual analytics. Spark SQL and Window Functions: The rationale. I am a strong supporter ...
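To make the idea of a window (analytic) function concrete, here is a minimal sketch of the standard `RANK() OVER (PARTITION BY ... ORDER BY ...)` form. It runs against Python's built-in SQLite (3.25 or later) rather than Spark, purely so it is self-contained; the `emp` table, its columns, and the sample rows are hypothetical, and I am assuming Spark SQL accepts the same standard clause.

```python
import sqlite3

# Hypothetical sample data: (department, employee, salary).
rows = [
    ("sales", "ann", 300),
    ("sales", "bob", 200),
    ("sales", "cat", 300),
    ("ops", "dan", 150),
    ("ops", "eve", 250),
]

# A window function computes a value per row over a "window" of related rows
# (here: all rows of the same department) without collapsing them like GROUP BY.
QUERY = """
SELECT dept, name, salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
FROM emp
ORDER BY dept, salary_rank, name
"""


def rank_salaries(data):
    """Load the rows into an in-memory SQLite table and rank salaries per department."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE emp (dept TEXT, name TEXT, salary INTEGER)")
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", data)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


ranked = rank_salaries(rows)
for row in ranked:
    print(row)
```

Every input row survives in the output with its rank attached, and tied salaries share a rank. That is the main difference from an aggregate with `GROUP BY`, which would return one row per department.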

Hadoop User Group Ireland Meetup (24 June 2015): An Introduction to Spark

Uli Bethke Big Data, Hadoop, HUG Ireland, Spark, SparkSQL, SQL, Tableau, Uncategorized

Thanks again to everyone who attended the third Hadoop User Group Ireland meetup. Thanks also to Bank of Ireland Grand Canal Square for making the venue available. Attendees can send feedback to the venue's Facebook and Twitter accounts: facebook.com/BOIGrandCanalSquare twitter.com/BOIGrandCanalSQ. Thanks as well to Étienne from Idiro and Antonio from HP for their great presentations. We have all of ...

Multiple Spark Worker Instances on a single Node. Why more of less is more than less.

Uli Bethke Big Data, Hadoop, Spark

If you are running Spark in standalone mode on memory-rich nodes, it can be beneficial to run multiple worker instances on the same node, because a very large heap size has two disadvantages:
- Garbage collector pauses can hurt the throughput of Spark jobs.
- A heap size of more than 32 GB can't use CompressedOops. So 35 GB is actually less than ...
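As a rough illustration of the sizing logic, the sketch below splits a node's memory into several workers so that each one stays under the 32 GB threshold. The helper `plan_workers`, its parameters, and the 31 GB default cap are hypothetical choices of mine. The returned keys are the standalone-mode settings `SPARK_WORKER_INSTANCES`, `SPARK_WORKER_CORES`, and `SPARK_WORKER_MEMORY` from `spark-env.sh`, and I am assuming that a worker's memory allowance bounds the heap of the executor it launches.

```python
import math


def plan_workers(node_memory_gb, node_cores, reserved_gb=0, max_heap_gb=31):
    """Suggest standalone worker settings that keep each worker below max_heap_gb.

    reserved_gb is memory left for the OS and other services (an assumption;
    tune it for your nodes). max_heap_gb defaults to 31 to stay safely under
    the 32 GB limit above which compressed oops are no longer available.
    """
    usable_gb = node_memory_gb - reserved_gb
    if usable_gb <= 0 or node_cores <= 0:
        raise ValueError("node must have usable memory and at least one core")

    # Enough workers that each share of memory fits under the cap,
    # but never more workers than cores (each worker needs at least one).
    instances = min(max(1, math.ceil(usable_gb / max_heap_gb)), node_cores)

    memory_per_worker_gb = min(usable_gb // instances, max_heap_gb)
    cores_per_worker = node_cores // instances

    return {
        "SPARK_WORKER_INSTANCES": instances,
        "SPARK_WORKER_CORES": cores_per_worker,
        "SPARK_WORKER_MEMORY": f"{memory_per_worker_gb}g",
    }


# Example: a 128 GB, 16-core node with 8 GB held back for the OS.
plan = plan_workers(128, 16, reserved_gb=8)
for key, value in plan.items():
    print(f"export {key}={value}")
```

Note that `SPARK_WORKER_CORES` applies per worker instance, so it has to be divided along with the memory; otherwise every worker would try to use all cores on the node.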

In-memory analytics with Tableau, SparkSQL, and MapR

Uli Bethke Big Data, Hadoop, Hive, MapR, Spark, SparkSQL, Tableau

Last week Tableau released version 9.0 of their data visualisation tool. From a Big Data point of view, the nicest new feature is support for querying cached (in-memory) SchemaRDDs (DataFrames as of Spark 1.3). In this tutorial I will show you how to connect to Spark 1.2.1 on the MapR 4.1 sandbox with Tableau 9.0. Pre-requisites: - Download the ...