Comparing Window Function Features by Database Vendors

Jiří Mauritz Data Warehouse, Redshift, SQL for Analysis, Window Functions

We will round off the series on window functions with a comparison of what database vendors offer. Window functions come in many variants, and each vendor supports a different subset of features. Some also add extra window functions or features beyond standard ANSI SQL. One of the most powerful features is user-defined aggregate functions (UDAF), which some databases allow using ...

Big Data News: Convergence with MapR and Faster Stateful Streaming Processes with Spark

Uli Bethke Big Data, DFS, HUG Ireland, MapR, Spark

MapR on impedance mismatch and how convergence is achieved for layered architectures, along with Databricks on using the new Spark API “mapWithState” for faster stateful Spark Streaming. As our big data world comes to the end of another week, the team at Sonra has once again been impressed by the week's highlights in big data. MapR has shared its insights ...

Big Data News: Streaming in the Extreme.. An evolution in Data Processing and Analytics

Uli Bethke Apache, Big Data, Data Science, MapR, Technology

Google has submitted a project proposal to open source Dataflow through the Apache Software Foundation, along with MapR on streaming across data centers. As another week comes to a close, the wheels of our big data community continue to turn in cycles of innovation and progress, which never fail to impress. Google’s Tyler Akidau brought us through ...

Big Data News: Yahoo’s Data Sketching and Apache Spark 1.6

Uli Bethke Apache, Big Data, Community, Data Science, Hadoop, Hive, Open Source Software, Technology

Launching 2016 in style, from an exploration of Yahoo’s successful scaling of aggregate computational queries using data sketching libraries to the release of Apache Spark 1.6. Firstly, the team at Sonra would like to wish you and yours every success in 2016. As the arrow of time pushes us forward, our big data industry is forging ahead in a cycle of ...

Big Data News: HyperLogLog with Spark and Open Source GZinga Compression

Uli Bethke Big Data, Spark, Technology

Exploring the performance enhancements of HyperLogLog on Spark and adding splittable and seekable features to Gzip in a new open source project called GZinga. Life is never dull in big data, and as I left a great Spark Dublin meetup last night, pondering the distributed performance enhancements of using DataFrames in Spark, I was once again struck by the continuous ...