Big Data News: HyperLogLog with Spark and Open Source GZinga Compression

Uli Bethke Big Data, Spark, Technology

Exploring the performance enhancements of HyperLogLog on Spark and adding splittable and seekable features to Gzip in a new open source project called GZinga Life is never dull in big data and as I left a great Spark Dublin meetup last night pondering the distributed performance enhancements of using dataframes in Spark, I was once again struck by the continuous ...