Big Data News: Apache Kafka 0.9 Release and eBay Open Sourcing Pulsar Reporting

Uli Bethke Big Data, HUG Ireland, Open Source Software, Technology

Open source movers this week: the Kafka 0.9 release, and eBay’s extension of its open source Pulsar project with Pulsar Reporting

 The world does not stand still for long in Big Data: more open source projects keep coming to fruition, adding value and substance to our community’s open source movement.

 Apache Kafka is one such project. Its 0.9 release brings some groovy new features and enhancements that make Kafka more efficient and effective in any big data technology stack. The main areas of improvement are as follows:

 Security: Kafka can now authenticate clients itself, using either TLS (SSL) certificates or Kerberos. It also adds user-defined access control, similar to Unix permissions, and over-the-wire encryption (via SSL) to protect data crossing untrusted networks.
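 As a rough illustration of what this means on the client side, the sketch below builds the kind of properties a client would need for either option. The property names are the ones we recall from the Kafka security documentation, so check them against the docs for your version; the function names, file path and password are made up for the example.

```python
from typing import Dict


def ssl_client_properties(truststore_path: str, truststore_password: str) -> Dict[str, str]:
    """Client settings for an encrypted (SSL) connection to the brokers."""
    return {
        "security.protocol": "SSL",
        "ssl.truststore.location": truststore_path,   # placeholder path
        "ssl.truststore.password": truststore_password,  # placeholder secret
    }


def kerberos_client_properties(service_name: str = "kafka") -> Dict[str, str]:
    """Client settings for Kerberos (SASL) authentication without encryption."""
    return {
        "security.protocol": "SASL_PLAINTEXT",
        "sasl.kerberos.service.name": service_name,
    }


def render_properties(props: Dict[str, str]) -> str:
    """Render a dict as the key=value lines of a Java-style .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    # Hypothetical truststore location and password, for illustration only.
    print(render_properties(ssl_client_properties("/etc/kafka/client.truststore.jks", "changeit")))
```

 The output can be saved as a properties file and handed to a producer or consumer; to encrypt Kerberos traffic as well, the protocol would be SASL_SSL combined with the truststore settings.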

 Connect: Kafka has become popular for streaming data, but the tools that grew up around it for moving data in and out have caused availability and scalability problems for users. This release aims to fix those problems by embedding Kafka Connect (aka Copycat). It replaces standalone tools like Camus with an integrated import and export tool for Kafka implementations, and it takes care of fault tolerance, partitioning, offset management and so on.
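 To make "offset management" a little more concrete, here is a toy sketch of the idea: the copy job remembers how far it has read, so after a crash it resumes where it left off instead of starting over or skipping data. This is not the Kafka Connect API; the function, the connector name and the in-memory "source" are all invented for the example.

```python
from typing import Dict, List, Tuple


def copy_batch(
    source: List[str],
    committed_offsets: Dict[str, int],
    connector_name: str,
    batch_size: int,
) -> Tuple[List[str], Dict[str, int]]:
    """Copy up to batch_size records, starting at the last committed offset.

    Returns the copied records and an updated copy of the offsets.
    """
    start = committed_offsets.get(connector_name, 0)
    batch = source[start:start + batch_size]
    new_offsets = dict(committed_offsets)
    new_offsets[connector_name] = start + len(batch)
    return batch, new_offsets


log_lines = ["a", "b", "c", "d", "e"]

# First run copies two records and commits its position.
first_batch, offsets = copy_batch(log_lines, {}, "demo-source", 2)

# Simulated restart: only the committed offsets survive, so the job
# picks up at record three rather than re-reading from the start.
second_batch, offsets = copy_batch(log_lines, offsets, "demo-source", 2)
```

 A framework that handles this bookkeeping for every connector is what saves each import/export tool from reinventing it.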

 User Defined Quotas: This release lets operators enforce quotas on clients’ access to the cluster. Left uncapped, a client that produces or consumes data at a very high rate can cause memory pressure and place I/O strain on the brokers, hurting cluster health and productivity. Client read/write quotas are a logical way to marshal resources on a multi-tenant platform and keep resource usage efficient day to day.
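 The Kafka documentation describes quotas as byte-rate thresholds, with the broker slowing down clients that exceed them by delaying its responses. The sketch below shows the arithmetic behind that idea in a simplified form. It is our own illustration rather than the broker's code, and the function name and numbers are made up.

```python
def throttle_delay_seconds(
    bytes_in_window: int,
    window_seconds: float,
    quota_bytes_per_sec: float,
) -> float:
    """Delay needed so the average rate over (window + delay) equals the quota.

    Returns 0.0 when the client is at or under its quota.
    """
    if window_seconds <= 0 or quota_bytes_per_sec <= 0:
        raise ValueError("window and quota must be positive")
    observed_rate = bytes_in_window / window_seconds
    if observed_rate <= quota_bytes_per_sec:
        return 0.0
    # Excess rate as a fraction of the quota, scaled by the window length.
    return window_seconds * (observed_rate - quota_bytes_per_sec) / quota_bytes_per_sec


# A client that sent 2 MB in a 1 second window against a 1 MB/s quota
# would need to be held back long enough to halve its average rate.
delay = throttle_delay_seconds(2_000_000, 1.0, 1_000_000)
```

 The greedy client is slowed down rather than rejected, so well-behaved tenants keep their share of the brokers without anyone seeing errors.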

 eBay have also been busy, announcing that their open source project Pulsar Pipeline has a new addition called Pulsar Reporting. The project is a near real time analytics tool designed for Pulsar Pipeline, providing real time charts and reporting for the pipeline. It sits in the Pulsar framework on top of the Pulsar APIs, which in turn sit on top of OLAP engines and Druid, which in turn sit on top of Kafka and Hadoop. If you are using Pulsar or thinking about it, this addition should make an implementation much easier.

 Pulsar Reporting can:

  • Generate (near) real time reports
  • Provide data visualisations through an extensive chart widget set
  • Add or remove data sources with no downtime
  • Manage authentication and approval permissions for data access
  • Adapt to multiple screen sizes thanks to its responsive design

Pulsar Reporting has:

  • A reporting API for external data sources (using SQL/JSON)
  • A visual editor for non-coding analysts who want to build reports
  • Efficient streaming capability via Kafka and Druid

As the week closes, the team here at Sonra is excited about the possibilities these developments, and more, bring to our big data community. On that note, we will be sponsoring a Hadoop User Group (HUG) Ireland event tomorrow morning @Filmbase, exploring more of these exciting open source developments, with Greenplum and HAWQ SQL to name but a few. Hope to see you there; if not, have a great weekend and week to come. We look forward to sharing our experiences with you again next week, so stay tuned for developments! Bon weekend all!

About Sonra

We are a Big Data company based in Ireland. We are experts in data lake implementations, clickstream analytics, real time analytics, and data warehousing on Hadoop and Spark. We also run the Hadoop User Group (HUG) Ireland. We can help with your Big Data implementation. Get in touch today; we would love to hear from you!