Sonra to present at DataWorks Summit San Jose (17-21 June)
Our CEO Uli Bethke will co-present with Avi Sawant from Cincinnati Insurance Company on how Flexter, our powerful ETL tool for XML and JSON, converts large volumes of ACORD-based insurance XML files into ORC and Hive on the Hortonworks platform. Here is the abstract for the presentation.
Add a SPARK to your ETL
Wednesday, June 20
11:00 AM – 11:40 AM
Meeting Room 230A
A large proportion of enterprise data is locked away in complex and verbose industry data standards (ACORD, FpML, HL7, ISO 20022, XBRL, etc.) or in other proprietary formats based on XML/JSON. Standard ETL tools are poorly suited to unlocking this data for analytics: they do not scale, they perform poorly, and they do not handle change gracefully. In addition, building the required data pipelines with traditional ETL development takes a very long time. Flexter, a distributed big data solution from Sonra, solves this problem with Apache Spark and fully automates the conversion of complex XML/JSON into text, a relational database, or Hadoop. In this talk, we will describe how we solved the problem of processing complex XML files (modeled after the ACORD insurance industry standard) at Cincinnati Insurance Company. Cincinnati Insurance Company is a subsidiary of Cincinnati Financial Corporation, a FORTUNE 500® company included in the 2017 FORTUNE 500 list of the largest U.S. companies. We will walk you through the architecture and go into the technical details of Cincinnati Insurance Company's data warehouse solution, which runs on the Hortonworks HDF/HDP platform and is enabled by Flexter, which is written in Scala and runs on Spark. Along the way, we will describe the problems we solved in our existing data pipeline and how we made our platform more flexible in ingesting the different types of XML originating from our operational systems. We will also give an overview of the architectural inner workings of Flexter for XML/JSON processing.