JSON. To ETL or to NoETL? The big data question.

Uli Bethke ETL, JSON

NoETL. The little brother of NoSQL

You have probably come across the term NoSQL. It was coined a few years back to describe a class of database systems that can scale across a large number of nodes for distributed (and sometimes global) processing of transactions (OLTP). Early examples were DynamoDB and Cassandra. These technologies trade consistency for scalability ...

SpaceX Performance for Snowflake with Clustering Keys

Dorian Beganovic Snowflake

Introduction

Snowflake stores tables by dividing their rows across multiple micro-partitions (horizontal partitioning). Each micro-partition automatically gathers metadata about all rows stored in it, such as the range of values (min, max, etc.) for each column. This is a standard feature of column store technologies. For example, the Apache ORC format (Optimized Row Columnar) keeps similar statistics about its data. ...
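To make the idea of per-partition min/max metadata concrete, here is a minimal sketch in Python. It is not Snowflake's implementation: the `MicroPartition` class, the `build_partitions` and `scan` helpers, and the partition size of 10 rows are all hypothetical names and values chosen for illustration. It only shows how a range filter can skip any partition whose stored value range cannot contain a match.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Hypothetical stand-in for a micro-partition holding a single column."""
    rows: list
    min_value: int
    max_value: int


def build_partitions(values, size):
    """Split values into fixed-size chunks and record min/max for each chunk."""
    partitions = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        partitions.append(MicroPartition(chunk, min(chunk), max(chunk)))
    return partitions


def scan(partitions, lo, hi):
    """Return values in [lo, hi] and the number of partitions actually read."""
    hits = []
    scanned = 0
    for part in partitions:
        # Metadata check: skip the partition if its range cannot overlap [lo, hi].
        if part.max_value < lo or part.min_value > hi:
            continue
        scanned += 1
        hits.extend(v for v in part.rows if lo <= v <= hi)
    return hits, scanned


# Sorted (well-clustered) data: each partition covers a narrow value range.
values = list(range(1, 101))
parts = build_partitions(values, 10)
hits, scanned = scan(parts, 42, 47)
print(f"read {scanned} of {len(parts)} partitions, found {len(hits)} rows")
```

Because the values are sorted here, only one of the ten partitions overlaps the filter range. If the same values were shuffled before partitioning, most partitions would span nearly the whole value range and far fewer could be skipped, which is the motivation for clustering.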

Converting TVAnytime XML to Impala and Parquet

Chinmay Sinha XML

In this example we will use Flexter to convert an XML file to Parquet. We then query and analyse the output with Impala (using the Cloudera VM). Flexter can generate a target schema from an XML file or from a combination of XML and XML schema (XSD) files. In our example we process the ContentCS.xml file from the TVA data (https://tech.ebu.ch/tvanytime). "TV-Anytime" (TVA) ...