SpaceX Performance for Snowflake with Clustering Keys

Dorian Beganovic Snowflake

Introduction: Snowflake stores tables by dividing their rows across multiple micro-partitions (horizontal partitioning). Each micro-partition automatically gathers metadata about the rows stored in it, such as the range of values (min, max, etc.) for each column. This is a standard feature of column-store technologies. For example, the Apache ORC (Optimized Row Columnar) format keeps similar statistics about its data. ...
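To make the role of per-partition min/max metadata concrete, here is a minimal toy sketch in Python. It is not Snowflake's implementation: the `MicroPartition` class, the `partition_rows` and `scan` helpers, and the partition size of four rows are all assumptions made for illustration. It only shows how a range filter can skip partitions whose value range cannot contain a match.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Hypothetical stand-in for a micro-partition: a small batch of rows."""
    rows: list  # each row is a dict of column name -> value

    def stats(self, column):
        # Min/max metadata for one column of this partition.
        values = [row[column] for row in self.rows]
        return min(values), max(values)


def partition_rows(rows, size):
    # Horizontal partitioning: split the rows into consecutive batches.
    return [MicroPartition(rows[i:i + size]) for i in range(0, len(rows), size)]


def scan(partitions, column, low, high):
    """Return the matching rows and the number of partitions actually read."""
    matched, scanned = [], 0
    for partition in partitions:
        pmin, pmax = partition.stats(column)
        if pmax < low or pmin > high:
            continue  # value range cannot overlap the filter, so skip it
        scanned += 1
        matched.extend(r for r in partition.rows if low <= r[column] <= high)
    return matched, scanned


# Illustrative data: 12 rows sorted by id, 4 rows per partition.
rows = [{"id": i, "amount": i * 10} for i in range(12)]
partitions = partition_rows(rows, 4)
matched, scanned = scan(partitions, "id", 5, 6)
print(len(partitions), scanned, [r["id"] for r in matched])
```

Because the illustrative rows are sorted by `id`, the ranges of the three partitions do not overlap and the filter reads only one of them. With unsorted data the ranges would overlap and fewer partitions could be skipped, which is the situation clustering keys are meant to improve.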

Create your own custom aggregate (UDAF) and window functions in Snowflake

Dorian Beganovic Snowflake

In this post we will show you how to create your own aggregate functions in the Snowflake cloud data warehouse. This type of feature is known as a user-defined aggregate function (UDAF). Most big data frameworks, such as Spark, Hive, and Impala, let you create your own UDAFs. Traditional databases such as Oracle and SQL Server also have this feature. However, ...

Learn Window Functions on Snowflake. Become a cloud data warehouse superhero.

Dorian Beganovic Snowflake, Window Functions

In a recent post we compared Window Function Features by Database Vendors. In this post we will give you an overview of the support for various window function features on Snowflake. Window functions are essential for data warehousing: Window functions are the base of data warehousing workloads for many reasons. First of all, they are very similar to the GROUP ...

Streaming Tweets to Snowflake Data Warehouse with Spark Structured Streaming and Kafka

Dorian Beganovic Kafka, Snowflake, Spark

Streaming architecture: In this post we will build a system that ingests real-time data from Twitter, packages it as JSON objects, and sends it through a Kafka producer to a Kafka cluster. A Spark Streaming application will then consume those tweets in JSON format and stream them ...