Loading data into Snowflake and performance of large joins

Dorian Beganovic Snowflake

Introduction In this blog post we will load a large dataset into Snowflake and then evaluate the performance of joins in Snowflake. Loading large data into Snowflake Dataset The dataset we will load is hosted on Kaggle and contains Checkouts of Seattle library from 2006 until 2017. You can also download the data and see some samples here. The dataset ...

Deep dive on caching in Snowflake

Dorian Beganovic Snowflake

In this post we will explain the clever caching strategies Snowflake uses for performance optimization. In the process we will also cover related internals of Snowflake. A lot of information is from the official research paper created by the Snowflake authors which explains the architecture of Snowflake in depth. Caching in virtual warehouses Snowflake strictly separates the storage layer from ...

The top 10+1 things we love about Snowflake

Dorian Beganovic Snowflake

Introduction I have been familiarising myself with Snowflake over the last couple of months and these are my impressions on the top 10+1 features that really make Snowflake stand out compared to other cloud based data warehouse solutions. 1. Results of queries are stored and can be viewed in query history The fact that Snowflake stores the history of queries ...

SpaceX Performance for Snowflake with Clustering Keys

Dorian Beganovic Snowflake

Introduction Snowflake stores tables by dividing their rows across multiple micro-partitions (horizontal partitioning). Each micro-partition automatically gathers metadata about all rows stored in it such as the range of values (min/max etc.) for each of the columns. This is a standard feature of column store technologies. For example Apache ORC format (optimized row columnar) keeps similar statistics of its data. ...

Create your own custom aggregate (UDAF) and window functions in Snowflake

Dorian Beganovic Snowflake

In this post we will show you how to create your own aggregate functions in Snowflake cloud data warehouse. This type of feature is known as a user defined aggregate function. Most big data frameworks such as Spark, Hive, Impala etc. let you create your own UDAFs. Also traditional databases such as Oracle or SQL Server have this feature. However, ...