Learn Window Functions on Snowflake. Become a cloud data warehouse superhero.

Published on January 31, 2018
Updated on July 25, 2024

In a recent post we compared Window Function Features by Database Vendors. In this post we will give you an overview on the support for various window function features on Snowflake.

Window functions are essential for data warehousing

Window functions are the base of data warehousing workloads for many reasons.
First of all they are very similar to the GROUP BY clause as they run on a set of rows instead of a single one but, but unlike the GROUP BY clause we do not lose the individual rows. To remedy this we could perform a self join with the result of GROUP BY operation but window functions implement the same functionality in a more efficient way that gives better performance.
Window functions also provide complex functions that are based on the (requested) order of rows such as DENSE_RANK which would be hard to implement with traditional SQL. Using window functions you can avoid writing procedural SQL code. I haven’t come across a requirement where a problem could not be solved with window functions.
Just as a reminder, window functions are used with the following syntax:

<function> ( [ <argument1> [ , ... , <argumentN> ] )
  OVER (
       [ PARTITION BY <expr1> ]
       [ ORDER BY <expr2> ]
       [ frame_definition ] )

<function> ( [ <argument1> [ , ... , <argumentN> ] )

OVER (

[ PARTITION BY <expr1> ]

[ ORDER BY <expr2> ]

[ frame_definition ] )

Window Functions on Snowflake

Please don’t mix up the cloud data warehouse Snowflake here with the dimensional modelling design pattern. Snowflake is the first cloud native data warehouse with fully decoupled storage and compute. It is optimized for analytical workloads as data is stored in columnar format and micro-partitoned. Find out more about the unique Snowflake architecture on their website.
We will now walk you through window function support on Snowflake. Let’s first have a look at the dataset.

The dataset

We will use a dataset of phone top-ups. Here is the schema of our dataset:

Sample rows of the dataset:

Aggregate and sequencing functions on Snowflake

Aggregate functions perform operations that take into account all values across a window.
Sequencing functions produce output based on the position of the row in the window.
Aggregate functions supported by Snowflake are:

AVG
COUNT
MAX
MIN
SUM

Sequencing and ranking functions in Snowflake are:

ROW_NUMBER
LAG
LEAD
FIRST_VALUE
LAST_VALUE
NTH_VALUE
RANK
PERCENT_RANK
DENSE_RANK
CUME_DIST
NTILE
WIDTH_BUCKET

Overall there is not much more that you can wish for.
Snowflake has plenty of aggregate and sequencing functions available.

Analytical and statistical function on Snowflake

Analytical and statistical functions provide information based on the distribution and properties of the data inside a partition.
Analytical and statistical functions in Snowflake:

MEDIAN
CORR – correlation with non-null pairs in a partition/group
COVAR_POP – population covariance for non-null pairs in a partition/group
COVAR_SAMP – sample covariance for non-null pairs in a partition
VARIANCE
STDDEV
PERCENTILE_CONT
REGR_* – linear regression
HLL_* – cardinality estimation
APPROXIMATE_JACCARD_INDEX – similarity estimation
APPROX_TOP_K – frequency estimation
APPROX_PERCENTILE – percentile estimation

Please note the approximate query functions APPROX_TOP_K etc. The rationale behind approximate query functions is to trade off performance against accuracy. Some calculations such as calculating TOP N are very expensive as they require a lot of sorting and looking at the whole data set. Using sampling techniques you can speed up these calculations by an order of magnitute and not all use cases require 100% accuracy. We cover approximate query functions in our popular training course Big Data for Data Warehouse Professionals in more detail.
Analytical functions are also very well supported on Snowflake.

Support for flexible frame definition

Partitions can be further subdivided into frames. Frames are those rows over which window functions inside the partition are applied.
For example if we wanted to get an alert when the user tops up his account with more credits than in the last two top ups combined we could run the following query:

SELECT
      customer_id,
      date,
      topup_value >
      COALESCE(SUM(topup_value) OVER (
                 PARTITION BY customer_id
                 ORDER BY date
            	ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING)
       , 0) AS is_alert
FROM
     topups
ORDER BY
      date;

SELECT

customer_id,

date,

topup_value >

COALESCE(SUM(topup_value) OVER (

PARTITION BY customer_id

ORDER BY date

ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING)

, 0) AS is_alert

FROM

topups

ORDER BY

date;

In the ROWS clause we specified that the we are interested in the two rows preceding the current one.
Generally, Snowflake distinguishes between Cumulative and Sliding window frames.
1) Cumulative window frames are defined as either:

{ ROWS | RANGE } BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
{ ROWS | RANGE } BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

The ROWS option computes the aggregate for the current row using all rows from the beginning or end of the partition to the current row according to the specified ORDER BY subclause. RANGE is similar to ROWS, except it only computes the aggregate for rows that have the same value as the current row and according to the specified ORDER BY subclause.
2) Sliding window frames are defined as one of:

ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND <N> { PRECEDING | FOLLOWING }
ROWS BETWEEN UNBOUNDED PRECEDING AND <N> { PRECEDING | FOLLOWING }
ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND UNBOUNDED FOLLOWING

Unfortunately Snowflake does not currently support numerical values in the RANGE clause. The RANGE clause can only be used with CURRENT ROW and UNBOUNDED PRECEDING/FOLLOWING. This is a limitation in situations where you want to monitor the product usage of a customer for only the last 24 hours. We explained this scenario in more depth in our previous blog post comparing the ROWS and RANGE functionality of Redshift and BigQuery.

Distinct inside window function

If we want to count the number of distinct ID’s that made a call each day we have to use the DISTINCT clause within the window function.
The example query is following:

SELECT
    t.*,
    COUNT(DISTINCT id) OVER (PARTITION BY date)
FROM
    topups AS t
ORDER BY
    date ASC ;

SELECT

t.*,

COUNT(DISTINCT id) OVER (PARTITION BY date)

FROM

topups AS t

ORDER BY

date ASC ;

Snowflake does support the DISTINCT clause in window functions for most but not all of them.
Sequencing and ranking functions do not support the DISTINCT clause.
Most of the general aggregation functions (SUM, COUNT, AVG, HASH_AGG, LISTAGG, STDDEV…) mentioned in Snowflakes documentation do support DISTINCT clause.
In the official documentation for each function it is noted whether it supports the DISTINCT clause.

LISTAGG – Aggregation into strings

If we want to concatenate all values of a partition into a single string with a specific delimiter we can use the LISTAGG function.
Here is an example of a query utilizing the LISTAGG function:

SELECT
  t.*,
  LISTAGG(topup_value,',') OVER (PARTITION BY date)
FROM
    topups AS t
ORDER BY
    date ASC ;

SELECT

t.*,

LISTAGG(topup_value,',') OVER (PARTITION BY date)

FROM

topups AS t

ORDER BY

date ASC ;

And the result of this query:

Unfortunately, we weren’t able to use the WITHIN GROUP clause to concatenate the top up values in the order based on their values. This is a small limitation of Snowflake but we expect it will be fixed soon given that similar functions support this clause.

ARRAYS – Aggregation into arrays

Similarly to LISTAGG, If we want to create an array with all the values of a partition we can use the ARRAY_AGG function. This could be useful if we want to denormalise our data to avoid costly joins between very large transaction tables such as ORDER and ORDER_ITEM.
For the ARRAY_AGG function Snowflake does support the WITHIN GROUP clause to order the top ups based on their value:

SELECT
    t.*,
    ARRAY_AGG(topup_value)
        WITHIN GROUP ( ORDER BY topup_value ASC )
        OVER (PARTITION BY date)
FROM
    topups AS t
ORDER BY
    date ASC ;

SELECT

t.*,

ARRAY_AGG(topup_value)

WITHIN GROUP ( ORDER BY topup_value ASC )

OVER (PARTITION BY date)

FROM

topups AS t

ORDER BY

date ASC ;

The result of this query is:

WINDOW CLAUSE – Clause for the WINDOW declaration

When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER.
For example

SELECT
    sum(topup_value) OVER w,
    max(topup_value) OVER w,
    avg(topup_value) OVER w
FROM
    topups
WINDOW w AS (PARTITION BY date ORDER BY id);

SELECT

sum(topup_value) OVER w,

max(topup_value) OVER w,

avg(topup_value) OVER w

FROM

topups

WINDOW w AS (PARTITION BY date ORDER BY id);

Snowflake does not currently support WINDOW declaration.

UDAF – User-defined aggregate functions

Snowflake provides a number of pre-defined aggregate functions such as MAX, MIN, and SUM for performing operations on a set of rows. You can, however, define custom aggregate functions your specific data. Here is more information on how user defined aggregate functions work on the Oracle database.

User-defined aggregate functions are supported by Snowflake.
Snowflake allows using Javascript table UDFs which output an extra column that contains the result of applying a UDF to your table (or the partition you define).

They are not perfect as there are limitations when using them with window functions. However, we have found a nice workaround to make it work.
In our next post we will cover UDFs and table UDAFs in more depth. It’s very exciting.

Join the conversation: What is your favourite window function? When did a window function help you with a difficult problem?

Enjoyed this post? Have a look at the other posts on our blog.

We created the content in partnership with Snowflake.

Back to Blog

Cookie	Duration	Description
__cfruid	session	Cloudflare sets this cookie to identify trusted web traffic.
cookielawinfo-checkbox-marketing	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Marketing".
cookielawinfo-checkbox-necessary	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-preferences	1 month	This cookie is set by the GDPR Cookie Consent plugin to check if the user has given consent to use cookies under the "Preferences" category.
cookielawinfo-checkbox-statistics	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Statistics".
cookielawinfo-checkbox-unclassified	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Unclassified".
CookieLawInfoConsent	1 month	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
csrftoken	1 year	This cookie is associated with Django web development platform for python. Used to help protect the website against Cross-Site Request Forgery attacks

Cookie	Duration	Description
AnalyticsSyncHistory	1 month	Linkedin set this cookie to store information about the time a sync took place with the lms_analytics cookie.
bcookie	2 years	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID.
bscookie	2 years	LinkedIn sets this cookie to store performed actions on the website.
lang	session	LinkedIn sets this cookie to remember a user's language setting.
li_gc	2 years	Linkedin set this cookie for storing visitor's consent regarding using cookies for non-essential purposes.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
mgref	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
mgrefby	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
UserMatchHistory	1 month	LinkedIn sets this cookie for LinkedIn Ads ID syncing.

Cookie	Duration	Description
G	1 year	Cookie used to facilitate the translation into the preferred language of the visitor.
SERVERID	session	This cookie is set by Slideshare's HAProxy load balancer to assign the visitor to a specific server.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_7H38LVR4Z5	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_44804396_1	1 minute	Set by Google to distinguish users.
_gat_UA-44804396-1	1 minute	A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to.
_gcl_au	3 months	Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
SIDCC	6 Months	The "SIDCC" cookie is used as security measure to protect users data from unauthorised access
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
AN	1 month
AS	session
ebEventToTrack	1 month
eblang	1 year
SNID	2 years	This cookie is set by the Google. This cookie is used by the map which helps visitors to identify and reach the facility.
SP	session
SS	session

From Code to Clarity: Visualizing SQL code for Documentation and Debugging

From Code to Clarity: Visualizing SQL code for Documentation and Debugging

From Code to Clarity: Visualizing SQL code for Documentation and Debugging

From Code to Clarity: Visualizing SQL code for Documentation and Debugging

Learn Window Functions on Snowflake. Become a cloud data warehouse superhero.

Window functions are essential for data warehousing

Window Functions on Snowflake

The dataset

Aggregate and sequencing functions on Snowflake

Analytical and statistical function on Snowflake

Support for flexible frame definition

Distinct inside window function

LISTAGG – Aggregation into strings

ARRAYS – Aggregation into arrays

WINDOW CLAUSE – Clause for the WINDOW declaration

UDAF – User-defined aggregate functions

Learn Window Functions on Snowflake. Become a cloud data warehouse superhero.

Window functions are essential for data warehousing

Window Functions on Snowflake

The dataset

Aggregate and sequencing functions on Snowflake

Analytical and statistical function on Snowflake

Support for flexible frame definition

Distinct inside window function

LISTAGG – Aggregation into strings

ARRAYS – Aggregation into arrays

WINDOW CLAUSE – Clause for the WINDOW declaration

UDAF – User-defined aggregate functions

Related Articles

Iceberg Ahead! All you need to know about Snowflake’s Polaris Catalog

How to Parse & Flatten XML in Snowflake With Examples

Data Orchestration Deep Dive Snowflake Tasks. An Airflow replacement?

Cookies consent