Comparing Window Function Features by Database Vendors

by Uli Bethke

Uli has been rocking the data world since 2001. As the Co-founder of Sonra, the data liberation company, he’s on a mission to set data free. Uli doesn’t just talk the talk—he writes the books, leads the communities, and takes the stage as a conference speaker.

Published on September 15, 2017
Updated on November 20, 2024

We round off the series on window functions with a comparison of what the database vendors offer. Window functions come in many variations, and every vendor supports a different subset of features. Some also add extra window functions or features beyond standard ANSI SQL. One of the most powerful features is user-defined aggregate functions (UDAFs), which some databases allow to be used as window functions, opening up new possibilities.
We first introduce the features, then present a matrix that compares the features with the vendor offerings, along with some comments. The next section lists the vendors together with links to their documentation on window functions, and the last section describes some extra features of individual vendors.

Window Function Features

1. AGG&SEQ – Aggregate and sequencing functions

We examined which kinds of functions can be used with the OVER clause as window functions. This group includes the main aggregate functions that you can also use with the GROUP BY clause, plus sequencing and ranking functions, which produce output according to the position of the row within the ordered partition.

Aggregate:

MAX
MIN
SUM
AVG
COUNT

Sequencing and ranking:

ROW_NUMBER
RANK
DENSE_RANK
FIRST_VALUE
LAST_VALUE
NTH_VALUE
LEAD
LAG
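
To make the two groups concrete, here is a minimal sketch that runs one aggregate and several sequencing functions side by side. It uses Python's built-in sqlite3 module purely as a convenient test bed (it needs SQLite 3.25 or later); the employees table and its rows are hypothetical sample data, not taken from any vendor documentation.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "A", 300), ("Bob", "A", 200), ("Cid", "A", 200),
     ("Eve", "A", 100), ("Dee", "B", 500)],
)

rows = conn.execute(
    """
    SELECT name,
           SUM(salary)  OVER (PARTITION BY dept)                            AS dept_total,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC, name) AS rn,
           RANK()       OVER (PARTITION BY dept ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC)       AS dense_rnk,
           LAG(salary)  OVER (PARTITION BY dept ORDER BY salary DESC, name) AS prev_salary
    FROM employees
    ORDER BY dept, rn
    """
).fetchall()

for row in rows:
    print(row)
```

Bob and Cid tie on salary, so RANK leaves a gap after them (Eve gets 4) while DENSE_RANK does not (Eve gets 3).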

2. ANALYTIC – Analytical/Statistical functions

This group covers functions that describe the distribution of the data.

PERCENTILE_CONT
PERCENTILE_DISC
NTILE
CUME_DIST
PERCENT_RANK
MEDIAN
REGR_*

Note that the first two functions take their sort order from a WITHIN GROUP (ORDER BY ...) clause rather than from an ORDER BY inside the OVER clause. We also include the statistical functions for linear regression, whose names start with REGR_, in this category.
Notice in the matrix below that the MEDIAN function is often not supported. Standard aggregate functions, such as MAX or AVG, can be computed in a single pass over the values, whereas for MEDIAN the engine needs to sort the values first (even though efficient algorithms exist for this). As a result, vendors often leave the function out entirely rather than offer a slow implementation.
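
Where MEDIAN is missing, it can usually be emulated with functions from the first group. The following is a sketch of one common workaround, not a vendor-recommended recipe: number the sorted rows of each partition, then average the one or two middle rows. It runs on SQLite through Python's sqlite3 module; the readings table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (grp TEXT, val REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 9.0), ("a", 3.0), ("a", 4.0),
     ("b", 7.0), ("b", 2.0), ("b", 5.0)],
)

# rn is the position in sorted order, cnt the partition size.
# With integer division, (cnt + 1) / 2 and (cnt + 2) / 2 point at the single
# middle row for an odd count and at the two middle rows for an even count.
medians = dict(conn.execute(
    """
    SELECT grp, AVG(val) AS median
    FROM (
        SELECT grp, val,
               ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val) AS rn,
               COUNT(*)     OVER (PARTITION BY grp)              AS cnt
        FROM readings
    )
    WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
    GROUP BY grp
    """
).fetchall())

print(medians)
```

The sort that makes MEDIAN expensive is still there: it has simply moved into the ORDER BY of ROW_NUMBER.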

3. RANGE – Support for flexible frame definition

The frame clause, which follows ORDER BY inside the OVER clause, starts with the keyword ROWS or RANGE. A numeric offset is almost always supported in a ROWS frame. A numeric offset in a RANGE frame, however, is often not supported because it is complex to implement.
We consider this feature satisfied if the database allows numeric values inside the RANGE clause. Some databases allow the RANGE clause only with UNBOUNDED or CURRENT ROW; the matrix notes where this is the case. You can find more on this topic in the previous post, Window function frames on Redshift and BigQuery.
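
The difference between the two frame types shows up as soon as the ordering column has gaps. The sketch below assumes a hypothetical sales table with missing days and runs on SQLite via Python's sqlite3 module (numeric RANGE offsets need SQLite 3.28 or later). ROWS counts physical rows, whereas RANGE compares values of the ordering column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")  # hypothetical table
# Days 3, 5 and 6 are missing on purpose.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10), (2, 20), (4, 30), (7, 40)])

rows = conn.execute(
    """
    SELECT day,
           -- the current row and the two rows before it
           SUM(amount) OVER (ORDER BY day
                             ROWS  BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_rows,
           -- all rows whose day lies in [day - 2, day]
           SUM(amount) OVER (ORDER BY day
                             RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_range
    FROM sales
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

For day 7 the ROWS frame still reaches back to days 2 and 4, while the RANGE frame contains only day 7 itself because no other row falls between day 5 and day 7.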

4. DISTINCT – Distinct inside window function

The DISTINCT keyword removes duplicate values before the window function is applied. The syntax is the following:

window_function (DISTINCT field_name) OVER (PARTITION BY ... ORDER BY ...)

This is particularly useful with the COUNT(...) aggregate function. Because calculating distinct values is expensive, some engines, e.g. Oracle, now support an approximate distinct count.
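
For engines that lack DISTINCT inside a window function, a distinct count per partition can be approximated with two DENSE_RANK calls. This is a sketch of a widely used workaround under the assumption that the counted column contains no NULLs; the orders table is hypothetical, and SQLite (through Python's sqlite3 module) serves only as a test bed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("c1", "x"), ("c1", "y"), ("c1", "x"), ("c2", "z"), ("c2", "z")],
)

# Ascending dense rank + descending dense rank - 1 equals the number of
# distinct values in the partition, on every row of that partition.
rows = conn.execute(
    """
    SELECT customer, product,
           DENSE_RANK() OVER (PARTITION BY customer ORDER BY product)
         + DENSE_RANK() OVER (PARTITION BY customer ORDER BY product DESC)
         - 1 AS distinct_products
    FROM orders
    ORDER BY customer, product
    """
).fetchall()

for row in rows:
    print(row)
```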

5. LISTAGG

LISTAGG (in Oracle) or STRING_AGG (in PostgreSQL) is a function that aggregates values into a single string, with the values separated by a specified delimiter. For instance, the query:

SELECT last_name, STRING_AGG(last_name, ', ') OVER (PARTITION BY dept_no) FROM employees;

produces, for each employee, a comma-separated list of all employees in the same department.
We also include the function XMLAGG in this category, which aggregates values into XML.
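
The same idea can be tried out locally. The sketch below uses SQLite's GROUP_CONCAT, which plays the role of LISTAGG/STRING_AGG there, through Python's sqlite3 module; the employees rows are hypothetical. Without an ordering inside the window the order of names in each list is not guaranteed, so the example only prints the lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT, dept_no INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Adams", 10), ("Baker", 10), ("Clark", 20)],  # hypothetical rows
)

# Every row keeps its own identity and additionally sees the whole department.
rows = conn.execute(
    """
    SELECT last_name,
           GROUP_CONCAT(last_name, ', ') OVER (PARTITION BY dept_no) AS colleagues
    FROM employees
    ORDER BY last_name
    """
).fetchall()

for last_name, colleagues in rows:
    print(last_name, "->", colleagues)
```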

6. ARRAYS – Aggregation into arrays

The ARRAY_AGG function aggregates values into an array. This is particularly useful for nesting and unnesting parent-child relationships in very large tables.

7. CLAUSE – Clause for the WINDOW declaration

This feature lets users define a window once, in a stand-alone WINDOW clause, and refer to it by name. This avoids repeating the same window definition in several functions.
Syntax:

WINDOW name_of_the_window AS (PARTITION BY ... ORDER BY ...)
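
A short sketch of the clause in use: two functions share one named window instead of spelling out the same PARTITION BY and ORDER BY twice. The sales table is hypothetical, and the query runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("s1", 1, 10), ("s1", 2, 30), ("s1", 3, 20), ("s2", 1, 5)],  # hypothetical rows
)

rows = conn.execute(
    """
    SELECT store, day,
           SUM(amount) OVER w AS running_total,
           MAX(amount) OVER w AS running_max
    FROM sales
    WINDOW w AS (PARTITION BY store ORDER BY day)
    ORDER BY store, day
    """
).fetchall()

for row in rows:
    print(row)
```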

8. UDAF – User-defined aggregate functions

UDAFs allow users to create a custom aggregate function and, most importantly, to use it as a window function. The most commonly supported languages are Java, C, C++ and C#. DB2 also allows COBOL, PostgreSQL allows Perl, and Redshift requires Python. Here is an example of a str_agg UDAF in PostgreSQL using plain SQL:

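-- State transition function: appends the next value to the running string.
CREATE OR REPLACE FUNCTION concat_ws_comma(text, ANYELEMENT)
RETURNS text AS $$
  SELECT concat_ws(',', $1, $2)
$$ LANGUAGE sql;

-- The aggregate calls sfunc once per row and keeps its state in a text value.
CREATE AGGREGATE str_agg (ANYELEMENT) (
  sfunc = concat_ws_comma,
  stype = text);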
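
The mechanics behind sfunc and stype can be modelled as a fold over the input rows. The following Python sketch is only a rough analogue of the PostgreSQL aggregate above, not how the engine implements it: the state starts as NULL (None), the transition function is called once per row, and a running window simply exposes the intermediate states. The helper names str_agg and str_agg_running are chosen for illustration.

```python
from functools import reduce
from typing import Any, Iterable, List, Optional


def concat_ws_comma(state: Optional[str], value: Any) -> str:
    """Rough analogue of concat_ws(',', state, value): None arguments are skipped."""
    parts = [str(p) for p in (state, value) if p is not None]
    return ",".join(parts)


def str_agg(values: Iterable[Any]) -> Optional[str]:
    """Plain aggregate: fold the transition function over all values.

    The state starts as None, mirroring an aggregate without an initial condition.
    """
    return reduce(concat_ws_comma, values, None)


def str_agg_running(values: Iterable[Any]) -> List[str]:
    """Running variant: the state after each row, assuming distinct ordering keys."""
    state: Optional[str] = None
    out: List[str] = []
    for value in values:
        state = concat_ws_comma(state, value)
        out.append(state)
    return out


print(str_agg(["a", "b", "c"]))
print(str_agg_running(["a", "b", "c"]))
```

Seen this way, using a UDAF as a window function requires little more from the engine than calling the same transition function frame by frame.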

Matrix of features supported by vendors

Features / Databases	AGG&SEQ	ANALYTIC	RANGE	DISTINCT	LISTAGG	ARRAYS	CLAUSE	UDAF
Oracle
Teradata
MS SQL
DB2
PostgreSQL
Hive/Spark
Apache Drill
Presto
Cloudera Impala
Google BigQuery
Amazon Redshift
Snowflake

Vendors

For the comparison we picked the most popular databases as well as some trending vendors. The vendors listed below link straight to the window function section of their documentation.

Extra features

Oracle: LOOKUPS

The FIRST and LAST functions are applied logically before the aggregate function itself: they restrict it to the rows that rank first or last under a given ordering, which lets us return a value from a different column than the one we order by. They are always used together with the keyword KEEP and can also be used without a window, as plain aggregates.
Syntax:

aggregate_function(expr) KEEP (DENSE_RANK FIRST|LAST ORDER BY expr) OVER (PARTITION BY ...)
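
Outside Oracle, a similar lookup can often be approximated with FIRST_VALUE. The sketch below is an assumption-laden emulation of MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY hire_date) OVER (PARTITION BY dept): ordering by hire_date and then by salary makes the first row of each partition the lowest salary among the earliest hires. The employees table is hypothetical, NULL handling is ignored, and SQLite (via Python's sqlite3 module) is used only as a test bed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, dept TEXT, hire_date TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [("Ann", "A", "2010-01-01", 300), ("Bob", "A", "2010-01-01", 200),
     ("Cid", "A", "2015-06-01", 100), ("Dee", "B", "2012-03-01", 500)],
)

# The secondary sort key (salary) breaks ties between employees hired on the
# same day, which is the job MIN(...) does in the KEEP syntax.
rows = conn.execute(
    """
    SELECT name,
           FIRST_VALUE(salary) OVER (
               PARTITION BY dept ORDER BY hire_date, salary
           ) AS min_salary_of_earliest_hires
    FROM employees
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
```

Cid has the lowest salary in department A, but the lookup returns 200 because only the earliest hires (Ann and Bob) are considered.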

Oracle: MATCH_RECOGNIZE

A very useful analytics feature in Oracle is pattern matching. With plenty of room for customization, the MATCH_RECOGNIZE clause lets us look for a trend or pattern in the data, such as spikes, sudden drops or flat periods. For instance, when we track the daily price of a product, we might want to find all occurrences of the following series: a rise, then five days without change, and then a drop. Let’s look at the code implementing this analysis:

SELECT * FROM price_history MATCH_RECOGNIZE (
  PARTITION BY product
  ORDER BY tstamp
  MEASURES STRT.tstamp AS start_tstamp,
           LAST(UP.tstamp) AS peak_tstamp,
           LAST(DOWN.tstamp) AS end_tstamp,
           MATCH_NUMBER() AS mno
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST DOWN
  PATTERN (STRT UP+ FLAT{5} DOWN+)
  DEFINE
    UP AS UP.price > PREV(UP.price),
    FLAT AS FLAT.price = PREV(FLAT.price),
    DOWN AS DOWN.price < PREV(DOWN.price)
) MR
ORDER BY MR.product, MR.start_tstamp;

The MEASURES clause defines what is output for each match, the DEFINE clause defines the pattern variables, and the PATTERN clause defines the trend we want to find. For thorough information on pattern matching, see the Oracle documentation.
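
To get a feel for what the clause does, the matching can be imitated outside the database for a single product. This Python sketch is an illustration of the idea only, not Oracle's algorithm: each row is labelled by comparing it with the previous one, and the row pattern is then expressed as a regular expression over those labels. The function names and the sample prices are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

# "." stands for STRT (any row), followed by UP+, FLAT{5} and DOWN+.
ROW_PATTERN = re.compile(r".(U+)F{5}(D+)")


def classify(prices: Sequence[float]) -> str:
    """Label each row: U (up), F (flat) or D (down) versus the previous row.

    The first row has no predecessor and gets the neutral label S.
    """
    if not prices:
        return ""
    labels = ["S"]
    for prev, cur in zip(prices, prices[1:]):
        labels.append("U" if cur > prev else "D" if cur < prev else "F")
    return "".join(labels)


def find_matches(prices: Sequence[float]) -> List[Tuple[int, int, int]]:
    """Return (start, peak, end) row indexes for every rise / 5 flat / drop match."""
    labels = classify(prices)
    matches = []
    pos = 0
    while True:
        m = ROW_PATTERN.search(labels, pos)
        if m is None:
            return matches
        matches.append((m.start(), m.end(1) - 1, m.end() - 1))
        # Imitates AFTER MATCH SKIP TO LAST DOWN: the last DOWN row may
        # serve as the STRT row of the next match.
        pos = m.end() - 1


prices = [10, 11, 12, 12, 12, 12, 12, 12, 11, 10, 11, 11, 11, 11, 11, 11, 9]
print(find_matches(prices))
```

In the sample, the last falling row of the first match (index 9) is also the starting row of the second one, which is exactly what skipping to the last DOWN row allows.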

Teradata: RESET WHEN

The RESET WHEN clause can be placed after ORDER BY in the window function declaration and is always followed by a condition. If the condition evaluates to true at some row, a new dynamic partition starts at that row, in addition to the partitions defined in the PARTITION BY clause. For example, we can count the days on which the price is rising and reset the count whenever the price decreases or does not move:

SELECT price, ROW_NUMBER() OVER
  (ORDER BY tstamp
   RESET WHEN price <=
     -- price of the previous row
     SUM(price) OVER (ORDER BY tstamp ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
  ) - 1 AS count
FROM prices;

And a possible output could look similar to this:

price	count
60	0
70	1
65	0
71	1
80	2
87	3
85	0

Find more on the RESET WHEN clause in the Teradata documentation.
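
Databases without RESET WHEN can reach the same result in three steps with standard window functions: flag the rows where the condition holds, turn the flags into a group number with a running sum, and then number the rows within each group. The sketch below follows that approach on SQLite via Python's sqlite3 module, using the prices from the output table above with assumed timestamps 1 to 7; the CTE and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (tstamp INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    list(enumerate([60, 70, 65, 71, 80, 87, 85], start=1)),
)

rows = conn.execute(
    """
    WITH flagged AS (
        -- 1 where the price fell or stayed flat versus the previous row
        SELECT tstamp, price,
               CASE WHEN price <= LAG(price) OVER (ORDER BY tstamp)
                    THEN 1 ELSE 0 END AS is_reset
        FROM prices
    ),
    grouped AS (
        -- the running sum of the flags yields one group per dynamic partition
        SELECT tstamp, price,
               SUM(is_reset) OVER (ORDER BY tstamp) AS grp
        FROM flagged
    )
    SELECT price,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY tstamp) - 1 AS rising_days
    FROM grouped
    ORDER BY tstamp
    """
).fetchall()

for row in rows:
    print(row)
```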

Teradata: QUALIFY

In the previous posts on window functions, we often needed to filter out some rows after the window functions had been applied. Since window functions are evaluated after the WHERE and HAVING clauses, we had to wrap the query in another SELECT to filter them out.
The QUALIFY clause was introduced to overcome this limitation; it filters on window function results in the same way that the HAVING clause filters on traditional aggregations. See the example from the official documentation:

SELECT StoreID, SUM(profit) OVER (PARTITION BY StoreID)
FROM facts
QUALIFY SUM(profit) OVER (PARTITION BY StoreID) > 2;
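
For comparison, this is the subquery form that the same filter takes where QUALIFY is not available. It is a minimal sketch run on SQLite via Python's sqlite3 module; the rows in the facts table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (StoreID INTEGER, profit INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (3, 5)],  # hypothetical rows
)

# The window function is computed in the inner query; only the outer
# WHERE clause can see its result and filter on it.
rows = conn.execute(
    """
    SELECT StoreID, store_profit
    FROM (
        SELECT StoreID,
               SUM(profit) OVER (PARTITION BY StoreID) AS store_profit
        FROM facts
    )
    WHERE store_profit > 2
    ORDER BY StoreID
    """
).fetchall()

for row in rows:
    print(row)
```

Store 2 disappears from the result because its total profit is 1, while both rows of store 1 survive with the partition total of 3.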

PostgreSQL: JSON_AGG

The JSON_AGG function aggregates values in a similar way to XMLAGG, but into the JSON format.

Conclusion

Even though window functions are a clear and coherent concept, we can see a rich diversity in the features implemented by the vendors. Usually, when a vendor has not implemented a feature, it is because too few users would benefit from it. Moreover, a full implementation of RANGE or MEDIAN requires considerable work on efficient algorithms, and parallelizing such calculations often poses serious implementation challenges (cf. COUNT(DISTINCT)). Many vendors have, however, promised these features, so they should be delivered in the near future.
The discussion forums show that these features are being actively discussed and that development is very much alive. This is especially true for the newer databases, such as PrestoDB, Spark or Snowflake. Let’s hope the vendors find the resources to provide us with the full set of window functions as defined by the SQL standard. Until then, let the matrix above guide you when choosing the right vendor.
