Working with window functions
In this section we describe the conceptual structure and general usage of window functions.
Overview
Unlike other SQL functions, a window function calculates the values on a group or tuple of rows. Such a group is called a partition. Partitions are defined by specifying implicit or explicit "window frames". A window frame is calculated relative to the current row and contains a fixed or variable set of rows. The function results are calculated per row within each partition. Each row within the window frame therefore becomes the current row.
The window functions calculate the values over the defined partition and return them for each row belonging to this partition.
The OVER() clause defines the components, that are necessary to accomplish this behavior:
- A
PARTITION BYclause - An
ORDER BYclause - A specification of the window frame.
From query to query all of this components may be optional or contains multiple OVER() clauses.
A query with an empty OVER() clause can also be valid. In this case, the values are calculated across all rows (as if all rows belonged to the same partition) but, unlike aggregate functions, are not output as one row but per row.
Syntax
function_name([function_arguments])
OVER ([PARTITION BY part_expr [, ...]]
[ORDER BY order_expr [ASC | DESC] [NULLS {FIRST | LAST}] [, ...]]
[{ {ROWS | RANGE} UNBOUNDED PRECEDING
| {ROWS | RANGE} n PRECEDING
| {ROWS | RANGE} CURRENT ROW
| {ROWS | RANGE} BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
| {ROWS | RANGE} BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
| {ROWS | RANGE} BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
| {ROWS | RANGE} BETWEEN n {PRECEDING | FOLLOWING} AND n {PRECEDING | FOLLOWING}
| {ROWS | RANGE} BETWEEN UNBOUNDED PRECEDING AND n {PRECEDING | FOLLOWING}
| {ROWS | RANGE} BETWEEN n {PRECEDING | FOLLOWING} AND UNBOUNDED FOLLOWING
}])Arguments
function_name: Name of an SQL Function.
function_arguments (optional): Function arguments.
OVER(): This clause specifies that this function should be applied as a window function (and not as an aggregate function) and specifies the window frame over which rows the function should be applied.
An OVER() clause with empty parentheses uses all rows as one partition and an implicit default window frame.
PARTITION BY part_expr (optional): This clause specifies the part_expr, that used to group rows into partitions.
Values are calculated per partition if this clause is specified. If this clause is omitted, all rows in the query are considered as one partition.
ORDER BY order_expr (optional or required): This clause sorts rows within a partition before the function calculates the values. ORDER BY clause is required, if a window frame is specified explicitly or for some specific functions and optional in all other cases.
ORDER BY within the OVER() clause is to be distinguished from the ORDER BY clause, which is used to sort the results of a query, although they have the same syntax.
More about data sorting for window functions and ORDER BY within an OVER() clause can be found below in the section ORDER BY clause.
Window Frame definition options:
See Range- vs. row-based window frames, RANGE-based window frames, ROW-based window frames and Further examples for more information and more examples.
RANGE UNBOUNDED PRECEDING or ROWS UNBOUNDED PRECEDING: This keywords specifies a window frame that contains rows between the first row in the partition and the current row.
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING or ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING : This keywords specifies a window frame that contains rows between the current row the and last row in the partition.
ROWS BETWEEN n {PRECEDING | FOLLOWING} AND n {PRECEDING | FOLLOWING} : This clause allows to define a ROWS-based window frame with the explicitly specified offset of size n.
ROWS BETWEEN n {PRECEDING | FOLLOWING} AND n {PRECEDING | FOLLOWING} or
ROWS BETWEEN UNBOUNDED PRECEDING AND n {PRECEDING | FOLLOWING} or
ROWS BETWEEN n {PRECEDING | FOLLOWING} AND UNBOUNDED FOLLOWING
This clauses allows defining a ROWS-based window frame with the explicitly specified offset of size n or use the UNBOUNDED keyword to define one of the boundaries as a dynamic frame boundary.
RANGE BETWEEN n {PRECEDING | FOLLOWING} AND n {PRECEDING | FOLLOWING} or
RANGE BETWEEN UNBOUNDED PRECEDING AND n {PRECEDING | FOLLOWING} or
RANGE BETWEEN n {PRECEDING | FOLLOWING} AND UNBOUNDED FOLLOWING
This clause allows to define a RANGE-based window frame with the explicitly specified offset of size n or use the UNBOUNDED keyword to define one of the boundaries as a dynamic frame boundary.
When specifying RANGE BETWEEN window frames with explicit offsets, only one ORDER BY expression is allowed.
Not all data types are supported by this type of window frame. See more about data types, limitations and other options for this type of window frames under RANGE-based window frames.
n: is an unsigned constant, positive numeric value including 0.
With ORDER BY ord_expr ASC, the syntax ...n FOLLOWING means, that all rows of the partition of the current row with order_expr values, that are between x and x+n will be included in the window frame, where x is the value of ord_expr expression for current row. The syntax ...n PRECEDING means in this case, that all rows of the partition of the current row with order_expr values, that are between x-n and x will be included in the window frame. See RANGE BETWEEN examples.
With ORDER BY DESC both clauses behave in opposite ways.
Important Notes
- Window frames are supported by all window functions. However, the syntax and supported types of window frames can vary from function to function.
- If no window frame was specified explicitely and
ORDER BYclause is specified, default syntax for window frame specification is used. The default window frame syntax for functions COUNT, MIN, MAX, SUM and AVG isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. See also example and notes for this example. - If an empty
OVER()is used, that is, no window frame is explicitly specified and noORDER BYclause is specified,RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGis the default for the functions COUNT, MIN, MAX, SUM and AVG. See also example and notes for this example. - The offsets, specified as
0 PRECEDINGor0 FOLLOWINGare equivalent toCURRENT ROW. - When you define an explicit window frame, the window functions require that the data in the window be in a known order. Therefore, the
ORDER BYclause within theOVER()clause is mandatory for the window frame syntax, although thisORDER BYclause is generally optional.
Examples: Index & Tables
Examples:
- ROWS & RANGE UNBOUNDED PRECEDING
- ROWS & RANGE BETWEEN, Named Mixed Boundaries: cumulative totals
- ROWS & RANGE BETWEEN, Named Opened Boundaries, PARTITION BY: totals per partition
- ROWS & RANGE BETWEEN, Named Mixed Boundaries, PARTITION BY: cumulative totals per partition
- ROWS BETWEEN, Mixed & Explicit Boundaries, PARTITION BY: moving averages (sliding)
- RANGE BETWEEN, Mixed & Explicit Boundaries, PARTITION BY: averages, syntax errors
- RANGE & ROWS BETWEEN Comparison
- Supported non-ranking window functions
- Default window frame definitions for queries with and without
ORDER BYclause (see notes) - ROWS BETWEEN explicit frames without current row
GROUP BYand window functions- Handling of
NULLvalues
Please create this tables if you want to try the examples yourself: Create my_table and my_table2 tables (please unfold).
CREATE TABLE IF NOT EXISTS my_folder.my_table(
ord STRING,
grp STRING,
col1 INTEGER,
grp2 INTEGER,
col2 INTEGER,
id INTEGER,
date_str STRING
);
INSERT INTO my_folder.my_table (ord, grp, col1, grp2, col2, id, date_str) VALUES
('a', 'g1', 10, 1, 10, 1,'2023-06-30'),
('b', 'g1', 20, 1, 20, 2,'2023-07-31'),
('c', 'g1', 20, NULL, 25, 3,'2023-08-31'),
('a', 'g2', 30, 2, 30, 4,'2023-09-30'),
('b', 'g2', 30, 2, 35, 5,'2023-10-31'),
('c', 'g2', 40, 2, 40, 6,'2023-11-30'),
('a', 'g3', 50, 3, 45, 7,'2023-12-31'),
('b', 'g3', 50, NULL, 53, 8,'2024-01-31'),
('c', 'g3', 60, NULL, 60, 9,'2024-02-29'),
('d', 'g3', 50, 3, 55, 10,'2024-03-31');
SELECT * FROM my_folder.my_table;+-----+-------+------+------+------+----+-------------------------+
| ORD | GRP | COL1 | GRP2 | COL2 | ID | DATE |
+-----+-------+------+------+------+----+-------------------------+
| a | g1 | 10 | 1 | 10 | 1 | 2023-06-30 00:00:00.000 |
| b | g1 | 20 | 1 | 20 | 2 | 2023-07-31 00:00:00.000 |
| c | g1 | 20 | NULL | 25 | 3 | 2023-08-31 00:00:00.000 |
| a | g2 | 30 | 2 | 30 | 4 | 2023-09-30 00:00:00.000 |
| b | g2 | 30 | 2 | 35 | 5 | 2023-10-31 00:00:00.000 |
| c | g2 | 40 | 2 | 40 | 6 | 2023-11-30 00:00:00.000 |
| a | g3 | 50 | 3 | 45 | 7 | 2023-12-31 00:00:00.000 |
| b | g3 | 50 | NULL | 53 | 8 | 2024-01-31 00:00:00.000 |
| c | g3 | 60 | NULL | 60 | 9 | 2024-02-29 00:00:00.000 |
| d | g3 | 50 | 3 | 55 | 10 | 2024-03-31 00:00:00.000 |
+-----+-------+------+------+------+----+-------------------------+And:
CREATE TABLE IF NOT EXISTS my_folder.my_table2(
product STRING,
prod_cat STRING,
sale DECIMAL,
quarter INTEGER,
quarter_2 DECIMAL
);
INSERT INTO my_folder.my_table2 VALUES
('Blue Hat', 'Hat', 3.5,1,1.1),
('Red Hat', 'Hat', 4.0,1,1.1),
('Yellow Hat', 'Hat', 4.5,1,1.1),
('Blue Hat', 'Hat', 2.8,2,1.2),
('Red Hat', 'Hat', 3.2,2,1.2),
('Yellow Hat', 'Hat', 1.5,2,1.2),
('Blue Hat', 'Hat', 3.3,3,1.3),
('Red Hat', 'Hat', 4.8,3,1.3),
('Yellow Hat', 'Hat', 5.5,3,1.3),
('Blue Hat', 'Hat', 3.5,4,1.4),
('Red Hat', 'Hat', 4.0,4,1.4),
('Yellow Hat', 'Hat', 4.6,4,1.4),
('Blue Jeans', 'Jeans', 13.5,1,1.1),
('Grey Jeans', 'Jeans', 22.6,1,1.1),
('Green Jeans', 'Jeans', 16.5,1,1.1),
('Blue Jeans', 'Jeans', 12.8,2,1.2),
('Grey Jeans', 'Jeans', 31.7,2,1.2),
('Green Jeans', 'Jeans', 10.3,2,1.2),
('Blue Jeans', 'Jeans', 11.4,3,1.3),
('Grey Jeans', 'Jeans', 23.9,3,1.3),
('Green Jeans', 'Jeans', 23.8,3,1.3),
('Blue Jeans', 'Jeans', 19.1,4,1.4),
('Grey Jeans', 'Jeans', 17.3,4,1.4),
('Green Jeans', 'Jeans', 17.4,4,1.4);
SELECT * FROM my_folder.my_table2;+-------------+----------+------+---------+-----------+
| PRODUCT | PROD_CAT | SALE | QUARTER | QUARTER_2 |
+-------------+----------+------+---------+-----------+
| Blue Hat | Hat | 3.5 | 1 | 1.1 |
| Red Hat | Hat | 4 | 1 | 1.1 |
| Yellow Hat | Hat | 4.5 | 1 | 1.1 |
| Blue Hat | Hat | 2.8 | 2 | 1.2 |
| Red Hat | Hat | 3.2 | 2 | 1.2 |
| Yellow Hat | Hat | 1.5 | 2 | 1.2 |
| Blue Hat | Hat | 3.3 | 3 | 1.3 |
| Red Hat | Hat | 4.8 | 3 | 1.3 |
| Yellow Hat | Hat | 5.5 | 3 | 1.3 |
| Blue Hat | Hat | 3.5 | 4 | 1.4 |
| Red Hat | Hat | 4 | 4 | 1.4 |
| Yellow Hat | Hat | 4.6 | 4 | 1.4 |
| Blue Jeans | Jeans | 13.5 | 1 | 1.1 |
| Grey Jeans | Jeans | 22.6 | 1 | 1.1 |
| Green Jeans | Jeans | 16.5 | 1 | 1.1 |
| Blue Jeans | Jeans | 12.8 | 2 | 1.2 |
| Grey Jeans | Jeans | 31.7 | 2 | 1.2 |
| Green Jeans | Jeans | 10.3 | 2 | 1.2 |
| Blue Jeans | Jeans | 11.4 | 3 | 1.3 |
| Grey Jeans | Jeans | 23.9 | 3 | 1.3 |
| Green Jeans | Jeans | 23.8 | 3 | 1.3 |
| Blue Jeans | Jeans | 19.1 | 4 | 1.4 |
| Grey Jeans | Jeans | 17.3 | 4 | 1.4 |
| Green Jeans | Jeans | 17.4 | 4 | 1.4 |
+-------------+----------+------+---------+-----------+Window vs. aggregate functions
Some aggregate functions have window function implementations of the same name (see COUNT, MIN, MAX, SUM and AVG).
The differences lie in the scope of rows used for the calculation and in the way the results are output.
| AGGREGATE FUNCTION | WINDOW FUNCTION | |
| Input Data | All rows | Rows of the partition, defined by OVER clause |
| Output Data | One row | Per row of the corresponding partition |
Compare the output of AVG function as aggregation function and window function:
SELECT AVG(col1) FROM my_folder.my_table;
SELECT AVG(col1) OVER() FROM my_folder.my_table;+---------- +
| AVG(COL1) |
+-----------+
| 36 |
+-----------+
+------------------+
| AVG(COL1) OVER() |
+------------------+
| 36 |
| 36 |
| 36 |
| 36 |
| 36 |
| 36 |
| 36 |
| 36 |
| 36 |
| 36 |
+------------------+In this example, we used the AVG window function with an empty OVER() clause, which resulted in the same average value being calculated because all 10 rows were analyzed as one partition, but it was output per row and not in one row as in the aggregation variant.
In following example we use the same table, but analyse the partitions, based on column grp:
SELECT grp, AVG(col1) FROM my_folder.my_table GROUP BY grp; -- aggregate
SELECT grp, col1, AVG(col1) OVER(PARTITION BY grp) FROM my_folder.my_table; -- window+-----+------------+
| GRP | AVG(COL1) |
+-----+------------+
| g1 | 16.666667 |
| g2 | 33.333333 |
| g3 | 52.500000 |
+-----+------------+
+-----+------+----------------------------------+
| GRP | COL1 | AVG(COL1) OVER(PARTITION BY GRP) |
+-----+------+----------------------------------+
| g1 | 10 | 16.666 |
| g1 | 20 | 16.666 |
| g1 | 20 | 16.666 |
| g2 | 30 | 33.333 |
| g2 | 30 | 33.333 |
| g2 | 40 | 33.333 |
| g3 | 50 | 52.500 |
| g3 | 50 | 52.500 |
| g3 | 60 | 52.500 |
| g3 | 50 | 52.500 |
+-----+------+----------------------------------+The calculation is restarted when the value of the column defining the partition changes.
The functionality of window functions becomes even clearer when using the explicit window frame definition.
This example uses an explicit window definition to calculate the values per row as an average of the previous row, the current row, and a following row per grp partition:
SELECT grp, col1, ord,
AVG(col1) OVER( PARTITION BY grp
ORDER BY ord ASC
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as MA
FROM my_folder.my_table ORDER BY grp, ord;+-----+------+-----+--------+
| GRP | COL1 | ORD | MA |
+-----+------+-----+--------+
| g1 | 10 | a | 15.000 |
| g1 | 20 | b | 16.666 |
| g1 | 20 | c | 20.000 |
| g2 | 30 | a | 30.000 |
| g2 | 30 | b | 33.333 |
| g2 | 40 | c | 35.000 |
| g3 | 50 | a | 50.000 |
| g3 | 50 | b | 53.333 |
| g3 | 60 | c | 53.333 |
| g3 | 50 | d | 55.000 |
+-----+------+-----+--------+The value for row GRP='g1' and ORD='a' is calculated as an average of values 10 and 20 because it is the first row in the partition (you can also see it in rows GRP='g2' and ORD='a' and GRP='g3' and ORD='a').
The value for row GRP='g1' and ORD='c' is calculated as an average of values 20 and 20 because it is the last row in the partition (same as for last values of other grp partitions).
Other values were calculated as an average of a previous, current and following row.
The ORDER BY clause within the OVER() clause influences the sorting of the window data within the partition, whereas the final order by clause only influences the output sorting.
The output of the window function therefore depends not only on the value of the individual line, but also on the values of the other lines that qualify for the defined window frame.
ORDER BY clause
The ORDER BY clause of the OVER() clause ensures that the calculation of the result is based on the sorted data.
The ORDER BY clause is required when window functions compute rolling operations on the subsets of rows in each partition and apply them to data with explicit window borders (e.g. calculating running totals, moving averages etc.). For window functions of this kind, the set of "previous" and "following" rows must be deterministic and thus the ORDER BY clause is necessary.
The ORDER BY clause within the OVER() clause affects the sorting of the window data used in the calculation, but not the output of the query.
The output of the query can be sorted using the regular ORDER BY clause.
Compare this example with the example above:
SELECT grp, col1, ord,
AVG(col1) OVER( PARTITION BY grp
ORDER BY ord ASC
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as MA
FROM my_folder.my_table ORDER BY grp, col1 DESC; +-----+------+-----+--------+
| GRP | COL1 | ORD | MA |
+-----+------+-----+--------+
| g1 | 20 | c | 20.000 |
| g1 | 20 | b | 16.666 |
| g1 | 10 | a | 15.000 |
| g2 | 40 | c | 35.000 |
| g2 | 30 | b | 33.333 |
| g2 | 30 | a | 30.000 |
| g3 | 60 | c | 53.333 |
| g3 | 50 | d | 55.000 |
| g3 | 50 | b | 53.333 |
| g3 | 50 | a | 50.000 |
+-----+------+-----+--------+The results for each grp x ord pair did not change because the window definition remained the same. Only the output sorting ORDER BY grp, ord was changed to ORDER BY grp, col1 DESC, which resulted in a different sorting of the output.
The syntax of the ORDER BY clause within an OVER() clause is similar to the syntax of the regular ORDER BY clause with regard to the keywords ASC and DESC and the handling of NULL values.
ORDER BY clause can have a different meaning in different window frame types. For more information see ROWS-based window frames, RANGE-based window frames and Example: Handling of NULL Values.In summary, the following must be considered when deciding whether to use an ORDER BY clause in an OVER() clause:
- Both the
ORDER BYclause and thePARTITION BYclause within anOVER()clause are optional and are used independently. - Some functions require the presence of an
ORDER BYclause in anOVER()clause. - For some functions, an
ORDER BYclause within anOVER()clause is optional. - Some functions do not allow an
ORDER BYclause within anOVER()clause. - An
ORDER BYclause within anOVER()clause is evaluated by some functions as an implicit window frame definition.
Caution
Generally speaking, SQL is an explicit language, with few implied clauses. However, for some window functions, an ORDER BY clause implies a window frame. For details, see Usage notes for window frames.
Because behavior that is implied rather than explicit can lead to results that are difficult to understand, Snowflake recommends declaring window frames explicitly.
Types of window frames
Depending on the clauses used and the purpose of the calculation, one can distinguish between different types of window frames. One speaks of
- implicit & explicit window frames,
- ROW-based & RANGE-based window frames,
- cummulative & sliding window frames.
Implicit & Explicit window frames
Explicit window frame definition is described in the Syntax. This type of definition includes ways to use an offset of defined size or open boundaries to limit window frames for the rows to which the function is to be applied.
Implicit window frames are defined by the system if the OVER() clause does not contain an explicit window frame definition. For more about default definitions of implicit window frames, see Syntax and the Important Notes.
RANGE- & ROWS-based window frames
Both ROWS based window frame specifications and RANGE based specifications specify start and end points that specify the group of rows to which the function is applied. These boundaries can be specified as exact offset values (explicit boundaries as number of rows n) or named boundaries (implicit boundaries).
Explicit offset values are defined with numbers or intervals and keywords PRECEDING or FOLLOWING.
When defining the named start and end points, you can use the following keywords:
CURRENT ROWUNBOUNDED PRECEDING: from the beginning of the partitionUNBOUNDED FOLLOWING: to the end of the partition.
⚬ ROWS-based window frames
With the ROWS based window frame definition, the frame is calculated as exact count of rows, relative to the current row and includes rows that physically fit into the frame defined by the specified offset. The physical order of the rows in the query has an influence and can be additionally specified using the ORDER BY clause.
Boundaries of the window frame and the current row are always included. This applies to both explicitly specified window frames and implicitly specified ones. A window frame, for example, that is delimited as ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING contains the current row and all physically subsequent rows up to the end of the partition. A specification of ROWS BETWEEN CURRENT ROW AND 14 FOLLOWING means that 14 following rows, counting from the current row, including the current row, are included in the window frame.
If the specified number of preceding or following rows exceeds the boundaries of the partition, the exceeding values are treated as NULL.
⚬ RANGE-based window frames
A RANGE-based window frame in is a method of defining a dynamic set of rows used for calculations in a window function. Unlike ROWS, which refers to a fixed number of rows, RANGE is based on the values in a specific column and considers all rows within a specific range of values.
The column values used in the ORDER BY clause affect not only the sorting of the rows, but also the number of rows in the window frame defined as RANGE based window frame.
The process can be illustrated as follows:
- The data is sorted using the
ORDER BYclause within theOVER()clause - Then the rows are iterated over:
- The current row is determined
- The value of the
PARTITION BYexpression part_expr of the current row is determined - The value of the
ORDER BYexpression order_expr of the current row is determined - All rows are found whose
ORDER BYexpression order_expr has values that are within the value range of the specified window frame boundaries. - For these rows, it is checked whether the value of the
PARTITION BYexpression part_expr corresponds to the value of thePARTITION BYexpression part_expr of the current row. - The window frame includes the rows for which 4. and 5. apply, i.e. rows whose order_expr value lies within the window frame boundaries and which belong to the partition (including the current row).
- The value is calculated and output.
- The process is repeated from point 1. for the next row.
For example, with ORDER BY my_col ASC RANGE BETWEEN 2 PRECEDING AND CURRENT ROW the window frame contains
- all rows that have the same value in column my_col as the current row and belong to the same partition (including the current row)
- all rows, whose my_col value is between the value of the current row x and the value x - 2 and belong to the same partition.
The order chosen in the ORDER BY clause plays in this example the following role: If the order DESC is chosen, rows with values in the range from x to x + 2 are included in the window frame. If the order ASC is chosen, rows with values in the range from x - 2 to x are included in the window frame.
The offset value is specified as a number.
INFO
Currently only numeric offsets are supported. Intervals are not yet supported.
RANGE BETWEEN window frames limitations
RANGE BETWEENwindow frames with explicit offsets must have only oneORDER BYexpression.- Data types supported for
ORDER BYexpression of window frame with explicit offsets:INTEGERDECIMAL.
RANGE BETWEENwindow frames with explicit offsets require n as a positive numeric constant value including 0 if order_expr is of data typeINTEGERorFLOAT. See Syntax.RANGE BETWEENwindow frames with explicit offsets are supported by the following functions: COUNT, MIN, MAX, SUM and AVG.DISTINCTis not supported for these functions when used in conjunction withRANGE BETWEENwindow frames with explicit offsets.- The
COUNTfunction can only be used with one argument when used in conjunction withRANGE BETWEENwindow frames with explicit offsets. - Wildcard queries can not be used with
COUNTwhen used in conjunction withRANGE BETWEENwindow frames with explicit offsets. - The specification of the frame start and frame end points must be consistent. If this is not the case, i.e. if the frame start is after the frame end or the specification is reversed, an error is returned.
Examples
Valid definitions: ...ORDER BY day_of_month ASC RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING......ORDER BY day_of_month ASC RANGE BETWEEN 1 FOLLOWING AND 2 FOLLOWING......ORDER BY day_of_month ASC RANGE BETWEEN 2 PRECEDING AND 1 PRECEDING...
Invalid definitions: ...ORDER BY day_of_month ASC RANGE BETWEEN 1 FOLLOWING AND 1 PRECEDING......ORDER BY day_of_month ASC RANGE BETWEEN 2 FOLLOWING AND 1 FOLLOWING......ORDER BY day_of_month ASC RANGE BETWEEN 1 PRECEDING AND 2 PRECEDING...
- If
ORDER BYinOVER()containsNULLvalues, the following must be considered when using the window frame specificationRANGE BETWEEN:- By specifying
ORDER BY order_exmp NULLS FIRSTtogether withUNBOUNDED PRECEDING, rows withNULLvalues in the column order_exp are included in the frame. See Syntax. - By specifying
ORDER BY order_exmp NULLS LASTtogether withUNBOUNDED FOLLOWING, rows withNULLvalues in the column order_exp are included in the frame. See Syntax. - If the value of column order_exp of the
ORDER BYclause in theOVER()clause of the current row isNULL, rows with the valueNULLare included in the frame boundary of the window frame with an explicit offset. Otherwise they are excluded.
- By specifying
See RANGE BETWEEN examples examples, that illustrates the differences in behavior of ROWS and RANGE based window frame definitions.
Cumulative & Sliding window frames
Window frames help you perform various kinds of analytical tasks, including cumulative and sliding (or "moving") calculations.
Different window frame definitions are suitable for different calculation tasks.
The window frame definition also allows a mix of named frame boundaries and explicit offsets.
⚬ Cumulative window frames
For example, to calculate the cumulative sum of previous rows up to the current row, the following query is enough:
SELECT id, col1, SUM(col1) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) as cum_total from my_folder.my_table;+----+------+-----------+
| ID | COL1 | CUM_TOTAL |
+----+------+-----------+
| 1 | 10 | 10 |
| 2 | 20 | 30 |
| 3 | 20 | 50 |
| 4 | 30 | 80 |
| 5 | 30 | 110 |
| 6 | 40 | 150 |
| 7 | 50 | 200 |
| 8 | 50 | 250 |
| 9 | 60 | 310 |
| 10 | 50 | 360 |
+----+------+-----------+⚬ Sliding window frames
A sliding window frame is a frame of constant width that moves through the rows of a partition, covering a different section each time. The number of rows in the frame remains the same except at the beginning or end of a partition, where it may cover fewer rows.
This window frame type allows you to calculate moving averages based on a fixed-size interval, such as time periods. The average is "moving" because the actual values in the interval change with each "movement" across the underlying dimension, even though the size of the interval remains constant.
To calculate moving averages of adjacent values, you can "slide" the calculation across the rows:
SELECT id, col1, AVG(col1) OVER(ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS ma
FROM my_folder.my_table ORDER BY id ASC;+----+------+--------+
| ID | COL1 | MA |
+----+------+--------+
| 1 | 10 | 15.000 |
| 2 | 20 | 16.666 |
| 3 | 20 | 23.333 |
| 4 | 30 | 26.666 |
| 5 | 30 | 33.333 |
| 6 | 40 | 40.000 |
| 7 | 50 | 46.666 |
| 8 | 50 | 53.333 |
| 9 | 60 | 53.333 |
| 10 | 50 | 55.000 |
+----+------+--------+Each value of the MA column in this example is an average of 3 values:
- Value of the col1 column
- col1 value of the previous row
- col1 value of the following row.
Another example uses a RANGE-based and ROWS-based window frame definition. Additionaly we use a mix of named frame boundaries and explicit offsets. In this example, we calculate an average of the col1 value of the current row and the same value of the following two rows:
SELECT id, col1,
AVG(col1) OVER(ORDER BY id RANGE BETWEEN CURRENT ROW AND 2 FOLLOWING) AS ma_1,
AVG(col1) OVER(ORDER BY id ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS ma_2
FROM my_folder.my_table;+----+------+--------+--------+
| ID | COL1 | MA_1 | MA_2 |
+----+------+--------+--------+
| 1 | 10 | 16.666 | 16.666 |
| 2 | 20 | 23.333 | 23.333 |
| 3 | 20 | 26.666 | 26.666 |
| 4 | 30 | 33.333 | 33.333 |
| 5 | 30 | 40.000 | 40.000 |
| 6 | 40 | 46.666 | 46.666 |
| 7 | 50 | 53.333 | 53.333 |
| 8 | 50 | 53.333 | 53.333 |
| 9 | 60 | 55.000 | 55.000 |
| 10 | 50 | 50.000 | 50.000 |
+----+------+--------+--------+Please note that in this case, the results in columns ma_1 and ma_2 are the same even though we used different window frame definitions. This is very often not the case.
In this example this is due to the underlying data structure that only contains unique values for the id column and that is used in the ORDER BY clause within the OVER() clause. For more information on the differences between RANGE and ROWS-based window frames, see more examples below and RANGE and ROWS-based window frames.
Further Examples
Please create this tables if You want to try the examples yourself.
Example: UNBOUNDED PRECEDING
Calculate the cumulative totals across all rows in ascending order:
SELECT grp, ord, col1,
SUM(col1) OVER(ORDER BY grp, ord ASC ROWS UNBOUNDED PRECEDING) as sum_1,
SUM(col1) OVER(ORDER BY grp, ord ASC RANGE UNBOUNDED PRECEDING) as sum_2,
SUM(col1) OVER(ORDER BY grp, ord ASC) as sum_3
FROM my_folder.my_table ORDER BY grp, ord ASC;+-----+-----+------+-------+-------+-------+
| GRP | ORD | COL1 | SUM_1 | SUM_2 | SUM_3 |
+-----+-----+------+-------+-------+-------+
| g1 | a | 10 | 10 | 10 | 10 |
| g1 | b | 20 | 30 | 30 | 30 |
| g1 | c | 20 | 50 | 50 | 50 |
| g2 | a | 30 | 80 | 80 | 80 |
| g2 | b | 30 | 110 | 110 | 110 |
| g2 | c | 40 | 150 | 150 | 150 |
| g3 | a | 50 | 200 | 200 | 200 |
| g3 | b | 50 | 250 | 250 | 250 |
| g3 | c | 60 | 310 | 310 | 310 |
| g3 | d | 50 | 360 | 360 | 360 |
+-----+-----+------+-------+-------+-------+Example: BETWEEN I Named Boundaries
Calculate the cumulative totals across all rows in descending order:
SELECT grp, ord, col1,
SUM(col1) OVER(ORDER BY grp, ord ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as sum_1,
SUM(col1) OVER(ORDER BY grp, ord ASC RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as sum_2,
SUM(col1) OVER(ORDER BY grp, ord) as sum_3
FROM my_folder.my_table ORDER BY grp, ord ASC;+-----+-----+------+-------+-------+-------+
| GRP | ORD | COL1 | SUM_1 | SUM_2 | SUM_3 |
+-----+-----+------+-------+-------+-------+
| g1 | a | 10 | 360 | 360 | 10 |
| g1 | b | 20 | 350 | 350 | 30 |
| g1 | c | 20 | 330 | 330 | 50 |
| g2 | a | 30 | 310 | 310 | 80 |
| g2 | b | 30 | 280 | 280 | 110 |
| g2 | c | 40 | 250 | 250 | 150 |
| g3 | a | 50 | 210 | 210 | 200 |
| g3 | b | 50 | 160 | 160 | 250 |
| g3 | c | 60 | 110 | 110 | 310 |
| g3 | d | 50 | 50 | 50 | 360 |
+-----+-----+------+-------+-------+-------+Example: BETWEEN II Named Boundaries
Calculate totals per partition:
SELECT grp, ord, col1,
SUM(col1) OVER(ORDER BY grp, ord ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_1,
SUM(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_2,
SUM(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_3
FROM my_folder.my_table ORDER BY grp, ord ASC;+-----+-----+------+-------+-------+-------+
| GRP | ORD | COL1 | SUM_1 | SUM_2 | SUM_3 |
+-----+-----+------+-------+-------+-------+
| g1 | a | 10 | 360 | 50 | 50 |
| g1 | b | 20 | 360 | 50 | 50 |
| g1 | c | 20 | 360 | 50 | 50 |
| g2 | a | 30 | 360 | 100 | 100 |
| g2 | b | 30 | 360 | 100 | 100 |
| g2 | c | 40 | 360 | 100 | 100 |
| g3 | a | 50 | 360 | 210 | 210 |
| g3 | b | 50 | 360 | 210 | 210 |
| g3 | c | 60 | 360 | 210 | 210 |
| g3 | d | 50 | 360 | 210 | 210 |
+-----+-----+------+-------+-------+-------+Example: BETWEEN III Named Boundaries
Another example of calculating cumulative totals:
SELECT grp, ord, col1,
SUM(col1) OVER(ORDER BY grp, ord ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_1,
SUM(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_2,
SUM(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_3
FROM my_folder.my_table ORDER BY grp, ord ASC;+-----+-----+------+-------+-------+-------+
| GRP | ORD | COL1 | SUM_1 | SUM_2 | SUM_3 |
+-----+-----+------+-------+-------+-------+
| g1 | a | 10 | 10 | 10 | 10 |
| g1 | b | 20 | 30 | 30 | 30 |
| g1 | c | 20 | 50 | 50 | 50 |
| g2 | a | 30 | 80 | 30 | 30 |
| g2 | b | 30 | 110 | 60 | 60 |
| g2 | c | 40 | 150 | 100 | 100 |
| g3 | a | 50 | 200 | 50 | 50 |
| g3 | b | 50 | 250 | 100 | 100 |
| g3 | c | 60 | 310 | 160 | 160 |
| g3 | d | 50 | 360 | 210 | 210 |
+-----+-----+------+-------+-------+-------+Example: BETWEEN IV Mixed Boundaries ROWS
Moving averages:
SELECT grp, ord, col1,
AVG(col1) OVER( PARTITION BY grp ORDER BY grp, ord ASC
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ma_1,
AVG(col1) OVER( PARTITION BY grp ORDER BY grp, ord ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) as ma_2,
AVG(col1) OVER( PARTITION BY grp ORDER BY grp, ord ASC
ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING) as ma_3
FROM my_folder.my_table ORDER BY grp, ord ASC;+-----+-----+------+--------+--------+--------+
| GRP | ORD | COL1 | MA_1 | MA_2 | MA_3 |
+-----+-----+------+--------+--------+--------+
| g1 | a | 10 | 15.000 | 15.000 | 16.666 |
| g1 | b | 20 | 16.666 | 16.666 | 16.666 |
| g1 | c | 20 | 20.000 | 16.666 | 20.000 |
| g2 | a | 30 | 30.000 | 30.000 | 33.333 |
| g2 | b | 30 | 33.333 | 33.333 | 33.333 |
| g2 | c | 40 | 35.000 | 33.333 | 35.000 |
| g3 | a | 50 | 50.000 | 50.000 | 52.500 |
| g3 | b | 50 | 53.333 | 53.333 | 52.500 |
| g3 | c | 60 | 53.333 | 52.500 | 53.333 |
| g3 | d | 50 | 55.000 | 52.500 | 55.000 |
+-----+-----+------+--------+--------+--------+Example: BETWEEN V Mixed Boundaries RANGE
Please note, this query returns an error:
SELECT grp, ord, col1,
AVG(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ma_1
FROM my_folder.my_table ORDER BY grp, ord ASC;Error: invalid window aggregation group in the window specificationBecause, a query in which window frame is defined by RANGE BETWEEN with numeric boundaries can only contain one ORDER BY expression.
This query will also return an error because the window frame definition of this type requires numeric or timestamp values in the ORDER BY clause.
SELECT grp, ord, col1,
AVG(col1) OVER(PARTITION BY grp ORDER BY ord ASC
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ma_1
FROM my_folder.my_table ORDER BY grp, ord ASC;ORDER BY clause is invalid: The ORDER BY clause for window frame 'RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING' must have a numeric, date, or timestamp expression.Here is a correct query using RANGE window frames:
SELECT grp, id, col1,
AVG(col1) OVER(PARTITION BY grp ORDER BY id ASC
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ma_1,
AVG(col1) OVER(PARTITION BY grp ORDER BY id ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) as ma_2,
AVG(col1) OVER(PARTITION BY grp ORDER BY id ASC
RANGE BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING) as ma_3
FROM my_folder.my_table ORDER BY grp, ord ASC;+-----+----+------+--------+--------+--------+
| GRP | ID | COL1 | MA_1 | MA_2 | MA_3 |
+-----+----+------+--------+--------+--------+
| g1 | 1 | 10 | 15.000 | 15.000 | 16.666 |
| g1 | 2 | 20 | 16.666 | 16.666 | 16.666 |
| g1 | 3 | 20 | 20.000 | 16.666 | 20.000 |
| g2 | 4 | 30 | 30.000 | 30.000 | 33.333 |
| g2 | 5 | 30 | 33.333 | 33.333 | 33.333 |
| g2 | 6 | 40 | 35.000 | 33.333 | 35.000 |
| g3 | 7 | 50 | 50.000 | 50.000 | 52.500 |
| g3 | 8 | 50 | 53.333 | 53.333 | 52.500 |
| g3 | 9 | 60 | 53.333 | 52.500 | 53.333 |
| g3 | 10 | 50 | 55.000 | 52.500 | 55.000 |
+-----+----+------+--------+--------+--------+Example: RANGE and ROWS BETWEEN Comparison
In the following example we can see the different behaviour of RANGE-based and ROW-based window frames by calculation of the sums using the same dataset and similar syntax:
SELECT quarter,
prod_cat,
product,
sale,
SUM(sale) OVER(PARTITION BY prod_cat ORDER BY quarter ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_quarter,
SUM(sale) OVER(PARTITION BY prod_cat ORDER BY quarter ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cum_total
from my_folder.my_table2 order by prod_cat, quarter, product;The RANGE window frame syntax calculates the sum of row values for the sale column for partitions defined by the prod_cat column, aggregated per quarter and cummulated over quarters.
The ROWS window frame syntax calculates the sum of row values for the sale column for partitions defined by the prod_cat column, cummulated over rows.
The column values are different. Colored lines mark the rows where the sum_quarter* and cum_total columns both have equal values.
+---------+----------+-------------+------+-------------+-----------+
| QUARTER | PROD_CAT | PRODUCT | SALE | SUM_QUARTER | CUM_TOTAL |
+---------+----------+-------------+------+-------------+-----------+
| 1 | Hat | Blue Hat | 3.5 | 12 | 3.5 |
| 1 | Hat | Red Hat | 4 | 12 | 7.5 |
| 1 | Hat | Yellow Hat | 4.5 | 12 | 12 |
| 2 | Hat | Blue Hat | 2.8 | 19.5 | 14.8 |
| 2 | Hat | Red Hat | 3.2 | 19.5 | 18 |
| 2 | Hat | Yellow Hat | 1.5 | 19.5 | 19.5 |
| 3 | Hat | Blue Hat | 3.3 | 33.1 | 22.8 |
| 3 | Hat | Red Hat | 4.8 | 33.1 | 27.6 |
| 3 | Hat | Yellow Hat | 5.5 | 33.1 | 33.1 |
| 4 | Hat | Blue Hat | 3.5 | 45.2 | 36.6 |
| 4 | Hat | Red Hat | 4 | 45.2 | 40.6 |
| 4 | Hat | Yellow Hat | 4.6 | 45.2 | 45.2 |
| 1 | Jeans | Blue Jeans | 13.5 | 52.6 | 13.5 |
| 1 | Jeans | Green Jeans | 16.5 | 52.6 | 30 |
| 1 | Jeans | Grey Jeans | 22.6 | 52.6 | 52.6 |
| 2 | Jeans | Blue Jeans | 12.8 | 107.4 | 65.4 |
| 2 | Jeans | Green Jeans | 10.3 | 107.4 | 75.7 |
| 2 | Jeans | Grey Jeans | 31.7 | 107.4 | 107.4 |
| 3 | Jeans | Blue Jeans | 11.4 | 166.5 | 118.8 |
| 3 | Jeans | Green Jeans | 23.8 | 166.5 | 142.6 |
| 3 | Jeans | Grey Jeans | 23.9 | 166.5 | 166.5 |
| 4 | Jeans | Blue Jeans | 19.1 | 220.3 | 185.6 |
| 4 | Jeans | Green Jeans | 17.4 | 220.3 | 203 |
| 4 | Jeans | Grey Jeans | 17.3 | 220.3 | 220.3 |
+---------+----------+-------------+------+-------------+-----------+The differences arise from different definition of window frame.
For ROWS-based window frames, current row represents a physical row for which the value is currently being calculated. In our example, the value for the column cum_total is calculated as the sum of the value of the column sale of the current row and all rows before it, implied they belong to the same partition.
For RANGE-based window frames, all adjacent rows that have the same value in the column quarter and belong to the same partition are included in the window frame used to calculate the sum_quarter value (as if they were all treated as current row). Therefore, the values in the sum_quarter column are the same for any row with the same value in the quarter and prod_cat columns.
Example: Supported non-ranking Window Functions
Return cumulative values over quarter for each product-partition:
SELECT
product,
sale,
quarter,
COUNT(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) count_sale,
SUM(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum_sale,
AVG(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) avg_sale,
MIN(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) min_sale,
MAX(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) max_sale
FROM my_folder.my_table2
ORDER BY prod_cat,product,quarter;+-------------+------+---------+------------+----------+---------------+----------+----------+
| PRODUCT | SALE | QUARTER | COUNT_SALE | SUM_SALE | AVG_SALE | MIN_SALE | MAX_SALE |
+-------------+------+---------+------------+----------+---------------+----------+----------+
| Blue Hat | 3.5 | 1 | 1 | 3.5 | 3.5 | 3.5 | 3.5 |
| Blue Hat | 2.8 | 2 | 2 | 6.3 | 3.15 | 2.8 | 3.5 |
| Blue Hat | 3.3 | 3 | 3 | 9.6 | 3.2 | 2.8 | 3.5 |
| Blue Hat | 3.5 | 4 | 4 | 13.1 | 3.275 | 2.8 | 3.5 |
| Red Hat | 4 | 1 | 1 | 4 | 4 | 4 | 4 |
| Red Hat | 3.2 | 2 | 2 | 7.2 | 3.6 | 3.2 | 4 |
| Red Hat | 4.8 | 3 | 3 | 12 | 4 | 3.2 | 4.8 |
| Red Hat | 4 | 4 | 4 | 16 | 4 | 3.2 | 4.8 |
| Yellow Hat | 4.5 | 1 | 1 | 4.5 | 4.5 | 4.5 | 4.5 |
| Yellow Hat | 1.5 | 2 | 2 | 6 | 3 | 1.5 | 4.5 |
| Yellow Hat | 5.5 | 3 | 3 | 11.5 | 3.833333333 | 1.5 | 5.5 |
| Yellow Hat | 4.6 | 4 | 4 | 16.1 | 4.025 | 1.5 | 5.5 |
| Blue Jeans | 13.5 | 1 | 1 | 13.5 | 13.5 | 13.5 | 13.5 |
| Blue Jeans | 12.8 | 2 | 2 | 26.3 | 13.15 | 12.8 | 13.5 |
| Blue Jeans | 11.4 | 3 | 3 | 37.7 | 12.566666667 | 11.4 | 13.5 |
| Blue Jeans | 19.1 | 4 | 4 | 56.8 | 14.2 | 11.4 | 19.1 |
| Green Jeans | 16.5 | 1 | 1 | 16.5 | 16.5 | 16.5 | 16.5 |
| Green Jeans | 10.3 | 2 | 2 | 26.8 | 13.4 | 10.3 | 16.5 |
| Green Jeans | 23.8 | 3 | 3 | 50.6 | 16.866666667 | 10.3 | 23.8 |
| Green Jeans | 17.4 | 4 | 4 | 68 | 17 | 10.3 | 23.8 |
| Grey Jeans | 22.6 | 1 | 1 | 22.6 | 22.6 | 22.6 | 22.6 |
| Grey Jeans | 31.7 | 2 | 2 | 54.3 | 27.15 | 22.6 | 31.7 |
| Grey Jeans | 23.9 | 3 | 3 | 78.2 | 26.066666667 | 22.6 | 31.7 |
| Grey Jeans | 17.3 | 4 | 4 | 95.5 | 23.875 | 17.3 | 31.7 |
+-------------+------+---------+------------+----------+---------------+----------+----------+Please note
The same results will be produced, if wie modify the query and simplify it:
SELECT
product,
sale,
quarter,
COUNT(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter) count_sale,
SUM(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter) sum_sale,
AVG(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter) avg_sale,
MIN(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter) min_sale,
MAX(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter) max_sale
FROM my_folder.my_table2
ORDER BY prod_cat,product,quarter;This is because without explicitly specifying the window frame and defining an ORDER BY clause, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used as the default value for window frame.
If you also omit the ORDER BY clause, a different default value is used for the window frame definition and the results are different:
SELECT
product,
sale,
quarter,
COUNT(sale) OVER (PARTITION BY prod_cat,product) count_sale,
SUM(sale) OVER (PARTITION BY prod_cat,product) sum_sale,
AVG(sale) OVER (PARTITION BY prod_cat,product) avg_sale,
MIN(sale) OVER (PARTITION BY prod_cat,product) min_sale,
MAX(sale) OVER (PARTITION BY prod_cat,product) max_sale
FROM my_folder.my_table2
ORDER BY prod_cat,product,quarter;+-------------+------+---------+------------+----------+----------+----------+----------+
| PRODUCT | SALE | QUARTER | COUNT_SALE | SUM_SALE | AVG_SALE | MIN_SALE | MAX_SALE |
+-------------+------+---------+------------+----------+----------+----------+----------+
| Blue Hat | 3.5 | 1 | 4 | 13.1 | 3.275 | 2.8 | 3.5 |
| Blue Hat | 2.8 | 2 | 4 | 13.1 | 3.275 | 2.8 | 3.5 |
| Blue Hat | 3.3 | 3 | 4 | 13.1 | 3.275 | 2.8 | 3.5 |
| Blue Hat | 3.5 | 4 | 4 | 13.1 | 3.275 | 2.8 | 3.5 |
| Red Hat | 4 | 1 | 4 | 16 | 4 | 3.2 | 4.8 |
| Red Hat | 3.2 | 2 | 4 | 16 | 4 | 3.2 | 4.8 |
| Red Hat | 4.8 | 3 | 4 | 16 | 4 | 3.2 | 4.8 |
| Red Hat | 4 | 4 | 4 | 16 | 4 | 3.2 | 4.8 |
| Yellow Hat | 4.5 | 1 | 4 | 16.1 | 4.025 | 1.5 | 5.5 |
| Yellow Hat | 1.5 | 2 | 4 | 16.1 | 4.025 | 1.5 | 5.5 |
| Yellow Hat | 5.5 | 3 | 4 | 16.1 | 4.025 | 1.5 | 5.5 |
| Yellow Hat | 4.6 | 4 | 4 | 16.1 | 4.025 | 1.5 | 5.5 |
| Blue Jeans | 13.5 | 1 | 4 | 56.8 | 14.2 | 11.4 | 19.1 |
| Blue Jeans | 12.8 | 2 | 4 | 56.8 | 14.2 | 11.4 | 19.1 |
| Blue Jeans | 11.4 | 3 | 4 | 56.8 | 14.2 | 11.4 | 19.1 |
| Blue Jeans | 19.1 | 4 | 4 | 56.8 | 14.2 | 11.4 | 19.1 |
| Green Jeans | 16.5 | 1 | 4 | 68 | 17 | 10.3 | 23.8 |
| Green Jeans | 10.3 | 2 | 4 | 68 | 17 | 10.3 | 23.8 |
| Green Jeans | 23.8 | 3 | 4 | 68 | 17 | 10.3 | 23.8 |
| Green Jeans | 17.4 | 4 | 4 | 68 | 17 | 10.3 | 23.8 |
| Grey Jeans | 22.6 | 1 | 4 | 95.5 | 23.875 | 17.3 | 31.7 |
| Grey Jeans | 31.7 | 2 | 4 | 95.5 | 23.875 | 17.3 | 31.7 |
| Grey Jeans | 23.9 | 3 | 4 | 95.5 | 23.875 | 17.3 | 31.7 |
| Grey Jeans | 17.3 | 4 | 4 | 95.5 | 23.875 | 17.3 | 31.7 |
+-------------+------+---------+------------+----------+----------+----------+----------+If window frame definition and also ORDER BY clause are omitted, RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is used as window border definition. The implied order of the rows in the table is retained.
Example: "Assymetric" Boundaries
With this example we show that window frame does not necessarily have to enclose the current row.
SELECT grp,
ord,
col1,
SUM(col1) OVER (PARTITION BY grp ORDER BY ord ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) sum_preceding,
SUM(col1) OVER (PARTITION BY grp ORDER BY ord ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) sum_following
FROM my_folder.my_table
ORDER BY grp, ord;+-----+-----+------+---------------+---------------+
| GRP | ORD | COL1 | SUM_PRECEDING | SUM_FOLLOWING |
+-----+-----+------+---------------+---------------+
| g1 | a | 10 | null | 40 |
| g1 | b | 20 | 10 | 20 |
| g1 | c | 20 | 30 | null |
| g2 | a | 30 | null | 70 |
| g2 | b | 30 | 30 | 40 |
| g2 | c | 40 | 60 | null |
| g3 | a | 50 | null | 110 |
| g3 | b | 50 | 50 | 110 |
| g3 | c | 60 | 100 | 50 |
| g3 | d | 50 | 110 | null |
+-----+-----+------+---------------+---------------+NULL values occur where calculations are performed for rows that are located at the beginning or end of a partition. The sum of the preceding or following rows is zero because the rows are outside the window frame and column values for them are NULL.
Example: Using GROUP BY with Window Functions
The GROUP BY clause does not affect window function calculations. It can be used in the same query as the window function. Note: If you use the GROUP BY clause in a query in which you use a window function, all columns that you include in the SELECT clause, except the results of the window function result column, must be included in the GROUP BY clause.
The results will be the same, as without GROUP BY clause.
In this example, we calculate the average of all products whose SALE is at least 10 greater than SALE of the current row.
SELECT prod_cat,
product,
sale,
AVG(sale) OVER(ORDER BY sale RANGE BETWEEN CURRENT ROW AND 10 FOLLOWING) AS average
FROM my_folder.my_table2
WHERE prod_cat = 'Jeans'
GROUP BY prod_cat, product, sale
ORDER BY sale;
-- AND
SELECT prod_cat,
product,
sale,
AVG(sale) OVER(ORDER BY sale RANGE BETWEEN CURRENT ROW AND 10 FOLLOWING) AS average
FROM my_folder.my_table2
WHERE prod_cat = 'Jeans'
ORDER BY sale; Have the same result.
For example, the first row of the output table (row 4), the row with PROD_CAT = 'Jeans' and PRODUCT = 'Blue Jeans' and SALE = 10.3, has the value 14.7875 in the AVERAGE column. To calculate the AVERAGE = 14.7875, values were used that lie between 10.3 (SALE value of the current row) and 10.3 + 10 (inclusive): (10.3 + 11.4 + 12.8 + 13.5 + 16.5 + 17.3 + 17.4 + 19.1)/8 = 14.7875.
Last row (row 15) contains the value 31.7 in both columns SALE and AVERAGE. This is because window frame for this row consists of the rows with the SALE values that are between 31.7 (inclusive) (SALE value of the current row) and 31.7 + 10 (inclusive). Since there is only one such value, the result is the value itself.
+----------+-------------+------+--------------+
| PROD_CAT | PRODUCT | SALE | AVERAGE |
+----------+-------------+------+--------------+
| Jeans | Green Jeans | 10.3 | 14.7875 |
| Jeans | Blue Jeans | 11.4 | 15.428571429 |
| Jeans | Blue Jeans | 12.8 | 17.028571429 |
| Jeans | Blue Jeans | 13.5 | 17.733333333 |
| Jeans | Green Jeans | 16.5 | 20.085714286 |
| Jeans | Grey Jeans | 17.3 | 20.683333333 |
| Jeans | Green Jeans | 17.4 | 21.36 |
| Jeans | Blue Jeans | 19.1 | 22.35 |
| Jeans | Grey Jeans | 22.6 | 25.5 |
| Jeans | Green Jeans | 23.8 | 26.466666667 |
| Jeans | Grey Jeans | 23.9 | 27.8 |
| Jeans | Grey Jeans | 31.7 | 31.7 |
+----------+-------------+------+--------------+This query returns an error, because the column sale is specified in the SELECT clause but not in the GROUP BY clause.
SELECT prod_cat,
product,
sale,
AVG(sale) OVER(ORDER BY sale RANGE BETWEEN CURRENT ROW AND 10 FOLLOWING) AS average
FROM my_folder.my_table2
WHERE prod_cat = 'Jeans'
GROUP BY prod_cat, product
ORDER BY sale;[MY_TABLE2.SALE] is not a valid group by expressionFor more information, see RANGE-based window frames.
Example: Handling of NULL values
See also ORDER BY clause.
In general, it is recommended to avoid NULL values in the order_expr, because depending on the window frame definition, it can lead to unexpected results.
Here we show, using two different window frame definitions as examples, what effects the NULLS FIRST and NULLS LAST settings in the ORDER BY clause within the OVER() clause can have.
With NULLS LAST:
- With the window frame definition with explicit offsets, lines with NULL in order_expr are included in the frame if the order_expr of the current line is
NULL. - With the window frame definition with
UNBOUNDED FOLLOWING, lines withNULLin order_expr are included in the frame.
SELECT grp2, col2,
SUM(col2) OVER(ORDER BY grp2 NULLS LAST RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) sum_1,
SUM(col2) OVER(ORDER BY grp2 NULLS LAST RANGE BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING) sum_2
FROM my_folder.my_table
ORDER BY grp2 NULLS LAST;+------+------+-------+-------+
| GRP2 | COL2 | SUM_1 | SUM_2 |
+------+------+-------+-------+
| 1 | 10 | 135 | 373 |
| 1 | 20 | 135 | 373 |
| 2 | 30 | 235 | 373 |
| 2 | 35 | 235 | 373 |
| 2 | 40 | 235 | 373 |
| 3 | 45 | 205 | 343 |
| 3 | 55 | 205 | 343 |
| NULL | 25 | 138 | 138 |
| NULL | 53 | 138 | 138 |
| NULL | 60 | 138 | 138 |
+------+------+-------+-------+For rows with GRP2 = 1, SUM_1 is calculated as (10 + 20) + (30 + 35 + 40) = 135. The values with GRP2 = NULL are not included in the window frame. The rows with GRP2 = 2 and 3 also do not include rows with GRP2 = NULL values in the calculated totals.
The last three rows with GRP2 = NULL form a "separate" window frame.
For SUM_2 and GRP2 = 1, the result is (10 + 20) + (30 + 35 + 40) + (45 + 55) + (25 + 53 + 60) = 373, since all subsequent rows with UNBOUNDED FOLLOWING are included in the calculation and rows with GRP2 = NULL are sorted to the end of the table.
Ranking window functions
The syntax for a ranking window function is essentially the same as the syntax for other window functions. The exceptions include: Ranking window functions require the ORDER BY clause inside the OVER clause. For some ranking functions, such as RANK itself, no input argument is required. For the RANK function, the value returned is based solely on numeric ranking, as determined by the ORDER BY clause inside the OVER clause. Therefore, passing a column name or expression to the function is unnecessary. The simplest ranking function is named RANK. You can use this function to: Rank salespeople on revenue (sales), from highest to lowest. Rank countries based on their per-capita GDP (income per person), from highest to lowest. Rank countries on air pollution, from lowest to highest. This function simply identifies the numeric ranking position of a row in an ordered set of rows. The first row has rank 1, the second has rank 2, and so on. The following example shows the rank order of salespeople based on Amount Sold:
| Salesperson | Amount Sold | Rank |
|---|---|---|
| Smith | 2000 | 1 |
| Jones | 1500 | 2 |
| Torkelson | 1200 | 3 |
| Dolenz | 1100 | 4 |
The rows must already be sorted before the rankings can be assigned. Therefore, you must use an ORDER BY clause within the OVER clause. Consider the following example: you’d like to know where your store profit ranks among branches of the store chain (whether your store ranks first, second, third, and so on). This example ranks each store by profitability within its city. The rows are put in descending order (highest profit first), so the most profitable store is ranked 1:
SELECT city, branch_ID, net_profit,
RANK() OVER (PARTITION BY city ORDER BY net_profit DESC) AS rank
FROM store_sales
ORDER BY city, rank;+-----------+-----------+------------+------+
| CITY | BRANCH_ID | NET_PROFIT | RANK |
|-----------+-----------+------------+------|
| Montreal | 3 | 10000.00 | 1 |
| Montreal | 4 | 9000.00 | 2 |
| Vancouver | 2 | 15000.00 | 1 |
| Vancouver | 1 | 10000.00 | 2 |
+-----------+-----------+------------+------+Note
The net_profit column does not need to be passed as an argument to the RANK function. Instead, the input rows are sorted by net_profit. The RANK function merely needs to return the position of the row (1, 2, 3, and so on) within the partition. The output of a ranking function depends on: The individual row passed to the function. The values of the other rows in the partition. The order of all the rows in the partition. Snowflake provides several different ranking functions. For a list of these functions, and more details about their syntax, see Window functions. To rank your store against all other stores in the chain, not just against other stores in your city, use the query below:
SELECT
branch_ID,
net_profit,
RANK() OVER (ORDER BY net_profit DESC) AS sales_rank
FROM store_sales;The following query uses the first ORDER BY clause to control processing by the window function and the second ORDER BY clause to control the order of the entire query’s output:
SELECT
branch_ID,
net_profit,
RANK() OVER (ORDER BY net_profit DESC) AS sales_rank
FROM store_sales
ORDER BY branch_ID;Illustrated example
This example uses a sales scenario to illustrate many of the concepts described earlier in this topic. Suppose that you need to generate a financial report that shows values based on sales over the last week: Daily sales Ranking within the week (that is, sales ranked highest to lowest for the week) Sales so far this week (that is, the “cumulative sum” for all days from the beginning of the week up through and including the current day) Total sales for the week Three-day moving average (that is, the average over the current day and the two previous days) The report might look something like this:
+--------+-------+------+--------------+-------------+--------------+
| Day of | Sales | Rank | Sales So Far | Total Sales | 3-Day Moving |
| Week | Today | | This Week | This Week | Average |
|--------+-------+------+--------------+-------------|--------------+
| 1 | 10 | 4 | 10 | 84 | 10.0 |
| 2 | 14 | 3 | 24 | 84 | 12.0 |
| 3 | 6 | 5 | 30 | 84 | 10.0 |
| 4 | 6 | 5 | 36 | 84 | 9.0 |
| 5 | 14 | 3 | 50 | 84 | 10.0 |
| 6 | 16 | 2 | 66 | 84 | 11.0 |
| 7 | 18 | 1 | 84 | 84 | 12.0 |
+--------+-------+------+--------------+-------------+--------------+The SQL for this query is somewhat complex. Rather than show the example as a single query, this discussion breaks down the SQL for the individual columns. In a real-world scenario, you would have years of data, so to calculate sums and averages for one specific week of data, you would need to use a one-week window, or use a filter similar to: ... WHERE date >= start_of_relevant_week and date <= end_of_relevant_week ... However, for this example, assume that the table contains only the most recent week’s worth of data.
CREATE TABLE store_sales_2 (
day INTEGER,
sales_today INTEGER
);+-------------------------------------------+
| status |
|-------------------------------------------|
| Table STORE_SALES_2 successfully created. |
+-------------------------------------------+INSERT INTO store_sales_2 (day, sales_today) VALUES
(1, 10),
(2, 14),
(3, 6),
(4, 6),
(5, 14),
(6, 16),
(7, 18);+-------------------------+
| number of rows inserted |
|-------------------------|
| 7 |
+-------------------------+Calculating sales rank
The Rank column is calculated using the RANK function:
SELECT day,
sales_today,
RANK()
OVER (ORDER BY sales_today DESC) AS Rank
FROM store_sales_2
ORDER BY day;+-----+-------------+------+
| DAY | SALES_TODAY | RANK |
|-----+-------------+------|
| 1 | 10 | 5 |
| 2 | 14 | 3 |
| 3 | 6 | 6 |
| 4 | 6 | 6 |
| 5 | 14 | 3 |
| 6 | 16 | 2 |
| 7 | 18 | 1 |
+-----+-------------+------+Although there are 7 days in the time period, there are only 5 different ranks (1, 2, 3, 5, 6). There were two ties (for 3rd place and 6th place), so there are no rows with ranks 4 or 7. Calculating sales so far this week The Sales So Far This Week column is calculated using SUM as a window function with a window frame:
SELECT day,
sales_today,
SUM(sales_today)
OVER (ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS "SALES SO FAR THIS WEEK"
FROM store_sales_2
ORDER BY day;+-----+-------------+------------------------+
| DAY | SALES_TODAY | SALES SO FAR THIS WEEK |
|-----+-------------+------------------------|
| 1 | 10 | 10 |
| 2 | 14 | 24 |
| 3 | 6 | 30 |
| 4 | 6 | 36 |
| 5 | 14 | 50 |
| 6 | 16 | 66 |
| 7 | 18 | 84 |
+-----+-------------+------------------------+This query orders the rows by date and then, for each date, calculates the sum of sales from the start of the window up to the current date (inclusive).
Calculating total sales this week
The Total Sales This Week column is calculated using SUM.
SELECT day,
sales_today,
SUM(sales_today)
OVER ()
AS total_sales
FROM store_sales_2
ORDER BY day;+-----+-------------+-------------+
| DAY | SALES_TODAY | TOTAL_SALES |
|-----+-------------+-------------|
| 1 | 10 | 84 |
| 2 | 14 | 84 |
| 3 | 6 | 84 |
| 4 | 6 | 84 |
| 5 | 14 | 84 |
| 6 | 16 | 84 |
| 7 | 18 | 84 |
+-----+-------------+-------------+Calculating a three-day moving average
The 3-Day Moving Average column is calculated using AVG as a window function with a window frame:
SELECT day,
sales_today,
AVG(sales_today)
OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
AS "3-DAY MOVING AVERAGE"
FROM store_sales_2
ORDER BY day;+-----+-------------+----------------------+
| DAY | SALES_TODAY | 3-DAY MOVING AVERAGE |
|-----+-------------+----------------------|
| 1 | 10 | 10.000 |
| 2 | 14 | 12.000 |
| 3 | 6 | 10.000 |
| 4 | 6 | 8.666 |
| 5 | 14 | 8.666 |
| 6 | 16 | 12.000 |
| 7 | 18 | 16.000 |
+-----+-------------+----------------------+The difference between this window frame and the window frame described earlier is the starting point: a fixed boundary versus an explicit offset.
Putting it all together
Here’s the final version of the query, showing all of the columns:
SELECT day,
sales_today,
RANK()
OVER (ORDER BY sales_today DESC) AS Rank,
SUM(sales_today)
OVER (ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS "SALES SO FAR THIS WEEK",
SUM(sales_today)
OVER ()
AS total_sales,
AVG(sales_today)
OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
AS "3-DAY MOVING AVERAGE"
FROM store_sales_2
ORDER BY day;+-----+-------------+------+------------------------+-------------+----------------------+
| DAY | SALES_TODAY | RANK | SALES SO FAR THIS WEEK | TOTAL_SALES | 3-DAY MOVING AVERAGE |
|-----+-------------+------+------------------------+-------------+----------------------|
| 1 | 10 | 5 | 10 | 84 | 10.000 |
| 2 | 14 | 3 | 24 | 84 | 12.000 |
| 3 | 6 | 6 | 30 | 84 | 10.000 |
| 4 | 6 | 6 | 36 | 84 | 8.666 |
| 5 | 14 | 3 | 50 | 84 | 8.666 |
| 6 | 16 | 2 | 66 | 84 | 12.000 |
| 7 | 18 | 1 | 84 | 84 | 16.000 |
+-----+-------------+------+------------------------+-------------+----------------------+Ranking function examples
The following example shows how to rank sales based on the total amount (in dollars) that each salesperson has sold. The ORDER BY clause within the OVER clause sorts the totals in descending order (highest to lowest). The query calculates the rank of each salesperson relative to all other salespeople. Create the table and insert the data:
CREATE TABLE sales_table (salesperson_name VARCHAR, sales_in_dollars INTEGER);
INSERT INTO sales_table (salesperson_name, sales_in_dollars) VALUES
('Smith', 600),
('Jones', 1000),
('Torkelson', 700),
('Dolenz', 800);Now query the data:
SELECT
salesperson_name,
sales_in_dollars,
RANK() OVER (ORDER BY sales_in_dollars DESC) AS sales_rank
FROM sales_table;+------------------+------------------+------------+
| SALESPERSON_NAME | SALES_IN_DOLLARS | SALES_RANK |
|------------------+------------------+------------|
| Jones | 1000 | 1 |
| Dolenz | 800 | 2 |
| Torkelson | 700 | 3 |
| Smith | 600 | 4 |
+------------------+------------------+------------+The output is not necessarily ordered by rank. To display reults ordered by rank, specify an ORDER BY clause for the query itself (in addition to the ORDER BY clause for the window function), as shown here:
SELECT
salesperson_name,
sales_in_dollars,
RANK() OVER (ORDER BY sales_in_dollars DESC) AS sales_rank
FROM sales_table
ORDER BY 3;+------------------+------------------+------------+
| SALESPERSON_NAME | SALES_IN_DOLLARS | SALES_RANK |
|------------------+------------------+------------|
| Jones | 1000 | 1 |
| Dolenz | 800 | 2 |
| Torkelson | 700 | 3 |
| Smith | 600 | 4 |
+------------------+------------------+------------+The preceding example has two ORDER BY clauses: One controls the order of the ranking. One controls the order of the output. These clauses are independent. For example, you could order the rankings based on total sales (as shown above), but order the output rows based on the salesperson’s last name:
SELECT
salesperson_name,
sales_in_dollars,
RANK() OVER (ORDER BY sales_in_dollars DESC) AS sales_rank
FROM sales_table
ORDER BY salesperson_name;+------------------+------------------+------------+
| SALESPERSON_NAME | SALES_IN_DOLLARS | SALES_RANK |
|------------------+------------------+------------|
| Dolenz | 800 | 2 |
| Jones | 1000 | 1 |
| Smith | 600 | 4 |
| Torkelson | 700 | 3 |
+------------------+------------------+------------+