Skip to content
Guides>SQL Guide>Window Functions

Working with window functions

In this section we describe the conceptual structure and general usage of window functions.

Overview

Unlike other SQL functions, a window function calculates the values on a group or tuple of rows. Such a group is called a partition. Partitions are defined by specifying implicit or explicit "window frames". A window frame is calculated relative to the current row and contains a fixed or variable set of rows. The function results are calculated per row within each partition. Each row within the window frame therefore becomes the current row.

The window functions calculate the values ​​over the defined partition and return them for each row belonging to this partition.

The OVER() clause defines the components, that are necessary to accomplish this behavior:

  • A PARTITION BY clause
  • An ORDER BY clause
  • A specification of the window frame.

From query to query all of this components may be optional or contains multiple OVER() clauses.

A query with an empty OVER() clause can also be valid. In this case, the values ​​are calculated across all rows (as if all rows belonged to the same partition) but, unlike aggregate functions, are not output as one row but per row.

Syntax

sql
function_name([function_arguments]) 
    OVER    ([PARTITION BY part_expr [, ...]]
             [ORDER BY order_expr [ASC | DESC] [NULLS {FIRST | LAST}] [, ...]]
             [{   {ROWS | RANGE} UNBOUNDED PRECEDING  
                | {ROWS | RANGE} n PRECEDING
                | {ROWS | RANGE} CURRENT ROW
                | {ROWS | RANGE} BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                | {ROWS | RANGE} BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING 
                | {ROWS | RANGE} BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                | {ROWS | RANGE} BETWEEN n {PRECEDING | FOLLOWING} AND n {PRECEDING | FOLLOWING}
                | {ROWS | RANGE} BETWEEN UNBOUNDED PRECEDING AND n {PRECEDING | FOLLOWING}
                | {ROWS | RANGE} BETWEEN n {PRECEDING | FOLLOWING} AND UNBOUNDED FOLLOWING
                }])

Arguments

function_name: Name of an SQL Function.

function_arguments (optional): Function arguments.

OVER(): This clause specifies that this function should be applied as a window function (and not as an aggregate function) and specifies the window frame over which rows the function should be applied.

An OVER() clause with empty parentheses uses all rows as one partition and an implicit default window frame.

PARTITION BY part_expr (optional): This clause specifies the part_expr, that used to group rows into partitions.

Values ​​are calculated per partition if this clause is specified. If this clause is omitted, all rows in the query are considered as one partition.

ORDER BY order_expr (optional or required): This clause sorts rows within a partition before the function calculates the values. ORDER BY clause is required, if a window frame is specified explicitly or for some specific functions and optional in all other cases.

ORDER BY within the OVER() clause is to be distinguished from the ORDER BY clause, which is used to sort the results of a query, although they have the same syntax.

More about data sorting for window functions and ORDER BY within an OVER() clause can be found below in the section ORDER BY clause.


Window Frame definition options:

See Range- vs. row-based window frames, RANGE-based window frames, ROW-based window frames and Further examples for more information and more examples.

RANGE UNBOUNDED PRECEDING or ROWS UNBOUNDED PRECEDING: This keywords specifies a window frame that contains rows between the first row in the partition and the current row.

RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING or ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING : This keywords specifies a window frame that contains rows between the current row the and last row in the partition.

ROWS BETWEEN n {PRECEDING | FOLLOWING} AND n {PRECEDING | FOLLOWING} : This clause allows to define a ROWS-based window frame with the explicitly specified offset of size n.

ROWS BETWEEN n {PRECEDING | FOLLOWING} AND n {PRECEDING | FOLLOWING} or
ROWS BETWEEN UNBOUNDED PRECEDING AND n {PRECEDING | FOLLOWING} or
ROWS BETWEEN n {PRECEDING | FOLLOWING} AND UNBOUNDED FOLLOWING

This clauses allows defining a ROWS-based window frame with the explicitly specified offset of size n or use the UNBOUNDED keyword to define one of the boundaries as a dynamic frame boundary.

RANGE BETWEEN n {PRECEDING | FOLLOWING} AND n {PRECEDING | FOLLOWING} or
RANGE BETWEEN UNBOUNDED PRECEDING AND n {PRECEDING | FOLLOWING} or
RANGE BETWEEN n {PRECEDING | FOLLOWING} AND UNBOUNDED FOLLOWING

This clause allows to define a RANGE-based window frame with the explicitly specified offset of size n or use the UNBOUNDED keyword to define one of the boundaries as a dynamic frame boundary.

When specifying RANGE BETWEEN window frames with explicit offsets, only one ORDER BY expression is allowed.

Not all data types are supported by this type of window frame. See more about data types, limitations and other options for this type of window frames under RANGE-based window frames.

n: is an unsigned constant, positive numeric value including 0.

With ORDER BY ord_expr ASC, the syntax ...n FOLLOWING means, that all rows of the partition of the current row with order_expr values, that are between x and x+n will be included in the window frame, where x is the value of ord_expr expression for current row. The syntax ...n PRECEDING means in this case, that all rows of the partition of the current row with order_expr values, that are between x-n and x will be included in the window frame. See RANGE BETWEEN examples.

With ORDER BY DESC both clauses behave in opposite ways.

Important Notes

  • Window frames are supported by all window functions. However, the syntax and supported types of window frames can vary from function to function.
  • If no window frame was specified explicitely and ORDER BY clause is specified, default syntax for window frame specification is used. The default window frame syntax for functions COUNT, MIN, MAX, SUM and AVG is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. See also example and notes for this example.
  • If an empty OVER() is used, that is, no window frame is explicitly specified and no ORDER BY clause is specified, RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is the default for the functions COUNT, MIN, MAX, SUM and AVG. See also example and notes for this example.
  • The offsets, specified as 0 PRECEDING or 0 FOLLOWING are equivalent to CURRENT ROW.
  • When you define an explicit window frame, the window functions require that the data in the window be in a known order. Therefore, the ORDER BY clause within the OVER() clause is mandatory for the window frame syntax, although this ORDER BY clause is generally optional.

Examples: Index & Tables

Examples:

Please create this tables if you want to try the examples yourself: Create my_table and my_table2 tables (please unfold).
sql
CREATE TABLE IF NOT EXISTS my_folder.my_table(
  ord STRING,
  grp STRING,
  col1 INTEGER,
  grp2 INTEGER,
  col2 INTEGER,
  id INTEGER,
  date_str STRING
);
INSERT INTO my_folder.my_table (ord, grp, col1, grp2, col2, id, date_str) VALUES 
    ('a', 'g1', 10, 1, 10,   1,'2023-06-30'),
    ('b', 'g1', 20, 1, 20,   2,'2023-07-31'),
    ('c', 'g1', 20, NULL, 25,   3,'2023-08-31'),
    ('a', 'g2', 30, 2, 30,   4,'2023-09-30'),
    ('b', 'g2', 30, 2, 35, 5,'2023-10-31'),
    ('c', 'g2', 40, 2, 40,   6,'2023-11-30'),
    ('a', 'g3', 50, 3, 45, 7,'2023-12-31'),
    ('b', 'g3', 50, NULL, 53,   8,'2024-01-31'),
    ('c', 'g3', 60, NULL, 60, 9,'2024-02-29'),
    ('d', 'g3', 50, 3, 55,   10,'2024-03-31'); 
SELECT  * FROM my_folder.my_table;
txt
+-----+-------+------+------+------+----+-------------------------+
| ORD | GRP   | COL1 | GRP2 | COL2 | ID | DATE                    |
+-----+-------+------+------+------+----+-------------------------+
| a   | g1    | 10   | 1    | 10   | 1  | 2023-06-30 00:00:00.000 |
| b   | g1    | 20   | 1    | 20   | 2  | 2023-07-31 00:00:00.000 |
| c   | g1    | 20   | NULL | 25   | 3  | 2023-08-31 00:00:00.000 |
| a   | g2    | 30   | 2    | 30   | 4  | 2023-09-30 00:00:00.000 |
| b   | g2    | 30   | 2    | 35   | 5  | 2023-10-31 00:00:00.000 |
| c   | g2    | 40   | 2    | 40   | 6  | 2023-11-30 00:00:00.000 |
| a   | g3    | 50   | 3    | 45   | 7  | 2023-12-31 00:00:00.000 |
| b   | g3    | 50   | NULL | 53   | 8  | 2024-01-31 00:00:00.000 |
| c   | g3    | 60   | NULL | 60   | 9  | 2024-02-29 00:00:00.000 |
| d   | g3    | 50   | 3    | 55   | 10 | 2024-03-31 00:00:00.000 |
+-----+-------+------+------+------+----+-------------------------+

And:

sql
CREATE TABLE IF NOT EXISTS my_folder.my_table2(
  product STRING, 
  prod_cat STRING, 
  sale DECIMAL, 
  quarter INTEGER,
  quarter_2 DECIMAL
);
INSERT INTO my_folder.my_table2 VALUES 
    ('Blue Hat',	'Hat',	  3.5,1,1.1),
    ('Red Hat',		'Hat',	  4.0,1,1.1),
    ('Yellow Hat',	'Hat',	  4.5,1,1.1),
    ('Blue Hat',	'Hat',	  2.8,2,1.2),
    ('Red Hat',		'Hat',	  3.2,2,1.2),
    ('Yellow Hat',	'Hat',	  1.5,2,1.2),
    ('Blue Hat',	'Hat',	  3.3,3,1.3),
    ('Red Hat',		'Hat',	  4.8,3,1.3),
    ('Yellow Hat',	'Hat',	  5.5,3,1.3),
    ('Blue Hat',	'Hat',	  3.5,4,1.4),
    ('Red Hat',		'Hat',	  4.0,4,1.4),
    ('Yellow Hat',	'Hat',	  4.6,4,1.4),
    ('Blue Jeans',	'Jeans',    13.5,1,1.1),
    ('Grey Jeans',	'Jeans',	22.6,1,1.1),
    ('Green Jeans',	'Jeans',	16.5,1,1.1),
    ('Blue Jeans',	'Jeans',	12.8,2,1.2),
    ('Grey Jeans',	'Jeans',	31.7,2,1.2),
    ('Green Jeans',	'Jeans',	10.3,2,1.2),
    ('Blue Jeans',	'Jeans',	11.4,3,1.3),
    ('Grey Jeans',	'Jeans',	23.9,3,1.3),
    ('Green Jeans',	'Jeans',	23.8,3,1.3),
    ('Blue Jeans',	'Jeans',	19.1,4,1.4),
    ('Grey Jeans',	'Jeans',	17.3,4,1.4),
    ('Green Jeans',	'Jeans',	17.4,4,1.4);
SELECT  * FROM my_folder.my_table2;
txt
+-------------+----------+------+---------+-----------+
| PRODUCT     | PROD_CAT | SALE | QUARTER | QUARTER_2 |
+-------------+----------+------+---------+-----------+
| Blue Hat    | Hat      | 3.5  | 1       | 1.1       |
| Red Hat     | Hat      | 4    | 1       | 1.1       |
| Yellow Hat  | Hat      | 4.5  | 1       | 1.1       |
| Blue Hat    | Hat      | 2.8  | 2       | 1.2       |
| Red Hat     | Hat      | 3.2  | 2       | 1.2       |
| Yellow Hat  | Hat      | 1.5  | 2       | 1.2       |
| Blue Hat    | Hat      | 3.3  | 3       | 1.3       |
| Red Hat     | Hat      | 4.8  | 3       | 1.3       |
| Yellow Hat  | Hat      | 5.5  | 3       | 1.3       |
| Blue Hat    | Hat      | 3.5  | 4       | 1.4       |
| Red Hat     | Hat      | 4    | 4       | 1.4       |
| Yellow Hat  | Hat      | 4.6  | 4       | 1.4       |
| Blue Jeans  | Jeans    | 13.5 | 1       | 1.1       |
| Grey Jeans  | Jeans    | 22.6 | 1       | 1.1       |
| Green Jeans | Jeans    | 16.5 | 1       | 1.1       |
| Blue Jeans  | Jeans    | 12.8 | 2       | 1.2       |
| Grey Jeans  | Jeans    | 31.7 | 2       | 1.2       |
| Green Jeans | Jeans    | 10.3 | 2       | 1.2       |
| Blue Jeans  | Jeans    | 11.4 | 3       | 1.3       |
| Grey Jeans  | Jeans    | 23.9 | 3       | 1.3       |
| Green Jeans | Jeans    | 23.8 | 3       | 1.3       |
| Blue Jeans  | Jeans    | 19.1 | 4       | 1.4       |
| Grey Jeans  | Jeans    | 17.3 | 4       | 1.4       |
| Green Jeans | Jeans    | 17.4 | 4       | 1.4       |
+-------------+----------+------+---------+-----------+

Window vs. aggregate functions

Some aggregate functions have window function implementations of the same name (see COUNT, MIN, MAX, SUM and AVG).

The differences lie in the scope of rows used for the calculation and in the way the results are output.

AGGREGATE FUNCTIONWINDOW FUNCTION
Input DataAll rowsRows of the partition, defined by OVER clause
Output DataOne rowPer row of the corresponding partition

Compare the output of AVG function as aggregation function and window function:

sql
SELECT AVG(col1) FROM my_folder.my_table;

SELECT AVG(col1) OVER() FROM my_folder.my_table;
txt
+---------- +
| AVG(COL1) |
+-----------+
| 36        |
+-----------+

+------------------+
| AVG(COL1) OVER() |
+------------------+
| 36               |
| 36               |
| 36               |
| 36               |
| 36               |
| 36               |
| 36               |
| 36               |
| 36               |
| 36               |
+------------------+

In this example, we used the AVG window function with an empty OVER() clause, which resulted in the same average value being calculated because all 10 rows were analyzed as one partition, but it was output per row and not in one row as in the aggregation variant.

In following example we use the same table, but analyse the partitions, based on column grp:

sql
SELECT grp, AVG(col1) FROM my_folder.my_table GROUP BY grp; -- aggregate

SELECT grp, col1, AVG(col1) OVER(PARTITION BY grp) FROM my_folder.my_table; -- window
txt
+-----+------------+
| GRP | AVG(COL1)  |
+-----+------------+
| g1  | 16.666667  |
| g2  | 33.333333  |
| g3  | 52.500000  |
+-----+------------+

+-----+------+----------------------------------+
| GRP | COL1 | AVG(COL1) OVER(PARTITION BY GRP) |
+-----+------+----------------------------------+
| g1  | 10   | 16.666                           |
| g1  | 20   | 16.666                           |
| g1  | 20   | 16.666                           |
| g2  | 30   | 33.333                           |
| g2  | 30   | 33.333                           |
| g2  | 40   | 33.333                           |
| g3  | 50   | 52.500                           |
| g3  | 50   | 52.500                           |
| g3  | 60   | 52.500                           |
| g3  | 50   | 52.500                           |
+-----+------+----------------------------------+

The calculation is restarted when the value of the column defining the partition changes.

The functionality of window functions becomes even clearer when using the explicit window frame definition.

This example uses an explicit window definition to calculate the values ​​per row as an average of the previous row, the current row, and a following row per grp partition:

sql
SELECT grp, col1, ord,
        AVG(col1) OVER( PARTITION BY grp 
                        ORDER BY ord ASC 
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as MA
    FROM my_folder.my_table ORDER BY grp, ord;
txt
+-----+------+-----+--------+
| GRP | COL1 | ORD | MA     |
+-----+------+-----+--------+
| g1  | 10   | a   | 15.000 |
| g1  | 20   | b   | 16.666 |
| g1  | 20   | c   | 20.000 |
| g2  | 30   | a   | 30.000 |
| g2  | 30   | b   | 33.333 |
| g2  | 40   | c   | 35.000 |
| g3  | 50   | a   | 50.000 |
| g3  | 50   | b   | 53.333 |
| g3  | 60   | c   | 53.333 |
| g3  | 50   | d   | 55.000 |
+-----+------+-----+--------+

The value for row GRP='g1' and ORD='a' is calculated as an average of values ​​10 and 20 because it is the first row in the partition (you can also see it in rows GRP='g2' and ORD='a' and GRP='g3' and ORD='a').

The value for row GRP='g1' and ORD='c' is calculated as an average of values ​​20 and 20 because it is the last row in the partition (same as for last values ​​of other grp partitions).

Other values ​​were calculated as an average of a previous, current and following row.

The ORDER BY clause within the OVER() clause influences the sorting of the window data within the partition, whereas the final order by clause only influences the output sorting.

The output of the window function therefore depends not only on the value of the individual line, but also on the values ​​of the other lines that qualify for the defined window frame.

ORDER BY clause

The ORDER BY clause of the OVER() clause ensures that the calculation of the result is based on the sorted data.

The ORDER BY clause is required when window functions compute rolling operations on the subsets of rows in each partition and apply them to data with explicit window borders (e.g. calculating running totals, moving averages etc.). For window functions of this kind, the set of "previous" and "following" rows must be deterministic and thus the ORDER BY clause is necessary.

The ORDER BY clause within the OVER() clause affects the sorting of the window data used in the calculation, but not the output of the query.

The output of the query can be sorted using the regular ORDER BY clause.

Compare this example with the example above:

sql
SELECT grp, col1, ord,
        AVG(col1) OVER( PARTITION BY grp 
                        ORDER BY ord ASC 
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as MA
    FROM my_folder.my_table ORDER BY grp, col1 DESC;                                 
txt
+-----+------+-----+--------+
| GRP | COL1 | ORD | MA     |
+-----+------+-----+--------+
| g1  | 20   | c   | 20.000 |
| g1  | 20   | b   | 16.666 |
| g1  | 10   | a   | 15.000 |
| g2  | 40   | c   | 35.000 |
| g2  | 30   | b   | 33.333 |
| g2  | 30   | a   | 30.000 |
| g3  | 60   | c   | 53.333 |
| g3  | 50   | d   | 55.000 |
| g3  | 50   | b   | 53.333 |
| g3  | 50   | a   | 50.000 |
+-----+------+-----+--------+

The results for each grp x ord pair did not change because the window definition remained the same. Only the output sorting ORDER BY grp, ord was changed to ORDER BY grp, col1 DESC, which resulted in a different sorting of the output.

The syntax of the ORDER BY clause within an OVER() clause is similar to the syntax of the regular ORDER BY clause with regard to the keywords ASC and DESC and the handling of NULL values.

! However, the ORDER BY clause can have a different meaning in different window frame types. For more information see ROWS-based window frames, RANGE-based window frames and Example: Handling of NULL Values.

In summary, the following must be considered when deciding whether to use an ORDER BY clause in an OVER() clause:

  • Both the ORDER BY clause and the PARTITION BY clause within an OVER() clause are optional and are used independently.
  • Some functions require the presence of an ORDER BY clause in an OVER() clause.
  • For some functions, an ORDER BY clause within an OVER() clause is optional.
  • Some functions do not allow an ORDER BY clause within an OVER() clause.
  • An ORDER BY clause within an OVER() clause is evaluated by some functions as an implicit window frame definition.

Caution

Generally speaking, SQL is an explicit language, with few implied clauses. However, for some window functions, an ORDER BY clause implies a window frame. For details, see Usage notes for window frames.

Because behavior that is implied rather than explicit can lead to results that are difficult to understand, Snowflake recommends declaring window frames explicitly.

Types of window frames

Depending on the clauses used and the purpose of the calculation, one can distinguish between different types of window frames. One speaks of

  • implicit & explicit window frames,
  • ROW-based & RANGE-based window frames,
  • cummulative & sliding window frames.

Implicit & Explicit window frames

Explicit window frame definition is described in the Syntax. This type of definition includes ways to use an offset of defined size or open boundaries to limit window frames for the rows to which the function is to be applied.

Implicit window frames are defined by the system if the OVER() clause does not contain an explicit window frame definition. For more about default definitions of implicit window frames, see Syntax and the Important Notes.

RANGE- & ROWS-based window frames

Both ROWS based window frame specifications and RANGE based specifications specify start and end points that specify the group of rows to which the function is applied. These boundaries can be specified as exact offset values ​​(explicit boundaries as number of rows n) or named boundaries (implicit boundaries).

Explicit offset values ​​are defined with numbers or intervals and keywords PRECEDING or FOLLOWING.

When defining the named start and end points, you can use the following keywords:

  • CURRENT ROW
  • UNBOUNDED PRECEDING: from the beginning of the partition
  • UNBOUNDED FOLLOWING: to the end of the partition.

⚬ ROWS-based window frames

With the ROWS based window frame definition, the frame is calculated as exact count of rows, relative to the current row and includes rows that physically fit into the frame defined by the specified offset. The physical order of the rows in the query has an influence and can be additionally specified using the ORDER BY clause.

Boundaries of the window frame and the current row are always included. This applies to both explicitly specified window frames and implicitly specified ones. A window frame, for example, that is delimited as ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING contains the current row and all physically subsequent rows up to the end of the partition. A specification of ROWS BETWEEN CURRENT ROW AND 14 FOLLOWING means that 14 following rows, counting from the current row, including the current row, are included in the window frame.

If the specified number of preceding or following rows exceeds the boundaries of the partition, the exceeding values ​​are treated as NULL.

⚬ RANGE-based window frames

A RANGE-based window frame in is a method of defining a dynamic set of rows used for calculations in a window function. Unlike ROWS, which refers to a fixed number of rows, RANGE is based on the values ​​in a specific column and considers all rows within a specific range of values.

The column values ​​used in the ORDER BY clause affect not only the sorting of the rows, but also the number of rows in the window frame defined as RANGE based window frame.

The process can be illustrated as follows:

  • The data is sorted using the ORDER BY clause within the OVER() clause
  • Then the rows are iterated over:
    1. The current row is determined
    2. The value of the PARTITION BY expression part_expr of the current row is determined
    3. The value of the ORDER BY expression order_expr of the current row is determined
    4. All rows are found whose ORDER BY expression order_expr has values ​​that are within the value range of the specified window frame boundaries.
    5. For these rows, it is checked whether the value of the PARTITION BY expression part_expr corresponds to the value of the PARTITION BY expression part_expr of the current row.
    6. The window frame includes the rows for which 4. and 5. apply, i.e. rows whose order_expr value lies within the window frame boundaries and which belong to the partition (including the current row).
    7. The value is calculated and output.
    8. The process is repeated from point 1. for the next row.

For example, with ORDER BY my_col ASC RANGE BETWEEN 2 PRECEDING AND CURRENT ROW the window frame contains

  • all rows that have the same value in column my_col as the current row and belong to the same partition (including the current row)
  • all rows, whose my_col value is between the value of the current row x and the value x - 2 and belong to the same partition.

The order chosen in the ORDER BY clause plays in this example the following role: If the order DESC is chosen, rows with values ​​in the range from x to x + 2 are included in the window frame. If the order ASC is chosen, rows with values ​​in the range from x - 2 to x are included in the window frame.

The offset value is specified as a number.

INFO

Currently only numeric offsets are supported. Intervals are not yet supported.

RANGE BETWEEN window frames limitations

  • RANGE BETWEEN window frames with explicit offsets must have only one ORDER BY expression.
  • Data types supported for ORDER BY expression of window frame with explicit offsets:
    • INTEGER
    • DECIMAL.
  • RANGE BETWEEN window frames with explicit offsets require n as a positive numeric constant value including 0 if order_expr is of data type INTEGER or FLOAT. See Syntax.
  • RANGE BETWEEN window frames with explicit offsets are supported by the following functions: COUNT, MIN, MAX, SUM and AVG.
  • DISTINCT is not supported for these functions when used in conjunction with RANGE BETWEEN window frames with explicit offsets.
  • The COUNT function can only be used with one argument when used in conjunction with RANGE BETWEEN window frames with explicit offsets.
  • Wildcard queries can not be used with COUNT when used in conjunction with RANGE BETWEEN window frames with explicit offsets.
  • The specification of the frame start and frame end points must be consistent. If this is not the case, i.e. if the frame start is after the frame end or the specification is reversed, an error is returned.

Examples

Valid definitions:
...ORDER BY day_of_month ASC RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING...
...ORDER BY day_of_month ASC RANGE BETWEEN 1 FOLLOWING AND 2 FOLLOWING...
...ORDER BY day_of_month ASC RANGE BETWEEN 2 PRECEDING AND 1 PRECEDING...

Invalid definitions:
...ORDER BY day_of_month ASC RANGE BETWEEN 1 FOLLOWING AND 1 PRECEDING...
...ORDER BY day_of_month ASC RANGE BETWEEN 2 FOLLOWING AND 1 FOLLOWING...
...ORDER BY day_of_month ASC RANGE BETWEEN 1 PRECEDING AND 2 PRECEDING...

  • If ORDER BY in OVER() contains NULL values, the following must be considered when using the window frame specification RANGE BETWEEN:
    • By specifying ORDER BY order_exmp NULLS FIRST together with UNBOUNDED PRECEDING, rows with NULL values in the column order_exp are included in the frame. See Syntax.
    • By specifying ORDER BY order_exmp NULLS LAST together with UNBOUNDED FOLLOWING, rows with NULL values in the column order_exp are included in the frame. See Syntax.
    • If the value of column order_exp of the ORDER BY clause in the OVER() clause of the current row is NULL, rows with the value NULL are included in the frame boundary of the window frame with an explicit offset. Otherwise they are excluded.

See RANGE BETWEEN examples examples, that illustrates the differences in behavior of ROWS and RANGE based window frame definitions.

Cumulative & Sliding window frames

Window frames help you perform various kinds of analytical tasks, including cumulative and sliding (or "moving") calculations.

Different window frame definitions are suitable for different calculation tasks.

The window frame definition also allows a mix of named frame boundaries and explicit offsets.

⚬ Cumulative window frames

For example, to calculate the cumulative sum of previous rows up to the current row, the following query is enough:

sql
SELECT id, col1, SUM(col1) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) as cum_total from my_folder.my_table;
txt
+----+------+-----------+
| ID | COL1 | CUM_TOTAL |
+----+------+-----------+
| 1  | 10   | 10        |
| 2  | 20   | 30        |
| 3  | 20   | 50        |
| 4  | 30   | 80        |
| 5  | 30   | 110       |
| 6  | 40   | 150       |
| 7  | 50   | 200       |
| 8  | 50   | 250       |
| 9  | 60   | 310       |
| 10 | 50   | 360       |
+----+------+-----------+

⚬ Sliding window frames

A sliding window frame is a frame of constant width that moves through the rows of a partition, covering a different section each time. The number of rows in the frame remains the same except at the beginning or end of a partition, where it may cover fewer rows.

This window frame type allows you to calculate moving averages based on a fixed-size interval, such as time periods. The average is "moving" because the actual values ​​in the interval change with each "movement" across the underlying dimension, even though the size of the interval remains constant.

To calculate moving averages of adjacent values, you can "slide" the calculation across the rows:

sql
SELECT id, col1, AVG(col1) OVER(ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS ma 
    FROM my_folder.my_table ORDER BY id ASC;
txt
+----+------+--------+
| ID | COL1 | MA     |
+----+------+--------+
| 1  | 10   | 15.000 |
| 2  | 20   | 16.666 |
| 3  | 20   | 23.333 |
| 4  | 30   | 26.666 |
| 5  | 30   | 33.333 |
| 6  | 40   | 40.000 |
| 7  | 50   | 46.666 |
| 8  | 50   | 53.333 |
| 9  | 60   | 53.333 |
| 10 | 50   | 55.000 |
+----+------+--------+

Each value of the MA column in this example is an average of 3 values:

  • Value of the col1 column
  • col1 value of the previous row
  • col1 value of the following row.

Another example uses a RANGE-based and ROWS-based window frame definition. Additionaly we use a mix of named frame boundaries and explicit offsets. In this example, we calculate an average of the col1 value of the current row and the same value of the following two rows:

sql
SELECT id, col1, 
      AVG(col1) OVER(ORDER BY id RANGE BETWEEN CURRENT ROW AND 2 FOLLOWING) AS ma_1,
      AVG(col1) OVER(ORDER BY id ROWS  BETWEEN CURRENT ROW AND 2 FOLLOWING) AS ma_2 
    FROM my_folder.my_table;
txt
+----+------+--------+--------+
| ID | COL1 | MA_1   | MA_2   |
+----+------+--------+--------+
| 1  | 10   | 16.666 | 16.666 |
| 2  | 20   | 23.333 | 23.333 |
| 3  | 20   | 26.666 | 26.666 |
| 4  | 30   | 33.333 | 33.333 |
| 5  | 30   | 40.000 | 40.000 |
| 6  | 40   | 46.666 | 46.666 |
| 7  | 50   | 53.333 | 53.333 |
| 8  | 50   | 53.333 | 53.333 |
| 9  | 60   | 55.000 | 55.000 |
| 10 | 50   | 50.000 | 50.000 |
+----+------+--------+--------+

Please note that in this case, the results in columns ma_1 and ma_2 are the same even though we used different window frame definitions. This is very often not the case.

In this example this is due to the underlying data structure that only contains unique values ​​for the id column and that is used in the ORDER BY clause within the OVER() clause. For more information on the differences between RANGE and ROWS-based window frames, see more examples below and RANGE and ROWS-based window frames.

Further Examples

Please create this tables if You want to try the examples yourself.

Example: UNBOUNDED PRECEDING

Calculate the cumulative totals across all rows in ascending order:

sql
SELECT  grp, ord, col1, 
        SUM(col1) OVER(ORDER BY grp, ord ASC ROWS UNBOUNDED PRECEDING) as sum_1, 
        SUM(col1) OVER(ORDER BY grp, ord ASC RANGE UNBOUNDED PRECEDING) as sum_2,
        SUM(col1) OVER(ORDER BY grp, ord ASC) as sum_3
    FROM my_folder.my_table ORDER BY grp, ord ASC;
txt
+-----+-----+------+-------+-------+-------+
| GRP | ORD | COL1 | SUM_1 | SUM_2 | SUM_3 |
+-----+-----+------+-------+-------+-------+
| g1  | a   | 10   | 10    | 10    | 10    |
| g1  | b   | 20   | 30    | 30    | 30    |
| g1  | c   | 20   | 50    | 50    | 50    |
| g2  | a   | 30   | 80    | 80    | 80    |
| g2  | b   | 30   | 110   | 110   | 110   |
| g2  | c   | 40   | 150   | 150   | 150   |
| g3  | a   | 50   | 200   | 200   | 200   |
| g3  | b   | 50   | 250   | 250   | 250   |
| g3  | c   | 60   | 310   | 310   | 310   |
| g3  | d   | 50   | 360   | 360   | 360   |
+-----+-----+------+-------+-------+-------+

Example: BETWEEN I Named Boundaries

Calculate the cumulative totals across all rows in descending order:

sql
SELECT  grp, ord, col1, 
        SUM(col1) OVER(ORDER BY grp, ord ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as sum_1, 
        SUM(col1) OVER(ORDER BY grp, ord ASC RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as sum_2,
        SUM(col1) OVER(ORDER BY grp, ord) as sum_3
    FROM my_folder.my_table ORDER BY grp, ord ASC;
txt
+-----+-----+------+-------+-------+-------+
| GRP | ORD | COL1 | SUM_1 | SUM_2 | SUM_3 |
+-----+-----+------+-------+-------+-------+
| g1  | a   | 10   | 360   | 360   | 10    |
| g1  | b   | 20   | 350   | 350   | 30    |
| g1  | c   | 20   | 330   | 330   | 50    |
| g2  | a   | 30   | 310   | 310   | 80    |
| g2  | b   | 30   | 280   | 280   | 110   |
| g2  | c   | 40   | 250   | 250   | 150   |
| g3  | a   | 50   | 210   | 210   | 200   |
| g3  | b   | 50   | 160   | 160   | 250   |
| g3  | c   | 60   | 110   | 110   | 310   |
| g3  | d   | 50   | 50    | 50    | 360   |
+-----+-----+------+-------+-------+-------+

Example: BETWEEN II Named Boundaries

Calculate totals per partition:

sql
SELECT  grp, ord, col1, 
        SUM(col1) OVER(ORDER BY grp, ord ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_1, 
        SUM(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_2, 
        SUM(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as sum_3
    FROM my_folder.my_table ORDER BY grp, ord ASC;
txt
+-----+-----+------+-------+-------+-------+
| GRP | ORD | COL1 | SUM_1 | SUM_2 | SUM_3 |
+-----+-----+------+-------+-------+-------+
| g1  | a   | 10   | 360   | 50    | 50    |
| g1  | b   | 20   | 360   | 50    | 50    |
| g1  | c   | 20   | 360   | 50    | 50    |
| g2  | a   | 30   | 360   | 100   | 100   |
| g2  | b   | 30   | 360   | 100   | 100   |
| g2  | c   | 40   | 360   | 100   | 100   |
| g3  | a   | 50   | 360   | 210   | 210   |
| g3  | b   | 50   | 360   | 210   | 210   |
| g3  | c   | 60   | 360   | 210   | 210   |
| g3  | d   | 50   | 360   | 210   | 210   |
+-----+-----+------+-------+-------+-------+

Example: BETWEEN III Named Boundaries

Another example of calculating cumulative totals:

sql
SELECT  grp, ord, col1, 
        SUM(col1) OVER(ORDER BY grp, ord ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_1, 
        SUM(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_2, 
        SUM(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_3
    FROM my_folder.my_table ORDER BY grp, ord ASC;
txt
+-----+-----+------+-------+-------+-------+
| GRP | ORD | COL1 | SUM_1 | SUM_2 | SUM_3 |
+-----+-----+------+-------+-------+-------+
| g1  | a   | 10   | 10    | 10    | 10    |
| g1  | b   | 20   | 30    | 30    | 30    |
| g1  | c   | 20   | 50    | 50    | 50    |
| g2  | a   | 30   | 80    | 30    | 30    |
| g2  | b   | 30   | 110   | 60    | 60    |
| g2  | c   | 40   | 150   | 100   | 100   |
| g3  | a   | 50   | 200   | 50    | 50    |
| g3  | b   | 50   | 250   | 100   | 100   |
| g3  | c   | 60   | 310   | 160   | 160   |
| g3  | d   | 50   | 360   | 210   | 210   |
+-----+-----+------+-------+-------+-------+

Example: BETWEEN IV Mixed Boundaries ROWS

Moving averages:

sql
SELECT  grp, ord, col1, 
        AVG(col1) OVER( PARTITION BY grp ORDER BY grp, ord ASC 
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ma_1, 
        AVG(col1) OVER( PARTITION BY grp ORDER BY grp, ord ASC 
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) as ma_2, 
        AVG(col1) OVER( PARTITION BY grp ORDER BY grp, ord ASC 
                        ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING) as ma_3 
    FROM my_folder.my_table ORDER BY grp, ord ASC;
txt
+-----+-----+------+--------+--------+--------+
| GRP | ORD | COL1 |  MA_1  |  MA_2  |  MA_3  |
+-----+-----+------+--------+--------+--------+
| g1  | a   | 10   | 15.000 | 15.000 | 16.666 |
| g1  | b   | 20   | 16.666 | 16.666 | 16.666 |
| g1  | c   | 20   | 20.000 | 16.666 | 20.000 |
| g2  | a   | 30   | 30.000 | 30.000 | 33.333 |
| g2  | b   | 30   | 33.333 | 33.333 | 33.333 |
| g2  | c   | 40   | 35.000 | 33.333 | 35.000 |
| g3  | a   | 50   | 50.000 | 50.000 | 52.500 |
| g3  | b   | 50   | 53.333 | 53.333 | 52.500 |
| g3  | c   | 60   | 53.333 | 52.500 | 53.333 |
| g3  | d   | 50   | 55.000 | 52.500 | 55.000 |
+-----+-----+------+--------+--------+--------+

Example: BETWEEN V Mixed Boundaries RANGE

Please note, this query returns an error:

sql
SELECT  grp, ord, col1, 
        AVG(col1) OVER(PARTITION BY grp ORDER BY grp, ord ASC 
                       RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ma_1
    FROM my_folder.my_table ORDER BY grp, ord ASC;
txt
Error: invalid window aggregation group in the window specification

Because, a query in which window frame is defined by RANGE BETWEEN with numeric boundaries can only contain one ORDER BY expression.

This query will also return an error because the window frame definition of this type requires numeric or timestamp values ​​in the ORDER BY clause.

sql
SELECT  grp, ord, col1, 
        AVG(col1) OVER(PARTITION BY grp ORDER BY ord ASC 
            RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ma_1
    FROM my_folder.my_table ORDER BY grp, ord ASC;
ORDER BY clause is invalid: The ORDER BY clause for window frame 'RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING' must have a numeric, date, or timestamp expression.

Here is a correct query using RANGE window frames:

sql
SELECT  grp, id, col1, 
        AVG(col1) OVER(PARTITION BY grp ORDER BY id ASC 
            RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ma_1, 
        AVG(col1) OVER(PARTITION BY grp ORDER BY id ASC 
            RANGE BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) as ma_2, 
        AVG(col1) OVER(PARTITION BY grp ORDER BY id ASC 
            RANGE BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING) as ma_3 
    FROM my_folder.my_table ORDER BY grp, ord ASC;
txt
+-----+----+------+--------+--------+--------+
| GRP | ID | COL1 | MA_1   | MA_2   | MA_3   |
+-----+----+------+--------+--------+--------+
| g1  | 1  | 10   | 15.000 | 15.000 | 16.666 |
| g1  | 2  | 20   | 16.666 | 16.666 | 16.666 |
| g1  | 3  | 20   | 20.000 | 16.666 | 20.000 |
| g2  | 4  | 30   | 30.000 | 30.000 | 33.333 |
| g2  | 5  | 30   | 33.333 | 33.333 | 33.333 |
| g2  | 6  | 40   | 35.000 | 33.333 | 35.000 |
| g3  | 7  | 50   | 50.000 | 50.000 | 52.500 |
| g3  | 8  | 50   | 53.333 | 53.333 | 52.500 |
| g3  | 9  | 60   | 53.333 | 52.500 | 53.333 |
| g3  | 10 | 50   | 55.000 | 52.500 | 55.000 |
+-----+----+------+--------+--------+--------+

Example: RANGE and ROWS BETWEEN Comparison

In the following example we can see the different behaviour of RANGE-based and ROW-based window frames by calculation of the sums using the same dataset and similar syntax:

sql
SELECT  quarter,
        prod_cat,
        product,
        sale,
        SUM(sale) OVER(PARTITION BY prod_cat ORDER BY quarter ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as sum_quarter,
        SUM(sale) OVER(PARTITION BY prod_cat ORDER BY quarter ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cum_total
    from my_folder.my_table2 order by prod_cat, quarter, product;

The RANGE window frame syntax calculates the sum of row values ​​for the sale column for partitions defined by the prod_cat column, aggregated per quarter and cummulated over quarters.

The ROWS window frame syntax calculates the sum of row values ​​for the sale column for partitions defined by the prod_cat column, cummulated over rows.

The column values ​​are different. Colored lines mark the rows where the sum_quarter* and cum_total columns both have equal values.

txt
+---------+----------+-------------+------+-------------+-----------+
| QUARTER | PROD_CAT | PRODUCT     | SALE | SUM_QUARTER | CUM_TOTAL |
+---------+----------+-------------+------+-------------+-----------+
| 1       | Hat      | Blue Hat    | 3.5  | 12          | 3.5       |
| 1       | Hat      | Red Hat     | 4    | 12          | 7.5       |
| 1       | Hat      | Yellow Hat  | 4.5  | 12          | 12        |
| 2       | Hat      | Blue Hat    | 2.8  | 19.5        | 14.8      |
| 2       | Hat      | Red Hat     | 3.2  | 19.5        | 18        |
| 2       | Hat      | Yellow Hat  | 1.5  | 19.5        | 19.5      |
| 3       | Hat      | Blue Hat    | 3.3  | 33.1        | 22.8      |
| 3       | Hat      | Red Hat     | 4.8  | 33.1        | 27.6      |
| 3       | Hat      | Yellow Hat  | 5.5  | 33.1        | 33.1      |
| 4       | Hat      | Blue Hat    | 3.5  | 45.2        | 36.6      |
| 4       | Hat      | Red Hat     | 4    | 45.2        | 40.6      |
| 4       | Hat      | Yellow Hat  | 4.6  | 45.2        | 45.2      |
| 1       | Jeans    | Blue Jeans  | 13.5 | 52.6        | 13.5      |
| 1       | Jeans    | Green Jeans | 16.5 | 52.6        | 30        |
| 1       | Jeans    | Grey Jeans  | 22.6 | 52.6        | 52.6      |
| 2       | Jeans    | Blue Jeans  | 12.8 | 107.4       | 65.4      |
| 2       | Jeans    | Green Jeans | 10.3 | 107.4       | 75.7      |
| 2       | Jeans    | Grey Jeans  | 31.7 | 107.4       | 107.4     |
| 3       | Jeans    | Blue Jeans  | 11.4 | 166.5       | 118.8     |
| 3       | Jeans    | Green Jeans | 23.8 | 166.5       | 142.6     |
| 3       | Jeans    | Grey Jeans  | 23.9 | 166.5       | 166.5     |
| 4       | Jeans    | Blue Jeans  | 19.1 | 220.3       | 185.6     |
| 4       | Jeans    | Green Jeans | 17.4 | 220.3       | 203       |
| 4       | Jeans    | Grey Jeans  | 17.3 | 220.3       | 220.3     |
+---------+----------+-------------+------+-------------+-----------+

The differences arise from different definition of window frame.

For ROWS-based window frames, current row represents a physical row for which the value is currently being calculated. In our example, the value for the column cum_total is calculated as the sum of the value of the column sale of the current row and all rows before it, implied they belong to the same partition.

For RANGE-based window frames, all adjacent rows that have the same value in the column quarter and belong to the same partition are included in the window frame used to calculate the sum_quarter value (as if they were all treated as current row). Therefore, the values ​​in the sum_quarter column are the same for any row with the same value in the quarter and prod_cat columns.

Example: Supported non-ranking Window Functions

Return cumulative values over quarter for each product-partition:

sql
SELECT
    product,
    sale, 
    quarter,
    COUNT(sale) OVER (PARTITION BY prod_cat,product  ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) count_sale,
    SUM(sale)   OVER (PARTITION BY prod_cat,product  ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum_sale,
    AVG(sale)   OVER (PARTITION BY prod_cat,product  ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) avg_sale,
    MIN(sale)   OVER (PARTITION BY prod_cat,product  ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) min_sale,
    MAX(sale)   OVER (PARTITION BY prod_cat,product  ORDER BY quarter ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) max_sale
  FROM my_folder.my_table2
  ORDER BY prod_cat,product,quarter;
txt
+-------------+------+---------+------------+----------+---------------+----------+----------+
| PRODUCT     | SALE | QUARTER | COUNT_SALE | SUM_SALE | AVG_SALE      | MIN_SALE | MAX_SALE |
+-------------+------+---------+------------+----------+---------------+----------+----------+
| Blue Hat    | 3.5  | 1       | 1          | 3.5      | 3.5           | 3.5      | 3.5      |
| Blue Hat    | 2.8  | 2       | 2          | 6.3      | 3.15          | 2.8      | 3.5      |
| Blue Hat    | 3.3  | 3       | 3          | 9.6      | 3.2           | 2.8      | 3.5      |
| Blue Hat    | 3.5  | 4       | 4          | 13.1     | 3.275         | 2.8      | 3.5      |
| Red Hat     | 4    | 1       | 1          | 4        | 4             | 4        | 4        |
| Red Hat     | 3.2  | 2       | 2          | 7.2      | 3.6           | 3.2      | 4        |
| Red Hat     | 4.8  | 3       | 3          | 12       | 4             | 3.2      | 4.8      |
| Red Hat     | 4    | 4       | 4          | 16       | 4             | 3.2      | 4.8      |
| Yellow Hat  | 4.5  | 1       | 1          | 4.5      | 4.5           | 4.5      | 4.5      |
| Yellow Hat  | 1.5  | 2       | 2          | 6        | 3             | 1.5      | 4.5      |
| Yellow Hat  | 5.5  | 3       | 3          | 11.5     | 3.833333333   | 1.5      | 5.5      |
| Yellow Hat  | 4.6  | 4       | 4          | 16.1     | 4.025         | 1.5      | 5.5      |
| Blue Jeans  | 13.5 | 1       | 1          | 13.5     | 13.5          | 13.5     | 13.5     |
| Blue Jeans  | 12.8 | 2       | 2          | 26.3     | 13.15         | 12.8     | 13.5     |
| Blue Jeans  | 11.4 | 3       | 3          | 37.7     | 12.566666667  | 11.4     | 13.5     |
| Blue Jeans  | 19.1 | 4       | 4          | 56.8     | 14.2          | 11.4     | 19.1     |
| Green Jeans | 16.5 | 1       | 1          | 16.5     | 16.5          | 16.5     | 16.5     |
| Green Jeans | 10.3 | 2       | 2          | 26.8     | 13.4          | 10.3     | 16.5     |
| Green Jeans | 23.8 | 3       | 3          | 50.6     | 16.866666667  | 10.3     | 23.8     |
| Green Jeans | 17.4 | 4       | 4          | 68       | 17            | 10.3     | 23.8     |
| Grey Jeans  | 22.6 | 1       | 1          | 22.6     | 22.6          | 22.6     | 22.6     |
| Grey Jeans  | 31.7 | 2       | 2          | 54.3     | 27.15         | 22.6     | 31.7     |
| Grey Jeans  | 23.9 | 3       | 3          | 78.2     | 26.066666667  | 22.6     | 31.7     |
| Grey Jeans  | 17.3 | 4       | 4          | 95.5     | 23.875        | 17.3     | 31.7     |
+-------------+------+---------+------------+----------+---------------+----------+----------+

Please note

The same results will be produced, if wie modify the query and simplify it:

sql
SELECT
    product,
    sale, 
    quarter,
    COUNT(sale) OVER (PARTITION BY prod_cat,product ORDER BY quarter) count_sale,
    SUM(sale)   OVER (PARTITION BY prod_cat,product ORDER BY quarter) sum_sale,
    AVG(sale)   OVER (PARTITION BY prod_cat,product ORDER BY quarter) avg_sale,
    MIN(sale)   OVER (PARTITION BY prod_cat,product ORDER BY quarter) min_sale,
    MAX(sale)   OVER (PARTITION BY prod_cat,product ORDER BY quarter) max_sale
  FROM my_folder.my_table2
  ORDER BY prod_cat,product,quarter;

This is because without explicitly specifying the window frame and defining an ORDER BY clause, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used as the default value for window frame.

If you also omit the ORDER BY clause, a different default value is used for the window frame definition and the results are different:

sql
SELECT
    product,
    sale, 
    quarter,
    COUNT(sale) OVER (PARTITION BY prod_cat,product) count_sale,
    SUM(sale)   OVER (PARTITION BY prod_cat,product) sum_sale,
    AVG(sale)   OVER (PARTITION BY prod_cat,product) avg_sale,
    MIN(sale)   OVER (PARTITION BY prod_cat,product) min_sale,
    MAX(sale)   OVER (PARTITION BY prod_cat,product) max_sale
  FROM my_folder.my_table2
  ORDER BY prod_cat,product,quarter;
txt
+-------------+------+---------+------------+----------+----------+----------+----------+
| PRODUCT     | SALE | QUARTER | COUNT_SALE | SUM_SALE | AVG_SALE | MIN_SALE | MAX_SALE |
+-------------+------+---------+------------+----------+----------+----------+----------+
| Blue Hat    | 3.5  | 1       | 4          | 13.1     | 3.275    | 2.8      | 3.5      |
| Blue Hat    | 2.8  | 2       | 4          | 13.1     | 3.275    | 2.8      | 3.5      |
| Blue Hat    | 3.3  | 3       | 4          | 13.1     | 3.275    | 2.8      | 3.5      |
| Blue Hat    | 3.5  | 4       | 4          | 13.1     | 3.275    | 2.8      | 3.5      |
| Red Hat     | 4    | 1       | 4          | 16       | 4        | 3.2      | 4.8      |
| Red Hat     | 3.2  | 2       | 4          | 16       | 4        | 3.2      | 4.8      |
| Red Hat     | 4.8  | 3       | 4          | 16       | 4        | 3.2      | 4.8      |
| Red Hat     | 4    | 4       | 4          | 16       | 4        | 3.2      | 4.8      |
| Yellow Hat  | 4.5  | 1       | 4          | 16.1     | 4.025    | 1.5      | 5.5      |
| Yellow Hat  | 1.5  | 2       | 4          | 16.1     | 4.025    | 1.5      | 5.5      |
| Yellow Hat  | 5.5  | 3       | 4          | 16.1     | 4.025    | 1.5      | 5.5      |
| Yellow Hat  | 4.6  | 4       | 4          | 16.1     | 4.025    | 1.5      | 5.5      |
| Blue Jeans  | 13.5 | 1       | 4          | 56.8     | 14.2     | 11.4     | 19.1     |
| Blue Jeans  | 12.8 | 2       | 4          | 56.8     | 14.2     | 11.4     | 19.1     |
| Blue Jeans  | 11.4 | 3       | 4          | 56.8     | 14.2     | 11.4     | 19.1     |
| Blue Jeans  | 19.1 | 4       | 4          | 56.8     | 14.2     | 11.4     | 19.1     |
| Green Jeans | 16.5 | 1       | 4          | 68       | 17       | 10.3     | 23.8     |
| Green Jeans | 10.3 | 2       | 4          | 68       | 17       | 10.3     | 23.8     |
| Green Jeans | 23.8 | 3       | 4          | 68       | 17       | 10.3     | 23.8     |
| Green Jeans | 17.4 | 4       | 4          | 68       | 17       | 10.3     | 23.8     |
| Grey Jeans  | 22.6 | 1       | 4          | 95.5     | 23.875   | 17.3     | 31.7     |
| Grey Jeans  | 31.7 | 2       | 4          | 95.5     | 23.875   | 17.3     | 31.7     |
| Grey Jeans  | 23.9 | 3       | 4          | 95.5     | 23.875   | 17.3     | 31.7     |
| Grey Jeans  | 17.3 | 4       | 4          | 95.5     | 23.875   | 17.3     | 31.7     |
+-------------+------+---------+------------+----------+----------+----------+----------+

If window frame definition and also ORDER BY clause are omitted, RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is used as window border definition. The implied order of the rows in the table is retained.

Example: "Assymetric" Boundaries

With this example we show that window frame does not necessarily have to enclose the current row.

sql
SELECT grp,
       ord,
       col1,
        SUM(col1) OVER (PARTITION BY grp ORDER BY ord ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) sum_preceding,
        SUM(col1) OVER (PARTITION BY grp ORDER BY ord ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) sum_following
    FROM my_folder.my_table
    ORDER BY grp, ord;
txt
+-----+-----+------+---------------+---------------+
| GRP | ORD | COL1 | SUM_PRECEDING | SUM_FOLLOWING |
+-----+-----+------+---------------+---------------+
| g1  | a   | 10   | null          | 40            |
| g1  | b   | 20   | 10            | 20            |
| g1  | c   | 20   | 30            | null          |
| g2  | a   | 30   | null          | 70            |
| g2  | b   | 30   | 30            | 40            |
| g2  | c   | 40   | 60            | null          |
| g3  | a   | 50   | null          | 110           |
| g3  | b   | 50   | 50            | 110           |
| g3  | c   | 60   | 100           | 50            |
| g3  | d   | 50   | 110           | null          |
+-----+-----+------+---------------+---------------+

NULL values ​​occur where calculations are performed for rows that are located at the beginning or end of a partition. The sum of the preceding or following rows is zero because the rows are outside the window frame and column values ​​for them are NULL.

Example: Using GROUP BY with Window Functions

The GROUP BY clause does not affect window function calculations. It can be used in the same query as the window function. Note: If you use the GROUP BY clause in a query in which you use a window function, all columns that you include in the SELECT clause, except the results of the window function result column, must be included in the GROUP BY clause.

The results will be the same, as without GROUP BY clause.

In this example, we calculate the average of all products whose SALE is at least 10 greater than SALE of the current row.

sql
SELECT prod_cat, 
    product, 
    sale,
    AVG(sale) OVER(ORDER BY sale RANGE BETWEEN CURRENT ROW AND 10 FOLLOWING) AS average
  FROM my_folder.my_table2
  WHERE prod_cat = 'Jeans' 
  GROUP BY prod_cat, product, sale   
  ORDER BY sale;

-- AND 

SELECT prod_cat, 
    product, 
    sale,
    AVG(sale) OVER(ORDER BY sale RANGE BETWEEN CURRENT ROW AND 10 FOLLOWING) AS average
  FROM my_folder.my_table2
  WHERE prod_cat = 'Jeans'
  ORDER BY sale;            

Have the same result.

For example, the first row of the output table (row 4), the row with PROD_CAT = 'Jeans' and PRODUCT = 'Blue Jeans' and SALE = 10.3, has the value 14.7875 in the AVERAGE column. To calculate the AVERAGE = 14.7875, values ​​were used that lie between 10.3 (SALE value of the current row) and 10.3 + 10 (inclusive): (10.3 + 11.4 + 12.8 + 13.5 + 16.5 + 17.3 + 17.4 + 19.1)/8 = 14.7875.

Last row (row 15) contains the value 31.7 in both columns SALE and AVERAGE. This is because window frame for this row consists of the rows with the SALE values ​​that are between 31.7 (inclusive) (SALE value of the current row) and 31.7 + 10 (inclusive). Since there is only one such value, the result is the value itself.

txt
+----------+-------------+------+--------------+
| PROD_CAT | PRODUCT     | SALE | AVERAGE      |
+----------+-------------+------+--------------+
| Jeans    | Green Jeans | 10.3 | 14.7875      |
| Jeans    | Blue Jeans  | 11.4 | 15.428571429 |
| Jeans    | Blue Jeans  | 12.8 | 17.028571429 |
| Jeans    | Blue Jeans  | 13.5 | 17.733333333 |
| Jeans    | Green Jeans | 16.5 | 20.085714286 |
| Jeans    | Grey Jeans  | 17.3 | 20.683333333 |
| Jeans    | Green Jeans | 17.4 | 21.36        |
| Jeans    | Blue Jeans  | 19.1 | 22.35        |
| Jeans    | Grey Jeans  | 22.6 | 25.5         |
| Jeans    | Green Jeans | 23.8 | 26.466666667 |
| Jeans    | Grey Jeans  | 23.9 | 27.8         |
| Jeans    | Grey Jeans  | 31.7 | 31.7         |
+----------+-------------+------+--------------+

This query returns an error, because the column sale is specified in the SELECT clause but not in the GROUP BY clause.

sql
SELECT prod_cat, 
    product, 
    sale,
    AVG(sale) OVER(ORDER BY sale RANGE BETWEEN CURRENT ROW AND 10 FOLLOWING) AS average
  FROM my_folder.my_table2
  WHERE prod_cat = 'Jeans' 
  GROUP BY prod_cat, product
  ORDER BY sale;
txt
[MY_TABLE2.SALE] is not a valid group by expression

For more information, see RANGE-based window frames.

Example: Handling of NULL values

See also ORDER BY clause.

In general, it is recommended to avoid NULL values ​​in the order_expr, because depending on the window frame definition, it can lead to unexpected results.

Here we show, using two different window frame definitions as examples, what effects the NULLS FIRST and NULLS LAST settings in the ORDER BY clause within the OVER() clause can have.

With NULLS LAST:

  • With the window frame definition with explicit offsets, lines with NULL in order_expr are included in the frame if the order_expr of the current line is NULL.
  • With the window frame definition with UNBOUNDED FOLLOWING, lines with NULL in order_expr are included in the frame.
sql
SELECT grp2, col2,
    SUM(col2) OVER(ORDER BY grp2 NULLS LAST RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) sum_1,
    SUM(col2) OVER(ORDER BY grp2 NULLS LAST RANGE BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING) sum_2
  FROM my_folder.my_table
  ORDER BY grp2 NULLS LAST;
txt
+------+------+-------+-------+
| GRP2 | COL2 | SUM_1 | SUM_2 |
+------+------+-------+-------+
| 1    | 10   | 135   | 373   |
| 1    | 20   | 135   | 373   |
| 2    | 30   | 235   | 373   |
| 2    | 35   | 235   | 373   |
| 2    | 40   | 235   | 373   |
| 3    | 45   | 205   | 343   |
| 3    | 55   | 205   | 343   |
| NULL | 25   | 138   | 138   |
| NULL | 53   | 138   | 138   |
| NULL | 60   | 138   | 138   |
+------+------+-------+-------+

For rows with GRP2 = 1, SUM_1 is calculated as (10 + 20) + (30 + 35 + 40) = 135. The values ​​with GRP2 = NULL are not included in the window frame. The rows with GRP2 = 2 and 3 also do not include rows with GRP2 = NULL values ​​in the calculated totals.

The last three rows with GRP2 = NULL form a "separate" window frame.

For SUM_2 and GRP2 = 1, the result is (10 + 20) + (30 + 35 + 40) + (45 + 55) + (25 + 53 + 60) = 373, since all subsequent rows with UNBOUNDED FOLLOWING are included in the calculation and rows with GRP2 = NULL are sorted to the end of the table.

Ranking window functions

The syntax for a ranking window function is essentially the same as the syntax for other window functions. The exceptions include: Ranking window functions require the ORDER BY clause inside the OVER clause. For some ranking functions, such as RANK itself, no input argument is required. For the RANK function, the value returned is based solely on numeric ranking, as determined by the ORDER BY clause inside the OVER clause. Therefore, passing a column name or expression to the function is unnecessary. The simplest ranking function is named RANK. You can use this function to: Rank salespeople on revenue (sales), from highest to lowest. Rank countries based on their per-capita GDP (income per person), from highest to lowest. Rank countries on air pollution, from lowest to highest. This function simply identifies the numeric ranking position of a row in an ordered set of rows. The first row has rank 1, the second has rank 2, and so on. The following example shows the rank order of salespeople based on Amount Sold:

SalespersonAmount SoldRank
Smith20001
Jones15002
Torkelson12003
Dolenz11004

The rows must already be sorted before the rankings can be assigned. Therefore, you must use an ORDER BY clause within the OVER clause. Consider the following example: you’d like to know where your store profit ranks among branches of the store chain (whether your store ranks first, second, third, and so on). This example ranks each store by profitability within its city. The rows are put in descending order (highest profit first), so the most profitable store is ranked 1:

sql
SELECT city, branch_ID, net_profit,
       RANK() OVER (PARTITION BY city ORDER BY net_profit DESC) AS rank
    FROM store_sales
    ORDER BY city, rank;
txt
+-----------+-----------+------------+------+
| CITY      | BRANCH_ID | NET_PROFIT | RANK |
|-----------+-----------+------------+------|
| Montreal  |         3 |   10000.00 |    1 |
| Montreal  |         4 |    9000.00 |    2 |
| Vancouver |         2 |   15000.00 |    1 |
| Vancouver |         1 |   10000.00 |    2 |
+-----------+-----------+------------+------+

Note

The net_profit column does not need to be passed as an argument to the RANK function. Instead, the input rows are sorted by net_profit. The RANK function merely needs to return the position of the row (1, 2, 3, and so on) within the partition. The output of a ranking function depends on: The individual row passed to the function. The values of the other rows in the partition. The order of all the rows in the partition. Snowflake provides several different ranking functions. For a list of these functions, and more details about their syntax, see Window functions. To rank your store against all other stores in the chain, not just against other stores in your city, use the query below:

sql
SELECT
    branch_ID,
    net_profit,
    RANK() OVER (ORDER BY net_profit DESC) AS sales_rank
  FROM store_sales;

The following query uses the first ORDER BY clause to control processing by the window function and the second ORDER BY clause to control the order of the entire query’s output:

sql
SELECT
    branch_ID,
    net_profit,
    RANK() OVER (ORDER BY net_profit DESC) AS sales_rank
  FROM store_sales
  ORDER BY branch_ID;

Illustrated example

This example uses a sales scenario to illustrate many of the concepts described earlier in this topic. Suppose that you need to generate a financial report that shows values based on sales over the last week: Daily sales Ranking within the week (that is, sales ranked highest to lowest for the week) Sales so far this week (that is, the “cumulative sum” for all days from the beginning of the week up through and including the current day) Total sales for the week Three-day moving average (that is, the average over the current day and the two previous days) The report might look something like this:

txt
+--------+-------+------+--------------+-------------+--------------+
| Day of | Sales | Rank | Sales So Far | Total Sales | 3-Day Moving |
| Week   | Today |      | This Week    | This Week   | Average      |
|--------+-------+------+--------------+-------------|--------------+
|      1 |    10 |    4 |           10 |          84 |         10.0 |
|      2 |    14 |    3 |           24 |          84 |         12.0 |
|      3 |     6 |    5 |           30 |          84 |         10.0 |
|      4 |     6 |    5 |           36 |          84 |          9.0 |
|      5 |    14 |    3 |           50 |          84 |         10.0 |
|      6 |    16 |    2 |           66 |          84 |         11.0 |
|      7 |    18 |    1 |           84 |          84 |         12.0 |
+--------+-------+------+--------------+-------------+--------------+

The SQL for this query is somewhat complex. Rather than show the example as a single query, this discussion breaks down the SQL for the individual columns. In a real-world scenario, you would have years of data, so to calculate sums and averages for one specific week of data, you would need to use a one-week window, or use a filter similar to: ... WHERE date >= start_of_relevant_week and date <= end_of_relevant_week ... However, for this example, assume that the table contains only the most recent week’s worth of data.

sql
CREATE TABLE store_sales_2 (
    day INTEGER,
    sales_today INTEGER
    );
txt
+-------------------------------------------+
| status                                    |
|-------------------------------------------|
| Table STORE_SALES_2 successfully created. |
+-------------------------------------------+
sql
INSERT INTO store_sales_2 (day, sales_today) VALUES
    (1, 10),
    (2, 14),
    (3,  6),
    (4,  6),
    (5, 14),
    (6, 16),
    (7, 18);
txt
+-------------------------+
| number of rows inserted |
|-------------------------|
|                       7 |
+-------------------------+

Calculating sales rank

The Rank column is calculated using the RANK function:

sql
SELECT day, 
       sales_today, 
       RANK()
           OVER (ORDER BY sales_today DESC) AS Rank
    FROM store_sales_2
    ORDER BY day;
txt
+-----+-------------+------+
| DAY | SALES_TODAY | RANK |
|-----+-------------+------|
|   1 |          10 |    5 |
|   2 |          14 |    3 |
|   3 |           6 |    6 |
|   4 |           6 |    6 |
|   5 |          14 |    3 |
|   6 |          16 |    2 |
|   7 |          18 |    1 |
+-----+-------------+------+

Although there are 7 days in the time period, there are only 5 different ranks (1, 2, 3, 5, 6). There were two ties (for 3rd place and 6th place), so there are no rows with ranks 4 or 7. Calculating sales so far this week The Sales So Far This Week column is calculated using SUM as a window function with a window frame:

sql
SELECT day, 
       sales_today, 
       SUM(sales_today)
           OVER (ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS "SALES SO FAR THIS WEEK"
    FROM store_sales_2
    ORDER BY day;
txt
+-----+-------------+------------------------+
| DAY | SALES_TODAY | SALES SO FAR THIS WEEK |
|-----+-------------+------------------------|
|   1 |          10 |                     10 |
|   2 |          14 |                     24 |
|   3 |           6 |                     30 |
|   4 |           6 |                     36 |
|   5 |          14 |                     50 |
|   6 |          16 |                     66 |
|   7 |          18 |                     84 |
+-----+-------------+------------------------+

This query orders the rows by date and then, for each date, calculates the sum of sales from the start of the window up to the current date (inclusive).

Calculating total sales this week

The Total Sales This Week column is calculated using SUM.

sql
SELECT day, 
       sales_today, 
       SUM(sales_today)
           OVER ()
               AS total_sales
    FROM store_sales_2
    ORDER BY day;
txt
+-----+-------------+-------------+
| DAY | SALES_TODAY | TOTAL_SALES |
|-----+-------------+-------------|
|   1 |          10 |          84 |
|   2 |          14 |          84 |
|   3 |           6 |          84 |
|   4 |           6 |          84 |
|   5 |          14 |          84 |
|   6 |          16 |          84 |
|   7 |          18 |          84 |
+-----+-------------+-------------+

Calculating a three-day moving average

The 3-Day Moving Average column is calculated using AVG as a window function with a window frame:

sql
SELECT day, 
       sales_today, 
       AVG(sales_today)
           OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
               AS "3-DAY MOVING AVERAGE"
    FROM store_sales_2
    ORDER BY day;
txt
+-----+-------------+----------------------+
| DAY | SALES_TODAY | 3-DAY MOVING AVERAGE |
|-----+-------------+----------------------|
|   1 |          10 |               10.000 |
|   2 |          14 |               12.000 |
|   3 |           6 |               10.000 |
|   4 |           6 |                8.666 |
|   5 |          14 |                8.666 |
|   6 |          16 |               12.000 |
|   7 |          18 |               16.000 |
+-----+-------------+----------------------+

The difference between this window frame and the window frame described earlier is the starting point: a fixed boundary versus an explicit offset.

Putting it all together

Here’s the final version of the query, showing all of the columns:

sql
SELECT day, 
       sales_today, 
       RANK()
           OVER (ORDER BY sales_today DESC) AS Rank,
       SUM(sales_today)
           OVER (ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS "SALES SO FAR THIS WEEK",
       SUM(sales_today)
           OVER ()
               AS total_sales,
       AVG(sales_today)
           OVER (ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
               AS "3-DAY MOVING AVERAGE"
    FROM store_sales_2
    ORDER BY day;
txt
+-----+-------------+------+------------------------+-------------+----------------------+
| DAY | SALES_TODAY | RANK | SALES SO FAR THIS WEEK | TOTAL_SALES | 3-DAY MOVING AVERAGE |
|-----+-------------+------+------------------------+-------------+----------------------|
|   1 |          10 |    5 |                     10 |          84 |               10.000 |
|   2 |          14 |    3 |                     24 |          84 |               12.000 |
|   3 |           6 |    6 |                     30 |          84 |               10.000 |
|   4 |           6 |    6 |                     36 |          84 |                8.666 |
|   5 |          14 |    3 |                     50 |          84 |                8.666 |
|   6 |          16 |    2 |                     66 |          84 |               12.000 |
|   7 |          18 |    1 |                     84 |          84 |               16.000 |
+-----+-------------+------+------------------------+-------------+----------------------+

Ranking function examples

The following example shows how to rank sales based on the total amount (in dollars) that each salesperson has sold. The ORDER BY clause within the OVER clause sorts the totals in descending order (highest to lowest). The query calculates the rank of each salesperson relative to all other salespeople. Create the table and insert the data:

sql
CREATE TABLE sales_table (salesperson_name VARCHAR, sales_in_dollars INTEGER);
INSERT INTO sales_table (salesperson_name, sales_in_dollars) VALUES
    ('Smith', 600),
    ('Jones', 1000),
    ('Torkelson', 700),
    ('Dolenz', 800);

Now query the data:

sql
SELECT
    salesperson_name,
    sales_in_dollars,
    RANK() OVER (ORDER BY sales_in_dollars DESC) AS sales_rank
  FROM sales_table;
txt
+------------------+------------------+------------+
| SALESPERSON_NAME | SALES_IN_DOLLARS | SALES_RANK |
|------------------+------------------+------------|
| Jones            |             1000 |          1 |
| Dolenz           |              800 |          2 |
| Torkelson        |              700 |          3 |
| Smith            |              600 |          4 |
+------------------+------------------+------------+

The output is not necessarily ordered by rank. To display reults ordered by rank, specify an ORDER BY clause for the query itself (in addition to the ORDER BY clause for the window function), as shown here:

sql
SELECT
    salesperson_name,
    sales_in_dollars,
    RANK() OVER (ORDER BY sales_in_dollars DESC) AS sales_rank
  FROM sales_table
  ORDER BY 3;
txt
+------------------+------------------+------------+
| SALESPERSON_NAME | SALES_IN_DOLLARS | SALES_RANK |
|------------------+------------------+------------|
| Jones            |             1000 |          1 |
| Dolenz           |              800 |          2 |
| Torkelson        |              700 |          3 |
| Smith            |              600 |          4 |
+------------------+------------------+------------+

The preceding example has two ORDER BY clauses: One controls the order of the ranking. One controls the order of the output. These clauses are independent. For example, you could order the rankings based on total sales (as shown above), but order the output rows based on the salesperson’s last name:

sql
SELECT
    salesperson_name,
    sales_in_dollars,
    RANK() OVER (ORDER BY sales_in_dollars DESC) AS sales_rank
  FROM sales_table
  ORDER BY salesperson_name;
txt
+------------------+------------------+------------+
| SALESPERSON_NAME | SALES_IN_DOLLARS | SALES_RANK |
|------------------+------------------+------------|
| Dolenz           |              800 |          2 |
| Jones            |             1000 |          1 |
| Smith            |              600 |          4 |
| Torkelson        |              700 |          3 |
+------------------+------------------+------------+