Analytic functions

Analytic functions compute values over a list (or a sliding window) of rows and return a computed value for each row.

The OVER clause defines a window of rows that are being using to calculate the result: for each row the window is different and the aggregating function returns the result calculated over that set of rows.

Analytic functions allow computing moving averages, cumulative sums, or extract first or last values from a defined window.

See also

Contrast with aggregating functions that return a result for a whole group of rows.

Defining window

In order to use analytic functions a window needs to be defined.

Warning

While ANSI SQL supports windows without specifying partitions currently BlazingSQL requires specifying the PARTITION BY clause.

Syntax

The syntax to use analytic functions in BlazingSQL follows the ANSI standard:

SELECT <col_1>
    , <col_2>
    , <analytic_function> OVER (
        PARTITION BY <col_1>
        [ ORDER BY <col_3> { ASC | DESC } { NULLS FIRST | NULLS LAST } }]
        [ ROWS | RANGE ]
    )
FROM <table_name>

PARTITION BY

The PARTITION BY clause specifies the column (or columns) that the window function will further delineate the boundaries of rows; these boundaries can further be refined by using ROWS or RANGE qualifiers.

ORDER BY

The rows within the specified can be ordered (some of the aggregating functions require the rows to be sorted). Just like with the reqular SELECT statement, the ORDER BY clause can be used within the specified window.

ASC and DESC qualifiers

The rows within window can be sorted by one or more column, and can be mixed and matched in terms of the monotonicity.

Note

The ASC qualifier is explicit and informs the engine to sort the rows in an ascending order. However, if omitted, the engine implicitly orders the rows in and ascending order. To sort in a descending order the DESC is required.

As an example, the code below sorts the results by col_3 descending and then by col_4 ascending.

SELECT col_1
    , col_2
    , SUM(col_5) OVER (
        PARTITION BY col_1
        ORDER BY col_3 DESC
            , col_4 ASC
    )

Handling of NULL values

If the columns used to sort the rows contain NULL values certain aggregating functions will return different result depending on where the NULL values appear: either on start or end of the defined window. BlazingSQL supports specifying this explicitly using NULLS FIRST or NULLS LAST clauses.

As an example, the code below sorts the results by col_3 descending and pushes the NULL values to be top so they appear first in the order.

SELECT col_1
    , col_2
    , SUM(col_5) OVER (
        PARTITION BY col_1
        ORDER BY col_3 DESC NULLS FIRST
    )

ROWS or RANGE

The partitioning window can further be refined to delineate the range that the window covers using the ROWS or RANGE.

The ROWS clause uses the exact specified number of rows to determine the window boundries. As an example, it can include 1 row prior and 2 rows after the current row. The RANGE clause, on the other hand, determines the window based on a logical range of rows around the current one; the logic of which rows to include is derived from the ORDER BY column.

Warning

The RANGE is not currently supported.

PRECEDING and FOLLOWING qualifiers

The ROWS or RANGE qualifiers are normally followed by offset qualifiers. These can specify either backward inclusion using the PRECEDING clause, for example

SELECT col_1
    , col_2
    , SUM(col_5) OVER (
        PARTITION BY col_1
        ORDER BY col_3 DESC NULLS FIRST
        ROWS 1 PRECEDING
    )

or

SELECT col_1
    , col_2
    , SUM(col_5) OVER (
        PARTITION BY col_1
        ORDER BY col_3 DESC NULLS FIRST
        ROWS UNBOUNDED PRECEDING
    )

The latter example includes all the rows for the partition up to the current row while the former includes only one row prior.

Similarly, a forward looking behavior can be achieved.

SELECT col_1
    , col_2
    , SUM(col_5) OVER (
        PARTITION BY col_1
        ORDER BY col_3 DESC NULLS FIRST
        ROWS 1 FOLLOWING
    )

but this functionality requires specifying an exact number of rows.

BETWEEN

It is also possible to specify a symmetrical or asymmetrical window around the current row.

SELECT col_1
    , col_2
    , SUM(col_5) OVER (
        PARTITION BY col_1
        ORDER BY col_3 DESC NULLS FIRST
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    )

A special case where CURRENT ROW is specified explicitly can be used to create a backward

SELECT col_1
    , col_2
    , SUM(col_5) OVER (
        PARTITION BY col_1
        ORDER BY col_3 DESC NULLS FIRST
        ROWS BETWEEN 1 PRECEDING AND CURERENT ROW

of forward looking windows.

SELECT col_1
    , col_2
    , SUM(col_5) OVER (
        PARTITION BY col_1
        ORDER BY col_3 DESC NULLS FIRST
        ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING

Supported aggregating functions

LEAD

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Return the value from the following row (or rows, if offset specified) in the specified window.

See also

LAG

Examples

Return value from the following row.

SELECT <col_1>
    , LEAD(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

Offset by 2 rows.

SELECT <col_1>
    , LEAD(<col_2>, 2) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

LAG

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Return the value from the preceding row (or rows, if offset specified) in the specified window.

See also

LEAD

Examples

Return value from the previous row.

SELECT <col_1>
    , LAG(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

Offset by 2 rows.

SELECT <col_1>
    , LAG(<col_2>, 2) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

FIRST_VALUE

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Return the first value from the specified window.

Warning

This function includes NULL values and if the NULL values is first, NULL will be returned.

See also

LAST_VALUE

Example

SELECT <col_1>
    , FIRST_VALUE(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

LAST_VALUE

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Return the last value from the specified window.

Warning

This function includes NULL values and if the NULL values is last, NULL will be returned.

See also

FIRST_VALUE

Example

SELECT <col_1>
    , LAST_VALUE(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

ROW_NUMBER

Obtain a row number per partition. This function does not require ORDER BY clause but the results are then non-deterministic.

Example

SELECT <col_1>
    , ROW_NUMBER() OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

AVG

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate average of a numeric column.

See also

MAX, MIN, SUM

Examples

Average of values per partition.

SELECT <col_1>
    , AVG(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>
GROUP BY <col_1>

Moving average.

SELECT <col_1>
    , AVG(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
    )
FROM <table_name>
GROUP BY <col_1>

MAX

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate maximum value of a numeric column.

See also

AVG, MIN, SUM

Examples

Maximum of values per partition.

SELECT <col_1>
    , MAX(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

Maximum of values in a moving window.

SELECT <col_1>
    , MAX(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
    )
FROM <table_name>

MIN

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate minimum value of a numeric column.

See also

AVG, MAX, SUM

Examples

Minimum of values per partition.

SELECT <col_1>
    , MIN(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

Minimum of values in a moving window.

SELECT <col_1>
    , MIN(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
    )
FROM <table_name>

SUM

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate sum of values of a numeric column.

See also

AVG, MAX, MIN

Examples

Sum of values per partition. This is useful to later compute share of each value in the total value per partition.

SELECT <col_1>
    , SUM(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
    )
FROM <table_name>

Calculate cumulative sum.

SELECT <col_1>
    , MIN(<col_2>) OVER (
        PARTITION BY <col_2>
        ORDER BY <col_3>
        ROWS UNBOUNDED PRECEDING
    )
FROM <table_name>