Analytic functions¶
Analytic functions compute values over a list (or a sliding window) of rows and return a computed value for each row.
The OVER
clause defines a window of rows that are being using to calculate the result:
for each row the window is different and the aggregating function returns the result
calculated over that set of rows.
Analytic functions allow computing moving averages, cumulative sums, or extract first or last values from a defined window.
See also
Contrast with aggregating functions that return a result for a whole group of rows.
Defining window¶
In order to use analytic functions a window needs to be defined.
Warning
While ANSI SQL supports windows without specifying partitions
currently BlazingSQL requires specifying the PARTITION BY
clause.
Syntax¶
The syntax to use analytic functions in BlazingSQL follows the ANSI standard:
SELECT <col_1>
, <col_2>
, <analytic_function> OVER (
PARTITION BY <col_1>
[ ORDER BY <col_3> { ASC | DESC } { NULLS FIRST | NULLS LAST } }]
[ ROWS | RANGE ]
)
FROM <table_name>
PARTITION BY
¶
The PARTITION BY
clause specifies the column (or columns) that
the window function will further delineate the boundaries of rows; these boundaries
can further be refined by using ROWS or RANGE qualifiers.
ORDER BY
¶
The rows within the specified can be ordered (some of the aggregating functions
require the rows to be sorted). Just like with the reqular SELECT
statement,
the ORDER BY
clause can be used within the specified window.
ASC
and DESC
qualifiers¶
The rows within window can be sorted by one or more column, and can be mixed and matched in terms of the monotonicity.
Note
The ASC
qualifier is explicit and informs the engine to sort
the rows in an ascending order. However, if omitted, the engine implicitly
orders the rows in and ascending order. To sort in a descending order the
DESC
is required.
As an example, the code below sorts the results by col_3
descending and
then by col_4
ascending.
SELECT col_1
, col_2
, SUM(col_5) OVER (
PARTITION BY col_1
ORDER BY col_3 DESC
, col_4 ASC
)
Handling of NULL
values¶
If the columns used to sort the rows contain NULL
values certain
aggregating functions will return different result depending on where the
NULL
values appear: either on start or end of the defined window.
BlazingSQL supports specifying this explicitly using NULLS FIRST
or NULLS LAST
clauses.
As an example, the code below sorts the results by col_3
descending and
pushes the NULL
values to be top so they appear first in the order.
SELECT col_1
, col_2
, SUM(col_5) OVER (
PARTITION BY col_1
ORDER BY col_3 DESC NULLS FIRST
)
ROWS
or RANGE
¶
The partitioning window can further be refined to delineate the range
that the window covers using the ROWS
or RANGE
.
The ROWS
clause uses the exact specified number of rows to determine the
window boundries. As an example, it can include 1 row prior and
2 rows after the current row.
The RANGE
clause, on the other hand, determines the window based
on a logical range of rows around the current one; the logic of which
rows to include is derived from the ORDER BY
column.
Warning
The RANGE
is not currently supported.
PRECEDING
and FOLLOWING
qualifiers¶
The ROWS
or RANGE
qualifiers are normally followed
by offset qualifiers. These can specify either backward inclusion using
the PRECEDING
clause, for example
SELECT col_1
, col_2
, SUM(col_5) OVER (
PARTITION BY col_1
ORDER BY col_3 DESC NULLS FIRST
ROWS 1 PRECEDING
)
or
SELECT col_1
, col_2
, SUM(col_5) OVER (
PARTITION BY col_1
ORDER BY col_3 DESC NULLS FIRST
ROWS UNBOUNDED PRECEDING
)
The latter example includes all the rows for the partition up to the current row while the former includes only one row prior.
Similarly, a forward looking behavior can be achieved.
SELECT col_1
, col_2
, SUM(col_5) OVER (
PARTITION BY col_1
ORDER BY col_3 DESC NULLS FIRST
ROWS 1 FOLLOWING
)
but this functionality requires specifying an exact number of rows.
BETWEEN
¶
It is also possible to specify a symmetrical or asymmetrical window around the current row.
SELECT col_1
, col_2
, SUM(col_5) OVER (
PARTITION BY col_1
ORDER BY col_3 DESC NULLS FIRST
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
)
A special case where CURRENT ROW
is specified explicitly can be used
to create a backward
SELECT col_1
, col_2
, SUM(col_5) OVER (
PARTITION BY col_1
ORDER BY col_3 DESC NULLS FIRST
ROWS BETWEEN 1 PRECEDING AND CURERENT ROW
of forward looking windows.
SELECT col_1
, col_2
, SUM(col_5) OVER (
PARTITION BY col_1
ORDER BY col_3 DESC NULLS FIRST
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
Supported aggregating functions¶
LEAD¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN
Return the value from the following row (or rows, if offset specified) in the specified window.
See also
Examples¶
Return value from the following row.
SELECT <col_1>
, LEAD(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
Offset by 2 rows.
SELECT <col_1>
, LEAD(<col_2>, 2) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
LAG¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN
Return the value from the preceding row (or rows, if offset specified) in the specified window.
See also
Examples¶
Return value from the previous row.
SELECT <col_1>
, LAG(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
Offset by 2 rows.
SELECT <col_1>
, LAG(<col_2>, 2) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
FIRST_VALUE¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN
Return the first value from the specified window.
Warning
This function includes NULL
values and if the NULL
values is
first, NULL
will be returned.
See also
Example¶
SELECT <col_1>
, FIRST_VALUE(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
LAST_VALUE¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN
Return the last value from the specified window.
Warning
This function includes NULL
values and if the NULL
values is
last, NULL
will be returned.
See also
Example¶
SELECT <col_1>
, LAST_VALUE(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
ROW_NUMBER¶
Obtain a row number per partition. This function does not
require ORDER BY
clause but the results are then
non-deterministic.
Example¶
SELECT <col_1>
, ROW_NUMBER() OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
AVG¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate average of a numeric column.
Examples¶
Average of values per partition.
SELECT <col_1>
, AVG(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
GROUP BY <col_1>
Moving average.
SELECT <col_1>
, AVG(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM <table_name>
GROUP BY <col_1>
MAX¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate maximum value of a numeric column.
Examples¶
Maximum of values per partition.
SELECT <col_1>
, MAX(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
Maximum of values in a moving window.
SELECT <col_1>
, MAX(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM <table_name>
MIN¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate minimum value of a numeric column.
Examples¶
Minimum of values per partition.
SELECT <col_1>
, MIN(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
Minimum of values in a moving window.
SELECT <col_1>
, MIN(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
FROM <table_name>
SUM¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate sum of values of a numeric column.
Examples¶
Sum of values per partition. This is useful to later compute share of each value in the total value per partition.
SELECT <col_1>
, SUM(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
)
FROM <table_name>
Calculate cumulative sum.
SELECT <col_1>
, MIN(<col_2>) OVER (
PARTITION BY <col_2>
ORDER BY <col_3>
ROWS UNBOUNDED PRECEDING
)
FROM <table_name>