Aggregating data¶
Aggregating data using SQL is done using GROUP BY
function.
See also
GROUP BY function¶
The GROUP BY
function allows specifying single column or multiple
columns as level of aggregation.
SELECT <col_1>
, <col_2>
, COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
, <col_2>
Additional predicates and clauses can be added, too
SELECT <col_1>
, <col_2>
, COUNT(*) AS cnt
FROM <table_name>
WHERE <col_2> <= 0
GROUP BY <col_1>
, <col_2>
ORDER BY <col_1> ASC
, <col_2> DESC
Since GROUP BY
by definition returns a list of unique values found in the columns used to
specify the aggregation level there is no need to add the DISTINCT qualifier.
However, the following syntax is also supported.
SELECT DISTINCT <col_1>
, <col_2>
, COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
, <col_2>
HAVING qualifier¶
The difference between the WHERE statement and the HAVING
clause
is the order of operations:
with the
WHERE
predicate, the table rows get filtered first and then the aggregating function is applied.the
HAVING
clause applies the filter to already aggregated results that is any of the aggregating functions
SELECT <col_1>
, <col_2>
, COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
, <col_2>
HAVING COUNT(*) > 100
Both, the WHERE
and the HAVING
statements can appear in the same query.
SELECT <col_1>
, <col_2>
, COUNT(*) AS cnt
FROM <table_name>
WHERE <col_2> <= 0
GROUP BY <col_1>
, <col_2>
HAVING COUNT(*) > 100
Aggregating functions¶
Below is a full list of currently supported aggregation functions by BlazingSQL.
AVG¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate average of a numeric column.
Example¶
SELECT <col_1>
, AVG(<col_2>) AS average
FROM <table_name>
GROUP BY <col_1>
COUNT¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN
Count the number of rows in each specified group.
Example¶
The *
syntax counts all the rows in the table irrespective if NULL
values.
SELECT <col_1>
, COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
Specifying a column to count over will exclude NULL
from the final count.
SELECT <col_1>
, COUNT(<col_with_nulls>) AS cnt_no_nulls
FROM <table_name>
GROUP BY <col_1>
Adding the DISTINCT
qualifier will count all the unique values in the
specified column in each aggregation group.
SELECT <col_1>
, COUNT(DISTINCT <col_2>) AS cnt_unqiue
FROM <table_name>
GROUP BY <col_1>
MAX¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate a maximum value in the specified column.
See also
Example¶
SELECT <col_1>
, MAX(<col_2>) AS maximum
FROM <table_name>
GROUP BY <col_1>
MIN¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate a minimum value in the specified column.
See also
Example¶
SELECT <col_1>
, MIN(<col_2>) AS minimum
FROM <table_name>
GROUP BY <col_1>
STDDEV¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate standard deviation of a numeric column according to the formula
See also
Example¶
SELECT <col_1>
, STDDEV(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>
STDDEV_POP¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate the population standard deviation of a numeric column according to the formula
See also
Example¶
SELECT <col_1>
, STDDEV_POP(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>
STDDEV_SAMP¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate the sample standard deviation of a numeric column according to the formula
See also
Example¶
SELECT <col_1>
, STDDEV_SAMP(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>
SUM¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate a sum of all non-NULL
values in the specified column.
Example¶
SELECT <col_1>
, SUM(<col_2>) AS sum_of_col_2
FROM <table_name>
GROUP BY <col_1>
VARIANCE¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate variance of a numeric column according to the formula
Example¶
SELECT <col_1>
, VARIANCE(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>
VAR_POP¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate the population variance of a numeric column according to the formula
Example¶
SELECT <col_1>
, VAR_POP(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>
VAR_SAMP¶
Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE
Calculate the sample variance of a numeric column according to the formula
Example¶
SELECT <col_1>
, VAR_SAMP(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>