Aggregating data

Aggregating data using SQL is done using GROUP BY function.

GROUP BY function

The GROUP BY function allows specifying single column or multiple columns as level of aggregation.

SELECT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
    , <col_2>

Additional predicates and clauses can be added, too

SELECT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
WHERE <col_2> <= 0
GROUP BY <col_1>
    , <col_2>
ORDER BY <col_1> ASC
    , <col_2> DESC

Since GROUP BY by definition returns a list of unique values found in the columns used to specify the aggregation level there is no need to add the DISTINCT qualifier. However, the following syntax is also supported.

SELECT DISTINCT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
    , <col_2>

HAVING qualifier

The difference between the WHERE statement and the HAVING clause is the order of operations:

  1. with the WHERE predicate, the table rows get filtered first and then the aggregating function is applied.

  2. the HAVING clause applies the filter to already aggregated results that is any of the aggregating functions

SELECT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
    , <col_2>
HAVING COUNT(*) > 100

Both, the WHERE and the HAVING statements can appear in the same query.

SELECT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
WHERE <col_2> <= 0
GROUP BY <col_1>
    , <col_2>
HAVING COUNT(*) > 100

Aggregating functions

Below is a full list of currently supported aggregation functions by BlazingSQL.

AVG

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate average of a numeric column.

See also

COUNT, SUM

Example

SELECT <col_1>
    , AVG(<col_2>) AS average
FROM <table_name>
GROUP BY <col_1>

COUNT

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Count the number of rows in each specified group.

See also

AVG, SUM

Example

The * syntax counts all the rows in the table irrespective if NULL values.

SELECT <col_1>
    , COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>

Specifying a column to count over will exclude NULL from the final count.

SELECT <col_1>
    , COUNT(<col_with_nulls>) AS cnt_no_nulls
FROM <table_name>
GROUP BY <col_1>

Adding the DISTINCT qualifier will count all the unique values in the specified column in each aggregation group.

SELECT <col_1>
    , COUNT(DISTINCT <col_2>) AS cnt_unqiue
FROM <table_name>
GROUP BY <col_1>

MAX

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate a maximum value in the specified column.

See also

MIN

Example

SELECT <col_1>
    , MAX(<col_2>) AS maximum
FROM <table_name>
GROUP BY <col_1>

MIN

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate a minimum value in the specified column.

See also

MAX

Example

SELECT <col_1>
    , MIN(<col_2>) AS minimum
FROM <table_name>
GROUP BY <col_1>

STDDEV

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate standard deviation of a numeric column according to the formula

\[\sqrt{\frac{\sum{(x-\mu)^2}}{N-1}}\]

Example

SELECT <col_1>
    , STDDEV(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>

STDDEV_POP

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate the population standard deviation of a numeric column according to the formula

\[\sqrt{\frac{\sum{(x-\mu)^2}}{N}}\]

See also

STDDEV, STDDEV_SAMP

Example

SELECT <col_1>
    , STDDEV_POP(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>

STDDEV_SAMP

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate the sample standard deviation of a numeric column according to the formula

\[\sqrt{\frac{\sum{(x-\mu)^2}}{N-1}}\]

See also

STDDEV, STDDEV_POP

Example

SELECT <col_1>
    , STDDEV_SAMP(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>

SUM

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate a sum of all non-NULL values in the specified column.

See also

AVG, COUNT

Example

SELECT <col_1>
    , SUM(<col_2>) AS sum_of_col_2
FROM <table_name>
GROUP BY <col_1>

VARIANCE

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate variance of a numeric column according to the formula

\[\frac{\sum{(x-\mu)^2}}{N-1}\]

See also

VAR_POP, VAR_SAMP

Example

SELECT <col_1>
    , VARIANCE(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>

VAR_POP

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate the population variance of a numeric column according to the formula

\[\frac{\sum{(x-\mu)^2}}{N}\]

See also

VARIANCE, VAR_SAMP

Example

SELECT <col_1>
    , VAR_POP(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>

VAR_SAMP

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate the sample variance of a numeric column according to the formula

\[\frac{\sum{(x-\mu)^2}}{N-1}\]

See also

VARIANCE, VAR_POP

Example

SELECT <col_1>
    , VAR_SAMP(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>