Aggregating data¶

Aggregating data using SQL is done using GROUP BY function.

See also

GROUP BY function¶

The GROUP BY function allows specifying single column or multiple columns as level of aggregation.

SELECT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
    , <col_2>

Additional predicates and clauses can be added, too

SELECT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
WHERE <col_2> <= 0
GROUP BY <col_1>
    , <col_2>
ORDER BY <col_1> ASC
    , <col_2> DESC

Since GROUP BY by definition returns a list of unique values found in the columns used to specify the aggregation level there is no need to add the DISTINCT qualifier. However, the following syntax is also supported.

SELECT DISTINCT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
    , <col_2>

HAVING qualifier¶

The difference between the WHERE statement and the HAVING clause is the order of operations:

with the WHERE predicate, the table rows get filtered first and then the aggregating function is applied.
the HAVING clause applies the filter to already aggregated results that is any of the aggregating functions

SELECT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>
    , <col_2>
HAVING COUNT(*) > 100

Both, the WHERE and the HAVING statements can appear in the same query.

SELECT <col_1>
    , <col_2>
    , COUNT(*) AS cnt
FROM <table_name>
WHERE <col_2> <= 0
GROUP BY <col_1>
    , <col_2>
HAVING COUNT(*) > 100

Aggregating functions¶

Below is a full list of currently supported aggregation functions by BlazingSQL.

AVG¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate average of a numeric column.

See also

COUNT, SUM

Example¶

SELECT <col_1>
    , AVG(<col_2>) AS average
FROM <table_name>
GROUP BY <col_1>

COUNT¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE TIMESTAMP, VARCHAR, BOOLEAN

Count the number of rows in each specified group.

See also

AVG, SUM

Example¶

The * syntax counts all the rows in the table irrespective if NULL values.

SELECT <col_1>
    , COUNT(*) AS cnt
FROM <table_name>
GROUP BY <col_1>

Specifying a column to count over will exclude NULL from the final count.

SELECT <col_1>
    , COUNT(<col_with_nulls>) AS cnt_no_nulls
FROM <table_name>
GROUP BY <col_1>

Adding the DISTINCT qualifier will count all the unique values in the specified column in each aggregation group.

SELECT <col_1>
    , COUNT(DISTINCT <col_2>) AS cnt_unqiue
FROM <table_name>
GROUP BY <col_1>

MAX¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate a maximum value in the specified column.

See also

MIN

Example¶

SELECT <col_1>
    , MAX(<col_2>) AS maximum
FROM <table_name>
GROUP BY <col_1>

MIN¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate a minimum value in the specified column.

See also

MAX

Example¶

SELECT <col_1>
    , MIN(<col_2>) AS minimum
FROM <table_name>
GROUP BY <col_1>

STDDEV¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate standard deviation of a numeric column according to the formula

\[\sqrt{\frac{\sum{(x-\mu)^2}}{N-1}}\]

See also

STDDEV_POP, STDDEV_SAMP

Example¶

SELECT <col_1>
    , STDDEV(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>

STDDEV_POP¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate the population standard deviation of a numeric column according to the formula

\[\sqrt{\frac{\sum{(x-\mu)^2}}{N}}\]

See also

STDDEV, STDDEV_SAMP

Example¶

SELECT <col_1>
    , STDDEV_POP(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>

STDDEV_SAMP¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate the sample standard deviation of a numeric column according to the formula

\[\sqrt{\frac{\sum{(x-\mu)^2}}{N-1}}\]

See also

STDDEV, STDDEV_POP

Example¶

SELECT <col_1>
    , STDDEV_SAMP(<col_2>) AS standard_deviation
FROM <table_name>
GROUP BY <col_1>

SUM¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate a sum of all non-NULL values in the specified column.

See also

AVG, COUNT

Example¶

SELECT <col_1>
    , SUM(<col_2>) AS sum_of_col_2
FROM <table_name>
GROUP BY <col_1>

VARIANCE¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate variance of a numeric column according to the formula

\[\frac{\sum{(x-\mu)^2}}{N-1}\]

See also

VAR_POP, VAR_SAMP

Example¶

SELECT <col_1>
    , VARIANCE(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>

VAR_POP¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate the population variance of a numeric column according to the formula

\[\frac{\sum{(x-\mu)^2}}{N}\]

See also

VARIANCE, VAR_SAMP

Example¶

SELECT <col_1>
    , VAR_POP(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>

VAR_SAMP¶

Supported datatypes: TINYINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT, DOUBLE

Calculate the sample variance of a numeric column according to the formula

\[\frac{\sum{(x-\mu)^2}}{N-1}\]

See also

VARIANCE, VAR_POP

Example¶

SELECT <col_1>
    , VAR_SAMP(<col_2>) AS var
FROM <table_name>
GROUP BY <col_1>

Converting data types Functions and operators

BlazingSQL v0.19 documentation

Aggregating data¶

GROUP BY function¶

HAVING qualifier¶

Aggregating functions¶

AVG¶

Example¶

COUNT¶

Example¶

MAX¶

Example¶

MIN¶

Example¶

STDDEV¶

Example¶

STDDEV_POP¶

Example¶

STDDEV_SAMP¶

Example¶

SUM¶

Example¶

VARIANCE¶

Example¶

VAR_POP¶

Example¶

VAR_SAMP¶

Example¶