SQL Reference

group-by-clause

             .-,--------------------------.
             V                            |
>>-GROUP BY------+-grouping-expression-+--+--------------------><
                 +-grouping-sets-------+
                 '-super-groups--------'

The GROUP BY clause specifies an intermediate result table that consists of a grouping of the rows of R. R is the result of the previous clause of the subselect.

In its simplest form, a GROUP BY clause contains a grouping expression. A grouping expression is an expression used in defining the grouping of R. Each column name included in grouping-expression must unambiguously identify a column of R (SQLSTATE 42702 or 42703). The length attribute of each grouping expression must not be more than 255 bytes (SQLSTATE 42907). A grouping expression cannot include a scalar-fullselect (SQLSTATE 42822) or any function that is variant or has an external action (SQLSTATE 42845).

More complex forms of the GROUP BY clause include grouping-sets and super-groups. For a description of these forms, see grouping-sets and super-groups, respectively.

The result of GROUP BY is a set of groups of rows. Each row in this result represents the set of rows for which the grouping-expression is equal. For grouping, all null values from a grouping-expression are considered equal.

A grouping-expression can be used in a search condition in a HAVING clause, in an expression in a SELECT clause or in a sort-key-expression of an ORDER BY clause (see order-by-clause for details). In each case, the reference specifies only one value for each group. For example, if the grouping-expression is col1+col2, then an allowed expression in the select list would be col1+col2+3. Associativity rules for expressions would disallow the similar expression, 3+col1+col2, unless parentheses are used to ensure that the corresponding expression is evaluated in the same order. Thus, 3+(col1+col2) would also be allowed in the select list. If the concatenation operator is used, the grouping-expression must be used exactly as the expression was specified in the select list.

If the grouping-expression contains varying-length strings with trailing blanks, the values in the group can differ in the number of trailing blanks and may not all have the same length. In that case, a reference to the grouping-expression still specifies only one value for each group, but the value for a group is chosen arbitrarily from the available set of values. Thus, the actual length of the result value is unpredictable.

As noted, there are some cases where the GROUP BY clause cannot refer directly to a column that is specified in the SELECT clause as an expression (scalar-fullselect, variant or external action functions). To group using such an expression, use a nested table expression or a common table expression to first provide a result table with the expression as a column of the result. For an example using nested table expressions, see Example A9.

grouping-sets

                     .-,---------------------------------------------.
                     V                                               |
>>-GROUPING SETS--(------+--+-grouping-expression-+---------------+--+---)-->
                         |  '-super-groups--------'               |
                         |    .-,--------------------------.      |
                         |    V                            |      |
                         '-(------+-grouping-expression-+--+---)--'
                                  '-super-groups--------'
 
>--------------------------------------------------------------><

A grouping-sets specification allows multiple grouping clauses to be specified in a single statement. This can be thought of as the union of two or more groups of rows into a single result set. It is logically equivalent to the union of multiple subselects with the group by clause in each subselect corresponding to one grouping set. A grouping set can be a single element or can be a list of elements delimited by parentheses, where an element is either a grouping-expression or a super-group. Using grouping-sets allows the groups to be computed with a single pass over the base table.

The grouping-sets specification allows either a simple grouping-expression to be used, or the more complex forms of super-groups. For a description of super-groups, see super-groups.

Note that grouping sets are the fundamental building block for GROUP BY operations. A simple group by with a single column can be considered a grouping set with one element. For example:

  GROUP BY a

is the same as

  GROUP BY GROUPING SET((a))

and

  GROUP BY a,b,c

is the same as

  GROUP BY GROUPING SET((a,b,c))

Non-aggregation columns from the select list of the subselect that are excluded from a grouping set will return a null for such columns for each row generated for that grouping set. This reflects the fact that aggregation was done without considering the values for those columns. See GROUPING for how to distinguish rows with nulls in actual data from rows with nulls generated from grouping sets.

Example C2 through Example C7 illustrate the use of grouping sets.

super-groups

                                            (1)
>>-+-ROLLUP--(--grouping-expression-list--)-------+------------><
   |                                      (2)     |
   +-CUBE--(--grouping-expression-list--)---------+
   '-| grand-total |------------------------------'
 
grouping-expression-list
 
    .-,-----------------------------------------.
    V                                           |
|-------+-grouping-expression----------------+--+---------------|
        |    .-,----------------------.      |
        |    V                        |      |
        '-(-----grouping-expression---+---)--'
 
grand-total
 
|---(--)--------------------------------------------------------|

Notes:

Alternate specification when used alone in group-by-clause is: grouping-expression-list WITH ROLLUP.
Alternate specification when used alone in group-by-clause is: grouping-expression-list WITH CUBE.

ROLLUP ( grouping-expression-list )

A ROLLUP grouping is an extension to the GROUP BY clause that produces a result set that contains sub-total rows in addition to the "regular" grouped rows. Sub-total rows ⁵³ are "super-aggregate" rows that contain further aggregates whose values are derived by applying the same column functions that were used to obtain the grouped rows.

A ROLLUP grouping is a series of grouping-sets. The general specification of a ROLLUP with n elements

  GROUP BY ROLLUP(C₁,C₂,...,C_n-1,C_n)

is equivalent to

  GROUP BY GROUPING SETS((C₁,C₂,...,C_n-1,C_n)
                         (C₁,C₂,...,C_n-1)
                         ...
                         (C₁,C₂)
                         (C₁)
                         () )

Notice that the n elements of the ROLLUP translate to n+1 grouping sets.

Note that the order in which the grouping-expressions is specified is significant for ROLLUP. For example:

  GROUP BY ROLLUP(a,b)

is equivalent to

  GROUP BY GROUPING SETS((a,b)
                         (a)
                         () )

while

  GROUP BY ROLLUP(b,a)

is the same as

  GROUP BY GROUPING SETS((b,a)
                         (b)
                         () )

The ORDER BY clause is the only way to guarantee the order of the rows in the result set. Example C3 illustrates the use of ROLLUP.

CUBE ( grouping-expression-list )

A CUBE grouping is an extension to the GROUP BY clause that produces a result set that contains all the rows of a ROLLUP aggregation and, in addition, contains "cross-tabulation" rows. Cross-tabulation rows are additional "super-aggregate" rows that are not part of an aggregation with sub-totals.

Like a ROLLUP, a CUBE grouping can also be thought of as a series of grouping-sets. In the case of a CUBE, all permutations of the cubed grouping-expression-list are computed along with the grand total. Therefore, the n elements of a CUBE translate to 2**n (2 to the power n) grouping-sets. For instance, a specification of

  GROUP BY CUBE(a,b,c)

is equivalent to

  GROUP BY GROUPING SETS((a,b,c)
                         (a,b)
                         (a,c)
                         (b,c)
                         (a)
                         (b)
                         (c)
                         () )

Notice that the 3 elements of the CUBE translate to 8 grouping sets.

The order of specification of elements does not matter for CUBE. 'CUBE (DayOfYear, Sales_Person)' and 'CUBE (Sales_Person, DayOfYear)' yield the same result sets. The use of the word 'same' applies to content of the result set, not to its order. The ORDER BY clause is the only way to guarantee the order of the rows in the result set. Example C4 illustrates the use of CUBE.

grouping-expression-list

A grouping-expression-list is used within a CUBE or ROLLUP clause to define the number of elements in the CUBE or ROLLUP operation. This is controlled by using parentheses to delimit elements with multiple grouping-expressions.

The rules for a grouping-expression are described in group-by-clause. For example, suppose that a query is to return the total expenses for the ROLLUP of City within a Province but not within a County. However the clause:

  GROUP BY ROLLUP(Province, County, City)

results in unwanted sub-total rows for the County. In the clause

  GROUP BY ROLLUP(Province, (County, City))

the composite (County, City) forms one element in the ROLLUP and, therefore, a query that uses this clause will yield the desired result. In other words, the two element ROLLUP

  GROUP BY ROLLUP(Province, (County, City))

generates

  GROUP BY GROUPING SETS((Province, County, City)
                         (Province)
                         () )

while the 3 element ROLLUP would generate

  GROUP BY GROUPING SETS((Province, County, City)
                         (Province, County)
                         (Province)
                         () )

Example C2 also utilizes composite column values.

grand-total: Both CUBE and ROLLUP return a row which is the overall (grand total) aggregation. This may be separately specified with empty parentheses within the GROUPING SET clause. It may also be specified directly in the GROUP BY clause, although there is no effect on the result of the query. Example C4 uses the grand-total syntax.

Combining Grouping Sets

This can be used to combine any of the types of GROUP BY clauses. When simple grouping-expression fields are combined with other groups, they are "appended" to the beginning of the resulting grouping sets. When ROLLUP or CUBE expressions are combined, they operate like "multipliers" on the remaining expression, forming additional grouping set entries according to the definition of either ROLLUP or CUBE.

For instance, combining grouping-expression elements acts as follows:

  GROUP BY a, ROLLUP(b,c)

is equivalent to

  GROUP BY GROUPING SETS((a,b,c)
                         (a,b)
                         (a) )

Or similarly,

  GROUP BY a, b, ROLLUP(c,d)

is equivalent to

  GROUP BY GROUPING SETS((a,b,c,d)
                         (a,b,c)
                         (a,b) )

Combining of ROLLUP elements acts as follows:

  GROUP BY ROLLUP(a), ROLLUP(b,c)

is equivalent to

  GROUP BY GROUPING SETS((a,b,c)
                         (a,b)
                         (a)
                         (b,c)
                         (b)
                         () )

Similarly,

  GROUP BY ROLLUP(a), CUBE(b,c)

is equivalent to

  GROUP BY GROUPING SETS((a,b,c)
                         (a,b)
                         (a,c)
                         (a)
                         (b,c)
                         (b)
                         (c)
                         () )

Combining of CUBE and ROLLUP elements acts as follows:

  GROUP BY CUBE(a,b), ROLLUP(c,d)

is equivalent to

  GROUP BY GROUPING SETS((a,b,c,d)
                         (a,b,c)
                         (a,b)
                         (a,c,d)
                         (a,c)
                         (a)
                         (b,c,d)
                         (b,c)
                         (b)
                         (c,d)
                         (c)
                         () )

Like a simple grouping-expression, combining grouping sets also eliminates duplicates within each grouping set. For instance,

  GROUP BY a, ROLLUP(a,b)

is equivalent to

  GROUP BY GROUPING SETS((a,b)
                         (a) )

A more complete example of combining grouping sets is to construct a result set that eliminates certain rows that would be returned for a full CUBE aggregation.

For example, consider the following GROUP BY clause:

  GROUP BY Region, 
           ROLLUP(Sales_Person, WEEK(Sales_Date)), 
           CUBE(YEAR(Sales_Date), MONTH (Sales_Date))

The column listed immediately to the right of GROUP BY is simply grouped, those within the parenthesis following ROLLUP are rolled up, and those within the parenthesis following CUBE are cubed. Thus, the above clause results in a cube of MONTH within YEAR which is then rolled up within WEEK within Sales_Person within the Region aggregation. It does not result in any grand total row or any cross-tabulation rows on Region, Sales_Person or WEEK(Sales_Date) so produces fewer rows than the clause:

  GROUP BY ROLLUP (Region, Sales_Person, WEEK(Sales_Date), 
                   YEAR(Sales_Date), MONTH(Sales_Date) )

having-clause

>>-HAVING--search-condition------------------------------------><

The HAVING clause specifies an intermediate result table that consists of those groups of R for which the search-condition is true. R is the result of the previous clause of the subselect. If this clause is not GROUP BY, R is considered a single group with no grouping columns.

Each column-name in the search condition must do one of the following:

Unambiguously identify a grouping column of R.
Be specified within a column function.
Be a correlated reference. A column-name is a correlated reference if it identifies a column of a table-reference in an outer subselect.

A group of R to which the search condition is applied supplies the argument for each column function in the search condition, except for any function whose argument is a correlated reference.

If the search condition contains a subquery, the subquery can be thought of as being executed each time the search condition is applied to a group of R, and the results used in applying the search condition. In actuality, the subquery is executed for each group only if it contains a correlated reference. For an illustration of the difference, see Example A6 and Example A7.

A correlated reference to a group of R must either identify a grouping column or be contained within a column function.

When HAVING is used without GROUP BY, the select list can only be a column name within a column function, a correlated column reference, a literal, or a special register.

Footnotes:

⁵³: These are called sub-total rows, because that is their most common use, however any column function can be used for the aggregation. For instance, MAX and AVG are used in Example C8.

[ Top of Page | Previous Page | Next Page ]