Search This Blog

25 October 2010

Implementing Aggregate Queries

AGGREGATE FUNCTIONS
Aggregate functions perform calculations on a set of data and return a scalar value. Following aggregate functions are available in SQL Server 2008,

AVG : Returns the average of the values in a data set
CHECKSUM_AGG : Returns the checksum of the values in a data set
COUNT : Returns the number of the values in a data set COUNT(*) returns the number of rows in a set. COUNT(Column Name) Returns number of rows which contain data in specified column. COUNT(DISTINCT ) returns number of unique not null values in data set
COUNT_BIG : it work same as "COUNT" but it returns Bigint data type, while "COUNT" return Int data type.
GROUPING : It returns 0 or 1, the value 1 indicate aggregate while 0 indicate detail result
MAX : Returns the highest value in a data set
MIN : Returns the lowest value in a data set
SUM : Returns the total of the values in the data set
STDEV : Returns the statistical standard deviation of all values in given data set
STDEVP : Returns the statistical standard deviation for the population for all values in data set
VAR : Returns the statistical variance of all values in the specified expression
VARP : Returns the statistical variance for the population for all values in the specified expression

GROUP BY CLAUSE
The "Group By" clause is used in the query with Aggregate functions. When an Aggregate function is used in Select, all other expression in Select must be either Aggregate function or included in Group By clause. The "Group By" allow to find out subtotal of the aggregate data. For example, I have a table like given below

Table A : MonthlySalary
Now look at the following query, it will return Average, Minimum and Maximum salary of each employee

SELECT Employee, AVG(Salary) Average,
MIN(Salary) Minimum, MAX(Salary)Maximum
FROM MonthlySalary
GROUP BY Employee

WITH ROLLUP OPERATOR
The ROLLUP operator is useful in generating quick result set that contain subtotals and total. This operators provides grand total and subtotal based on the column included in GROUP BY clause. On "MonthlySalary" table execute the given query that uses WITH ROLLUP Operator and check the output,

SELECT Employee, Month, SUM(Salary) AS Salary
FROM MonthlySalary
GROUP BY Employee, Month WITH ROLLUP
OR
SELECT Employee, Month, SUM(Salary) as Salary
FROM MonthlySalary
GROUP BY ROLLUP(Employee,Month)


The result set contain subtotal of the salary for all the employees, and the grand total.

WITH CUBE OPERATOR
The CUBE operator should be used when the summary information needed for more then one column. It generates a result set that contain subtotal for all the column specified in GROUP BY clause. Look at the query and the result given below

SELECT Employee, Month, SUM(Salary) AS Salary
FROM MonthlySalary
GROUP BY Employee, Month WITH CUBE
OR
SELECT Employee, Month, SUM(Salary) as Salary
FROM MonthlySalary
GROUP BY CUBE(Employee,Month)


Don't miss the row number 3, 7 and 11 which shows subtotal by month as well, when using WITH CUBE operator. While using WITH ROLLUP operator this rows will not be displayed.

Things to remember while using WITH ROLLUP and WITH CUBE operator
  • CHECKSUM_AGG aggregate function is not compatible with ROLLUP, CUBE and GROUPING SETS
  • Avoid using WITH CUBE operator on large table where their are more then 3 columns in GROUP BY clause, for batter performance.

ROLLUP() AND CUBE()
ROLLUP() and CUBE() are equivalent to WITH ROLLUP and WITH CUBE and use the same query plans. But note that the new ROLLUP and CUBE syntax is only available in compatibility level 100.

GROUPING AGGREGATE FUNCTION
How can a NULL generated by the CUBE operation be distinguished from a NULL returned in the actual data? This can be done by using the GROUPING function. The GROUPING function returns 0 if the column value NULL came from the fact data, and 1 if the column value NULL is generated by the CUBE operation. It will be useful when we have to further use the result set generated, as it is tough to find out which data is summary data and for which column, so using GROUPING function we can easily find it out. Look at the query and the result given below

SELECT Employee, Month, SUM(Salary) AS Salary,
GROUPING(Employee)ForEmployee, GROUPING(Month)ForMonth
FROM MonthlySalary
GROUP BY Employee, Month WITH Cube


GROUPING SETS
It allows us to specify which aggregations do we want in our result set.
For Example:
1 - Following query will give us same output as we get with ROLLUP() or WITH ROLLUP operator

SELECT Employee, Month, SUM(Salary) as Salary
FROM MonthlySalary
GROUP BY GROUPING SETS((Employee,Month), (Employee), 0)


This query aggregates the Salary by Employee and Month, then it aggregate by Employee only, at last it compute the total for all Employees for all Months. The () syntax with no GROUP BY columns display the total.

2 - Following query will give us the same output as we get with CUBE() or WITH CUBE operator

SELECT Employee, Month, SUM(Salary) as Salary
FROM MonthlySalary
GROUP BY GROUPING SETS((Employee,Month), (Employee), (Month), 0)

This query aggregates the Salary by Employee and Month, then it aggregate by Employee only, then it aggregate by Month only, at last it compute the total for all Employees for all Months.

2 comments: