
25 October 2010

Implementing Aggregate Queries

AGGREGATE FUNCTIONS
Aggregate functions perform a calculation on a set of values and return a single scalar value. The following aggregate functions are available in SQL Server 2008:

AVG : Returns the average of the values in a data set
CHECKSUM_AGG : Returns the checksum of the values in a data set
COUNT : Returns the number of values in a data set. COUNT(*) returns the number of rows in the set; COUNT(column_name) returns the number of rows that contain a non-NULL value in the specified column; COUNT(DISTINCT column_name) returns the number of unique non-NULL values in the data set
COUNT_BIG : Works the same as COUNT, but returns the bigint data type, while COUNT returns int
GROUPING : Returns 0 or 1; the value 1 indicates an aggregate (summary) row, while 0 indicates a detail row
MAX : Returns the highest value in a data set
MIN : Returns the lowest value in a data set
SUM : Returns the total of the values in the data set
STDEV : Returns the statistical standard deviation of all values in given data set
STDEVP : Returns the statistical standard deviation for the population for all values in data set
VAR : Returns the statistical variance of all values in the specified expression
VARP : Returns the statistical variance for the population for all values in the specified expression
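The COUNT variants can be illustrated with a short query; this is a sketch that assumes the MonthlySalary table used in the next section (columns Employee, Month, Salary):

```sql
-- COUNT variants (sketch; assumes the MonthlySalary table used below)
SELECT COUNT(*)                 AS TotalRows,       -- every row in the set
       COUNT(Salary)            AS NonNullSalaries, -- rows where Salary is not NULL
       COUNT(DISTINCT Employee) AS UniqueEmployees  -- distinct non-NULL employee values
FROM MonthlySalary
```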

GROUP BY CLAUSE
The GROUP BY clause is used in queries together with aggregate functions. When an aggregate function is used in the SELECT list, every other expression in the SELECT list must either be an aggregate function or be included in the GROUP BY clause. GROUP BY allows us to find subtotals of the aggregated data. For example, I have a table like the one given below.

Table A : MonthlySalary
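A sketch of what the MonthlySalary structure likely looks like, inferred from the columns used in the queries in this post (an assumption, since the original figure is not shown):

```sql
-- Assumed structure of MonthlySalary (inferred from the queries in this post)
CREATE TABLE MonthlySalary
(
    Employee varchar(50),   -- employee name
    Month    varchar(20),   -- salary month
    Salary   money          -- salary paid for that month
)
```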
Now look at the following query; it returns the average, minimum, and maximum salary of each employee:

SELECT Employee, AVG(Salary) AS Average,
       MIN(Salary) AS Minimum, MAX(Salary) AS Maximum
FROM MonthlySalary
GROUP BY Employee

WITH ROLLUP OPERATOR
The ROLLUP operator is useful for generating a quick result set that contains subtotals and a grand total. This operator provides a grand total and subtotals based on the columns included in the GROUP BY clause. On the MonthlySalary table, execute the given query that uses the WITH ROLLUP operator and check the output:

SELECT Employee, Month, SUM(Salary) AS Salary
FROM MonthlySalary
GROUP BY Employee, Month WITH ROLLUP
OR
SELECT Employee, Month, SUM(Salary) AS Salary
FROM MonthlySalary
GROUP BY ROLLUP(Employee, Month)


The result set contains a subtotal of the salary for each employee, and the grand total.

WITH CUBE OPERATOR
The CUBE operator should be used when summary information is needed for more than one column. It generates a result set that contains subtotals for all the columns specified in the GROUP BY clause. Look at the query and the result given below:

SELECT Employee, Month, SUM(Salary) AS Salary
FROM MonthlySalary
GROUP BY Employee, Month WITH CUBE
OR
SELECT Employee, Month, SUM(Salary) AS Salary
FROM MonthlySalary
GROUP BY CUBE(Employee, Month)


Don't miss rows 3, 7, and 11 in the result, which show subtotals by Month as well when the WITH CUBE operator is used. With the WITH ROLLUP operator these rows are not displayed.

Things to remember while using WITH ROLLUP and WITH CUBE operator
  • CHECKSUM_AGG aggregate function is not compatible with ROLLUP, CUBE and GROUPING SETS
  • Avoid using the WITH CUBE operator on large tables where there are more than 3 columns in the GROUP BY clause, for better performance.

ROLLUP() AND CUBE()
ROLLUP() and CUBE() are equivalent to WITH ROLLUP and WITH CUBE and use the same query plans. Note, however, that the newer ROLLUP() and CUBE() syntax is only available under compatibility level 100.

GROUPING AGGREGATE FUNCTION
How can a NULL generated by the CUBE operation be distinguished from a NULL returned in the actual data? This can be done by using the GROUPING function. The GROUPING function returns 0 if the NULL in a column came from the underlying data, and 1 if the NULL was generated by the CUBE operation. This is useful when we have to process the generated result set further, since it is otherwise hard to tell which rows are summary data and for which column; using the GROUPING function we can easily find out. Look at the query and the result given below:

SELECT Employee, Month, SUM(Salary) AS Salary,
       GROUPING(Employee) AS ForEmployee, GROUPING(Month) AS ForMonth
FROM MonthlySalary
GROUP BY Employee, Month WITH CUBE


GROUPING SETS
It allows us to specify exactly which aggregations we want in our result set.
For Example:
1 - The following query gives the same output as ROLLUP() or the WITH ROLLUP operator:

SELECT Employee, Month, SUM(Salary) AS Salary
FROM MonthlySalary
GROUP BY GROUPING SETS((Employee, Month), (Employee), ())


This query aggregates the Salary by Employee and Month, then by Employee only, and at last computes the total for all employees across all months. The empty () grouping set, with no GROUP BY columns, produces the grand total.

2 - The following query gives the same output as CUBE() or the WITH CUBE operator:

SELECT Employee, Month, SUM(Salary) AS Salary
FROM MonthlySalary
GROUP BY GROUPING SETS((Employee, Month), (Employee), (Month), ())

This query aggregates the Salary by Employee and Month, then by Employee only, then by Month only, and at last computes the total for all employees across all months.

24 October 2010

SQL Server Database Recovery Models in Brief

SQL Server database recovery models give you backup and restore flexibility. The model you use determines how great your risk of data loss will be when a failure occurs. There are three database recovery models available:
  1. Full Recovery
  2. Bulk_Logged Recovery
  3. Simple Recovery
Full Recovery Model :
In the Full recovery model, the database engine logs all operations to the transaction log, and the database engine never truncates the log. The Full recovery model lets you restore a database to the point of failure.
Simple Recovery Model :
In the Simple recovery model, the database engine minimally logs most operations and truncates the transaction log after each checkpoint. In the Simple recovery model, you cannot back up or restore the transaction log. Furthermore, you cannot restore individual data pages.
Bulk-Logged Recovery Model:
In the Bulk-Logged recovery model, the database engine minimally logs bulk operations such as SELECT INTO and BULK INSERT. In this recovery model, if a log backup contains any bulk operation, you can restore the database to the end of the log backup, but not to a point in time. The Bulk-Logged recovery model is intended to be used only during large bulk operations.

Important: The Simple recovery model is not appropriate for databases in which the loss of recent changes is unacceptable.
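For reference, the recovery model can be inspected and changed with T-SQL; a sketch, with AdventureWorks used as a placeholder database name:

```sql
-- Check the current recovery model of a database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'AdventureWorks';

-- Switch to the Full recovery model (other options: SIMPLE, BULK_LOGGED)
ALTER DATABASE AdventureWorks SET RECOVERY FULL;
```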

21 October 2010

What Deterministic and Nondeterministic Functions are?

Deterministic Function: A function that always returns the same result any time it is called with a specific set of input values. For example, SQUARE(2) will always return 4.


Nondeterministic Function: A function that may return different results each time it is called with the same set of input values. For example, the GETDATE() function returns one result today, but the same call will return a different result tomorrow.
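Both cases can be seen directly in a query window:

```sql
-- Deterministic: the same input always yields the same output
SELECT SQUARE(2) AS AlwaysFour;       -- always returns 4

-- Nondeterministic: the output depends on when the call is made
SELECT GETDATE() AS CurrentDateTime;  -- different on every execution
```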

20 October 2010

What is a FILESTREAM in SQL Server 2008?

FILESTREAM is a new feature in SQL Server 2008. It allows structured data to be stored in the database and associated unstructured data to be stored directly in the NTFS file system. Transactional consistency is maintained at all times between the structured and unstructured data. FILESTREAM combines the SQL Server Database Engine with the NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system.
It is recommended to use FILESTREAM when any of the following conditions are met:
  • When the objects that are being stored in the FILESTREAM have a size greater than 1 MB.
  • When fast read access is an essential requirement.
  • When an application is developed that uses a middle tier for application logic.
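As a sketch of how a FILESTREAM column is declared (this assumes FILESTREAM has been enabled on the instance and a FILESTREAM filegroup added to the database; the table and column names here are placeholders):

```sql
-- A FILESTREAM table needs a uniqueidentifier ROWGUIDCOL column
-- with a unique constraint
CREATE TABLE Documents
(
    DocId   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    DocName varchar(100),
    DocData varbinary(max) FILESTREAM NULL  -- BLOB stored on the NTFS file system
)
```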

10 October 2010

Database Normalization with Example

In this article, we'll introduce the concept of normalization. Since it is very difficult to visualize these concepts using words only, I will try to explore the first 3 normal forms with an example.

What is normalization?
Normalization is the process of efficiently organizing data to minimize redundancy in a database. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables.

What are different normalization forms?
1NF - No repeating elements or groups of elements
2NF - No partial dependencies on a concatenated key
3NF - No dependencies on non-key attributes
BCNF - Boyce-Codd Normal Form
4NF - Isolate Independent Multiple Relationships
5NF - Isolate Semantically Related Multiple Relationships
ONF - Optimal Normal Form
DKNF - Domain-Key Normal Form

Understanding first 3 normal form with example
For a database to be in 3NF, it must first satisfy all the criteria of 2NF and 1NF. To understand 1NF, 2NF, and 3NF, we will take the example table structure and data given below and apply the first 3 normal forms to it.
Our table structure is as given below, with the repeating groups of elements highlighted. Note that here the primary key is "ordid".


The data in the table will look like this:



First Normal Form: No Repeating Elements or Groups of Elements
Now after applying 1NF, our table structure will be as given below.

And the data in the table will be like

As you can see, there is no single column that uniquely identifies each row. However, if we put two columns together, we can satisfy this requirement. The two columns that together uniquely identify each row are ordid and itemid; no two rows have the same combination of ordid and itemid. Therefore, together they qualify to be used as the table's primary key.
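Since the original figure is not shown, here is a sketch of the post-1NF structure, assuming the column names used throughout the article:

```sql
-- Post-1NF (sketch): one row per order line,
-- concatenated primary key (ordid, itemid)
CREATE TABLE Orders
(
    ordid    int NOT NULL,      -- order number
    orddt    datetime,          -- order date
    custid   int,               -- customer ID
    custname varchar(50),       -- customer name
    custaddr varchar(100),      -- customer address
    itemid   int NOT NULL,      -- inventory item ID
    itemname varchar(50),       -- item name
    qty      int,               -- quantity ordered
    price    money,             -- item price
    PRIMARY KEY (ordid, itemid)
)
```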

Second Normal Form: No Partial Dependencies on a Concatenated Key
Next we test each table for partial dependencies on a concatenated key. This means that for a table with a concatenated primary key, each column that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends upon only one part of the concatenated key, then the table fails Second Normal Form and we must create another table to rectify the failure.

Now we will analyze the columns other than the concatenated key, one by one:
  • orddt is the date on which the order was made. Obviously it relies on ordid; an order date has to have an order, otherwise it is only a date. But can an order date exist without an itemid? The answer is yes: the date belongs to the order as a whole, not to any particular item. Therefore orddt depends on only part of the concatenated key, and fails Second Normal Form.
  • custid is the ID number of the customer who placed the order. Does it rely on ordid? No: a customer can exist without placing any orders. Does it rely on itemid? No: for the same reason. This is interesting: custid (along with the rest of the customer columns) does not rely on either member of the primary key. What do we do with these columns? We don't have to worry about them until we get to Third Normal Form.
  • itemname is the next column that is not itself part of the primary key. This is the name of the inventory item. Obviously it relies on itemid. But it can exist without an ordid, therefore itemname fails the test.
  • qty refers to the number of items purchased on a particular invoice. Can this quantity exist without an itemid? The answer is no. Can the quantity exist without an ordid? Again the answer is no. So this column does not violate Second Normal Form; qty depends on both parts of our concatenated primary key.
  • price is similar to itemname. It depends on the itemid but not on the ordid, so it does violate Second Normal Form.
What should we do with a table that fails Second Normal Form? First we take out the second half of the concatenated primary key (itemid) and put it in its own table.
All the columns that depend on itemid - whether in whole or in part - follow it into the new table. We call this new table OrderItems. Now our table structure will look like the one given below.

There are several things to notice:
  • We have brought a copy of the ordid column over into the OrderItems table. This allows each OrderItem to "remember" which order it is a part of.
  • Now the order table has fewer rows than it did before.
  • The order table no longer has a concatenated primary key. The primary key now consists of a single column, ordid .
  • The OrderItems table does have a concatenated primary key.
Remember, NF2 only applies to tables with a concatenated primary key. Now that Orders has a single-column primary key, it has passed Second Normal Form. OrderItems, however, still has a concatenated primary key, so we have to pass it through the NF2 analysis again. We ask the same questions we did before.
Now consider the columns that are not part of the primary key...
  • itemname relies on itemid , but not ordid . So this column once again fails NF2.
  • qty relies on both members of the primary key. It does not violate NF2.
  • price relies on the itemid but not on the ordid , so it does violate Second Normal Form.
So now after applying NF2 on OrderItems table, the table structure should look like
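A sketch of the resulting post-2NF structure, again assuming the column names used in the text:

```sql
-- Post-2NF (sketch)
CREATE TABLE Orders
(
    ordid    int NOT NULL PRIMARY KEY,
    orddt    datetime,
    custid   int,
    custname varchar(50),
    custaddr varchar(100)
);

CREATE TABLE OrderItems
(
    ordid  int NOT NULL,  -- which order this line belongs to
    itemid int NOT NULL,  -- which item was ordered
    qty    int,           -- depends on both parts of the key
    PRIMARY KEY (ordid, itemid)
);

CREATE TABLE Items
(
    itemid   int NOT NULL PRIMARY KEY,
    itemname varchar(50),  -- depends on itemid only
    price    money         -- depends on itemid only
)
```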

So now our table structure fulfills NF2.

Third Normal Form: No Dependencies on Non-Key Attributes
At last, we return to the problem of the repeating Customer information. Right now if a customer places more than one order then we have to input all of that customer's contact information again. This is because there are columns in the order table that rely on "non-key attributes".

To better understand this concept, consider the orddt column. Can it exist without the ordid column? No, so orddt is said to depend on a key attribute, ordid.

What about custname; can it exist on its own? Yes, it can, without referring to an order. The same goes for custaddr. These columns actually rely on custid, which is not a key in this table.

So now after applying NF3 our table structure will look like,
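A sketch of the final post-3NF structure (the customer columns move to their own table; OrderItems and Items are unchanged from 2NF):

```sql
-- Post-3NF (sketch)
CREATE TABLE Customers
(
    custid   int NOT NULL PRIMARY KEY,
    custname varchar(50),
    custaddr varchar(100)
);

CREATE TABLE Orders
(
    ordid  int NOT NULL PRIMARY KEY,
    orddt  datetime,
    custid int  -- references Customers(custid)
)
```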

So, we are done with NF3. Is this information useful to you? Let me know if you have any doubts...

06 October 2010

Important date queries in SQL Server 2005

Query to fetch current date in '2010-02-23 00:00:00.000' format
SELECT DATEADD(d, DATEDIFF(d, 0, GETDATE())+0, 0)
OR
SELECT DATEADD(d, 0, DATEDIFF(d, 0, GETDATE()) )

Query to fetch current date in '2010-02-23 23:59:59.000' format (the last second of the day)
SELECT DATEADD(ss, 86399, DATEADD(d, DATEDIFF(d, 0, GETDATE()), 0))
OR
SELECT DATEADD(s, -1, DATEADD(d, DATEDIFF(d, 0, GETDATE()) + 1, 0))

A few days back I had a requirement to find the start date of the week, considering Monday as the first day of the week. I searched the internet for an available solution and found the following query:
SELECT DATEADD(wk, DATEDIFF(wk, 0, GETDATE()), 0)
Output is : 2010-10-04 00:00:00.000

Soon I noticed a problem with the above query: if, instead of using GETDATE(), I pass any date that falls on a Sunday, e.g. '2010-10-03', it should return '2010-09-27', but it returns '2010-10-04'. With the following query, you will notice that the output does not meet our requirement:
SELECT DATEADD(wk, DATEDIFF(wk, 0, '2010-10-03'), 0)
Output is : 2010-10-04 00:00:00.000

So I used the following query, which solves the above problem (note that DATEPART(dw, ...) depends on the DATEFIRST setting; this assumes the default, SET DATEFIRST 7):
SELECT (CAST('2010-10-03' AS DATETIME) - DATEPART(dw, CAST('2010-10-03' AS DATETIME) - 1)) + 1
Output is : 2010-09-27 00:00:00.000