Search This Blog

10 October 2010

Database Normalization with Example

In this article, we'll introduce the concept of normalization as it is very difficult to visualize these concepts using words only, I will try to explore first 3 normal form with example  

What is normalization?
Normalization is the process of efficiently organizing data to minimize redundancy in a database. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables.

What are different normalization forms?
1NF - No repeating elements or groups of elements
2NF - No partial dependencies on a concatenated key
3NF - No dependencies on non-key attributes
BCNF - Boyce-Codd Normal Form
4NF - Isolate Independent Multiple Relationships
5NF - Isolate Semantically Related Multiple Relationships
ONF - Optimal Normal Form
DKNF - Domain-Key Normal Form

Understanding first 3 normal form with example
For a database to be in 3NF, it must first satisfy all the criteria of a 2NF and 1NF database. Now to understand 1NF, 2NF and 3NF we will take an example of below given table structure with given data and will implement first 3 normal forms on it.
Our table structure is as given below, with highlighted repeating groups of element. Note down that here primary key is “ ordid


The data in the table will be like given below



First Normal Form: No Repeating Elements or Groups of Elements
Now after applying 1NF, our table structure will be as given below.

And the data in the table will be like

Here as you can see from there is no single column that uniquely identifies each row. However, if we put a number of columns together, we can satisfy this requirement. The two columns that together uniquely identify each row are ordid and itemid , no two rows have the same combination of ordid and itemid. Therefore, together they qualify to be used as the table's primary key.  

Second Normal Form: No Partial Dependencies on a Concatenated Key
Next we test each table for partial dependencies on a concatenated key . This means that for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column only depends upon one part of the concatenated key, then we say that the entire table has failed Second Normal Form and we must create another table to rectify the failure.

Now we will analyze the column other then concatenated key column one by one,
  • orddt is the date on which the order was made. Obviously it relies on ordid, an order date has to have an order otherwise it is only a date. But can an order date exist without an itemid ? The answer is no, therefore orddt fails Second Normal Form.
  • custid is the ID number of the customer who placed the order. Does it rely on ordid ? No: a customer can exist without placing any orders. Does it rely on itemid ? No: for the same reason. This is interesting: custid (along with the rest of the customer columns) does not rely on either member of the primary key. What do we do with these columns? We don't have to worry about them until we get to Third Normal Form.
  • itemname is the next column that is not itself part of the primary key. This is the name of the inventory item. Obviously it relies on itemid . But it can exist without an ordid, therefore itemname fails the test.
  • qty refers to the number of items purchased on a particular invoice. Can this quantity exist without an itemid? The answer is no. Can the quantity exist without an ordid? Again the answer is no. So this column does not violate Second Normal Form, qty depends on both parts of our concatenated primary key.
  • price is similar to itemname . It depends on the itemid but not on the ordid , so it does violate Second Normal Form.
What should we do with a table that fails Second Normal Form? First we take out the second half of the concatenated primary key ( itemid ) and put it in its own table.
All the columns that depend on itemid - whether in whole or in part - follow it into the new table. We call this new table OrderItems. Now our table structure will look like given below

There are several things to notice:
  • We have brought a copy of the ordid column over into the OrderItems table. This allows each OrderItem to "remember" which order it is a part of.
  • Now the order table has fewer rows than it did before.
  • The order table no longer has a concatenated primary key. The primary key now consists of a single column, ordid .
  • The OrderItems table does have a concatenated primary key.
Remember, NF2 only applies to tables with a concatenated primary key. Now that Order has a single-column primary key, it has passed Second Normal Form. OrderItems , however, still has a concatenated primary key. We have to pass it through the NF2 analysis again. We ask the same question we did before,
Now consider the columns that are not part of the primary key...
  • itemname relies on itemid , but not ordid . So this column once again fails NF2.
  • qty relies on both members of the primary key. It does not violate NF2.
  • price relies on the itemid but not on the ordid , so it does violate Second Normal Form.
So now after applying NF2 on OrderItems table, the table structure should look like

So now our table structure fulfill NF2

Third Normal Form: No Dependencies on Non-Key Attributes
At last, we return to the problem of the repeating Customer information. Right now if a customer places more than one order then we have to input all of that customer's contact information again. This is because there are columns in the order table that rely on "non-key attributes".

To better understand this concept, consider the orddt column. Can it exist without ordid column? No, so the orddt is said to depend on a key attribute ordid

What about custname , can it exist on its own? Yes , it can without referring to an order. The same goes for custaddr . These column actually rely on custid , which is not a key in this table

So now after applying NF3 our table structure will look like,

So, we are done with NF3. Is this information useful to you? Let me know if have any doubt...

3 comments:

  1. best article on normalization i have ever read... so wonderful.........

    ReplyDelete
  2. Thanks for such a nice comment...

    ReplyDelete
  3. Excellent article. So easy to read and follow.

    In the scenario whereby a data migration is being carried out between a legacy application and new application where the
    data model and structures have already been defined in new would an acceptable approach to migration be:

    - Take the new data model and create a structure in a staging area which maps to it for relevant mapped data for migration;
    - During the extract routines the data from source should be extracted into the normalized staging area matching the structure of new
    - Following transformation load from staging to new

    Would appreciate your feedback on this.

    Jon S

    ReplyDelete