What is normalization?
Normalization is the process of efficiently organizing data to minimize redundancy in a database. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables.
What are different normalization forms?
1NF - No repeating elements or groups of elements
2NF - No partial dependencies on a concatenated key
3NF - No dependencies on non-key attributes
BCNF - Boyce-Codd Normal Form
4NF - Isolate Independent Multiple Relationships
5NF - Isolate Semantically Related Multiple Relationships
ONF - Optimal Normal Form
DKNF - Domain-Key Normal Form
Understanding first 3 normal form with example
For a database to be in 3NF, it must first satisfy all the criteria of a 2NF and 1NF database. Now to understand 1NF, 2NF and 3NF we will take an example of below given table structure with given data and will implement first 3 normal forms on it.
Our table structure is as given below, with highlighted repeating groups of element. Note down that here primary key is “ ordid ”
The data in the table will be like given below
First Normal Form: No Repeating Elements or Groups of Elements
Now after applying 1NF, our table structure will be as given below.
And the data in the table will be like
Here as you can see from there is no single column that uniquely identifies each row. However, if we put a number of columns together, we can satisfy this requirement. The two columns that together uniquely identify each row are ordid and itemid , no two rows have the same combination of ordid and itemid. Therefore, together they qualify to be used as the table's primary key.
Second Normal Form: No Partial Dependencies on a Concatenated Key
Next we test each table for partial dependencies on a concatenated key . This means that for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column only depends upon one part of the concatenated key, then we say that the entire table has failed Second Normal Form and we must create another table to rectify the failure.
Now we will analyze the column other then concatenated key column one by one,
- orddt is the date on which the order was made. Obviously it relies on ordid, an order date has to have an order otherwise it is only a date. But can an order date exist without an itemid ? The answer is no, therefore orddt fails Second Normal Form.
- custid is the ID number of the customer who placed the order. Does it rely on ordid ? No: a customer can exist without placing any orders. Does it rely on itemid ? No: for the same reason. This is interesting: custid (along with the rest of the customer columns) does not rely on either member of the primary key. What do we do with these columns? We don't have to worry about them until we get to Third Normal Form.
- itemname is the next column that is not itself part of the primary key. This is the name of the inventory item. Obviously it relies on itemid . But it can exist without an ordid, therefore itemname fails the test.
- qty refers to the number of items purchased on a particular invoice. Can this quantity exist without an itemid? The answer is no. Can the quantity exist without an ordid? Again the answer is no. So this column does not violate Second Normal Form, qty depends on both parts of our concatenated primary key.
- price is similar to itemname . It depends on the itemid but not on the ordid , so it does violate Second Normal Form.
All the columns that depend on itemid - whether in whole or in part - follow it into the new table. We call this new table OrderItems. Now our table structure will look like given below
There are several things to notice:
- We have brought a copy of the ordid column over into the OrderItems table. This allows each OrderItem to "remember" which order it is a part of.
- Now the order table has fewer rows than it did before.
- The order table no longer has a concatenated primary key. The primary key now consists of a single column, ordid .
- The OrderItems table does have a concatenated primary key.
Now consider the columns that are not part of the primary key...
- itemname relies on itemid , but not ordid . So this column once again fails NF2.
- qty relies on both members of the primary key. It does not violate NF2.
- price relies on the itemid but not on the ordid , so it does violate Second Normal Form.
So now our table structure fulfill NF2
Third Normal Form: No Dependencies on Non-Key Attributes
At last, we return to the problem of the repeating Customer information. Right now if a customer places more than one order then we have to input all of that customer's contact information again. This is because there are columns in the order table that rely on "non-key attributes".
To better understand this concept, consider the orddt column. Can it exist without ordid column? No, so the orddt is said to depend on a key attribute ordid
What about custname , can it exist on its own? Yes , it can without referring to an order. The same goes for custaddr . These column actually rely on custid , which is not a key in this table
So now after applying NF3 our table structure will look like,
So, we are done with NF3. Is this information useful to you? Let me know if have any doubt...
best article on normalization i have ever read... so wonderful.........
ReplyDeleteThanks for such a nice comment...
ReplyDeleteExcellent article. So easy to read and follow.
ReplyDeleteIn the scenario whereby a data migration is being carried out between a legacy application and new application where the
data model and structures have already been defined in new would an acceptable approach to migration be:
- Take the new data model and create a structure in a staging area which maps to it for relevant mapped data for migration;
- During the extract routines the data from source should be extracted into the normalized staging area matching the structure of new
- Following transformation load from staging to new
Would appreciate your feedback on this.
Jon S