Database Normalization
Database Normalization is a technique that is used to reduce redundancy or duplicate data from the database and store the data logically. There are two types of redundancies in database tables i.e. row-level redundancy and column-level redundancy.
The main aim of Normalization is any of the operations like insertion, deletion, and the update of the data does not cause any anomalies. Using different types of Normalization we can reduce the data redundancy.
Normal Forms
Unnormalized Form (UNF)
A data model in a database that is not normalized by any normalization technique defined by the relational model is said to be in Unnormaized Form (UNF). It is also known as unnormalized relation or non-first normal form. The process of normalization starts from this form of the data model. NoSQL databases are based on the principles of the unnormalized relational model.
First Normal Form (1NF)
A relation is considered to be in the first normal form if it doesn’t contain any multivalued attribute. If all the attributes are single-valued in a table then it is in first normal form (1NF).
For example,
| roll_no | name | course |
|---|---|---|
| 1 | Ashish | DBMS OS |
| 2 | Karan | DBMS |
The above table is not in 1NF as it contains the multivalued attribute.
It can be normalized to the first normal form by distributing the multivalued attributes in different columns as follows.
| roll_no | name | course_1 | course_2 |
|---|---|---|---|
| 1 | Ashish | DBMS | OS |
| 2 | Karan | DBMS |
Now, the table has no multivalued attribute and it is in the first normal form.
Second Normal Form (2NF)
It says no non-prime attribute should be functionally dependent on any partial candidate key or any proper subset of the candidate key. In other words, a relation to be in 2NF there should be no partial dependency. For a table to be in 2NF, it must be in 1NF.
For example,
| roll_no | name | course_name | course_Fee |
|---|---|---|---|
| 1 | Ankit | DBMS | 1100 |
| 2 | Vivek | OS | 1100 |
| 1 | Ankit | OS | 1100 |
| 2 | Vivek | DSA | 2000 |
Here, in the above-given table, the only candidate key is (roll_no, course_name). course_fee is a non-prime attribute as it is not part of any possible candidate key. course_fee is dependent upon course_name which is the proper subset of the candidate key so it violates the condition of the second normal form. Hence, the above table is not in 2NF.
It can be normalized to the second normal form by dividing it into two different tables as follows.
| student | ||
|---|---|---|
| roll_no | name | course_name |
| 1 | Ankit | DBMS |
| 2 | Vivek | OS |
| 1 | Ankit | OS |
| 2 | Vivek | DSA |
| course | |
|---|---|
| course_name | course_fee |
| DBMS | 1100 |
| OS | 1100 |
| DSA | 2000 |
Third Normal Form (3NF)
For a table to be in 3NF, there should be no transitive dependency for non-prime attributes and it must be in 2NF. Transitive dependency means indirect dependency like X→Y (Y is dependent on X) and Y→Z so X→Z.
A relation is in 3NF if it holds at least one of the following conditions.
- If X→Y, then X is a super key.
- Y is a prime attribute.
Here, X→Y means trivial functional dependency and Prime Attribute means part of the Candidate Key.
| year | semester | topper | topper's_dob |
|---|---|---|---|
| 2018 | 1st | Ankit | 04-05-1999 |
| 2019 | 1st | Ashish | 25-08-1998 |
| 2020 | 2nd | Ranjeet | 24-07-1996 |
In the above-given table, topper's_dob which is a non-prime attribute is transitively dependent upon the candidate key (year, semester) through another non-prime attribute topper. Hence, it violates the condition of 3NF.
All violations for 3NF can be avoided by splitting the table into two tables.
| year | semester | topper |
|---|---|---|
| 2018 | 1st | Ankit |
| 2019 | 1st | Ashish |
| 2020 | 2nd | Ranjeet |
| topper | topper's_dob |
|---|---|
| Ankit | 04-05-1999 |
| Ashish | 25-08-1998 |
| Ranjeet | 24-07-1996 |
Elementary Key Normal Form (EKNF)
This normal form is the further improvement of the third normal form. Therefore, All in EKNF are already in 3NF. It happens when more than one unique compound keys overlap which leads to redundancy in the overlapping column.
Boyce-Codd Normal Form (BCNF)
For a relation to be in BCNF it must be in 3NF. For every non-trivial functional dependency X→Y, X is a super key and for every trivial functional dependency X→Y, Y is a subset of X (Y⊆X). This form is stronger than 3NF.
4NF
It is the next level of normalization after BCNF. Unlike 2NF, 3NF, BCNF which are concerned with functional dependencies 4NF is concerned with multivalued dependency. A table is in 4NF if, for every non-trivial multivalued dependency X→→Y, X is a superkey.
Essential Tuple Normal Form (ETNF)
ETNF is stricter than 4NF and less strict than 5NF. A relation will be in ETNF if the relation is in BCNF and some of the keys have only one attribute.
5NF
It is also known as a project-join normal form. It is designed to remove redundancy from the database. A relation is in 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys.
Domain Key Normal Form (DKNF)
In DKNF, the database contains two constraints.
- Domain Constraints
- Key Constraints
A relation follows DKNF when every constraint on the relation is a logical consequence of the definition of keys and domains. It avoids all non-temporal anomalies.
6NF
A relation to be in 6NF it must satisfy no non-trivial join dependencies at all. It must be in 5NF.