T-SQL/MSBI Knowledge Share Videos

Normalization

What is Normalization?

Normalization is the process of organizing the fields and tables of relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller and less redundant tables and defining relationships between them.

Types of Normalization :

  • 1NF : No Repeating groups.
  • 2NF : Every Non key attribute should be fully dependent on the whole key(Composite key) but not on the part of the key.
  • 3NF : If any non key attribute is dependent on other non key attribute , and inturn that non key attribute is dependent on any key attribute ,then keep both non key attributes  in other table.
         Apart from this we also have :
  • BCNF (Boyce Codd NF)
  • 4NF
  • 5NF
 But its normal practice that we follow till 3NF. Here only three normal forms are explained. Lets consider the below example which in not in normalized form, where for each employee there are multiple values in some fields.

1st NF :-->
• The table cells must be of single value.
• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.
                                                              

EmpID
EmpName
ProjectNum
Time
ProjectTitle
ProjectMgr
MgrPhn
E001
Rakesh Kumar
P001,
P002,
P003
1.25,
3,
2.54
AAA,
BBB,
CCC
Mohan Kumar,
Guru,
Ramesh
9449558722,
9998735676,
9966722975
E002
Sampath Sharma
P001
0.45
AAA
Mohan Kumar
9449558722
E003
Vikram Rathod
P002,
P004
2.21,
4
BBB,
DDD
Guru,
Guru
9998735676,
9998735676
E004
Dinesh Naidu
P002,
P003
1.5,
0.29
BBB,
CCC
Guru,
Ramesh
9998735676,
9966722975
E005
Pavan Vasudev
P005
3.2
EEE
Mohan Kumar
9449558722
                                                                              Fig 1  

In the below fig 2 you can see that each cell has one and only one value,but we need to remove the redundancy by placing each set of related data in different table and identify Primary key
in each table as in fig 3A and 3B.


EmpID
EmpFName
EmpLName
ProjectNum
Time
ProjectTitle
ProjectMgr
MgrPhn
E001
Rakesh
Kumar
P001
1.25
AAA
Mohan Kumar
9449558722
E001
Rakesh
Kumar
P002
3
BBB
Guru
9998735676
E001
Rakesh
Kumar
P003
2.54
CCC
Ramesh
9966722975
E002
Sampath
Sharma
P001
0.45
AAA
Mohan Kumar
9449558722
E003
Vikram
Rathod
P002
2.21
BBB
Guru
9998735676
E003
Vikram
Rathod
P004
4
DDD
Guru
9998735676
E004
Dinesh
Naidu
P002
1.5
BBB
Guru
9998735676
E004
Dinesh
Naidu
P003
0.29
CCC
Ramesh
9966722975
E005
Pavan
Vasudev
P005
3.2
EEE
Mohan Kumar
9449558722
Fig 2



EmpID
EmpFName
EmpLName
EmpID
ProjectNum
Time
ProjectTitle
ProjectMgr
MgrPhn
E001
Rakesh
Kumar
E001
P001
1.25
AAA
Mohan Kumar
9449558722
E002
Sampath
Sharma
E001
P002
3
BBB
Guru
9998735676
E003
Vikram
Rathod
E001
P003
2.54
CCC
Ramesh
9966722975
E004
Dinesh
Naidu
E002
P001
0.45
AAA
Mohan Kumar
9449558722
E005
Pavan
Vasudev
E003
P002
2.21
BBB
Guru
9998735676
Fig 3A
E003
P004
4
DDD
Guru
9998735676
E004
P002
1.5
BBB
Guru
9998735676
E004
P003
0.29
CCC
Ramesh
9966722975
E005
P005
3.2
EEE
Mohan Kumar
9449558722
Fig 3B

2nd NF :-->
Every Non key attribute should be fully dependent on the whole key(Composite key) but not on the part of the key.

In the fig 3B we can see that only Time is dependent on both EmpId and ProjectNum, where as ProjectTitle,ProjectMgr,MgrPhn are only dependent on ProjectNum but not on EmpID. So keep EmpID,ProjectNum and Time in one table . ProjectNum, ProjectTitle,ProjectMgr,MgrPhn in other table as shown in fig 4C.


EmpID
EmpFName
EmpLName
EmpID
ProjectNum
Time
E001
Rakesh
Kumar
E001
P001
1.25
E002
Sampath
Sharma
E001
P002
3
E003
Vikram
Rathod
E001
P003
2.54
E004
Dinesh
Naidu
E002
P001
0.45
E005
Pavan
Vasudev
E003
P002
2.21
Fig 4A
E003
P004
4
E004
P002
1.5
E004
P003
0.29
E005
P005
3.2
Fig 4B



ProjectNum
ProjectTitle
ProjectMgr
MgrPhn
P001
AAA
Mohan Kumar
9449558722
P002
BBB
Guru
9998735676
P003
CCC
Ramesh
9966722975
P004
DDD
Guru
9998735676
P005
EEE
Mohan Kumar
9449558722
Fig 4C


3rd NF :-->
If any non key attribute is dependent on other non key attribute , and inturn that non key attribute is dependent on any key attribute ,then keep both non key attributes  in other table.

In fig 4C you can see that MgrPhn is dependent on ProjectMgr but not dependent on ProjectNum and ProjectMgr is dependent on ProjectNum. So place MgrPhn and ProjectMgr in other table as shown in fig 5D.

EmpID
EmpFName
EmpLName
EmpID
ProjectNum
Time
E001
Rakesh
Kumar
E001
P001
1.25
E002
Sampath
Sharma
E001
P002
3
E003
Vikram
Rathod
E001
P003
2.54
E004
Dinesh
Naidu
E002
P001
0.45
E005
Pavan
Vasudev
E003
P002
2.21
Fig 5A
E003
P004
4
E004
P002
1.5
E004
P003
0.29
E005
P005
3.2
Fig 5B


ProjectNum
ProjectTitle
ProjectMgr
ProjectMgr
MgrPhn
P001
AAA
Mohan Kumar
Mohan Kumar
9449558722
P002
BBB
Guru
Guru
9998735676
P003
CCC
Ramesh
Ramesh
9966722975
P004
DDD
Guru
Fig 5D
P005
EEE
Mohan Kumar
Fig 5C

No comments: