Thursday, September 8, 2011

DBMS -Normalization

DBMS -Normalization

In the design of a relational database management system (RDBMS), the process of organizing data to minimize redundancy is called Normalization. 
The goal of database normalization is decompose relations with anomalies in order to produce smaller, well-structured relations.  Normalization usually involves dividing large tables into smaller
(and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and
modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationship.Normalization didn't any headache who is the primary key of the relation.But if there is key then problem is occurs and normalization is needed.

Let us take a relation :-Record of some student
                      ROLL NO                        SUBJECT                             SCORE
                           25                                 Maths                                       81
                                                                Phys                                         82
                                                                chem                                        87  
     
----------------------------------------------------------------                          
                         26                                Maths                                        80
                                                             Phys                                          87
                                                             Chem                                        89 
--------------------------------------------------------------------------------------
This relation contents the student roll no. and subject wise score.So if we know the roll no we retrieve the information for the table. Here ROLL NO is the primary key.

1NF(frist normal from):-The first normal from says that all the attribute is single valued or atomic.
so the upper table is wrong.because the attribute is multivalued.so we decompose the table such that there should not be any multivalued attribute.
  ROLL NO                        SUBJECT                             SCORE
                           25                                 Maths                                       81
                           25                                 Phys                                         82
                           25                                 chem                                        87  
     
---------------------------------------------------------------------------------------------------------------                      
                         26                                Maths                                        80
                         26                                Phys                                          87
                         26                                Chem                                        89 
---------------------------------------------------------------------------------------------------------------
Functional Dependency:-A functional dependency ,denoted by X-->Y,between two set of attributes X and Y that are subset of R specifies a constraint on the possible tuples  that can from a relation state  r of R. Constraint is that ,for any two tuples t1 &t2 in r that have t1[X]=t2[X],the must also have t1[Y]=t2[Y].

--------------------------------------------------------------------------------------------------------------
Suppose we write  x, logx, x^2, e^x, sinx=f(x);
so x, logx, x^2, e^x, sinx is functionally dependent on x.in DBMS is same concept.
---------------------------------------------------------------------------------------------------------------

Trivial /Partial Dependency:-A functional dependency α → β is called a partial dependency if there is a proper subset γ of α such that γ → β.We say that β is partially dependent on α.A functional dependency α → β is called a partial dependency if there is aproper subset γ of α such that γ → β.We say that β is partially dependent on α.

So we again considering a relation
              CLASS                     ROLL                          NAME                              CLASS TEACHER
                X                               11                               ABC                                       XYZ
                IX                              12                              MNO                                      PQR
                V                               45                               PQR                                       XYZ
--------------------------------------------------------------------------------------------------------------
 2NF(second normal from): A relation is said to be second normal from if
                                          (i) it is in 1NF.
                                          (ii) There is no partial dependency i.e,all non key attribute is dependent on        
                                               entire candidate key. 

Here the upper relation have following dependency
CLASS+ROLL → NAME
CLASS+ROLL →CLASS TEACHER
CLASS → CLASS TEACHER
Here CLASS AND ROLL  together make the candidate key .the violation of 2NF is  CLASS → CLASS TEACHER(for reason ii).So we decompose the table to keep it in 2NF.
----------------------------------------------------------------------------------
If a relation  Q violets 2NF from due to present of the dependency X   → Y then decompose Q into two reletion Q-Y and X U Y.Repeat this for each such violation.
-------------------------------------------------------------------------------------------------------------
so we decompose the above relation
1.CLASS         CLASS TEACHER
2.ROLL       NAME          CLASS TEACHER
So the two relaton in second normal from .


Transitive Dependency:-A functional dependency X-->Y in a relation schema R is a transitive dependency if there is a set of attribute Z neither a candidate key nor a subset of any key of  R ,and both X--> Z and Z-->Y
.( if   X--> Z and Z-->Y then X-->Y)

So we consider an another relation below.
                       PROJECT-NO                  PROJECT-DATE                    SUPPLIER                 ITEM
                                 1                                 12/4/90                                    APPLE                      CD
                                 2                                 18/5/90                                      MS                          OS
                                 3                                 12/5/90                                      HP                          RAM





3NF(Third Normal From):-For a relation is said to be in 3NF if their exists
                           i)It must be in 2NF.
                           ii)All non key attribute should be non Transitively dependent  on the candidate key.i.e there     
                             should not be any transitive dependency. 
The above relation in second normal from.the above relation PROJECT-NO is the primary key.the exists functional  dependencies are      
                                         PROJECT-NO --->  SUPPLIER
                                         SUPPLIER ---> ITEM
                                         PROJECT-NO  ---> PROJECT-DATE
                                         PROJECT-NO ----> ITEM
So here    PROJECT-NO --->  SUPPLIER and SUPPLIER ---> ITEM creates the transitive dependency. So  the relation violets the the third normal from.so we decompose the table.The decomposition rule is discussed my previous post.Here we decompose the table into two parts.such as 
1.         PROJECT-NO                  PROJECT-DATE                    SUPPLIER
                      1                                 12/4/90                                    APPLE
                      2                                 18/5/90                                      MS
                      3                                 12/5/90                                      HP

2. SUPPLIER                 ITEM
      APPLE                      CD
        MS                          OS
        HP                          RAM

Boyce-Codd Normal Form(BCNF) :-Actually BCNF  is the modification of third normal from.
Let us consider a relation
                 PROJECT             ITEM                 SUPPLIER                    DATE OF SUPPLY
                    ABC                    OS                        MS                               12/5/11
                    XYZ                    RAM                     HP                                12/5/11

Definition :-A relation is said to be in BCNF if their exists
            i)It must be in 3NF.
           ii)Their can't be any determinent ,which is not a candidate  
                key.
          iii)All determinant can be candidate key.
here PROJECT +ITEM ---> SUPPLIER
            PROJECT +ITEM ---> DATE OF SUPPLY
            SUPPLIER ---> ITEM

The above relation is 3NF.The violation of BCNF is SUPPLIER ---> ITEM(violation point isii)Their can't be any determinant ,which is not a candidate key).So we decompose the relation  such as
1.            PROJECT                        SUPPLIER                    DATE OF SUPPLY
                    ABC                                    MS                               12/5/11
                    XYZ                                    HP                                12/5/11

2.         ITEM                 SUPPLIER                    
                 OS                        MS                  
                 RAM                     HP

However normalization does not mandatory to database design but it is necessary for good database design.If the data base does not normalize properly there some error  occur such as insertion anomaly,deletion anomaly ,update anomaly. so the recommendation is try to normalize database up to 3NF.

0 comments: