People
who have started learning about databases must have frequently come up with the
word “Normalization” in quite early
stages. The name suggests that it is just about normalizing stuff. But, the
story is not that simple. You need to know much about normalization in order to
handle databases. It is one of the most important things to learn in database
management. (It is assumed that you have the basic knowledge about databases
and know how to manage tables). So, let’s straight away start discussing about
Normalization in thorough depth.
So what is Normalization?
Normalization is a process of reducing redundancies of data in a database. Quite often we come across tables having a lot of bulk data with many columns. All these data might not be necessary all the time whenever we use those tables. So, a better option is to split up the bulk table into small parts and use only those tables which suit the actual purpose at a given instance of time. In this way, redundancy is reduced. To make the long story short, we can simply say that normalization is a process of dividing a big table into smaller ones in order to reduce redundancy.
To understand the concept in deep, let us take up a simple example.
Suppose we are to manage all the databases of a company (say, My Company). The company must keep track of all the employees, customers, product details and the salary details of all the employees. A simple and straight forward way to do this is to put all this information into a single table and manage all those simultaneously.
See below.
Looking
at the above table, you may feel that it is perfectly fine. After all, what is
the problem with it? We have a big table; we have all the information required
by the company together in a single space, thus saving a lot of memory. Well
and good!
But, now think! If suppose, we need to frequently retrieve/update data about just the employees. Here, does the customer’s information or the product details really matter. Definitely no. So, why use the entire table for using just a part of it? We need a solution to this. And the solution is normalization. What we create using normalization is often called as normal forms. Let study about the popular and most widely used normal forms.
The First Normal Form
To solve the above problem, the first and foremost thing to be done is to divide the entire raw database into smaller tables based on the actual groupings. When each table has been designed, a primary key is assigned to most or all tables. Note that the primary key must be a unique value, so try to select a data element for the primary key that naturally uniquely identifies a specific piece of data.
So, let us take up the same previous example and prepare our
First normal form. See the figure below:
As we can see, the big raw database is divided into three
smaller tables- one for employee, customers and products details, each.
Thus, to access any one of these tables, we need not handle
the other two tables.
The Second Normal Form
The objectives of the second normal form is to take data that is only partly dependent on the primary key and enter that data into another table. Let us take up the same example of Fig 1-2. Consider the table- Employee.
Here, the entire table has information about the personal details as well as the salary information. But, it is well understood that, to pay salary to an employee, the company does not actually need the employee’s personal details. Just his emp_id is sufficient. So, why not use just that? This is the second normal form. Same goes with Customers table. We can separate customer’s information from the order details.
See the figure below:
The Third Normal Form
The third normal form’s objective is to remove data in a
table that is not dependent on the primary key. See the same example of Fig 1-3. For the table named Emp_Pay, the position and position_desc
fields are not dependent on primary key (emp_id).
So, the better option is to move both these fields to another table.
See below:
See below:
Advantages
As we have already seen in the sections before,
normalization has many advantages. Let us list out a few ones:
Ø Greater overall database organization
Ø Reduction of redundant data
Ø Data consistency within the database
Ø A much more flexible database design
Ø A better handle on database security
Ø Enforces concept of referential integrity
Disadvantages
Although there are many advantages of normalization, there
are some disadvantages too. After all, the popular saying “All the coins have
two sides” still holds true. So, let’s go through the disadvantages too.
There is one substantial drawback of a normalized database:
reduced database performance.
The factors compromised include CPU usage, memory usage, and
input/output (I/O). To In other words, a normalized database requires much more
CPU, memory, and I/O to process transactions and database queries than does a
denormalized database.
Denormalizing a Database
Denormalization is the
process of taking a normalized database and modifying table structures to allow
controlled redundancy for increased database performance. A denormalized
database is not the same as a database that has not been normalized. The
purpose of denormalization is to get rid of the problems discussed in the
previous sections. Denormalization might
involve recombining separate tables or creating duplicate data within tables to
reduce the number of tables that need to be joined to retrieve the requested data,
which results in less I/O and CPU time.
This is normally advantageous in larger
data warehousing applications in which aggregate calculations are being made
across millions of rows of data within tables. There are costs to
denormalization, however. Data redundancy is increased in a denormalized
database, which can improve performance but requires more extraneous efforts to
keep track of related data.
So, now you might be confused whether to normalize your
database or not. The solution is to normalize the database upto a certain
extent. So that, redundancy is controlled to a great extent, without
compromising on other factors like CPU usage, memory usage, and input/output
(I/O).
-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-
Very good answer. Everyone can Easily understand the answer. thnx
ReplyDeleteMade my life simpler!
ReplyDeletePlease give another example? More fun, maybe?
Thankyou.
I easily understood normalisation with these examples thank you very much
ReplyDeletewhere is bcnf?????
ReplyDelete