Crating a Database and Rules of Data Normalization
When creating a database the first question that must be asked is what is the desired output of the database. In other words what do we want to know. Once the output has been identified then the process requires brainstorming; the suggesting of ideas of what data is required for input and processing to achieve the output goals without any restriction on the data suggestions. Once all of the suggestions for data input have been completed then normalization begins. The process of identifying and elimination anomalies is called normalization. Using normalization, you start with a collection of relations (tables), apply sets of rules to eliminate anomalies, and produce a new collection of problem-free relations. The set of rules are called Normal Forms. A relation is in first normal form (1NF) if it does not contain repeating groups. Problem: For example we have been given the task of creating a database to track students in college, the classes they take, the fees that they pay, the extra curricular activities they engage in, financial aid received and the books they purchase. Brainstorming might come up with the following data required.
StudentSSN - StuId - StudentName - StudentAddress- StudentCity - StudentState - StudentZip - StudentDOB - StudentDOE - StudentClass1 to 30 - Studentclassgrade1 to 30 - StudentClassSectionNumber1 to 30- StudentClassDescription1 to 30 - Studentclassunits1 to 30 - ClassId1 to 100 - Studentclassinstructor1-100 - StudentClassbooksAuthors 1 to 100 - StudentCalssbookISBN 1 to 100 - StudentClassbooktitle1 to 100 - StudentFees 1 to 100, StudentBooks1 to 100,StudentActivities1 to 100. As you can see there are a number of repeating groups associated with activity on the part of the student.
1. Eliminate Repeating Groups. Make a separate table for each set of related attributes, and give each table a primary key. Looking over our brain storming the first thing we will do is organize the data of related attributes in tables. The tables will be students, classes, books, aid, fees, activities, instructors.
a. students take one to many classes
Entity Relationship Diagram
1) class has one to many books
2) Instructors teach one to many classes
b. students pay one to many fees
c. students obtain none to many aid packages
d. students engage in none to many activities
Student is the entity of our focus and would be referred to from the students perspective as the top container. StuID is the common field between the Student, Classes, Fees, Financial Aid, and Activities Tables. StudentID is the primary key in the Student table and the foreign key in the Classes, Fees, Financial Aid and Activities Tables. The tables would be joined on the StuID field with a one to many relationship.
TeacherId is the common field between the Teacher table and the
Classes table. TeacherId is the primary field in the Teacher table and a foreign
key in the Classes table. The join is a one to many.
ClassId is the Primary key in the classes table and the foreign key in the Books table. Since a class can have more than one book the join will be a one to many.
BookId is the primary key in the book table.
With the above joins we can extract data via queries from all seven tables at the same time. We are in first normal form because none of the relations have repeating groups.
Stars have orbiting planets (one to many). Planets have orbiting moons (1 to many). Planets contain continents (one to many). Continents contain nations (one to many). Nations have one capitol (one to one, entity sub-type). Nations contain states (one to many). States have one capitol (one to one, entity sub-type).
2. Eliminate Redundant Data - If an attribute depends on only part of a multi-valued key, remove it to a separate table.
3. Eliminate Columns not dependent on key.
4. Isolate Independent Multiple Relationships. No table may contain two or more 1:n or N:m relationships that are not directly related.
5. Isolate semantically related multiple relationships. There may be practical constraints on information that justify separating logically related many to many relationships.