This lab exercise illustrates the problem of handling a multivalued data item. One approach is to fool the system by defining a fixed number of separate fields with different names, when they are really repetitions of the same field. The result is a violation of first normal form. Normalization can improve the quality of a database design. The following table, UnPerson, is not normalized because it has multiple values for language for each person.
UnPerson | ||||
Personcode | FirstName | Lang1 | Lang2 | Lang3 |
1 | Rita | German | English | Dutch |
2 | Wei | Mandarin | English | |
3 | Alice | English | French | |
4 | Ned | English | Japanese | |
5 | Sophie | English |
To appreciate the problems of using an unnormalize database, write the following queries
The correct data model, which recognizes a m:m relationship between person and language, generally requires three tables. In this particular case, the Lang table is not necessary since there are no additional attributes of language to be stored. A separate Lang table would be necessary if you wanted to keep a master list of all languages, to control the set of valid values for language, or to store any languages which did not appear in the UnPerson table. Whether you decide to include the the Lang table is not a major issue since it takes little effort to create and maintain, and takes little storage space.
|
|
|
Copy the first two columns of UnPerson into Person.
insert into person (personcode, firstname) select personcode, firstname from unperson
Create a row for each pair of personcode and language in the unnormalized table (UnPerson). This requires three queries because there are three columns for language in UnPerson.
insert into personlang (personcode, language) select personcode, lang1 from unperson where lang1 is not null insert into personlang (personcode, language) select personcode, lang2 from unperson where lang2 is not null insert into personlang (personcode, language) select personcode, lang3 from unperson where lang3 is not null
Create a table containing the distinct values for language, assuming you want to maintain a master list of languages.
insert into language (language) select distinct language from personlang
1.
select firstname from unperson where lang1 = 'French' or lang2 = 'French' or lang3 = 'French' 2. select firstname from unperson where (lang1 = 'English' and lang2 is null and lang3 is null) or (lang1 is null and lang2 = 'English' and lang3 is null) or (lang1 is null and lang2 is null and lang3 = 'English')
3a.
The first approach is based on the observation that there are at most three languages for a person
select firstname from unperson where (lang1 is not null and lang2 is not null and lang3 is not null)
to produce a list of persons speaking three languages. If nothing is retrieved, then you would have to find those persons speaking two languages, and so on.
3b.
Here is a multistep solution that first computes the number of languages spoken by each person.
Add a column to the person table with the columname langcount
Alter table unperson add column langcount
Determine who speaks three languages
Update unperson set langcount = 3 where (lang1 is not null and lang2 is not null and lang3 is not null)
Determine those who speaks two languages
Update unperson set langcount = 2 where (lang1 is not null and lang2 is not null and lang3 is null) or (lang1 is not null and lang2 is null and lang3 is not null) or (lang1 is null and lang2 is not null and lang3 is not null)
Determine who speaks one language
Update unperson set langcount = 1 where (lang1 is not null and lang2 is null and lang3 is null) or (lang1 is null and lang2 is not null and lang3 is null) or (lang1 is null and lang2 is null and lang3 is not null)
Query the updated table
select firstname from unperson where langcount in (select max(langcount) from person)
4.
Alter table unperson add column lang4 Update unperson set lang4 = 'French' where firstname = 'Rita'
5.
First query the database to discover the languages spoken by Wei.
select firstname, lang1, lang2, lang3 from unperson where firstname = 'Wei'
Now, update the table. Note that the user has to remember that lang1 is the primary language.
update unperson set lang1 = 'English', lang2 = Mandarin where firstname = 'Wei' 6. insert into unperson (personcode, firstname, lang1, lang2) values (6,'George', 'English', 'French')
1.
select firstname from person, personlang where person.personcode = personlang.personcode and language = 'French'
2.
select firstname from person, personlang where person.personcode = personlang.personcode and language = 'English' and firstname in (select firstname from person, personlang where person.personcode = personlang.personcode group by firstname having count(*) = 1)
3.
Here is a multistep solution
a. Create a view called langscount
create view langscount (personcode, langcount) (select person.personcode, count(language) from person, personlang where person.personcode = personlang.personcode group by person.personcode)
b. Query the view
select firstname, langcount from person, langscount where person.personcode = langscount.personcode and langcount = (select max(langcount) from langscount)
4.
Assuming there is only one person with the name Rita
insert into personlang (personcode, language) select personcode, language from person, lang where firstname = 'Rita' and language = 'French'
5.
A 1:m relationship between lang and person can record the fact that a person has a single primary language and that a language can be a primary language for many people. The revised tables, showing the foreign key for the new relationship, are:
|
|
|
To change Wei's primary language to English
update person set primelang = 'English' where firstname = 'Wei'
6.
insert into person (personcode, firstname) values (6,'George') insert into personlang (personcode, language) values (6, 'French') insert into personlang (personcode, language) values (6, 'English')
This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson |