Chapter 3 The Single Entity
I want to be alone.
Attributed to Greta Garbo
Learning Objectives
Students completing this chapter will be able to
model a single entity;
define a single database;
write queries for a single-table database.
The relational model
The relational model introduced by Codd in 1970 is the most popular technology for managing large collections of data. In this chapter, the major concepts of the relational model are introduced. Extensive coverage of the relational model is left until Chapter 8, by which time you will have sufficient practical experience to appreciate fully its usefulness, value, and elegance.
A relation, similar to the mathematical concept of a set, is a two-dimensional table arranged in rows and columns. This is a very familiar idea. You have been using tables for many years. A relational database is a collection of relations, where relation is a mathematical term for a table. One row of a table stores details of one observation, instance, or case of an item about which facts are retained—for example, one row for details of a particular student. All the rows in a table store data about the same type of item. Thus, a database might have one table for student data and another table for class data. Similarly, each column in the table contains the same type of data. For example, the first column might record a student’s identification number. A key database design question is to decide what to store in each table. What should the rows and columns contain?
In a relational database, each row must be uniquely identified. There must be a primary key, such as student identifier, so that a particular row can be designated. The use of unique identifiers is very common. Telephone numbers and e-mail addresses are examples of unique identifiers. Selection of the primary key, or unique identifier, is another key issue of database design.
💠 Global legal entity identifier (LEI)
There is no global standard for identifying legal entities across markets and jurisdictions. The need for such a standard was amplified by Lehman Brothers collapse in 2008. Lehman had 209 registered subsidiaries, legal entities, in 21 countries, and it was party to more than 900,000 derivatives contracts upon its collapse. Key stakeholders, such as financial regulators and Lehman’s creditors, were unable to assess their exposure. Furthermore, others were unable to assess the possible ripple on them of the effects of the collapse because of the transitive nature of many investments (i.e., A owes B, B owes C, and C owes D).
The adoption of a global legal entity identifier (LEI), should improve financial system regulation and corporate risk management. Regulators will find it easier to monitor and analyze threats to financial stability and risk managers will be more able evaluate their companies’ risks.
The tables in a relational database are connected or related by means of the data in the tables. You will learn, in the next chapter, that this connection is through a pair of values—a primary key and a foreign key. Consider a table of airlines serving a city. When examining this table, you may not recognize the code of an airline, so you then go to another table to find the name of the airline. For example, if you inspect the next table, you find that AM is an international airline serving Atlanta.
International airlines serving Atlanta
Airline |
---|
AM |
JL |
KX |
LM |
MA |
OS |
RG |
SN |
SR |
LH |
LY |
If you don’t know which airline has the abbreviation AM, then you need to look at the table of airline codes to discover that AeroMexico, with code AM, serves Atlanta. The two tables are related by airline code. Later, you will discover which is the primary key and which is the foreign key.
A partial list of airline codes
Code | Airline |
---|---|
AA | American Airlines |
AC | Air Canada |
AD | Lone Star Airlines |
AE | Mandarin Airlines |
AF | Air France |
AG | Interprovincial Airlines |
AI | Air India |
AM | AeroMexico |
AQ | Aloha Airlines |
When designing the relational model, Codd provided commands for processing multiple records at a time. His intention was to increase the productivity of programmers by moving beyond the record-at-a-time processing that is found in most programming languages. Consequently, the relational model supports set processing (multiple records-at-a-time), which is most frequently implemented as Structured Query Language (SQL).11
The relational model separates the logical design of a database from its physical storage. This notion of data independence simplifies data modeling and database programming. In this section, we focus on logical database design, and now that you have had a brief introduction to the relational model, you are ready to learn data modeling.
Getting started
As with most construction projects, building a relational database must be preceded by a design phase. Data modeling, our design technique, is a method for creating a plan or blueprint of a database. The data model must accurately mirror real-world relationships if it is to support processing business transactions and managerial decision making.
Rather than getting bogged down with a theory first, application later approach to database design and use, we will start with application. We will get back to theory when you have some experience in data modeling and database querying. After all, you did not learn to talk by first studying sentence formation; you just started by learning and using simple words. We start with the simplest data model, a single entity, and the simplest database, a single table, as follows.
Share code | Share name | Share price | Share quantity | Share dividend | PE ratio |
---|---|---|---|---|---|
FC | Freedonia Copper | 27.5 | 10,529 | 1.84 | 16 |
PT | Patagonian Tea | 55.25 | 12,635 | 2.50 | 10 |
AR | Abyssinian Ruby | 31.82 | 22,010 | 1.32 | 13 |
SLG | Sri Lankan Gold | 50.37 | 32,868 | 2.68 | 16 |
ILZ | Indian Lead & Zinc | 37.75 | 6,390 | 3.00 | 12 |
BE | Burmese Elephant | 0.07 | 154,713 | 0.01 | 3 |
BS | Bolivian Sheep | 12.75 | 231,678 | 1.78 | 11 |
NG | Nigerian Geese | 35.00 | 12,323 | 1.68 | 10 |
CS | Canadian Sugar | 52.78 | 4,716 | 2.50 | 15 |
ROF | Royal Ostrich Farms | 33.75 | 1,234,923 | 3.00 | 6 |
Modeling a single-entity database
The simplest database contains information about one entity, which is some real-world thing. Some entities are physical—CUSTOMER, ORDER, and STUDENT; others are abstract or conceptual—WORK ASSIGNMENT, and AUTHORSHIP. We represent an entity by a rectangle: the following figure shows a representation of the entity SHARE. The name of the entity is shown in singular form in uppercase in the top part of the rectangle.
The entity SHARE
An entity has characteristics or attributes. An attribute is a discrete element of data; it is not usually broken down into smaller components. Attributes are describe the data we want to store. Some attributes of the entity SHARE are share code, share name, share price, share quantity (number owned), share dividend, and PE ratio (price-to-earnings ratio).12 Attributes are shown below the entity’s name. Notice that we refer to share price, rather than price, to avoid confusion if there should be another entity with an attribute called price. Attribute names must be carefully selected so that they are self-explanatory and unique. For example, share dividend is easily recognized as belonging to the entity SHARE.
The entity SHARE and its attributes
An instance is a particular occurrence of an entity (e.g., facts about Freedonia Copper). To avoid confusion, each instance of an entity needs to be uniquely identified. Consider the case of customer billing. In most cases, a request to bill Smith $100 cannot be accurately processed because a firm might have more than one Smith in its customer file. If a firm has carefully controlled procedures for ensuring that each customer has a unique means of identification, then a request to bill customer number 1789 $100 can be accurately processed. An attribute or collection of attributes that uniquely identifies an instance of an entity is called an identifier. The identifier for the entity SHARE is share code, a unique identifier assigned by the stock exchange to a firm issuing shares.
There may be several attributes, or combinations of attributes, that are feasible identifiers for an instance of an entity. Attributes that are identifiers are prefixed by an asterisk. The following figure shows an example of a representation of an entity, its attributes, and identifier.
The entity SHARE, its attributes, and identifier
Briefly, entities are things in the environment about which we wish to store information. Attributes describe an entity. An entity must have a unique identifier.
The modeling language used in this text is designed to record the essential details of a data model. The number of modeling symbols to learn is small, and they preserve all the fundamental concepts of data modeling. Since data modeling often occurs in a variety of settings, the symbols used have been selected so that they can be quickly drawn using pencil-and-paper, whiteboard, or a general-purpose drawing program. This also means that models can be quickly revised as parts can be readily erased and redrawn.
The symbols are distinct and visual clutter is minimized because only the essential information is recorded. This also makes the language easy for clients to learn so they can read and amend models.
Models can be rapidly translated to a set of tables for a relational database. More importantly, since this text implements the fundamental notions of all data modeling languages, you can quickly convert to another data modeling dialect. Data modeling is a high-level skill, and the emphasis needs to be on learning to think like a data modeler rather than on learning a modeling language. This text’s goal is to get you off to a fast start.
❓ Skill builder
A ship has a name, registration code, gross tonnage, and a year of >construction. Ships are classified as cargo or passenger. Draw a data model >for a ship.
Creating a single-table database
The next stage is to translate the data model into a relational database. The translation rules are very direct:
Each entity becomes a table.
The entity name becomes the table name.
Each attribute becomes a column.
The identifier becomes the primary key.
The American National Standards Institute’s (ANSI) recommended language for relational database definition and manipulation is SQL, which is both a data definition language (DDL) (to define a database), a data manipulation language (DML) (to query and maintain a database), and a data control language (DCL) (to control access). SQL is a common standard for describing and querying databases and is available with many commercial relational database products, including DB2, Oracle, and Microsoft SQL Server, and open source products such as MySQL and PostgreSQL.
In this book, MySQL is the relational database for teaching SQL. Because SQL is a standard, it does not matter which implementation of the relational model you use as the SQL language is common across both the proprietary and open variants.13 SQL uses the CREATE14 statement to define a table. It is not a particularly friendly command, and most products have friendlier interfaces for defining tables. However, it is important to learn the standard, because this is the command that SQL understands. Also, a table definition interface will generate a CREATE statement for execution by SQL. Your interface interactions ultimately translate into a standard SQL command.
It is common practice to abbreviate attribute names, as is done in the following example.
Defining a table
The CREATE command to establish a table called share
is as follows:
CREATE TABLE share (
shrcode CHAR(3),
shrfirm VARCHAR(20) NOT NULL,
shrprice DECIMAL(6,2),
shrqty DECIMAL(8),
shrdiv DECIMAL(5,2),
shrpe DECIMAL(2),
PRIMARY KEY (shrcode));
The first line of the command names the table; subsequent lines describe each of its columns. The first component is the name of the column (e.g., shrcode
). The second component is the data type (e.g., CHAR), and its length is shown in parentheses. shrfirm
is a variable-length character field of length 20, which means it can store up to 20 characters, including spaces. The column shrdiv
stores a decimal number that can be as large as 999.99 because its total length is 5 digits and there are 2 digits to the right of the decimal point. Some examples of allowable data types are shown in the following table. The third component (e.g., NOT NULL), which is optional, indicates any instance that cannot have null values. A column might have a null value when it is either unknown or not applicable. In the case of the share table, we specify that shrfirm
must be defined for each instance in the database.
The final line of the CREATE statement defines shrcode
as the primary key, the unique identifier for SHARE When a primary key is defined, the relational database management system (RDBMS) will enforce the requirement that the primary key is unique and not null. Before any row is added to the table SHARE, the RDBMS will check that the value of shrcode
is not null and that there does not already exist a duplicate value for shrcode
in an existing row of share
. If either of these constraints is violated, the RDBMS will not permit the new row to be inserted. This constraint, the entity integrity rule, ensures that every row has a unique, non-null primary key. Allowing the primary key to take a null value would mean there could be a row of share
that is not uniquely identified. Note that an SQL statement is terminated by a semicolon, though this is not always enforced.
SQL statements can be written in any mix of valid upper and lowercase characters. To make it easier for you to learn the syntax, this book adopts the following conventions:
SQL keywords are in uppercase.
Table and column names are in lowercase.
There are more elaborate layout styles, but we will bypass those because it is more important at this stage to learn SQL. You should lay out your SQL statements so that they are easily read by you and others.
The following table shows some of the data types supported by most relational databases. Other implementations of the relational model may support some of these data types and additional ones. It is a good idea to review the available data types in your RDBMS before defining your first table.
Some allowable data types
Category | Data type | Description |
---|---|---|
Numeric | SMALLINT | A 15-bit signed binary value |
INTEGER | A 31-bit signed binary value | |
FLOAT(p) | A scientific format number of p binary digits precision | |
DECIMAL(p,q) | A packed decimal number of p digits total length; q decimal spaces to the right of the decimal point may be specified | |
String | CHAR(n) | A fixed-length string of n characters |
VARCHAR(n) | A variable-length string of up to n characters | |
text | A variable-length string of up to 65,535 characters | |
Date/time | DATE | Date in the form yyyymmdd |
TIME | Time in the form hhmmss | |
timesTAMP | A combination of date and time to the nearest microsecond | |
time with time zone | Same as time, with the addition of an offset from universal time coordinated (UTC) of the specified time | |
timestamp with time zone | Same as timestamp, with the addition of an offset from UTC of the specified time | |
Logical | Boolean | A set of truth values: TRUE, FALSE, or UNKNOWN |
The CHAR and VARCHAR data types are similar but differ in the way character strings are stored and retrieved. Both can be up to 255 characters long. The length of a CHAR column is fixed to the declared length. When values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed. VARCHAR columns store variable-length strings and use only as many characters as are needed to store the string. Values are not padded; instead, trailing spaces are removed. In this book, we use VARCHAR to define most character strings, unless they are short (less than five characters is a good rule-of-thumb).
Data modeling with MySQL Workbench
MySQL Workbench is a professional quality, open source, cross-platform tool for data modeling and SQL querying. In this text, you will also learn some of the features of Workbench that support data modeling and using SQL. You will find it helpful to complete the tutorial on creating a data model prior to further reading, as we will assume you have such proficiency.
The entity share created with MySQL Workbench
You will notice some differences from the data model we have created previously. Workbench automatically generates the SQL code to create the table, so when modeling you establish the names you want for tables and columns. A gold key symbol is used to indicate the identifier, which becomes the primary key. An open diamond indicates that a column can be null, whereas a closed blue diamond indicate the column must have a value, as with shrfirm
in this case.
When specifying columns in Workbench you must also indicate the datatype. We opt to turn off the display of a column’s datatype15 in a model to maintain focus on the entity.
A major advantage of using a tool such as Workbench is that it will automatically generate the CREATE statement code (Database > Forward Engineer …) and execute the code to create the database. The Workbench tutorial will have shown you how to do this, and you should try this out for yourself by creating a database with the single share
table.
Inserting rows into a table
The rows of a table store instances of an entity. A particular shareholding (e.g., Freedonia Copper) is an example of an instance of the entity share
. The SQL statement INSERT is used to add rows to a table. Although most implementations of the relational model use an application for row insertion, if authorized, you might also use a RDBMS interface spreadsheet or import a structured text file, such as Comma Separated Value (CSV) file. The INSERT command is defined for completeness.
The following command adds one row to the table share
:
INSERT INTO share
(shrcode,shrfirm,shrprice,shrqty,shrdiv,shrpe)
VALUES ('FC','Freedonia Copper',27.5,10529,1.84,16);
There is a one-to-one correspondence between a column name in the first set of parentheses and a value in the second set of parentheses. That is, shrcode
has the value “FC”, shrfirm
the value “Freedonia Copper”, and so on. Notice that the value of a column that stores a character string (e.g., shrfirm
) is contained within straight quotes.
The list of field names can be omitted when values are inserted in all columns of the table in the same order as that specified in the CREATE statement, so the preceding expression could be written
The data for the share
table will be used in subsequent examples. If you have ready access to a relational database, it is a good idea to now create a table and enter the data. Then you will be able to use these data to practice querying the table.
Data for share
*Code | Name | Price | Quantity | Dividend | PE |
---|---|---|---|---|---|
FC | Freedonia Copper | 27.5 | 10,529 | 1.84 | 16 |
PT | Patagonian Tea | 55.25 | 12,635 | 2.50 | 10 |
AR | Abyssinian Ruby | 31.82 | 22,010 | 1.32 | 13 |
SLG | Sri Lankan Gold | 50.37 | 32,868 | 2.68 | 16 |
ILZ | Indian Lead & Zinc | 37.75 | 6,390 | 3.00 | 12 |
BE | Burmese Elephant | 0.07 | 154,713 | 0.01 | 3 |
BS | Bolivian Sheep | 12.75 | 231,678 | 1.78 | 11 |
NG | Nigerian Geese | 35.00 | 12,323 | 1.68 | 10 |
CS | Canadian Sugar | 52.78 | 4,716 | 2.50 | 15 |
ROF | Royal Ostrich Farms | 33.75 | 1,234,923 | 3.00 | 6 |
Notice that shrcode
,the primary key, is asterisked in the preceding table. This is a convention we will use, as in data modeling, to denote the primary key. In the relational model, an identifier becomes a primary key, a column that guarantees that each row of the table can be uniquely addressed.
MySQL Workbench offers a spreadsheet interface for entering data, as explained in the tutorial.
Inserting rows with MySQL Workbench
❓ Skill builder
Create a relational database for the ship entity you modeled previously. >Insert some rows.
Querying a single-table database
The objective of developing a database is to make it easier to use the stored data to solve problems. Typically, a manager raises a question (e.g., How many shares have a PE ratio greater than 12?). A question or request for information, usually called a query, is then translated into a specific data manipulation or query language. The most widely used query language for relational databases is SQL. After the query has been executed, the resulting data are displayed. In the case of a relational database, the answer to a query is always a table.
There is also a query language called relational algebra, which describes a set of operations on tables. Sometimes it is useful to think of queries in terms of these operations. Where appropriate, we will introduce the corresponding relational algebra operation.
Generally we use a four-phase format for describing queries:
A brief explanation of the query’s purpose
The query in italics, prefixed by •, and some phrasing you might expect from a manager
The SQL version of the query
The results of the query.
Displaying an entire table
All the data in a table can be displayed using the SELECT statement. In SQL, the all part is indicated by an asterisk (*).
List all data in the share table.
shrcode | shrfirm | shrprice | shrqty | shrdiv | shrpe |
---|---|---|---|---|---|
AR | Abyssinian Ruby | 31.82 | 22010 | 1.32 | 13 |
BE | Burmese Elephant | 0.07 | 154713 | 0.01 | 3 |
BS | Bolivian Sheep | 12.75 | 231678 | 1.78 | 11 |
CS | Canadian Sugar | 52.78 | 4716 | 2.50 | 15 |
FC | Freedonia Copper | 27.50 | 10529 | 1.84 | 16 |
ILZ | Indian Lead & Zinc | 37.75 | 6390 | 3.00 | 12 |
NG | Nigerian Geese | 35.00 | 12323 | 1.68 | 10 |
PT | Patagonian Tea | 55.25 | 12635 | 2.50 | 10 |
ROF | Royal Ostrich Farms | 33.75 | 1234923 | 3.00 | 6 |
SLG | Sri Lankan Gold | 50.37 | 32868 | 2.68 | 16 |
Project—choosing columns
The relational algebra operation project creates a new table from the columns of an existing table. Project takes a vertical slice through a table by selecting all the values in specified columns. The projection of share
on columns shrfirm
and shrpe
produces a new table with 10 rows and 2 columns. The SQL syntax for the project operation simply lists the columns to be displayed.
Report a firm’s name and price-earnings ratio.
shrfirm | shrpe |
---|---|
Abyssinian Ruby | 13 |
Burmese Elephant | 3 |
Bolivian Sheep | 11 |
Canadian Sugar | 15 |
Freedonia Copper | 16 |
Indian Lead & Zinc | 12 |
Nigerian Geese | 10 |
Patagonian Tea | 10 |
Royal Ostrich Farms | 6 |
Sri Lankan Gold | 16 |
Restrict—choosing rows
The relational algebra operation restrict creates a new table from the rows of an existing table. The operation restricts the new table to those rows that satisfy a specified condition. Restrict takes all columns of an existing table but only those rows that meet the specified condition. The restriction of share
to those rows where the PE ratio is less than 12 will give a new table with five rows and six columns.
Restrict is implemented in SQL using the WHERE clause to specify the condition on which rows are restricted.
Get all firms with a price-earnings ratio less than 12.
shrcode | shrfirm | shrprice | shrqty | shrdiv | shrpe |
---|---|---|---|---|---|
BE | Burmese Elephant | 0.07 | 154713 | 0.01 | 3 |
BS | Bolivian Sheep | 12.75 | 231678 | 1.78 | 11 |
NG | Nigerian Geese | 35.00 | 12323 | 1.68 | 10 |
PT | Patagonian Tea | 55.25 | 12635 | 2.50 | 10 |
ROF | Royal Ostrich Farms | 33.75 | 1234923 | 3.00 | 6 |
In this example, we have a less than condition for the WHERE clause. All permissible comparison operators are listed below.
Operator | Meaning |
---|---|
= | Equal to |
< | Less than |
<= | Less than orequal to |
> | Greater than |
>= | Greater than or equal to |
<> | Not equal to |
In addition to the comparison operators, the BETWEEN construct is available.
The expression a BETWEEN x AND y
is equivalent to a >= x AND a <= y
.
Combining project and restrict—choosing rows and columns
SQL permits project and restrict to be combined. A single SQL SELECT statement can specify which columns to project and which rows to restrict.
List the name, price, quantity, and dividend of each firm where the share holding is at least 100,000.
shrfirm | shrprice | shrqty | shrdiv |
---|---|---|---|
Burmese Elephant | 0.07 | 154713 | 0.01 |
Bolivian Sheep | 12.75 | 231678 | 1.78 |
Royal Ostrich Farms | 33.75 | 1234923 | 3.00 |
More about WHERE
The WHERE clause can contain several conditions linked by AND or OR. A clause containing AND means all specified conditions must be true for a row to be selected. In the case of OR, at least one of the conditions must be true for a row to be selected.
Find all firms where the PE is 12 or higher and the share holding is less than 10,000.
shrcode | shrfirm | shrprice | shrqty | shrdiv | shrpe |
---|---|---|---|---|---|
CS | Canadian Sugar | 52.78 | 4716 | 2.5 | 15 |
ILZ | Indian Lead & Zinc | 37.75 | 6390 | 3.0 | 12 |
The power of the primary key
The purpose the primary key is to guarantee that any row in a table can be uniquely addressed. In this example, we use shrcode
to return a single row because shrcode
is unique for each instance of share
. The sought code (AR) must be specified in quotes because shrcode
was defined as a character string when the table was created.
Report firms whose code is AR.
shrcode | shrfirm | shrprice | shrqty | shrdiv | shrpe |
---|---|---|---|---|---|
AR | Abyssinian Ruby | 31.82 | 22010 | 1.32 | 13 |
A query based on a non-primary-key column cannot guarantee that a single row is accessed, as the following illustrates.
Report firms with a dividend of 2.50.
shrcode | shrfirm | shrprice | shrqty | shrdiv | shrpe |
---|---|---|---|---|---|
CS | Canadian Sugar | 52.78 | 4716 | 2.5 | 15 |
PT | Patagonian Tea | 55.25 | 12635 | 2.5 | 10 |
💠 Mis-identification mess
Jieun Kim of Los Angeles and Jieun Kim of Chicagoland were mistakenly issued the same Social Security number when they emigrated to the US. They have the same name and were born on the same day in South Korea. As a result, their banking and savings accounts have been shut down and their credit cards blocked. They have suspected of engaging in identity theft
The IN crowd
The keyword IN is used with a list to specify a set of values. IN is always paired with a column name. All rows for which a value in the specified column has a match in the list are selected. It is a simpler way of writing a series of OR statements.
Report data on firms with codes of FC, AR, or SLG.
The foregoing query could have also been written as
shrcode | shrfirm | shrprice | shrqty | shrdiv | shrpe |
---|---|---|---|---|---|
AR | Abyssinian Ruby | 31.82 | 22010 | 1.32 | 13 |
FC | Freedonia Copper | 27.50 | 10529 | 1.84 | 16 |
SLG | Sri Lankan Gold | 50.37 | 32868 | 2.68 | 16 |
The NOT IN crowd
A NOT IN list is used to report instances that do not match any of the values.
Report all firms other than those with the code CS or PT.
is equivalent to
shrcode | shrfirm | shrprice | shrqty | shrdiv | shrpe |
---|---|---|---|---|---|
AR | Abyssinian Ruby | 31.82 | 22010 | 1.32 | 13 |
BE | Burmese Elephant | 0.07 | 154713 | 0.01 | 3 |
BS | Bolivian Sheep | 12.75 | 231678 | 1.78 | 11 |
FC | Freedonia Copper | 27.50 | 10529 | 1.84 | 16 |
ILZ | Indian Lead & Zinc | 37.75 | 6390 | 3.00 | 12 |
NG | Nigerian Geese | 35.00 | 12323 | 1.68 | 10 |
ROF | Royal Ostrich Farms | 33.75 | 1234923 | 3.00 | 6 |
SLG | Sri Lankan Gold | 50.37 | 32868 | 2.68 | 16 |
❓ Skill Builder
List those shares where the value of the holding exceeds one million.
Ordering columns
The order of reporting columns is identical to their order in the SQL command. For instance, compare the output of the following queries.
shrcode | shrfirm |
---|---|
NG | Nigerian Geese |
PT | Patagonian Tea |
shrfirm | shrcode |
---|---|
Nigerian Geese | NG |
Patagonian Tea | PT |
Ordering rows
People can generally process an ordered (e.g., sorted alphabetically) report faster than an unordered one. In SQL, the ORDER BY clause specifies the row order in a report. The default ordering sequence is ascending (A before B, 1 before 2). Descending is specified by adding DESC after the column name.
List all firms where PE is at least 10, and order the report in descending PE. Where PE ratios are identical, list firms in alphabetical order.
shrcode | shrfirm | shrprice | shrqty | shrdiv | shrpe |
---|---|---|---|---|---|
FC | Freedonia Copper | 27.50 | 10529 | 1.84 | 16 |
SLG | Sri Lankan Gold | 50.37 | 32868 | 2.68 | 16 |
CS | Canadian Sugar | 52.78 | 4716 | 2.50 | 15 |
AR | Abyssinian Ruby | 31.82 | 22010 | 1.32 | 13 |
ILZ | Indian Lead & Zinc | 37.75 | 6390 | 3.00 | 12 |
BS | Bolivian Sheep | 12.75 | 231678 | 1.78 | 11 |
NG | Nigerian Geese | 35.00 | 12323 | 1.68 | 10 |
PT | Patagonian Tea | 55.25 | 12635 | 2.50 | 10 |
Numeric versus character sorting
Numeric data in character fields (e.g., a product code) do not always sort the way you initially expect. The difference arises from the way data are stored:
Numeric fields are right justified and have leading zeros.
Character fields are left justified and have trailing spaces.
For example, the value 1066 stored as CHAR(4) would be stored as ‘1066’ and the value 45 would be stored as ‘45’. If the column containing these data is sorted in ascending order, then ‘1066’ precedes ‘45’ because the leftmost character ‘1’ is less than ‘4’. You can avoid this problem by always storing numeric values as numeric data types (e.g., integer or decimal) or preceding numeric values with zeros when they are stored as character data. Alternatively, start numbering at 1,000 so that all values are four digits, though the best solution is to store numeric data as numbers rather than characters.
Derived data
One of the important principles of database design is to avoid redundancy. One form of redundancy is including a column in a table when these data can be derived from other columns. For example, we do not need a column for yield because it can be calculated by dividing dividend by price and multiplying by 100 to obtain the yield as a percentage. This means that the query language does the calculation when the value is required.
Get firm name, price, quantity, and firm yield.
shrfirm | shrprice | shrqty | yield |
---|---|---|---|
Abyssinian Ruby | 31.82 | 22010 | 4.148334 |
Burmese Elephant | 0.07 | 154713 | 14.285714 |
Bolivian Sheep | 12.75 | 231678 | 13.960784 |
Canadian Sugar | 52.78 | 4716 | 4.736643 |
Freedonia Copper | 27.50 | 10529 | 6.690909 |
Indian Lead & Zinc | 37.75 | 6390 | 7.947020 |
Nigerian Geese | 35.00 | 12323 | 4.800000 |
Patagonian Tea | 55.25 | 12635 | 4.524887 |
Royal Ostrich Farms | 33.75 | 1234923 | 8.888889 |
Sri Lankan Gold | 50.37 | 32868 | 5.320627 |
You can give the results of the calculation a column name. In this case, a good choice is yield
. Note the use of AS to indicate the name of the column in which the results of the calculation are displayed.
In the preceding query, the keyword AS is introduced to specify an alias, or temporary name. The statement specifies that the result of the calculation is to be reported under the column heading yield
. You can rename any column or specify a name for the results of an expression using an alias.
Aggregate functions
SQL has built-in functions to enhance its retrieval power and handle many common aggregation queries, such as computing the total value of a column. Four of these functions (AVG, SUM, MIN, and MAX) work very similarly. COUNT is a little different.
COUNT
COUNT computes the number of rows in a table. Use COUNT(*) to count all rows irrespective of their content (i.e., null or not null), and use COUNT(columnname
) to count rows without a null value for columname
. Count can be used with a WHERE clause to specify a condition.
How many firms are there in the portfolio?
investments |
---|
10 |
How many firms have a holding greater than 50,000?
bigholdings |
---|
3 |
Subqueries
Sometimes we need the answer to another query before we can write the query of ultimate interest. For example, to list all shares with a PE ratio greater than the portfolio average, you first must find the average PE ratio for the portfolio. You could do the query in two stages:
and
where x is the value returned from the first query.
Unfortunately, the two-stage method introduces the possibility of errors. You might forget the value returned by the first query or enter it incorrectly. It also takes longer to get the results of the query. We can solve these problems by using parentheses to indicate the first query is nested within the second one. As a result, the value returned by the inner or nested subquery, the one in parentheses, is used in the outer query. In the following example, the nested query returns 11.20, which is then automatically substituted in the outer query.
Report all firms with a PE ratio greater than the average for the portfolio.
shrfirm | shrpe |
---|---|
Abyssinian Ruby | 13 |
Canadian Sugar | 15 |
Freedonia Copper | 16 |
Indian Lead & Zinc | 12 |
Sri Lankan Gold | 16 |
⚠️ The preceding query is often mistakenly written as SELECT shrfirm, shrpe from share WHERE shrpe > avg(shrpe); You need to use a subquery to find the average, so the computed value can be used in the outer query
❓ Skill builder
Find the name of the firm for which the value of the holding is greatest.
Regular expression—pattern matching
Regular expression is a concise and powerful method for searching for a specified pattern in a nominated column. Regular expression processing is supported by languages such as Java, R, and PHP. In this chapter, we introduce a few typical regular expressions and will continue developing your knowledge of this feature in the next chapter.
Search for a string
List all firms containing ‘Ruby’ in their name.
shrfirm |
---|
Abyssinian Ruby |
Search for alternative strings
In some situations you want to find columns that contain more than one string. In this case, we use the alternation symbol ‘|’ to indicate the alternatives being sought. For example, a|b finds ‘a’ or ‘b’.
List firms containing gold or zinc in their name, irrespective of case.
There can be variations on the spelling of zinc in a company name, such as Zinc and ZINC. To cater for all variations, convert a firm’s name to lowercase with the built-in MySQL function LOWER().16
shrfirm |
---|
Indian Lead & Zinc |
Sri Lankan Gold |
Search for a beginning string
If you are interested in finding a value in a column that starts with a particular character string, then use ^ to indicate this option.
List the firms whose name begins exactly with Sri.
shrfirm |
---|
Sri Lankan Gold |
Search for an ending string
If you are interested in finding if a value in a column ends with a particular character string, then use $ to indicate this option.
List the firms whose name ends with any form of geese.
shrfirm |
---|
Nigerian Geese |
❓ Skill builder
List the firms containing “ian” in their name.
DISTINCT—eliminating duplicate rows
The DISTINCT clause is used to eliminate duplicate rows. It can be used with column functions or before a column name. When used with a column function, it ignores duplicate values.
Report the different values of the PE ratio.
shrpe |
---|
13 |
3 |
11 |
15 |
16 |
12 |
10 |
6 |
Find the number of different PE ratios.
Different PEs |
---|
8 |
When used before a column name, DISTINCT prevents the selection of duplicate rows. Notice a slightly different use of the keyword AS. In this case, because the alias includes a space, the entire alias is enclosed in straight quotes.
DELETE
Rows in a table can be deleted using the DELETE clause in an SQL statement. DELETE is typically used with a WHERE clause to specify the rows to be deleted. If there is no WHERE clause, all rows are deleted.
Erase the data for Burmese Elephant. All the shares have been sold.
In the preceding statement, shrfirm
is used to indicate the row to be deleted.
UPDATE
Rows can be modified using SQL’s UPDATE clause, which is used with a WHERE clause to specify the rows to be updated.
Change the share price of FC to 31.50.
Increase the total number of shares for Nigerian Geese by 10% because of the recent bonus issue.
Quotes
There are three types of quotes that you can typically use with SQL. Double and single quotes are equivalent and can be used interchangeably. Note that single and double quotes must be straight rather than curly, and the back quote is to the left of the 1 key.
Type of quote | Representation |
---|---|
Single | ’ |
Double | “ |
Back | ` |
The following SQL illustrates the use of three types of quotes to find a person with a last name of O’Hara and where the column names are person first
and person last
.
person first |
---|
Sheila |
Debriefing
Now that you have learned how to model a single entity, create a table, and specify queries, you are on the way to mastering the fundamental skills of database design, implementation, and use. Remember, planning occurs before action. A data model is a plan for a database. The action side of a database is inserting rows and running queries.
Summary
The relational database model is an effective means of representing real-world relationships. Data modeling is used to determine what data must be stored and how data are related. An entity is something in the environment. An entity has attributes, which describe it, and an identifier, which uniquely distinguishes an instance of an entity. Every entity must have a unique identifier. A relational database consists of tables with rows and columns. A data model is readily translated into a relational database. The SQL statement CREATE is used to define a table. Rows are added to a table using INSERT. In SQL, queries are written using the SELECT statement. Project (choosing columns) and restrict (choosing rows) are common table operations. The WHERE clause is used to specify row selection criteria. WHERE can be combined with IN and NOT IN, which specify values for a single column. The rows of a report are sorted using the ORDER BY clause. Arithmetic expressions can appear in SQL statements, and SQL has built-in functions for common arithmetic operations. A subquery is a query within a query. Regular expressions are used to find string patterns within character strings. Duplicate rows are eliminated with the DISTINCT clause. Rows can be erased using DELETE or modified with UPDATE.
Exercises17
Draw data models for the following entities. In each case, make certain that you show the attributes and identifiers. Also, create a relational database and insert some rows for each of the models.
Aircraft: An aircraft has a manufacturer, model number, call sign (e.g., N123D), payload, and a year of construction. Aircraft are classified as civilian or military.
Car: A car has a manufacturer, range name, and style code (e.g., a Honda Accord DX, where Honda is the manufacturer, Accord is the range, and DX is the style). A car also has a vehicle identification code, registration code, and color.
Restaurant: A restaurant has an address, seating capacity, phone number, and style of food (e.g., French, Russian, Chinese).
Cow: A dairy cow has a name, date of birth, breed (e.g., Holstein), and a numbered plastic ear tag.
Do the following queries using SQL:
List a share’s name and its code.
List full details for all shares with a price less than $1.
List the names and prices of all shares with a price of at least $10.
Create a report showing firm name, share price, share holding, and total value of shares held. (Value of shares held is price times quantity.)
List the names of all shares with a yield exceeding 5 percent.
Report the total dividend payment of Patagonian Tea. (The total dividend payment is dividend times quantity.)
Find all shares where the price is less than 20 times the dividend.
Find the share(s) with the minimum yield.
Find the total value of all shares with a PE ratio > 10.
Find the share(s) with the maximum total dividend payment.
Find the value of the holdings in Abyssinian Ruby and Sri Lankan Gold.
Find the yield of all firms except Bolivian Sheep and Canadian Sugar.
Find the total value of the portfolio.
List firm name and value in descending order of value.
List shares with a firm name containing “Gold.”
Find shares with a code starting with “B.”
Run the following queries and explain the differences in output. Write each query as a manager might state it.
SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘s’;
SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘S’;
SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘s|S’;
SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘^S’;
SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘s$’;
A weekly newspaper, sold at supermarket checkouts, frequently reports stories of aliens visiting Earth and taking humans on short trips. Sometimes a captured human sees Elvis commanding the spaceship. To keep track of all these reports, the newspaper has created the following data model.
The paper has also supplied some data for the last few sightings and asked you to create the database, add details of these aliens, and answer the following queries:
a. What's the average number of heads of an alien?
a. Which alien has the most heads?
b. Are there any aliens with a double o in their names?
c. How many aliens are chartreuse?
d. Report details of all aliens sorted by smell and color.
Eduardo, a bibliophile, has a collection of several hundred books. Being a little disorganized, he has his books scattered around his den. They are piled on the floor, some are in bookcases, and others sit on top of his desk. Because he has so many books, he finds it difficult to remember what he has, and sometimes he cannot find the book he wants. Eduardo has a simple personal computer file system that is fine for a single entity or file. He has decided that he would like to list each book by author(s)’ name and type of book (e.g., literature, travel, reference). Draw a data model for this problem, create a single entity table, and write some SQL queries.
How do you identify each instance of a book? (It might help to look at a few books.)
How should Eduardo physically organize his books to permit fast retrieval of a particular one?
Are there any shortcomings with the data model you have created?
What is an identifier? Why does a data model have an identifier?
What are entities?
What is the entity integrity rule?
Officially pronounced as “S-Q-L,” but often also pronounced as “sequel”.↩︎
Attributes are shown in italics.↩︎
Now would be a good time to install the MySQL Community server on your computer, unless your instructor has set up a class server.↩︎
SQL keywords are shown in uppercase.↩︎
Preferences > Diagram > Show Column Types↩︎
For a full list of MySQL functions and operators, see https://dev.mysql.com/doc/refman/8.0/en/string-functions.html.↩︎
Also see the more than 80 SQL exercises for the ClassicModels database {https://www.richardtwatson.com/open/Reader/ClassicModels.html} [https://www.richardtwatson.com/open/Reader/ClassicModels.html] that gradually increase in difficulty and demonstrate useful business applications.↩︎