To test the performance of a database, first get some data. There are two alternatives to keying in a large volume of test data. First, existing files can be used as a source of test data and imported into appropriate tables. Second, test data can be generated for each table. Here, we discuss the use of Excel to generate test data. This exercise assumes considerable Excel experience.
The RAND function is very useful for generating random test data because it computes an evenly distributed random number r in the range 0 <= r < 1.
When combined with the ROUND function, RAND can be used to generate random integers. The syntax for round is ROUND(value, digits), where value is the number to round and digits specifies the number of decimal places of the rounding (i.e., ROUND(5.4,0) equals 5). To generate an integer random number between a and b, use: ROUND(RAND()*(b-a) + a, 0)
Excel stores a date as an integer, and days since some starting point are numbered sequentially. Thus to generate random dates in a year, you can add a random integer between 1 and 365 to the last day of the previous year. For example to generate a random date in 1997, enter the following in the designated cells:
A | |
1 | 31-DEC-1996 |
2 | =ROUND(RAND()*364+1,0) + A1 |
If A2 is not a date, you need to set the format of the cell (Format>Cells ...)
Imagine that a company has locations in three cities (Atlanta, New York, and Chicago) and sales are distributed, respectively, in the following proportions: .4, .3,.3. To randomly select city names to match this distribution do the following.
1 | 1 | Atlanta |
2 | 2 | Atlanta |
3 | 3 | Atlanta |
4 | 4 | Atlanta |
5 | 5 | New York |
6 | 6 | New York |
7 | 7 | New York |
8 | 8 | Chicago |
9 | 9 | Chicago |
10 | 10 | Chicago |
Sometimes, you may want to select two values from a list (e.g., the description of an item and its price). VLOOKUP can be used as before. Consider the case where a company sells 10 items and the following price list is established. Here we would name two arrays: ITEM A1:B10 and PRICE B1:C10
A | B | C | |
1 | 1 | Boots - snakeproof |
95 |
2 | 2 | Camel saddle |
598 |
3 | 3 | Compass |
8 |
4 | 4 | Elephant polo stick |
352 |
5 | 5 | Exploring in 10 Easy Lessons | 20 |
6 | 6 | Geo positioning system |
295 |
7 | 7 | Hammock |
55 |
8 | 8 | Hat - polar explorer |
34 |
9 | 9 | How to Win Foreign Friends |
26 |
10 | 10 | Map case |
15 |
To randomly select an item from ITEM, use:
VLOOKUP(ROUND(RAND()*9+1,0),ITEM,2)
Now, another lookup is required to match the selected item to its price. Assume the selected item (e.g., Hammock) is stored in cell X100, then the lookup command would be:
VLOOKUP(X100,PRICE,2)
This would look for the value of X100 (i.e., Hammock) in the first column of PRICE and return its price, which is stored in the second column.
Generate data for the following table:
Column | Values |
Date | In the range 1-Jan-98 to 31-Dec-98 |
Sales quantity | In the range 1 to 8 |
Location | Either Paris, London, or Singapore in the proportions .4,.4, .2 respectively |
Item | One of five possible items (all equally likely)
|
Price | Respective prices of the above items are:
|
Type of sale | Catalog or store in a 3:1 ratio |
It is a good idea to create two worksheets.
This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson |