The time to access data on a disk drive is long compared with main memory: accessing a disk takes approximately 10^5 times longer than accessing main memory. Because disk access is so slow, the computer is often waiting for a disk to retrieve data before it can continue to process a request for information.
A page is the minimum amount of disk storage accessed at a time. It is usually around 1-4 kbytes. A record is the unit of data that a DBMS retrieves. Records can vary in size. A page may hold several small records, or a large record may be distributed over several pages.
The two types of delay that can occur prior to reading a record from a disk are rotational delay and access arm delay. Access arm delay is reduced by minimizing the movement of the read/write head. To minimize read/write head movement, data that are likely to be accessed at the same time should be stored on the same track or location, either on a single surface or on a cylinder. Rotational delay is reduced by rotating the disk faster. The database designer can work only on minimizing read/write head movement, because the disk manufacturer sets the rotational speed of the disk.
Clustering records means physically storing records that are frequently used together close to one another on a disk. Intrafile clustering applies to the clustering of records that are in a single file. Interfile clustering applies to clustering records together that are in different files.
The disk manager is part of the operating system and is concerned with the storage of pages. The file manager is one level above the disk manager and is concerned with the storage of files.
An index is a file that contains the values of a field in a file (the index field) and the address of the field's corresponding record in that file. Thus, an index file typically contains two fields, one that stores the value of the indexed field in the original file and one that contains a pointer to the matching record in the original file.
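As a rough illustration of the two-field structure described above, the following Python sketch builds an index over a small in-memory file. The sample rows and the names `build_index` and `lookup` are assumptions made for illustration, and a list position stands in for a disk address; a real DBMS would store page addresses instead.

```python
import bisect

# Hypothetical "file": each list position stands in for a record's disk address.
nation = [
    {"natcode": "UK", "natname": "United Kingdom"},
    {"natcode": "USA", "natname": "United States"},
    {"natcode": "AUS", "natname": "Australia"},
]

def build_index(file, field):
    """Return sorted (value, address) pairs: the two fields of an index entry."""
    return sorted((record[field], address) for address, record in enumerate(file))

def lookup(index, file, value):
    """Binary-search the index for value, then follow the pointer to the record."""
    pos = bisect.bisect_left(index, (value,))
    if pos < len(index) and index[pos][0] == value:
        return file[index[pos][1]]
    return None  # no record has this value in the indexed field

index = build_index(nation, "natcode")
print(lookup(index, nation, "UK"))
```

The search touches only the small, sorted index until the final step, which is the sense in which an index reduces accesses to the main file.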
Indexing can improve the speed of data retrieval especially in situations where the records requested are a low percentage of the overall records contained in the database. It speeds up retrieval by reducing disk accesses. On the other hand, when the majority of records in the database meet the criteria of the query, an index may decrease the speed at which they are retrieved. Furthermore, when a new record is added to a file, two or more disk writes are necessary, since an entry must be added to both the file and its index. The tradeoff is between faster retrieval and slower updates.
CREATE INDEX natcodeindx ON nation (natcode);
CD-ROM storage would meet the needs of the company best. Once data are written to this medium, they cannot be altered, so it has the highest legal standing of the different media forms. It is also suitable for storing large files and is a low-cost alternative.
Teaching tip: Ask students how they would estimate the storage space required for storing the scanned items.
RAID is the best solution for this situation. RAID level 3 is probably the best choice.
This is a situation where the company has a high-volume database but does not require extremely fast retrieval (1-2 minutes is acceptable). Mass storage is ideally suited for such a case, where data are copied from mass storage to magnetic disk as required for analysis.
Speed and reliability are important characteristics. Level 1 RAID would be a good implementation choice since the database is relatively small (10 million customers at 100 bytes each is 1 Gbyte), so storage requirements and costs will be low. You may want to index the customer table by customer code and to use interfile clustering for the customer and subscription tables so that customer and subscription data are stored together. Because the subscription table is so small (126 magazines), this table could be held in memory to speed up access.
(1.5*10^6)*(5*10^6) = 7.5*10^12 bytes (i.e., 7.5 Tbytes). There is also the need to consider storage for backup and recovery, and thus you should probably triple this answer (i.e., 22.5 Tbytes).
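The arithmetic above can be checked with a short Python sketch. The function name `estimate_storage` and the reading of the two factors as an item count and a size per item are assumptions, since the answer gives only the product; the tripling for backup and recovery follows the text.

```python
def estimate_storage(item_count, bytes_per_item, copies=1):
    """Total bytes needed for item_count items, each kept in `copies` copies."""
    return item_count * bytes_per_item * copies

TBYTE = 10**12  # decimal terabyte, matching the 10^12 used above

# Assumed reading: 1.5 million items of 5 million bytes each.
raw = estimate_storage(1_500_000, 5_000_000)
with_backup = estimate_storage(1_500_000, 5_000_000, copies=3)

print(raw / TBYTE, with_backup / TBYTE)  # 7.5 22.5
```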
RAID 5 because of the high I/O rates and low volatility.
Lossy
The data deluge refers to the high rate of growth of data that organizations need to store. For some businesses, growth rates exceed 100% per year, and many others face rates of around 50%. As a result, data managers have to carefully estimate their organizations' needs and have sufficient storage space available to meet future demand.
According to Wikipedia, Apple has more than 28 million songs in the iTunes store. What storage technology might be a good choice for this library?
The first step is to calculate the size of the library. Assuming an average of 4 Mbytes per song, the required space is (4*10^6)*(28*10^6) = 1.12*10^14 bytes, or roughly 100 Tbytes. You also have to consider backup and recovery capability. There will need to be redundancy to cover disk errors and so forth. A safe estimate is 300 Tbytes. Apple probably has a proprietary solution, but if it does not, then you might select RAID 5 because of its high I/O rates and the low volatility of music files. A file is loaded once and read many times. Rumor has it that Apple uses Hadoop as its file system, which fits the need for high fault tolerance, rapid retrieval, and low cost.
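The library estimate can be worked through in a few lines of Python. This is only a sketch under the answer's own assumptions (28 million songs, 4 Mbytes per song); the constant names and the redundancy multiplier of 3 are illustrative, chosen to match the step from roughly 100 to roughly 300 Tbytes.

```python
SONGS = 28 * 10**6          # "more than 28 million songs"
BYTES_PER_SONG = 4 * 10**6  # assumed average song size of 4 Mbytes
REDUNDANCY = 3              # illustrative multiplier for backup and recovery
TBYTE = 10**12              # decimal terabyte

library_bytes = SONGS * BYTES_PER_SONG
library_tb = library_bytes / TBYTE
safe_tb = library_tb * REDUNDANCY

print(f"{library_bytes:.2e} bytes = {library_tb:.0f} Tbytes; with redundancy {safe_tb:.0f} Tbytes")
```

The exact product is 112 Tbytes, which the answer rounds to about 100 Tbytes, and the tripled figure of 336 Tbytes corresponds to the rounded safe estimate of 300 Tbytes.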
This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson. For questions and comments please contact the author.