Chapter 1 Managing Data
All the value of this company is in its people. If you burned down all our plants, and we just kept our people and our information files, we should soon be as strong as ever.
Thomas Watson, Jr., former chairman of IBM
Learning objectives
Students completing this chapter will
understand the key concepts of data management;
recognize that there are many components of an organization’s memory;
understand the problems with existing data management systems;
realize that successful data management requires an integrated understanding of organizational behavior and information technology.
Introduction
Imagine what would happen to a bank that forgot who owed it money or a digital newspaper that lost the account details of its subscribers. Both would soon be in serious difficulty, if not out of business. Organizations have data management systems to record the myriad of details necessary for transacting business and making informed decisions. Since the birth of agriculture, societies and organizations have recorded data. The system may be as simple as carving a notch in a stick to keep a tally, or as intricate as modern database technology. A memory system can be as personal as a to-do list or as public as Wikipedia.
The management of organizational data, generally known as data management, requires skills in designing, using, and managing the memory systems of modern organizations. It requires multiple perspectives. Data managers need to see the organization as a social system and to understand data management technology. The integration of these views, the socio-technical perspective, is a prerequisite for successful data management. Today’s organizations are data-driven, and decisions are increasingly based on insights arising from data analytics.
Individuals also need to manage data. You undoubtedly are more familiar with individual memory management systems. They provide a convenient way of introducing some of the key concepts of data management.
Individual data management
As humans, we are well aware of our limited capacity to remember many things. The brain, our internal memory, can get overloaded with too much detail, and its memory decays with time. We store a few things internally: our cell phone number, where we last parked our car, and faces of people we have met recently. We use external memory to keep track of those many things we would like to remember. External memory comes in a variety of forms.
On our smartphones, we have calendars to remind us of meetings and project deadlines. We have a contact app to record the addresses and phone numbers of those we contact frequently. We use to-do lists to remind us of the things we must do today or this week. The interesting thing about these aides-mémoire is that each has a unique way of storing data and supporting its rapid retrieval.
Calendars come in many shapes and forms, but they are all based on the same organizing principle. A set amount of space is allocated for each day of the year, and the spaces are organized in date and time order, which supports rapid retrieval. Some calendars have added features to speed up access. For example, electronic calendars usually have a button to select today’s data.
A calendar
Address books also have a standard format. They typically contain predefined spaces for storing address details (e.g., name, company, phone, and email). Rapid access is supported by a search engine.
An address book
The structure of to-do lists tends to be fairly standard. They are often set up in list format with a small left-hand margin. The idea is to enter each item to be done on the right side of the screen. The left side is used to check or mark those tasks that have been completed. The beauty of the check method is that you can quickly scan the left side to identify incomplete tasks.
A to-do or reminder list
Many people use some form of the individual memory systems just described. They are typically included in the suite of standard applications for a smartphone.
These three examples of individual memory systems illustrate some features common to all data management systems:
There is a storage medium. Data are stored electronically in each case.
There is a structure for storing data. For instance, the address book has labeled spaces for entering pertinent data.
The interface is organized for rapid data entry and retrieval. A calendar is stored in date and time sequence so that the data space for any appointment for a particular day can be found quickly.
The selection of a data management system frequently requires a trade-off decision. In these examples, the trade-off is screen dimensions versus the amount of data that can be seen without scrolling. For example, you will notice the address book sample screen is truncated and will need to be scrolled to see full address details.
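The three features listed above can be sketched in a few lines of Python. This is only an illustration, not how any particular contacts app works: the `Contact` fields follow the address details mentioned earlier, while the `search` function and the sample entries are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """One address-book entry; each field is a labeled space for a detail."""
    name: str
    company: str
    phone: str
    email: str


def search(contacts, text):
    """Return contacts whose name or company contains text (case-insensitive)."""
    needle = text.lower()
    return [c for c in contacts
            if needle in c.name.lower() or needle in c.company.lower()]


# Hypothetical entries, invented for illustration.
address_book = [
    Contact("Ada Jones", "Northwind", "555-0100", "ada@example.com"),
    Contact("Ben Smith", "Contoso", "555-0101", "ben@example.com"),
]

matches = search(address_book, "smith")
print([c.name for c in matches])
```

The list is the storage medium, the dataclass is the structure, and the search function stands in for the interface that supports rapid retrieval.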
❓ Skill builder
Smartphones have dramatically changed individual data management. We now have calendars, address books, to-do lists, and many more apps literally in our hands. What individual data are still difficult to manage? What might be the characteristics of an app for these data?
There are differences between internal and external memories. Our internal memory is small, fast, and convenient (our brain is always with us—well, most of the time). External memory is often slower to reference and not always as convenient. The two systems are interconnected. We rely on our internal memory to access external memory. Our internal memory and our brain’s processing skills manage the use of external memories. For example, we depend on our internal memory to recall how to use our smartphone and its apps. Again, we see some trade-offs. Ideally, we would like to store everything in our fast and convenient internal memory, but its limited capacity means that we are forced to use external memory for many items.
Organizational data management
Organizations, like people, need to remember many things. If you look around any office, you will see examples of the apparatus of organizational memory: people, bookshelves, planning boards, and computers. The same principles found in individual memory systems apply to an organization’s data management systems.
There is a storage medium. In the case of computers, the storage medium varies. Small files might be stored on a USB drive and large, archival files on a magnetic disk. The chapter Data Structure and Storage discusses electronic storage media in more detail.
A table is a common structure for storing data. For example, if we want to store details of customers, we can set up a table with each row containing individual details of a customer and each column containing data on a particular feature (e.g., customer code).
Storage devices are organized for rapid data entry and retrieval. Time is the manager’s enemy: too many things to be done in too little time. Customers expect rapid responses to their questions and quick processing of their transactions. Rapid data access is a key goal of nearly all data management systems, but it always comes at a price. Fast access memories cost more, so there is nearly always a trade-off between access speed and cost.
As you will see, selecting how and where to store organizational data frequently involves a trade-off. Data managers need to know and understand what the compromises entail. They must know the key questions to ask when evaluating choices.
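The customer table described above might look like the following minimal sketch, which uses Python's built-in `sqlite3` module with an in-memory database. The table name, the columns other than the customer code, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Each column holds one feature of a customer; each row is one customer.
conn.execute(
    "CREATE TABLE customer ("
    "customer_code TEXT PRIMARY KEY, name TEXT, city TEXT)"
)

# Hypothetical rows, invented for illustration.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("C001", "Ada Jones", "Helsinki"), ("C002", "Ben Smith", "Atlanta")],
)

# Retrieve one customer's details by customer code.
row = conn.execute(
    "SELECT name, city FROM customer WHERE customer_code = ?", ("C002",)
).fetchone()
print(row)
```

Declaring the customer code as the primary key lets the database find a row quickly, which is one small instance of organizing storage for rapid retrieval.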
When we move from individual to organizational memory, some other factors come into play. To understand them, we need to review the different types of information systems. The automation of routine business transactions was the earliest application of information technology to business. A transaction processing system (TPS) handles common business tasks such as accounting, inventory, purchasing, and sales. The realization that the data collected by these systems could be sorted, summarized, and rearranged gave birth to the notion of a management information system (MIS). Furthermore, it was recognized that when internal data captured by a TPS are combined with appropriate external data, the raw material is available for a decision support system (DSS). Online analytical processing (OLAP), data mining (DM), business intelligence (BI), and machine learning (ML) are techniques for analyzing data captured by business transactions and gathered from other sources (these systems are covered in detail in the chapter on Organizational Intelligence). The purpose of each of these systems is described in the following table, and their interrelationship can be understood by examining the information systems cycle.
Types of information systems
Abbreviation | Type | System’s purpose |
---|---|---|
TPS | Transaction processing system | Collect and store data from routine transactions |
MIS | Management information system | Convert data from a TPS into information for planning, controlling, and managing an organization |
DSS | Decision support system | Support managerial decision making by providing models for processing and analyzing data |
BI | Business intelligence | Gather, store, and analyze data to improve decision making |
OLAP | Online analytical processing | Provide a multidimensional view of data |
DM | Data mining | Use statistical analysis and artificial intelligence techniques to identify hidden relationships in data |
ML | Machine learning | Use software to make decisions or recommendations traditionally made by humans |
The information systems cycle
The various systems and technologies found in an organization are linked in a cycle. The routine ongoing business of the organization is processed by TPSs, the systems that handle the present. Data collected by TPSs are stored in databases, a record of the past, the history of the organization and its interaction with those with whom it conducts business. These data are converted into information by analysts using a variety of software (e.g., a DSS). These technologies are used by the organization to prepare for the future (e.g., sales in Finland have expanded, so we will build a new service center in Helsinki). The business systems created to prepare for the future determine the transactions the company will process and the data that will be collected, and the process continues. The entire cycle is driven by people using technology (e.g., a customer booking a hotel room via a Web browser).
The information systems cycle
Decision making, or preparing for the future, is the central activity of modern organizations. Today’s organizations are busy turning out goods, services, and decisions. Knowledge and information workers, over half of the U.S. labor force, produce the bulk of GDP. Many of these people are decision makers. Their success, and their organization’s as well, depends on the quality of their decisions.
Industrial society is a producer of goods, and the hallmark of success is product quality. Japanese manufacturers convincingly demonstrated that focusing on product quality is the key to market leadership and profitability. The methods and the philosophy of quality gurus, such as W. Edwards Deming, have been internationally recognized and adopted by many providers of goods and services. We are now in the information age, as is evidenced by the key consumer products of the times, such as smartphones, tablets, and wearables. These are all information appliances, and they are supported by a host of information services. Consider how Apple connects its various devices and services through cloud-based systems. For example, a person can buy an electronic book from Apple’s store to read with the iBooks app on an iPhone or iPad.
In the information society, which is based on innovation, knowledge, and services, the key determinant of success has shifted from product quality to decision quality. In the turbulent environment of global business, successful organizations are those able to quickly make high-quality decisions about what customers will buy, how much they will pay, and how to deliver a high-quality experience with a minimum of fuss. Companies are very dependent on information systems to create value for their customers.
Desirable attributes of data
Once we realize the critical importance of data to organizations, we can recognize some desirable attributes of data.
Desirable attributes of data
Attribute | Description |
---|---|
Shareable | Readily accessed by more than one person at a time |
Transportable | Easily moved to a decision maker |
Secure | Protected from destruction and unauthorized use |
Accurate | Reliable, precise records |
Timely | Current and up-to-date |
Relevant | Appropriate to the decision |
Transportable
Data should be transportable from their storage location to the decision maker. Technologies that transport data have a long history. Homing pigeons were used to relay messages by the Egyptians and Persians 3,000 years ago. The telephone revolutionized business and social life because it rapidly transmitted voice data. Computers have changed the nature of many aspects of business because they enable the transport of text, visuals, voice, and video.
Today, transportability is more than just getting data to a decision maker’s desk. It means getting product availability data to a salesperson in a client’s office or advising a delivery driver, en route, of address details for an urgent parcel pickup. The general notion is that decision makers should have access to relevant data whenever and wherever required, although many organizations are still some way from fully achieving this goal.
Secure
In an information society, organizations value data as a resource. As you have already learned, data support day-to-day business transactions and decision making. Because the forgetful organization will soon be out of business, organizations are very vigilant in protecting their data. There are a number of actions that organizations take to protect data against loss, sabotage, and theft. A common approach is to duplicate data and store the copy, or copies, at other locations. This technique is popular for data stored in computer systems. Access to data is often restricted through the use of physical barriers (e.g., a vault) or electronic barriers (e.g., a password). Another approach, which is popular with firms that employ knowledge workers, is a noncompete contract. For example, some software companies legally restrain computer programmers from working for a competitor for two years after they leave, hoping to prevent the transfer of valuable data, in the form of the programmer’s knowledge of software, to competitors.
Accurate
You probably recall friends who excel in exams because of their good memories. Similarly, organizations with an accurate memory will do better than their less precise competitors. Organizations need to remember many details precisely. For example, an airline needs accurate data to predict the demand for each of its flights. The quality of decision making will drop dramatically if managers use a data management system riddled with errors.
Polluted data threaten a firm’s profitability. One study suggests that missing, wrong, and otherwise bad data cost U.S. firms billions of dollars annually. The consequences of bad data include improper billing, cost overruns, delivery delays, and product recalls. Because data accuracy is so critical, organizations need to be watchful when capturing data, the point at which data accuracy is most vulnerable.
Timely
The value of a collection of data is often determined by its age. You can fantasize about how rich you would be if you knew tomorrow’s stock prices. Although decision makers are most interested in current data, the required currency of data can vary with the task. Operational managers often want real-time data. They want to tap the pulse of the production line so that they can react quickly to machine breakdowns or quality slippages. In contrast, strategic planners might be content with data that are months old because they are more concerned with detecting long-term trends.
Relevant
Organizations must maintain data that are relevant to transaction processing and decision making. In processing a credit card application, the most relevant data might be the customer’s credit history, current employment status, and income level. Hair color would be irrelevant. When assessing the success of a new product line, a marketing manager probably wants an aggregate report of sales by marketing region. A voluminous report detailing every sale would be irrelevant. Data are relevant when they pertain directly to the decision and are aggregated appropriately.
Relevance is a key concern in designing a data management system. Clients have to decide what should be stored because it is pertinent now or could have future relevance. Of course, identifying data that might be relevant in the future is difficult, and there is a tendency to accumulate too much. Relevance is also an important consideration when extracting and processing data from a data management system. Provided the germane data are available, query languages can be used to aggregate data appropriately.
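As a rough illustration of aggregating data with a query language, the sketch below uses SQL through Python's built-in `sqlite3` module to turn individual sales into a summary by marketing region. The `sale` table, its columns, and the sample rows are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, product TEXT, amount REAL)")

# Hypothetical detailed sales records, one row per sale.
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?)",
    [
        ("North", "Compass", 10.00),
        ("North", "Pocket knife", 4.50),
        ("South", "Compass", 10.00),
        ("South", "Map measure", 4.95),
        ("South", "Compass", 10.00),
    ],
)

# Aggregate the detail into one line per region for the marketing manager.
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) "
    "FROM sale GROUP BY region ORDER BY region"
).fetchall()

for region, sales_count, total in report:
    print(f"{region}: {sales_count} sales, total {total:.2f}")
```

The detailed rows remain available for transaction processing, while the grouped query supplies the aggregate view that suits the decision.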
In the final years of the twentieth century, organizations started to share much of their data, both high and low volatility, via the Web. This move increased shareability, timeliness, and availability, and it has lowered the cost of distributing data.
In summary, a data management system for maintaining an organization’s memory supports transaction processing, remembering the past, and decision making. Its contents must be shareable, secure, and accurate. Ideally, the clients of a data management system must be able to get timely and relevant data when and where required. A major challenge for data management professionals is to create data management systems that meet these criteria. Unfortunately, some existing systems fail in this regard, though we can understand some of the reasons why by reviewing the components of existing organizational memory systems.
Components of organizational memory
An organization’s memory resides on various media in a variety of ways. It is in people’s minds, standard operating procedures, roles, organizational culture, physical storage equipment, and electronic devices. It is scattered around the organization like pieces of a jigsaw puzzle designed by a berserk artist. The pieces don’t fit together, they sometimes overlap, there are gaps, and there are no edge pieces to define the boundaries. Organizations struggle to design structures and use data management technology to link some of the pieces. To understand the complexity of this wicked puzzle, we need to examine some of the pieces. Data managers have a particular need to understand the different forms of organizational memory because their activities often influence a number of the components.
Components of organizational memory
People
People are the linchpin of an organization’s memory. They recall prior decisions and business actions. They create, maintain, evolve, and use data management systems. They are the major component of an organization’s memory because they know how to use many of the other components. People extract data from the various elements of organizational memory to provide as complete a picture of a situation as possible.
Each person in an organization has a role and a position in the hierarchy. Role and position are both devices for remembering how the organization functions and how to process data. By labeling people (e.g., Chief Information Officer) and placing their names on an organizational chart, the organization creates another form of organizational memory.
Organizational culture is the shared beliefs, values, attitudes, and norms that influence the behavior and expectations of each person in an organization. As a long-lived and stable memory system, culture establishes acceptable behavior and influences decision making.
People develop skills for doing their particular job: learning what to do, how to do it, and who can help them get things done. For example, they might discover someone in employee benefits who can handle personnel problems or a contact in a software company who can answer questions promptly. This social capital, which often takes years to develop, is used to make things happen and to learn about the business environment. Despite its high value, social capital is rarely documented, at least not beyond an address book, and much is typically lost when a person leaves an organization.
Conversations are an important method for knowledge workers to create, modify, and share organizational memory and to build relationships and social capital. Discussions with customers are a key device for learning how to improve an organization’s products and services and learning about competitors. The conversational company can detect change faster and react more rapidly. The telephone, instant message, e-mail, coffee machine, cocktail hour, and cafeteria are all devices for promoting conversation and creating social networks. Some firms deliberately create structures for supporting dialog to make the people component of organizational memory more effective.
Standard operating procedures exist for many organizational tasks. Processing a credit application, selecting a marketing trainee, and preparing a departmental budget are typical procedures that are clearly defined by many organizations. They are described on Web pages, computer programs, and job specifications. They are the way an organization remembers how to perform routine activities. This organizational capital is a critical resource.
Successful people learn how to use organizational memory. They learn what data are stored where, how to retrieve them, and how to put them together. In promoting a new product, a salesperson might send the prospect a package containing some brochures and an email of a product review in a trade journal, and supply the phone number and e-mail address of the firm’s technical expert for that product. People’s recall of how to use organizational memory is the core component of organizational memory. Academics call this metamemory; people in business call it learning the ropes. New employees spend a great deal of time building their metamemory so that they can use organizational memory effectively. Without this knowledge, organizational memory has little value.
Tables
A table is a common form of storing organizational data. The following table shows a price list in tabular form. Often, the first row defines the meaning of data in subsequent rows.
A price list
Product | Price |
---|---|
Pocket knife | 4.50 |
Compass | 10.00 |
Geopositioning system | 100.00 |
Map measure | 4.95 |
A table is a general form that describes a variety of other structures used to store data. Computer-based files are tables or can be transformed into tables; the same is true for general ledgers, worksheets, and spreadsheets. Accounting systems make frequent use of tables. As you will discover in the next section, the table is the central structure of the relational database model.
Data stored in tables typically have certain characteristics:
Data in one column are of the same type. For example, each cell of the column headed “Price” contains a number. (Of course, the exception is the first row of each column, which contains the title of the column.)
Data are limited by the width of the available space.
Rapid searching is one of the prime advantages of a table. For example, if the price list is sorted by product name, you can quickly find the price of any product.
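To see why sorting supports rapid searching, consider this minimal Python sketch, which keeps the price list above in product-name order and looks up a price with a binary search from the standard `bisect` module. The function name `find_price` is an assumption made for the example.

```python
from bisect import bisect_left

# The price list from the table above, sorted by product name.
price_list = sorted([
    ("Pocket knife", 4.50),
    ("Compass", 10.00),
    ("Geopositioning system", 100.00),
    ("Map measure", 4.95),
])
names = [name for name, _ in price_list]


def find_price(product):
    """Binary-search the sorted names; return the price, or None if absent."""
    i = bisect_left(names, product)
    if i < len(names) and names[i] == product:
        return price_list[i][1]
    return None


print(find_price("Map measure"))
print(find_price("Tent"))
```

A binary search halves the remaining rows at each step, so even a very long sorted list needs only a handful of comparisons.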
Tables are a common form of storing organizational data because their structure is readily understood. People learn to read and build tables in the early years of their schooling. Also, a great deal of the data that organizations want to remember can be stored in tabular form.
Documents
A document—of which reports, manuals, brochures, and memos are examples—is a common medium for storing organizational data. Although documents may be subdivided into sections, chapters, paragraphs, and sentences, they lack the regularity and discipline of a table. Each row of a table has the same number of columns, but each paragraph of a document does not have the same number of sentences.
Most documents are now stored electronically. Because of the widespread use of word processing, text files are a common means of storing documents. Typically, such files are read sequentially like a book. Although there is support for limited searching of the text, such as finding the next occurrence of a specified text string, text files are usually processed linearly.
Hypertext, the familiar linking technology of the Web, supports nonlinear document processing. A hypertext document has built-in linkages between sections of text that permit the reader to jump quickly from one part to another, or to a different document. As a result, readers can find data they require more rapidly.
Although hypertext is certainly more reader-friendly than a flat, sequential text file, it takes time and expertise to establish the links between the various parts of the text and to other documents. Someone familiar with the topic has to decide what should be linked and then establish these links. While it takes the author more time to prepare a document this way, the payoff is the speed at which readers of the document can find what they want. A diligent author can save a great deal of time for many readers.
Multimedia
Many Web sites display multimedia objects, such as sound and video clips. Automotive company Web sites have video clips of cars, music outlets provide sound clips of new releases, and clothing companies have online catalogs displaying photos of their latest products. Maintaining a Web site, because of the many multimedia objects that some sites contain, has become a significant data management activity for some organizations. Consider the different types of data that a news outfit such as the Australian Broadcasting Corporation has to store to provide a timely, informative, and engaging Web site.
Images
Images are visual data: photographs and sketches. Image banks are maintained for several reasons. First, images are widely used for identification and security. Police departments keep fingerprints and mug shots. Second, images are used as evidence. Highly valuable items such as paintings and jewelry often are photographed for insurance records. Third, images are used for advertising and promotional campaigns, and organizations need to maintain records of material used in these ventures. Image archiving and retrieval are essential for online retailers. Fourth, some organizations specialize in selling images and maintain extensive libraries of clip art and photographs (e.g., Getty Images).
Graphics
Maps and engineering drawings are examples of electronically stored graphics. An organization might maintain a map of sales territories and customers. Manufacturers have extensive libraries of engineering drawings that define the products they produce. Graphics often contain a high level of detail. An engineering drawing will define the dimensions of all parts and may refer to other drawings for finer detail about any components.
A graphic differs from an image in that it contains explicitly embedded data. Consider the difference between an engineering plan for a widget and a photograph of the same item. An engineering plan shows dimensional data and may describe the composition of the various components. The embedded data are used to manufacture the widget. A photograph of a widget does not have embedded data and contains insufficient data to manufacture the product. An industrial spy will receive far more for an engineering plan than for a photograph of a widget.
A geographic information system (GIS) is a specialized graphical storage system for geographic data. The underlying structure of a GIS is a map on which data are displayed. A power company can use a GIS to store and display data about its electricity grid and the location of transformers. Using a pointing device such as a mouse, an engineer can click on a transformer’s location to display a window of data about the transformer (e.g., type, capacity, installation date, and repair history). GISs have found widespread use in governments and organizations that have geographically dispersed resources.
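The click-to-display behavior can be sketched as a lookup of the stored object nearest to the clicked map position. The sketch below is a simplification under stated assumptions: coordinates are flat map units rather than latitude and longitude, and the `Transformer` fields, the `transformer_at` function, and the sample data are all invented for illustration. A real GIS would use a spatial index rather than scanning every object.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Transformer:
    x: float            # map coordinates in arbitrary units (assumed)
    y: float
    kind: str
    capacity_kva: int
    installed: str


def transformer_at(click_x, click_y, transformers, tolerance=1.0):
    """Return the transformer nearest the click, or None if none is close."""
    nearest = min(
        transformers,
        key=lambda t: hypot(t.x - click_x, t.y - click_y),
        default=None,
    )
    if nearest is None:
        return None
    if hypot(nearest.x - click_x, nearest.y - click_y) > tolerance:
        return None
    return nearest


# Hypothetical grid data.
grid = [
    Transformer(2.0, 3.0, "pole-mounted", 50, "2015-06-01"),
    Transformer(8.0, 1.0, "pad-mounted", 500, "2020-03-15"),
]

print(transformer_at(2.2, 3.1, grid))
```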
Audio
News organizations, such as National Public Radio (NPR) in the U.S., provide audio versions of their news stories for replay. Some firms conduct a great deal of their business by phone. In many cases, it is important to maintain a record of the conversation between the customer and the firm’s representative. The Royal Hong Kong Jockey Club, which covers horse racing gambling in Hong Kong, records all conversations between its operators and customers. Phone calls are stored on a highly specialized voice recorder, which records the time of the call and other data necessary for rapid retrieval. In the case of a customer dispute, an operator can play back the original conversation.
Video
A video clip can give a potential customer additional detail that cannot be readily conveyed by text or a still image. Consequently, some auto companies use video and virtual reality to promote their cars. On a visit to Rivian’s Web site, you can view video clips of the latest models in action.
Models
Organizations build mathematical models to describe their business. These models, usually placed in the broader category of DSS, are then used to analyze existing problems and forecast future business conditions. A mathematical model can often produce substantial benefits to the organization. Some of these models are now so detailed that they are a digital twin of a product or organization and support extensive exploration of possible product or business scenarios.
Machine learning can be used to create decision making models when an organization has a large set of data of prior decisions and their associated factors. The system learns by iteratively fitting a large number of mathematical equations to the data so that the model generated accurately predicts prior decisions.
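The idea of learning from prior decisions can be illustrated on a very small scale. The sketch below fits a single equation, a logistic regression trained by gradient descent, to six hypothetical loan decisions. It is one simple technique chosen for illustration, not a description of any particular organization's system, and the data, feature choices, and function names are all assumptions.

```python
import math

# Hypothetical prior decisions: (income in $100k, debt ratio) -> approved?
history = [
    ((0.2, 0.9), 0), ((0.3, 0.7), 0), ((0.4, 0.8), 0),
    ((0.6, 0.3), 1), ((0.8, 0.2), 1), ((0.9, 0.4), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(data, rate=1.0, epochs=5000):
    """Fit logistic-regression weights by repeated gradient-descent steps."""
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in data:
            # Difference between the predicted probability and the decision.
            error = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += error * x1
            g2 += error * x2
            gb += error
        # Move each weight a small step against its average gradient.
        w1 -= rate * g1 / n
        w2 -= rate * g2 / n
        b -= rate * gb / n
    return w1, w2, b


def predict(model, x1, x2):
    """Return 1 (approve) if the fitted probability is at least 0.5."""
    w1, w2, b = model
    return 1 if sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5 else 0


model = fit(history)
correct = sum(predict(model, *x) == y for x, y in history)
print(f"{correct} of {len(history)} prior decisions reproduced")
```

Each pass adjusts the weights slightly so that the model's output moves closer to the recorded decisions, which is the iterative fitting the paragraph describes, reduced here to one equation and a handful of records.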
Knowledge
Organizations build systems to capture the knowledge of their experienced decision makers and problem solvers. This expertise is typically represented as a set of rules, semantic nets, and frames in a knowledge base, another form of organizational memory.
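A rule-based knowledge base can be pictured as an ordered list of condition and conclusion pairs. The following is a minimal sketch in Python; the credit rules, thresholds, and the `advise` function are hypothetical and far simpler than the rules an experienced decision maker would supply.

```python
# Each rule pairs a condition on the facts with a conclusion.
# All rules and thresholds are hypothetical.
rules = [
    (lambda f: f["credit_history"] == "poor", "decline"),
    (lambda f: f["income"] >= 50000 and f["employed"], "approve"),
    (lambda f: True, "refer to a credit officer"),  # default rule
]


def advise(facts):
    """Return the conclusion of the first rule whose condition holds."""
    for condition, conclusion in rules:
        if condition(facts):
            return conclusion
    return None


applicant = {"credit_history": "good", "income": 60000, "employed": True}
print(advise(applicant))
```

Because the rules are stored as data rather than buried in program logic, an expert's knowledge can be reviewed and revised without rewriting the system that applies it.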
Decisions
Decision making is the central activity of modern organizations. Very few organizations, however, have a formal system for recording decisions. Most keep the minutes of meetings, but these are often very brief and record only a meeting’s outcome. Because they do not record details such as the objectives, criteria, assumptions, and alternatives that were considered prior to making a decision, there is no formal audit trail for decision making. As a result, most organizations rely on humans to remember the circumstances and details of prior decisions.
Components of organizational memory
Organizations are not limited to their own memory stores. There are firms whose business is to store data for resale to other organizations. Such businesses have existed for many years and are growing as the importance of data in a postindustrial society expands. U.S. lawyers can use document management services to access the laws and court decisions of all 50 American states and the U.S. federal government. Similar legal data services exist in many other nations. There is a range of other services that provide news, financial, business, scientific, and medical data.
Problems with data management systems
Successful management of data is a critical skill for nearly every organization. Yet few have gained complete mastery, and there are a variety of problems that typically afflict data management in most firms.
Problems with organizational data management systems
Problem | Examples |
---|---|
Redundancy | Same data are stored in different systems |
Lack of data control | Data are poorly managed |
Poor interface | Data are difficult to access |
Delays | There are frequently delays following requests for reports |
Lack of reality | Data management systems do not reflect the complexity of the real world |
Lack of data integration | Data are dispersed across different systems |
Redundancy
In many cases, data management systems have grown haphazardly. As a result, it is often the situation that the same data are stored in several different memories. The classic example is a customer’s address, which might be stored in the sales reporting system, accounts receivable system, and a salesperson’s address book. The danger is that when the customer changes address, the alteration is not recorded in all systems. Data redundancy causes additional work because the same item must be entered several times. Redundancy causes confusion when what is supposedly the same item has different values.
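The address example can be sketched in a few lines of Python. The three dictionaries below stand in for the three separate systems. The customer identifier, the addresses, and the function `conflicting_values` are assumptions made purely for illustration.

```python
# Three hypothetical systems, each holding its own copy of a customer's address.
sales_reporting = {"C001": "12 Oak St"}
accounts_receivable = {"C001": "12 Oak St"}
salesperson_address_book = {"C001": "12 Oak St"}


def conflicting_values(customer_id, *stores):
    """Return the distinct values the stores hold for one customer."""
    return {store[customer_id] for store in stores if customer_id in store}


# The customer moves, but only one system records the change.
accounts_receivable["C001"] = "98 Elm Ave"

values = conflicting_values(
    "C001", sales_reporting, accounts_receivable, salesperson_address_book
)
print(sorted(values))
```

After the single update, the systems hold two different addresses for what is supposedly the same item. Had the address been stored once and shared, one update would have been enough.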
Lack of data control
Allied with the redundancy problem is poor data control. Although data are an important organizational resource, they frequently do not receive the same degree of management attention as other important organizational resources, such as people and money. Organizations have a personnel department to manage human resources and a treasury to handle cash. The IS department looks after data captured by the computer systems it operates, but there are many other data stores scattered around the organization. Data are stored everywhere in the organization (e.g., on tablets and the cloud), but there is a general lack of data management. This lack is particularly surprising, since many pundits claim that data are a key competitive resource.
Poor interface
Too frequently, the potential clients of data management systems are deterred by an unfriendly interface. The computer interface for accessing a data store is sometimes difficult to remember for the occasional inquirer. People become frustrated and give up because their queries are rejected and error messages are unintelligible.
Delays
Globalization and technology have accelerated the pace of business in recent years. Managers must make more decisions more rapidly. They cannot afford to wait for programmers to write special-purpose programs to retrieve data and format reports. They expect their questions to be answered rapidly, often within an hour and sometimes more quickly. Managers, or their support personnel, need query languages that provide rapid access to the data they need, in a format that they want.
Lack of reality
Organizational data stores must reflect the reality and complexity of the real world. Consider a typical bank customer who might have a personal checking account, mortgage account, credit card account, and some certificates of deposit. When a customer requests an overdraft extension, the bank officer needs full details of the customer’s relationship with the bank to make an informed decision. If customer data are scattered across unrelated data stores, then these data are not easily found, and in some cases important data might be overlooked. The request for full customer details is reasonable and realistic, and the bank officer should expect to be able to enter a single query to obtain it. Unfortunately, this is not always the case, because data management systems do not always reflect reality.
In this example, the reality is that the personal checking, mortgage, and credit card accounts, and certificates of deposit all belong to one customer. If the bank’s data management system does not record this relationship, then it does not mimic reality. This might make it impossible to retrieve a single customer’s data with a single query.
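A minimal sketch of a data store that does record this relationship follows, using Python's built-in `sqlite3` module. The table names, column names, and sample balances are assumptions invented for the example and do not describe any bank's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Every account row records which customer owns it.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    kind        TEXT NOT NULL,
    balance     REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'A. Customer');
INSERT INTO account VALUES
    (10, 1, 'checking', 1200.00),
    (11, 1, 'mortgage', -250000.00),
    (12, 1, 'credit card', -830.50),
    (13, 1, 'certificate of deposit', 5000.00);
""")


def customer_accounts(conn, customer_id):
    """Retrieve all of one customer's accounts with a single query."""
    rows = conn.execute(
        "SELECT a.kind, a.balance "
        "FROM account AS a "
        "JOIN customer AS c ON c.customer_id = a.customer_id "
        "WHERE c.customer_id = ? "
        "ORDER BY a.account_id",
        (customer_id,),
    )
    return rows.fetchall()


accounts = customer_accounts(conn, 1)
for kind, balance in accounts:
    print(kind, balance)
```

Because each account row carries the identifier of its owner, the bank officer's request becomes one query. If the four account types lived in unrelated stores with no shared customer identifier, no single query could assemble the same answer.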
A data management system must meet the decision making needs of managers, who must be able to request both routine and ad hoc reports. To do so effectively, a data management system must reflect the complexity of the real world. If it does not store required organizational data or record a real-world relationship between data elements, then some managerial queries might not be answerable quickly.
Lack of data integration
There is a general lack of data integration in most organizations. Not only are data dispersed in different forms of organizational memory (e.g., files and image stores), but many organizations also maintain file systems that are not interconnected. Appropriate files in the accounting system might not be linked to the production system.
This lack of integration will be a continuing problem for most organizations for two important reasons. First, earlier computer systems might not have been integrated because of the limitations of available technology. Organizations created simple file systems to support a particular function. Many of these legacy systems are still in use. Second, integration is a long-term goal. As new systems are developed and old ones rewritten, organizations can evolve integrated systems. It is likely too costly and disruptive to try to solve the data integration problem in one or a few steps.
Many data management problems can be solved with present technology. Data modeling and relational database technology, topics covered in Section 2, help overcome many of the current problems.
A brief history of data management systems
Data management is not a new organizational concern. It is an old problem that has become more significant because of the emergence of data as a critical resource for effective performance in the modern economy. Organizations have always needed to manage their data so that they could remember a wide variety of facts necessary to conduct their affairs. The recent history of computer-based data management systems is depicted in the following figure.
File systems were the earliest form of data management. Because magnetic tape technology is sequential, it was very difficult to integrate data from different files. The advent of magnetic disk technology in the mid-1950s stimulated development of integrated file systems, and the hierarchical database management system (DBMS) emerged in the 1960s, followed some years later by the network DBMS. The spatial database, or geographic information system (GIS), appeared around 1970. Until the mid-1990s, the hierarchical DBMS, mainly in the form of IBM’s DL/I product, was the predominant technology for managing data. It was replaced by the relational DBMS, a concept first discussed by Edgar Frank Codd in an academic paper in 1970 but not commercially available until the mid-1970s. In the late 1980s, the notion of an object-oriented DBMS, primed by the ideas of object-oriented programming, emerged as a solution to situations not handled well by the relational DBMS. Also around this time, the idea of modeling a database as a graph resulted in graph database technology. Towards the end of the 20th century, XML was developed for exchanging data between computers, and it can also be used as a data store, as you will learn in Section 3. More recently, distributed file systems, such as Hadoop, and blockchain have emerged as alternative models for data management. Other recent data management systems include NoSQL (Not Only SQL) databases. While these are beyond the scope of an introductory data management text, if you decide to pursue a career in data management you should learn about their advantages and the applications to which they are well-suited. For example, graph databases are a good fit for the analysis of social networks.
This book concentrates on the relational model, currently the most widely used data management system. In 2022, the relational database market generated over USD50 billion in revenue.6 As mentioned, Section 2 is devoted to the development of the necessary skills for designing and using a relational database.
Data, information, and knowledge
Often the terms data and information are used interchangeably, but they are distinctly different. Data are raw, unsummarized, and unanalyzed facts. Information is data that have been processed into a meaningful form.
A list of a supermarket’s daily receipts is data, but it is not information, because it is too detailed to be very useful for decision making. A summary of the data that gives daily departmental totals is information, because the store manager can use the report to monitor store performance. The same report might be data for a regional manager, because it is too detailed for meaningful decision making at the regional level. Information for a regional manager might be a weekly report of sales by department for each supermarket.
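The conversion of data into information can be sketched in Python. The receipts and department names below are invented for illustration, and `departmental_totals` is a hypothetical helper rather than part of any reporting system.

```python
from collections import defaultdict

# Data: a hypothetical list of individual receipts (department, amount).
RECEIPTS = [
    ("produce", 4.50),
    ("bakery", 3.25),
    ("produce", 7.00),
    ("dairy", 2.75),
    ("bakery", 6.00),
]


def departmental_totals(receipts):
    """Summarize raw receipts into one total per department."""
    totals = defaultdict(float)
    for department, amount in receipts:
        totals[department] += amount
    return dict(totals)


# Information: the summary a store manager can use to monitor performance.
daily_report = departmental_totals(RECEIPTS)
print(daily_report)
```

The five receipts collapse into three departmental totals (11.50 for produce, 9.25 for bakery, and 2.75 for dairy). The underlying facts are unchanged. Only the processing makes them meaningful for the store manager.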
Data are always data, but one person’s information can be another person’s data. Information that is meaningful to one person can be too detailed for another. A manager’s notion of information can change quickly, however. When a problem is identified, a manager might request finer levels of detail to diagnose the problem’s cause. Thus, what was previously data suddenly becomes information because it helps solve the problem. There is a need for information systems that let managers customize the processing of data so that they always get information. As their needs change, they need to be able to adjust the detail of the reports they receive.
Knowledge is the capacity to use information. The education and experience that managers accumulate provide them with the expertise to make sense of the information they receive. Knowledge means that managers can interpret information and use it in decision making. In addition, knowledge is the capacity to recognize what information would be useful for making decisions. For example, a sales manager might know that requesting a report of profitability by product line is useful when they have to decide whether to employ a new product manager. Thus, when a new information system is delivered, managers need to be taught what information the system can deliver, what that information means, and how it might be used.
The relationship between data, information, and knowledge is depicted in the following figure. A knowledgeable person requests information to support decision making. To fulfill the request, data are converted into information. Personal knowledge is then applied to interpret the requested information and reach a conclusion. Of course, the cycle can be repeated several times if more information is needed before a decision can be made. Notice how knowledge is essential for grasping what information to request and interpreting that information in terms of the required decision.
The relationship between data, information, and knowledge
The challenge
A major challenge facing organizations is to make effective use of the data currently stored in their diverse data management systems. This challenge exists because these various systems are not integrated and many potential clients not only lack the training to access the systems but often are unaware of what data exist. Before data managers can begin to address this problem, however, they must understand how organizational memories are used. In particular, they need to understand the relationship between information and managerial decision making. Data management is not a new problem. It has existed since the early days of civilization and will be an enduring problem for organizations and societies.
Summary
Organizations must maintain a memory to process transactions and make decisions. Organizational data should be shareable, transportable, secure, and accurate, and provide timely, relevant information. The essential components are people (the most important), text, multimedia data, models, and knowledge. A wide variety of technologies can be used to manage data. External memories enlarge the range of data available to an organization. Data management systems often have some major shortcomings: redundancy, poor data control, poor interfaces, long lead times for query resolution, an inability to supply answers for questions posed by managers, and poor data integration. Data are raw facts; information is data processed into a meaningful form. Knowledge is the capacity to use information.
Key terms and concepts | |
---|---|
Data | Internal memory |
Database management system (DBMS) | Knowledge |
Data management | Machine learning (ML) |
Data mining (DM) | Management information system (MIS) |
Data security | Metamemory |
Decision making | Online analytical processing (OLAP) |
Decision quality | Organizational culture |
Decision support system (DSS) | Organizational memory |
Digital twin | Standard operating procedures |
External memory | Tables |
Geographic information system (GIS) | Transaction processing system (TPS) |
Information |
References and additional readings
Davenport, T. H. (1998). Putting the enterprise into the enterprise system. Harvard Business Review, 76(4), 121-131.
Watson, R. T. (2020). Capital, Systems and Objects: The Foundation and Future of Organizations. Singapore: Springer.
Exercises
What are the major differences between internal and external memory?
What is the difference between the things you remember and the things you record on your computer?
What features are common to most individual memory systems?
What do you think organizations did before computers were invented?
Discuss the memory systems you use. How do they improve your performance? What are the shortcomings of your existing systems? How could you improve them?
Describe the most “organized” person you know. Why is that person so organized? Why haven’t you adopted some of the same methods? Why do you think people differ in the extent to which they are organized?
Think about the last time you enrolled in a class. What data do you think were recorded for this transaction?
What roles do people play in organizational memory?
What do you think is the most important attribute of organizational memory? Justify your answer.
What is the difference between transaction processing and decision making?
When are data relevant?
Give some examples of specialized memories.
How can you measure the quality of a decision?
What is organizational culture? Can you name some organizations that have a distinctive culture?
What is hypertext? How does it differ from linear text? Why might hypertext be useful in an organizational memory system?
What is imaging? What are the characteristics of applications well suited for imaging?
What is an external memory? Why do organizations use external memories instead of building internal memories?
What is the common name used to refer to systems that help organizations remember knowledge?
What is a DSS? What is its role in organizational memory?
What are the major shortcomings of many data management systems? Which do you think is the most significant shortcoming?
What is the relationship between data, information, and knowledge?
Estimate how much data Netflix requires to store its many movies.
Using the Web, find some stories about firms using data management systems. You might enter keywords such as “database” and “business analytics” and access the sites of publications such as Computerworld. Identify the purpose of each system. How does the system improve organizational productivity? What are the attributes of the technology that make it useful? Describe any trade-offs the organization might have made. Identify other organizations in which the same technology might be applied.
Make a list of the organizational memory systems identified in this chapter. Interview several people working in organizations. Ask them to indicate which organizational memory systems they use. Ask which system is most important and why. Write up your findings and your conclusion.