SMART DATA QUALITY
Introduction:
The poor quality of organizational data is a growing concern, one that affects all industries. In health care, this issue is particularly salient. Data are no longer used only for internal purposes; healthcare organizations now depend on externally generated data for clinical decision making, quality assurance, cost and utilization analysis, and benchmarking.

How serious and widespread is the problem of poor or incomplete healthcare data? Merida L. Johns and data quality expert Thomas C. Redman cite studies in which 20% to 80% of patient charts in healthcare institutions were missing crucial data such as complete physician notes. Johns cites a 1988 study by Hsia et al. documenting error rates of more than 20% in the diagnosis and coding of patient medical records. Such poor data quality can cost an organization time, money, and the trust of clients or patients. For example, Redman notes that typical error rates are 1% to 5%, which translates to 10% of the revenue of a medium to large organization, or 40% to 60% of service organization expenses. Larry English cites Joseph Juran's work indicating that poor quality costs an organization 20% to 40% of sales because of customer complaints, rework, and throwaway work.

By contrast, high-quality data can reduce organizational costs and make positive contributions to the bottom line. It can eliminate redundant files, databases, and processes; reduce the need for corrections and rework; and provide customers, clinicians, and senior leadership with accurate, timely, and meaningful information. Now that information is considered an enterprise resource that enables rapid, significant, and strategic decisions, data quality is being recognized as a key to success. The increasing number of data warehousing projects in healthcare organizations is also contributing to this heightened awareness of data quality. According to Merida Johns and Ken Orr, data warehousing projects are the primary reason managers have started to focus attention on data quality (Johns 1997, p. 194; Orr 1998, p. 69). Many recent data warehousing projects have failed, not because of the quality of the coding or table structures, but because of the poor quality and reliability of data sources.

In 1998, the Veterans Health Administration (VHA) identified a need to examine the quality of the data used for decision making and initiated a data quality project in response. The following case study discusses the key issues that generated interest in the project; the Data Quality Summit that marked its launch; the roles, responsibilities, and action items of the project's working groups; and plans for the future. This case study should be reviewed by any healthcare professional who recognizes the increasing need for improved data quality.

Background:
The Department of Veterans Affairs (VA) was established on March 15, 1989, with Cabinet rank, succeeding the Veterans Administration and assuming responsibility for providing federal benefits to veterans and their dependents. Headed by the Secretary of Veterans Affairs, the VA is the second largest of the 14 Cabinet departments and operates nationwide programs of health care, assistance services, and cemeteries through three organizations: the Veterans Health Administration (VHA), the Veterans Benefits Administration (VBA), and the National Cemetery System (NCS). Led by the Under Secretary for Health, the VHA is the VA's healthcare system and the largest integrated delivery system in the United States.
It includes 172 medical centers, approximately 551 ambulatory care and community-based outpatient clinics, 131 nursing homes, 40 domiciliaries, and 73 comprehensive home care programs. VA healthcare facilities provide a broad spectrum of medical, surgical, and rehabilitative care. In fiscal year 1999, the VA expects to treat approximately 750,000 patients in its hospitals, 106,000 in nursing homes, and 25,000 in domiciliaries. The VA's outpatient clinics expect to register approximately 35.8 million visits in fiscal year 1999. Nearly 3.6 million individuals will receive care in VA healthcare facilities in 1999. The VA also conducts an array of research activities concentrating on some of the most challenging issues facing medical science today, such as aging.
The 5 Characteristics of Data Quality:
Organizations are often more concerned with getting data, having the most sophisticated tools to execute an ETL/ELT pipeline faster, and having a top-notch dashboard than with whether the data they are using is usable data. The keyword here is usable. Imagine making decisions based on a dashboard built on old, inaccurate, and irrelevant data; how dangerous would that be to a business? Usability can be assessed through the 5 characteristics of data quality: accuracy, completeness, reliability, relevance, and timeliness.
1. Accuracy
Information accuracy is determined by checking whether the information is correct in every detail. To determine whether data is accurate, ask whether it reflects a real-world situation. For example, someone owns 4 insurance policies for his 4 Ferraris. Is this accurate data? Is it possible for someone to own 4 Ferraris? Of course it is possible, but because we are talking about Ferraris, this could also be a duplicate-data issue, which can lead to many problems for all parties involved, such as inflated numbers or false results during analysis. So, why is accuracy so important? Inaccurate data can cause significant damage, with severe consequences: costly mistakes, lost productivity, and poor business decisions.
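To make the Ferrari example concrete, here is a minimal sketch in Python of one common accuracy check: flagging records that may describe the same real-world thing twice. The record layout (policy_id, owner, vehicle_vin) and the data are made up for illustration; a real check would key on whatever uniquely identifies an entity in your domain.

```python
# Minimal sketch: flag likely duplicate insurance-policy records.
# The field names and sample data are hypothetical.
from collections import Counter

policies = [
    {"policy_id": "P-001", "owner": "A. Rossi", "vehicle_vin": "ZFF123"},
    {"policy_id": "P-002", "owner": "A. Rossi", "vehicle_vin": "ZFF123"},  # same car twice
    {"policy_id": "P-003", "owner": "A. Rossi", "vehicle_vin": "ZFF456"},
]

# Two policies covering the same (owner, VIN) pair are suspicious:
# they may be one real-world policy recorded twice.
counts = Counter((p["owner"], p["vehicle_vin"]) for p in policies)
duplicates = {key: n for key, n in counts.items() if n > 1}
print(duplicates)  # {('A. Rossi', 'ZFF123'): 2}
```

Records flagged this way still need human review; a collector really might insure the same VIN under two policies. The check surfaces candidates, it does not decide.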
2. Completeness
Completeness is represented by how comprehensive the information is. Think about whether all the data you need is available. Imagine a chart that is supposed to show the number of inhabitants per country, but the chart includes only the number of inhabitants. Without the country information, the chart is useless; the data is incomplete and not comprehensive. So, why does completeness matter as a data quality characteristic? If the information is incomplete, it might be unusable, leading to inaccuracies and bad business decisions.
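One simple way to put a number on completeness is to measure the share of records that carry every required field. The sketch below assumes an invented inhabitants-per-country dataset matching the example above.

```python
# Minimal sketch: completeness as the share of records with all
# required fields present and non-null. Field names are illustrative.
records = [
    {"country": "Portugal", "inhabitants": 10_300_000},
    {"country": None,       "inhabitants": 5_500_000},   # country missing
    {"country": "Austria",  "inhabitants": 9_000_000},
]

required = ("country", "inhabitants")

def is_complete(record):
    """A record is complete when every required field has a value."""
    return all(record.get(field) is not None for field in required)

complete = sum(is_complete(r) for r in records)
print(f"completeness: {complete / len(records):.0%}")  # completeness: 67%
```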
3. Reliability
Reliability is measured by comparison. If a piece of information doesn't contradict a piece of information from a different source, the information is reliable. On the other hand, if information from different sources or systems is contradictory, we shouldn't trust it. Imagine having a different birth date for the same person in each system: that is unreliable information. It can cause people to make mistakes that cost your organization money and lead to reputational damage.
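Since reliability is measured by comparison, a basic automated check is to compare the same attribute across two systems and report contradictions. The sketch below uses the birth-date example; the system names, patient IDs, and dates are all invented.

```python
# Minimal sketch: cross-check one attribute (birth date) across two
# systems and report records where the sources disagree.
system_a = {"patient-42": "1975-03-15", "patient-43": "1980-07-01"}
system_b = {"patient-42": "1975-03-15", "patient-43": "1981-07-01"}  # conflict

conflicts = {
    patient: (dob_a, system_b[patient])
    for patient, dob_a in system_a.items()
    if patient in system_b and system_b[patient] != dob_a
}
print(conflicts)  # {'patient-43': ('1980-07-01', '1981-07-01')}
```

Note that agreement between two sources does not prove the value is correct, only that it is consistent; a conflict, however, is a clear signal that at least one source is wrong.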
4. Relevance
This is perhaps the hardest characteristic to measure because it can be subjective. Relevance is measured by the need for the data. The problem with relevance is that a certain subset of data can be relevant to a data engineer but make no sense to the business; that is why it is subjective. A good approach is to bring all the data consumers (from IT to the business) on board and see whether the majority agrees on the need for that piece of data. At the end of the day, if we are collecting irrelevant information, we are wasting time and money.
5. Timeliness
As you know, out-of-date information can have a huge impact on the decision-making process: a decision made on out-of-date information costs organizations time and money. So, how can we measure information timeliness? We must know whether we will be using real-time or near-real-time information, or whether we can use information within a defined threshold. If we are using real-time or near-real-time information for a decision-making process and we are collecting data from 6 hours ago, we might have a big problem. An example is the stock market, which changes almost by the minute; deciding something based on data received 6 hours ago can be catastrophic for the organization. If we don't need real-time or near-real-time information, we should guarantee that what we are collecting is within our defined threshold. Data outside the threshold would be useless.
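The threshold idea translates directly into a freshness check. The sketch below assumes each record carries a timestamp of when it was observed; the 6-hour threshold mirrors the example above and would be tuned per use case.

```python
# Minimal sketch: reject records older than a freshness threshold.
# The six-hour window is illustrative; set it per decision process.
from datetime import datetime, timedelta, timezone

THRESHOLD = timedelta(hours=6)

def is_fresh(observed_at, now=None):
    """True if the record's timestamp falls within the freshness window."""
    now = now or datetime.now(timezone.utc)
    return now - observed_at <= THRESHOLD

stale_quote = datetime.now(timezone.utc) - timedelta(hours=7)
print(is_fresh(stale_quote))  # False: too old for near-real-time decisions
```

For a genuinely real-time use case such as the stock-market example, the threshold would shrink to seconds or minutes, and stale records should be dropped or refreshed before they ever reach a dashboard.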