There is No Bad Data, Just Bad Management!

1.0 Introduction

Data is neither good nor bad; nor is it neutral. Data doesn’t speak for itself. It must be analyzed and interpreted, and the practice of making meaning from data is where opportunities and challenges lie. Data can be used to advance numerous goals and values, ranging from the mundane to the significant. Businesses can use data to drive economic imperatives, both socially beneficial and morally bankrupt. News media can use data to create an informed citizenry or to manipulate people. Governments can use data to improve governance or to centralize control. People can use data to empower themselves and their communities or to reify injustices and inequalities (Data & Society, 2015).

1.2 Data

To understand data management, one must first understand data. On the simplest level, data can be defined as “facts and statistics collected together for reference and analysis”. From an information science perspective, data can be defined more contextually in the scope of research to mean that it “is collected, observed, or created, for purposes of analysis to produce original research results”. It is important to recognize that data goes beyond spreadsheets of numbers. Data can take many forms: bio-specimens, video recordings, images, software programs, algorithms, paper lab notebooks. It is perhaps most useful to think of data as everything that would be needed to reproduce a given scientific output (Surkis & Read, 2015).

1.3 Bad Data

Bad or dirty data refers to information that is erroneous, misleading, or without general formatting (Vasudev, 2015). Bad data has been defined by a number of authors. According to Bhatt (2018), “in simple terms”, bad data is an inaccurate set of information. It also includes missing data, wrong information, inappropriate data (for example, data entered in the wrong columns), non-conforming data, duplicate data, and poor entry (misspellings, typos, variations in spelling, format, etc.).
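The categories of bad data listed above can be made concrete with a small check. The following is a minimal sketch, not a complete data quality tool: the sample records, the field names (`id`, `name`, `email`) and the deliberately simple email pattern are all assumptions made for illustration.

```python
import re

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Alice Example", "email": "alice@example.com"},
    {"id": 2, "name": "", "email": "bob@example.com"},                      # missing name
    {"id": 3, "name": "Carol Example", "email": "carol[at]example.com"},    # non-conforming email
    {"id": 4, "name": "Alice Example", "email": "alice@example.com"},       # duplicate of id 1
]

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_bad_records(rows):
    """Return a dict mapping record id to the list of problems found in it."""
    problems = {}
    seen = set()
    for row in rows:
        issues = []
        if not row["name"].strip():
            issues.append("missing name")
        if not EMAIL_PATTERN.match(row["email"]):
            issues.append("non-conforming email")
        # Two records with the same name and email (ignoring case) are duplicates.
        key = (row["name"].strip().lower(), row["email"].strip().lower())
        if key in seen:
            issues.append("duplicate")
        seen.add(key)
        if issues:
            problems[row["id"]] = issues
    return problems
```

Calling `find_bad_records(records)` flags records 2, 3 and 4, each for a different category of bad data, while record 1 passes.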

To prevent bad data and avoid some of its frightening costs:

  1. Make a Data Management Plan
  2. Use a Consistent, Standard Format
  3. Do a Data Quality Assessment
  4. Keep Your Database Efficient and Streamlined
  5. Save in Open, Non-proprietary Formats
  6. Back Up Your Data
  7. Practice Good Data Security (SmartyStreets, n.d.).
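As one illustration of the backup item in this list, the sketch below copies a file and confirms that the copy is identical by comparing SHA-256 checksums. It is a minimal example under stated assumptions: the function names `sha256_of` and `backup_and_verify` are hypothetical, and a real backup strategy would also cover scheduling, off-site copies, and retention.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy `source` into `backup_dir` and check that the copy matches the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError("backup does not match the original")
    return target
```

Verifying the checksum after copying is a cheap way to catch a silently corrupted backup before it is needed.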

1.4 Data Management

Data management is an administrative process that includes acquiring, validating, storing, protecting, and processing required data to ensure the accessibility, reliability, and timeliness of the data for its users. Organizations and enterprises are making use of Big Data more than ever before to inform business decisions and gain deep insights into customer behavior, trends, and opportunities for creating extraordinary customer experiences (Galetto, 2016).

The production of good data relies heavily on proper data management. According to Galetto (2016), in order to make sense of the vast quantities of data that enterprises are gathering, analyzing, and storing today, companies turn to data management solutions and platforms. Data management solutions make processing, validation, and other essential functions simpler and less time-intensive.

In data management, the concept of the data lifecycle is often used. The first three stages of this lifecycle are creating or collecting data, processing data from its rawest form into another form for analysis, and analyzing the data so that the results can be distributed. The last three stages of the data lifecycle involve preserving the data after the study is finished, providing access to the data to others, and finally reusing the data to conduct new studies or to test the reproducibility of the original results (Surkis & Read, 2015).

It has been shown that poor data quality in decision-making can have far-reaching socioeconomic consequences. It can reduce client satisfaction, increase operational costs, and lower the effectiveness of decision-making and the ability to plan. Poor data quality can also lower morale among health workers and create mistrust in organizations, resulting in inefficiencies (Benjamin K et al, 2014).

1.5 Data management includes the following:

  • Data processing
  • Data security
  • Data destruction
  • Reference and master data management
  • Documents and records storage
  • Data warehousing and business intelligence management
  • Records management
  • Data governance
  • Data architecture
  • Database management
  • Data quality management
  • Contact data management systems

According to Ortega (2016), organizations are constantly challenged to maintain the right level of data quality. This is especially true in a risk-averse industry such as healthcare, where decisions could literally mean the difference between life and death. In addition, ensuring the privacy of patient data and compliance with regulations such as HIPAA, HITECH, PSQIA, and others is not only mandatory and complex; failure to do so can also be costly in fines and fees. Noncompliance is not a viable alternative when someone’s life could be at risk.

2.0 The implication of the “no bad data, just bad management” statement when applied to the six stages of data processing

To consider the implication of this statement, it is important to understand the stages of data processing, from data collection to storage and demand. At each stage we seek to understand what happens and how management plays an important role in ensuring data quality, since data on its own cannot go bad; rather, how it is managed determines its quality. As the saying goes, “garbage in, garbage out”: the quality of data depends on how well or poorly it is managed.

Many institutions rely on data-driven decision-making to establish goals, plans, and initiatives to advance success and outcomes (Pearlman, 2019).

Poor management of data can affect data processing at all six of its stages, which are:

2.1 Data collection

Data collection is the first step in data processing. Data may be pulled from available sources, including data warehouses (Kay, 2018). The challenge to ourselves and institutional partners is to determine whether we have plans, processes, and tools in place to collect accurate data as needed (Pearlman, 2019).

The management support system must ensure that the available data sources are trustworthy and well-built so that the data collected is of high quality. It is management that identifies the problem, defines the scope and data category, carries out the planning, identifies the field team, and provides resources such as personnel, data tools, maps, guidelines, logistics, training, and sensitization. Failure on the part of management to plan properly will result in inaccurate, incomplete, and unreliable data, the so-called “bad data”.

2.2 Data preparation

This is the stage in which the raw data is cleaned up and organized for the following stage. During this stage, data is diligently checked for errors. It is a very important stage of data processing, as it ensures the elimination of bad, incomplete, and incorrect data. As Pearlman (2019) puts it, cleaning up data is traditionally the most time-consuming part of the data preparation process, but it’s crucial for removing faulty data and filling in gaps.

2.2.1 Important tasks here include:

  • Removing extraneous data and outliers.
  • Filling in missing values.
  • Conforming data to a standardized pattern.
  • Masking private and sensitive data entries.
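The four tasks above can be sketched in a few lines of Python. This is an illustrative example only: the sample values, the plausible age range of 0 to 120, the choice of the median for filling gaps, and all function names are assumptions, and the right cleaning rules always depend on the data set at hand.

```python
import statistics

# Hypothetical raw ages: 420 is a typing error, None is a missing value.
raw_ages = [34, 29, None, 41, 38, 420, 36]


def remove_outliers(values, low, high):
    """Drop values outside the plausible range [low, high]; keep None for later filling."""
    return [v for v in values if v is None or low <= v <= high]


def fill_missing(values):
    """Replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize_phone(text):
    """Conform phone numbers to one pattern by keeping digits only."""
    return "".join(ch for ch in text if ch.isdigit())


def mask(text, visible=3):
    """Hide all but the last `visible` characters of a sensitive value."""
    return "*" * (len(text) - visible) + text[-visible:]


cleaned_ages = fill_missing(remove_outliers(raw_ages, 0, 120))
```

Outliers are removed before missing values are filled so that the implausible value 420 does not distort the median used to fill the gap.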

Once the data has been cleansed, it must be validated by testing for errors in the data preparation process up to this point. Often, an error in the system will become apparent during this step and will need to be resolved before moving forward (Kay, 2018). The role of management at this stage is to coordinate, supervise, provide resources for data preparation, define what counts as an error and what counts as private data, and review the data before it moves to the next stage. Management failure at this point will result in error-prone data.

2.3 Data input

According to Kay (2018), this is the first stage in which raw data begins to take the form of usable information.

It involves entering the clean data into its destination and translating it into a language that the destination system can understand. According to Cindy (2018), it is important to validate input. This is especially so when the data set is supplied by a known or unknown source (an end-user, another application, a malicious user, or any number of other sources): such data should be verified and validated to ensure that the input is accurate. Once again, it is management that provides the necessary resources and expertise and continuously monitors the process to ensure that all the activities involved in data input are fully implemented to produce quality data.
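Input validation of the kind described here might look like the following sketch. The record layout (`patient_id`, `visit_date`, `weight_kg`) and the specific rules are hypothetical and chosen only to illustrate the idea of checking each field before it is accepted.

```python
from datetime import date


def validate_entry(entry):
    """Return a list of error messages; an empty list means the entry is accepted."""
    errors = []

    # Hypothetical rule: patient identifiers are exactly six digits.
    patient_id = entry.get("patient_id", "")
    if not (isinstance(patient_id, str) and patient_id.isdigit() and len(patient_id) == 6):
        errors.append("patient_id must be exactly 6 digits")

    # Dates must parse as ISO dates and cannot lie in the future.
    try:
        visit = date.fromisoformat(entry.get("visit_date", ""))
        if visit > date.today():
            errors.append("visit_date cannot be in the future")
    except (TypeError, ValueError):
        errors.append("visit_date must be YYYY-MM-DD")

    # Hypothetical rule: weight is a number strictly between 0 and 500 kg.
    weight = entry.get("weight_kg")
    is_number = isinstance(weight, (int, float)) and not isinstance(weight, bool)
    if not is_number or not 0 < weight < 500:
        errors.append("weight_kg must be a number between 0 and 500")

    return errors
```

Returning every problem at once, rather than stopping at the first, lets the person entering the data correct the whole record in one pass.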

2.4 Processing

There are four components to the description of the data processing for a trial:

  1. Hardware, i.e. any physical entity used for data processing. This may include computers, printers, and electronic hardware, but also includes paper, pens, and other equipment used to collect, transfer, and archive the data
  2. Software, i.e. the programs needed to make the hardware manipulate and process the data for the study
  3. Personnel that are needed for the data processing
  4. The systems and organization that must be in place to bring all of the different components together (Smith et al., 2015).

During this stage, the data entered into the computer in the previous stage is processed for interpretation. Processing is done using machine learning algorithms, though the process itself may vary slightly depending on the source of the data being processed and its intended use, such as examining advertising patterns, medical diagnosis from connected devices, or determining customer needs (Kay, 2018).

The processing stage is where management typically exerts the greatest control over data. It also is the point at which management can derive the most value from data, assuming that powerful processing tools are available to obtain the intended results.

The most frequent processing procedures available to management are basic activities such as segregating numbers into relevant groups, aggregating them, taking ratios, plotting, and making tables. The goal of these processing activities is to turn a vast collection of facts into meaningful nuggets of information that can then be used for informed decision making, corporate strategy, and other managerial functions (Kirkwood, Jr, 2020).
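The basic procedures named above (grouping, aggregating, taking ratios, and making tables) can be illustrated with a short sketch. The sales figures and region names are invented for the example, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, amount).
sales = [
    ("North", 1200.0),
    ("South", 800.0),
    ("North", 300.0),
    ("East", 700.0),
    ("South", 1000.0),
]


def totals_by_group(rows):
    """Segregate amounts into groups and aggregate them by summing."""
    totals = defaultdict(float)
    for group, amount in rows:
        totals[group] += amount
    return dict(totals)


def shares(totals):
    """Take ratios: each group's share of the grand total (assumes a non-zero total)."""
    grand_total = sum(totals.values())
    return {group: round(total / grand_total, 3) for group, total in totals.items()}


def as_table(totals, ratios):
    """Render the totals and shares as a plain-text table, sorted by group name."""
    lines = [f"{'Region':<8}{'Total':>10}{'Share':>8}"]
    for group in sorted(totals):
        lines.append(f"{group:<8}{totals[group]:>10.2f}{ratios[group]:>8.1%}")
    return "\n".join(lines)


region_totals = totals_by_group(sales)
region_shares = shares(region_totals)
```

Printing `as_table(region_totals, region_shares)` turns five raw transactions into a three-row summary, which is the kind of condensed information a manager can act on.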

2.5 Data output/interpretation

The output/interpretation stage is the stage at which data finally becomes usable to non-data scientists. It is translated, readable, and often in the form of graphs, videos, images, plain text, etc.

The members of the company or institution can now begin to self-serve data for their own analytical projects (Kay, 2018).

The importance of data interpretation is evident, which is why it needs to be done properly. Data is very likely to arrive from multiple sources and has a tendency to enter the analysis process with haphazard ordering. Data analysis tends to be extremely subjective. That is to say, the nature and goal of interpretation will vary from business to business, likely correlating with the type of data being analyzed. While there are several different types of processes that are implemented based on the nature of the individual data, the two broadest and most common categories are “quantitative analysis” and “qualitative analysis” (Lebied, 2018).

2.6 Data storage

This is the final stage of data processing. After all the data is processed, it is then stored for future use. While some information may be put to use immediately, much of it will serve a purpose later on. In addition, properly stored data is necessary for compliance with data protection legislation such as GDPR. When data is properly stored, it can be quickly and easily accessed by members of the organization when needed. According to Bigelow (2006), storage is a resource, and it must be allocated and managed as a resource in order to truly benefit a corporation. Successful data storage management strategies leverage a suite of tools to configure, provision, archive and report storage activities, according to a defined set of management policies or processes.

3.0 Implication on levels of information

At the levels of information, we look at how the management of data affects each level, how this forms a “chain reaction” from one level to the next, and how this can have dire consequences for an institution, a country, or even the world.

3.1 Operational information:

Operational information relates to the day-to-day operations of the organization and is thus useful in exercising control over operations that are repetitive in nature. Since such activities are controlled at lower levels of management, operational information is needed by lower management (Chad, n.d.). Management at this level needs to monitor data collection activities closely, since mistakes made here will lead to poor data quality. According to Redman (2017), poor data quality impacts the typical enterprise in many ways.

At the operational level, poor data leads directly to customer dissatisfaction, increased cost, and lower employee job satisfaction.

3.2 Tactical information:

Tactical information helps middle-level managers allocate resources and establish controls to implement the top-level plans of the organization. For example, information regarding alternative sources of funds and their uses in the short run, opportunities for deployment of surplus funds in short-term securities, etc. may be required at the middle levels of management (Chad, n.d.).

The information dealt with at this stage is generated at the operational level. Bad management at the operational level will result in “bad” (poor-quality) data. There is no evidence that the data needed and used by managers is any better than the data used by customer service employees, and the impact is far-reaching.

First, poor data quality compromises decision-making. It is a widely accepted maxim that decisions are no better than the data on which they are based. The slightest suspicion of poor data quality often hinders managers from reaching any decision (Redman, 2017). Managers must therefore play their role in ensuring quality data production.

3.3 Strategic information

Strategic information helps in identifying and evaluating strategic options so that a manager can make informed choices that differ from those of competitors and that take into account the limitations of what rivals are doing or planning to do. Such choices are made by leaders only.

Strategic information is used by managers to define goals and priorities, initiate new programmes and develop policies for acquisition and use of corporate resources. Strategic information is predictive in nature, relies heavily on external sources of data, has a long-term perspective, and is mostly in summary form. It may sometimes include ‘what if’ scenarios. However, the strategic information is not only external information.

For a long time, it was believed that strategic information was basically information about the external environment. However, it is now well recognized that internal factors are equally responsible for the success or failure of strategies, and thus internal information is also required for strategic decision making (Chad, n.d.).

Bad management at the operational level will have far-reaching consequences at the strategic level. The consequences that stem directly from operational and tactical information issues are becoming clear. First, since selecting a strategy is itself a decision-making process, we should expect strategy making to be adversely affected.

Indeed, since strategy has much longer-term consequences and requires data from outside the enterprise that may be hard to acquire and may be of uncertain quality, we should expect the impact to be at least as great at this level.

4.0 The consequences of bad data

Companies rely on accurate data to assist marketing, sales, and customer service activities, and inaccurate data can undermine a good customer experience. If companies have the wrong information on their customers, they will waste time chasing leads that don’t exist.

A key problem of bad data is that you are making important business decisions based on that very same bad data. Decision-making is only as good as the information on which it is based, and errors in data mean any analysis run can be completely wrong. For example, a report telling you that all sales leads are coming from sales reps means you will make hiring and process decisions based on that information without knowing the data is incorrect.

Lost revenue occurs in numerous ways as a result of bad data, including communications that fail to convert to sales because the underlying customer data is incorrect, and inefficiencies in business processes that depend on data. These include errors in reporting, product ordering, and nearly anything else that relies on quality data. Inefficiencies can result in expensive rework to fix problems and improve data management in order to meet the requirements of the various business processes. Poor data management can also be responsible for reputational damage that ranges from small, everyday harm to large public relations disasters due to data breaches (Melanie, 2019).

“Data governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.” Essentially, data governance is an overarching concept that defines who is responsible for which data, and in which role, at each point in the process. Roles and responsibilities for enterprise data alone, however, are not sufficient to improve data quality in a sustainable way. Data governance needs to be complemented by other elements of data management that together form a comprehensive data management framework (N.L. Martijn et al, 2016).

4.1 Data governance consists of various interrelated elements:

  • Initial cleansing and business process redesign interventions are required before data governance can function properly within an organization. This sets a basic data quality level at the start of a data governance project. Often it also involves redesign of CRUD (create, read, update, delete) processes (compare process redesign).
  • Roles and responsibilities are essential to prevent a lack of clear ownership of data management. When someone is made responsible, it can be assumed that fewer errors will enter the system and that errors will be detected and resolved earlier, leading to more efficient processes.
  • Data standards describe how to represent, process, use and handle enterprise data. Implementing data standards, in combination with a governance structure that enforces the standards, leads to higher data and process quality. Use of data standards is a prerequisite for other data governance interventions. Standards also make it possible to measure data quality levels to indicate progress.
  • Consultation is meant to improve communication between departments (horizontal) and between management levels (vertical). Besides communication enhancements in primary processes and workflows, so-called consultation platforms are recommended to improve the adaptability of the data governance measures themselves. Errors should be identified and traced through feedback from users, so that the company can learn from experience.
  • Data sharing with supply chain partners is important for efficient alignment within supply chains, both internally and externally. This includes monitoring of data provided by the supplier to ensure sufficient quality. Data protocols can be part of the contract provisions.
  • Monitoring should provide continuous insight into the current quality of the data to facilitate its manageability by responsible employees, for instance by implementing tools that produce real-time data quality overviews on a dashboard. Again, this requires the ability to measure data quality levels (Martijn et al., 2015).
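Measuring data quality, as the monitoring element requires, can start with very simple metrics. The sketch below computes completeness and uniqueness per field and compares completeness against a target. The supplier records, the field names, the 0.9 threshold, and the function names are all assumptions for illustration; a real dashboard would track such figures over time.

```python
# Hypothetical product records received from a supplier.
rows = [
    {"sku": "A100", "weight_kg": 1.2},
    {"sku": "A101", "weight_kg": None},   # missing weight
    {"sku": "A100", "weight_kg": 0.8},    # repeated SKU
    {"sku": "A103", "weight_kg": 2.0},
]


def completeness(records, field):
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, field):
    """Share of the non-empty values of `field` that are distinct."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def quality_report(records, fields, threshold=0.9):
    """Summarize both metrics per field; `ok` means completeness meets the threshold."""
    report = {}
    for field in fields:
        score = completeness(records, field)
        report[field] = {
            "completeness": score,
            "uniqueness": uniqueness(records, field),
            "ok": score >= threshold,
        }
    return report


report = quality_report(rows, ["sku", "weight_kg"])
```

In this sample, `weight_kg` falls below the completeness target, and the uniqueness score for `sku` shows that an identifier has been repeated, which is the kind of signal a responsible employee would follow up with the supplier.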

5.0 Good data management strategy

According to Kirkwood, Jr (2020), a good data management strategy is solid on both offense and defense. There are five additional core components that should be part of your data management strategy:

  • Identification of data and what it means, how it’s structured, where it came from, or where it’s located
  • Storage that lets you easily access, share, and process your data
  • Provisioning that lets you package your data so that it can be reused and shared, and lets you add rules and guidelines for accessing the data
  • Governance to establish, manage, and communicate the policies and mechanisms in place for using data
  • Processing that lets you move and combine data stored in different systems to provide a single, unified view

5.1 Some benefits of Data Management

  • Minimized Errors: Effective data management helps in minimizing potential errors and reducing the damage caused by bad data. The more often processes such as copy-paste, drag and drop, and linking of documents occur, the greater the likelihood of data errors. Therefore, an effective data management strategy and data quality initiative must be implemented to better control the health of a business’s most valuable asset.
  • Efficiency Improvements: If your data is properly managed, updated, and enhanced, its accessibility and your organizational efficiency will increase substantially. However, if the data is inaccurate, mismanaged, or error-prone, it can waste tremendous time and resources.
  • Protection from Data Related Problems and Risks: Security of data is very important, and proper data management helps in ensuring that vital data is never lost and is protected inside the organization. Data security is an essential part of data management. It protects employees and companies from various data losses, thefts, and breaches.
  • Data Quality Improvement: Better data management helps in improving data quality and access. As a result, a company gains better and faster access to its data and obtains better search results, which can aid in decision making (Ring Lead, 2019).

6.0 Conclusion

Management is at the heart of quality data production. The role of management is vital in all stages of data processing and data demand, and good management is required at every stage. When management fails at one stage, for example data collection, the data collected at that stage will be error-prone, and since one stage feeds the next, the error will carry through the cycle and the final result will be poor data.

Data in its original source needs to be extracted, processed into information, and utilized for evidence-based decision making. For data to be extracted, management must first identify a gap, develop a work plan, provide resources, supervise and monitor all the data processing stages, define privacy and access, evaluate, and ensure proper storage and retrieval for use when needed. The role of management in data handling is key in determining the quality of the data processed into information. This is the implication of the statement “no bad data, just bad management”.

REFERENCES

1. Kumwenda, Benjamin; Jimmy-Gama, Dickson; Manyonga, Velia; Semu-Kamwendo, Noella; Nindi-Mtotha, Beatrice; Chirwa, Maureen. (2014). Factors Affecting Data Quality in the Malawian Health Management Information System. 286-292. doi:10.2316/P.2014.815-028.

2. Galetto, M. (2016). What is Data Management? A Definition of Data Management.

3. Kay, P. (2018). The Importance of Data Management to Demonstrate the Value of Analytics.

4. Kirkwood, Jr, Hal P. (2020). Data Processing and Data Management.

5. Lebied, M. (2018). A Guide To The Methods, Benefits & Problems of The Interpretation of Data.

6. Martijn N., Hulstijn J., de Bruijne M., Tan YH. (2015). Determining the Effects of Data Governance on the Performance and Compliance of Enterprises in the Logistics and Retail Sector. In: Janssen M. et al. (eds) Open and Big Data Management and Innovation. I3E 2015. Lecture Notes in Computer Science, vol 9373. Springer, Cham.

7. Melanie. (2019, July 21). The Consequences of Businesses Holding Deficient or Bad Data. Retrieved from https://www.unleashedsoftware.com/blog/consequences-businesses-holding-deficient-bad-data

8. N.L. Martijn MSc, R.A. Jonker RE RA. (2016, March 10). The Effects of Data Governance in Theory and Practice. Retrieved from https://www.compact.nl/en/articles/the-effects-of-data-governance-in-theory-and-practice/

9. Ortega, D. (2016). Implications of Poor Data Quality in Healthcare.

10. Pearlman, S. (2019). What is Data Processing? Six stages of data processing.

11. Redman, T. (2017). The impact of poor data quality on the typical enterprise.

12. Ring Lead. (2019). The Importance of Data Management In Companies.

13. SmartyStreets. (n.d.). Data Management: When Good Data Goes Bad. Retrieved from https://smartystreets.com/articles/data-management

14. Smith PG, Morrow RH, Ross DA, editors. Field Trials of Health Interventions: A Toolbox. 3rd edition. Oxford (UK): OUP Oxford; 2015 Jun 1. Chapter 20, Data management. Available from: https://www.ncbi.nlm.nih.gov/books/NBK305509/

15. Surkis A, Read K. Research data management. J Med Libr Assoc. 2015;103(3):154–156. doi:10.3163/1536-5050.103.3.011

16. Bhatt, V. (2018). What is bad data?

17. Vasudev, M. (2015). Definition of Bad Data.

18. Bigelow, Stephen J. (2006). Introduction to data storage management.

19. Chad, S. (n.d.). Level of Management: Types of Information that are required at Different Levels of Management.

20. Cindy, N. (2018). What is Data Integrity and How Can You maintain it?

21. Data & Society. (2015, September 1). Data & Society: Using data to create a more fair and equitable society. Retrieved from https://www.datasociety.net/announcements/2015/09/01/using-data-to-create-a-more-fair-and-equitable-society/?Fbclid=iwar3qgvhglcr5ozlgoxtkuoy3mqqa1nzrgmwjxkgcls74bwzdtd6jd2iu91c


