The 10 most common data issues facing business and how to cure them.


Many companies work with large amounts of data and find that, as the volume grows, workloads increase and the value they get from the data falls. As legacy data mounts up it also becomes very difficult to work with, to back up and to gain insight from. The way a company manages its data is often used as a measure of its management efficiency and maturity. The problem is that as the company grows, managing the data gets more and more difficult. We have seen the problems that companies experience many times, and we have sought and implemented solutions to them. Now we are sharing our discoveries and passing on some of the ways to avoid the pain.

The source of our discoveries:


We start off by asking:

  • What is the most important activity you do in your business?
  • Do you have any pain associated with this activity?

We start from here because our background is software engineering, and relief from painfully tedious processes is something software is quite good at. When we ask these questions, the answers invariably relate to issues about data and information. While we come across a lot of businesses who believe the problem is software, more often than not it is data that is the root of the problem, not the software or the processes, nor the employees (it is surprising how many of the owners of the systems and processes blame those who have to use those woeful systems).

We discovered these issues by analysing the system and data requirements of a number of companies. These include:

  • A large UK-based company with a presence throughout Europe. Our client was a heavy data user and processed over 100,000 transactions a week. They employed 200 people and had a turnover of over £1 billion. We worked with them over a number of years.
  • An international company that was the subject of a detailed four-month study of its data and system needs. The data came from 25 hours of in-depth workshops across subject areas involving 50 subjects, from telephone interviews with subject matter experts across countries and continents, and from extensive one-on-one interviews with all senior and board-level management. This company employed 300 people worldwide and had a turnover of over £900 million p.a.

Aside from the larger SMEs, we held in-depth interviews with 20 smaller businesses in the retail, service, manufacturing, finance, leisure, marketing and technology sectors.

Below we share the 10 most common pains that we found. 

"Data Pain No 1: Inability to compare data held in different locations and from different sources."

This list is not ordered by priority or occurrence, but if it were, this is the heading that would deserve to be at the top. Not simply from a technological perspective but from a business performance point of view, the existence of information and knowledge silos is the biggest pain felt by most companies. Business consultants place it at number one when picking the low-hanging fruit that leads to efficiency, cost savings and improved overall performance. There are multi-billion pound businesses built on simply providing bits of joined-up information, and whole market sectors based on this activity, recruitment and staffing to name just one. Many of the pains listed below can be alleviated by integrating data and enabling operations across data from different sources. The data that the finance and sales departments hold is inextricably linked, but all too often it is trapped in different silos. Getting information that crosses these and other silos can be an excruciatingly painful process, yet without knowledge that spans silos many companies are crippled by processes that create even further silos. They take data from here and there and then put it into a third inaccessible place: a spreadsheet, which is by its nature a standalone and cruelly inefficient processor of information. If you look at businesses that span geographical areas, the problem multiplies. Even now, and despite awareness of the internet, most companies cut out whole swathes of information because it is external to their systems or because they do not know how to capture and integrate it. Worse still, many companies avoid dealing with masses of information that they already hold because it is held in different places.

The main cure for most of the causes of a business's pains is joining up information so that it can inform joined-up thinking and decision making. This is not as big and expensive a solution as it once was. There are now ways of bringing both big and small data into a manageable space where it can serve the interests of companies of all sizes.
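
As a rough illustration of what joining up the finance and sales silos can look like, here is a minimal Python sketch. The record layouts, the field names (`customer_id`, `order_value`, `amount_paid`) and the `join_by_customer` helper are all assumptions made for the example; real extracts will look different.

```python
from collections import defaultdict

# Hypothetical extracts from two silos; field names are illustrative only.
sales = [
    {"customer_id": "C001", "order_value": 1200.0},
    {"customer_id": "C002", "order_value": 450.0},
    {"customer_id": "C001", "order_value": 300.0},
]
finance = [
    {"customer_id": "C001", "amount_paid": 1200.0},
    {"customer_id": "C003", "amount_paid": 90.0},
]


def join_by_customer(sales_rows, finance_rows):
    """Combine both sources into one view keyed on customer_id."""
    view = defaultdict(lambda: {"ordered": 0.0, "paid": 0.0})
    for row in sales_rows:
        view[row["customer_id"]]["ordered"] += row["order_value"]
    for row in finance_rows:
        view[row["customer_id"]]["paid"] += row["amount_paid"]
    # The outstanding balance only becomes visible once the silos are joined.
    for totals in view.values():
        totals["outstanding"] = totals["ordered"] - totals["paid"]
    return dict(view)


joined = join_by_customer(sales, finance)
for customer_id, totals in sorted(joined.items()):
    print(customer_id, totals)
```

Neither source on its own can answer "who owes us money?" or "who has paid for something sales never recorded?"; the combined view can.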

"Data Pain No 2: Non conforming data, e.g. invoice discrepancies etc."

We did some work with one company who had discovered that an astounding £5 million worth of invoices, billed to one client alone, had notable discrepancies. Naturally, their client did not just put the payments on hold; they put the company on hold until the issue was resolved. We gently told them that if they had this issue with one client they would have it with others. The finance department rejected the idea, but that turned out to be exactly the case. The purchase and sales ledger departments had a database application built to attack the issue and help them resolve the problem. Unfortunately this turned out to be two standalone databases, one held by each department. Non conforming data breeds non conforming data. The last thing to do when you find problem data is to hive it off into separate piles and lose the relationships, and this is essentially what they did. It was how they would have done it manually using paper records: each team would have a copy, strive to reconcile it and then compare notes.

What you need to do is append information to the source data. With appended information, everyone involved can track the process and the actions taken. The appended data may be a relationship or metadata, depending on the data store that is used and its structure. The second thing to do with non conforming data is to design consistency checks and append their results to the data. Visualising the data and the relationships between issues reveals patterns, and these enable a prediction-based approach to finding and resolving non conforming data. Simple machine learning processes will also go a long way towards finding and fixing inconsistency. These routines can then be added as sanity checks for new data as it comes in. The diagnostics that were built to find the issues should have been in place as part of the error checking process, but adding them later is better than not adding them at all.

The main points to relieve the pain when non conforming data rears its ugly head are: use the data for self analysis, change the structure to allow appended tracking items, find patterns, and devise searches that will bring the non conforming data to the fore. Do not wade through it manually looking for a fix; automate, apply forensic search algorithms, implement machine learning. Do not let bad data spawn more bad data.
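
To make the idea of appending check results to the source data concrete, here is a small Python sketch. It is only an illustration under assumed conditions: the invoice and purchase order layouts, the `checks` field and the tolerance are invented for the example and are not taken from the client system described above.

```python
# Hypothetical invoice and purchase-order records; field names are assumptions.
purchase_orders = {
    "PO-1": {"amount": 5000.00},
    "PO-2": {"amount": 1250.00},
}
invoices = [
    {"invoice_id": "INV-10", "po": "PO-1", "amount": 5000.00},
    {"invoice_id": "INV-11", "po": "PO-2", "amount": 1520.00},
    {"invoice_id": "INV-12", "po": "PO-9", "amount": 80.00},
]


def check_invoice(invoice, orders, tolerance=0.01):
    """Return a list of issue labels for one invoice (empty if consistent)."""
    issues = []
    order = orders.get(invoice["po"])
    if order is None:
        issues.append("unknown_purchase_order")
    elif abs(order["amount"] - invoice["amount"]) > tolerance:
        issues.append("amount_mismatch")
    return issues


def annotate(invoice_rows, orders):
    """Append check results to each source record instead of copying it elsewhere."""
    for invoice in invoice_rows:
        invoice["checks"] = check_invoice(invoice, orders)
    return invoice_rows


annotate(invoices, purchase_orders)
flagged = [inv["invoice_id"] for inv in invoices if inv["checks"]]
print(flagged)
```

Because the result lives on the invoice itself, both ledger teams look at the same record and the same history, and the same `check_invoice` routine can be run on new invoices as they arrive.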

"Data Pain No 3: Delayed access to data, reports out of date."

A company we were involved with held a monthly board meeting where a main agenda item was operational performance. They examined sales, income, expenditure, profitability and so on. They had two main issues. The first was that the summary reports presented at the meeting were not even close to real time; they were at least half a month out of date and sometimes even further adrift. The second was that a large amount of manpower was expended every month assembling the data to build the summary reports. Some subject areas had good-sized teams working flat out for the two weeks before the meeting to prepare the area's summary report. This led to much guesswork and to decisions being made on out of date information. I sat in on one meeting and heard the comment that although we are showing a big drop in sales for this month, once the end of month (the last two weeks) comes in we might have no drop at all. It sounded a bit like driving a car at night with no lights and saying 'because there were a lot of bends in the last half mile there might not be any in the next half mile - so speed up a bit'. It might be the case that there are no bends, but ‘it might be’ is not a sound basis for making decisions.

The way to relieve the hurt felt with delayed access to data is to build reporting into the storage. Rather than collecting data and then taking it away and summarising it, make the reporting a feature of the store. This way real time information can be summarised on demand. Capture data as it is created, report it as it is - in real time.
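
One simple way to picture reporting as a feature of the store is a database view. The sketch below uses Python's built-in `sqlite3` module; the `sales` table, the `monthly_sales` view, the helper names and the sample figures are all assumptions chosen for illustration, not a description of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_on TEXT, product TEXT, amount REAL)")
# The report is defined once, inside the store, and is computed when queried.
conn.execute(
    """
    CREATE VIEW monthly_sales AS
    SELECT substr(sold_on, 1, 7) AS month, SUM(amount) AS total
    FROM sales
    GROUP BY month
    """
)


def record_sale(sold_on, product, amount):
    """Capture a sale as it happens."""
    conn.execute("INSERT INTO sales VALUES (?, ?, ?)", (sold_on, product, amount))


def monthly_report():
    """Summarise on demand; always reflects every sale captured so far."""
    query = "SELECT month, total FROM monthly_sales ORDER BY month"
    return conn.execute(query).fetchall()


record_sale("2024-01-05", "widget", 100.0)
record_sale("2024-01-20", "gadget", 250.0)
print(monthly_report())
record_sale("2024-02-01", "widget", 75.0)
print(monthly_report())
```

Nobody assembles the second report; it simply includes the new sale because the summary is calculated from the live data at the moment it is asked for.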

"Data Pain No 4: Limited or no metrics, no information on internal and external KPIs."

I have been exposed to many large businesses that have grown through different methods. A mix of organic and acquired growth can cause a situation where managers do not really know what is going on with either the inputs or the outputs of the company. Are the suppliers performing well? Is the company performing well? Are the divisions performing well? Are the departments within a division performing well? Without measurements no one knows. I came across one ISP who were unaware of where their downtime was and what caused it. They set themselves a target of three nines (99.9% uptime) and promised this to their customers. Some customers were getting an uptime of four nines (i.e. 99.99%) but others were not, so they regularly paid compensation to sections of their customers. They provided branded access to capability purchased from different sources. Some of their suppliers met very high availability over short periods but excluded planned downtime from their KPIs. This was the root of the problem: they had no combined logging of their supply and no logging of their resources, so the metrics were not collected. For this company it was a simple fix, logs and data being at the core of this kind of business. Metrics on performance, both as a supplier and as a consumer, are something many companies fail to capture.

The main points to avoid the hurt of missing performance targets are to collect and make available all appropriate metrics and to look at ways of monitoring them. Dashboards and alerts are easy to understand and to respond to.
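
The arithmetic behind the ISP example is worth seeing. The Python sketch below assumes a 30-day month and an invented outage log; it shows how a supplier that excludes planned downtime can report better than three nines while the customer experiences less.

```python
def availability(period_minutes, downtime_minutes):
    """Fraction of the period the service was up."""
    return (period_minutes - downtime_minutes) / period_minutes


def allowed_downtime(period_minutes, target):
    """Minutes of downtime permitted by an uptime target such as 0.999."""
    return period_minutes * (1 - target)


PERIOD = 30 * 24 * 60  # a 30-day month in minutes

# Hypothetical downtime log for one supplier, in minutes.
outages = [
    {"minutes": 20, "planned": False},
    {"minutes": 120, "planned": True},
]

unplanned = sum(o["minutes"] for o in outages if not o["planned"])
total = sum(o["minutes"] for o in outages)

supplier_reported = availability(PERIOD, unplanned)  # planned work excluded
customer_experienced = availability(PERIOD, total)   # what users actually saw

print(f"allowed at three nines: {allowed_downtime(PERIOD, 0.999):.1f} min")
print(f"supplier-reported: {supplier_reported:.4%}")
print(f"customer-experienced: {customer_experienced:.4%}")
```

Three nines over a 30-day month leaves roughly 43 minutes of downtime. In this made-up log the supplier's own figure passes the target, but once the planned outage is counted the customer's figure does not, which is the gap that only combined logging reveals.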

"Data Pain No 5: Lack of understanding of customers, their habits, preferences and satisfaction."

Any business that uses the web to sell or to communicate with its customers has information on those customers at its disposal. So why do so many companies list this as a data problem? In the main it is because they are not sure how to get at this data, and when they do get it they are not sure how to use it. There is a lot of data on their customers both internally and externally. The company web site can use Google Analytics to capture customers' usage, preferences, viewing habits, click-throughs and (if it is an ecommerce site) number of conversions. If it is not a transactional site it can still provide a lot of information on what customers do and what they are looking for. Externally, customers will have a digital life, and using some of the big data sources, like Twitter and Facebook, will open up a very rich seam of information about customers, especially if it is combined with internal data. Finding out about your customers through social media can give a profile of many things, including their interests, their views on products, their habits and many other things that can help you to connect with them.

The main way to alleviate this pain is to start collecting data about your customers and begin analysing that data. Your customers interact with your web site, so explore what insights analytics can give you. They will likely have a social media presence, so find them out there and capture information. Capture information from social media about people similar to your customers too.

"Data Pain No 6: Extensive time and effort being spent on manual data entry, extraction and analysis."

One day looking for the data you need is one day wasted. Two days assembling data for a report is two more expensive days wasted. Three days inputting data is three days squandered. I could go on, but the picture is pretty plain to see. I did some work with one company who ended up employing a team to grab data from multiple sources and combine it in Excel to generate graphs to send back to their customers. They were enlightened enough to know that their customers appreciated being given data-backed information, but it meant that a sales team of 4 front line sales professionals had a support team of 12 manual data miners and an equal number of workers who entered data in multiple places to be collected by the manual data miners. We created an application for them that scared a big section of those who worked in the finance department, as it completed in seconds what it was taking them a week to do. They nicknamed the application the Robot. None of them were made redundant when the time they spent on manual entry and extraction was removed; instead they moved on to more important and less mind-numbingly boring tasks. They voted the Robot the best thing that ever happened.

One way of curing these pains is to eliminate manual data entry at every opportunity: automate the capture process. Do the same with the extraction process. This relieves two pain points, the waste of effort and the inevitable errors associated with manual input. It has the added bonus of freeing up valuable expertise that is better spent on more important tasks.

"Data Pain No 7: Lack of ability to share insights and information in meaningful forms."

It is all well and good having access to data, but once information is collected and analysed and, more importantly, once knowledge is gathered, there must be a way of sharing that knowledge in a format that is understandable and easy to read. It also needs to be distributed in a timely manner and, when required, shared in a secure way. There is no really acceptable reason to send out impossibly dense and difficult material that takes ages to wade through, let alone digest. Nor is it acceptable to distribute nothing but a few facile headlines and a picture or two. I could say hands up all of you who have attended a meeting and on entry been given reports comprising hundreds of pages of facts on multiple topics, or been shown a slide deck that glosses over everything you thought relevant. Data and the insights it provides need to be shared, and they need to be delivered in a way that makes them easy to understand while at the same time allowing a drill down into underlying facts and deep dives into specific areas.

The main points for pain relief are having processes to present information in meaningful formats that are secure but easy to access. There are many excellent visualisation tools that can present information far better than plain tabular formats. There are many secure ways of sharing this information that can be built into the systems that hold the information.

"Data Pain No 8: No factual consistency due to different versions and out of date information."

There really is no excuse for sharing out of date and conflicting information, but it is all too common and is symptomatic of data silos and non connected data sources. Many document sharing applications actually magnify the problem because they have no inbuilt structure. The same document can exist in many places. Multiple versions can and do exist. What happens when the document is updated is that different versions begin to breed like rabbits. Multiple versions of documents are spawned and start to be placed in other repositories.

The points where pain can be prevented are at the data level: create documents dynamically from the data store, ensuring that the latest information is in each document. Within the store it is also easy to add version, change history, creation and update information. These can be within the body of the document or appended to the core data and visible as metadata.
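
A minimal Python sketch of this idea follows. The in-memory `store`, the `save` and `render` helpers, the footer template and the sample pricing text are all hypothetical; the point is only that the document is built from the latest record on request and carries its own version details.

```python
from datetime import datetime, timezone
from string import Template

# Hypothetical in-memory store: each record keeps its own change history.
store = {}


def save(key, body, author):
    """Append a new version rather than overwriting or copying the record."""
    versions = store.setdefault(key, [])
    versions.append({
        "version": len(versions) + 1,
        "body": body,
        "author": author,
        "updated": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    })


REPORT = Template("$body\n\n[version $version, updated $updated by $author]")


def render(key):
    """Build the document on request from the latest version in the store."""
    latest = store[key][-1]
    return REPORT.substitute(latest)


save("pricing", "Standard rate: 40 per unit.", "finance")
save("pricing", "Standard rate: 42 per unit.", "finance")
print(render("pricing"))
```

There is only ever one place to look, the full history stays with the record, and anyone who asks for the document gets the current version rather than whichever copy happens to be in their inbox.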

"Data Pain No 9: Insight lost in aggregated data and summary views."

It is very easy to lose information in aggregate and summary reports. As an example, imagine you have a graph of, say, the company's sales performance or of visitors to the web site. In a period of high activity, a downturn in sales of one product or a page getting high bounce rates may be hidden in the peaks caused by other products or pages. Now imagine the same scenario but with forensic information such as system logs: a failing machine may be obscured when its information is combined with that of the rest of the machines. I was involved with one business where we collected a large amount of transactional data every Monday. If the overall transaction levels were around average we would assume everything was OK, but if we examined every transaction source independently we might find that some sources were not transacting at all. Once discovered, we would examine these further; sometimes it was an issue, other times it was not. The trouble was that with such a large number of transaction sources it was a very time-expensive process, and automatic monitoring could throw up equally wasteful false alerts, because on occasion there were normal reasons why the units would not transact. The more data we had, the harder it was to find symptoms of non conformance. We needed the data to learn about itself and tell us when something was wrong. This is exactly what we did, by implementing machine learning processes to discover abnormal states.

The point of summary views is to avoid the pain of dealing with very large, dense and increasing amounts of data. But just as taking too many aspirin can mask something a bit more serious, aggregating data can mask important trends, non conformance and important triggers. Data captured in real time can drown the messages it is giving. Keeping information in a place where you can apply machine learning, pattern recognition and predictive algorithms in a forensic framework will allow very quick insights from very large content.
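
The masking effect is easy to reproduce. The Python sketch below uses invented weekly counts for three transaction sources and a plain z-score against each source's own history. This is a simple statistical baseline chosen for illustration, not the machine learning process described above, and the source names, counts and threshold are all assumptions.

```python
from statistics import mean, stdev

# Hypothetical weekly transaction counts per source; the last value is this week.
history = {
    "site_a": [100, 98, 103, 101, 99, 140],
    "site_b": [50, 52, 49, 51, 50, 51],
    "site_c": [40, 41, 39, 40, 42, 0],
}


def unusual_sources(counts_by_source, threshold=3.0):
    """Flag sources whose latest count is far from their own past behaviour."""
    flagged = []
    for source, counts in counts_by_source.items():
        past, latest = counts[:-1], counts[-1]
        spread = stdev(past) or 1.0  # avoid dividing by zero for flat histories
        score = (latest - mean(past)) / spread
        if abs(score) > threshold:
            flagged.append((source, round(score, 1)))
    return flagged


previous_total = sum(c[-2] for c in history.values())
latest_total = sum(c[-1] for c in history.values())
print("totals:", previous_total, latest_total)
print("flagged:", unusual_sources(history))
```

The overall total is identical from one week to the next, so the summary view looks perfectly healthy, yet one source has stopped transacting and another has spiked. A baseline like this will still raise false alerts when a source is quiet for legitimate reasons, which is where learning each source's normal patterns earns its keep.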

"Data Pain No 10: Static information that is never re-examined with new questions."

Data is alive and living somewhere in every company; gone are the days when it was trapped on paper, killed off and buried in dusty filing cabinets and archives. Many business people would be disdainful if you brought out their business plan of years ago and proposed it as next year's strategy, but it is possible that hidden within it are some very valuable insights, especially if the business is successful and was always forward thinking, agile and dynamic. The only problem with these insights is that they were of their time and now need to be looked at from a different viewpoint. Now, I am not suggesting that you dig up antiques from the company vault and dust them off, but zoom the time scale in and think of the business planning of the recent past, last week or last month. It is just as likely that this needs the same treatment as the older material: it needs to be looked at from a different viewpoint, especially in a business that is successful, forward thinking, agile and dynamic. The perspective also changes as answers come in. Think how many times you have run a report only to run it again because of something you saw in the first run. Gaining knowledge is an iterative process.

The pain of static information is cured by new combinations of old tricks. Revisiting a data set and rejigging the questions you ask of it will open up deeper knowledge and insights; a question asked last week may elicit a different answer if it is asked of this week's results. On demand, real time reporting allows the perspective of the most recent knowledge to be acquired, but being able to rephrase the question in the light of the answers to the first question is at the heart of interrogative enquiry. Information is flexible, and you can re-examine it in many different ways with the right tool set at your disposal.

Some of these pains can be very dangerous.

 

First Published on our company blog

