The Path to AI Success Begins with Quality Data

Artificial Intelligence (AI) is being embedded into the technology landscape, and people are starting to plan their own journeys with AI. It sounds simple: feed data to a machine, it magically learns what it is supposed to do, and poof, you have AI. While that is roughly what happens in theory, poor data quality is a key struggle in AI implementations (a key takeaway from O’Reilly’s recent AI conference). How can you keep data quality from impeding your AI success? It starts with accepting that existing data may not be usable as-is, and then requires an honest assessment and a planned approach to enable a more successful AI implementation.

First, a bit of background and observation. IBM estimated that data quality is a $3.1 trillion problem for the US economy. While not all of those data issues will be fed into AI implementations, there is clearly a problem in the marketplace. Thomas Redman attributes bad data to hidden data factories, but some of this may be understandable. Producers of data (operations teams, outsourcers, data brokers, etc.) have resource constraints and set quality standards that produce clean-enough-to-sell data. On the flip side, consumers of data have learned to manage their decisions and actions around data uncertainty, not necessarily happy about it but working within the limitations. The result is not only inefficiency but also inaccuracy in the data.

Data that is clean enough for people may be too dirty or too incomplete to properly train a machine. Wrong answers and ambiguity in the data create confusion for the machines, even with advanced machine learning techniques like Deep Learning that are designed to work more like the human brain. Further, the current format of the data may work in the current process, but it may need some reformatting so that the algorithms can interpret it correctly. In time, algorithms may evolve to account for these ambiguities, but in the current environment, the data may need some cleansing.

As a best practice, plan for some data work at the beginning of the project, often some brief analysis and manipulation. An initial step may simply be to understand the quality of the data, for example with Redman’s straightforward data quality method called FAM. This or other methods can be used to generate a baseline quality level and help predict the best next steps before feeding the data to the machines. Some form of formatting or tagging of the data will likely be needed, and if quality levels are insufficient, additional quality steps will be required to ensure that the data is accurate enough to train the machine properly. Expect AI implementations to need people, but with a strong baseline from the initial analysis, it will be easier to understand the impact AI can have on your organization.
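To make the idea of a baseline quality level concrete, here is a minimal Python sketch of one way such a measurement could look: take the most recent records, apply a few simple validity rules, and count how many records pass all of them. This is not Redman's FAM procedure itself, only an illustrative record-level check. The field names, the rules, the sample size of 100, and the `baseline_quality` function are all hypothetical and would need to be replaced with rules that fit your own data.

```python
from typing import Callable, Mapping, Sequence

Record = Mapping[str, object]
Check = Callable[[object], bool]


def baseline_quality(
    records: Sequence[Record],
    checks: Mapping[str, Check],
    sample_size: int = 100,  # assumed sample size, adjust to your data
) -> dict:
    """Score the most recent records against per-field validity rules.

    A record counts as clean only if every checked field passes its rule.
    """
    sample = list(records)[-sample_size:]
    errors_by_field = {field: 0 for field in checks}
    clean = 0
    for record in sample:
        record_ok = True
        for field, is_valid in checks.items():
            if not is_valid(record.get(field)):
                errors_by_field[field] += 1
                record_ok = False
        if record_ok:
            clean += 1
    total = len(sample)
    return {
        "sampled": total,
        "clean": clean,
        "clean_ratio": clean / total if total else 0.0,
        "errors_by_field": errors_by_field,
    }


# Hypothetical validity rules for illustration only.
def non_empty_text(value: object) -> bool:
    return isinstance(value, str) and value.strip() != ""


def plausible_age(value: object) -> bool:
    return isinstance(value, int) and not isinstance(value, bool) and 0 < value < 120


def known_label(value: object) -> bool:
    return value in {"approved", "denied"}


# Made-up example records, not real data.
records = [
    {"name": "Ada", "age": 36, "label": "approved"},
    {"name": "", "age": 41, "label": "denied"},      # missing name
    {"name": "Lin", "age": -5, "label": "approved"},  # implausible age
    {"name": "Sam", "age": 29, "label": "maybe"},     # ambiguous label
]
checks = {"name": non_empty_text, "age": plausible_age, "label": known_label}

report = baseline_quality(records, checks)
print(report)
```

A low clean ratio, or errors concentrated in the field the machine is meant to learn from (here the hypothetical `label`), would suggest that additional cleansing is needed before training.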

To conclude, AI is starting to work its way into our businesses and business processes. It can have a significant impact but not without high-quality data to feed into the machine. The path to AI success begins with quality data. 

Warren Fish

Director, Governance, Risk, & Compliance

8 years ago

I can't wait for AI to take over in the kitchen! <https://www.hwdyk.com/q/images/futurama_s03e22_02.jpg>

More articles by Kyle Hoback
