AI Enabled Business: A Data Centric View
Debiprasad Banerjee
Artificial Intelligence Evangelist & Advisor | Global Business Leader
A quick recap: the last discussion argued that data-driven decision-making is the future of all successful enterprises, and touched upon the importance of data quality for enterprises that want to adopt AI as part of their strategy. In this section we will talk more about the data itself and the challenges associated with using it for AI applications.
Today, businesses worldwide are generating unprecedented amounts of data and, given cheaper and abundant storage, most of it is available online or in real time. However, can the available data be used by AI applications in its current form? And if not, what does it take to get it there? But first, a little more background for those who are less initiated into the world of AI.
When deploying an AI application to solve a business problem, it is very important to begin at the beginning. Like all important things in life, the following three questions may sound basic and simplistic, but they are key to the final success of the project.
1. Define the problem – What is it that you want the AI application to do?
2. Output to Outcome – How will you use the AI output to improve your business outcome?
3. Check the Data – Do you have the right set of data to solve this problem?
While the first two can also be very challenging to answer, let's focus on the third one here.
At this stage it is important to make a clear distinction between two specific types of outcome (there are others as well) that businesses typically want from an AI application. The first is to augment the human and the second is to replace them. A simple example demonstrates the difference. Consider large sets of data from all possible areas of the business brought together in one place to look for hidden insights on customer behavior, sales projections, the next bubble … whatever! A human being looking at this would probably do an OK job at best. A cleverly selected AI algorithm, however, would be well suited to the task: really fast and effective.
Now consider an existing process where humans do routine tasks that are repeatable and boring but rely on some unique human faculty, such as looking at an image to derive meaning or listening to a conversation to respond adequately. Putting an AI application in place to replace the human in this workflow, partially or totally, is not a trivial job and is far more complicated than the prior example of parsing data.
Consequently, selecting the right AI algorithm, picking and curating the right set of data to train it, deploying the AI and putting in place a continuous learning loop all differ greatly in scale and complexity between the two examples. The AI used in the first example is generally called Machine Learning (ML); the AI used in the second is a specific subset of ML called Deep Learning (DL). In both cases, the challenges related to the data are probably the trickiest to handle, and even more so in the case of DL.
Before we look at the actual data and the challenges it poses, there is one more important thing that needs to be touched upon: the concept of supervised and unsupervised learning. A large proportion of the AI applications used in business today require definitive outputs, which are best produced by techniques such as regression or classification, and these use the supervised learning method. Problems like clustering, on the other hand, are better solved by unsupervised learning algorithms.
In supervised learning, as the name suggests, we tell the AI algorithm what a set of input data means and what meaning (prediction/inference/class) should be derived from it. For example, if the input is an image of a cat, then we also need to provide an additional data field (a label) along with the image that says this image is a cat. Extend this to the large and complex data sets of real-life business situations and it is easy to see how quickly this becomes a huge exercise in data tagging/labelling that must be done before we can feed the data to any AI application. This labelling then becomes part of the whole data cleansing and preparation exercise.
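To make the difference between supervised and unsupervised learning a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the customer data, the "occasional"/"loyal" labels, and the function names are invented, and the two techniques shown (a 1-nearest-neighbour classifier and a bare-bones k-means) are simply small stand-ins for the supervised and unsupervised families, not a recommendation for real projects.

```python
# Toy data (invented): each customer is (monthly_visits, avg_basket_value).
# In the supervised case every record carries a human-provided label.
labeled = [
    ((2.0, 15.0), "occasional"),
    ((3.0, 20.0), "occasional"),
    ((12.0, 80.0), "loyal"),
    ((14.0, 95.0), "loyal"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_supervised(point, examples):
    """1-nearest-neighbour: return the label of the closest labeled example."""
    _, label = min(examples, key=lambda ex: squared_distance(point, ex[0]))
    return label


def cluster_unsupervised(points, k=2, iterations=10):
    """Bare-bones k-means: groups the points without using any labels."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return groups


# Supervised: the labels tell the algorithm what each example means.
print(predict_supervised((13.0, 90.0), labeled))  # loyal

# Unsupervised: same inputs with the labels stripped; the algorithm only finds groups.
unlabeled = [features for features, _ in labeled]
print(cluster_unsupervised(unlabeled))
```

Note that the supervised function cannot run without the label column, which is exactly the tagging effort described above, while the clustering function returns groups that a person still has to interpret and name.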
We will take a closer look at the processes pertaining to the data pipeline and its management in the next discussion. Based on what we have discussed so far, it seems likely that considerable resources need to be devoted to preparing the data to enable a technology-led transformation in which AI is used to make data-driven decisions.