Applied AI Is About Formulating Questions
Working in the machine learning space, the most common question I get is “How do you find enough data?” This can be a fascinating question, or a sign that someone is looking for an excuse to avoid change. The simple answer is that data is a precious commodity, and AI engineers are as addicted to data as our predecessors were to oil. We purchase, beg, borrow, or, you know, find it after it “falls off the back of a truck.” It’s the only way to do what we do best: solve problems in the messiness of the real world.
In another tone of voice, “Where do you get your data?” betrays a defeatist attitude. The person is often implying “I can’t use machine learning for this problem because we don’t have sufficient data.” In my experience, lack of information is cured by creativity. When most of the players in an industry consider the environment to be information-poor, the time is ripe for disruptive innovation. Five years ago, who would have thought a person’s social media posts could predict their future pregnancy without any direct human input? It seems obvious now, but in the days before AI-driven advertising, it was unthinkable.
People who ignore machine learning despite having access to top-notch developers share a common mental error. They expect their current approach to the problem to be the optimal AI approach as well. They don’t see the real value of AI: the ability to incorporate a whole new kind of intelligence into the problem-solving process. The human mind uses narratives to predict future outcomes: “Stacey went to the store and bought one quart of milk and one box of cookies. She probably intends to consume these items together, while watching a movie, as soon as she gets home.” An AI’s strength lies in finding correlations. The milk and cookies may not evoke an image of sitting on a couch watching “The Handmaid’s Tale,” but in combination with Stacey’s date of birth, income, and height, an AI may determine that she has a 35% chance of changing her preferred brand of shampoo within the next 12 days. The former is a human-style prediction based on reasoning; the latter is a purely statistical finding. We might be able to rationalize it after the fact, but it could never have been deduced from first principles. While it’s true that machine learning thrives on data, an AI’s definition of “relevant data” is so broad that it requires us to go outside ourselves and approach challenges from a new perspective. The reward is a superhuman level of optimization and insight.
In my line of work, a typical misguided approach to machine learning would be “I have a list of sales prospects, and I want to use AI to sift out the good ones so I don’t waste my time.” Someone like this will usually have a purchased database of basic consumer information for their prospects, and a record of the same information for previously successful prospects. They are probably thinking in terms of comparing the lists to see which prospects are most similar to prior customers, something a human could theoretically do given infinite time. The question should be: “Given my budget, my business model, and my prospects, how do I maximize profit?” Instead of imposing a human narrative on AI, specify the parameters of the problem and indicate the variable that should be maximized. Not the number of mailings, not the conversion rate, not the rationality of the method, just profit.
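To make the reformulated question a little more concrete, here is a minimal Python sketch of "maximize profit given a budget." Everything in it is an assumption made for illustration: the `Prospect` fields, the idea that some model has already estimated a conversion probability for each prospect, and the simple greedy ranking used to spend the budget. It is not how any particular system works, and a greedy pass is only a heuristic for this kind of budgeted selection.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    # All fields are hypothetical; a real project would define its own.
    name: str
    p_convert: float     # assumed model-estimated probability of a sale
    revenue: float       # assumed revenue if the sale happens
    contact_cost: float  # assumed cost of contacting this prospect (> 0)


def expected_profit(p: Prospect) -> float:
    """Expected profit from contacting one prospect."""
    return p.p_convert * p.revenue - p.contact_cost


def choose_prospects(prospects: list[Prospect], budget: float) -> list[Prospect]:
    """Greedy heuristic: spend the budget on the prospects with the best
    expected profit per unit of contact cost, skipping unprofitable ones."""
    candidates = [p for p in prospects if expected_profit(p) > 0]
    candidates.sort(
        key=lambda p: expected_profit(p) / p.contact_cost, reverse=True
    )
    chosen: list[Prospect] = []
    spent = 0.0
    for p in candidates:
        if spent + p.contact_cost <= budget:
            chosen.append(p)
            spent += p.contact_cost
    return chosen
```

Notice what the objective leaves out. Nothing here asks which prospects look most like past customers, and nothing rewards a high conversion rate for its own sake. A prospect with a low chance of converting can still be worth contacting if the payoff is large enough relative to the cost.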
Once the problem is properly specified, turn it over to an applied AI specialist. The problem-solving process is a collaboration between human rationality and machine intelligence. Engineers use a combination of experience, experimentation, and higher-order reasoning to choose and properly calibrate the variables available to the neural network. The neural network uses deep learning to find correlations among the variables, individually and in combination, until the outputs are optimized for maximum profit. The result is strong predictive power, beyond what traditional human-only statistical methods could reasonably provide. It’s the difference between a hammer and a nail gun: faster, stronger, and more accurate.
This is the true power of AI, and the reason why data is less of a bottleneck than people think. Armed with AI’s ability to unlock hidden correlations, the definition of relevant data is constantly expanding. In 2018, your cousin’s DNA sequence is your personal medical information and your sister’s Google search history is your private consumer data. With a neural network doing the hard statistical work, the only bottleneck is human creativity in using the tools.
Next time someone tells you they don’t have enough information, challenge that assumption. Perhaps they are stuck in a pre-AI mindset and are asking the wrong questions. Even the best engineers can sell themselves short through lack of imagination. Likewise, if you have an employee who wants to incorporate AI into your processes, don’t be too quick to dismiss them. AI is truly a parallel form of intelligence, complementary to our own. Expect results so good they don’t make any sense. It’s the nature of the process.