Applying Machine Learning to Business Problems
Humberto Moura
Research Scientist & Professor / Machine Learning Engineer / Software Developer
In recent years, Artificial Intelligence has appeared regularly in news programs on TV, radio and the Internet. Terms like Big Data, Data Science, Machine Learning and Deep Learning have quickly entered the vocabulary of the business world.
Since many companies would like to apply these technologies in their businesses, I have selected three tips for managers and executives who are getting started with Machine Learning in their companies and want to boost their chances of success.
Tip 1: Separate false expectations from reality
Some people insist that Machine Learning solutions are a silver bullet: you have the data, the technical staff just needs to get their hands on it, and, as if by magic, the company will quickly have accurate answers about new products, insights into customer behavior and a range of automated business solutions, all thanks to some miraculous algorithm (a set of instructions).
Other people advertise a product or service that has been on the market for some time but has been relabeled as a solution that now includes Artificial Intelligence. High demand for the topic and a lack of accurate information mean that old products sell better simply because of the new label. For example, software that used to compute a few arithmetic means internally now uses a linear regression.
The truth is that applying Machine Learning successfully requires more than the technical knowledge to extract useful patterns from data. Above all, it requires formulating a good business problem to solve. It also requires a company culture that maintains a complete cycle of care for data, from correct selection and collection through to delivering results that bring real value to the company’s strategy, generally measured by the satisfaction of its managers and customers.
For this to happen, three kinds of knowledge are usually necessary:
- A deep understanding of the company’s business model and products / services;
- Data analysis techniques and pattern recognition algorithms;
- Knowledge of Information Technology.
Note that these skills are hard to find in one person, since they belong to different roles, so a culture of teamwork is required. There is little point in having the best data scientist on the market if the company lacks the synergy to align business managers with the scientist, the minimum IT infrastructure to make the project feasible, and the appetite for taking risks.
Tip 2: Garbage In, Garbage Out!
The quality of a Machine Learning solution is directly related to the quality of its data. Because the results are based on what is learned from that data, a solution (model, algorithm …) works only when it can faithfully generalize the reality of the business. The most exciting part of working with Machine Learning is, without a doubt, creating models, running algorithms and showing results, but all of it depends on the quality and relevance of the data used as input for these tasks. In other words, get the groundwork right.
In fact, most of the time in a Machine Learning project is spent on organizing, transforming and cleaning data. Some items to be taken care of:
- Is the data really relevant to solving the company’s problem? For example, customer income data in a product recommendation solution;
- Where will the data come from, and how? Is there a guarantee that it will be available and updated as often as the solution requires? Example: data in real time, daily, weekly or monthly, retrieved automatically or extracted manually;
- Does the data contain many blank or null fields? Should we fill them with the mean / median / mode of the values, or delete the records with missing data?
- Are the values entered reliable? Example: at points of sale, the client ID field is mostly filled with a default number such as 0000001;
- Is the data in the format the solution requires, and is it up to date? Example: we have a person’s age, but it dates from the time when they registered at the store.
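Some of the checks in this list can be automated early in a project. The sketch below is only an illustration using the Python standard library: the records, the field names (client_id, age, amount) and the helper functions are hypothetical and not taken from any real system. It measures how much of a field is missing, detects a default value that dominates a field (like the 0000001 client ID), and fills missing ages with the median, which is one of the options mentioned above and not necessarily the right one for every problem.

```python
from collections import Counter
from statistics import median

# Hypothetical point-of-sale records; None marks a missing value.
records = [
    {"client_id": "0000001", "age": 34, "amount": 120.0},
    {"client_id": "0004821", "age": None, "amount": 75.5},
    {"client_id": "0000001", "age": 51, "amount": None},
    {"client_id": "0000001", "age": 28, "amount": 42.0},
    {"client_id": "0009377", "age": 45, "amount": 300.0},
]


def missing_ratio(rows, field):
    """Fraction of rows in which `field` is missing (None)."""
    return sum(1 for row in rows if row[field] is None) / len(rows)


def dominant_value(rows, field):
    """Most frequent value of `field` and the share of rows that carry it."""
    value, count = Counter(row[field] for row in rows).most_common(1)[0]
    return value, count / len(rows)


def impute_median(rows, field):
    """Return copies of the rows with missing `field` values set to the median."""
    fill = median(row[field] for row in rows if row[field] is not None)
    return [
        {**row, field: fill if row[field] is None else row[field]}
        for row in rows
    ]


age_missing = missing_ratio(records, "age")
top_id, top_share = dominant_value(records, "client_id")
cleaned = impute_median(records, "age")

print(f"Missing ages: {age_missing:.0%}")
print(f"Most common client ID: {top_id} ({top_share:.0%} of records)")
print(f"Imputed age for the second record: {cleaned[1]['age']}")
```

A client ID that covers most of the records is a sign that the field was filled in by default at the register, so it should not be treated as a real customer identifier.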
It doesn’t matter whether the data is structured, like Excel spreadsheets or database tables, or unstructured, like images, videos and audio, or whether it is obtained automatically from the internet or extracted manually. What matters is that it is relevant to solving the business problem (generally increasing revenue or reducing expenses) and that it is cleaned and transformed in the way best suited to the problem, because if garbage comes in, garbage will certainly come out. There is no recycling.
Tip 3: Learn to deal with the uncertainties of this type of project
When we work on software development projects, we are used to a certain kind of reality. Most of the time there are features, in the form of requirements to be developed, and a well-defined schedule, cost and scope. When problems arise, adjustments such as adding staff, increasing the budget, purchasing better resources, working overtime and reprioritizing the scope can effectively solve them.
In Machine Learning projects, there may be some very unpleasant surprises, because the way of conceiving and managing them differs from the usual one. One of these differences is how project uncertainty is handled. When starting a Machine Learning project, it is not possible to promise a perfectly accurate result in the way we tend to do in software development projects.
Imagine a project to recognize animals in images taken by a farm camera. When an animal approaches the camera, a photo is taken and the Machine Learning software classifies the animal as a chicken, cattle or wolf, for example. Depending on the type of animal detected by the software, a different action must be taken, such as counting chickens or sounding an alarm when a wolf is detected. An initial question could be: what percentage of the animals will be classified correctly?
Will it be 40%, 75% or 90%? How many images are needed to have good accuracy? One hundred? Five hundred? A thousand? Ten thousand? Two hundred of each animal? When new types of animals appear, how do we include them? What is the impact of this inclusion on the new results? Will many animals be identified incorrectly? What is the impact of incorrectly detecting a wolf?
Before starting a Machine Learning project, we must always clearly define the success indicators, such as reaching 80% accuracy or tolerating at most a 10% error rate. However, it is not possible to say with certainty that a given standard of success will be reached. We can only estimate the results, based on experience. In some cases, if the technical team is inexperienced and does not make the risks clear, the results can frustrate project managers and sponsors. Risk and uncertainty are the factors that define innovation projects.
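To make the idea of success indicators concrete, here is a minimal sketch that checks a set of predictions for the farm camera against targets agreed in advance. The labels and predictions are invented for illustration. The 80% accuracy target comes from the example above, while the 90% target for detecting wolves is an assumption added here to show that a single overall number can hide the errors that matter most.

```python
# Hypothetical ground truth and model predictions for ten camera photos.
actual = ["chicken", "chicken", "cattle", "wolf", "wolf",
          "chicken", "cattle", "wolf", "chicken", "cattle"]
predicted = ["chicken", "chicken", "cattle", "wolf", "cattle",
             "chicken", "cattle", "wolf", "cattle", "cattle"]

TARGET_ACCURACY = 0.80     # success indicator from the text
TARGET_WOLF_RECALL = 0.90  # assumed indicator, for illustration only


def accuracy(actual, predicted):
    """Share of photos whose predicted label matches the true label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def recall(actual, predicted, label):
    """Share of photos that truly show `label` and were detected as such."""
    outcomes = [p for a, p in zip(actual, predicted) if a == label]
    return sum(p == label for p in outcomes) / len(outcomes)


overall = accuracy(actual, predicted)
wolf_recall = recall(actual, predicted, "wolf")
meets_accuracy = overall >= TARGET_ACCURACY
meets_wolf_recall = wolf_recall >= TARGET_WOLF_RECALL

print(f"Accuracy: {overall:.0%} (target met: {meets_accuracy})")
print(f"Wolves detected: {wolf_recall:.0%} (target met: {meets_wolf_recall})")
```

In this invented sample the model reaches the accuracy target and still misses one wolf in three, which is why the indicators should reflect the cost of each kind of error and not only the overall hit rate.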
No risks? No innovation! Without risk, it is not innovation, just routine. The good news is that if a Machine Learning project does not reach the expected level of precision, there are several techniques to improve the results: changing the algorithms, adjusting parameters and hyperparameters, and trying to reduce bias and variance. That is why it is always important to have experienced professionals, or to start with simple projects and build know-how over time.
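As an illustration of what adjusting a hyperparameter means in practice, the sketch below tries several values of k for a tiny k-nearest-neighbours classifier and keeps the one that scores best on held-out validation data. Everything in it is an assumption made for the example: the algorithm was chosen only because it fits in a few lines, the single feature (body weight in kg) and the data points are invented, and one training point is deliberately mislabeled to simulate noisy data. A real image classifier would be far more involved.

```python
from collections import Counter

# Hypothetical training data: (body weight in kg, label).
# The 2.2 kg "wolf" is a deliberately mislabeled, noisy record.
train = [
    (1.0, "chicken"), (1.5, "chicken"), (2.0, "chicken"),
    (2.2, "wolf"), (2.5, "chicken"),
    (6.0, "wolf"), (6.5, "wolf"), (7.0, "wolf"), (7.5, "wolf"),
]

# Held-out examples that are used only to compare hyperparameter values.
validation = [
    (1.2, "chicken"), (2.15, "chicken"), (2.3, "chicken"),
    (6.2, "wolf"), (7.2, "wolf"),
]


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def validation_accuracy(train, validation, k):
    """Share of validation examples classified correctly with a given k."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)


# Try each candidate value of the hyperparameter k and keep the best one.
scores = {k: validation_accuracy(train, validation, k) for k in (1, 3, 9)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: validation accuracy {score:.0%}")
print(f"Best k: {best_k}")
```

With k=1 the model copies the noisy record (high variance), with k=9 it ignores the weight altogether and always answers with the majority label (high bias), and the intermediate value does best on this sample. A validation set this small is far too small to trust in a real project.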
But sometimes the only really effective solution is to have much more data than we had planned for. That is why it is important to learn how to deal with the uncertainties of planning this kind of project and to communicate them transparently to all interested parties. Otherwise, we run the risk of hearing something like this: “What is keeping us from getting the necessary precision? We need to solve it! You can buy about 5 more servers and the company will pay for them!” Or worse: “This happened because we did not buy the Deep Learning ‘Ultra Power Mega Enhanced’ cognitive computing services from the renowned company XYZ Corporation.”