Machine Learning Algorithms 101
Put simply, machine learning involves feeding data into an algorithm so that the algorithm can learn from it. Data is easy to conceptualize: it is typically numbers, text, sounds, or images. Algorithms may be less familiar, especially if you are just starting to learn about AI.
What Is an Algorithm?
An algorithm is a structured, step-by-step procedure for solving a problem or completing a task. A simple example of an algorithm is a recipe for making an apple pie.
According to Wikipedia, a computer algorithm is ‘an instance of logic written in software by software developers, to be effective for the intended "target" computer(s) to produce output from given (perhaps null) input.’
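To make the idea concrete, here is a minimal sketch of a very small algorithm written in Python. The function name find_largest and the sample numbers are made up for illustration; the point is only that an algorithm is a fixed sequence of steps that turns an input into an output.

```python
def find_largest(numbers):
    """Return the largest value in a non-empty list, one step at a time."""
    if not numbers:
        raise ValueError("need at least one number")
    largest = numbers[0]           # Step 1: start with the first value
    for value in numbers[1:]:      # Step 2: look at each remaining value
        if value > largest:        # Step 3: keep whichever is larger
            largest = value
    return largest                 # Step 4: report the result


print(find_largest([4, 17, 9, 2]))
```

Like the apple pie recipe, the steps are the same every time; only the ingredients (the input list) change.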
Machine Learning Algorithms
Machine learning algorithms can be grouped into four learning styles: supervised, semi-supervised, unsupervised, and reinforcement learning. They can be further categorized by type, including regression, decision tree, regularization, Bayesian, clustering, neural networks, deep learning, and more.
For a more detailed overview of some popular machine learning algorithms, take a look at this article on the TowardsDataScience website.
Working with Machine Learning Algorithms
Selecting machine learning algorithms for a specific project starts with understanding the type of problem you are solving. Typically this is done by skilled data scientists or machine learning engineers. Once you identify the algorithm to use, you can start to build your machine learning model.
Algorithms are brought into a machine learning platform through code. On Google Colab, for example, you write Python code that imports the algorithm, usually from a library, and then upload data to train the model.
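As a rough illustration of what "an algorithm plus data produces a model" looks like in Python, the sketch below fits a straight line to a handful of points using ordinary least squares. It uses only plain Python so it can run anywhere; in a real project you would more likely import an algorithm from a library. The function names fit_line and predict and the hours/scores data are invented for this example.

```python
def fit_line(xs, ys):
    """Train a simple linear regression model using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept       # the "model" is just these two numbers


def predict(model, x):
    """Use the trained model to make a prediction for a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: hours studied and the resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)   # the algorithm learns from the data
print(predict(model, 6))          # the model predicts for unseen input
```

The algorithm (fit_line) stays the same from project to project; the model it returns depends entirely on the data it was trained on.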
Machine learning projects typically test several algorithms to find the one that works best for the problem at hand. Sometimes the result is an ensemble method, which combines multiple algorithms for better predictive performance.
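One simple form of ensemble is majority voting, where several models each make a prediction and the most common answer wins. The sketch below is only an illustration of that idea: the three "models" are hand-written rules with hypothetical names, not trained algorithms, and the messages are made up.

```python
from collections import Counter


# Three deliberately simple, hypothetical spam "models".
def has_free(message):
    return "spam" if "free" in message.lower() else "ham"


def has_many_exclamations(message):
    return "spam" if message.count("!") >= 3 else "ham"


def is_very_short(message):
    return "spam" if len(message.split()) <= 3 else "ham"


def majority_vote(models, message):
    """Ask every model for a label and return the most common one."""
    votes = Counter(model(message) for model in models)
    return votes.most_common(1)[0][0]


models = [has_free, has_many_exclamations, is_very_short]

print(majority_vote(models, "FREE prize!!! Click now"))
print(majority_vote(models, "Is the free parking lot open on Sunday?"))
```

In the second message, the has_free rule on its own would answer "spam", but the other two rules outvote it. That is the intuition behind ensembles: individual models make different mistakes, and combining them can cancel some of those mistakes out.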
Algorithms can be further optimized through hyperparameter tuning. Hyperparameters are values and settings, chosen before training, that control the learning process and the behavior of the algorithm. One of the most common is the learning rate, which controls how large a step the model takes each time it updates itself during training. Many other hyperparameters can be tuned to improve model performance. This article from Google’s AI training website goes into more detail on this subject.
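The effect of the learning rate is easiest to see on a toy problem. The sketch below uses gradient descent to find the value of w that minimizes (w - 3)^2, whose best answer is w = 3. The function, the starting point, the number of steps, and the three learning rates are all arbitrary choices made for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize f(w) = (w - 3)^2, whose minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)           # slope of f at the current w
        w -= learning_rate * gradient    # step against the slope
    return w


# Try a small, a moderate, and an overly large learning rate.
for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

After 20 steps, the smallest rate is still far from 3 because it learns too slowly, the middle rate lands close to 3, and the largest rate overshoots on every step and ends up further away than where it started. Hyperparameter tuning is the search for settings like the middle one.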
Algorithms Gone Wrong
Let's revisit the algorithm for making an apple pie. Say you give this task to your personal robot chef. You write out detailed steps, except you get lazy when it comes to the apples in the recipe. You assume the robot chef will figure out which apples to use. This could be disastrous. Your robot chef may use crab apples instead of your favorite Granny Smith apples, it may run to the nearest orchard and pick rotten apples off the ground, or it might decide to use the entire apple, including the core and seeds.
Algorithms only work well when the data is high quality, relevant, and representative enough to avoid biased results. Missing or incorrect values in the data can cause algorithms to break or produce poor results. If you were building a machine learning model to identify images of dogs, and you only included large breeds in the training data, it probably would not classify a chihuahua as a dog. One of the biggest general concerns with AI is that models are subject to bias based on the data used for training. Overlooking this issue can have negative impacts on society. Think about algorithms being used to approve loans or to recommend prison sentences. These types of algorithms are being actively used today.
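A quick check of the data before training can catch some of these problems early. The sketch below counts missing values and shows how the examples are distributed for one field. The records, the field name breed_size, and the helper audit_field are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Made-up metadata for a tiny dog-image training set.
records = [
    {"file": "dog_001.jpg", "breed_size": "large"},
    {"file": "dog_002.jpg", "breed_size": "large"},
    {"file": "dog_003.jpg", "breed_size": None},      # missing value
    {"file": "dog_004.jpg", "breed_size": "large"},
    {"file": "dog_005.jpg", "breed_size": "large"},
]


def audit_field(rows, field):
    """Return (number of missing values, count of each present value)."""
    present = [row[field] for row in rows if row.get(field) not in (None, "")]
    missing = len(rows) - len(present)
    return missing, Counter(present)


missing, counts = audit_field(records, "breed_size")
print("missing:", missing)
print("distribution:", dict(counts))
```

Here the audit reveals one record with no breed size and no small breeds at all, which is exactly the kind of gap that would leave a model unable to recognize a chihuahua.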
Some of the most popular real-world algorithms can be fooled. In one recent case, an artist pulled 99 phones down the road in a red wagon and fooled the Google Maps algorithm into showing traffic jams.
Wrap Up
Algorithms act as the brains of machine learning: they learn from input data to produce models that can make decisions and predictions on new data. Because algorithms learn from the specific data they are given, poor data will produce poor results.