Machine learning: Skynet or evolution?
HAVE YOU RECENTLY found yourself wondering what the heck is machine learning? Or what's a neural network? Don’t worry, you're not alone.
The rapid development of machine learning has made deep learning and artificial neural networks (computer programs that facilitate that learning) a common topic of conversation.
While this might not be your run-of-the-mill dinner party conversation, among the nerdy underground, and the money behind us, it certainly is.
One reason for its popularity is that Google, Facebook, and most big tech companies are investing heavily in it. (Read more about their machine learning efforts: Google. Facebook.) A lot of people are betting on machine learning. Including me.
Black Swan Free Zone
Artificial Intelligence (AI), and technical progress in general, also have some people worried. There are legitimate concerns about how humanity and technology will progress in the future. A common narrative of the worst of our fears is that machines evolve and try to eliminate humanity (Skynet from Terminator). Or they trick us into a state of surrender (The Matrix). These stories represent people’s misgivings over a tech-dominated, dystopian future. And currently, neither one is very likely.
Personally, I think AI and machine learning thoughtfully applied to important human problems (understanding climate change, analyzing risks, studying environmental factors that impact human health, and so on) can help make the world a better place. I won't simply dismiss the naysayers as Luddites, but as my Mom would say, "a little education can go a long way."
With that in mind, I've put together some notes on machine learning and neural networks to help people understand these subjects in more detail. In order to keep it simple, I'm going to break it up into two posts. The first (this one) will be about machine learning. The second, to follow shortly, will be on the subject of neural networks.
As someone who has been involved in data science, predictive analytics and machine learning for many years, I have some experience working in the trenches. As the Chief Engineering Officer of an analytics company, I've been lucky to be on the ground floor building these technologies into real products. So here's my 25-cent tour of one of the most interesting and exciting technical trends in the last 30 years. I hope it helps!
Machine Learning 101
Machine learning is a type of artificial intelligence (AI) that is essentially a method of teaching computers to make and improve predictions based on data. It is also a type of data analysis that automates the building of analytical models. It deals with designing and developing algorithms that evolve their behavior, such as the ability to predict future events, based on empirical data.
One key goal of machine learning is to more broadly generalize from limited sets of data. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.
The iterative aspect of machine learning is critical because as models are exposed to new data, they are able to independently adapt. They learn from previous computations to produce reliable, repeatable decisions and results. It’s a science that’s not new, but one that’s rapidly making inroads on multiple problems.
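To make "iteratively learn from data" a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular product does it: the function name, the toy data, and the learning-rate and epoch values are all assumptions I picked for the example. The model is a straight line, y = w * x + b, and each pass over the data nudges w and b a little in whatever direction shrinks the prediction error.

```python
def train_linear_model(points, learning_rate=0.02, epochs=2000):
    """Fit y = w * x + b by repeatedly nudging w and b to shrink the error."""
    w, b = 0.0, 0.0  # start knowing nothing
    for _ in range(epochs):
        for x, y in points:
            error = (w * x + b) - y           # how far off is the current guess?
            w -= learning_rate * error * x    # adjust the slope
            b -= learning_rate * error        # adjust the intercept
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example).
data = [(1, 3), (2, 5), (3, 7), (4, 9)]
w, b = train_linear_model(data)
print(round(w, 2), round(b, 2))  # expect values close to 2 and 1
```

Nobody told the program that the answer was "2x + 1". It found the slope and intercept on its own by correcting itself over and over, which is the iterative idea in miniature.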
Because of new computing technologies, machine learning of the past is a far cry from machine learning today. While many machine learning algorithms have been around for a while, the ability to automatically apply complex mathematical calculations to big data – over and over, faster and faster – is a fairly recent development. Here are a few widely publicized examples of machine learning applications you may be familiar with:
- Amazon and Netflix recommendation engines
- Google's self-driving car
- Credit scoring and next-best offers
- Email spam filtering (Yes!)
Deus ex machina
Machine learning tasks are frequently classified into three main categories, depending on the nature of the learning "signal" or "feedback" available to a learning system. These are supervised learning, unsupervised learning, and reinforcement learning. Supervised learners currently make up the bulk of all machine learners in use today. According to the analytics company SAS, upwards of 70% of all machine learning is of this type. Unsupervised learners make up approximately 10 to 20 percent. Below are definitions of the three main types:
- Supervised learning algorithms are trained using labeled examples, such as inputs where the desired output is known. It is an algorithm that uses a known dataset to make predictions. For example, an experiment could have data points labeled either “T” (true) or “F” (false). The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors. It then modifies the model accordingly. Through methods like linear regression, classification, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data. Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
- Unsupervised learning is used against data when we have no historical labels. The system is not told the "correct answer." The algorithm must figure out what is being shown. The goal is to explore the data and find some recognizable patterns. Unsupervised learning is effective with transactional data. For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
- Reinforcement learning is often used for gaming, robotics, and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with) and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy.
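For readers who like to see things in code, the three types above can be sketched in a few lines of Python each. Treat this as a toy illustration under my own assumptions: the function names, the tiny datasets, and the parameter values are invented for this post, and I've picked one simple representative technique per type (a perceptron for supervised learning, one-dimensional k-means for unsupervised learning, and an epsilon-greedy "slot machine" agent for reinforcement learning). Real systems are considerably more involved.

```python
import random


def train_perceptron(examples, epochs=20):
    """Supervised: adjust weights whenever a prediction disagrees with its label."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for features, label in examples:
            target = 1 if label == "T" else 0
            score = sum(w * f for w, f in zip(weights, features)) + bias
            error = target - (1 if score > 0 else 0)  # -1, 0, or +1
            weights = [w + error * f for w, f in zip(weights, features)]
            bias += error
    return weights, bias


def classify(model, features):
    """Apply a trained perceptron to a new, unlabeled input."""
    weights, bias = model
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return "T" if score > 0 else "F"


def k_means_1d(values, centers, rounds=10):
    """Unsupervised: group unlabeled numbers around the nearest center."""
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


def epsilon_greedy(payout_odds, steps=5000, epsilon=0.1, seed=0):
    """Reinforcement: learn by trial and error which action pays best on average."""
    rng = random.Random(seed)
    pulls = [0] * len(payout_odds)
    estimates = [0.0] * len(payout_odds)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(payout_odds))  # explore a random action
        else:
            action = max(range(len(estimates)), key=lambda a: estimates[a])  # exploit
        # The "environment": pays 1 with the hidden odds for that action, else 0.
        reward = 1 if rng.random() < payout_odds[action] else 0
        pulls[action] += 1
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return estimates


# Supervised: labeled examples ("T" or "F") go in, a predictive model comes out.
labeled = [((1,), "F"), ((2,), "F"), ((4,), "T"), ((5,), "T")]
model = train_perceptron(labeled)
print(classify(model, (6,)), classify(model, (0,)))

# Unsupervised: no labels, just numbers (say, monthly spend) to segment.
centers, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers, clusters)  # centers settle at [2.0, 11.0]

# Reinforcement: the agent is never told the odds; it estimates them from rewards.
estimates = epsilon_greedy([0.2, 0.5, 0.8])
print(max(range(len(estimates)), key=lambda a: estimates[a]))
```

The thing to notice is what each learner is given. The perceptron gets the right answers and corrects its mistakes. The clustering code gets no answers at all and just looks for structure. The slot machine agent only gets a reward after it acts, and has to balance trying new actions against sticking with what has worked so far.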
Conclusion
Growing volumes and varieties of available data, computational processing that is faster, cheaper and more powerful, and more affordable data storage are changing the way we look at problems, and which ones we can tackle. Data mining and Bayesian analysis continue to grow in popularity.
All of these things mean it's possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results – even on a very large scale. The result? High-value predictions that can guide better decisions and smart actions in real time without human intervention.
Hopefully this was helpful, but I'm interested in your feedback as well. Would you like more or less technical detail in a post like this?
"Progress is impossible without change, and those who cannot change their minds cannot change anything." - George Bernard Shaw