How Machines Learn (and Why It Matters)
Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs, and using that to make predictions or decisions, rather than following only explicitly programmed instructions. - Machine Learning, Wikipedia.
There are a lot of buzzwords floating around the Data Science space. Machine Learning (ML) is one of the more intriguing of these, and while it is often brought up in discussions about the skills a data scientist should have, it is also one of the most misunderstood, in part because machine learning is already far more pervasive than most people realize.
In one respect, any time a programmer writes a program, a computer (we'll use that as our titular machine for now) has "learned" something. The computer will typically take some action, initiated either by a human or by an external event such as a clock reaching a certain time of day, pulling libraries together based upon configuration files and executing them, possibly saving the results of these operations in memory or on physical media such as a hard drive or solid-state drive, or sending other information to a network address.
However, for all of this, the computer itself is not really doing much beyond following specific instructions. If a program gets stuck in an infinite loop, the computer will keep making the same mistake until it either runs out of memory or drives some irate user to switch the machine off, thinking (correctly) that it has frozen. In essence, it is actually the programmer who has learned, and it will be the programmer who has to go back and write the escape hatch that breaks out of that infinite loop.
Introspection and Inferencing
On the other hand, suppose that the same computer periodically inspects its running loops, discovers that the same code is producing the same result time after time after time, and raises an alert notifying the user: "Your code appears to be in an infinite loop. Do you wish to stop the program?"
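To make that concrete, here is a minimal sketch (in Python, purely for illustration) of what such a watchdog might do: run a piece of code one step at a time and flag it if it ever revisits a state it has already seen. The function and the toy "program" below are invented for this example, not any real runtime's API.

```python
def detect_repeated_state(step, state, max_steps=10_000):
    """Advance `state` with `step` and flag the run if a state ever repeats."""
    seen = set()
    for i in range(max_steps):
        if state in seen:
            # the exact same state has come around again, so the code can never terminate
            print(f"Step {i}: your code appears to be in an infinite loop.")
            return True
        seen.add(state)
        state = step(state)
    return False

# A "program" that cycles through the same three values forever.
detect_repeated_state(lambda x: (x + 1) % 3, 0)
```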
Is this machine learning? It's a step closer, anyway. In this particular case, the machine has developed the capacity for introspection: looking at the code that it is running from within a different process. Introspection is huge in ML because it is a meta-process - one in which the machine steps back from the instructions it has been given and performs actions based upon aberrant conditions. It really doesn't matter that someone no doubt wrote the piece of software that does the introspection; it is the fact that introspection is happening that is significant here.
I use the Pinterest application to post various kinds of pictures to the web. Every so often, I'll inadvertently post something that I've already posted, but from a different source. When that happens, the application tells me that the picture has already been posted and asks whether I really want to post it again. It also features a recommendation engine, based upon who else has posted the same picture.
Pinterest is actually a very sophisticated application that appears very simple at first glance (a lot of the better web applications are). When an image is submitted, a hash value is likely generated from the source file. A hash function can be thought of as a routine that, given a string of text or even the raw bytes of a file, produces a numeric string specific to that sequence of bytes. This is typically the mechanism by which two resources are compared to determine whether they are identical.
When a picture is repinned, the hash is associated with the new account, and a pointer is established back to the previous account from which the hash came. Because multiple people can pin the same resource, each hash ends up being associated with a graph of accounts, while each account is in turn bound to multiple hashes.
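None of Pinterest's actual internals are public here, but a toy version of the mechanism just described might look something like this: hash the image bytes to detect duplicates, and keep two dictionaries that tie hashes to accounts and accounts to hashes. The account names and image bytes are invented for illustration.

```python
import hashlib
from collections import defaultdict

pins_by_hash = defaultdict(set)       # image hash -> accounts that pinned it
hashes_by_account = defaultdict(set)  # account    -> image hashes they pinned

def pin(account, image_bytes):
    digest = hashlib.sha256(image_bytes).hexdigest()
    if digest in hashes_by_account[account]:
        print(f"{account}: this picture has already been posted.")
        return
    pins_by_hash[digest].add(account)
    hashes_by_account[account].add(digest)

def recommend(account):
    """Suggest other accounts that pinned the same images."""
    neighbors = set()
    for digest in hashes_by_account[account]:
        neighbors |= pins_by_hash[digest]
    neighbors.discard(account)
    return neighbors

pin("alice", b"...cat photo bytes...")
pin("alice", b"...cat photo bytes...")   # same bytes, same hash -> duplicate warning
pin("bob",   b"...cat photo bytes...")
print(recommend("alice"))                # {'bob'}
```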
This is a lot closer to a machine learning application. Not only does it maintain introspection, but it also retains a memory of its transactions by time and provenance. Pinterest has learned what you have posted, and can also determine what may be of interest to you based upon your selections. In effect, it has learned more about you as a person based solely upon the images you post.
What's perhaps just as important here is that the categorization of resources is done largely through the interactions of the various users, in terms of which buckets they put things in. Let's say I have a picture of a sleeping cat. One person may put the picture into their collection "Cats and Kittens", while another may put it into "Sleep and Rest". Not only can the individual categories be searched, but with just two entries the system has learned that the image in question shows not only a cat or kitten but one that's asleep. By searching on terms, and then using a stemming thesaurus to find term equivalencies, the system catalogues content with a minimum of effort.
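A minimal sketch of that idea, assuming nothing more sophisticated than lowercasing and a tiny stopword list standing in for a real stemming thesaurus:

```python
from collections import defaultdict

STOPWORDS = {"and", "or", "the"}
terms_by_image = defaultdict(set)   # image id -> search terms harvested from board names

def categorize(image_id, board_name):
    """An image inherits search terms from every board it is pinned to."""
    for word in board_name.lower().split():
        if word not in STOPWORDS:
            terms_by_image[image_id].add(word)

categorize("img42", "Cats and Kittens")
categorize("img42", "Sleep and Rest")
print(terms_by_image["img42"])   # {'cats', 'kittens', 'sleep', 'rest'}
```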
Thus, the ability to effectively self-categorize inbound content is another aspect of machine learning. Since categories overlap, it also becomes much easier to find relationships (and so recommend things a given user may like) and, in that sense, to "learn" about things in the external world.
This latter ability is known as inferencing - making an informed guess about how two or more things are related based upon known information. Anyone who has done algebra in high school likely has at least a rudimentary understanding of inferencing: if you can prove a particular base statement to be true, and you have an axiom saying that statements matching a certain pattern generate specific new statements, then you have participated in building inferences. The game Clue is a non-mathematical example of the same thing - as you gain additional knowledge, you can often infer that other statements are true or false (if Professor Plum was in the library with a rope, then he couldn't have been the one who killed the victim, who was found in the kitchen with a knife in his back).
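In code, the simplest form of inferencing is just forward chaining over true/false assertions: keep applying "if these facts hold, then this new fact holds" rules until nothing new can be derived. The Clue-style facts and rules below are made up to illustrate the idea.

```python
facts = {"plum_in_library", "plum_had_rope", "victim_in_kitchen"}

rules = [
    ({"plum_in_library", "victim_in_kitchen"}, "plum_not_at_scene"),
    ({"plum_not_at_scene"}, "plum_not_the_murderer"),
]

changed = True
while changed:                      # keep applying rules until nothing new is inferred
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("plum_not_the_murderer" in facts)   # True
```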
Inferencing can be used in several different ways. In its simplest form, inferences deal with assertions that are either true or false, usually creating a graph of relationships. More sophisticated inference makes use of fuzzy logic - working with assertions whose truth is not known for certain, but can only be established to a certain degree of probability. In this case, what emerges from the graph is a set of Bayesian statements. Without getting into the math, Bayesian inferencing can be thought of as estimating the probability that a hypothesis is true given prior evidence.
For instance, let's say that you're at a casino playing roulette. You notice after a while that the ball consistently lands on black about seven times out of ten, rather than the roughly five times out of ten you'd expect given that there are eighteen red and eighteen black numbers (ignoring the green zero pockets). In a classical frequentist analysis, the bias might simply be statistical noise (it is possible to get a string of blacks in a row in a small enough sample) and would be assumed to even out later with more reds, but a Bayesian approach would treat the observation that blacks seem to occur more often as evidence, and would factor that probability in (perhaps the wheel is rigged) when determining the likelihood of a given bet having a positive payoff.
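A rough sketch of how that Bayesian update might be written, using a Beta-Binomial model and invented numbers (seventy blacks in a hundred spins, with the green zero ignored as above):

```python
# Start from a prior belief that black comes up half the time, then update it
# with the observed spins. The prior strength and the counts are illustrative.
alpha, beta = 50.0, 50.0        # prior: roughly as strong as 100 fair spins
blacks, spins = 70, 100         # the evidence observed at the table

alpha += blacks                 # posterior is Beta(alpha, beta) after the evidence
beta += spins - blacks

posterior_mean = alpha / (alpha + beta)
print(f"Estimated probability of black: {posterior_mean:.2f}")   # ~0.60, not 0.50
```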
In finance, a similar approach is used, called a running (or moving) average, where the expected value of a stock price is set based upon a trailing window of data points - the last three days, one month, or even two, twelve, twenty or two hundred months. The window is chosen to be long enough to smooth out statistical noise, but not so long that trend patterns are missed. In both cases, the goal is to account for hidden biases beyond the immediate statistical probabilities.
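The running average itself is straightforward to compute; the judgment call is the window length. A plain-Python sketch with made-up closing prices:

```python
def moving_average(prices, window):
    """Average of each trailing `window` of prices, one value per complete window."""
    return [
        sum(prices[i - window:i]) / window
        for i in range(window, len(prices) + 1)
    ]

closes = [101, 103, 102, 106, 108, 107, 111, 110]   # invented daily closes
print(moving_average(closes, window=3))             # [102.0, 103.67, 105.33, ...]
```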
When Bayesian inferencing is applied to a directed graph (one in which a relationship goes from one object to another, rather than being symmetric), it can be used to determine the probability that a certain outcome will occur given a specific chain of events.
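As a deliberately simplified sketch (treating each step as independent, which a real Bayesian network would not assume), the probability of an outcome along a chain is just the product of the edge probabilities. The events and numbers here are invented.

```python
# Each directed edge carries the probability that one event leads to the next.
edges = {
    ("ad_shown", "site_visit"): 0.20,
    ("site_visit", "signup"):   0.10,
    ("signup", "purchase"):     0.30,
}

def path_probability(path):
    """Probability of following a whole chain of events, edge by edge."""
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= edges[(a, b)]
    return prob

print(path_probability(["ad_shown", "site_visit", "signup", "purchase"]))  # 0.006
```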
In a machine learning environment, both these events and the specific chains or relationships between them are malleable, with input from some processes affecting the likelihoods of other events. To a certain extent, most higher-end machine learning systems use some kind of progressive genetic algorithm, in which the output of a particular algorithm is fed back to improve the algorithm itself and to determine the fitness of a given solution (using input from its surrounding neighbors, for varying definitions of "surrounding").
A very simple example of this kind of neighbor-driven computation is John Conway's Game of Life, a cellular automaton. Author and mathematician Stephen Wolfram has explored the subject to a much deeper degree, and has provided a compelling argument that such forms of machine learning are a natural extension of Turing's original state-machine description, though such cellular automata typically do not change the rules that determine their state configuration. As the creator of both Mathematica (a favorite tool of mathematicians, data scientists and analysts alike) and Wolfram Alpha, a machine-learning-based online search and mathematics tool, Wolfram may very well have some authority to define what machine learning is. IBM's Watson, which gained fame by defeating its human rivals on Jeopardy! in 2011, is a similar system, again using inferencing over a wide range of data both to handle natural language processing and, in effect, to learn from that data.
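Returning to the Game of Life for a moment: a single update step takes only a few lines of code. The grid and the "blinker" pattern below are just a small worked example, applying Conway's standard rules on a wrap-around board.

```python
def life_step(grid):
    """One generation of Conway's Game of Life on a toroidal grid of 0s and 1s."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            live = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # a live cell survives with 2 or 3 neighbours; a dead cell is born with exactly 3
            nxt[r][c] = 1 if live == 3 or (grid[r][c] and live == 2) else 0
    return nxt

# A "blinker": three live cells in a column oscillate between vertical and horizontal.
grid = [[0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0]]
print(life_step(grid))   # the column of three flips to a row of three
```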
The Business Case for Machine Learning
So why should you, as a business leader, be interested in whether or not machines can learn? Here's a not-so-hypothetical use case. Most of the large trading houses run applications that, several dozen times a second, sample parts of their market and the movements of stocks, looking for trends and patterns - in effect, trying specific configurations of stock or commodity portfolios and making test bets on how these will perform over time.
The bets that provide the best returns are then crossed into the pool of higher-performing patterns, while those with lower returns fail to reproduce (this is why these are called genetic algorithms) - a survival of the fittest in which the most profitable routines live on over time and the less profitable or money-losing algorithms die off. Those that have matured past a sufficient threshold are then set loose on live accounts, becoming progenitors of new generations of computer-generated algorithms in turn. These are always changing, because the market itself changes, and what worked for one generation doesn't always work well for later ones, but such algorithms can often deliver considerably better returns over time.
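No trading house publishes its actual algorithms, but the selection-and-crossover loop itself can be sketched in a few lines. Everything below - the asset returns, the fitness function, the mutation scheme - is invented purely for illustration.

```python
import random

ASSET_RETURNS = [0.02, -0.01, 0.05, 0.01]   # pretend average returns per asset

def fitness(weights):
    """Return of a portfolio whose (normalized) weights are the 'strategy'."""
    total = sum(weights)
    return sum(w / total * r for w, r in zip(weights, ASSET_RETURNS))

def crossover(a, b):
    """Splice two parent strategies and apply a small mutation."""
    cut = random.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    i = random.randrange(len(child))
    child[i] = max(0.01, child[i] + random.uniform(-0.1, 0.1))
    return child

population = [[random.random() for _ in ASSET_RETURNS] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]               # the best strategies "reproduce"
    population = survivors + [
        crossover(random.choice(survivors), random.choice(survivors))
        for _ in range(10)
    ]

print(max(population, key=fitness))   # weights drift toward the best-returning asset
```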
There are three points to note with this particular case. The first is that the algorithms are not necessarily something that can be expressed easily by a human being; they get results, but untangling why they get results could occupy a research mathematician for years. The second is that once such algorithmic traders get established, they have to adapt to the presence of other algorithmic traders operating at several hundred times a second or faster (indeed, one reason most high-finance traders locate their server farms close to the exchanges is that the latency of the signal passing from the trader's server to the exchange's becomes a significant factor in the effectiveness of such algorithms).
A final, rather scary point is that even with learning, algorithmic traders are still following trends, and as a consequence it has become relatively commonplace for entire exchanges to drop or rise by several hundred points out of nowhere, all because two or more algorithms got caught in a positive feedback loop. Such flash crashes have led exchanges to institute processes for backing out all transactions in a given period so that traders can reset their trading algorithms. Yet despite these occasional problems, such machine learning systems can prove highly profitable even given their initial costs.
A similar mechanism comes into play in the business environment. A business is generally focused on making the investments that deliver the biggest bang for the buck: hiring another sales vice president vs. bringing on two more field reps, starting a new product line outside the company's existing product suite vs. expanding within the line, or determining the best distribution of stores across geographic regions given a known competitor's current distribution.
What's perhaps most important here is understanding that the result of such algorithms is a distribution that provides a numeric cost-benefit analysis for a given set of actions. Such a system makes recommendations (e.g., these particular movies may be of interest to you based upon your previous selections) that are based partially upon your local profile and partially upon the accumulated profiles of others, and the more you train your own genetic algorithm (the longer you choose videos through the service), the more likely it is that the computer's selections match your tastes. A business has a similar profile (e.g., we make orthopedic shoes), and genetic algorithms and machine learning can then determine the best investments (whether to branch out, and if so, what the best market would be).
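One common way to turn such profiles into recommendations (a sketch, not necessarily what any particular service actually does) is to represent each user's tastes as a vector and suggest what the most similar profile liked. The genres and scores below are invented.

```python
import math

profiles = {                       # per-user scores over (action, comedy, drama)
    "you":    [5, 1, 3],
    "user_a": [4, 0, 4],
    "user_b": [0, 5, 1],
}

def cosine(u, v):
    """Cosine similarity between two taste vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

others = {name: vec for name, vec in profiles.items() if name != "you"}
closest = max(others, key=lambda name: cosine(profiles["you"], others[name]))
print(f"Recommend what {closest} liked")   # user_a's tastes are closest to yours
```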
Machine learning also plays a huge part in natural language processing, both textual and spoken, and is playing an increasing role in the identification of visual and video imagery. Many social media systems now use this for auto-tagging - comparing different orientations of a person's face in photographs and using the proportions between identifiable parts (eyes, mouth, nose, cheeks, chin, ears and so forth) to make plausible guesses about the identity of a person. Human feedback ("no, this isn't me, it's him") can fine-tune this, and the algorithms that extract the initial metrics can then be pushed down to the input devices themselves (think of all those selfies on phone cameras), leaving the heavy computing to the cloud.
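A toy version of that proportion-based matching, with invented landmark ratios standing in for the far richer features a real system would extract:

```python
import math

# Each face is reduced to a few ratios between landmark distances; the numbers
# and the ratio choices here are made up purely to illustrate the idea.
known_faces = {
    "alice": (1.62, 0.95, 1.10),   # (eye spacing/nose, mouth/nose, width/height)
    "bob":   (1.40, 1.20, 0.98),
}

def closest_match(ratios):
    """Match a new set of ratios to the nearest stored profile."""
    return min(known_faces, key=lambda name: math.dist(ratios, known_faces[name]))

print(closest_match((1.60, 0.97, 1.08)))   # 'alice'
```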
Video can be analysed in much the same way, and the same techniques can be used to identify product logos and packaging, making it possible to track the diffusion of product placement in media and, from this, to determine advertising revenues. Such systems are also at the heart of most transcription servers, which take audio and transcribe it, greatly reducing the transcription effort even for amateur video. It is quite possible that web-connected video cameras will someday be able to automatically generate and store full transcriptions of everything from school plays to office meetings to political events and protests. This goes hand in hand with language translation programs, which also use machine learning to improve the quality of translation - written text now, but real-time audio is following close behind. A good example of such transcriptions (in English and Chinese) was demonstrated at a recent TEDx talk.
These kinds of machine learning are having an impact even on hardware. Just as more and more non-graphical processing is moving onto GPUs rather than CPUs, so too are image and audio processing manufacturers beefing up their Digital Signal Processing (DSP) chips to better accommodate the neural-network processing that image, video and audio recognition require, and most of these chips in turn can cache large numbers of DSP patterns and templates, allowing them to handle these services without having to touch the Internet. (Indeed, most voice and text-to-speech systems on mobile devices are an order of magnitude better than their desktop equivalents.)
Summary
It can be argued that machine learning systems will end up being at the heart of most automated tasks within the next decade. Machine learning will enable self-healing systems, the deployment of real-time virus antibodies, the ability to determine the best sales strategies based upon current market conditions, the optimization of mixture systems for volatiles, cloud computing systems that can configure themselves and intelligently release stagnant partitions, and cars that can regulate the maximum driving speed for younger drivers or stop when about to back over an obstacle. Machine learning is becoming the dominant form of computing. And that's why it matters.
Kurt Cagle is an information architect, data scientist, author and industry analyst who works for Avalon Consulting, LLC., specializing in document and data semantics, ontology design and data virtualization. He is available for consultation. His clients have included Fortune 500 companies and US and European Federal Agencies. He lives in Issaquah, Washington, where he's working on his latest novel, Storm Crow.