Machine Learning tribes
My short summary of the book "The Master Algorithm" by Pedro Domingos.
The Master Algorithm by Pedro Domingos, a Portuguese professor at the University of Washington, caught my attention when Bill Gates short-listed it as a must-read book on AI.
It's not an easy book to read if you are completely new to Machine Learning or Data Science, but it gives an interesting perspective on the different tribes of machine learning, their main algorithms, their strengths and weaknesses, and the relationships between them. The tribes presented are the Symbolists, Connectionists, Evolutionaries, Bayesians and Analogizers.
Each of these tribes has its own master algorithm, and the author argues that each is good for some types of problems but not for others. The path he points to for reaching a single Master Algorithm is therefore to combine the key features of all of them.
Here is a short summary of the tribes presented:
Symbolists
They view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic. The master algorithm for this tribe is inverse deduction.
They believe that intelligence can be reduced to symbol manipulation: maths is about solving equations by moving symbols around, and logicians do the same when constructing deductions. Since elaborating a complete set of rules by induction is computationally intensive, Symbolists in practice prefer decision-tree-based algorithms.
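To make the decision-tree idea concrete, here is a minimal sketch (mine, not from the book) that fits a tiny tree with scikit-learn and prints it as readable IF-THEN rules; the features and labels are invented for illustration.

```python
# A toy decision tree: the learned model can be printed as explicit rules,
# which is the kind of symbolic, human-readable output Symbolists favour.
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy data: [has_fur, lays_eggs] -> is_mammal
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["has_fur", "lays_eggs"]))
```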
Connectionists
Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Their master algorithm is backpropagation.
Connectionists are critical of Symbolists because they believe there is a lot more going on under the surface than symbolic rules can capture. They believe this "lot more" is achieved through parallel processing rather than the sequential processing favoured by the Symbolists. For this, they borrowed the concept of neurons from neuroscience: a concept is represented by neurons that "fire together", each neuron connects to others via synapses, and learning takes place by adjusting those synaptic connections.
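As a concrete illustration of "learning by adjusting synapses", here is a minimal backpropagation sketch (my own, using only NumPy; not code from the book) that trains a tiny one-hidden-layer network on the XOR problem:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # "synapses" of the hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # "synapses" of the output neuron
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass: neurons fire according to their inputs and weights.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error back and adjust the synapses.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # should approach [0, 1, 1, 0]
```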
Big data powered the recent popularity of what are now called "deep learning" techniques. Bottom line: these techniques yield results that are typically hard to understand and explain, something I pointed to in another article you can find here.
Evolutionaries
Evolutionaries simulate evolution on a computer and draw on genetics and evolutionary biology. The master algorithm for this tribe is derived from genetic programming.
DNA encodes an organism in a sequence of base pairs. Similarly, computer programs can be encoded as strings of bits, with variation produced by crossover and mutation. A great mystery still to be solved in genetic programming is the role of crossover and how much it actually helps (mutation alone seems to do most of the work of improving fitness). This and some other problems have made this tribe less relevant these days.
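To show crossover and mutation in action, here is a minimal genetic-algorithm sketch (illustrative only, not from the book) that evolves bit strings toward the toy "OneMax" objective of maximizing the number of 1s:

```python
import random

random.seed(0)
LENGTH, POP, GENERATIONS = 20, 30, 50
fitness = lambda bits: sum(bits)  # toy objective: count of 1s in the string

def crossover(a, b):
    point = random.randrange(1, LENGTH)   # single-point crossover
    return a[:point] + b[point:]

def mutate(bits, rate=0.02):
    return [1 - b if random.random() < rate else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENERATIONS):
    # Selection: the fitter half become parents of the next generation.
    parents = sorted(population, key=fitness, reverse=True)[:POP // 2]
    population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                  for _ in range(POP)]

print(max(fitness(b) for b in population))  # approaches LENGTH as the population evolves
```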
Bayesians
Bayesians believe that learning is a form of probabilistic inference, with its roots in statistics. Their master algorithm is Bayesian inference.
The basic idea defended by this tribe is the systematic updating of degrees of belief in light of new data. They agree with the Symbolists that prior assumptions are needed, even though they don't agree on the type of prior knowledge allowed: Bayesians hold that prior knowledge goes into the structure and parameters of a probabilistic model, while Symbolists accept anything that can be encoded in logic. Naïve Bayes, Markov models, hidden Markov models and Bayesian networks are examples of algorithms used by this tribe; the book develops them in good detail, relating them to each other and to algorithms mainly used by other tribes (e.g. Naïve Bayes and the perceptron).
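As a concrete example of updating a degree of belief, here is a minimal Bayes' rule calculation (the numbers are invented for illustration):

```python
# Updating belief in a hypothesis ("patient has the disease") after new
# evidence ("the test came back positive").
prior = 0.01            # P(disease): 1% of patients have it
likelihood = 0.95       # P(positive | disease): test sensitivity
false_positive = 0.05   # P(positive | no disease)

evidence = likelihood * prior + false_positive * (1 - prior)
posterior = likelihood * prior / evidence
print(round(posterior, 3))  # ~0.161: the belief rises from 1% to about 16%
```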
Analogizers
Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization. Their master algorithm is the Support Vector Machine.
Analogizers use similarities among data points to categorize them into distinct classes: we learn by recognising the similarity between two concepts and then figuring out what else can be inferred from the fact that they are similar. Nearest-neighbour algorithms and Support Vector Machines (SVMs) are presented in some detail in the chapters related to this tribe.
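To make the similarity idea concrete, here is a minimal nearest-neighbour sketch (mine, not from the book) that classifies a new point by the label of its most similar known example:

```python
import math

def nearest_neighbour(train, query):
    # train: list of (point, label) pairs; "similarity" here is just
    # closeness in Euclidean distance.
    point, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label

# Invented toy examples in a 2-D feature space.
train = [((1.0, 1.0), "cat"), ((1.2, 0.9), "cat"), ((5.0, 5.0), "dog")]
print(nearest_neighbour(train, (4.5, 5.2)))  # -> "dog", the most similar known example
```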