An example that illustrates why today's AI is so effective
Nowadays, not a single day passes without at least one major news or tech portal reporting on a topic related to, enabled by, or enhanced with artificial intelligence (AI). From time to time, I try to pick some of them and post them on LinkedIn. In fact, one particular area of AI is responsible for almost all the hype: machine learning, which uses statistical techniques to implement intelligence (or intelligent behaviour) in IT systems. And to be more precise, since there are several "branches" under machine learning, the loud music comes from the room of connectionism (an approach to the study of human cognition that uses mathematical models known as "connectionist networks" or artificial neural networks), and the most frequent term you hear discussed there is deep learning (the ability to train neural networks with several hidden layers to achieve better results, which changed the game a few years ago).
So what is the essential change in statistical models that allowed the current boom of AI? It is their (autonomous) ability to find and "extract" useful internal representations (for example, features) of the world they are dealing with. We have seen it a number of times in computer vision, where a convolutional network was able to learn a "conceptual" representation of the object it saw, layer by layer, from simple concepts like lines and edges up to higher abstractions, like the concept of a "face" or a "cat's face".
But the real beauty is that this trick, the (autonomous) ability to find and extract useful internal representations, is so general that it also works successfully in other areas, for example in natural language processing (NLP). With NLP techniques, computers are able to understand plain text and speech and to communicate with us in an incredibly "human" voice.
In NLP, a good internal (numeric) representation of a document, paragraph, or sentence is essential, because it greatly affects the overall performance of the system (whether users consider the system or its outputs "smart" or not). The internal representation is usually generated from the text by a so-called embedding algorithm, which "embeds" the text parts (for example, documents) as points in a space with a fixed dimensionality. This space is usually called a latent semantic space, because its dimensions potentially (latently) represent particular semantic concepts.
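To make the idea concrete, here is a minimal, purely illustrative sketch in Python: two documents represented as points in a tiny five-dimensional latent space (the vectors and dimensionality are made up; real embeddings use hundreds of learned dimensions), with cosine similarity measuring how "close" they are.

```python
import numpy as np

# Illustrative only: two documents embedded as points in a 5-dimensional
# latent space (real embeddings are learned and have far more dimensions).
doc_a = np.array([0.9, 0.1, 0.3, 0.0, 0.2])  # e.g. an article about Bratislava
doc_b = np.array([0.8, 0.2, 0.4, 0.1, 0.1])  # e.g. an article about Slovakia

def cosine_similarity(u, v):
    """Closeness of two points in the latent semantic space."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(doc_a, doc_b))  # close to 1.0 -> semantically similar
```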
Before the deep learning "rush", we used the bag-of-words / vector space model for the internal representation, with its well-known limitations: the encoding is purely syntactic (although the IDF weight introduces a semantic weight) and results in an extremely sparse embedding space with at least 5k, but more typically 10k or even 30k, dimensions (the number of dimensions is derived from the number of words in the vocabulary). The lack of semantics in the resulting representation was usually partially addressed with Latent Semantic Analysis/Indexing, and later with Latent Dirichlet Allocation. In general, when you build a model using such an encoding, the words lose their meaning. For example, if we encode Bratislava as id_4, Slovakia as id_6, and power as id_8, Slovakia will have the same relation to power as to Bratislava. We would prefer a representation in which Slovakia and Bratislava are "closer" than Slovakia and power.
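A tiny sketch of this limitation (the word ids and the vocabulary size are made up for illustration): in a one-hot / bag-of-words encoding, every pair of distinct words is exactly equally far apart, so the representation cannot express that Slovakia is closer to Bratislava than to power.

```python
import numpy as np

VOCAB_SIZE = 10  # toy vocabulary; real ones have 10k-30k entries, hence the sparsity

def one_hot(word_id, size=VOCAB_SIZE):
    """Encode a word purely by its id: a vector of zeros with a single 1."""
    v = np.zeros(size)
    v[word_id] = 1.0
    return v

bratislava, slovakia, power = one_hot(4), one_hot(6), one_hot(8)

# Every pair of distinct words has dot product 0 and distance sqrt(2):
# the encoding is purely syntactic and carries no semantics.
print(np.dot(slovakia, bratislava), np.dot(slovakia, power))    # 0.0 0.0
print(np.linalg.norm(slovakia - bratislava),
      np.linalg.norm(slovakia - power))                         # 1.414... 1.414...
```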
In 2013, Tomáš Mikolov presented a novel approach to encoding terms: the Word2Vec system. A shallow neural network was put to the task of embedding terms into a latent space by predicting a term's context from its latent representation (or vice versa). Such representations encapsulate different (semantic) relations between words, like synonyms, antonyms, or analogies, as seen in the main picture (above the heading) of this blog, where king is to queen as man is to woman.
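As a rough sketch of what this looks like in practice (assuming the gensim library and its downloadable pretrained Google News Word2Vec vectors, a download of roughly 1.6 GB; the dataset name may differ between gensim versions):

```python
# Sketch only: requires `pip install gensim` and downloads pretrained vectors.
import gensim.downloader as api

# Pretrained Word2Vec vectors trained on Google News (a KeyedVectors object).
model = api.load("word2vec-google-news-300")

# The classic analogy: vector("king") - vector("man") + vector("woman")
# should land closest to vector("queen").
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Semantic closeness, assuming both words are in the pretrained vocabulary:
print(model.similarity("Slovakia", "Bratislava"))  # expected to be relatively high
print(model.similarity("Slovakia", "power"))       # expected to be lower
```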
A very nice explanation of one of the techniques proposed by the paper can be found here. A detailed explanation, and one of the best places to gain intuition about why deep learning is so effective, can be found in this older but nicely written blog post by Christopher Olah.
In 2014, Mikolov introduced Doc2Vec, an extension that creates a numeric representation of any text part (not only words, but also sentences, paragraphs, and whole documents or their parts). Again, a "gentle" introduction can be found here. A full list of document representation techniques used in NLP (including more complex representatives like the Hierarchical Attention Network or the Sequence-to-Sequence Autoencoder) is here.
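A minimal Doc2Vec sketch with gensim (assuming the 4.x API; the corpus, tags, and hyperparameters are toy values chosen purely for illustration):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus; each document gets an integer tag.
corpus = [
    "bratislava is the capital of slovakia",
    "the capital of slovakia is bratislava",
    "solar power plants generate electricity",
]
documents = [TaggedDocument(words=text.split(), tags=[i])
             for i, text in enumerate(corpus)]

# Train a small Doc2Vec model (toy hyperparameters).
model = Doc2Vec(documents, vector_size=50, window=2, min_count=1, epochs=40)

# Embed an unseen paragraph into the same latent space and find its
# nearest training documents.
vector = model.infer_vector("slovakia and its capital bratislava".split())
print(model.dv.most_similar([vector], topn=2))
```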