Neural networks: where it all started
Roberto Battiti
University of Trento / LION (Learning and Intelligent OptimizatioN), trustable special-purpose AI with measurable goals (no AGI)
Neural networks: multi-layer perceptrons
Chapter 8 of LIONbook v 4.0, get the PDF from?
It is a mystery how a system composed of many simple interconnected units can give rise to such incredible activities as recognizing objects, speaking and understanding, drinking a cup of coffee, and fighting for your career. Emergence is the way in which complex systems arise out of a multiplicity of relatively simple interacting components. Similar emergent properties are observed in nature: think of snowflakes forming complex symmetrical patterns starting from simple water molecules.
The real brain is an awesome source of inspiration and proof that intelligent systems can emerge from very simple interconnected computing units. Ever since the early days of computers, the biological metaphor has been irresistible (“electronic brains”), but only as a simplifying analogy rather than a blueprint for building intelligent systems. As Frederick Jelinek put it, “Airplanes don’t flap their wings.” Yet, starting from the sixties, and then again in the late eighties, the principles of biological brains gained ground as a tool in computing. This resulted in a paradigm change, from artificial intelligence based on symbolic rules and reasoning to artificial neural systems where knowledge is encoded in system parameters (like synaptic interconnection weights) and learning occurs by gradually modifying these parameters under the influence of external stimuli.
Creating artificial intelligence based on the “real thing” is the topic of artificial neural networks research. Multilayer perceptron neural networks (MLPs) are flexible (non-parametric) modeling architectures composed of layers of sigmoidal units, interconnected in a feed-forward manner and only between adjacent layers. A unit estimating the probability that your grandmother appears in an image can be built with an MLP network.
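To make the layered, feed-forward structure concrete, here is a minimal sketch of an MLP forward pass in plain Python. The layer sizes, the weight range, and the names `init_layer` and `forward` are illustrative assumptions and are not taken from the chapter.

```python
import math
import random

def sigmoid(x):
    """Logistic squashing function, with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """One layer: for each output unit, n_in weights followed by a bias.

    The uniform range (-0.5, 0.5) is an arbitrary choice for illustration.
    """
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def forward(layers, inputs):
    """Propagate the inputs layer by layer.

    Each layer reads only the activations of the layer just before it,
    which is the 'adjacent layers only' connectivity of an MLP.
    """
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(unit[:-1], activations)) + unit[-1])
            for unit in layer
        ]
    return activations

rng = random.Random(0)
# Hypothetical sizes: 4 inputs -> 3 hidden sigmoidal units -> 1 output unit
mlp = [init_layer(4, 3, rng), init_layer(3, 1, rng)]
output = forward(mlp, [0.2, -0.1, 0.7, 0.0])
```

Because the last unit is sigmoidal, `output[0]` lies strictly between 0 and 1 and can be read as a probability, as in the grandmother example. With random weights the value is meaningless until the network is trained.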
Effective training from labeled examples can occur via variations of gradient descent, popularized under the name “error backpropagation.” The weaknesses of gradient descent as an optimization method do not prevent successful practical results. More advanced techniques based on approximations of second derivatives can be crucial to speed up training.
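As a rough illustration of training by gradient descent with error backpropagation, the sketch below fits a one-hidden-layer network to the XOR function in plain Python. The task, the network size, the learning rate, the number of epochs, and the names `train` and `mean_squared_error` are all assumptions chosen for the example. It uses plain batch gradient descent, not the second-derivative techniques mentioned above.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hidden, w_out, x):
    """Return the hidden activations and the output for one input vector."""
    xb = x + [1.0]  # constant 1.0 feeds the bias weight
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, out

def mean_squared_error(w_hidden, w_out, data):
    return sum((forward(w_hidden, w_out, x)[1] - t) ** 2
               for x, t in data) / len(data)

def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    """Batch gradient descent; hyperparameters are illustrative guesses."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    losses = [mean_squared_error(w_hidden, w_out, data)]
    for _ in range(epochs):
        g_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g_out = [0.0] * (n_hidden + 1)
        for x, t in data:
            hidden, out = forward(w_hidden, w_out, x)
            # Output delta: derivative of (out - t)^2 / 2 with respect to
            # the output unit's net input (sigmoid' = out * (1 - out)).
            delta_out = (out - t) * out * (1.0 - out)
            for j, h in enumerate(hidden + [1.0]):
                g_out[j] += delta_out * h
            # Backpropagate the error through the output weights.
            for j, h in enumerate(hidden):
                delta_h = delta_out * w_out[j] * h * (1.0 - h)
                for i, v in enumerate(x + [1.0]):
                    g_hidden[j][i] += delta_h * v
        # Step against the gradient averaged over the training examples.
        n = len(data)
        for j in range(n_hidden + 1):
            w_out[j] -= lr * g_out[j] / n
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hidden[j][i] -= lr * g_hidden[j][i] / n
        losses.append(mean_squared_error(w_hidden, w_out, data))
    return w_hidden, w_out, losses

xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hidden, w_out, losses = train(xor_data)
```

Plotting `losses` against the epoch number typically shows long flat stretches followed by sudden drops. This is one of the weaknesses of plain gradient descent that faster methods try to address.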
Most of the recent breakthroughs in A.I. (large language models, machine translation, image recognition, generation of images and movies, and general-purpose chat systems) have their roots in MLPs and variations of gradient descent.
There are striking analogies between human and artificial learning schemes. In particular, increasing the effort during training pays dividends in terms of improved generalization. The effort demanded by a serious teacher (diversifying test questions, writing on the blackboard, asking you to take notes instead of delivering pre-digested material) can be a pain in the neck during training but increases the power of your mind at later stages of your life. The German philosopher Hegel used the term Anstrengung des Begriffs (“effort to define the concept”) when defining the role of philosophy.