A Brief History Of AI (part 1)
Sometimes, in a single day, there are so many reports on groundbreaking AI discoveries and novel AI-supported applications that it becomes challenging to process everything and grasp where the field is heading.
In response, I've decided to step back and explore the historical origins of AI. This 3-part series, presented in non-technical language, concisely lists some fundamental AI achievements. It is necessarily incomplete and somewhat subjective, but I hope it serves as a useful guide through the current media frenzy surrounding AI.
Each part of the series, starting from 1950, 1997, and 2017 respectively, will highlight key milestones and developments, offering a chronological journey through AI's evolution. The division into three parts is tailored to fit the image constraints of a LinkedIn article.
I warmly welcome feedback on any aspect of the articles, including any topics you feel are missing or wish to see discussed more deeply, in the comments section. Let's embark on this exploration together with this first installment.
1950 - Alan Turing Proposes a Test for Machine Intelligence
In 1950 Alan Turing published "Computing Machinery and Intelligence," proposing what is now known as the Turing Test. The Turing Test is performed by having a human evaluator interact with two entities, one a machine and the other a human, through a computer interface that conceals their identities. The evaluator then asks questions or engages in conversation, and based on the responses, must determine which entity is the machine. If the evaluator consistently cannot distinguish the machine from the human based on the responses, the machine is considered to have passed the test, demonstrating human-like intelligence.
1956 - The Term "Artificial Intelligence" (AI) is coined
This happened at the famous Dartmouth conference held in 1956. The proposal for the conference, written by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon, provided an early definition of artificial intelligence. They described it as:
"Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."
This definition was groundbreaking as it suggested that any feature of human intelligence could be simulated by a machine.
1957 - Rosenblatt's Perceptron - A single-layer network that can learn to classify simple patterns
The perceptron is a fundamental type of artificial neural network and one of the earliest models developed for machine learning. Invented in 1957 by Frank Rosenblatt, it was designed to mimic the way a human brain processes information. A perceptron consists of input nodes, each associated with a weight, which are combined into a weighted sum. This sum is then passed through an activation function, which determines the output of the perceptron. Initially conceived for tasks like pattern recognition, the perceptron laid the groundwork for more complex neural networks, despite its limitation of only being able to solve linearly separable problems (problems with two classes whose examples can be perfectly separated by a line or, more generally, a hyperplane). The formal proof that the perceptron learning algorithm learns any linearly separable problem in a finite number of steps caused huge excitement and raised expectations of immensely smart learning machines. But this came to a sudden end in 1969 (see below).
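For readers who like to see the mechanics, here is a minimal Python sketch of perceptron-style learning on a toy, linearly separable problem (logical AND); the data, learning rate, and number of epochs are illustrative choices, not Rosenblatt's original setup.

```python
import numpy as np

# Toy example: a perceptron learning the logical AND function,
# which is linearly separable (illustrative setup, not Rosenblatt's original).

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])            # AND labels

w = np.zeros(2)                       # one weight per input
b = 0.0                               # bias (threshold)

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0   # weighted sum + step activation
        # perceptron learning rule: adjust weights only on mistakes
        w += (target - pred) * xi
        b += (target - pred)

print([1 if xi @ w + b > 0 else 0 for xi in X])   # -> [0, 0, 0, 1]
```

The essential point is the learning rule: the weights change only when the prediction is wrong, and for linearly separable data this procedure is guaranteed to stop after finitely many corrections.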
1964-1967 - The Program Eliza - A mock Psychotherapist
Eliza was an early computer program created in the 1960s to chat with people. Made by Joseph Weizenbaum at MIT, it worked by recognizing keywords and rephrasing the user's own words, making it seem as if it understood what was being said. The most well-known version acted like a therapist, turning statements back into questions to keep people talking. Eliza was simple and followed fixed rules, but it was one of the first programs to show how computers could mimic human conversation. Weizenbaum invited students and colleagues to interact with the system, and they quickly became engaged in deep (from their point of view) conversations with the program. At one point his secretary, who had watched him program the system for many months, even insisted that he leave the room so she could talk to Eliza in private.
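To give a flavor of how little machinery is involved, here is a tiny Python sketch in the spirit of Eliza's keyword-and-substitution rules; the patterns below are invented for illustration and are not Weizenbaum's original script.

```python
import re

# Illustrative Eliza-style rules: match a keyword pattern, reflect the
# user's own words back, and wrap them in a canned question.

REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

RULES = [
    (re.compile(r"i need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.I),   "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I),     "Tell me more about your {0}."),
]

def reflect(text: str) -> str:
    # swap first-person words for second-person ones ("my" -> "your", ...)
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in text.split())

def respond(sentence: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(sentence)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please, tell me more."    # default prompt to keep the user talking

print(respond("I am worried about my exams"))
# -> "How long have you been worried about your exams?"
```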
1969 - Proof that Perceptrons are (very) limited!
In 1969, Marvin Minsky and Seymour Papert published "Perceptrons," a seminal work that critically analyzed the limitations of perceptrons, an early form of neural network. They demonstrated that perceptrons are incapable of representing non-linearly separable functions, such as the XOR function. This revelation significantly dampened the enthusiasm for neural networks, contributing to reduced funding and interest in AI research. This shift in perception played a crucial role in the onset of the first AI Winter, a period of stagnation in AI development that lasted into the early 1980s. It was already clear at the time that multi-layer networks were more capable, but since no efficient learning algorithm for them was known, this insight did not help.
1975-early 1980s - First AI Winter
The AI Winter of the late 1970s was a period of reduced funding and interest in artificial intelligence, caused by the failure to meet the overly ambitious expectations set in the field's early days. Key challenges in natural language processing, machine learning, and computer vision, coupled with the era's limited computational power, led to widespread skepticism and a consequent downturn in AI research and development. This period marked a recalibration in the AI community, shifting focus to more realistic goals.
1980s - Expert Systems
Expert systems are rule-based AI programs that simulate the decision-making ability of a human expert in specific domains. Developed primarily in the 1980s, these systems combine a comprehensive knowledge base with an inference engine to apply rules to data and solve complex problems. They were particularly significant in fields like medicine and engineering, where they assisted experts in diagnosis and decision-making. Their importance in AI history lies in their demonstration of how machines can use rules and knowledge to address specialized tasks.
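As a rough illustration of the knowledge-base-plus-inference-engine idea, here is a minimal forward-chaining sketch in Python; the medical-style facts and rules are invented for this example and not taken from any real 1980s system.

```python
# A minimal rule-based "expert system": a small knowledge base of facts
# plus a forward-chaining inference engine (illustrative rules only).

facts = {"fever", "cough"}

rules = [
    ({"fever", "cough"}, "possible_flu"),
    ({"possible_flu"}, "recommend_rest"),
]

# forward chaining: repeatedly fire rules whose conditions are all satisfied
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)   # now also contains 'possible_flu' and 'recommend_rest'
```

Real expert systems of the era held thousands of such rules, but the basic loop of matching conditions against known facts and adding new conclusions is the same.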
1986 - Backpropagation for Training Multi-Layer Networks
Backpropagation, a method used for training artificial neural networks consisting of many layers, became widely known and influential in the 1980s, largely thanks to the work of Geoffrey Hinton and his colleagues (although the method itself can be traced back at least to the dissertation of Paul Werbos in 1974). This technique involves adjusting the weights of the neural network by propagating the error back through the network layers. It calculates the gradient of the error function with respect to the neural network's weights, allowing for efficient optimization. Backpropagation is still a core component in most modern neural network architectures (including Large Language Models like ChatGPT or Gemini).
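For the curious, the sketch below trains a small two-layer network on the XOR problem (the very problem a single perceptron cannot solve) with hand-coded backpropagation in NumPy; the network size, learning rate, and squared-error loss are illustrative choices rather than the original 1986 formulation.

```python
import numpy as np

# Toy backpropagation: a 2-layer network learning XOR (illustrative setup).

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8))          # input -> hidden weights
b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1))          # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)              # hidden activations
    out = sigmoid(h @ W2 + b2)            # network output

    # backward pass: propagate the error gradient layer by layer
    d_out = (out - y) * out * (1 - out)   # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden layer

    # gradient-descent weight updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0]
```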
1989 - Convolutional Neural Networks for Handwritten Digit Classification
LeNet-5, a pioneering convolutional neural network (CNN) developed in the late 1980s and early 1990s by Yann LeCun and colleagues, was designed primarily for postal digit recognition. CNNs apply small filters to local patches of pixels in one layer to identify patterns, which are then combined and abstracted in subsequent layers to recognize increasingly complex features in the data. LeNet-5 was adept at recognizing handwritten digits and was used by the United States Postal Service to automate the sorting of mail. Its architecture featured convolutional layers to detect local features in images, pooling layers to reduce spatial size, and fully connected layers for classification. Its success in accurately classifying digits on envelopes marked a significant advancement in the application of neural networks to practical, real-world tasks. LeNet-5 laid the foundational design principles for modern CNNs, which are now extensively used in various fields, including image and speech recognition.
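As a rough sketch of the architecture just described, here is a LeNet-style network written in PyTorch (a modern library that obviously did not exist in 1989); the layer sizes follow the commonly cited LeNet-5 description for 32x32 grayscale digit images.

```python
import torch
from torch import nn

# A LeNet-style CNN sketch: convolution -> pooling -> convolution -> pooling,
# followed by fully connected layers for classification.

class LeNetStyle(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # detect local features: 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # pooling reduces spatial size: 28 -> 14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10 -> 5
        )
        self.classifier = nn.Sequential(      # fully connected layers for classification
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

digits = torch.randn(8, 1, 32, 32)        # a dummy batch of 8 "digit" images
print(LeNetStyle()(digits).shape)         # torch.Size([8, 10]), one score per digit class
```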
1990s - Interest in Neural Networks Fades, Support Vector Machines Become Popular
In the 1990s, interest in neural networks began to diminish due to a lack of significant breakthroughs. Researchers, grappling with neural networks' limitations in complex problem-solving and computational constraints, shifted their attention to alternative methods. This shift was marked by the rise of support vector machines (SVMs), popularized by Vladimir Vapnik, and other kernel methods, known for their effectiveness in classification tasks. Simultaneously, the field saw advancements in reinforcement learning, notably through the work of Richard Sutton and Andrew Barto, and Bayesian networks, championed by researchers like Judea Pearl. These explorations led to significant progress, temporarily moving the spotlight away from neural networks until their resurgence in the 2000s, fueled by advancements in deep learning.