SmartAB Wisdom #51: AI In Supply Chain Management (SCM) – Part 1: My Humble Neural Networks Beginnings…
According to Wikipedia: “A close encounter is an event in which a person witnesses an unidentified flying object (UFO). This terminology and the system of classification behind it were first suggested in astronomer and UFO researcher J. Allen Hynek's 1972 book The UFO Experience: A Scientific Inquiry.
Sightings more than 150 meters (500 ft) from the witness are classified as daylight discs, nocturnal lights, or radar/visual reports. Sightings within about 150 meters (500 ft) are subclassified as various types of close encounters. Hynek and others argued that a claimed close encounter must occur within about 150 meters (500 ft) to greatly reduce or eliminate the possibility of misidentifying conventional aircraft or other known phenomena.
Hynek's scale became well known after being referenced in a 1977 film, “Close Encounters of the Third Kind”, which is named after the third level of the scale.”
And back in 1987, at the IEEE First Annual International Conference on Neural Networks (San Diego, California, June 21-24, 1987), my AI Encounters of the Third Kind felt like landing on the far side of the Moon, not in California...
You see, Plenary Talks were given by a few distinguished AI researchers. But if you really wanted to “get your hands dirty” – Poster Sessions were all the rage. Hundreds of PhD Students were eagerly awaiting the opportunity to describe their projects, and the most popular topic of all was the Travelling Salesman Problem. Even then, I realized that the future of AI belongs to Supply Chain Management. The undisputed elegance of AI solutions for the difficult problem of Transportation Logistics did the trick…
According to routific.com: “The Travelling Salesman Problem (TSP) is a classic algorithmic problem in the field of computer science and operations research, focusing on optimization. It seeks the shortest possible route that visits every point in a set of locations just once...
The TSP problem is highly applicable in the logistics sector, particularly in route planning and optimization for delivery services. TSP solving algorithms help to reduce travel costs and time.
Real-world applications often require adaptations because they involve additional constraints like time windows, vehicle capacity, and customer preferences.
Advances in technology and algorithms have led to more practical solutions for real-world routing problems. These include heuristic and metaheuristic approaches that provide good solutions quickly.
Tools like Routific use sophisticated algorithms and artificial intelligence to solve TSP and other complex routing problems, transforming theoretical solutions into practical business applications.”
And the company adds the following explanations: “The main problem can be solved by calculating every permutation using a brute force approach, and selecting the optimal solution. However, as the number of destinations increases, the corresponding number of roundtrips grows exponentially, soon surpassing the capabilities of even the fastest computers.
• With 10 destinations, there can be more than 300,000 roundtrip permutations. With 15 destinations, the number of possible routes could exceed 87 billion.
For larger real-world travelling salesman problems, when manual methods such as Google Maps Route Planner or Excel route planner no longer suffice, businesses rely on approximate solutions that are sufficiently optimized by using fast TSP algorithms that rely on heuristics. Finding the exact optimal solution using dynamic programming is usually not practical for large problems.”
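To make that combinatorial explosion concrete, here is a minimal C sketch (in the spirit of the “C” code I discuss later) that counts roundtrips under the convention quoted above: with a fixed starting point, n destinations can be visited in (n-1)! different orders. The function name is mine, purely for illustration.

```c
#include <stdio.h>

/* Roundtrips through n destinations from a fixed starting point:
 * the remaining (n - 1) stops can be visited in any order,
 * giving (n - 1)! permutations. */
static unsigned long long roundtrips(int n) {
    unsigned long long count = 1;
    for (int k = 2; k < n; k++)   /* 2 * 3 * ... * (n - 1) */
        count *= (unsigned long long)k;
    return count;
}

int main(void) {
    /* 10 destinations -> 362,880 (more than 300,000);
     * 15 destinations -> 87,178,291,200 (over 87 billion). */
    for (int n = 5; n <= 15; n += 5)
        printf("%2d destinations: %llu roundtrips\n", n, roundtrips(n));
    return 0;
}
```

Even before a single distance is compared, merely enumerating 14! routes is out of reach for interactive planning – which is exactly why heuristics won.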
Houston, We Have A Problem…
My AI journey can be described by the popular phrase spoken during Apollo 13, a NASA mission in the Apollo space program. But instead of aiming at the Moon, it was linked to the logistics and SCM problems experienced by banks worldwide.
In 1984, I moved to Waterloo, Ontario, from Montreal, Quebec – to pursue my Master's Degree in Electrical Engineering at the University Of Waterloo. NCR (previously known as the National Cash Register) was the largest engineering firm in Waterloo at that time. It offered POS (Point Of Sale) systems for the retail markets, high-throughput check processing equipment, and ATMs – to all major banks around the world…
NCR’s check processing equipment had a major SCM issue at that time. Scheduled and unscheduled maintenance caused significant delays and work interruptions to NCR clients. In addition, improving reliability often meant selecting only the most reliable sub-components - by adhering to stringent specifications.
Ask any SCM practitioner about eliminating waste, and you will quickly hear the following comments: “Waste is any step or activity in a process that doesn’t contribute value to the final product. This is also known as ‘Non-Value-Adding.’
Many types of inefficiencies comprise waste. Lean Principles aim to identify and eliminate these Non-Value-Adding elements to optimize Production Processes for the benefit of both the organization and the customer. A few of these benefits include:
• Providing customer value
• Increasing efficiency
• Reducing cost”
So, if NCR wanted to be viewed as a LEAN PRODUCER, it needed to embrace the process of reducing waste… seriously. Since reducing waste is part of the Lean Principles of Continuous Improvement, instead of just well-intentioned strategies, NCR needed to provide a methodology with specific steps toward “working smarter.”
In retrospect, I can clearly see now that by following Lean Practices, NCR was determined to tackle the Non-Value-Adding processes by comparing them to the 8 Wastes. Today, it is common knowledge that addressing and eliminating these wastes is fundamental to Lean Principles and contributes to more streamlined and cost-efficient inspection of parts, smarter manufacturing processes, and post-manufacturing support.
You can also frequently find the following Lean Principles narratives: “The 8 Wastes is not just about cost-cutting; it’s about making the most impact with the resources you have, including the talent, materials, and time that go into a product or service. Using the acronym DOWNTIME is a great way to remember what the wastes are:
• Defects
• Overproduction
• Waiting
• Non-Utilized Talent
• Transportation
• Inventory
• Motion
• Extra Processing”
Taking into account that a major bank would often use hundreds of check processing stations spread around multiple floors in high-rises, NCR aimed to increase productivity, minimize waste, optimize work schedules, and enhance the reliability and reputation of their banking products. Hence, there was a profound need to address item processing production and maintenance issues - promptly and effectively.
Things May Not Be What They Seem…
For many organizations, the “visible” costs of poor quality are items like service costs and inspection costs. They are frequently easy to see and determine and can amount to 4-10% of Sales.
NCR understood quite well that the challenge with focusing only on these “visible” costs is that it dramatically understates the impact that these processes may be having on the organization as a whole. And since it has been 112 years since the sinking of the Titanic, metaphorically speaking, 90% of the cost associated with “Poor Quality Icebergs” exists hidden beneath the surface of the water…
It is not uncommon, therefore, for these costs to add up to 20-35% of sales in addition to the “visible” costs – but the hidden costs are not easily identified…
The Problem To Solve And The Solution That Followed…
In 1984, check-clearing methods were based on the manual handling of physical checks. From the moment a check was deposited until it was cleared, it passed through various processes.
The Magnetic Ink Character Recognition (MICR) technology was designed to read and process checks at high speeds. This technology used magnetic ink to print characters on the bottom of the check, which could be read and processed by machines at the rate of 600 to 1200 items per minute...
With the use of MICR encoding, banks and financial institutions could process a large volume of checks in a short amount of time, reducing the risk of errors and improving customer satisfaction.
Not surprisingly, NCR’s reader-sorters processed documents at a very high speed, magnetized the encoded characters, read the MICR line, and sorted these documents into selected pockets. When documents were rejected by the machine as unreadable, they needed to be sorted and manually processed by the operator.
The performance of any electro-optical equipment can be severely degraded by paper dust and ink smudges. My job was to improve the existing accuracy of MICR readers at NCR - and develop new recognition technologies that are less susceptible to dust and performance degradation.
When the training and testing of Character Recognition systems begin, one question demands an answer: How exactly is your reported accuracy defined? Even the most basic statistics handbook will surely introduce you to the following terms:
• False Positives
“A false positive is where you receive a positive result for a test when you should have received a negative result. It’s sometimes called a false alarm.”
• False Negatives
A related concept is a false negative, where you receive a negative result when you should have received a positive one. By the time you have tested and analyzed all the true/false positives and negatives, reporting your system accuracy raises a valid question: how EXACTLY was your accuracy calculated?
If you successfully tested 1 million cases and your test generated only 100 errors, you might be tempted to pat yourself on the back and proclaim 99.99% accuracy!
However, if your 1 million test cases included only 200 cases of fraud, and you detected only 100 of those cases - your fraud detection prowess is merely around 50%... So, you might as well stick to flipping a coin.
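The arithmetic behind that warning is worth spelling out. Here is a minimal C sketch using the hypothetical numbers above (1,000,000 cases, 200 frauds, 100 of them caught, and no other mistakes), showing how a glowing accuracy figure can hide a coin-flip recall:

```c
#include <stdio.h>

int main(void) {
    /* The hypothetical fraud test set described above: 1,000,000 cases,
     * 200 of which are fraud; the system flags 100 of them and makes
     * no other mistakes, so all 100 errors are missed fraud cases. */
    const double total       = 1000000.0;
    const double fraud_cases = 200.0;
    const double true_pos    = 100.0;   /* fraud correctly flagged */
    const double false_neg   = 100.0;   /* fraud missed            */

    double accuracy = (total - false_neg) / total;  /* 0.9999           */
    double recall   = true_pos / fraud_cases;       /* 0.50 - coin flip */

    printf("accuracy: %.2f%%\n", accuracy * 100.0); /* 99.99% */
    printf("recall:   %.2f%%\n", recall * 100.0);   /* 50.00% */
    return 0;
}
```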
So, let’s not forget that going through all the heavy-duty data science gymnastics and preparing proper training and testing sets - is done for a single purpose: to allow you to make a better decision!
When MIT Press published its Parallel Distributed Processing books back in 1986 - the world of neurocomputing changed forever. The books and the software that came with them served as a blueprint. And thousands of scientists and developers used the books to build their own AI systems…
I still remember the excitement of discovering the books, reading them back to back, and coding a few elegant math equations directly from their pages. It took me less than a day to build my first Back Propagation network using the “C” language.
What followed was the infinite fine-tuning of my own Back Propagation neural network – the most popular and most intuitive AI pattern recognition framework ever built…
Back Propagation Neural Network In A Nutshell
You can find thousands of Back Propagation Neural Network references, guides, courses, and e-books without any difficulty. A simple Google search returns about 35,900,000 results…
Since Back Propagation is the most widely used framework for training artificial neural networks, I will briefly describe the forward and backward passes in the process of training it. In the simplest scenario, the architecture of a neural network consists of sequential layers of artificial neurons that can be classified into 3 classes:
1. Input
2. Hidden
3. Output
For example, a fully connected Back Propagation Neural Network (BPNN) I built had an input layer, 2 hidden layers, and an output layer. Each layer consists of 1 or more neurons, typically drawn as circles. All neurons are fully connected. So, if the first layer has X neurons and the second layer has Y neurons, then the number of in-between connections is X*Y.
For each connection, there is an associated weight. The weight is a floating-point number that measures the importance of the connections between the neurons. The weights are the learnable parameters by which the network recognizes the input pattern presented to its input layer.
The input layer is the first layer in the neural network and there can only be a single input layer in each network. For example, the input layer may have 100 neurons connected to an array of 10x10 pixels - representing an image of a scanned character “A”.
Alternatively, an input vector might consist of 1000 analog values sampled during the time that a single MICR character passes in front of the magnetic ink reader. In general, one input neuron is assigned to each of the values within the input vector.
The output layer is the last layer, which returns the network’s predicted output. Like the input layer, there can only be a single output layer. Once more, if the objective of the network is to recognize MICR digits, then the output layer will consist of 10 neurons, one allocated to each digit (0,1,2,3,4,5,6,7,8,9).
Between the input and output layers, there might be 1 or more hidden layers. Each neuron uses an activation function like a Sigmoid to capture the non-linear relationship between its inputs and its output. Such normalization ensures that no matter how big or small the neuron’s input value is, the output value is always a floating-point number between 0 and 1.
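To make the wiring concrete, here is a minimal C sketch of a single fully connected layer with a Sigmoid activation - illustrative names and a flattened weight array, not my original NCR code:

```c
#include <math.h>

/* Sigmoid activation: squashes any input into the interval (0, 1). */
static double sigmoid(double x) {
    return 1.0 / (1.0 + exp(-x));
}

/* Forward pass through one fully connected layer:
 * out[j] = sigmoid( bias[j] + sum over i of in[i] * w[i*n_out + j] ).
 * With n_in = X and n_out = Y neurons, w holds all X*Y weights. */
static void layer_forward(const double *in, int n_in,
                          const double *w, const double *bias,
                          double *out, int n_out) {
    for (int j = 0; j < n_out; j++) {
        double sum = bias[j];
        for (int i = 0; i < n_in; i++)
            sum += in[i] * w[i * n_out + j];
        out[j] = sigmoid(sum);
    }
}
```

For the MICR example, such a layer would simply be chained: 100 pixel inputs, through the hidden layers, down to the 10 digit outputs.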
Training BPNNs
To train a neural network, there are 2 distinct passes:
1. Forward Pass
2. Backward Pass
In the forward pass, we present the input vector to the input layer and start propagating the data to the hidden layer(s). Then, we measure the response from the output neuron that was assigned to recognize specific inputs.
For example, we can present 1000 images of the digit “1” and train the output neuron assigned to such a category to respond the strongest. Its response should be much stronger than that of any other neuron in the output layer that was assigned to a different digit.
Since the strongest response can be a floating-point value of 1.0 – we calculate the difference between the actual and the desired outputs as the network error. This network error measures how far the network is from making the correct prediction…
The process of propagating the inputs from the input layer to the output layer is called forward propagation. Once the network error is calculated, then the forward propagation phase has ended, and backward propagation begins.
The backpropagation pass updates the network weights to reduce the network error. The forward and backward phases are repeated, and each repetition is called an epoch. There could be millions of epochs until the BPNN is fully trained. The more training data is available, the larger the number of training epochs…
In each epoch, the following occurs (step 3 is sketched in code after this list):
1. The inputs are propagated from the input to the output layer.
2. The network error is calculated.
3. The error is propagated from the output layer to the input layer.
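Step 3 is where the learning happens. Below is a minimal C sketch of the classic delta rule for the output layer - the error, scaled by the derivative of the Sigmoid, nudges each weight by a small learning rate. (Illustrative names, not my original code; the hidden layers receive their deltas the same way, propagated one layer further back.)

```c
/* One backward-pass step for the output layer (sigmoid activations):
 *   delta[j] = (target[j] - out[j]) * out[j] * (1 - out[j])
 * Each weight from hidden neuron i to output neuron j then moves a
 * small step (the learning rate lr) in the error-reducing direction. */
static void output_layer_backward(const double *hidden, int n_hidden,
                                  const double *out, const double *target,
                                  int n_out,
                                  double *w,     /* n_hidden * n_out */
                                  double *bias,
                                  double lr,
                                  double *delta) /* scratch, n_out   */
{
    for (int j = 0; j < n_out; j++) {
        delta[j] = (target[j] - out[j]) * out[j] * (1.0 - out[j]);
        bias[j] += lr * delta[j];
        for (int i = 0; i < n_hidden; i++)
            w[i * n_out + j] += lr * delta[j] * hidden[i];
    }
}
```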
But how do you know when the BPNN is fully trained? And I mean, when it is not over-trained on your training sets and delivers the best accuracy and generalization on previously unseen data – your test sets…
This is exactly the question I asked myself, and the solution became my first neural network patent application. The first patent out of 12 fully granted and provisional patent applications that followed…
Trained vs. Untrained BPNN
Going back to my example of recognizing 10 MICR digits, each of the 10 output neurons is trained to respond the strongest when the right input vector is presented to the input layer. And since all the output values are normalized between 0.1 and 0.9 – my UCL (Upper Control Limit) and LCL (Lower Control Limit) – I discovered that it is possible to select the strongest neural output and divide it by the second strongest neural output within the output layer. I called it my “R” ratio - obtained by dividing the 1st-best output by the 2nd-best output…
What followed unfolded quickly right in front of my eyes. When the training started, the neural outputs took all possible values between 0.1 and 0.9. They were random, and no obvious trend was detected. Similarly, the ratio between the 1st and the 2nd winner was far from 9. It was anything in between…
However, as the training progressed, the two-dimensional plot indicated a persistent convergence into the upper right corner of the chart. And this is exactly where you want your training to be… The “winning category” neuron responds with the strongest value of 0.9, and the second-best “competing category” neuron responds with a weak value of 0.1. Hence, the R ratio between the 1st and the 2nd best outputs, ideally, is around 9...
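A minimal C sketch of that idea follows - one pass over the output layer to find the two strongest responses and return their quotient. This is only an illustration of the concept, not the patented implementation:

```c
/* "R" ratio: strongest output divided by the second-strongest.
 * Early in training R hovers near 1; as the network converges toward
 * 0.9-vs-0.1 responses, R climbs toward 9. */
static double r_ratio(const double *out, int n_out) {
    double best = 0.0, second = 0.0;
    for (int i = 0; i < n_out; i++) {
        if (out[i] > best)        { second = best; best = out[i]; }
        else if (out[i] > second) { second = out[i]; }
    }
    return (second > 0.0) ? best / second : 0.0;
}
```

Plotting the winning output against R for every training example is what produced the convergence chart described above.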
Why BPNN?
The BPNN architecture is extremely memory-efficient, using less memory than other optimization algorithms, like the genetic algorithm. This is a very important feature, especially with larger neural networks.
In my next post, I will write about how I even “ported” my “C” language code into ARM assembly language – the most compact version of a BPNN that ever existed…
Moreover, the backpropagation algorithm is elegant and fast, especially for small and medium-sized networks. As more layers and neurons are added, it starts to get slower as more derivatives are calculated back and forth.
In addition, backpropagation is generic enough to work with different network architectures, like convolutional neural networks, and hundreds of other AI networks, too. There are only a few parameters to tune in a BPNN, so there’s less overhead – as the short sketch below suggests.
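For comparison with today’s sprawling hyperparameter searches, the entire tuning surface of a classic BPNN fits into a struct like this (illustrative names and default values, of course):

```c
/* The handful of knobs a classic BPNN exposes. */
typedef struct {
    double learning_rate; /* step size for each weight update        */
    double momentum;      /* fraction of the previous update to keep */
    int    n_hidden;      /* neurons per hidden layer                */
    long   max_epochs;    /* upper bound on training repetitions     */
} BPNNConfig;

static const BPNNConfig kDefaults = { 0.25, 0.9, 64, 1000000L };
```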
Beyond MICR
The success of my MICR BPNN quickly convinced NCR to apply similar techniques to recognizing handwritten characters on a variety of bank checks. And it is not a trivial task, as many of the checks display various colorful backgrounds containing elaborate graphics and photographs.
Separating the handwritten amounts from the image of George Washington - is not for the faint of heart… In addition, handwriting recognition is much more difficult than recognizing fonts such as MICR, which adhere to the most stringent ANSI banking standards developed over the last 50 years…
Nor is it simple to recognize the precise denominations of the bank notes deposited at your local ATM. Folded, crumpled, and torn bills are hard to classify – no matter the country…
Therefore, I was asked on behalf of NCR to join the most prestigious research institute dealing with cognitive computing. And I’m extremely proud of the work I did at the Microelectronics and Computer Technology Corporation (MCC) in Austin, Texas. MCC was the first, and - at one time - one of the largest, computer industry research and development consortia in the United States, headed by Navy Admiral Bobby Inman.
At its peak, MCC housed over 400 PhDs under one roof, and for 3 years I supervised a large group of the world’s brightest neural network and Artificial Intelligence researchers – on behalf of the $7B NCR Corporation.
Does The Number Of Neurons And Synapses Matter?
According to cell.com: “In the past few years, computer programs using deep learning have achieved impressive results in complex cognitive tasks that were previously only in the reach of humans. These tasks include processing of natural images and language, or playing arcade and board games.
Since these recent deep learning applications use extended versions of classic artificial neural networks, their success has inspired studies comparing information processing in artificial neural networks and the brain.
It has been demonstrated that when artificial neural networks learn to perform tasks such as image classification or navigation, the neurons in their layers develop representations similar to those seen in brain areas involved in these tasks, such as receptive fields across the visual hierarchy or grid cells in the entorhinal cortex.
This suggests that the brain may use analogous algorithms. Furthermore, thanks to current computational advances, artificial neural networks can now provide useful insights on how complex cognitive functions are achieved in the brain.”
And nih.gov adds the following: “There is no clear correlation between absolute or relative brain size and intelligence. Assuming that absolute brain size is decisive for intelligence, then whales or elephants should be more intelligent than humans, and horses more intelligent than chimpanzees, which definitely is not the case.
If it were relative brain size that counted for intelligence, then shrews should be the most intelligent mammals, which nobody believes. If we take the EQ into account, some inconsistencies are removed; then humans are on top, but many other inconsistencies remain, for example that gorillas have a rather low EQ, but are considered highly intelligent, while capuchin monkeys and dolphins have unusually high EQs, but are not considered to be as intelligent as gorillas. Thus, other factors have to be considered.
The cerebral cortex is considered the ‘seat’ of intelligence and mind in mammals. During their evolution, there was a dramatic increase in cortical surface area with increasing brain size, while the thickness of the cortex increases only slightly.
All this sums up to the fact that the human brain has the largest number of cortical neurons (about 15 billion), despite the fact that the human brain and cortex are much smaller in size than those of cetaceans and elephants (with 10–12 billion or even fewer cortical neurons).
However, this alone cannot explain the superiority of primate—including human—intelligence. Here, differences in the speed of intracortical information processing come into play.
We have reason to assume that in primates in general and in apes and humans in particular cortical information processing is much faster than that in the large-brained elephants and cetaceans. Thus, it is the combination of very many cortical neurons and a relatively high Information Processing Capacity (IPC) that appears to make our brains very smart.”
So, it seems quite proper to mention that back in 1987, when I attended the IEEE First Annual International Conference on Neural Networks in San Diego, California, one of the plenary speakers that year was a brilliant scientist specializing in computational neuroscience at the Salk Institute – Terry Sejnowski.
In his fascinating talk, Terry boldly stated that the level of computational neuroscience hasn’t even reached the level of the bee. I still remember the laughter in the room. For some reason, the audience thought that Terry was joking. Yet he wasn’t. He asked us all to remember that:
• No existing plane can dynamically modify its flying pattern the way bees can…
• In addition, he gently reminded us that computers still don’t mate…
30 years later, Yann LeCun explained that our computational intelligence is now approaching the cognition level of a rat. And yet, despite all such well-known facts, the snake oil salesmen popped up in spades and started peddling predictions that Autonomous Driving was... just around the corner. It is not, but I digress…
For More Information
Please see my other posts on LinkedIn, Twitter, Substack, and CGE’s website.
AI Boogeyman
You can also find additional info in my hardcover and paperback books published on Amazon: “AI Boogeyman – Dispelling Fake News About Job Losses” and on our YouTube Studio channel…
SmartAB - SUBSCRIBE NOW
Radically Innovative Advisory Board Subscription Services