Simple, Elegant, Convincing, and Wrong: The fallacy of ‘Explainable AI’ and how to fix it, part 2
This week we have a series of blog posts written by our CEO & Founder Luuk van Dijk on the practicalities and impracticalities of certifying “AI”, running up to the AUVSI panel discussion on April 25.
2. What is this “AI” you speak of?
Before we dive into the fallacy of explainability, we first have to define what we mean by “Artificial Intelligence”, which is mostly a marketing term with a shifting definition. When we talk about the kinds of systems that are all the rage in the self-driving community, and that Daedalean is building for applications in controlling flight, we mostly mean Machine Learning, a field in which so-called Deep Convolutional Neural Networks trained on so-called Big Data have taken the spotlight.
Systems built with this technology can answer questions that until about 15 years ago were too hard for computer programs, like “is there a cat in this picture” or “what bird is this” or “where is the runway in this image”, to name some relevant ones for our purpose.
In machine learning, instead of locking a team of software engineers in a room with the requirements and sufficient coffee and chocolate, you twiddle the knobs on a computer program called “the model” until you find an acceptable solution to your problem. Neural networks, a family of such models, have a simple basic structure, but they can have many millions of knobs, and the twiddling is done by another computer program called “the machine learning algorithm” that uses large datasets of labeled examples together with some known recipe to construct ever better settings of all the knobs until it finds a member of the family that meets the requirements you set out to meet.*
This may sound like magic, but it is an extension of statistical techniques, very much like the one you may have studied in high school: Linear Regression. Imagine you are given a sample of people’s heights and weights in a spreadsheet, which we can visualize in a scatter plot:
We can try to draw a line through the data to fit the points best. The model family takes the form “weight = alpha x height + beta”, where alpha and beta are the parameters. The recipe to fiddle alpha and beta to get the best fit follows from defining the error as the sum of the squared differences between predicted and measured values. Your spreadsheet software can machine-do this for you. Now we have machine-learned what we call a “model” of height-vs-weight in this dataset, which we can use to help with all kinds of things, like designing better chairs, or determining who is, statistically speaking, too short for their weight in order to recommend health measures.
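To make the recipe concrete, here is a minimal sketch of that least-squares fit in Python. The height/weight numbers are invented for illustration, and the closed-form solution below is the standard one for a line, exactly what the spreadsheet does under the hood:

```python
# Fit weight = alpha * height + beta by least squares: minimise the
# sum of squared differences between predicted and measured weights.
# The data points are invented for illustration.
heights = [1.60, 1.65, 1.70, 1.75, 1.80, 1.85]   # metres
weights = [55.0, 62.0, 66.0, 72.0, 77.0, 84.0]   # kilograms

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n

# Closed-form least-squares solution for a straight line.
alpha = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights)) \
        / sum((h - mean_h) ** 2 for h in heights)
beta = mean_w - alpha * mean_h

def predict(height):
    return alpha * height + beta

# The residual error: how far the fitted line is from the data it was fit on.
residual = sum((predict(h) - w) ** 2 for h, w in zip(heights, weights))
print(f"alpha={alpha:.1f}, beta={beta:.1f}, squared error={residual:.2f}")
```

The two “knobs” alpha and beta are all there is to this model family; the deep networks discussed below differ in scale, not in kind.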
After alpha and beta have been optimized, the residual error says something about how well this will work, where “well” ultimately depends on the problem you are trying to solve: what we would call the ‘system requirements’. Typically we would use the model to make predictions on cases that were not part of the dataset we tuned the parameters on. That is to say, we are trying to predict something based on previously unseen data.
In much the same way, to answer the question ‘is there an aircraft in this 32x32 8-bit pixel camera image?’ a Deep Convolutional Neural Network is tuned by taking a dataset of images where the answer is given as a ‘yes’ or ‘no’ label and having a machine learning algorithm explore a very large design space until the result is good enough for our purposes, which may or may not be possible. A more advanced class of models may even output a bounding box around the aircraft or say for each pixel in the image whether it is part of an aircraft or not.
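The following sketch shows the shape of that training loop on a deliberately tiny scale. It is not a Deep Convolutional Neural Network: the “model” is a single knob per pixel plus a bias (logistic regression) on synthetic 4x4 images, and the “machine learning algorithm” is plain gradient descent. The image generator and all numbers are invented; only the structure — model family, labeled examples, a recipe that improves the knobs — mirrors the real thing:

```python
import math
import random

random.seed(0)

# Toy stand-in for "is there an aircraft in this image?": synthetic
# 4x4 images where the positive class has a brighter centre.
def make_image(label):
    img = [random.random() * 0.4 for _ in range(16)]
    if label:                         # brighten the 4 centre pixels
        for i in (5, 6, 9, 10):
            img[i] += 0.6
    return img

data = [(make_image(y), y) for y in [0, 1] * 50]  # 100 labeled examples

weights = [0.0] * 16                  # one knob per pixel
bias = 0.0

def predict(img):                     # probability of the answer "yes"
    z = bias + sum(w * p for w, p in zip(weights, img))
    return 1.0 / (1.0 + math.exp(-z))

lr = 0.5
for _ in range(200):                  # the knob-twiddling loop
    for img, y in data:
        err = predict(img) - y        # gradient of the log-loss
        bias -= lr * err
        for i in range(16):
            weights[i] -= lr * err * img[i]

accuracy = sum((predict(img) > 0.5) == bool(y) for img, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

A real network replaces the 17 knobs with millions and the dot product with stacked convolutions, but the loop — compare prediction to label, nudge the knobs, repeat — is the same.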
When we feed the linear regression model some data in the form of a single observed height measurement, it very deterministically produces one single predicted weight. However, there may be nobody in the dataset who actually has that weight, or there may be multiple samples with different weights for the same height, and so when this model is applied to unseen data, we expect a non-zero error. If the dataset we used is a good sample of the unseen population, we expect this error to follow the same distribution we observed for the residual error when tuning alpha and beta. But if the sample we ‘trained’ our model on is wildly unrepresentative of the conditions during application, for example because of all kinds of biases during sampling, say only female sumo wrestlers, all bets are off.
With the Deep Convolutional Neural Network, we get the same: we feed it one new 32x32 image and, provided we properly implement the runtime hardware and software, it will produce in hard real time a ‘yes’ or a ‘no’ output, or a bounding box, or a pixel mask. If we feed it pixel for pixel the same input twice, it will deterministically produce identical output; but on previously unseen data that output may be wrong, and the best we can do is to be wrong as often as we were during training, which depends on having used the right dataset.
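The determinism point is worth pinning down, because “AI” is often assumed to be somehow stochastic at runtime. A trained model is just a fixed function; the sketch below uses a frozen linear model (the parameter values are invented for illustration), but the same holds for a network with millions of parameters:

```python
# A trained model is a fixed function of its input. Any "wrongness"
# comes from the data it was fit to, not from randomness at inference
# time. Parameter values are invented for illustration.
alpha, beta = 112.0, -123.87

def predict(height):
    return alpha * height + beta

a = predict(1.70)
b = predict(1.70)
assert a == b        # identical input -> bit-for-bit identical output
print(a)
```

This is precisely why the certification question shifts from the runtime behaviour of the function to the process that produced its parameters.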
With this background, we can already begin to understand how and why this form of “AI” works when it works: we build models of data that we then hope will generalize to their application in the field, by virtue of having captured underlying regularities of the problem domain. We still have to verify for any concrete application that it indeed works.
Next post, we’ll look into what we can already say about AI, now that we have specified what we mean by it.
_____________________________________
*Here I am specifically talking about offline supervised machine learning. Unsupervised, reinforcement, and/or online (“adaptive”) methods each come with their own can of worms, which we can avoid by simply sticking to offline & supervised.