Teaching the Machine - Part 1 of 3
Waleed Sarwar
Transforming FCA/PRA Regulated Industries with AI & Future-Ready Tech | CEO & Founder at CoVi Analytics | Leading the Charge in Operational Innovation | London, UK
Caveat: like all things risk & compliance, let's start with a caveat. This series of articles is my attempt at simplifying the mechanics of Machine Learning. I make significant simplifications in taking a helicopter view. There are many strands of ML, such as classification, neural networks and regression.
In these notes, my focus is to get to the nub of ML. It therefore goes without saying that by peeling away all the layers of complexity, some of the nuances will be lost. I'll kick things off with classification (which I have called categorisation).
PART 1: Categorisations
It is hard to walk five paces at a tech conference without overhearing someone singing the praises of all that is wonderful and magical about Machine Learning (ML). People making these claims often don't know the first thing about ML; they are simply parroting what they were told, deeply hypnotised by the bright hype.
Everyone talks about machine learning, but no one spares a breath for machine teaching. The reality is, the "smartness" of your machine is directly related to how well you taught it to learn.
ML for Classification
An algorithm that helps categorise information.
That’s all there is to it. Not as sexy as you thought ... right? This aspect of ML simply helps you classify raw data into pre-defined categories.
Lesson plan – Recognising cars
We teach children to recognise cars by teaching them the features to look out for - four wheels, windscreen, headlights etc.
Teaching a machine is no different. You tell it the features you want it to look for when categorising an object as a car. Based on the number of features the machine can spot, it will tell you how confident it is that it has spotted a car.
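To make that concrete, here is a toy sketch of such a hand-taught checklist in Python. The feature list and the confidence rule (the share of expected features spotted) are my own illustrative choices, not a standard recipe and certainly not anyone's production code.

```python
# A minimal sketch of a hand-taught "is it a car?" checklist, assuming
# confidence is simply the share of expected features the machine can spot.
CAR_FEATURES = {"wheels": 4, "windscreen": True, "headlights": True}

def car_confidence(observed: dict) -> float:
    """Return a 0-1 confidence that the observed object is a car."""
    matches = sum(
        1 for feature, expected in CAR_FEATURES.items()
        if observed.get(feature) == expected
    )
    return matches / len(CAR_FEATURES)

# A typical hatchback ticks every box, so confidence comes out at 1.0.
print(car_confidence({"wheels": 4, "windscreen": True, "headlights": True}))
```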
Another way is to give the machine a few pictures of cars, tell it that these are pictures of cars, and let it work out for itself the most common features that define a car. Learning from labelled examples like this is referred to as supervised learning.
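If that feels abstract, here is a hedged sketch of the supervised route using scikit-learn (my choice of library, purely for illustration). We describe each object with a few features, attach a label, and let the model decide for itself which features matter.

```python
# A minimal supervised-learning sketch: each row is
# [number_of_wheels, has_windscreen, has_headlights], with a label attached.
from sklearn.tree import DecisionTreeClassifier

examples = [
    [4, 1, 1],  # hatchback  -> car
    [4, 1, 1],  # estate     -> car
    [2, 0, 1],  # motorbike  -> not a car
    [2, 0, 0],  # bicycle    -> not a car
]
labels = ["car", "car", "not a car", "not a car"]

# The model works out which features separate cars from everything else.
model = DecisionTreeClassifier().fit(examples, labels)
print(model.predict([[4, 1, 1]]))  # -> ['car']
```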
No matter how we get there, once we have taught our machine how to recognise cars based on a sub-set of examples, we are ready to release it into the wild, where it can "recognise" a car when it sees one.
The magic of ML is that if our machine spots 'Del Boy' driving down the road, it will note that the object has three wheels (not the four it has learnt to expect) but does have a windscreen, headlights and so on. On that basis, it will clock it as a car, but with a lower confidence.
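Running the three-wheeler through the same checklist idea from the earlier sketch shows the point in miniature (again, my own toy numbers):

```python
# Del Boy's three-wheeler: windscreen and headlights match, the wheel count doesn't.
CAR_FEATURES = {"wheels": 4, "windscreen": True, "headlights": True}
reliant_robin = {"wheels": 3, "windscreen": True, "headlights": True}

matches = sum(1 for f, v in CAR_FEATURES.items() if reliant_robin.get(f) == v)
print(matches / len(CAR_FEATURES))  # roughly 0.67: probably a car, but less sure
```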
If Mr Bean happens to be out on a weekly shop, perhaps that won't be too taxing for our machine. Or will it? The lesson here is not to teach our machines to recognise cars in the UK.
We as teachers will need to step in and "educate" where the machine has a low confidence level of recognition. Once we have provided that assurance, the machine will “learn” this modification to the feature list for future categorisations.
Those who know ... Teach
There are three key components to teaching machines:
- Data Sets: Understanding the data you want the machine to analyse.
- Features: An appreciation of the features you want the machine to look out for. If in a bind, you can always give the machine the raw data and ask it to find patterns or features itself (technically that is Data Mining, but I am not one for splitting hairs).
- Categories: The buckets you want the machine to categorise things into. Asking a machine a black-or-white question is no fun; it's a lot more interesting when we add a few shades of grey.
We don’t need to be coders to teach machines. Equally, great coders don’t always make great teachers. Those who are closer to the data and ask the right questions are the ones that make the best teachers.
Applying this to our car example: the DataSet is the catalogue of cars; the Features are the wheels, headlights and so on; and the Categories are 'Is a Car' or 'Not a Car'. We can make the conversation with our machine a lot more interesting by adding more buckets: is it a sports car, a station wagon, a classic car and so on?
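To show how those three components line up in practice, here is a hedged sketch with richer buckets. The features ([top_speed_mph, seats, age_years]), the numbers and the choice of a decision tree are all invented for illustration, not how any particular product does it.

```python
from sklearn.tree import DecisionTreeClassifier

# DataSet: a toy catalogue of cars.
# Features: [top_speed_mph, seats, age_years].
catalogue = [
    [180, 2, 1],
    [190, 2, 2],
    [120, 7, 3],
    [115, 5, 5],
    [80, 4, 60],
    [70, 2, 55],
]
# Categories: the buckets we want the machine to sort things into.
buckets = [
    "sports car", "sports car",
    "station wagon", "station wagon",
    "classic car", "classic car",
]

model = DecisionTreeClassifier(random_state=0).fit(catalogue, buckets)
print(model.predict([[175, 2, 1]]))  # expected: ['sports car']
```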
Ultimately it all comes down to the features you teach your machine to recognise in answering the questions you have set for it. The more relevant the features and the more interesting the questions, the cleverer the machine appears.
CoVi’s Academy for Machines (CAM)
At CAM, our dataset includes qualitative and quantitative regulatory data. We are also very hard on our students, asking them difficult questions, namely:
- What parts of the regulation apply to life insurers, non-life insurers, mutuals and so on?
- For a given insurer, who are its best friends, good acquaintances, unpleasant familiars and mortal enemies?
- When a particular regulation changes, which companies are most likely to get into trouble?
There are a lot more questions in our curriculum, and we make use of other branches of ML, like Neural Networks, but I will keep those for another day.
Summary
ML is not a silver bullet. One of its uses is to categorise information into pre-defined buckets. However, that simple task of categorisation can be very powerful if the right question is asked.
We at CoVi Analytics challenge ourselves to find the right Features and ask the tough questions to make our machines as clever as possible, in an effort to make compliance simple.
We are CoVi Analytics and we are making compliance simple.