A PRECURSOR TO MY SERIES OF ONLINE TALKS AT THE SWEDISH CHAMBER OF COMMERCE, STOCKHOLM, SWEDEN, IN OCTOBER 2024 - THE CUTTING EDGE OF MACHINE LEARNING
Dr Sudhanshu Bhushan
Senior Policy Advisor (15th April 2023 ...) at New Zealand Red Cross, Auckland, New Zealand. Job description: Policy Classification, Consulting & Strategy
A PRECURSOR TO MY SERIES OF ONLINE TALKS AT THE SWEDISH CHAMBER OF COMMERCE, STOCKHOLM, SWEDEN, IN OCTOBER 2024
THE CUTTING EDGE OF MACHINE LEARNING IN MODERN-DAY BUSINESS
WHAT NEEDS TO BE DONE?
THE INTERFACE OF MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE WITH THE MODERN-DAY BUSINESS CONTEXT
First, I would like to thank you for coming and listening to me talk about the world of machine learning and artificial intelligence for business. I hope that you will come away with an understanding of what these things are, why they are important and how they work. Equally importantly, I hope you will understand that while these are essentially mathematically based topics, they can only realize their full potential when they are implemented with the relevant business, social, ethical, political and legal considerations taken into account.
Machine learning is an exciting and evolving subject that is being driven by new developments in three areas:
1. Models and algorithms. New machine learning approaches are being developed all the time. One avenue of research looks at entirely new types of model. Another seeks to improve upon the existing algorithms that produce scorecards, decision trees, neural networks and so on, in order to improve the predictive accuracy of these types of model.

2. Data. Predictive models and other AI applications are only as good as the data used to build them. More and better data leads to better solutions. This is one reason why "Big Data" and machine learning are so closely related.

3. Systems and software. These are the systems used to drive the development and deployment of machine learning solutions. The faster predictive models can be developed and deployed, the sooner the benefits can be realized.
Let’s start by discussing the models and algorithms side of machine learning. Linear models, decision trees and neural networks, the three types of model introduced earlier, are probably the most widely used types of predictive model today. However, if you talk to the young bucks in Silicon Valley, they will probably laugh and then tell you that these types of model are somewhat “vintage” – decision trees are so 1980s! There are so many better types of predictive model out there these days…
In one sense this is true. Scorecards, decision trees and neural networks are certainly not new. It’s also the case that, on average, newer types of predictive model such as deep neural networks, support vector machines and ensembles generate more accurate predictions, particularly in areas where complex types of decision making are required.
Linear models and decision trees were first used commercially in the 1950s and 1960s, in an age when a typical computer was the size of a large desk and had less than 1% of the computational abilities of a basic smart phone today. Therefore, these simple types of predictive model could be developed and implemented relatively easily. Computer power is much less of an issue these days, but simple models such as scorecards and decision trees remain very popular for the following reasons:
- They are “White Box” in nature. It’s very easy to understand how a score, and hence a prediction about someone, is arrived at. Likewise, it’s easy to see which data items contributed most significantly to the score, and which are less important.
- They are easy to code (see the short sketch after this list). Specialist software is not required to implement them. If resources are tight, then you can implement a scorecard or a decision tree as a small IT project, without needing to purchase additional hardware/software, and without needing to employ very expensive data scientists.
- They still produce pretty good predictions, if not the best. Some predictive models are thousands of times more complex than a simple scorecard or decision tree. However, even the most advanced predictive models often provide no more than a few percent uplift over simple scorecards or decision trees, and sometimes none at all.
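To make the “easy to code” point concrete, here is a minimal sketch of a scorecard in plain Python. The attributes, point values and base score are invented for illustration, and the cut-off of 521 simply echoes the heart disease scorecard referred to later in this piece; nothing here comes from a real model.

```python
# A minimal, illustrative scorecard: each attribute value maps to a number of
# points, the points are summed, and the total score drives a simple decision.
# All attributes, point values and the cut-off are hypothetical.

SCORECARD = {
    "age_band": {"18-30": 10, "31-45": 25, "46-60": 40, "61+": 55},
    "smoker":   {"yes": -20, "no": 15},
    "bmi_band": {"underweight": 0, "normal": 20, "overweight": 5, "obese": -10},
}

BASE_SCORE = 450   # points every case starts with (illustrative)
CUT_OFF = 521      # invite for a check-up at or above this score

def score_case(case: dict) -> int:
    """Add up the points for each attribute of a single case."""
    return BASE_SCORE + sum(
        points[case[attribute]] for attribute, points in SCORECARD.items()
    )

def decide(case: dict) -> str:
    return "invite" if score_case(case) >= CUT_OFF else "do not invite"

example = {"age_band": "46-60", "smoker": "no", "bmi_band": "normal"}
print(score_case(example), decide(example))   # 525 invite
```

Everything above is a couple of dozen lines of standard code, which is exactly why a scorecard can be dropped into an existing IT system without specialist tooling.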
What I want to make clear is that the big win for organizations is to make the leap to using automated decision making, based on machine learning, in the first place. The incremental benefits from using the most advanced methods available are often more marginal. This is particularly true where the business problems are simple and can be expressed precisely and concisely, such as assessing default risk on a loan, or how likely someone is to respond to a marketing campaign. It’s the more complex and convoluted types of problem, such as face recognition, medical diagnosis, internet search and self-driving cars, which have benefited most from more advanced forms of machine learning, such as deep neural networks.
If your organization does not currently use machine learning, then developing some simple predictive models that can be integrated into your existing decision making infrastructure will give you most of the benefits. The fancy cutting edge stuff, which often requires specialist hardware and/or software, will provide greater benefits, but not a massive amount. Therefore don’t delay. The 80/20 rule applies. You’ll get 80% of the benefits for 20% of the effort.
It’s also the case that if you can’t get a simple predictive model to work, then just using a more complex approach, or buying some expensive hardware/software is unlikely to solve the underlying issues; i.e. the failure of a machine learning project is nearly always due to incorrect problem formulation, the underlying data or an organizational issue. The problem is unlikely to be due to the type of predictive model that has been developed.
OK. That’s the argument for keeping faith with simple models such as scorecards and decision trees. However, if an organization is an established user of AI applications developed using machine learning, and its predictive models are responsible for billions of dollars’ worth of decisions each year, then there will be a drive to have the very best (most predictive) models possible – and with good reason. For a model responsible for a billion dollars’ worth of decision making each year, just a 0.1% uplift in performance equates to a $1m benefit. In this type of scenario it would be perfectly justifiable to employ a team of data scientists full-time to constantly challenge and improve upon the models that the organization employs.
The most advanced forms of predictive models in use today are ensemble models, based on complex (deep) neural networks. With an ensemble, instead of having a single scorecard, decision tree or neural network, hundreds or possibly thousands of different models are constructed, each using a different data sample, and/or different algorithms to determine the model’s parameters. Each model therefore makes predictions in a slightly different way. The scores (predictions) generated by each model will often be the same or very similar, but sometimes they will disagree with each other; i.e. some models will give some types of cases very high scores, whereas other models will give the same cases much lower scores and vice versa.
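As a sketch of how such an ensemble could be built in practice, the snippet below uses bagging: many small models, each fitted to a different bootstrap sample of the data. scikit-learn is used purely as an illustration; the text does not prescribe any particular tool, and X_train/y_train stand in for your own data.

```python
# Bagging: fit many models, each on a different random (bootstrap) sample,
# so that each one makes its predictions in a slightly different way.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

ensemble = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=4),  # a simple base model
    # (the parameter is called base_estimator in older scikit-learn versions)
    n_estimators=1000,   # "hundreds or possibly thousands" of models
    bootstrap=True,      # each model sees a different data sample
)

# ensemble.fit(X_train, y_train)         # X_train, y_train: your own data
# predictions = ensemble.predict(X_new)  # combines the votes of all 1,000 trees
```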
Using an ensemble model is a bit like having decisions made by a committee of experts rather than by a single expert. The reason why the committee approach is better than having a single expert is twofold:
1. If one of the experts has specialist knowledge that the others don’t have, then this can be brought into the decision making process.
2. Some of the experts may, on occasion, make poor decisions. The other experts will use their collective knowledge to override (outvote) them in those cases.
Just like the committee, some of the models that form an ensemble will be particularly good at predicting the outcome of certain types of cases. Likewise, if any of the models are weak in certain areas (generate poor predictions) then these are overridden by the others.
Once constructed, the way an ensemble works is pretty straightforward. The score from each model is used to make a decision. A final decision is then made by simple majority vote. If we return to the heart disease scorecard model discussed earlier, then imagine that instead of a single scorecard, a thousand different scorecards are constructed. The original decision rule was to invite someone for a check-up if they scored 521 or more. With the ensemble, if at least 500 of the individual models generate a score of 521 or more, then the decision is to invite.
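A minimal sketch of that decision rule, assuming each of the individual scorecards is available as a function that maps a case to a score:

```python
# Majority vote across an ensemble of scorecards: invite someone for a
# check-up if at least half of the models score them at 521 or more.

from typing import Callable, Sequence

CUT_OFF = 521

def ensemble_decision(case: dict, models: Sequence[Callable[[dict], int]]) -> str:
    votes_to_invite = sum(1 for model in models if model(case) >= CUT_OFF)
    return "invite" if votes_to_invite >= len(models) / 2 else "do not invite"

# With 1,000 models, "at least half" means 500 or more votes to invite.
```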
How much better are ensembles than single models? Sometimes none! However, in my experience it’s not unusual for an ensemble to be around 5-10% better than a single model. If an insurance company found that using a decision tree resulted in a $40m reduction in claims over their previous manual process for the same amount of underwriting, then moving to an ensemble approach could reasonably be expected to provide an additional $2-4m benefit.
If all you are interested in is raw predictive accuracy, then ensembles are the way to go. If, however, it’s important for you to be able to explain how a model arrives at a given prediction, then you may want to think twice before going down the ensemble route, because the solution will be much more complex and more difficult to understand than a single model approach.
Let’s now move on to think about data. From reading the academic literature on machine learning, I would hazard a guess that 95% or more of it is about algorithms; i.e. very technical discussions about the cutting edge mathematical approaches that can squeeze a little bit more predictive accuracy from a given data set. In practice however, when it comes to improving the accuracy of predictive models, data is king.
Given a choice between a new algorithm for building a predictive model and having more/better data available, then data wins every time. To put it another way, very simple predictive models built using a good amount of high quality data almost always outperform more advanced approaches built using a smaller amount of lower quality data. If you really want to get more out of your predictive models, then improving the quality of the data used to build them, and seeking out new and better data sources, should come at the top of your priority list.
In the early days of Big Data, when the cost of data storage fell very dramatically in a short period of time, there was very much a “store and analyze it all” data philosophy amongst the pioneers. The message was that every organization should be gathering and analyzing all the data it could. Back then, there was a lot of talk about needing to invest in mass storage systems such as Hadoop. This was to allow organizations to store all the data that they could lay their hands on in order to be able to produce the best predictive models possible, and hence gain a competitive advantage. However, the amount of data being generated has continued to increase year on year and shows no signs of slowing down. In fact, the volume of data is increasing at a far faster rate than the cost of data storage is falling.
This means that the benefit of having all available data to hand is to some extent offset by the costs of storing and analyzing all that data. As discussed previously, only a small fraction of all the data out there actually features in predictive models and is used to make predictions; i.e. once you know what types of data are predictive of how people are going to behave, then you can discard most of the other data because you don’t need it. Continuing to maintain huge databases of “low value” data is not a very efficient use of time and resource.
These days, there are moves towards common data storage and aggregation – particularly when it comes to externally sourced data and data that is common across organizations. If people have ten apps on their phone supplied by ten different organizations, then it’s very wasteful for each of those organizations to be gathering location and movement data themselves. It makes far more sense for one organization to manage the data, and then provide clients with the specific data items that are relevant to them.
If you look at companies such as Facebook, Google, Experian, Equifax and so forth, then this is exactly what they are doing. They undertake the hard work of collecting, formatting, preparing and summarizing data. They then package the useful bits and sell them on. In this way, individual organizations only acquire data that is genuinely useful to them. Consequently, they don’t need to waste time and resources gathering huge amounts of data that they don’t need.
The third driver of developments in machine learning is IT systems and software. As the volume of personal data has grown, and the frequency with which data changes has increased, the cycle time between model developments has reduced in many industries.
The traditional paradigm for developing and implementing predictive models is to separate these two parts of the process; i.e. develop your models first, and then implement them. During the development phase, a data scientist spends days, weeks or even months gathering data and carrying out the statistical analysis required to build the model. When that part of the process is complete, there is a further exercise to code up the model within the production environment, test that the model works, and then put it into live operational use.
In many (and possibly most) industries this approach to predictive modelling is still applied and generally works pretty well, not least because after a model has been developed it has to pass internal and external audit, and then be subject to regulatory review, before it can be put to use. Having a robust model governance structure in place is important because if your entire business relies on the correct decisions being made, and you get it wrong, then the impact on the bottom line can be very considerable indeed.
For risk models in banking and insurance it can take a year or more between a predictive modelling project commencing and a model being implemented within the business. Every aspect of the model has to be fully documented, and then a cycle of discussion, feedback and further analysis needs to occur before the regulator signs off the model as fit for use. Banking regulators won’t even deign to review a predictive model until it has undergone a complete review by independent experts, a process which can take as long as or longer than the initial model building process!
In other areas however, such as internet marketing, things move much faster. Data, and the relationships in that data, are changing frequently, some of it in real time. If an organization wants to retain a competitive edge then it needs a much more rapid cycle of model development and implementation. Models are rebuilt on a daily or more frequent basis in response to constant changes in the data. This has led to the development of IT systems that closely integrate the data an organization holds, the analytical tools used to create predictive models and the systems that deploy them.
These “In-database” systems stream data to machine learning tools without needing to extract the data first, drastically reducing the time required to pull data samples together, build predictive models and then to deploy those models operationally.
Once an in-database system has been configured, models can be redeveloped and deployed automatically. In theory, a new and updated model can be constructed every time a new piece of data becomes available – the system learns from each new case it deals with. New models are developed and deployed on a minute by minute basis. Consequently, it becomes impossible for a data scientist to be involved in the detail of every model that is constructed. Instead, the data scientist’s role is to be part of the team that designs the wider system. In particular, they have responsibility for understanding the data that feeds the system, and how this maps to the business problems that the system needs to create predictive models for.
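As a rough sketch of what this looks like in code, the snippet below updates a simple model every time a new case arrives, with no analyst in the loop. scikit-learn’s incremental (partial_fit) interface is used only as an illustration; a real in-database system would rely on its own tooling, and the feature layout here is hypothetical.

```python
# Automated model updating: every new labelled case refines the model.
# SGDClassifier supports incremental learning via partial_fit, so the model
# can be updated one case at a time as data streams in.

import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss")   # a simple incremental linear model
CLASSES = np.array([0, 1])               # possible outcomes, declared up front

def on_new_case(features: np.ndarray, outcome: int) -> None:
    """Called by the wider system each time a new labelled case is observed."""
    model.partial_fit(features.reshape(1, -1), [outcome], classes=CLASSES)

def score(features: np.ndarray) -> float:
    """Current estimate of the probability of the positive outcome."""
    return float(model.predict_proba(features.reshape(1, -1))[0, 1])
```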
After a system goes live, the software provides a dashboard for the data scientist that reports on the status of the overall system. For example, how well models within the system are performing, how model performance changes over time, how the data that feeds the analytics process is changing and so on.
The data scientists themselves only become involved in the detail when something goes awry, or when some new feature needs to be incorporated into the system. If there is an unexpected dip in model performance, then the data scientist will need to investigate and find out why model performance has declined. They will then instigate remedial action to correct the problem and return the system to optimal operating conditions.
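One common check behind such a dashboard, sketched below, is the population stability index (PSI), which compares the distribution of recent scores against the distribution seen when the model was built; a large value is a typical trigger for the data scientist to investigate. The thresholds in the comments are conventional rules of thumb, not figures from the text.

```python
# Population stability index (PSI): a simple measure of how far the current
# score distribution has drifted from the distribution at model build time.

import numpy as np

def psi(baseline_scores: np.ndarray, recent_scores: np.ndarray, bins: int = 10) -> float:
    # Bin edges are taken from deciles of the baseline score distribution.
    edges = np.quantile(baseline_scores, np.linspace(0, 1, bins + 1))[1:-1]
    baseline_bins = np.searchsorted(edges, baseline_scores, side="right")
    recent_bins = np.searchsorted(edges, recent_scores, side="right")
    baseline_pct = np.bincount(baseline_bins, minlength=bins) / len(baseline_scores)
    recent_pct = np.bincount(recent_bins, minlength=bins) / len(recent_scores)
    # Guard against empty bins before taking logs.
    baseline_pct = np.clip(baseline_pct, 1e-6, None)
    recent_pct = np.clip(recent_pct, 1e-6, None)
    return float(np.sum((recent_pct - baseline_pct) * np.log(recent_pct / baseline_pct)))

# Rule of thumb: PSI below 0.1 is stable, 0.1-0.25 is worth watching,
# and above 0.25 usually warrants investigation and possibly a model rebuild.
```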
A similar approach is taken with “Self-learning” devices and autonomous robots. Each time they undertake a task, they gather data about that event. The predictive models that drive their activities are automatically refined using the additional data available. In this way, as more and more data becomes available, the accuracy of the underlying models improves.
A further focus of some of the newer machine learning software is intermediary tools, which seek to provide a better interface between non-technical business users and the underlying data and algorithms required for machine learning.
The most advanced of these tools try to replace some of the tasks that would traditionally have been undertaken by data scientists. They can analyze and prepare data from different sources, apply a range of algorithms and present the results back to non-technical users in an easy to understand way without any formulas or equations. In particular, the software attempts to present results in a contextual way that makes business sense, rather than providing a more formal statistical perspective that data scientists are used to dealing with.
A prime example of this approach is the one taken by IBM with its Watson Analytics software. The original version of Watson famously beat several human players in the general knowledge quiz show Jeopardy! in the USA. Watson has since evolved into a commercial product. Behind the scenes, the software uses some very complex machine learning algorithms to extract information from a range of different data sources. The front end of the software is designed with managers and other business users in mind, rather than data scientists.
The net result is that when presented with suitable data, new insights and understanding about the behaviour of customers can be presented to business users within hours, or even minutes, without the need to involve technical specialists in the process.
THANKS. SEE YOU ON 4th October 2024.
sudhanshu