A True Citizen Data Scientist End-to-End ML Example: Lead Scoring
Walter Adamson
Helping business owners transform every role with AI-Thinking to boost productivity | Empowering human potential one person at a time by enhancing productivity and role deliverables | Beyond AI to AI-Thinking
The power of ever-higher levels of algorithmic abstraction and codification continues to enable non-IT professionals to dramatically improve their efficiency and effectiveness e.g. Microsoft's Power BI.
But for many of these advances, the promises remain elusive. Think of Robotic Process Automation, no-code app builders, and ... the fabled citizen data scientist.
A 600-fold Improvement - Plus The Steak Knives
For the latter, the world has just changed, and I experienced it myself. A new tool reduced 60 hours of someone else's work to 6 minutes of my time, with better results.
Not so many years ago I tagged myself as a citizen data scientist on my LinkedIn profile. It attracted more comments than anything else I've had on my profile, usually along the lines of "what is it?" or "love it".
To be open, I'm a bit of a faux-citizen data scientist as I spent 7 years of my career as a consultant in computational statistics. But I left the service of statistics more than 40 years ago after being told by many people that there was no future in it. Sure, I still know the difference between correlation and causation but not enough else to be cast out of the ranks of citizen data scientists.
According To Gartner
According to Gartner, a citizen data scientist is a person who creates or generates models that leverage predictive or prescriptive analytics, but whose primary job function is outside of the field of statistics and analytics.
"These roles are often promoted as a silver bullet that can accelerate organizations into artificial intelligence (AI) and ML easily and cost-effectively."
The compelling idea of the citizen data scientist is that people whose primary job lies outside statistics and analytics can nonetheless build genuinely useful predictive models.
And the holy grail? These citizen data scientists will be able to utilise and operationalise what they have created almost instantly. Meaning that the models are immediately able to be integrated with enterprise or operational systems and put into action to inform business decision-making.
Sound too good to be true? Up until now, it has been. To quote Gartner (June 2021):
"However, very few organizations have managed to harness the capabilities of citizen data scientists" - Gartner, How to Use Citizen Data Scientists to Maximize Your D&A Strategy
The Astounding Here-Now Capabilities for Citizen Data Scientists
In a 2020 article on Medium, Adam Barnhard explains how he developed a lead-scoring system. The system analyses attributes of each new lead to estimate the chance that the lead actually becomes a customer. With this, a sales team can prioritise its time on the leads most likely to convert.
He walks through the full end-to-end implementation of a custom-built lead-scoring model. This includes pulling the data, building the model, deploying that model, and finally pushing those results directly to where they matter most — the tools that are used by a sales team.
My conservative estimate of the time taken to complete this end-to-end project is 60 hours of effort.
And this is detailed, messy and error-prone work.
For example, Adam cautions: "While building an ML model, you will likely go through multiple iterations and test a variety of model types. It’s important to keep track of metadata about those tests as well as the model objects themselves. What if you discover an awesome model on your 2nd of 100 tries and want to go back to use that?"
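Adam's caution maps naturally onto a small experiment-tracking pattern (tools like MLflow do this at scale). Here is a minimal sketch of the idea: log metadata plus the serialized model object for every training run, so the "awesome model on your 2nd of 100 tries" can be recovered later. The run details and stand-in models below are invented for illustration.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class Run:
    """Metadata for one training attempt, plus the pickled model itself."""
    run_id: int
    model_type: str
    params: dict
    accuracy: float
    model_blob: bytes = field(repr=False)  # serialized model object


runs: list[Run] = []


def log_run(run_id, model_type, params, accuracy, model):
    """Record one training attempt: metadata plus the serialized model."""
    runs.append(Run(run_id, model_type, params, accuracy, pickle.dumps(model)))


def best_run():
    """Return the highest-accuracy run logged so far."""
    return max(runs, key=lambda r: r.accuracy)


# Simulate three tries with stand-in "models" (plain dicts here).
log_run(1, "logistic_regression", {"C": 1.0}, 0.78, {"weights": [0.1]})
log_run(2, "gradient_boosting", {"n_estimators": 200}, 0.82, {"trees": 200})
log_run(3, "logistic_regression", {"C": 0.1}, 0.75, {"weights": [0.2]})

best = best_run()
print(best.run_id, best.model_type, best.accuracy)  # → 2 gradient_boosting 0.82
restored = pickle.loads(best.model_blob)            # recover the model object itself
```

In a real project you would also record the dataset version and a timestamp per run; the point is simply that metadata and model artefacts are kept together, so no "try" is ever lost.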
(Above: Process Overview of Adam Barnhard's Lead-Scoring System ML Project)
I found Adam's article, downloaded the same leads database from Kaggle, uploaded it into Hazlo.ai, deleted two columns of nominal identifier data, selected the variable to predict, and 219 seconds later had a model ready to be deployed to the field.
I'm calling that 6 minutes of work, compared to 60 hours for Adam's lead-scoring project.
And my model has an accuracy of 95.25% compared to Adam's 82% (which he described as "not too shabby").
(Above: My Lead Scoring Model, by Hazlo.ai)
You can try my model and its predictions of sales conversion for yourself here.
Vary any of the parameters and the prediction will show the probability of conversion. (I compared it to the settings in the image in Adam's article and it gave the same result - "likely to convert".)
When you get better data you can simply upload and retrain the model, and compare it to the current model - and I mean simply. And with a single click you can take advantage of Hazlo's ever-evolving algorithms giving better results.
The careful reader will notice that I have glossed over the final integration from Booklet to Intercom, which Adam's project completed. Hazlo provides an API for each model for the same purpose: to build the model directly into business operations.
You'll still need our programming friends for that bit. But I'm still claiming a 600X improvement over Adam's implementation time (60 hours is 3,600 minutes; 3,600 ÷ 6 = 600).
Conclusion
Of course, a little knowledge is dangerous*. But in potential ignorance, I am quite astounded by this result and the opportunity provided by these kinds of web-based systems.
What do you think about their potential and my assessment of the difference with the traditional approach? Where are the traps for young players? What are the organisational pitfalls?
*For example, leaving the Lead Number and the Prospect ID in the data set for analysis gave an even better "accuracy" of 98.18% (up from the 95.25% above). But clearly, those two data items do not carry any more information relevant to the object of the analysis. They are simply more data, not more information, and spurious data at that.
Data doesn't always speak for itself; you need to use your business knowledge to eliminate what is clearly spurious.
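The data-hygiene step described above can be sketched in a few lines: drop identifier columns before training, since they uniquely label rows and only let a model memorise rather than generalise. The column names follow the Kaggle leads dataset; the toy values are invented.

```python
import pandas as pd

# A tiny stand-in for the Kaggle leads dataset (values invented).
leads = pd.DataFrame({
    "Lead Number": [660737, 660728, 660727],   # row identifier, no predictive value
    "Prospect ID": ["7927b2df", "2a272436", "8cc8c611"],  # likewise
    "TotalVisits": [0, 5, 2],
    "Converted":   [0, 0, 1],                  # the variable to predict
})

ID_COLUMNS = ["Lead Number", "Prospect ID"]

# Keep only genuinely informative features for modelling.
features = leads.drop(columns=ID_COLUMNS)
print(list(features.columns))  # → ['TotalVisits', 'Converted']
```

Any column that is unique per row (IDs, timestamps of record creation, row numbers) deserves the same scepticism: if the model can use it to "look up" the answer, the accuracy figure is flattering but meaningless.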
A big thank you to @AdamMBarnhard for his article giving me the inspiration to compare.