Business and Data Understanding in the Data Science Lifecycle
Srivatsan Srinivasan
Chief Data Scientist | Gen AI | AI Advocate | YouTuber (bit.ly/AIEngineering)
"Give me data and I can do wonders"... This is a common fallacy among data scientists today. Many data scientists skip the important step of understanding the business, and how the underlying data was generated by the business process, before they start playing around with the data.
Jumping straight into the data is not always bad, but in most cases we end up with insights that never get integrated into real-world operations. There are multiple pitfalls of not understanding the business and the underlying data. The key ones are:
- Solving a business objective other than the one that can impact the business significantly
- Business stakeholders and data scientists working in isolation from each other, so that what gets deployed diverges from what the business needs
- Insights that do not generate any significant value even after they get into production
To give an example, I know a marketing team that had a problem with a low conversion rate (< 14 %) when acquiring new prospects. They were also paying a high cost to acquire prospects, who then took a long time to become profitable after converting to customers. The data science team got internal CIO funding and the related datasets to solve the problem. Business stakeholders were consulted only minimally, to clarify data elements rather than to explain the business process or the real challenges within the business environment.
The resulting model did slightly better than the current conversion rate in the validation phase. The excited CIO and data science team presented this finding to the business and explained what they had done. It turned out that the marketing team's challenge was not the low conversion rate. While they would love to increase the conversion rate, their immediate problem was that they could not prioritize prospects to understand where to spend more and where not to. There was also a lack of customization of products or offers to individual prospects. Finally, they lacked the research capability to decide which channels to prioritize and for which segments to increase spending, which in turn would have led to a better conversion rate.
In some industries prospect conversion rates are low, yet the business still wants to target a wider audience rather than narrow in on high-conversion prospects. This way they can reach more prospects, increase conversion coverage, and acquire new customer segments for the business, not only look-alikes of today's customer base.
Coming to the point:
Without a clear business objective and agreed success criteria, a data science project has failed before it has even started.
In any data science lifecycle process, such as CRISP-DM or TDSP, business understanding and data understanding are the starting point, before we get to work on the underlying data for insights.
Let us quickly look at the activities we perform in the business and data understanding phases.
Business Understanding
In the business understanding phase we basically
- Understand the business process
- Define and frame the business problem
- Define the business objective
- Agree on success criteria
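One way to make agreed success criteria concrete is to write them down in a form that can be checked once results come in. The sketch below is only an illustration: the `SuccessCriterion` class, the metric names, and every number in it are hypothetical and are not taken from the marketing example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """One success criterion agreed with business stakeholders (hypothetical structure)."""
    metric: str
    baseline: float          # where the business is today
    target: float            # what was agreed as "success"
    higher_is_better: bool = True

    def is_met(self, observed: float) -> bool:
        # Compare the observed value against the agreed target.
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# Illustrative values only; real targets come out of the discussion with the business.
criteria = [
    SuccessCriterion("conversion_rate", baseline=0.14, target=0.16),
    SuccessCriterion("cost_per_acquisition", baseline=120.0, target=100.0,
                     higher_is_better=False),
]

# Hypothetical values observed after a pilot.
observed = {"conversion_rate": 0.15, "cost_per_acquisition": 95.0}

results = {c.metric: c.is_met(observed[c.metric]) for c in criteria}
for metric, met in results.items():
    print(f"{metric}: {'met' if met else 'not met'}")
```

Writing criteria down this way forces both sides to name the metric, the baseline, and the target before any modeling starts.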
To understand this phase in detail, you can watch my video below. In it I take a real-world use case (Credit Underwriting) to walk you through this phase.
You can also subscribe to my YouTube channel AIEngineering to get alerts as I post new videos on this and other topics.
Data Understanding
In the data understanding phase we typically
- Understand the data touch points in the context of the business process
- Gather knowledge on where the data originates, how it gets processed, what decisions are made on it, where it is stored, and how it flows downstream
- Deep dive into the business meaning of the data being used, as well as the knowledge already present in existing systems in the form of rules
- Check whether it is appropriate to use additional, well-known industry external data sources that can improve the decision boundary
- Check for target label availability, and check for late-arriving labels
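The last check in the list above can be sketched in a few lines of Python. This is a minimal illustration under assumed data: the record layout (application id, decision date, label, label date) and all values are hypothetical, loosely modeled on a credit underwriting setting where the outcome is only known some time after the decision.

```python
from datetime import date

# Hypothetical records: (application_id, decision_date, label, label_date).
# label is None when the outcome has not been observed yet.
records = [
    ("A1", date(2023, 1, 5), 0, date(2023, 7, 5)),
    ("A2", date(2023, 1, 20), 1, date(2023, 4, 2)),
    ("A3", date(2023, 2, 11), None, None),
    ("A4", date(2023, 3, 1), 0, date(2023, 12, 1)),
]


def label_availability(rows):
    """Fraction of rows that already have an observed target label."""
    if not rows:
        return 0.0
    labelled = sum(1 for _, _, label, _ in rows if label is not None)
    return labelled / len(rows)


def label_delays_in_days(rows):
    """Days between the decision and the label arriving, for labelled rows only."""
    return [
        (label_date - decision_date).days
        for _, decision_date, label, label_date in rows
        if label is not None and label_date is not None
    ]


availability = label_availability(records)
delays = label_delays_in_days(records)

print(f"label availability: {availability:.0%}")
print(f"longest label delay: {max(delays)} days")
```

A low availability or a long delay suggests that the most recent records cannot be used for training yet, which is worth knowing before any modeling starts.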
You can watch my video below on the Data Understanding phase. In it I walk you through the same example used in the Business Understanding phase, overlaying the data touch points on the process.