Ten Ways Your Data Project is Going to Fail
This talk is based on conversations I've had with many senior data scientists over the last few years. Many companies seem to go through a pattern of hiring a data science team, only for the entire team to quit or be fired around 12 months later. Why is the failure rate so high?
Let's begin:
1. Your data isn't ready
A very wise data science consultant told me he always asks if the data has been used before in a project. If not, he adds 6-12 months onto the schedule for data cleansing.
Do a data audit before you begin. Check for missing or dirty data. For example, you might find that a database stores some transactions in dollars and others in yen, without indicating which is which. This actually happened.
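As a rough illustration of what a first-pass audit could look like, here is a minimal sketch in plain Python. It is not a full audit: the function name `audit_records`, the field names, and the sample transactions are all made up for this example. It counts missing values per field and exact duplicate rows, which would surface something like the unlabeled dollar/yen problem as a missing `currency` value.

```python
from collections import Counter


def audit_records(records, required_fields):
    """Count rows, missing values per required field, and exact duplicate rows.

    `records` is a list of dicts. A value counts as missing if it is None
    or a blank string. (Hypothetical helper for illustration only.)
    """
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1

    # Two rows are duplicates if every field/value pair matches exactly.
    seen = Counter(
        tuple(sorted(record.items(), key=lambda item: item[0]))
        for record in records
    )
    duplicates = sum(count - 1 for count in seen.values())

    return {"rows": len(records), "missing": dict(missing), "duplicates": duplicates}


# Made-up sample data: two rows do not say which currency the amount is in,
# and one of those rows has been loaded twice.
transactions = [
    {"id": 1, "amount": 120.0, "currency": "USD"},
    {"id": 2, "amount": 15000.0, "currency": None},
    {"id": 3, "amount": 80.5, "currency": ""},
    {"id": 3, "amount": 80.5, "currency": ""},
]

report = audit_records(transactions, ["id", "amount", "currency"])
print(report)
```

A report like this only flags where to look. Deciding whether an unlabeled amount is in dollars or yen still takes someone who knows the source system.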
2. Somebody heard 'Data is the New Oil'
No, it isn't. Data is not a commodity; it needs to be transformed into a product before it's valuable. Many respondents told me of projects that started without any idea of who their customer is or how that customer is going to use this "valuable data". The answer came too late: "nobody" and "they aren't".
Click here for the other eight reasons.
Data Scientist
8 years ago: There is a place for gut instinct, or call it data intuition. If results seem too good to be true, they generally aren't true. Also, the simple habit of counting your data (it is incredible how much can be lost or duplicated after transformations) and actually looking at it (as already pointed out) can save a world of pain.
Senior Project/Program Manager, Clinical Trials Payment Systems (Veeva), SAP S4Hana and COUPA implementation
8 years ago: Some very valid points. Another is that teams appear to spend weeks or months in planning workshops before anyone actually looks at real data and system interactions. Get the team working with the data and the tools ASAP. That's when the real work starts and real design and modeling can be done. This is not unique to data science projects. But you will lose the less detail-oriented leaders. Understanding the business context of the data, in addition to its quality level, is important as well. Most data sources have limitations, and bad assumptions can easily be made.
Global Head of Marketing Science, Choreograph
8 years ago: Sound advice. I would add: collaboration is critical for success on bigger projects. Data scientists need tools and encouragement to work together. Encourage pair analysis and model reviews so that if someone leaves or is off sick, others can step in. Good process helps (Git, pipelines, etc.).
Chief Technology Officer (PEP Health)
8 years ago: Certainly true for our clients, particularly the advice to use real data from the start. Designing a data product without the data is a sure path to failure.
Data Practitioner (DS, DE, DA)
8 years ago: Good post! It is important to know the modes of failure so you can plan ahead.