How to make data scientists shine
Jose Almeida
Data Consultant/Advisor | Data Strategy | Data Governance | Data Quality | Master Data Management | Remote/Onsite Consulting Services in EMEA
The effort to take advantage of emergent new business innovations, of advances in digitization, analytics, artificial intelligence, machine learning, internet of things or robotics, is leading to an increasing demand for people with related skills.
The Challenges
Being a data scientist may be considered the sexiest of the data-related jobs, but it has its challenges, especially when it comes to demonstrating the value created by the work.
In this article, let us look at some of those challenges, and at how they can be overcome when organizations take a systematic approach to managing their data.
Lack of a clear question
This is often a communication problem: a business problem has to be turned into a technical problem, and there is a gap between the language and concepts used by the business stakeholders and those used by the data scientists. However, the causes run deeper. They can also be related to a lack of data literacy on the business side, a lack of business literacy on the data side, and the absence of organization-wide business concepts that can be clearly mapped to data.
All this leads data scientists to jump straight into data and tools without a clear understanding of the business requirement.
Inaccessible data
Almost every organization has overflowing data siloed across a range of platforms, software, and formats. Finding, accessing, and consolidating the correct data is therefore a major challenge.
Finding the right data, scattered across multiple sources with different and unclear business concepts, various rules, and different levels of quality, usually depends on manual data entry and time-consuming searching, which leads to errors, repetition, and redundancy.
Dirty data
Data preparation is often the biggest of the challenges. It is common to hear that 80% of a data scientist's time is spent on cleansing and classifying data, and this is even accepted as part of the job, when in fact the quality of the data is the responsibility of everybody in the organization.
Adding to this, the cleansing is often performed autonomously, relying on individual judgment about what the quality rules are and how to make the data comply with them.
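As an illustration of the alternative to individual judgment, here is a minimal sketch in Python of quality rules that are written down once and shared, so that every analyst checks data against the same definitions. The rule names, the field names (customer_id, country, revenue), and the thresholds are hypothetical examples, not rules taken from any real organization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass(frozen=True)
class QualityRule:
    """A named data quality rule agreed with the business."""
    name: str
    check: Callable[[Record], bool]


# Hypothetical shared rules: in practice these would be defined together
# with the business stakeholders who own the data.
SHARED_RULES: List[QualityRule] = [
    QualityRule(
        "customer_id is present",
        lambda r: bool(r.get("customer_id")),
    ),
    QualityRule(
        "country is a 2-letter code",
        lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2,
    ),
    QualityRule(
        "revenue is non-negative",
        lambda r: isinstance(r.get("revenue"), (int, float)) and r["revenue"] >= 0,
    ),
]


def violations(record: Record, rules: List[QualityRule] = SHARED_RULES) -> List[str]:
    """Return the names of the shared rules that a record fails."""
    return [rule.name for rule in rules if not rule.check(record)]


# Example records (made up for illustration).
records = [
    {"customer_id": "C1", "country": "PT", "revenue": 10.0},
    {"customer_id": "", "country": "Portugal", "revenue": -5},
]

# Map each record's position to the list of rules it breaks.
report = {index: violations(record) for index, record in enumerate(records)}
```

Because the rules live in one place, a disagreement about what "clean" means becomes a discussion about a named rule rather than about one person's cleansing script.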
Insights not used in management decisions
Strange as it seems, considering the huge investments being made to create data-based insights to feed the corporate processes, those insights are frequently ignored, with top managers basing their decisions on their experience or “gut feelings”.
Trust in data is essential, and it is essential throughout the whole data life cycle. It’s not possible to trust data-derived insights when there is no trust in the data itself.
Making Data Scientists Shine
In most organizations, data scientists are placed in an awkward situation, being asked to produce valuable insights based on the organization’s data. And although they are given the technical tools and conditions to perform this task, they are denied the fundamental basis for their work: meaningful, reliable, accurate, quality data.
At the root of this problem, we have an asset that is not being managed.
Organizations need to take a clear stand on managing their most important asset: data.
Creating the right conditions for these data insights to exist and be trusted, and hence to add true value to the organization, is a complex problem. It must be approached in an “organic” way within the organization, allowing a data culture to grow naturally, founded on results and built upon success stories that the business stakeholders and decision makers can relate to.
Any successful data strategy is necessarily a business strategy. Data’s purpose is to create business value, so any data strategy must be oriented towards the organization's strategic priorities and key business objectives.
The key to achieving these business objectives, and to aligning the data strategy and any data initiatives with them, is to make business the driver for the data strategy.
This means creating a set of initiatives that are driven and oriented by the business stakeholders and units, working on use cases that are grounded in solid business cases. Let the business identify its needs and, from there, work with the business to identify the data that is necessary, the rules that govern that data, the business concepts behind the data, and the quality standards and rules that should be met. Doing so will remove the overload that is being assumed by data scientists and build the trust needed for the insights produced to become actionable and add value to the corporate decision processes.
The need to address governance and quality cannot be an impediment for data initiatives, so move forward focusing on delivering the intended results while ensuring that data quality management and governance have a major and ongoing role along the way.
Use an iterative, agile approach. Make continuous adjustments and keep going; this will give your data management initiatives the necessary momentum. Organizing these initiatives into smaller sprints will make the changes more manageable.
Refocusing data initiatives by starting small is the best approach, and the same is valid for data governance and quality. The aim is a framework that integrates into existing environments, uses existing standards, and allows organizations to do something tactical with a high ROI.
By focusing first on the tactical deployment, organizations can then build on the first success, allowing the data strategy to evolve naturally and grow on these small success stories.
Data Scientist / Data Storyteller
3 years ago
Love this
Chief Officer, Strategic Planning at Kenya Power
3 years ago
Jose Almeida: Very well, clearly and simply said. I like your concise articles!
Founder and CEO of CradleShyft
3 years ago
This is amazing. I love the fact that you've not just jumped into the how, but systemically explained the challenges involved. If data is the new oil, then, it's better be "unleaded" (purified). Very insightful.
Sr. Software Engineering Manager @Safaricom |DevSecOps | SDET | Product Development | Enterprise Architecture | AI & ML | IT Strategy | ISTQB 4x Certified | AWS 2x certified | ITIL 5 certified | Agile Certified
3 years ago
Very insightful piece. Data is a big resource, and until organizations treat it as a strategic resource with everybody playing their part in churning out "clean" data, there will always be a mismatch between the sexy graphs/visualizations and organizational strategy.