Start as early as possible...the data product discovery & design
Omar Khawaja
Data, analytics & AI practitioner and thought leader. Successful track record of people management, team leadership, design & execution of vision & strategy and operating model to drive business value
"The best time to plant a tree is twenty years ago. The second best time is now."
I am a big believer that in the world of data, we are not starting early enough in the process of data product discovery, linked to business outcomes, use cases and an understanding of user personas. Either this is not taking place at all and the entire focus is on the latest and greatest technology, or it is done by chance rather than as a continuous habit of data product teams.
Truth be told, I was not doing that as a practice either until 2019, and I have learnt a lot since 2021. There are good product management & discovery practices that can be applied to data, or adapted for data, analytics & AI as well; here is one good example.
Having said that, for an enterprise focusing on internal use cases, for internal employees at least, there could be multiple teams working in the data space. These can range from highly centralised to decentralised teams, or a hybrid model (my favourite). Even with a hybrid approach, the freedom and empowerment come hand in hand with oversight & visibility (maybe a good form of data governance).
In order to support this freedom, I think we have been missing a discovery, design and collaboration tool that sits very close to the data catalogue and data modelling space; currently, I don't see such functionality in data catalogue and data modelling software. However, I have recently explored a few tools in this space, and colour me optimistic & hopeful (hence this blog). I am curious to know what others think about this topic. To give a comparison, in the absence of such a tool, this work is either not happening or happening in a combination of Excel files, Google Sheets, and the Airtables, Miros and Murals of this world, and then there is a “coordinator” who chases all the teams to aggregate such information.
Curious to know your thoughts...
Disclaimers:
These are my personal observations and do not represent my current or previous employers. Any resemblance is purely coincidental.
I would like to hear other people's views on writing mega scripts in Python that perform extraction, transformation and loading. Enhancing these packages and debugging them for data issues is more expensive than writing a new one. Compounding that with serverless components further increases the complexity. Is that typical for data pipelines, or are people using other tools in the cloud?
The data space is back 20 years with current tooling and skills availability. The tooling that all 3 clouds offer is not up to the mark compared with some we had access to in data engineering, embedded reporting and data mining. There were a lot of visual tools that simplified training people, replacing resources and skills, as well as overall coding and debugging. Things like error handling and data lineage were built in to data pipelines back then. So overall, I think the data space has moved backwards, and nobody teaches or learns data modeling anymore anyway. Most of the expert abstract modelers who could navigate between geo-spatial, hierarchical, network or relational schemes are pretty much retired or too high up in the food chain to be hands-on. I see gaps all around in the data space that we need to quickly fill with either better training or, preferably, better tools.
Data Engineer and Architect | Best selling author and course creator | Recovering Data Scientist | Global Keynote Speaker | Professor | Podcaster & Writer | Advisor & Investor
1 year ago: This is such a great thread right now. Thanks Omar!