Is Your 'Data Backyard' Ready for AI?
In technology we regularly gather round the water cooler (or coffee machine) and hypothesize about the next best thing. The danger is that we have spoken about advanced topics so often that we sometimes forget that we simply "have not yet done it", or we fail to acknowledge the significant dependencies required to achieve these shiny new outcomes.
Data science and AI can fall into this situation. In a complex data environment, all we need to have done first is:
- Access large diverse data sets across our organisation(s) - we onboard data
- Understand what each column and row of data means so we use the correct sources and we don't accidentally consume sensitive data (e.g. private information) - we catalogue data sources
- Filter, reduce, map and orchestrate the data, so that very large datasets (think terabytes or petabytes) are transformed into usable results that can be applied to data science - data is extracted, transformed and loaded
All of the above can be labelled as "data engineering" and is a major portion of data operations (DataOps). It can comprise up to 80% of the AI effort.
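The three steps above can be sketched in miniature as follows. This is only an illustration under stated assumptions, not a description of any particular product: the sample rows, the `ColumnInfo` catalogue entries and the `extract_transform_load` helper are all hypothetical names invented for this example, and a real environment would work against far larger sources with dedicated tooling.

```python
from dataclasses import dataclass


@dataclass
class ColumnInfo:
    """Hypothetical catalogue entry: what a column means and whether it is sensitive."""
    description: str
    sensitive: bool


# Step 1 (onboard): made-up rows standing in for data pulled from across the organisation.
raw_rows = [
    {"customer_id": 1, "email": "a@example.com", "region": "APAC", "spend": 120.0},
    {"customer_id": 2, "email": "b@example.com", "region": "EMEA", "spend": 80.0},
    {"customer_id": 3, "email": "c@example.com", "region": "APAC", "spend": 45.5},
]

# Step 2 (catalogue): record the meaning of each column and flag private information.
catalogue = {
    "customer_id": ColumnInfo("Internal customer key", sensitive=False),
    "email": ColumnInfo("Customer contact email", sensitive=True),
    "region": ColumnInfo("Sales region code", sensitive=False),
    "spend": ColumnInfo("Total spend", sensitive=False),
}


def safe_columns(catalogue):
    """Return the names of columns the catalogue marks as non-sensitive."""
    return [name for name, info in catalogue.items() if not info.sensitive]


def extract_transform_load(rows, catalogue, region):
    """Step 3 (ETL): filter rows by region and keep only non-sensitive columns."""
    keep = safe_columns(catalogue)
    filtered = [row for row in rows if row["region"] == region]
    return [{column: row[column] for column in keep} for row in filtered]


result = extract_transform_load(raw_rows, catalogue, "APAC")
print(result)
```

The point of the sketch is the ordering: the catalogue has to exist before the transformation step can know which columns are safe to pass downstream.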
At this point a data scientist can consume this data and start weaving their magic. AI development can be as little as 20% of the data science project.
Does this give you a sense of why many AI projects fail?
In the Harvard Business Review article "Is Your Data Infrastructure Ready for AI?", the need to construct an 'ontology' (somewhat summarised in the three steps above) is highlighted. A significant project to "clean up your data backyard" is probably required in most large organisations.
The art of onboarding and cataloguing data is a significant portion of the agenda at Hitachi Vantara's inaugural DATAOPS.Next APAC on the 4th of June.
- Please register here for this Virtual Conference. It's complimentary!
- Reach out to reserve 'one-on-one' time with experts who can articulate these outcomes in the context of your business
- Challenge us as to how we can help you achieve these outcomes in your organisation