Generative AI impact on data platform solutions
Introduction
Cognitive enterprises exist. Over the last decade, many organisations have reshaped themselves into data-driven companies, building the technological landscape that enables data-driven reasoning.
In recent years there has been a rush to build lakehouse architectures that integrate data from many sources and are ready to generate powerful insights through an AI sandbox, where data scientists can implement complex (and, let us say, traditional) AI algorithms: for instance, they can forecast energy consumption for an energy and utilities company, predict the next fault for a manufacturing plant, identify an accident for an insurance company, segment customers for retail companies, and so on. A UI layer, usually developed by frontend developers, shows the extracted insights to business users, who can then act with awareness of both the as-is scenario and the predicted information. In addition, the more recent data fabric and data mesh approaches introduced new architectural principles on top of the traditional data platform. For data mesh in particular, these are: treat data as a product (the data product concept), assign it to a data owner, and publish it to a data marketplace where each data product must be discoverable. Each data product in the marketplace must have documentation, and its data owner must be specified.
This is a brief and simplified picture of the landscape of a data-driven company.
Then generative AI happened, and its impact on different business domains has been explored and confirmed.
Is there anything in the data platform adoption, management, and evolution processes summarised above where generative AI can help?
Here I will focus only on the impact of LLMs (Large Language Models) on a classic data platform ecosystem. Large Language Models are a subset of generative AI models, dedicated to text generation. Below I describe some data platform areas that can benefit from LLMs.
Examples of the added value of generative AI for data platform solutions
Data governance
I mentioned above the more recent data mesh architectural principles, and in particular the need to create and maintain a data marketplace with a description for each data product. With generative AI models, it is possible to generate (and keep up to date) a description of a data product, using the metadata available out of the box and a data sample. The image below shows a high-level workflow for generating a data product description for a JSON object stored on Amazon S3.
As you can see from the picture, we can build a process that works as follows:
aws s3api head-object --bucket {BUCKET_NAME} --key {OBJECT_NAME}
The picture below shows the workflow described above, using the IBM watsonx product as the generative AI layer.
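The data governance workflow in this section can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation shown in the pictures: the `metadata` dictionary stands in for the output of the `head-object` call, the helpers `build_description_prompt` and `describe_data_product` are hypothetical, and `generate` is a placeholder for whatever LLM client is used (for example, a call to a watsonx model), passed in as a plain callable.

```python
import json
from typing import Callable


def build_description_prompt(object_key: str, metadata: dict, sample: list,
                             max_records: int = 3) -> str:
    """Assemble a prompt asking an LLM to describe a data product.

    Only the first `max_records` sample records are included, to keep
    the prompt short and to limit how much raw data is exposed.
    """
    metadata_text = json.dumps(metadata, indent=2, default=str)
    sample_text = json.dumps(sample[:max_records], indent=2, default=str)
    return (
        "Write a short description of the data product below "
        "for a data marketplace.\n\n"
        f"Object key: {object_key}\n"
        f"Object metadata:\n{metadata_text}\n\n"
        f"Sample records:\n{sample_text}\n\n"
        "Description:"
    )


def describe_data_product(object_key: str, metadata: dict, sample: list,
                          generate: Callable[[str], str]) -> str:
    """Build the prompt and pass it to the injected LLM callable."""
    prompt = build_description_prompt(object_key, metadata, sample)
    return generate(prompt).strip()


if __name__ == "__main__":
    # Illustrative values only: in practice the metadata would come from
    # the head-object call and the sample from the JSON object itself.
    metadata = {"ContentType": "application/json", "ContentLength": 2048}
    sample = [{"device_id": "d-1", "value": 21.5}]

    def fake_generate(prompt: str) -> str:
        # Stand-in for a real model call (assumption).
        return "Sensor readings collected from IoT devices."

    print(describe_data_product("readings.json", metadata, sample, fake_generate))
```

Injecting the model as a callable keeps the sketch independent of any specific SDK, and re-running it whenever the object changes is one way to keep the description up to date.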
Data querying
By leveraging generative AI models, it is possible to generate programming code from natural language input.
With this approach, you can imagine interacting with a data management system (for example, a SQL database) by formulating questions in natural language and getting results directly, without writing SQL yourself. The image below shows an example.
As you can see from the picture, the generative AI model produces, from a natural language prompt, a SQL query that retrieves the maximum value from a table named "iot_reading".
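The natural-language-to-SQL interaction can be sketched as follows. This is a hedged illustration, not the setup used in the picture: the table columns (`device_id`, `value`) are assumptions, since only the table name "iot_reading" appears in the article; `build_sql_prompt`, `is_read_only`, and `ask` are hypothetical helpers; and `generate` is a placeholder for the real model call. An in-memory SQLite database stands in for the data management system.

```python
import sqlite3
from typing import Callable

# Assumed schema: only the table name comes from the article.
SCHEMA = "CREATE TABLE iot_reading (device_id TEXT, value REAL)"


def build_sql_prompt(schema: str, question: str) -> str:
    """Give the model the schema and the question, and ask for SQL only."""
    return (
        "Given the following SQL schema, write a single SQL query that "
        "answers the question. Return only the SQL.\n\n"
        f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
    )


def is_read_only(sql: str) -> bool:
    """Very rough guard: accept a single SELECT statement only."""
    statement = sql.strip().rstrip(";")
    return statement.lower().startswith("select") and ";" not in statement


def ask(conn: sqlite3.Connection, schema: str, question: str,
        generate: Callable[[str], str]):
    """Turn a question into SQL via the model, check it, then run it."""
    sql = generate(build_sql_prompt(schema, question)).strip()
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return sql, conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO iot_reading VALUES (?, ?)",
                     [("d-1", 21.5), ("d-2", 30.2)])

    def fake_generate(prompt: str) -> str:
        # Stand-in for a real model call (assumption).
        return "SELECT MAX(value) FROM iot_reading;"

    sql, rows = ask(conn, SCHEMA, "What is the maximum reading value?", fake_generate)
    print(sql, rows)
```

Because the generated SQL is executed directly, even a sketch benefits from a guard such as `is_read_only`. A production system would more likely rely on a read-only database role than on string checks.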
Data visualisation
By leveraging the code generation capability of foundation models, it is possible to help business users navigate data by rapidly creating complex charts. For instance, it is possible to generate Python code that produces a plot from specific requirements expressed in natural language.
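A possible shape for this flow is sketched below, under stated assumptions: `build_plot_prompt` is a hypothetical helper that turns column names and a requirement into a prompt, and `passes_import_check` is an illustrative pre-execution check on the code the model returns. The allow-list of modules is an assumption, and the check is not a sandbox: it only rejects code that fails to parse or imports unexpected modules.

```python
import ast

# Modules the generated plotting code may import (assumed allow-list).
ALLOWED_MODULES = {"matplotlib", "pandas", "numpy"}


def build_plot_prompt(columns: dict, requirement: str) -> str:
    """Describe the available columns and the chart the user wants."""
    column_lines = "\n".join(f"- {name}: {dtype}" for name, dtype in columns.items())
    return (
        "Write Python code using matplotlib that plots data from a pandas "
        "DataFrame named df.\n"
        f"Columns:\n{column_lines}\n"
        f"Requirement: {requirement}\n"
        "Return only the code."
    )


def imported_modules(code: str) -> set:
    """Return the top-level modules imported by the code.

    Raises SyntaxError if the code does not parse.
    """
    modules = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def passes_import_check(code: str) -> bool:
    """True if the code parses and imports only allow-listed modules."""
    try:
        return imported_modules(code) <= ALLOWED_MODULES
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(build_plot_prompt({"timestamp": "datetime", "value": "float"},
                            "Line chart of value over time"))
    # Example of what a model might return (illustrative only).
    generated = (
        "import matplotlib.pyplot as plt\n"
        "plt.plot(df['timestamp'], df['value'])\n"
        "plt.show()\n"
    )
    print(passes_import_check(generated))
```

Checking the generated code before running it matters here because the intended users are business users, who are unlikely to review the code themselves.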
Log analysis
Generative AI can be used to analyse complex logs, with the objective of extracting useful information, including:
Conclusions
In this article, I outlined the state of the art in data platform solutions and, against that backdrop, introduced the impact of generative AI on building and evolving data platforms, with some specific examples. In particular, I showed how the IBM watsonx product can support data governance and data querying use cases by leveraging generative AI models.