Making AI Connections With Data Workflows

Welcome to the latest edition of the Komprise Intelligent Data Management newsletter! We cover new ways for IT managers to be more productive, from managing enterprise data and storage to dealing with ever-changing compliance issues, working with departments on data strategies and understanding the new requirements for data management and AI. Learn more about Komprise, a SaaS solution for unstructured data management and mobility, and follow us on LinkedIn.

This month’s newsletter covers the intersection of AI and unstructured data, with a focus on AI data workflows. There are two significant intersections between unstructured data (file and object data, or anything not in a database) and AI.

First, we need to feed the beast (AI) with a steady flow of unstructured data, even though it’s hard to corral this data across hybrid cloud infrastructure and add needed context. Preparing data for AI was identified as the top priority for IT in the Komprise 2023 State of Unstructured Data Management report.

Second, unstructured data needs organization and classification so that sensitive data is handled correctly and users can find what they need faster. AI can help do this. This latter point is a significant pain point, according to a survey of 334 CDOs and data leaders conducted in the second half of 2023.

The research, sponsored by AWS and the MIT Chief Data Officer/Information Quality Symposium, found that 46% of these data leaders identified “data quality” as the greatest challenge to realizing GenAI’s potential in their organizations.

Enter… AI Data Workflows!

AI data workflows are a systematic way to find, enrich and move data to the right places. Technology that enables these workflows automates two essential processes:

  • Discover, segment, classify and automate the movement of data to AI tools.
  • Enrich the metadata of unstructured data, which makes it easier to find and use in a variety of analytics and AI projects.
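
To make the two processes concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `FileRecord` type, the path-based rules in `classify`, and the tag names are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class FileRecord:
    """One file or object found during discovery (fields are illustrative)."""
    path: str
    size_bytes: int
    tags: Set[str] = field(default_factory=set)


def classify(record: FileRecord) -> Set[str]:
    """Derive tags from simple path rules; a real classifier would go deeper."""
    path = record.path.lower()
    tags: Set[str] = set()
    if path.endswith((".jpg", ".png")):
        tags.add("image")
    if path.endswith((".csv", ".txt")):
        tags.add("text")
    if "/hr/" in path:
        tags.add("sensitive")
    return tags


def enrich(records: Iterable[FileRecord]) -> List[FileRecord]:
    """Enrichment: add the derived tags to each record's metadata."""
    enriched = []
    for record in records:
        record.tags |= classify(record)
        enriched.append(record)
    return enriched


def select_for_ai(records: Iterable[FileRecord], tag: str) -> List[FileRecord]:
    """Segmentation: pick the records to move, skipping anything sensitive."""
    return [r for r in records if tag in r.tags and "sensitive" not in r.tags]


records = enrich([
    FileRecord("/marketing/launch.png", 2_000_000),
    FileRecord("/hr/badge-photo.jpg", 500_000),
    FileRecord("/finance/q3.csv", 40_000),
])
to_move = select_for_ai(records, "image")
print([r.path for r in to_move])
```

Once the tags are stored with the data, later projects can select by tag instead of re-scanning every file.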

Why is this important now?

Enterprises and society face more than a few urgent needs today that could be addressed with better orchestration of data for use in AI. Consider:

1. Compliance. Consider the impact an AI tool could have by quickly identifying sensitive data and making sure that it is managed in compliance with data regulations. Over half (51%) of IT managers surveyed by IDC reported non-compliance with data regulations in the past 12 months, with an average total cost of $1.03 million.

2. Patient care. Healthcare is an industry with vast potential for AI to improve collaboration, accuracy and, importantly, patient outcomes. The Journal of the American Medical Association (JAMA) recently reported on an AI-based model in use at Stanford Hospital that predicts when a patient is declining and alerts the patient’s care team. AI has reached diagnostic performance comparable with that of medical experts, especially in image recognition-related fields, according to the National Library of Medicine. Another study found that physicians are most excited about the application of AI to help alleviate the burden of administrative work.

3. Law enforcement and criminal investigations. Police departments worldwide are struggling to thwart escalating crime and violence despite persistently tight resources. AI will be a critical component in public safety moving forward to fill the resources gap and deliver faster, more accurate methods to investigate incidents and prevent crime. Belgian police have developed a platform that allows investigators to cross-reference more than 50 separate internal databases and yield results in seconds, while New Jersey has used a similar approach to dramatically curtail gun crime, as documented on Prepared911.

4. Logistics. AI can optimize logistics, resulting in more efficient use of warehouse space and lower overhead. AB InBev, the worldwide distributor for beverages like Budweiser and Corona, used AI to determine the optimal supply and demand for its products. This allowed the company to cut warehousing expenses and overhead costs significantly. Applying these tools to critical products such as medical supplies or baby formula in underserved communities could save lives, beyond helping companies operate more profitably.

Storage IT professionals today have a prominent role to play in facilitating AI and big data analytics initiatives across all sectors.

They must deliver fast, resilient and secure storage infrastructure to support everyday critical business transactions—and soon, AI data workloads. Equally, they need to classify and deliver the right data to storage platforms to support the work of data scientists and other data stakeholders across the enterprise. Let’s consider the emerging concept of automated data workflows for AI.

Automating the connections between AI and unstructured data

The two use cases of feeding the right data to AI and enriching metadata classification using AI both involve data workflows. Yet these AI data workflows are difficult to execute manually, in no small part due to the size and complexity of unstructured data in large organizations, and can benefit from systematic automation.

To automate AI workflows, you need to:

  • Find the right data across massive, distributed data estates: To create an AI data workflow, you first need a way to search across all data stores, which can hold terabytes to petabytes of data, to find the data relevant to your project.
  • Manage data governance: When executing AI data workflows, IT must audit what corporate data was fed to which AI process. This can protect the business in the event of a lawsuit or other issues that arise from the outcome of the AI process. Similarly, it is important to enforce guardrails such as not sharing sensitive data with external processes. Develop clear, enforceable corporate policies for AI data governance. Automation solutions designed for these workflows are now becoming available.
  • Cut AI costs by persisting results: Many AI solutions have a pay-per-use billing model, which can become prohibitively expensive if the same data is processed repeatedly. If you can create a global index that keeps track of the labels and tags from AI, then users can search for data without having to run the AI process again.
  • Leverage automation: The ability to automatically run the AI workflow on new data ensures that the AI is trained on the latest data without requiring cumbersome manual effort.
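
The governance and cost points above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not a description of any product: the `TagIndex` class, its method names, the audit record fields and the `fake_tagger` stand-in for a paid AI service are all hypothetical. The index keys results by content hash so identical data is only sent to the AI service once, records each send in an audit log, and refuses to send anything marked sensitive.

```python
import hashlib
from datetime import datetime, timezone
from typing import Callable, Dict, List


class TagIndex:
    """Hypothetical global index of AI-generated tags with an audit trail."""

    def __init__(self, ai_service: Callable[[bytes], List[str]], service_name: str):
        self._ai_service = ai_service
        self._service_name = service_name
        self._tags_by_hash: Dict[str, List[str]] = {}
        self._hash_by_path: Dict[str, str] = {}
        self.audit_log: List[dict] = []
        self.ai_calls = 0

    def tags_for(self, path: str, content: bytes, sensitive: bool = False) -> List[str]:
        # Guardrail: sensitive data never leaves for the external process.
        if sensitive:
            raise PermissionError(f"{path} is sensitive; not sent to {self._service_name}")
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._tags_by_hash:
            # Only unseen content incurs a (billable) AI call.
            self._tags_by_hash[digest] = list(self._ai_service(content))
            self.ai_calls += 1
            # Audit which data was fed to which AI process, and when.
            self.audit_log.append({
                "path": path,
                "sha256": digest,
                "service": self._service_name,
                "sent_at": datetime.now(timezone.utc).isoformat(),
            })
        self._hash_by_path[path] = digest
        return self._tags_by_hash[digest]

    def search(self, tag: str) -> List[str]:
        """Find paths by tag without running the AI process again."""
        return sorted(p for p, h in self._hash_by_path.items()
                      if tag in self._tags_by_hash[h])


def fake_tagger(content: bytes) -> List[str]:
    """Stand-in for a pay-per-use AI service (assumption for this sketch)."""
    return ["invoice"] if b"invoice" in content.lower() else ["other"]


index = TagIndex(fake_tagger, "fake-tagger")
index.tags_for("/finance/a.txt", b"Invoice 42")
index.tags_for("/finance/copy-of-a.txt", b"Invoice 42")  # same bytes, served from the index
print(index.ai_calls, index.search("invoice"))
```

In this sketch the audit log records only the sends that actually reached the AI service, so repeat lookups served from the index add no cost and no new audit entries.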

Sample use cases for AI data workflows

Healthcare and Life Sciences

  • Create a custom query across data silos to find all data for Project X, using a data management solution.
  • Next, the process could execute an external function on Project X data to look for a specific DNA sequence for a mutation. The data management software is configured to tag such data as “Mutation XYZ” and then move only that new data set to a cloud AI service for analysis.
  • Once the mutation data is no longer needed, the workflow completes the cycle by moving it to a low-cost archival storage tier.
  • The workflow could repeat with new data sets as often as needed.
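
The four steps above could be sketched as follows. This is a simplified, in-memory illustration: the `DataSet` type, the location names (`primary`, `cloud-ai`, `archive`), the placeholder sequence `GATTACA` and the `analyze` callback are all assumptions made for the example, and a real workflow would move actual files between storage systems.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

MUTATION_SEQUENCE = "GATTACA"  # placeholder pattern, not a real mutation
MUTATION_TAG = "Mutation XYZ"


@dataclass
class DataSet:
    name: str
    project: str
    sequence: str
    location: str = "primary"
    tags: Set[str] = field(default_factory=set)


def run_workflow(datasets: List[DataSet], project: str,
                 analyze: Callable[[DataSet], None]) -> List[DataSet]:
    # Step 1: query across all data for the project of interest.
    selected = [d for d in datasets if d.project == project]

    # Step 2: external function finds the sequence; tag only new matches.
    new_matches = []
    for d in selected:
        if MUTATION_SEQUENCE in d.sequence and MUTATION_TAG not in d.tags:
            d.tags.add(MUTATION_TAG)
            new_matches.append(d)

    # Step 3: move only the newly tagged data to the cloud AI service.
    for d in new_matches:
        d.location = "cloud-ai"
        analyze(d)

    # Step 4: once analysis is done, move the data to low-cost archive.
    for d in new_matches:
        d.location = "archive"
    return new_matches


datasets = [
    DataSet("sample-1", "Project X", "AAGATTACATT"),
    DataSet("sample-2", "Project X", "CCCCCCCC"),
    DataSet("sample-3", "Project Y", "GATTACA"),
]
analyzed = []
first_run = run_workflow(datasets, "Project X",
                         lambda d: analyzed.append((d.name, d.location)))
second_run = run_workflow(datasets, "Project X",
                          lambda d: analyzed.append((d.name, d.location)))
print([d.name for d in first_run], [d.name for d in second_run])
```

Because already-tagged data is skipped, the workflow can be rerun as often as needed and will only pick up data sets it has not processed before.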

Taking this one step further, what if you could apply an AI tool to your data to rapidly segment and enrich the metadata with new tags? A data scientist may not know where all the data from a certain project resides and therefore cannot automate the process to tag it. Searching manually through files is usually not viable and, with AI, it’s no longer necessary either.

Marketing and Customer Service Workflows

A marketing director at a consumer goods brand wants to create a campaign about sustainability. She needs to search across millions of images to find those containing visuals of the company’s and its partners’ sustainability efforts based on a set of keywords. She asks the IT department to help. They create a workflow in the data management system by connecting to an AI tool like Amazon Rekognition that filters the large data set for the requested images. The resulting data set is automatically tagged for future use, saving hundreds of hours of manual effort.
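
As a rough sketch of what the IT team's workflow step might look like, the Python below filters images by label and records the matching tags. It assumes the images sit in an S3 bucket and that `client` behaves like a boto3 Rekognition client, whose `detect_labels` call returns a `Labels` list of entries with `Name` and `Confidence`. The function name, bucket name, keywords and the `FakeRekognition` stand-in (used so the example runs without AWS credentials) are hypothetical.

```python
from typing import Dict, Iterable, List


def find_matching_images(client, bucket: str, keys: Iterable[str],
                         keywords: Iterable[str],
                         min_confidence: float = 80.0) -> Dict[str, List[str]]:
    """Return {object key: matched keywords} for images whose labels match."""
    wanted = {k.lower() for k in keywords}
    tagged: Dict[str, List[str]] = {}
    for key in keys:
        response = client.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MinConfidence=min_confidence,
        )
        labels = {label["Name"].lower() for label in response["Labels"]}
        hits = sorted(labels & wanted)
        if hits:
            tagged[key] = hits  # persist these tags for future searches
    return tagged


class FakeRekognition:
    """Local stand-in that returns canned labels per object key."""

    def __init__(self, canned):
        self._canned = canned

    def detect_labels(self, Image, MinConfidence):
        name = Image["S3Object"]["Name"]
        return {"Labels": [{"Name": n, "Confidence": c}
                           for n, c in self._canned.get(name, [])
                           if c >= MinConfidence]}


fake = FakeRekognition({
    "campaign/a.jpg": [("Solar Panel", 95.0), ("Person", 60.0)],
    "campaign/b.jpg": [("Car", 99.0)],
})
result = find_matching_images(fake, "example-bucket",
                              ["campaign/a.jpg", "campaign/b.jpg"],
                              ["solar panel", "wind turbine"])
print(result)
```

Swapping the fake for a real client would be a one-line change, though per-image pricing and request limits would then apply.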

Or, consider the application of Azure Bot Service, which allows developers to build and deploy intelligent chatbots and virtual assistants for customer service. An AI data workflow could analyze data from customer responses and then tag that data based on sentiment or customer issue and move it to a cloud data lake for future analysis.

Today, though, these use cases are not easy to implement because there is still a great deal of complexity in prepping data and understanding how to use the AI tools. A 2024 study by IBM revealed that nearly half (45%) of companies report that advances making AI tools more accessible are driving AI adoption. The research also found that only 34% are currently training or reskilling employees to work together with new automation and AI tools.

With many organizations lacking specialized skills in coding and AI tools, there will be ample opportunity for developers and software companies to create streamlined, point-and-click solutions for these workflows.

We’ll also see the development of open ecosystems of complementary technologies from which non-IT users can select and build AI projects end to end: collating and classifying the right data sets, applying security and governance, feeding data to AI and monitoring the outcomes, and then moving the data sets to an archiving location upon completion of the project.

As the AI industry evolves and matures, we’re seeing a potential complexity barrier that could slow down the positive developments AI can bring to people, business and governments. Rising above these challenges requires extreme coordination between individuals across the organization – think CXOs, data scientists, security professionals, storage and data management experts and IT infrastructure people along with HR and legal. These collaborative programs will be essential to avoid harmful or false outcomes from AI and ensure that goals are aligned.

Data storage and data management leaders can contribute in this new age by connecting the dots between the unstructured data gold they manage and the best AI tools for the business. Developing and nurturing secure, intelligent AI data workflows is a sensible first step.


Komprise Smart Data Workflow Manager

In related news, Komprise announced in May a significant new product update for data workflow automation: Smart Data Workflow Manager. This functionality offers enterprises:

  • An intuitive point-and-click user interface wizard that helps users easily set up an AI data workflow without the need for specialized coding or AI skills.
  • The ability to quickly search for the right unstructured data set, configure and tune the AI service that is part of the workflow, and monitor and audit workflows.
  • Pre-built integrations to popular AI services, such as Amazon Rekognition, which we will expand over time.


Komprise CEO Kumar Goswami gives his take here:


Subscribe to the Komprise Blog for regular updates on the latest trends in unstructured data management. Comment on the post or send a note to:

[email protected]

Parker Happ

Friendly Neighborhood Unstructured Data Hero

8 months ago

Interesting additional take - a client is scoping the use of Komprise for AI data lake cleansing, removing data that should NOT be in a lake to reduce the continued op-ex cost of said data lake.
