What Is AI-Ready Infrastructure?
Welcome to the latest edition of the Komprise Intelligent Data Management newsletter! We cover new ways for IT managers to be more productive in managing enterprise data and storage, from dealing with ever-changing compliance issues and working with departments on data strategies to understanding the new requirements for data management and AI. Learn more about Komprise, a SaaS solution for unstructured data management and mobility, and follow us on LinkedIn.
In this edition, we cover the building blocks of an AI-ready infrastructure. Many organizations are still in the planning phase of this endeavor, but regardless of where they stand, IT leaders will need to consider data management, cloud services, on-premises storage and compute, model selection and talent acquisition.
Many technical decisions go into supporting AI, from which LLMs to use and where to deploy AI to what infrastructure is required and how to train employees. Recently, Komprise surveyed enterprise IT decision-makers in the U.S., and nearly half (44%) said that creating AI-ready data infrastructure is today's top priority.
As always, budget factors heavily into AI technology decision-making, but so do security, governance and compliance, and the availability of IT staff with the right AI and ML skill sets.
Creating AI-Ready Data Infrastructure
Launching an AI initiative in your enterprise may require model development and training if you need to build your own generative AI model. This typically begins with acquiring adequate high-performing computational resources: the pricey CPUs, GPUs, and TPUs required to host machine learning models and process data at warp speed. While pre-baked infrastructure, public models, and cloud services offer cost and ease-of-use benefits, IT organizations must also weigh the benefits of keeping AI in-house for better control, or of establishing a hybrid model that provides the right levels of data governance, transparency, and security.
The average cost of an AI server is $32,000, and Gartner distinguished analyst John-David Lovelock points out that a rack of AI servers will cost over $1 million. Flash-based storage technologies designed for AI may also add to the costs. Then there's the support and maintenance of all this gear, requiring full-time IT staff and a state-of-the-art data center.
Using Corporate Data With AI
Regardless of whether you are building your own model from scratch or, more likely, fine-tuning and using pre-built models, you need data management to bring the right unstructured data to AI. Unstructured data management automates AI data workflows and manages corporate data governance, especially with sensitive data.
Unstructured data, which, according to IDC, accounts for 90% of all data, is typically scattered across many silos. That is part of the role of data management: to facilitate rapid search, tagging, and feeding of the right data to AI models.
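To make that concrete, here is a minimal Python sketch of the idea: walk a file share, capture basic metadata, and apply simple tags so that a subset of files can be searched and selected for an AI workflow. The directory path and tag rules are illustrative assumptions, not Komprise functionality.

```python
# Minimal sketch: build a searchable metadata index over unstructured files
# so the right subset can be tagged and fed to an AI workflow.
# The scan root and tag rules below are hypothetical examples.
import time
from pathlib import Path

TAG_RULES = {
    ".pdf": "document",
    ".docx": "document",
    ".csv": "tabular",
    ".log": "machine-generated",
}

def index_files(root: str) -> list[dict]:
    index = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        stat = path.stat()
        index.append({
            "path": str(path),
            "size_bytes": stat.st_size,
            "modified": time.ctime(stat.st_mtime),
            "tag": TAG_RULES.get(path.suffix.lower(), "other"),
        })
    return index

if __name__ == "__main__":
    records = index_files("/mnt/corp-share")  # hypothetical file share
    docs_for_ai = [r for r in records if r["tag"] == "document"]
    print(f"Selected {len(docs_for_ai)} documents out of {len(records)} files")
```

In practice, an index like this is what makes it possible to feed only the relevant slice of a silo to a model rather than everything in it.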
Cloud Services for AI
The major cloud providers have built soup-to-nuts services to support AI for organizations that can't or don't want to manage the technology in-house. The components range from fast storage and compute resources to machine learning, GenAI, and development tools. While cloud-based AI has distinct cost advantages – you don't need to buy servers or storage, nor pay for the added energy and data center footprint – you can easily overprovision and overspend in the cloud. There is also the issue of cloud skills gaps.
A cloud AI strategy can be both successful and cost-efficient if you manage data appropriately. For example, copying petabytes of unstructured data into the cloud and then trying to figure out which data is useful for AI would quickly run up a huge bill.
You’d also want to avoid feeding an AI application without cleaning up the data mess first: most organizations have large quantities of duplicate, obsolete, or zombie data that should be purged. Make sure your data is in good shape—classified and organized—before moving it, and only move the data you know fits the scope of your project.
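As a rough illustration of that cleanup step, the sketch below flags exact duplicates (by content hash) and files untouched for several years so they can be reviewed before any migration. The scan root and staleness threshold are assumptions made for the example, not recommended settings.

```python
# Minimal sketch: flag duplicate and stale files before a cloud migration.
# The threshold and scan root are illustrative assumptions.
import hashlib
import time
from collections import defaultdict
from pathlib import Path

STALE_AFTER_DAYS = 3 * 365  # treat data untouched for ~3 years as candidate "zombie" data

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates_and_stale(root: str):
    by_hash = defaultdict(list)
    stale = []
    cutoff = time.time() - STALE_AFTER_DAYS * 86400
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        by_hash[sha256_of(path)].append(path)
        if path.stat().st_mtime < cutoff:
            stale.append(path)
    duplicates = {h: paths for h, paths in by_hash.items() if len(paths) > 1}
    return duplicates, stale

if __name__ == "__main__":
    dupes, stale = find_duplicates_and_stale("/mnt/corp-share")  # hypothetical share
    print(f"{len(dupes)} duplicate groups and {len(stale)} stale files to review before migrating")
```

Even a simple pass like this can shrink the data set that needs to move, which is where the cloud cost savings come from.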
Pick use cases with a predictable ROI, and be sure you can measure the results later. Security and compliance requirements may preclude the option of hosting AI in the cloud. At a minimum, understanding the risks of your data in any AI service and knowing how to audit projects for data risk are critical steps before beginning any project.
This article in The New Stack covers the basics of managing data in a hybrid cloud environment, including achieving data visibility so you can apply rules and policies to keep IT and data assets under control.
Machine Learning Model Decisions
Popular AI models such as GPT, Claude, and Gemini are trained on massive public data sets, typically using machine learning frameworks such as TensorFlow and PyTorch. Yet, to make AI useful and credible for enterprise projects aimed at improving operations, R&D, or customer relationships, you'll want to train a model with your own proprietary data and keep it private.
Training or developing a model requires specialized data scientists with skills in programming languages such as Python and R, big data modeling and analysis, machine learning models, and security and cloud computing.
An ambitious, well-funded analytics and data science team may even choose to develop a model from scratch. The reasons for this include the desire for full control over architecture and security and/or to support a highly sensitive, competitive project.
While there are communities like Hugging Face and OpenAI that help choose the components and collaborate with others, this is a tremendous lift. It entails cleaning and preparing data, selecting and training algorithms, and fine-tuning the model for accuracy and reliability. You’ll need to procure not only the infrastructure but a team of engineers to do the work.
Given the resource constraints of most organizations, using pre-trained proprietary or open-source ML models with corporate data is likely the most common pathway to AI. Indeed, AI inferencing is a much larger, broader market than AI training: inferencing accounts for up to 90% of the machine learning costs of deployed AI systems, according to research published in ScienceDirect. Hence, IT organizations are increasingly investing in the data infrastructure needed to find, curate, audit, and feed corporate data to AI while maintaining data governance.
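For teams taking that inferencing path, here is a hedged sketch of what it can look like in practice: running a pre-trained, open-source model from the Hugging Face transformers library against a snippet of curated corporate text. The model name and sample text are illustrative assumptions; any model you adopt should first pass your security and governance review.

```python
# Minimal sketch: inference with a pre-trained open-source model rather than
# training from scratch. Requires the `transformers` package; the model and
# the "corporate" text below are illustrative stand-ins.
from transformers import pipeline

# A small, publicly available extractive question-answering model.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

corporate_context = (
    "The finance team archives invoices to the secondary NAS tier after 18 months. "
    "Archived invoices remain searchable through the data management index."
)

result = qa(question="When are invoices archived?", context=corporate_context)
print(result["answer"], f"(confidence {result['score']:.2f})")
```

The heavy lifting shifts from building a model to curating and governing the corporate data that is fed to it, which is exactly where data infrastructure investments pay off.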
The Rise of Off-the-Shelf AI Tools
The Komprise survey found that only 30% of organizations have designated a budget for AI, implying that 70% are still experimenting and researching the technology. And today, that probably means using low-cost applications such as OpenAI ChatGPT, Anthropic Claude, Microsoft Copilot, or Google Gemini. Employees across departments use these tools to answer questions, write text, create graphics and images, or write software code – with laser speed and good enough results.
What’s missing are standards and mainstream best practices:
Start by understanding your data estate in terms of data characteristics and the quantity of sensitive data such as PII and IP. That analysis will help guide the organization in developing policies for GenAI use that govern data and use cases. You'll also need a tool to monitor compliance and investigate issues from GenAI use, when and if they arise.
Can you track which data has been sent into an AI tool, and by which users or departments? Can you find and move sensitive data out of directories where it can be discovered and pulled into an AI tool? Some unstructured data management solutions provide this functionality; AI data governance is a growing area of demand to prevent blowback from AI that can damage customer trust, loyalty, and marketplace credibility.
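As a simplified illustration of that second question, the sketch below scans a share for common PII patterns and moves flagged files into a quarantine folder before they can be pulled into a GenAI tool. The regex patterns and paths are hypothetical examples and no substitute for a full AI data governance solution.

```python
# Minimal sketch: scan a directory for likely PII and quarantine flagged files
# before they are exposed to a GenAI tool. Patterns and paths are simplified
# illustrations only.
import re
import shutil
from pathlib import Path

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_and_quarantine(root: str, quarantine: str):
    qdir = Path(quarantine)
    qdir.mkdir(parents=True, exist_ok=True)
    flagged = []
    for path in Path(root).rglob("*.txt"):  # limit to text files for the example
        text = path.read_text(errors="ignore")
        hits = [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
        if hits:
            flagged.append((path.name, hits))
            shutil.move(str(path), str(qdir / path.name))  # move out of the AI-visible share
    return flagged

if __name__ == "__main__":
    for filename, hits in scan_and_quarantine("/mnt/genai-share", "/mnt/quarantine"):
        print(f"{filename}: possible {', '.join(hits)}")
```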
Read this blog to learn about five key areas of AI data governance to consider: security, privacy, lineage, ownership and governance of unstructured data for AI.
The Need for GenAI Governance
Given the general marketplace concerns with AI, its known ability to create false outcomes and damaging hallucinations, the risk for corporate data leakage into general-purpose LLMs, and the expense of developing and implementing AI technologies, IT leaders will want a watertight plan and process to evaluate and deploy the AI stack.
Learn more about how Komprise powers AI data workflows to help expedite search, tagging and monitoring of corporate data use for AI tools and services.
In addition, Komprise is helping customers like Duquesne University speed and improve the process of indexing data, running AI and tagging data.
Last Words…
Managing unstructured data becomes more complex all the time. IT leaders not only need to ensure fast, secure access to data for employees, but must also deliver better cost economics, avoid disruption when moving data to new storage, meet governance and compliance requirements and prepare this data for AI. We'll be covering the trends as they develop right here!
You can subscribe to the Komprise Blog to receive new posts in your inbox and check out what's new by visiting our Resource Center.
Comment on the post or send a note to: