Operationalizing AI Requires a Storage Architecture Strategy
Sam Werner
Vice President, Chief Product Officer. Product and Business Strategist, Inventor. AI, Big Data, Hybrid Multicloud, Storage, Containers
As a storage person, my favorite thing about AI is that it requires lots and lots of data. For someone in the storage business, that is obviously good news. But in all seriousness, it actually creates quite a few challenging technical problems: Data Scientists need to get access to and organize the data; storage administrators need to make the data available while still ensuring privacy, security, and governance of the data; and engineers building storage software and systems need to continue increasing throughput and decreasing total response time to ensure GPUs are always fed all of the data they need.
Many enterprises are starting small on their journey to AI. They purchase a couple of GPU-enabled servers and let their data scientists loose on them. Data is copied to these servers, where the data scientists run frameworks like TensorFlow and PyTorch to train their neural networks (a minimal sketch of this ad hoc setup follows the list below). This is a simple way to get started and build something like a chatbot. However, teams quickly learn that they are not in a position to scale. Some of the challenges they will encounter:
- How will they ensure data governance when data is being copied off to shared servers being used by multiple data scientists?
- How will they ensure the data is always secure and that personal information about their customers is always protected?
- How will they ensure copies are destroyed when they are no longer being used and how will they ensure there are no issues with data consistency?
- How can they build machine learning and deep learning models that have access to real-time / near real-time data to provide the highest value and most timely insights?
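To make the pattern concrete, here is a minimal, hypothetical sketch of that ad hoc setup: a dataset copied onto a server's local disk and trained with PyTorch. The paths, model, and hyperparameters are illustrative assumptions, not a recommended design.

```python
# Hypothetical sketch of the "start small" approach: data copied to a local
# GPU server and trained with PyTorch. Paths and model are illustrative only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The data has been copied onto the server's local disk; this copy step is
# exactly what becomes a governance and consistency problem at scale.
features = torch.load("/local/scratch/copied_features.pt")   # hypothetical path
labels = torch.load("/local/scratch/copied_labels.pt")       # hypothetical path
loader = DataLoader(TensorDataset(features, labels), batch_size=64, shuffle=True)

model = nn.Sequential(
    nn.Linear(features.shape[1], 128),
    nn.ReLU(),
    nn.Linear(128, 2),
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

This works fine for one data scientist and one server, but every new project repeats the copy step, which is where the questions above start to bite.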
The reality is that you need to start with an information architecture (IA): one that scales across multiple projects, provides data consistency and governance, and serves as a single source of truth.
To build an IA, you first need to understand the ML/DL data workflow and its challenges. The goal of the workflow is really quite simple: getting from ingest to inference as quickly and accurately as possible.
[Figure: Machine / Deep Learning Data Workflow, showing the daily tasks of the data scientist]
Let’s start by taking a look at each of the above steps in the workflow.
Ingest: This is where all of the data comes together. Sources include IoT, mobile, transactional, supply chain, customer service/support, CRM, etc. This can include years and years of intelligence an enterprise has developed about its industry and customers. Storage in this phase needs to be cost-effective (leveraging multiple tiers of media), multi-protocol, geographically dispersed, and multi-cloud enabled.
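As a hypothetical illustration of the ingest phase, the sketch below lands raw files from several source systems into a single S3-compatible object store, which can sit on tiered, multi-cloud storage. The bucket name, endpoint, and paths are assumptions for illustration only.

```python
# Hypothetical ingest sketch: landing raw data from multiple sources into one
# S3-compatible object store instead of scattering copies across servers.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.com",  # hypothetical endpoint
)

sources = {
    "iot/sensor_readings.csv": "/staging/iot/sensor_readings.csv",
    "crm/customer_export.csv": "/staging/crm/customer_export.csv",
    "transactions/orders.parquet": "/staging/tx/orders.parquet",
}

# Each raw file lands under a prefix that identifies its source system, so the
# enterprise keeps a single, governed copy of the data.
for key, local_path in sources.items():
    s3.upload_file(local_path, "ai-data-lake", f"raw/{key}")
```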
Classify / Transform: This is where a data scientist spends about 80% of their time classifying, tagging, and cleaning data in order to build training datasets for their neural networks. Tools that can help with policy-based tagging and classification of data can greatly accelerate this phase and boost a data scientist's productivity.
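A minimal, hypothetical sketch of this phase might look like the following: cleaning raw records and applying a simple policy-based tag before they become part of a training dataset. The column names, tagging rule, and paths are assumptions for illustration.

```python
# Hypothetical classify/transform sketch: clean raw records and apply a
# simple policy-based tag before writing a curated training dataset.
import pandas as pd

raw = pd.read_csv("/staging/crm/customer_export.csv")  # hypothetical path

# Basic cleaning: drop duplicates and rows missing required fields.
clean = raw.drop_duplicates().dropna(subset=["customer_id", "country"])

# Policy-based tagging: flag records containing personal information so the
# governance layer can restrict who sees them downstream.
pii_columns = [c for c in clean.columns if c in {"email", "phone", "ssn"}]
clean["contains_pii"] = clean[pii_columns].notna().any(axis=1) if pii_columns else False

# Write the curated, tagged dataset back to the shared data lake.
clean.to_parquet("/datalake/curated/customers.parquet", index=False)
```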
Training: This is the most compute-intensive part of the workflow and is where the GPUs come into play. Training of most models can be distributed across multiple GPUs and systems to shorten training time. In this phase, distributed storage with high throughput and low latency that can be shared across systems is critical to ensure that expensive GPUs do not sit idle.
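The sketch below shows, hypothetically, what that looks like in PyTorch: each process reads the same training data from a shared, high-throughput filesystem and drives one GPU with DistributedDataParallel. The shared paths, model, and hyperparameters are assumptions for illustration.

```python
# Hypothetical distributed-training sketch: every rank reads from shared
# storage (no per-server copies) and drives one GPU via DistributedDataParallel.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# All ranks read the same dataset from a shared filesystem.
features = torch.load("/sharedfs/train/features.pt")  # hypothetical shared path
labels = torch.load("/sharedfs/train/labels.pt")      # hypothetical shared path
dataset = TensorDataset(features, labels)
sampler = DistributedSampler(dataset)
# Multiple loader workers keep the GPU fed; throughput here depends directly
# on the storage system's ability to sustain parallel reads.
loader = DataLoader(dataset, batch_size=256, sampler=sampler,
                    num_workers=8, pin_memory=True)

model = nn.Sequential(
    nn.Linear(features.shape[1], 512),
    nn.ReLU(),
    nn.Linear(512, 10),
).cuda()
model = DDP(model, device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    sampler.set_epoch(epoch)
    for x, y in loader:
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

dist.destroy_process_group()
```

Launched with something like `torchrun --nproc_per_node=4 train.py`, every GPU pulls batches from the same shared namespace, which is why storage throughput directly determines how busy the GPUs stay.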
Inference: This is the stage where you actually use the model to generate an insight or infer something. Storage latency is critical in this phase if the model is going to be deployed where many applications or users need access to the output. NVMe-enabled storage is a great fit for the inference stage.
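As a final hypothetical sketch, inference loads a trained model once from fast (for example, NVMe-backed) storage and then serves many low-latency requests. The model path, format, and input shape are assumptions for illustration.

```python
# Hypothetical inference sketch: load the model once from low-latency storage,
# then score many requests without touching disk again.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Loading from low-latency storage keeps startup and model-refresh times short.
model = torch.jit.load("/nvme/models/churn_model.pt", map_location=device)  # hypothetical path
model.eval()

@torch.no_grad()
def predict(batch: torch.Tensor) -> torch.Tensor:
    """Score a batch of feature vectors and return class probabilities."""
    return torch.softmax(model(batch.to(device)), dim=1).cpu()

# Example call with a dummy batch of 32 feature vectors of width 16.
scores = predict(torch.randn(32, 16))
```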
It is easy to dive into AI by buying a few servers, hiring a data scientist, and deploying some open-source frameworks. However, to actually operationalize AI in your organization, you need a scalable infrastructure strategy that considers the entire data workflow. This will ensure data remains secure and safe while data scientists and GPUs remain productive. Most importantly, it will ensure you can put your AI models into production to drive real value. After all, there is no AI without IA.
You can see a video of me talking about this topic at the IBM Think conference below.