AI-Ready Data Infrastructure: The Answer to Large Models

There is no AI without sufficient data. Data infrastructure fuels large AI models and powers end-to-end AI services, but only when those models are iterated on large volumes of high-quality data.

AI is disrupting conventional data infrastructure, demanding better performance, reliability, and scalability in four major areas: data asset management, cluster utilization, data consistency, and data resilience. AI transformation strategies draw on broad networks of unstructured data, so enterprises whose data is already AI-ready are best positioned to grow with AI.

So the question is, how can we ensure AI-ready data infrastructure?


Designing data infrastructure for large models

AI is pushing the boundaries of data storage software and hardware systems. Designed for AI applications and services, AI-ready infrastructure should be equipped with the following features:

1. Large-scale ingestion and preprocessing

In many enterprises, data is scattered across multiple data centers, or across storage devices within the same facility, creating management challenges. The lack of a unified view can be a significant bottleneck in the AI training process. A unified namespace, by contrast, can efficiently manage, schedule, and share data across regions, making data visible, manageable, and available. This leads to the AI data lake solution, which streamlines data preprocessing and supports multiple protocols to provide high-quality datasets for AI training.
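To illustrate what a unified namespace means at the application level, here is a minimal sketch (all names are hypothetical, not Huawei's implementation) that maps logical dataset paths onto heterogeneous storage backends via longest-prefix matching, so a training job sees one tree regardless of where data physically lives:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str      # e.g. a data center or storage cluster (illustrative)
    protocol: str  # e.g. "nfs", "s3", "hdfs" (illustrative labels)

class UnifiedNamespace:
    """Maps logical dataset paths to physical backends so applications
    address one namespace regardless of where data actually resides."""

    def __init__(self):
        self._mounts = {}  # logical prefix -> Backend

    def mount(self, prefix: str, backend: Backend) -> None:
        self._mounts[prefix] = backend

    def resolve(self, logical_path: str):
        # Longest-prefix match picks the most specific backend.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if logical_path.startswith(prefix):
                return self._mounts[prefix], logical_path[len(prefix):]
        raise KeyError(f"no backend mounted for {logical_path}")
```

A job can then open `/datasets/us/train.csv` without knowing whether that path is served over NFS from one site or over S3 from another; the namespace layer handles the routing.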

2. High performance and strong consistency

Loading training datasets and writing checkpoints are pivotal for improving compute utilization and persisting training state, and high performance is a critical requirement for both. Moreover, maintaining strong consistency across the cluster significantly enhances availability and stability, which is essential for seamless operation at scale. Large-scale AI clusters should therefore deploy a real-time data sharing platform on high-performance cluster file storage with strong consistency, streamlining the entire AI workflow.
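One concrete requirement behind consistent checkpointing is that a reader must never observe a half-written checkpoint file. The sketch below is illustrative only (it serializes a small dict as JSON rather than a real tensor format) and uses the common fsync-then-rename pattern, which is atomic on POSIX filesystems:

```python
import json
import os
import tempfile

def save_checkpoint(state: dict, path: str) -> None:
    """Write a checkpoint atomically: serialize to a temp file in the
    same directory, fsync it, then rename over the target, so readers
    always see either the old checkpoint or the new one, never a mix."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # force data to stable storage
        os.replace(tmp, path)     # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise

def load_checkpoint(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```

Distributed training frameworks apply the same idea at cluster scale, which is why the underlying file storage must itself guarantee strong consistency across nodes.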

3. Superb resilience

It is estimated that a single day of service suspension at an intelligent computing center could cause a direct loss of nearly 2 million Chinese yuan (around US$276,000). With valuable data assets such as high-cost training data and checkpoints at stake, device-level resilience becomes paramount to ensuring high availability and preventing data loss or damage caused by physical faults. Data infrastructure must support extreme cross-site uptime demands, targeting six nines (99.9999%) or even seven nines of availability.
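To make those targets concrete, a quick back-of-the-envelope calculation (a sketch, not tied to any particular product) converts a number of nines into the downtime budget per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def allowed_downtime_minutes(nines: int) -> float:
    """Minutes of downtime per year permitted at a given number of nines.
    E.g. six nines = 99.9999% availability."""
    unavailability = 10 ** (-nines)
    return unavailability * MINUTES_PER_YEAR
```

Six nines allows roughly 31.5 seconds of downtime per year, and seven nines only about 3.2 seconds, which is why this level of availability requires redundancy across sites rather than within a single device.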

However, physical protection alone is insufficient against logical faults and emerging threats. The integration of AI has accelerated ransomware variant iteration and exposed new resilience vulnerabilities, necessitating a multi-layered defense strategy. Intrinsic resilience forms the last line of defense for data by embedding protection throughout the enterprise's infrastructure, covering the storage system, disaster recovery and backup, ransomware protection, and comprehensive data management.
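As one small example of the backup layer in such a defense, a digest manifest lets an operator detect silent corruption or unauthorized modification of backup copies. This is a hedged sketch of the general technique, not any vendor's mechanism; production systems add immutable snapshots, air-gapped replicas, and anomaly detection on top:

```python
import hashlib
from pathlib import Path

def fingerprint(path: Path) -> str:
    """SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(manifest: dict, backup_dir: Path) -> list:
    """Return the names of files whose current digest no longer matches
    the manifest recorded at backup time, a basic signal of corruption
    or tampering (e.g. ransomware encrypting backup copies)."""
    return [name for name, digest in manifest.items()
            if fingerprint(backup_dir / name) != digest]
```

Because the manifest is recorded when the backup is known-good and stored separately, an attacker who encrypts the backup files cannot also silently update the digests.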

Embracing the AI transformation in enterprise services

Generative AI has sparked a watershed moment for enterprise-grade services. Demonstrating immense potential across diverse fields, AI solutions are converting risks and challenges into opportunities, offering full-stack AI platforms that support any product, framework, and pipeline where GenAI is deployed. This is the future, and the future is exactly where Huawei is headed.

Download Huawei's AI-Ready Data Infrastructure Reference Architecture White Paper.



Abubaker Mustafa

Cybersecurity researcher and vulnerability developer

1 month ago

( :

Reply
Muchiu (Henry) Chang, PhD. Cantab (Cambridge, UK)

Consultant in Patent Intelligence and Engineering Management

1 month ago

Huawei IT Products & Solutions Science is always of the human, for the human, and by the human. The recent IT security catastrophe from CrowdStrike was solved by humans, NOT by AI nor by any science/technology.

Is there any AI that can answer the following questions of business intelligence? "Who, in the Ontario province of Canada, have new US patents granted on the nearest Tuesday (Eastern Time), when the USPTO releases the newly granted US patents on a weekly basis?" "Who, in the "江蘇" province of China, have new US patents granted on the nearest Tuesday (Eastern Time), when the USPTO releases the newly granted US patents on a weekly basis?"

With our intellectual property (IP), a Chinese-English multilingual metadata, we can get the full list of answers for the above questions. This is a fact. Do you or any of your contacts need our expertise/IP to do the data analysis that AI can't do? Metadata is an enabler. It is like a treasure map for treasure hunting. Without metadata, like a treasure map, NO data can be found/retrieved, even by the most advanced technologies, like AI, high-end chips, supercomputers, etc. https://lnkd.in/g-aJFnXR

Reply
