Next-Generation AI with VAST Data: Beyond Storage and Compute
Gina Rosenthal
Product Marketing Leader | AI Enthusiast | Founder & CEO at Digital Sunshine Solutions | Co-Host of Tech Aunties Podcast
I was able to attend AI Tech Field Day a couple of months ago, and I've been thinking about the VAST presentations ever since. Here's my report.
VAST Data overview
VAST Data is a cutting-edge technology firm that has established itself as a pioneer in the artificial intelligence (AI) and deep learning infrastructure space. Their flagship product, the VAST Data Platform, seamlessly integrates storage, database, and compute functionalities into a cohesive, scalable software solution designed to facilitate AI and deep learning applications across various sectors.
With an impressive deployment of over 100 petabytes (PB) in government agencies and other domains within a mere two-year span, VAST Data has demonstrated significant impact and utility. Financially, the company has achieved remarkable success, more than doubling its valuation to $9.1 billion post-money after securing $118 million in a Series E funding round, reflecting strong market confidence in its innovative approach and future prospects (Forbes article).
VAST according to VAST
John Mao, VP of Technology Alliances, and Neeloy Bhattacharyya, Director of AI/HPC Solutions Engineering, kicked off their AIFD4 presentation by making a pretty bold statement: VAST is the "fastest growing data company in history". As an ex-EMCer, that statement certainly got my attention.
They shared some pretty compelling customer statistics: VAST customers have deployed over 10 exabytes of data, and 60% of deployments are running AI/HPC or similar workloads.
And they have some pretty impressive customers, such as CoreWeave, Zoom, and Pixar.
The founder's vision
VAST Data was founded in 2016 by Renen Hallak, who previously led R&D at XtremIO (acquired by Dell EMC); Shachar Fienblit, who formerly worked at Kaminario; and Jeff Denworth (Wikipedia).
The original VAST investor pitch deck described the idea of building a data center-scale computer that could be a thinking machine. The concept was that a thinking machine could process data, produce output, and feed that output back into itself to continually refine what it knows.
Hallak's plan to execute on the vision of a data center-scale perpetual thinking machine was to first build a storage system, next build a data management system, and then build a transactional storage system.
Step one: Build a different storage system
In 2019, VAST launched its first product: the VAST Universal Storage platform (since renamed VAST DataStore). It offers NAS (file) and object storage. VAST pioneered methods to run NFS at local NVMe speeds by reinvigorating RDMA for NFS, and built a driver that pushes NFS to throughput levels capable of feeding GPUs.
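To make that concrete: here's a minimal sketch of what a generic Linux NFS-over-RDMA mount looks like from the client side. This is not VAST's driver, and the server address, export path, and option values are hypothetical placeholders.

```python
import subprocess

# Hypothetical values: substitute your own storage VIP and export path.
SERVER = "10.0.0.100"
EXPORT = "/datasets"
MOUNTPOINT = "/mnt/vast"

# Stock Linux NFS client options: proto=rdma selects the RDMA transport
# (20049 is the conventional NFS/RDMA port), and nconnect opens several
# connections so throughput isn't limited to a single session.
OPTIONS = "vers=3,proto=rdma,port=20049,nconnect=8"

subprocess.run(
    ["mount", "-t", "nfs", "-o", OPTIONS, f"{SERVER}:{EXPORT}", MOUNTPOINT],
    check=True,
)
print(f"Mounted {SERVER}:{EXPORT} at {MOUNTPOINT}")
```

The driver VAST presented goes further than a stock mount like this by spreading I/O across many paths and nodes, but the idea is the same: get the NFS client out of the way of the NVMe media.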
According to Denworth, the plan was never to build a new storage company, but a new generation of infrastructure. Denworth explained the reasoning behind building a different storage system in a 2022 interview with The Next Platform:
“We started to unravel this realization that you could build one storage system within an environment to basically support the needs of all of your applications and all of your workloads. That spectrum of performance and capacity doesn’t need to be expressed at the storage level. It can be expressed at the application level, where each set of different applications consumes more or less, depending upon what they need. If you have a mix of applications that are archival and applications that are high performance, they can start to coexist in a system that isn’t designed for absolute performance per SSD but is designed to give you more than enough performance in the aggregate to meet the needs of all of your workloads.”
In other words, VAST is building a platform. Transformative storage was only the foundational layer.
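The "more than enough performance in the aggregate" argument is easy to see with a little arithmetic. The numbers below are purely hypothetical, chosen only to show the shape of the math:

```python
# Hypothetical illustration of the aggregate-performance argument:
# individually modest QLC SSDs still add up to enormous bandwidth.
per_ssd_read_gbps = 1.0    # assumed sustained read per SSD, in GB/s
ssds_per_enclosure = 50    # assumed drive count per enclosure
enclosures = 20            # assumed enclosures in the cluster

aggregate_gbps = per_ssd_read_gbps * ssds_per_enclosure * enclosures
print(f"Aggregate read bandwidth: {aggregate_gbps:,.0f} GB/s")  # 1,000 GB/s
```

An archival application barely dents a pool like that, while a high-performance application can draw far more bandwidth than any single SSD could supply, which is exactly the coexistence Denworth describes.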
Step two: Build the data management system
In May 2023, VAST launched the Data Catalog. This foundational technology is a self-indexing catalog that can be queried on various POSIX properties. The database gives users granular access to exabytes of data, providing structure to unstructured data at very large scale.
This catalog is essentially a transaction-capable data lake at exabyte scale. It allows customers to store all of their data in a VAST system, and also to query that data and instantly find what they are looking for.
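To picture what a self-indexing catalog means in practice, here's a toy sketch: POSIX attributes become rows in a queryable table. This is emphatically not VAST's implementation or API, just the concept at laptop scale.

```python
import os
import sqlite3

# Build a tiny metadata catalog: every file's POSIX attributes become a row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog (path TEXT PRIMARY KEY, size INTEGER, "
    "mtime REAL, uid INTEGER)"
)

for root, _dirs, files in os.walk("/tmp"):
    for name in files:
        path = os.path.join(root, name)
        try:
            st = os.stat(path)
        except OSError:
            continue  # file vanished or is unreadable; skip it
        conn.execute(
            "INSERT OR REPLACE INTO catalog VALUES (?, ?, ?, ?)",
            (path, st.st_size, st.st_mtime, st.st_uid),
        )

# "Instantly find what you're looking for": query by POSIX properties
# instead of crawling the filesystem again.
for path, size in conn.execute(
    "SELECT path, size FROM catalog WHERE size > 1000000 "
    "ORDER BY size DESC LIMIT 5"
):
    print(f"{size:>12} {path}")
```

The crucial difference from this toy: a crawl-based index goes stale the moment data changes, while VAST maintains the catalog inside the platform as writes happen.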
But there is more to this data catalog than just speed at scale. The catalog is part of the system itself: there is no external integration, so it is never out of sync. According to Denworth's release blog in 2023, many possibilities come from this improvement.
Chris Mellor of Blocks and Files explains the importance of this innovation:
[This is an] “AI-enabled NAS, not an HPC parallel File system, which means that existing file-based enterprises can adopt it with no need to learn about HPC and parallel file systems”.
Mellor was right on the money (as usual!). In May 2023, VAST became the first enterprise NAS solution certified as a datastore for NVIDIA DGX SuperPOD. The NVIDIA DGX SuperPOD systems are tailored for training and inferencing multi-trillion-parameter generative AI models. The goal was to make high-performance storage easy to manage without deep in-house HPC expertise. According to the VAST blog announcing the certification, "together, the joint offering gives customers a turnkey, co-designed, and validated 'AI supercomputer' that delivers the performance needed for all modern, accelerated AI workloads".
Mellor also got this quote from Hallak in 2023, confirming the company's intention to make development easier and to support the architecture required by AI applications:
“I knew it [AI] was coming. I did not expect it to come this fast. I thought we had a little bit more time.” Generative AI, in other words, is fully aligned with VAST's engineering development direction.
Step three: Add the transactional layer
Later in 2023, VAST held an event called Build Beyond to introduce the rest of the vision's execution plan. They announced the DataBase and DataSpace, previewed the DataEngine, and renamed the storage layer the DataStore.
We talked about the DataBase in the previous section. The VAST DataSpace provides a unified namespace capable of handling exabyte-scale datasets: a way to ingest enormous amounts of data while still addressing all of it through a single system. You can see why so many of their customers are using the platform for their AI workloads.
This VAST video explains the concept really nicely.
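As a rough mental model (mine, not VAST's) for a unified namespace: one global path tree whose subtrees resolve to whichever site actually holds the data. Every name below is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    endpoint: str

class GlobalNamespace:
    """One path tree, many physical locations (conceptual sketch only)."""

    def __init__(self) -> None:
        self._placement: dict[str, Site] = {}

    def place(self, prefix: str, site: Site) -> None:
        self._placement[prefix] = site

    def resolve(self, path: str) -> Site:
        # Longest-prefix match decides which site serves this path.
        matches = [p for p in self._placement if path.startswith(p)]
        if not matches:
            raise KeyError(f"no site holds {path}")
        return self._placement[max(matches, key=len)]

ns = GlobalNamespace()
ns.place("/datasets", Site("east", "vast-east.example.com"))
ns.place("/datasets/training", Site("west", "vast-west.example.com"))

print(ns.resolve("/datasets/training/images/cat.jpg").name)  # west
print(ns.resolve("/datasets/archive/2021.tar").name)         # east
```

The client sees a single namespace either way; where the bytes live is the platform's problem, not the application's.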
Finally, wrap everything in strong partnerships
In an earlier section of this post we covered one of the early VAST and NVIDIA partnerships. The companies announced a new collaboration in March 2024: VAST compute nodes (CNodes) can now run storage services on NVIDIA BlueField DPUs inside servers, and the DPUs can also power VAST DNodes (capacity nodes) inside NVMe enclosures (see the VAST datasheet).
And it's pretty hard to beat the endorsement VAST got at GTC.
The company has also partnered with Supermicro to build an end-to-end AI solution powered by NVIDIA-certified systems.
So what’s this really about?
We are starting to see the real work ahead of us when we talk about "digital transformation". It will be impossible for IT organizations to support AI without a new way of storing, managing, protecting, and accessing data. VAST Data is building a platform to make that possible without needing to be a rocket scientist. And I'm sure even the IT organizations that do support rocket scientists will be happy to get back operational time so they can better support their rocket science missions.
One thing that struck me as I researched this article is the fuzziness that still surrounds this emerging technology. IT operations has decades of experience behind it, and it's easy to assume the more experienced folks aren't excited about the new things coming out. But most ops people I know are thinking: it's about dang time!
Humans learn new things by building bridges from what they already know to what they need to learn. For example, I supported SANs (and rocket science, for what that's worth) as a customer, and I started working with HPC and DL/ML from a product marketing perspective several years ago at VMware. Even though this session was called "Operationalizing AI at Scale," I had a really hard time finding my bridge. I got stuck on an assertion made about NVMe, and that led me down a path that was a dead end.
Dear marketers of every flavor (product, technical, and everything in between), as we continue the journey through this latest round of digital transformations, be sure to share new things in very clear language. And make a clear path for people to ladder their experiences from the old client-server days into the new era of AI.
Digital Sunshine Solutions would be happy to help translate your message to help people bridge their gap in understanding. Schedule a free 30-minute consultation with us and let’s chat!
#VAST #GTC2024 #AI #NVIDIA #Supermicro #Storageinnovation #AIFD4 #thinkingmachine #productmarketing #messaging #learning