Defeating Data Gravity? - Hammerspace

I worked in life sciences for several years, and during my time in the industry we strove to overcome data gravity. Identifying and accessing life sciences data globally is a significant challenge. Data gravity theory holds that moving processing resources closer to the data is cheaper than moving the data closer to the processor. That theory is being tested in the AI era, when access to accelerated computing is limited. At AI Field Day 4 (#AIFD4), Hammerspace argued for intelligently moving data closer to your accelerated computing, creating an AI pipeline.

In my conversations with data scientists, I've learned that these practitioners can spend several weeks organizing and preparing data for either model training or inferencing. That preparation phase stretches even longer if the data is spread across several sources. Hammerspace offers a parallel file system that acts as a proxy in front of multiple data sources.
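
To make that concrete, here's a minimal sketch of how a proxy layer might merge several sources into one namespace. Local directories stand in for the backing stores, and the labels (`s3-lab`, `nfs-hq`) are hypothetical; this illustrates the idea, not Hammerspace's implementation:

```python
import os
from dataclasses import dataclass

@dataclass
class FileEntry:
    source: str  # which backing store holds the file
    path: str    # path within that store
    size: int    # size in bytes

def build_global_index(sources):
    """Walk each backing store and merge its entries into one namespace.

    `sources` maps a label to a directory standing in for that store;
    a real system would scrape metadata from S3, NFS, etc. instead.
    """
    index = {}
    for label, root in sources.items():
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                # The global path namespaces each store under its label.
                index[f"/{label}/{rel}"] = FileEntry(label, rel, os.path.getsize(full))
    return index
```

A data scientist then browses `/s3-lab/...` and `/nfs-hq/...` as one tree, without caring where the bytes physically live.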

As a result, data scientists get a single view of metadata across the various file systems during the preparation phase of AI. But how does this solve the data gravity problem? If your GPUs are 150 ms away, the latency may prove too high to be useful. This is where Hammerspace's replication features come into play.
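
To put that 150 ms in perspective, here's a quick back-of-the-envelope calculation with illustrative numbers (not measurements of any real system):

```python
# Each serial small-file read pays a full round trip to remote storage.
rtt_s = 0.150          # assumed round-trip time to the data, in seconds
small_reads = 10_000   # assumed serial small reads during data preparation

wasted_s = rtt_s * small_reads
print(f"{wasted_s:.0f} s (~{wasted_s / 60:.0f} min) lost to round trips alone")
# → 1500 s (~25 min) lost to round trips alone
```

That cost is pure latency, before a single byte of payload moves, which is why distance between GPUs and data matters so much for metadata-heavy workloads.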

Hammerspace allows data to sync between locations while maintaining a consistent metadata file system. Paraphrasing Floyd Christofferson, Hammerspace VP of Marketing: it's not a copy of the data but a local cache presented by the global file system.
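
A minimal sketch of that idea: a read-through cache keyed by path and metadata version, so a change at the remote site invalidates the local copy. The `fetch` callback is a stand-in for whatever transport the real product uses; none of this reflects Hammerspace's internals:

```python
import hashlib
from pathlib import Path

class LocalCache:
    """Serve repeat reads locally; pull from the remote site only once
    per (path, version) pair."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # fetch(path) -> bytes from the remote site

    def read(self, path, version):
        # The cache key includes the metadata version, so a remote
        # change (new version) automatically misses the stale copy.
        key = hashlib.sha256(f"{path}@{version}".encode()).hexdigest()
        local = self.cache_dir / key
        if local.exists():
            return local.read_bytes()  # local hit: no WAN round trip
        data = self.fetch(path)        # first read pays the latency
        local.write_bytes(data)
        return data
```

The first read of a file pays the 150 ms distance; every read after that is local until the global metadata says the file changed.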

A solution to Data Gravity and complexity?

So, is Hammerspace the ultimate solution to data gravity when creating AI pipelines? I have yet to put the solution into production, so I don't know all the nuanced challenges. On the surface, however, Hammerspace does offer a better user experience. The solution isn't magic: if you have petabytes of data across a global landscape, you still have to place GPUs strategically near your centers of data.

As data administrators, you must consider data movement and the governance of that data. While the solution may allow you to view and access data across geopolitical boundaries, you must consider the repercussions of caching that data near your processing centers.

Still, it's another powerful tool in the kit as you look to overcome the challenges of data gravity and GPU availability.

You have to split the discussion into two parts: first the metadata, which holds the file system structure and details on individual files, including how they map to physical disk; second, the physical content itself.

Metadata optimisation is essential for distributed file systems. Hammerspace has lots of lazy-reading techniques for onboarding content without needing to walk the file system tree. Metadata syncing and searching also need to be super-efficient. It's no surprise, BTW, that InfiniteIO founder Mark Cree is now at Hammerspace.

Then there's the physical content. All data can be broken down into chunks, which can then be deduplicated and fingerprinted. Imagine having a copy of data on GCP and AWS: if either copy changes, only the differences need to be moved. Of course, you could also cache a copy as it's read on demand.

The next level of data mobility is to (accurately) predict data I/O profiles. That's what I was working on in 2016 (and had working). There's an argument for building AI into this process and enabling the file system to "learn" over time. Then you only move data around where it's needed, attempting to predict the process to reduce latency. So far, Hammerspace seems to have the best solution.

Adam de Delva

Founder: DTR | MSFT Alum, K8s, 4IR, Computational Consciousness
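
The chunk-and-diff approach described in the comment above can be sketched in a few lines. This uses fixed-size chunks for simplicity; production systems often use content-defined chunking, and nothing here reflects any vendor's actual implementation:

```python
import hashlib

def fingerprint(data, chunk_size=4 * 1024 * 1024):
    """Split a byte string into chunks and hash each one (4 MiB default)."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def chunks_to_send(local_fp, remote_fp):
    """Indices of local chunks the remote copy doesn't already have."""
    remote = set(remote_fp)
    return [i for i, h in enumerate(local_fp) if h not in remote]
```

If the GCP and AWS copies of a file differ in one chunk, only that chunk's index comes back, and only those bytes need to cross the wire.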
