Defeating Data Gravity? - Hammerspace
I worked in life sciences for a few years. During my time in the industry, we strove to overcome data gravity: identifying and accessing life science data globally is a significant challenge. The data gravity theory holds that it is cheaper to move processing resources closer to the data than to move the data closer to the processor. That theory is being tested in the era of AI, where access to accelerated computing is limited. At AI Field Day 4 (#AIFD4), Hammerspace argued for intelligently moving the data closer to your accelerated computing, creating an AI pipeline.
In my conversations with data scientists, these practitioners report spending several weeks organizing and preparing data for either model training or inferencing. The preparation phase may stretch even longer if that data is spread across several sources. Hammerspace offers a parallel file system that acts as a proxy in front of multiple data sources.
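To make that idea concrete, here is a minimal sketch, not Hammerspace's API, of how a single metadata namespace might be assembled from several backing stores. The source names, mount paths, and FileMeta structure are all illustrative assumptions.

```python
# Hypothetical sketch: present file metadata from several sources as one namespace.
# SOURCES, FileMeta, and list_metadata are illustrative, not Hammerspace's API.
import os
from dataclasses import dataclass

@dataclass
class FileMeta:
    logical_path: str   # path as the data scientist sees it
    source: str         # which backing store holds the bytes
    size: int
    mtime: float

def list_metadata(source_name: str, root: str) -> list[FileMeta]:
    """Walk one backing store and return its metadata (no file contents are read)."""
    entries = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            logical = os.path.join("/", source_name, os.path.relpath(full, root))
            entries.append(FileMeta(logical, source_name, st.st_size, st.st_mtime))
    return entries

# Unified view across two example sources: one namespace, many backing stores.
SOURCES = {"lab-nas": "/mnt/lab_nas", "cloud-bucket": "/mnt/s3_mirror"}
namespace = [m for name, root in SOURCES.items() for m in list_metadata(name, root)]
```

The point of the sketch is that the preparation phase works against metadata only; no bytes move until the training job actually reads a file.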
So, data scientists get a single view of the metadata across the various file systems during the preparation phase of AI. How does this solve the data gravity problem? If your GPUs are located 150ms away, the latency may prove too high to be useful. This is where Hammerspace's replication features come into play.
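A rough back-of-envelope calculation shows why 150ms matters at dataset scale. This sketch assumes strictly serial access and one round trip per file; the file count and the local latency figure are illustrative assumptions, not measurements.

```python
# Back-of-envelope: cost of remote round trips for many small files
# (assumed serial access, one round trip per file).
rtt_remote_s = 0.150      # 150 ms to the remote GPU site
rtt_local_s = 0.0005      # ~0.5 ms on a local network (assumption)
num_files = 1_000_000     # e.g., an image dataset made of small files (assumption)

remote_hours = num_files * rtt_remote_s / 3600
local_hours = num_files * rtt_local_s / 3600
print(f"Remote: ~{remote_hours:.0f} h of pure latency; local: ~{local_hours:.1f} h")
# Remote: ~42 h of pure latency; local: ~0.1 h
```

Real training pipelines parallelize and batch their reads, so the gap is smaller in practice, but the direction of the problem is the same: per-file latency compounds.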
Hammerspace allows data syncing between locations while maintaining a consistent metadata file system. Paraphrasing Floyd Christofferson, Hammerspace VP of Marketing: it's not a copy of the data but a local cache presented by the global filesystem.
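Here is a minimal read-through cache sketch in that spirit, not Hammerspace's implementation: the global namespace resolves the file, and the bytes are pulled into a local cache on first access. The fetch_from_origin helper, the cache directory, and the origin mount path are hypothetical.

```python
# Minimal read-through cache sketch (not Hammerspace's implementation).
# fetch_from_origin() is a hypothetical stand-in for whatever moves the bytes.
import os
import shutil

CACHE_ROOT = "/var/cache/global_fs"   # local cache directory (assumption)

def fetch_from_origin(logical_path: str, cache_path: str) -> None:
    """Placeholder: copy bytes from the authoritative site into the local cache."""
    origin_path = os.path.join("/mnt/origin_site", logical_path.lstrip("/"))
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    shutil.copyfile(origin_path, cache_path)

def open_global(logical_path: str):
    """Open a file through the global namespace; pull it locally on first access."""
    cache_path = os.path.join(CACHE_ROOT, logical_path.lstrip("/"))
    if not os.path.exists(cache_path):        # cache miss: instantiate locally
        fetch_from_origin(logical_path, cache_path)
    return open(cache_path, "rb")             # cache hit: served at local latency
```

The key property is that the namespace and metadata stay globally consistent while the bytes materialize only where and when they are read.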
A solution to Data Gravity and complexity?
So, is Hammerspace the ultimate solution to data gravity when creating AI pipelines? I have yet to put the solution into production to know all the nuanced challenges. However, on the surface, Hammerspace does offer a better user experience. The solution isn't magic: if you have petabytes of data across a global landscape, you still have to place GPUs strategically around your centers of data.
As data administrators, you must consider data movement and the governance of that data. While the solution may allow viewing and accessing data across geopolitical boundaries, you must consider the repercussions of caching that data near your processing centers.
On the surface, it's another powerful tool in the toolbox as you look to overcome the challenges of data gravity and GPU availability.
You have to split the discussion into two parts. First, the metadata, holding the file system structure and details on individual files, including how they map to physical disk. Second, the physical content itself.

Metadata optimisation is essential for distributed file systems. Hammerspace has lots of lazy reading techniques for onboarding content without needing to walk the file system tree. Metadata syncing and searching also need to be super-efficient. It's no surprise, BTW, that InfiniteIO founder Mark Cree is now at Hammerspace.

Then there's the physical content. All data can be broken down into chunks, which can then be deduplicated and fingerprinted. Imagine having a copy of data on GCP and AWS. If either copy changes, only the differences need to be moved. Of course, you could also cache a copy as read on-demand.

The next level of data mobility is to (accurately) predict data I/O profiles. That's what I was working on in 2016 (and had working). There's an argument for building AI into this process and enabling the file system to "learn" over time. Then you only move data around where it's needed, attempting to predict the process to reduce latency. So far, Hammerspace seems to have the best solution.
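To illustrate the chunk-and-fingerprint idea from the comment above, here is a minimal sketch using fixed-size chunks and SHA-256, not any vendor's implementation. Real systems typically use content-defined chunking and more sophisticated indexes; the chunk size and function names are assumptions for the example.

```python
# Sketch of chunk fingerprinting and delta detection (fixed-size chunks, SHA-256).
# Illustrative only; production systems use content-defined chunking and smarter indexes.
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB chunks (arbitrary choice for the sketch)

def fingerprints(path: str) -> list[str]:
    """Return one SHA-256 fingerprint per fixed-size chunk of the file."""
    fps = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            fps.append(hashlib.sha256(chunk).hexdigest())
    return fps

def changed_chunks(local_fps: list[str], remote_fps: list[str]) -> list[int]:
    """Indexes of chunks whose fingerprints differ: only these need to move."""
    longest = max(len(local_fps), len(remote_fps))
    return [i for i in range(longest)
            if i >= len(local_fps) or i >= len(remote_fps) or local_fps[i] != remote_fps[i]]

# If the copy on GCP and the copy on AWS differ in a single chunk,
# comparing fingerprint lists means only that one chunk is transferred.
```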