Posts from iterative.ai

7,623 followers

Under the hood, DataChain combines the power of data warehouses and distributed clusters with the right data access patterns to process millions of video, image, and audio files:

- Never copy data. Store references to files instead, while still preserving versioning, data loading, and efficient processing.
- Use a warehouse under the hood (e.g. ClickHouse) to store metadata, and perform as many operations as possible inside it (e.g. filters).
- Run distributed compute close to the data to execute Python-based UDFs.
- Match the data access pattern to the workload. Pre-fetching, batching, caching, streaming: different workloads require different ways of using data.

#unstructured #datachain #dvc #machinelearning #opensource

Try it here: https://datachain.ai/ or try the less scalable open-source version here: https://github.com/iterative/datachain
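A minimal Python sketch of three of the ideas in the post (storing references instead of copies, filtering on metadata before any bytes are read, and batched reads with prefetching), assuming an in-memory dict as a stand-in for object storage. `FileRef`, `filter_refs`, `batched`, and `stream_batches` are hypothetical names for illustration, not DataChain's API; in the setup the post describes, the filter would run inside the warehouse and the UDFs on a cluster near the data.

```python
# Hypothetical sketch (not DataChain's API): a dataset of file *references*
# whose metadata is filtered before any bytes are read, then streamed in
# batches with one batch prefetched ahead of the consumer.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence


@dataclass(frozen=True)
class FileRef:
    """Pointer to an object in storage; the file itself is never copied."""
    uri: str      # location of the file (hypothetical "mem://" scheme below)
    version: str  # version tag recorded alongside the reference
    size: int     # metadata that can be filtered without opening the file


def filter_refs(refs: Sequence[FileRef], pred: Callable[[FileRef], bool]) -> List[FileRef]:
    # Only metadata is inspected here; per the post, this kind of filter
    # would run inside a warehouse such as ClickHouse rather than in Python.
    return [r for r in refs if pred(r)]


def batched(refs: Sequence[FileRef], batch_size: int) -> Iterator[List[FileRef]]:
    # Split the references into fixed-size chunks (the last may be shorter).
    for start in range(0, len(refs), batch_size):
        yield list(refs[start:start + batch_size])


def stream_batches(
    refs: Sequence[FileRef],
    fetch: Callable[[FileRef], bytes],
    batch_size: int = 2,
    workers: int = 4,
) -> Iterator[List[bytes]]:
    # Fetch each batch's files concurrently, and submit the next batch
    # before handing the current one to the consumer (one-batch prefetch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = None
        for batch in batched(refs, batch_size):
            submitted = [pool.submit(fetch, r) for r in batch]
            if pending is not None:
                yield [f.result() for f in pending]
            pending = submitted
        if pending is not None:
            yield [f.result() for f in pending]


if __name__ == "__main__":
    # In-memory stand-in for object storage (hypothetical URIs and contents).
    storage = {"mem://a.jpg": b"aaa", "mem://b.mp4": b"b" * 5000, "mem://c.jpg": b"cccc"}
    refs = [
        FileRef("mem://a.jpg", "v1", 3),
        FileRef("mem://b.mp4", "v1", 5000),
        FileRef("mem://c.jpg", "v2", 4),
    ]
    small = filter_refs(refs, lambda r: r.size < 1000)  # drops the large video
    for batch in stream_batches(small, lambda r: storage[r.uri]):
        print([len(b) for b in batch])  # prints [3, 4]
```

Prefetching one batch ahead overlaps I/O with consumption; as the post notes, other workloads may be better served by caching or plain streaming, which this sketch does not cover.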

Cameron Price

Founder | Senior Data Executive | 30 Years of Leadership in Data Strategy & Innovation | Executive Director | Sales Executive | Mentor | Strategy | Analytics | AI | Gen AI | Transformation | ESG

2 months

It's impressive how DataChain optimizes unstructured data processing. Could you share more about how the distributed compute works next to the data?
