lakeFS

Software Development

Git for Data - Scalable Data Version Control

About us

Simplifying the lives of engineers, data scientists and analysts who are transforming the world with data. Treeverse, the company behind lakeFS, is a team of passionate data enthusiasts who love all things open source and aim to find creative solutions to big problems.

Website: https://docs.lakefs.io/
Industry: Software Development
Company size: 11-50 employees
Headquarters: Santa Monica, California
Type: Privately held
Founded: 2020

Products

lakeFS

Data Management Platform (DMP)

lakeFS is an open source tool that provides a Git-like experience for data lakes, enabling data teams to run parallel pipelines for experimentation and CI/CD for data.

Locations

Primary

California St

Santa Monica, California 95008, US

Updates

lakeFS

5,210 followers
4 days ago
Can you fit a square peg into a round hole? If you answered yes, this article is for you!

Sam Austin dives deep into the art of achieving CI/CD for machine learning, tackling the unique challenges of managing code, data, and models in harmony. From handling data drift to automating training pipelines, this piece is packed with insights to inspire your ML CI/CD strategy.

Read the full article on Medium: https://lnkd.in/d5Vzc8Vi

#machinelearning #cicd #dataengineering #mlops

Continuous Integration and Continuous Deployment (CI/CD) for Machine Learning

medium.com

lakeFS

1 week ago
What is a data governance framework?

A #datagovernance framework establishes a standardized set of rules and practices for data collection, storage, and utilization. It ensures that your policies, regulations, and definitions are applied to all data in your organization.

A good framework lets you offer trusted data to people in a variety of roles, including business leaders, data stewards, and developers. It also makes sure that data can be managed, transformed, and delivered across all application and analytics installations, both in the cloud and on-premises.

This opens the door to self-service solutions for non-technical teams, helping them identify and access the data they want for data governance and analytics.

Read on to learn more: https://lnkd.in/dPnpanfp

#datacollection #data #dataversioncontrol

Data Governance Frameworks: Pillars, Examples & Benefits

https://lakefs.io

lakeFS

2 weeks ago
Be sure to register for this virtual talk presented by Amit Kesarwani at the MLOps World and Generative AI World Summit.

Here's what Amit will demonstrate in this talk:
- Training a TensorFlow predictive model on data mounted using lakeFS Mount
- Integration with Git to version code and data together
- Reproducibility of code as well as data

#mlops #datareproducibility #machinelearning #genai

lakeFS

3 weeks ago
What if isolating data could increase collaboration, speed up development, and even make troubleshooting easier? It's time to rethink the norm and embrace #dataisolation.

Here’s why data isolation (with version control) matters:
- Isolation means each team can work without disrupting others. No more accidental overwrites, just smooth, uninterrupted progress.
- With isolated datasets and version control, you can test at full speed, roll back if needed, and never have to wait on others’ changes.
- Data version control ensures every experiment stays contained, letting teams take creative risks without risking production or other teams' data.
- Isolation, paired with version control, helps track data changes, making it easier to pinpoint issues and debug faster.

Data isolation is a game-changer for teams who need speed, safety, and seamless collaboration.

#datacollaboration #dataexperimentation #datatesting #datadebugging

https://lnkd.in/djDArDrk

lakeFS

1 month ago
In the world of software development, a Pull Request is a mechanism for proposing changes to a codebase. When a developer makes changes in a separate branch, they can create a Pull Request to ask their peers (or other project maintainers) to review and merge those changes into the main branch. During the process, reviewers can leave comments, suggest improvements, and even run tests to ensure everything works as expected.

This is invaluable because it creates a structured process for change: discussions happen in the open, reviews are documented, and changes are only merged when they’re fully approved. This ensures that every change is scrutinized, improving quality and fostering collaboration.

By adding Pull Requests to data workflows, lakeFS offers the same benefits seen in software development:
- Collaboration - Multiple team members can collaborate on data changes, review each other’s work, and leave feedback
- Transparency - Every change is visible and documented, providing a clear audit trail
- Quality Control - Changes are reviewed and tested before they are merged, reducing the risk of introducing errors into the data

Watch how to create a pull request in lakeFS and read more (link to post in comments)

#pullrequests #dataversioning #dataquality

lakeFS

1 month ago
2 weeks to go until Open Data Science Conference (ODSC) WEST kicks off. Will we see you there?

Be sure to check out Einat Orr's track: "Don’t Go Over the Deep End: Building an Effective OSS Management Layer for Your Data Lake!"

In this talk, Einat will explore fundamental challenges, focusing on the different needs of structured vs. unstructured data, where each requires its own distinct approach. She’ll dispel some of the chaos and cover key components of a robust data lake management architecture, including open table formats, catalogs, and data version control systems. You’ll learn how they contribute to an organized data lake environment, helping you avoid feeling like you’re constantly treading water.

Make sure to attend the talk, meet the team at Booth #26, and grab some spooky swag!

https://lnkd.in/dPv-Mhcu

lakeFS

1 month ago
What if your file system isn’t as advanced as you think?

Many #filesystems make bold promises about handling complex datasets, but the reality? They can struggle with versioning, tracking, and scalability, especially in cloud environments.

The problem with legacy file systems:
1. Traditional file systems weren’t designed for the complexities of modern #datalakes.
2. They rely on manual workarounds for critical tasks like version control.
3. Gaps emerge when scaling data across distributed cloud architectures.

Enter lakeFS:
1. Provides Git-like capabilities for your file system, making #dataversioning seamless.
2. Automatically tracks every change and allows you to roll back without breaking a sweat.
3. Scales effortlessly with your cloud setup, ensuring performance even with massive datasets.

The result? You get full control over your data, simplified versioning, and peace of mind that your system can scale without bottlenecks.

Read more about how lakeFS redefines file representation: https://lnkd.in/eFKM_avW

Guide To The lakeFS File Representation

https://lakefs.io

lakeFS

1 month ago
Ever wonder how #datapipelines keep your data flowing seamlessly? Here’s a breakdown of the 5 key stages that keep your data ready for analysis:
1. Collection: Data is gathered from diverse sources like databases, devices, and applications.
2. Ingestion: The collected data is loaded and organized within systems, making it ready for storage.
3. Storage: The organized data is securely housed in data warehouses, lakes, or other systems.
4. Computation: Data is cleaned, formatted, and transformed to meet company standards.
5. Consumption: The processed data is available for analysis, visualizations, and business applications.

lakeFS introduces a Write-Audit-Publish model, ensuring #datavalidation throughout the pipeline. Its powerful "hooks" functionality automates checks and validation at critical points, such as during data writes and before publishing to production, guaranteeing the integrity and reliability of data versions. With hooks, #dataquality is continuously monitored, flagging any issues before they affect downstream workflows.
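
The Write-Audit-Publish idea mentioned in this post can be illustrated with a minimal, library-free sketch. This is not the lakeFS hooks API: the function `write_audit_publish` and the checks `no_missing_ids` and `unique_ids` are hypothetical names invented for illustration, and plain in-memory lists stand in for branches.

```python
from copy import deepcopy


def write_audit_publish(production, new_rows, checks):
    """Stage new_rows on an isolated copy, audit it, publish only if every check passes.

    Hypothetical helper for illustration; in-memory lists stand in for data branches.
    """
    # Write: changes land on an isolated copy, never directly on production.
    staging = deepcopy(production)
    staging.extend(new_rows)

    # Audit: run every validation check against the staged data.
    failures = [check.__name__ for check in checks if not check(staging)]
    if failures:
        # Audit failed: production is returned untouched, along with the failing checks.
        return production, failures

    # Publish: the audited copy becomes the new production state.
    return staging, []


def no_missing_ids(rows):
    """Every row must carry a non-null id."""
    return all(row.get("id") is not None for row in rows)


def unique_ids(rows):
    """No two rows may share an id."""
    ids = [row["id"] for row in rows if row.get("id") is not None]
    return len(ids) == len(set(ids))


production = [{"id": 1}]
published, problems = write_audit_publish(
    production, [{"id": 2}], [no_missing_ids, unique_ids]
)
rejected, rejected_problems = write_audit_publish(
    production, [{"id": 1}], [no_missing_ids, unique_ids]
)
```

In this sketch the first call publishes the new row, while the second is rejected by `unique_ids` and leaves production as it was. The checks play the role the post assigns to hooks: validation that runs before data reaches production.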

lakeFS

1 month ago
Tired of ETL testing headaches? These 4 essential steps will set you on the path to hassle-free testing!
1. Manual testing can be slow and error-prone; automating #ETLtesting is key to faster, more reliable data workflows.
2. Discover the right tools and methods to create robust automated tests for your #datapipelines.
3. Integrate testing into your #ETLworkflows for continuous validation and peace of mind.
4. Dive into the step-by-step guide on setting up automated ETL tests and watch your #productivity soar: https://lnkd.in/dzcNK7qY

lakeFS

1 month ago
TL;DR: RAG (Retrieval-Augmented Generation) is transforming how we handle unstructured data with machine learning, especially in AI and LLMs.

Using RAG as a service is a best practice for organizations. It frees users from difficult navigation and lets them focus on specific application requirements. This approach simplifies the end-user application development process, making it scalable and less complex to manage.

Key components of RAG as a Service:
- Seamlessly integrate and retrieve structured and unstructured data from various sources.
- Use embeddings to capture the context and relevance of data, optimizing retrieval.
- Store embeddings efficiently for fast, precise queries.
- Employ advanced LLMs to generate relevant content based on retrieved data.
- Maintain a robust system to ensure accurate, reliable data retrieval with version control.

Although RAG is a less difficult and less resource-intensive way to improve LLM answers than fine-tuning or other strategies, its implementation calls for MLOps knowledge, which combines data engineering, ML engineering, and application engineering. Companies experimenting with GenAI must additionally consider security around data access, privacy, scale, and price-performance when measuring the commercial value.

#rag #llm #mlops #dataengineering #mlengineering

Read the full post here: https://lnkd.in/dUNQ7pKr
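
The retrieval half of the RAG flow described in this post can be sketched in a few lines. This is a toy under stated assumptions, not a production design: a bag-of-words `Counter` stands in for a real embedding model, a Python list stands in for a vector store, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical. The generation step (sending the prompt to an LLM) is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (an assumption; real systems use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, store, k=1):
    """Return the k stored texts whose embeddings are most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, store, k=1):
    """Assemble the retrieved context and the question into a prompt for a generator."""
    context = "\n".join(retrieve(query, store, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# The "vector store": each document kept alongside its precomputed embedding.
docs = [
    "lakeFS provides branches for data lakes",
    "ETL tests validate pipelines",
]
store = [(doc, embed(doc)) for doc in docs]
```

Calling `retrieve("branches for data", store)` ranks the first document highest because it shares three words with the query and the second shares none. Versioning the documents behind `store` is what would keep such retrieval reproducible over time.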

Funding

lakeFS: 1 total round

Last round

Series A, August 28, 2021

US$23,000,000.00

Investors

Dell Technologies Capital, Norwest Venture Partners, +1 other investor

Employees at lakeFS

Dror Nahumi

Peter Guagenti

Go-to-Market Leader, Entrepreneur, and Advisor focused on AI & the future of data.

Amit Kesarwani

Presales Executive and Entrepreneur - AI/ML, Analytics, Big Data and Cloud Computing

Iddo Avneri

VP, Sales & Customer Success

Similar pages

Decodable

Similarweb

dbt Labs

Qwak (Acquired by JFrog)

Tabnine

Earnix

XetHub

TREEVERS

Snowflake

Monte Carlo
