DVC Community February Updates!
Compilation of images made with Midjourney

Hi friend! Welcome to February’s wrap-up and Happy Leap Year! Here's what's new from the last month!

The Latest

New Videos

Tutorials/Conference Talks

Product Updates

None this month, but stay tuned for some exciting ones next month!

What we're looking at


Customer Success Engineer Tibor Mach pointed out the headless member of the public in the now-famous Sora

"Paying down technical debt is not always as exciting as proving a new theorem, but it is a critical part of consistently strong innovation. And developing holistic, elegant solutions for complex machine learning systems is deeply rewarding work." - D. Sculley et al.

What's Coming!

Next Meetups

  • If you missed this morning's meetup, here's the link to Ryan Turner's presentation on Computer Vision Annotation & Preparation Using DVCx.
  • March 13th Meetup on Regulation and the Importance of Reproducibility and Standardization in AI/ML. Info to come!

What Can You Build?

We are starting a new feature in the Newsletter to highlight some cool projects that we’ve found in the Community. We searched for dvc.yaml files on GitHub and ranked them by the number of stages each file defines.

This month’s highlight is a project from Eamon O’Dea, who built a 457-stage pipeline to create a COVID-19 forecaster! You can find the project here.
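For the curious, the ranking idea can be sketched in a few lines of Python. This is only an illustration, not the script we actually ran: it assumes each dvc.yaml has already been parsed into a dictionary (for example with a YAML library), and the repository names and helper functions below are made up. It counts the top-level entries under the `stages` key, so a templated stage (for example one using `foreach`) counts once even though DVC would expand it into several stages.

```python
from typing import Any, Mapping


def count_stages(parsed: Mapping[str, Any]) -> int:
    """Count the top-level entries under `stages` in a parsed dvc.yaml."""
    # A file with no `stages` key (or an empty one) counts as zero stages.
    stages = parsed.get("stages") or {}
    return len(stages)


def rank_by_stage_count(
    projects: Mapping[str, Mapping[str, Any]],
) -> list[tuple[str, int]]:
    """Return (project, stage count) pairs, largest pipeline first."""
    counts = {name: count_stages(parsed) for name, parsed in projects.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, already-parsed dvc.yaml contents keyed by made-up repo names.
projects = {
    "example/small": {
        "stages": {"train": {"cmd": "python train.py"}},
    },
    "example/large": {
        "stages": {
            "prepare": {"cmd": "python prepare.py"},
            "train": {"cmd": "python train.py"},
            "evaluate": {"cmd": "python evaluate.py"},
        },
    },
    "example/empty": {},
}

ranking = rank_by_stage_count(projects)
for repo, n_stages in ranking:
    print(f"{repo}: {n_stages} stage(s)")
```

Swapping the hard-coded `projects` dictionary for real parsed files is all it would take to try this on your own repositories.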

Community-Generated Content

We have quite a lot of great content this month from the Community! Special thanks to Gift Ojeabulu for digesting and consolidating all of this amazingness for you!

  • Automate model development (Part 1): In this article, Alex Harsha describes how Mission Lane uses DVC to achieve end-to-end reproducibility in its model development workflow. He argues for automating the entire process, from data extraction to evaluation, because traditional workflows that rely on Jupyter notebooks become cumbersome and error-prone as projects progress. Overall, the article highlights DVC as a valuable tool for streamlining model development, promoting reproducibility, and encouraging efficient collaboration within data science teams.
  • Keep Track of Your Backtests with DVC’s Experiment Tracking: This is the fourth article in a series by Eryk Lewinson. He started with how to use DVC’s new extension to run and evaluate experiments, enhancing the previously introduced experimentation workflow by monitoring model performance and evaluating experiments with interactive plots, all within VS Code. In this installment, he improves on that approach by tracking backtests with DVC. He explains how, by adopting DVC, data scientists can streamline their time series forecasting workflow, ensure experiment reproducibility, and collaborate effectively on model development.
  • MLOps: From Jupyter to Production: In this video, Pablo Tomas Fernandez uses DVC to track the data and code used to train a convolutional neural network that can recognize cats and dogs. He also shows how to use DVC to create a machine learning pipeline that trains, tests, and deploys the model. A written version of the video is also available.
  • Managing the Machine Learning Trifecta: The story of version control in ML: In this article, Himanshu uses an engaging story to explain the challenges of managing data science projects in traditional workflows like Jupyter Notebooks. He highlights the importance of version control for code, data, and models, but emphasizes the limitations of Git for handling large files such as datasets and models. He introduces DVC as a "superhero tool" that solves these problems by integrating with Git for code version control while adding version control designed for data and models.
  • DVC & Airflow in End-to-End Project: In this comprehensive tutorial by Savita, you will learn how DVC, integrated with Airflow, helps manage data versioning and orchestrate tasks throughout your MLOps project, enabling data tracking, reproducibility, collaboration, and robust pipeline building.
  • 2023 highlights of computer vision progress: Rustem Glue, an AI/ML Engineer at Bayanat with over 7 years of experience in software engineering and machine learning, mentions DVC pipelines as a key tool for reproducible data workflows in this article. He highlights the following strengths of DVC: caching intermediate results, tracking dependencies and outputs, and building DAGs for data processing pipelines. In his words, DVC pipelines have been instrumental in establishing reproducible data workflows, making them a cornerstone of his data handling practices. He also links to his earlier article on automating data preparation with DVC.
  • End-to-end MLOps CICD Pipeline Using AWS EC2 and Pytorch: This in-depth video showcases the power of DVC in streamlining MLOps workflows. The speaker demonstrates how to use DVC alongside AWS EC2 and PyTorch to set up a comprehensive MLOps project, establish a DVC repository for data and code versioning, and implement a CI/CD pipeline for model deployment to AWS. By using DVC for version control, the speaker keeps experiments reproducible and model deployment straightforward.
  • Leveraging Git for ML Experiment Management | PyData Global 2023: In this practical presentation, Eryk Lewinson presents DVC as a valuable tool for data scientists to streamline their workflow, ensure experiment reproducibility, and improve collaboration during the machine learning development process.
  • From Code to Cloud: Building and Deploying a Wine Quality Predictor with MLOps, DVC, GitHub Actions, MLflow, and Heroku: In this article, Abhijeetas walks us through the different stages of building a Wine Quality Predictor, where DVC was used for data version control, enabling reproducibility and comparison of different data versions throughout the development process. In his words, “Utilizing Data Version Control (DVC) facilitated effective versioning and management of both data and machine learning models, enhancing traceability and reproducibility.”

Thanks for the read! We'll see you next month!

To your continued success,

Jeny

Community Manager, DVC.ai

