
treeverse/dvc
🔧 Tool · treeverse
Open-source tool for versioning data and ML experiments.
DVC (Data Version Control) is a free, open-source tool for managing data, models, and experiments in machine learning projects. It runs on top of a Git repository: Git tracks small metadata files that point to each dataset or model, while the actual data resides in remote storage (S3, GCS, etc.). Key features include data versioning backed by content-addressable storage, lightweight pipelines whose stages form a directed acyclic graph (DAG), metrics and plots for comparing experiments, and sharing of results through ordinary Git workflows. DVC is written in Python, but it works with projects in any programming language.
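To make the "pointers in Git, data elsewhere" idea concrete, here is a minimal Python sketch of content-addressable storage. It is an illustration under stated assumptions, not DVC's actual implementation: the function `add_to_cache`, the cache layout, and the pointer fields are hypothetical names chosen for this example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def add_to_cache(data_file: Path, cache_dir: Path) -> dict:
    """Copy data_file into a content-addressed cache and return a small pointer."""
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()
    # Hypothetical layout: first two hex characters as a directory, the rest as the file name.
    target = cache_dir / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(data_file, target)
    # The pointer is small enough to commit to Git; the data itself is not.
    return {"path": data_file.name, "md5": digest, "size": data_file.stat().st_size}


def demo() -> tuple:
    """Add two files with identical content and count the objects in the cache."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        first = root / "train.csv"
        second = root / "train_copy.csv"
        first.write_bytes(b"x,y\n1,2\n")
        second.write_bytes(b"x,y\n1,2\n")
        pointer_a = add_to_cache(first, cache)
        pointer_b = add_to_cache(second, cache)
        stored = sum(1 for p in cache.rglob("*") if p.is_file())
        return pointer_a, pointer_b, stored


if __name__ == "__main__":
    pointer_a, pointer_b, stored = demo()
    print(json.dumps(pointer_a, indent=2))
    print("objects in cache:", stored)
```

Because the storage key is derived from the file's content, two files with the same bytes share one cached object, and a pointer uniquely identifies the version of the data it refers to.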
💡 Highlights
- ├─Git-like version control for data & models
- ├─Lightweight pipelines with DAG support
- └─Track experiments & compare metrics
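As a rough illustration of what "pipelines with DAG support" means, the sketch below orders a set of stages so that each one runs only after the stages it depends on. The stage names and the `execution_order` helper are hypothetical, and the example uses Python's standard `graphlib` module rather than anything from DVC itself.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each maps to the set of stages whose outputs it depends on.
stages = {
    "prepare": set(),
    "featurize": {"prepare"},
    "train": {"featurize"},
    "evaluate": {"train", "prepare"},
}


def execution_order(graph: dict) -> list:
    """Return the stages in an order that respects every dependency."""
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(stages))
```

Modeling a pipeline as a DAG is what lets a tool decide which stages are affected when an input changes and which can be skipped.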
🎯 For
- ├─Data scientists
- ├─ML engineers
- └─Research teams