
daochenzha/data-centric-AI
📦 Open Source Project · daochenzha
A curated repository of resources for learning and applying data-centric AI and machine learning practices.
Data-centric AI shifts the focus of machine learning from tweaking model architectures to systematically improving data quality. This repository offers a structured roadmap for that shift, covering data curation, data cleaning, data augmentation, and data valuation. It collects academic papers, tools, and practical guides that help developers identify and mitigate data-related bottlenecks, so that practitioners can build more robust and reliable AI systems. Whether you are dealing with noisy labels, imbalanced datasets, or a need for high-quality synthetic data, the collection provides the background and technical resources for applying data-centric workflows in production.
💡 Highlights
- Curated list of data-centric papers
- Covers data cleaning and augmentation
- Focuses on data-centric workflows
🎯 For
- Data Scientists
- Machine Learning Engineers