Teaching the first Data-centric AI course at MIT
Teaching "What is data-centric AI" in MIT lecture hall 6-120.


A follow-up to this post.

Today I co-taught and launched the first course on Data-centric AI at MIT (free public access) with lecturers from MIT and Stanford.

My dad was a mailman. My mom works in a call center. We grew up with less than we needed in a rural area of Kentucky. The opportunity to teach a first-of-its-kind artificial intelligence course at MIT with a team of brilliant lecturers is a self-reminder that our stories do not stop at our latest achievement.

Our backgrounds are motivation, not limitation. I felt disempowered by a lack of opportunity and spent over a decade understanding how artificially intelligent systems can empower people. Four steps I took:

  1. I spent 8 years at MIT inventing theory and algorithms that enable AI systems to train on real-world human data.
  2. Along the way, I helped companies like Google and Amazon train better AI assistants for millions of people globally.
  3. After my PhD, I co-founded Cleanlab to bring these data-centric AI technologies to the masses through two products: the data-centric AI open-source package and Cleanlab Studio, the enterprise application that automatically improves ML models by improving the underlying labeled data they are trained on (requires no code and no machine learning expertise).
  4. Today, Cleanlab launched the first in-person "Introduction to Data-centric AI" course taught at MIT so that others can learn how these methods work and use them to improve their own AI systems and machine learning models.



---

Why we taught the Introduction to Data-centric AI class:

MIT and other universities have many courses on machine learning (6.036, 6.867, etc.). Those classes teach techniques to produce effective models for a given dataset and emphasize mathematical details of models over practical applications. However, in real-world AI applications, the dataset is not fixed, and improving the data often gives better results than improving the model. We’ve personally seen this time and time again across hundreds of companies using machine learning, as well as in our research. This is also the reason we founded https://cleanlab.ai.

Data-Centric AI (DCAI) is an emerging science that studies techniques to improve AI systems by improving the underlying datasets they are trained on in a systematic, algorithmic way. Because this topic wasn’t yet covered in a standard curriculum course, we launched the new class! This intensive 2-week course was taught during MIT’s January IAP term. All the course material, including lecture videos, lecture notes, hands-on lab assignments, and lab solutions, is freely available to the public.

---

Topics covered include:

  • Data-Centric AI vs. Model-Centric AI
  • Label Errors
  • Dataset Creation and Curation
  • Data-centric Evaluation of ML Models
  • Class Imbalance
  • Outliers and Distribution Shift
  • Growing or Compressing Datasets
  • Interpretability in Data-Centric ML
  • Encoding Human Priors: Data Augmentation and Prompt Engineering
  • Data Privacy and Security
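
To make one of these topics concrete, here is a minimal sketch of the idea behind the Label Errors topic: use a model's predicted probabilities to flag examples whose given label looks suspicious, so the data can be reviewed and fixed instead of only tuning the model. The function name `flag_likely_label_errors`, the toy data, and the thresholding rule are illustrative assumptions. This is a simplified heuristic, not the course's lab code and not the Cleanlab package's implementation.

```python
from collections import defaultdict


def flag_likely_label_errors(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels:     list of int class ids (the given, possibly noisy labels)
    pred_probs: pred_probs[i][k] is the model's probability that example i
                belongs to class k (assumed to be out-of-sample predictions)
    """
    # Per-class threshold: the model's average confidence in class k,
    # measured over the examples that are labeled k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = {k: sums[k] / counts[k] for k in counts}

    flagged = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in thresholds if p[k] >= thresholds[k]]
        # Flag when the model is confident in some class,
        # but not in the label the example was given.
        if confident and y not in confident:
            flagged.append(i)
    return flagged


# Toy data (hypothetical): example 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
]
print(flag_likely_label_errors(labels, pred_probs))  # -> [2]
```

In practice the flagged examples would be sent for human review or relabeling before retraining, which is the data-centric step: the model stays the same and the dataset improves.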

---

Thank you to my incredible co-founders Anish Athalye and Jonas Mueller, who did more of the work on this course than I did, with Anish spearheading the majority of the effort. I am grateful to have friends I admire to embark on life's journeys with. I learn something new from them every day.

