Missed the awesome AIQCon by MLOps Community & Kolena? Here are my notes:
Themes
- Data quality, and enabling it through tooling and thoughtful platform design
- The relationship of data quality to the success of generative AI projects
- The real open problems still left to solve in classical ML and computer vision
- The challenges of getting more data, whether through human labeling and annotation or through synthetic data
Favorite Talks
12pm - EIGHTY-THOUSAND POUND ROBOTS: AI DEVELOPMENT & DEPLOYMENT AT KODIAK SPEED
I really enjoyed this case study presented by Kodiak’s Collin Otis. It is rare to:
- Get a transparent look at the data → performance measurement tree, especially from an autonomous vehicle company;
- Not only get KPIs but also see what kind of automation and CI/CD practices are being implemented to support the data quality and automation processes;
- Have it all wrapped up in a nicely packaged presentation with self-explanatory slides.
12:50pm - OVERCOMING BIAS IN COMPUTER VISION AND VOICE RECOGNITION (w/ Skip Everling, Rajpreet Thethy, Doug Aley, Peter Kant)
This is the talk I took the most notes on, mainly because of the discussion around three things: the role of high-quality labeling and annotations; the frustration of working with data labeling companies that fall short on services, best-of-breed tooling, or pricing; and the real cases where poor data quality (specifically a lack of the right data) could have life-or-death consequences (not to mention worsening wealth inequality in the US).
1:55pm - GENERATING THE INVISIBLE: CAPTURING AND GENERATING EDGE-CASES IN AUTONOMOUS DRIVING (w/ Felix Heide)
This was a really well-done deep dive on how to use simulations to tackle the long-tail data challenges of autonomous vehicles: when you have only a small sample of actual accidents, what needs to happen in order to train models to respond in critical scenarios?
This comes on the heels of CVPR and a number of other conferences and events where synthetic data has started coming up again as an effective tool for cases where counterfactual data doesn’t exist and labeling isn’t an option.
I also enjoyed the thoughtful question by Jeremy Welland, Ph.D. on the use of near-miss data, as well as Felix’s insight on the use of traffic and CHP reports.
2:55pm - PANEL: DATA QUALITY = QUALITY AI (w/ Sam Partee, Chad Sanderson, Joe Reis, Maria Zhang, Pushkar Garg)
And of course, there was this panel of OGs (as well as friends) on data quality and why we keep coming back to it, regardless of the level of “AI” the industry is hyping.
Data, ML, and MLOps Fam
One of the challenges of great events is that you sometimes need to choose between attending a talk and catching up with the fam. It was exciting to catch up (and meet IRL) with folks like: Wen Yang, Sadie St. Lawrence, Fanny Chow, Chandana Srinivasa, Christos Magganas, Stefan Krawczyk, Mihail Eric, David Scharbach, Faraz Thambi, Delia Lazarescu