ML: Examining the Test Set
I recently saw a post where someone said “Never touch your test set.” The theory was that you (as the algorithm designer) are part of the training algorithm, so by looking at your test set rather than just its final performance numbers, you are contaminating it. That may work academically, but it doesn’t work for shipping a Machine Learning (ML) customer experience: it doesn’t allow you to do proper failure analysis, it ignores real-world feedback, and it doesn’t allow you to clean your test set.
Failure Analysis
When I started on Face ID, the policy was that only QA could look at the test set. They were following this idea of separating training from testing, but it started to break down as we algorithm engineers needed to look at the data ourselves.
First, it broke down when QA explained where the algorithm had failed on the test set. Someone then had to fix the issue, which generally required either looking at the test set or collecting more data. Data collection can be a laborious process, and tons of data was already being collected to cover all the bases, so it was easier and faster to examine the failure cases on hand.
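To make that concrete, here is a minimal sketch of the kind of failure review this enables. The file layout, sample schema, and `model.predict` interface are assumptions for illustration, not the actual Face ID tooling:

```python
import json
import shutil
from pathlib import Path

def export_failures(model, test_samples, out_dir="failure_review"):
    """Copy misclassified test samples into a folder so an engineer can
    eyeball them directly instead of launching a new data collection."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    failures = []
    for sample in test_samples:  # assumed schema: {"id", "image_path", "label"}
        pred = model.predict(sample["image_path"])  # assumed model interface
        if pred != sample["label"]:
            shutil.copy(sample["image_path"], out / f"{sample['id']}.png")
            failures.append({"id": sample["id"],
                             "label": sample["label"],
                             "pred": pred})
    # Keep a machine-readable summary alongside the images for triage.
    (out / "failures.json").write_text(json.dumps(failures, indent=2))
    return failures
```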
Soon enough, the policy changed because it didn’t make sense anymore. We were out of the academic world and into the real world of trying to ship products. This sped up the development cycle and allowed us to ship the biggest ML feature at the time to a handheld device.
Real World Feedback
If you don’t want to touch your test data, then don’t collect any customer feedback or logs. Don’t read any reviews of your feature. That could potentially impact how you train your algorithm.
The reality is that all ML features have training, validation, and test data sets collected in more controlled environments, but then you have actual data from the field. That field information comes at several levels: full images, metadata only, statistics aggregated over large customer bases, and/or internal and external surveys.
Each piece of this feedback informs how to better train the algorithm, through policy changes or, more likely, data collection and retraining. If a survey surfaces an issue, you can try to reproduce it. If metadata indicates an issue, you can dig back into older user studies or commission new ones. Whatever the case, all of this data is part of testing how the algorithm generalizes to the whole world.
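As a rough sketch of what the metadata-only level can look like, assuming anonymized field records arrive with a device model and a pass/fail flag (the field names here are made up):

```python
from collections import Counter

def failure_rate_by_segment(field_records, segment_key="device_model"):
    """Aggregate anonymized field metadata (hypothetical schema) to see
    which customer segments the algorithm struggles with."""
    totals = Counter()
    failures = Counter()
    for rec in field_records:  # assumed: {"device_model": str, "unlock_success": bool}
        seg = rec[segment_key]
        totals[seg] += 1
        if not rec["unlock_success"]:
            failures[seg] += 1
    return {seg: failures[seg] / totals[seg] for seg in totals}
```

A segment with an outsized failure rate is a hint about where to reproduce the issue internally or where to target the next data collection.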
Cleaning Data
If you don’t look at your test set, how do you find the cases that are mislabeled? I don’t think the intent of the post was to say don’t clean your dataset, but I want to be absolutely clear that labeling review and data cleaning have to happen for all of your data, test set included. That is how you gain confidence in the performance numbers.
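One common way to surface label problems is to flag test samples where the model confidently disagrees with the label and send them for human review. This is a sketch under the same assumed sample schema as above; `predict_with_confidence` is a hypothetical model interface:

```python
def flag_possible_mislabels(model, test_samples, confidence_threshold=0.95):
    """Flag test samples where the model confidently disagrees with the label.
    A human reviews these -- the model is not automatically assumed correct."""
    suspects = []
    for sample in test_samples:  # assumed schema: {"id", "image_path", "label"}
        pred_label, confidence = model.predict_with_confidence(sample["image_path"])
        if pred_label != sample["label"] and confidence >= confidence_threshold:
            suspects.append((sample["id"], sample["label"], pred_label, confidence))
    return suspects
```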
Don’t train on your test set
I wouldn’t say don’t touch your test data; you should look at it unless you want your algorithm to suck in the real world. Just don’t train on it.
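One lightweight way to honor “look, but don’t train” is to make the split deterministic, so a test example can never silently drift into training when data gets recollected or reshuffled. A minimal sketch (the ID scheme is invented for illustration):

```python
import hashlib

def split_bucket(example_id, test_fraction=0.1):
    """Deterministically assign an example to 'train' or 'test' by hashing its ID,
    so the same example always lands in the same split across retraining runs."""
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 1000
    return "test" if bucket < test_fraction * 1000 else "train"

# Usage: only examples that hash to "train" ever reach the training loop.
all_example_ids = ["subj_0001_frame_03", "subj_0002_frame_11"]  # hypothetical IDs
train_ids = [i for i in all_example_ids if split_bucket(i) == "train"]
```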
---------------------
If you like, follow me on Twitter and YouTube, where I post videos of espresso shots on different machines and other espresso-related stuff. You can also find me on LinkedIn.