Resisting the urge to overfit
I drive back via NH8 from Gurgaon to Delhi on a daily basis. For folks familiar with the drive, the entry to Delhi opposite Ambience Mall is fairly congested. As a commuter, you can take either the service lane or the highway, and you can use Google Maps to decide. The problem with Google Maps is that it is often wrong during congestion and can't be followed blindly. For the first few days of the drive, I did what many CS folks do: took the two paths on alternate days and observed which was faster. Once the exploration phase was over, I settled on sticking to the service lane before Ambience Mall and cutting into the highway just at the entry point. I also make it a point to observe whether the overall behaviour is drifting away from my original hypothesis. Yet there are days when I get this urge to deviate from the "optimal" strategy. It could be a little extra congestion at the entry to the highway or something else: signals that are only weakly correlated with the drive-through time, but correlated nonetheless.
I am generally considered a very logical person, but even I find it very difficult to resist this urge: add one additional attribute to the model; solve that one extra false negative.
In my experience, there are two distinct phases to building any data science model. The first, or lab, phase is where you collect the data to build your model. In this phase, you go out and talk to everyone under the sun, trying to get as representative a data set as possible. Once you have the data, you disappear into a lab and come out only when you have a model and all sorts of metrics in hand: precision, recall, F1, the confusion matrix. You have a smug smile on your face as you throw these metrics at your business stakeholders, eager to get to the real-world "deployment phase".
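For concreteness, here is a minimal sketch of what that lab-phase report usually looks like. The synthetic data, the random-forest classifier and the 75/25 split are placeholders, not the model from this story; the point is only that the metrics come from a held-out split, not from hand-picked samples.

```python
# Lab-phase sketch: train a placeholder classifier and report the usual
# aggregate metrics on a held-out split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data gathered "from everyone under the sun".
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
preds = model.predict(X_test)

print(confusion_matrix(y_test, preds))       # raw breakdown of errors
print(classification_report(y_test, preds))  # precision, recall, F1
```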
Of course, no one on the business side understands any of these metrics. So they do what they understand: they call someone in the field and ask them to send you 10 data points. You run the data through the model and, voilà, the model is correct 9 out of 10 times. You claim success again; see, the precision is 90%.
The business exec, however, looks glumly at the one sample that you missed. Can we solve for that? Even better, they already have an answer: if you just put a condition on attribute X > Y, you get what you need. That is what you need to change in your algorithm. It does not seem to matter that you were using a deep learning model that already included X (and there is no "algorithm" to change), or a stochastic decision tree whose accuracy is higher at thresholds other than the Y that happens to fix this one false negative. That negative sample is all that matters at this point in the discussion.
This is where it is absolutely important to hold your ground. It is important to bring the discussion back to the data set and the metric you are targeting. If the data set was not representative, it needs to be fixed. If a precision of 80% is not good enough, we need to take it up to 90%. However, under no circumstances can you guarantee that the model will work on that one negative sample. The only goal you can take back is high precision (or whatever metric makes sense for the business) on a real-world test set.
Often enough, I have seen data scientists and engineers come back and tweak the model to somehow fit that particular sample. Sometimes this is done by choosing model parameters that make that sample work; at other times by including the sample in the training set and picking a deeper network that ensures a closer fit to the training data. It is much simpler to change just enough in your model to get that offending sample out of the way (at the cost of overfitting and lower precision in the real world) than to convince others that they need to change the way they think: look at aggregates, not individual samples, when assessing whether a model is working.
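If it helps to make that trade-off concrete, here is a hedged sketch of the comparison worth insisting on: score the patched and unpatched model on the same held-out set, not on the one offending sample. The hard rule below, forcing a positive whenever a chosen attribute exceeds a threshold, stands in for the exec's "attribute X > Y" condition; the column index and threshold are purely illustrative, and the sketch reuses `model`, `X_test` and `y_test` from the lab-phase snippet above.

```python
import numpy as np
from sklearn.metrics import precision_score

# Predictions of the original model on the held-out set.
base_preds = model.predict(X_test)

# The "quick fix": force a positive prediction whenever attribute X exceeds Y.
# X_COLUMN and Y_THRESHOLD are hypothetical values chosen only to mimic the rule.
X_COLUMN, Y_THRESHOLD = 0, 0.5
patched_preds = np.where(X_test[:, X_COLUMN] > Y_THRESHOLD, 1, base_preds)

# The comparison that matters is the aggregate one, on held-out data:
# the patch may rescue one false negative while adding false positives elsewhere.
print("precision without patch:", precision_score(y_test, base_preds))
print("precision with patch:   ", precision_score(y_test, patched_preds))
```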
This urge to overfit is common to us all. But take a deep breath and let that one sample fail. Never forget that a model that always works in training does not learn; it only memorizes.