Q1 answer
Ali Ghandi
Lead Data Scientist | Passionate about Solving Complex Problems| Machine Learning, Deep Learning, NLP, and Generative AI
Imagine you have a classification problem (with 2 classes) in which you use logistic regression or any model that returns the probability of belonging to each class. As you may know, you apply a threshold to decide whether a sample belongs to the positive class or not. By default, most implementations use 0.5 as the threshold. Now the question is: how do you find the best threshold for your classification problem?!
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR): for each value of FPR, it shows the corresponding TPR. Moving along the curve effectively changes the classification threshold, so you can choose a threshold near the point where the curve starts to saturate. By adjusting the threshold, you trade recall against precision. Sometimes your problem imposes constraints: think of automatically issued penalty notices, where you need high precision rather than high recall. So plotting recall and precision against the threshold can help you find a better choice. In your classification problems, do not rely on the default threshold.
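As a sketch of the idea above, here is a minimal way to compute precision and recall over a grid of thresholds so you can inspect the trade-off yourself. The toy labels and probabilities are made up for illustration only.

```python
import numpy as np

def precision_recall_at_thresholds(y_true, y_prob, thresholds):
    """Return (precision, recall) arrays, one entry per candidate threshold."""
    precisions, recalls = [], []
    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        # Convention: precision is 1.0 when nothing is predicted positive
        precisions.append(tp / (tp + fp) if tp + fp else 1.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return np.array(precisions), np.array(recalls)

# Toy validation scores (illustrative only)
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])
thresholds = np.linspace(0.1, 0.9, 9)
prec, rec = precision_recall_at_thresholds(y_true, y_prob, thresholds)
```

Plotting `prec` and `rec` against `thresholds` then shows directly where precision becomes acceptable for a precision-critical application.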
PhD Student in Physics | Data Science & Machine Learning
4y An important thing to note is not to touch this threshold at any cost. You can't just pick a threshold from the precision-recall plot that corresponds to your desired FP rate and use it! To fine-tune a logistic regression classifier, you should instead use class weights: plot the FP rate against the class weight and choose the proper class weight.
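To illustrate the class-weight idea from this comment, here is a minimal weighted logistic regression fitted by gradient descent (a stand-in for a library's `class_weight` option, not any particular implementation), evaluating the false positive rate at a few positive-class weights. The data and weight grid are made up for illustration.

```python
import numpy as np

def fit_weighted_logreg(X, y, pos_weight, lr=0.1, n_iter=2000):
    """Logistic regression via gradient descent; errors on the positive
    class are scaled by pos_weight (a minimal class-weight mechanism)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    sample_w = np.where(y == 1, pos_weight, 1.0)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = sample_w * (p - y)          # weighted residuals
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def false_positive_rate(X, y, w, b):
    pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) >= 0.5  # default threshold
    return np.sum(pred & (y == 0)) / np.sum(y == 0)

# Synthetic 1-D data: negatives around -1, positives around +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 1)), rng.normal(1, 1, (50, 1))])
y = np.array([0] * 50 + [1] * 50)

# Sweep the positive-class weight and record the FP rate at each setting
fprs = {cw: false_positive_rate(X, y, *fit_weighted_logreg(X, y, cw))
        for cw in (0.5, 1.0, 2.0)}
```

Up-weighting the positive class shifts the learned decision boundary toward the negative class, so the FP rate tends to rise with the weight; plotting `fprs` over a finer grid lets you pick the weight matching your FP budget.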
Mechanical/Energy engineering graduate
4y For sure there's going to be a trade-off between precision and recall (in other words, between minimizing false negatives and false positives), depending on the nature of the classification task, especially for skewed classes, e.g. cancer diagnosis. Accordingly, there's no unique prescription for the absolute best solution. However, there is an interesting, practical approach mentioned by Prof. Andrew Ng in his online Machine Learning course offered by Stanford University: simply try a range of thresholds, and pick whichever threshold gives you the highest F1 score on your cross-validation set. The F1 score is commonly defined as:
F1 = 2 * (P * R) / (P + R)
where P and R stand for precision and recall respectively:
P = TP / (TP + FP)
R = TP / (TP + FN)
Hope it was useful for you :)
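The sweep described in this comment can be sketched as follows; the grid and the toy validation scores are made up for illustration.

```python
import numpy as np

def best_threshold_by_f1(y_true, y_prob, thresholds):
    """Return (best_threshold, best_f1) over a grid of candidate thresholds."""
    best_t, best_f1 = thresholds[0], -1.0
    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        p = tp / (tp + fp) if tp + fp else 0.0   # precision
        r = tp / (tp + fn) if tp + fn else 0.0   # recall
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy cross-validation scores (illustrative only)
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
y_prob = np.array([0.05, 0.3, 0.45, 0.8, 0.7, 0.2, 0.9, 0.55, 0.6, 0.1])
t, f1 = best_threshold_by_f1(y_true, y_prob, np.arange(0.05, 1.0, 0.05))
```

In practice the sweep should run on a held-out validation set, exactly as the comment suggests, so the chosen threshold does not overfit the training data.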
Data Scientist | AI Specialist
4y That's great, thanks for sharing this. Is there anywhere specific I can find these questions?
Fuel Cell Stack Engineer | Flexible Graphite Bipolar Plates | Research & Development
4y I assume using a ROC curve would be helpful: choose the right threshold value depending on whether we prioritize eliminating false negatives or false positives, given the nature of the problem.