Logistic regression has been a regression since its birth - and is used this way every day.
Adrian Olszewski
Clinical Trials Biostatistician at 2KMM (100% R-based CRO) • Frequentist (non-Bayesian) paradigm • NOT a Data Scientist (no ML/AI), no SAS • Against anti-car/-meat/-cash restrictions • In memory of The Volhynian Massacre
TL;DR
Is linear regression a regression to you? And Poisson or negative binomial regression? And beta or gamma regression? Cox regression? Quantile regression? To a statistician they all represent essentially the same idea: predicting the value of some function of the data, conditional on the predictor. Linear regression (aka the General Linear Model) gives you the conditional mean, E(Y|X=x). Poisson, gamma, logistic, multinomial (the Generalized Linear Model, GLM) give you the conditional expectation, appropriately "linked" with the predictor (to preserve linearity): g(E(Y|X=x)). Quantile regression gives you a conditional quantile, Qτ(Y|X=x). Cox regression gives you the conditional hazard function of the failure time, λ(t|X=x).

Logistic regression is truly no different! It also gives you a conditional expectation, like any other member of the GLM family - here, for the Bernoulli distribution, so it has a natural interpretation as a probability (an uncalibrated one, BTW). Logistic regression was invented and developed between the 1930s and 1970s (Berkson, McFadden, Cox, Nelder, Wedderburn) to replace the existing probit model, and was essential in answering regression-related questions in experiments with binary endpoints. That was years before people started using it for classification. Nowadays it is the key regression algorithm in experimental research, like clinical trials, where it is used to answer questions about treatment efficacy and safety (through testing hypotheses about main, interaction and simple effects) and to explore epidemiological questions about potential risk factors (through marginal effects).
"But, Adrian, its properties make it suitable for classification tasks!", "Adrian, note that it can be obtained from perceptron, a neural network, suitable for classification". Sure! That's right. Also Sir David Cox in his book ("Analysis of Binary Data") mentions also the relationship between logistic regression and discriminant analysis. But that's just one out of multiple applications of the conditional expectation (=regression).
I could call it a "statistical test", because in my field 90% of the time I use it for testing hypotheses (subjecting its regression output to Wald's, Wilks LR or Rao inferential framework). The same way the ML community calls it "classifier" because in 99.9% they subject its (regression!) output to a decision rule. But would it be valid to name it after just single application ignoring ALL OTHERS? Absolutely not. It's too limiting, especially that thousands of statisticians use it for tasks other than classification on daily basis. In all those applications regression is the primary outcome, all others - are secondary. Of course, you can treat it as "a classifier", if you wish, but don't say that "logistic regression is not a regression", because its confusing the existential quantifier ?x ("for some cases") with the universal one ?x (for all cases).
"But Adrian, the coefficients are about log odds, they don't represent change of the response variable directly unlike the linear regression!" - tell me, how many regressions do you know where coefficients represent the change in raw response? ALL regressions (including the linear one) relate predictor with CONDITIONAL EXPECTATION, not the RAW response! Moreover, Poisson regression - integer input and fractional output, log(E(Y|X=x)). Cox regression? Binary input (alive/dead) + time to event result in fractional output = conditional S(t). Where's the problem? Will you reject the entire Generalized LM family (but remember, linear regression is part of it!) or Cox regression as regressions too?
And what if I tell you that for many years Professor Frank Harrell has been promoting the use of ordinal logistic regression (aka the proportional-odds model) for numerical data, to obtain distribution-free estimates of means and medians? Surprised? Did you hear about the Mann-Whitney (Wilcoxon) or Kruskal-Wallis tests? They are nothing but special cases of the ordinal logistic regression! And you can express them just as linear models over ranked numeric data!
See? All dots connect - because they must - all those are regressions, only with different interpretations and applications, but the underlying concepts are shared by all of them.
If you prefer reading on Medium (also for non-members): https://lnkd.in/dpDXr8qQ
Let's Mortal Combat begin!
Well, it's kinda... awkward for me to write about something that is (or should be) obvious to anyone working with statistics, but which in the last decade has been distorted by hundreds of thousands of members of the Machine Learning community, so that today the lie has replaced the truth...
I remember the first time when, during some discussion, I said: "I've been using logistic regression for many years on a daily basis for regression and testing hypotheses, but I've never used it for classification", and a Data Scientist (with a PhD degree) told me that I must be mistaken, because "despite its name, logistic regression is not a regression algorithm". I asked him: "then tell me, please, what do I do every day at work???" He replied: "I have no idea, but this sounds like pure nonsense, because logistic regression predicts only two binary outcomes, so you understand it cannot be a regression".
I was shocked.
For a long time, people (mostly researchers and statisticians) had been reporting to me that similar situations happened to them during interviews and internet discussions. I did a small investigation, and its results knocked me off my feet. I "googled" terms like "logistic regression is not (a) regression", "logistic regression is a misnomer" or "logistic regression, despite its name". The number of findings was huge - they occurred everywhere: in articles, tutorials and courses (including ones issued by companies offering paid content), blogs, books (including ML bestsellers written by people holding PhDs), YouTube videos. I also repeated the search on LinkedIn and found an endless flood of posts repeating this misinformation, just copy-pasted from others' posts.
/ PS: this reveals the sad fact that people far too often thoughtlessly repeat what they find on the Internet, without any fact-checking! /
Not only that! I asked ChatGPT 3 (then 3.5) and got identical results. No surprise! If it was "fed" misinformed sources, then it learned the misinformation, and today it "helps" spread it to learners. And often the learners are those who may not even suspect that something is wrong, so they trust the AI and repeat the nonsense further and further.
There is not a single week on LinkedIn without someone repeating it and earning hundreds of likes - proving that hundreds of people liked it (so tens of thousands saw it) and... will likely repeat the same.
Finally, I decided to write a few words about this "issue". I write from the perspective of a clinical biostatistician working in clinical trials - the part of the pharmaceutical industry responsible for the evaluation and approval of both existing and new therapies (drugs, procedures, devices). Here, in clinical trials, logistic regression is the key regression algorithm, used to answer questions about treatment efficacy and safety based on data from trials with binary endpoints (success/failure).
Some of my readers might have heard that I have never used logistic regression for classification during the whole time of my professional career. That's right.
Birth of the logistic regression and the... Nobel Prize
The origins of the logistic function can be traced back to the 19th century (free PDF), where it was employed in a model of population growth. Early attempts (1930s) to model binary data in the regression manner resulted in the probit regression model (Bliss, Gaddum), which constituted the standard for the next few decades. Researchers found its output not very intuitive, so they searched for a regression model whose coefficients would be easier to interpret. As early as 1944, Joseph Berkson started working (on bioassay experiments) on an alternative to the probit model, and the "logit" model (by analogy to "probit") was born. Unfortunately, the logit model was rejected by many as inferior to the probit model. This slowly changed around the 1950s, when George Dyke and H. Patterson published their paper on applying the linear logistic model to cancer survey data ("Analysis of Factorial Arrangements When the Data Are Proportions"). But it took many more years (roughly 1960-1970) until the logit model gained similar "trust", finally refined by Sir David Cox ("The regression analysis of binary sequences", 1958, and "Some procedures connected with the logistic qualitative response curve", 1966).
/ BTW, check also the list of other publications of this Great Mind of Statistics, especially "Analysis of Binary Data (Google Books)" /
Let me make a digression and recall that Sir David Cox, while working on binary-response problems, developed not only the logistic regression but also the survival regression model (named after him: the Cox regression), employing the conditional survival function. See? People tried to approach this problem from various perspectives - we should also briefly mention the latent-variable formulation (using both the logistic and the Gaussian distribution).
Almost in parallel came the multinomial logit model (Cox, Theil), which finally, in 1973, allowed Daniel McFadden, a famous econometrician, to piece the existing puzzles together - including Duncan Luce's choice axiom - into a whole, resulting in a theoretical foundation for the logistic regression. At that time, McFadden was deeply involved in pioneering work on the theoretical basis of discrete choice, where he applied the logistic regression in empirical analysis. His work, which made a profound impact on the analysis of discrete choice problems in economics and other fields, earned him the Nobel Prize in 2000.
I think we can fairly say that Daniel McFadden's work on the logistic (ordinary and multinomial) regression model and the discrete choice analysis was truly groundbreaking. It played a significant role in establishing logistic regression as a solid tool in statistical analysis, not only in econometrics!
Remember the rejection of the logit model, found inferior to the probit one? Now the situation has reversed, and logistic regression is today's default approach.
The 1970s were truly fruitful for logistic regression! In 1972, Sir John Nelder and Robert Wedderburn, in their seminal work (free PDF), introduced the idea of a unified framework: the Generalized Linear Model (GLM). It enabled regression models to cope with response variables of any type (counts, categories, continuous), through various conditional (on the predictor) distributions, including the Bernoulli (binomial with k=1), Poisson, gamma and Gaussian, with appropriate link functions (log, logit, reciprocal, identity), relaxing the assumption of normally distributed errors for inference.
/ Logistic regression is a special case of the GLM. You can spot it easily when working with R: when you call the glm() function, you need to specify the family of the conditional response distribution - here "binomial" - along with the appropriate link - here "logit": glm(family = binomial(link = "logit")) /
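To make this concrete, here is a minimal sketch in R. All the data and variable names here are made up for illustration; the point is only that the model is fitted exactly like any other GLM, and its predictions are numbers, not classes:

```r
# Hypothetical data: a dose of a drug and a binary outcome (1 = success)
set.seed(42)
dose     <- runif(200, 0, 10)
response <- rbinom(200, size = 1, prob = plogis(-2 + 0.5 * dose))

# Logistic regression = GLM with a Bernoulli/binomial conditional
# distribution and the logit link
fit <- glm(response ~ dose, family = binomial(link = "logit"))

summary(fit)                           # regression-style inference (Wald tests)
head(predict(fit, type = "response")) # numeric output: the estimated E(Y|X=x)
```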
Just a decade later, two other big names you surely know, Prof. Trevor Hastie and Prof. Robert Tibshirani, extended the Generalized Linear Model to the Generalized Additive Model. In their articles (e.g. "Generalized Additive Models for Medical Research", https://doi.org/10.1177/096228029500400) they mention the role of logistic regression in the identification of, and adjustment for, prognostic factors in clinical trials and observational studies.
/ Did you know that Professor Trevor Hastie authored the glm() command in the S-PLUS statistical suite, whose language, S, is the father of GNU R? Yes, S is the origin of the R syntax and was still in use a few years ago; I did statistical analyses in S-PLUS myself. /
Additional extensions for handling repeated observations were made by Kung-Yee Liang and Scott L. Zeger in 1986, via Generalized Estimating Equations (GEE), and by Breslow, Clayton and others around 1993, when the theory of Generalized Linear Mixed Models (GLMM) was born.
I can only imagine McFadden's and others' reaction to the nonsense "logistic regression is not a regression"...
Conditional expectation - the key to understanding the GLM
Every regression describes a relationship between the predictor and some function of the conditional response. It can be a quantile, Qτ(Y|X=x), as in quantile regression. Or some trimmed estimator of the expected value, as in robust regression. Or the expected value of the conditional response (= the conditional expectation) itself, as in classic linear regression: E(Y|X=x).
/ so often confused with one of its estimation algorithms --> "OLS regression" - don't repeat that mistake. /
Now, it's all about the conditional distribution. If it's Gaussian (the normal distribution), you obtain linear regression. But the GLM also allows you to use other distributions: Bernoulli (or binomial), gamma, Poisson, negative binomial, etc. The problem is that then the conditional expectations are no longer linearly related to the predictor, which is something we really want. That's why we have the link function, linking the conditional expectation and the predictor for a given conditional distribution: g(E(Y|X=x)) = Xb (sometimes you will see this formula reversed: E(Y|X=x) = g⁻¹(Xb); it's an equivalent formulation).
Now the expected values are "linearized" with respect to the predictor. For ordinary linear regression you don't need that, so g() is just I(), the identity function, which we omit - the expected values lie on a straight line, plane, or hyperplane (depending on how many predictors you have).
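A quick way to see the link at work, continuing the hypothetical glm() fit from the sketch above: on the link scale the fitted conditional expectations are exactly linear in the predictor; on the response scale they are the sigmoid-transformed probabilities.

```r
lp <- predict(fit, type = "link")      # g(E(Y|X=x)) = Xb, linear in the predictor
p  <- predict(fit, type = "response")  # E(Y|X=x) = g^-1(Xb), the probabilities
all.equal(p, plogis(lp))               # TRUE: the link is just a transformation
```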
/ The meaning of "conditional expectation" is also perfectly visible when you do ANOVA - that's a perfect 1:1 example: the levels of the categorical predictor(s) "form" sub-distributions, and the mean is calculated in each one. Now you also understand what "expected value CONDITIONAL on the predictor" means! /
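A tiny illustration of that ANOVA intuition, again on made-up data: the fitted values of a linear model with a categorical predictor are exactly the per-group sample means, i.e. the expected values conditional on the predictor.

```r
set.seed(1)
group <- gl(3, 30, labels = c("A", "B", "C"))   # a categorical predictor
y     <- rnorm(90, mean = c(10, 12, 15)[group]) # one sub-distribution per level

tapply(y, group, mean)                   # E(Y | group): one mean per level
unique(round(fitted(lm(y ~ group)), 6))  # the same means, obtained via regression
```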
Below we can observe various conditional distributions and their means. The means lie on a straight line transformed by the g() function, the link.
/ OK, I know, the illustration isn't perfect, simplifications are made, but let's agree on its imperfection, as long as it shows the main idea, huh? /
BTW: this is well explained in a book I recommend reading:
Now, let's answer a few questions:
I hope you can see from this that logistic regression, like any other regression, predicts a numerical outcome, NOT a categorical one.
Q: But, Adrian! In my preferred ML toolkit the logistic regression returns just classes!
A: Sure, because ML focuses on classification, so it takes an ADDITIONAL STEP and turns the probabilities into class labels. In other words, your procedure turns the logistic regression into a logistic classifier. The two are NOT the same and serve DIFFERENT purposes!
How is the logistic regression turned into a classifier?
The outcome of the logistic regression - the conditional probability (which is why logistic regression is also called a "direct probability estimator") - is subjected to an IF-THEN-ELSE decision rule, which compares it against some threshold (usually 0.5, but this shouldn't be taken for granted!) and returns a category:
IF (p < 0.5) THEN A ELSE B
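In R, that extra step could look like the sketch below, reusing the hypothetical fit from earlier (the 0.5 threshold and the labels A/B are, of course, just assumptions):

```r
p <- predict(fit, type = "response")          # the regression output: P(Y=1 | X=x)
predicted_class <- ifelse(p < 0.5, "A", "B")  # the ADDITIONAL classification step
```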
- Wait, but this is NOT a regression! This USES the regression prediction instead!
Glad you spotted it!
Too often people do not, and they just repeat that "logistic regression predicts a binary outcome". And when I ask them "but what about the regression term in its name, which means it should predict a numerical value?", they respond: "Oh! It's a misnomer! Despite its name, logistic regression isn't a regression, because it doesn't predict a numerical outcome!".
In other words, they do something like this:
... making a direct jump from binary input to binary output:
But notice that they did not change the name accordingly. Instead of calling it a "Logistic Classifier", the ML community kept the name "Logistic Regression". We could say they "appropriated" the logistic regression.
Consequently, they have problems with justifying the existing name.
Isn't this just crazy?
Now please, re-read the points 1-6 to see how ridiculous this approach is.
Despite the numerous regression-related problems in which the logistic regression is used every day, the situation looks like this:
So, once and for all, let's recall the difference between a logistic regression and a logistic classifier:
But everyone uses logistic regression for classification!
Ah, argumentum ad populum ;]
OK then:
So while I can understand someone saying that "in ML, logistic regression is a classification algorithm", I cannot agree that "logistic regression is not a regression". A single specific application, which adds extra steps and produces a different (categorized) output, does not invalidate the "core" engine.
The fact that a tomato can be used to cook a soup (involving many steps) does not mean that "tomato is not a fruit - it is a misnomer, because tomato is a soup ingredient". It's that simple.
Look at how logistic regression was described years before this "non-regression" nonsense, when statisticians were developing the basic tools now called "Machine Learning". This is an excerpt from Prof. Harrell's paper, The Practical Value of Logistic Regression:
See the sentence "[...] of choice for many regression-type problems [...]"? It may sound weird to ML and Data Science specialists, but that's exactly how statisticians treat and use this very element of the Generalized Linear Model. Even more: this is exactly why it was invented and further developed by Berkson, McFadden, Cox, and others.
By the way, Professor Frank Harrell wrote a series of papers (and covered it also in his book, "Regression Modeling Strategies") about applying the ordinal logistic regression (aka the proportional-odds model) to numerical data. This way you can, for example, test hypotheses (for any number of categorical predictors = factors and their interactions, also adjusted for numerical covariates!) in a distribution-free manner. Surprised? But the Mann-Whitney (aka Wilcoxon) and Kruskal-Wallis tests are nothing but special cases of the ordinal logistic regression! Even better, you can obtain the empirical CDF of the data and estimate both the arithmetic mean and quantiles from it! Check the "rms" R package and this website - the digitized version of his famous Regression Modeling Strategies book: https://hbiostat.org/rmsc/cony
See? An ordinal regression model used for numerical data, like any other regression model, allows you to estimate the empirical CDF and predict means and quantiles - see the sketch below!
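Here is a minimal sketch of that workflow with the rms package, on hypothetical two-arm data (the data are made up; the functions orm(), Mean(), Quantile() and Predict() are used as documented in rms):

```r
library(rms)

# Hypothetical two-arm numeric data
set.seed(1)
d <- data.frame(group = gl(2, 50, labels = c("ctrl", "trt")))
d$y <- rexp(100, rate = ifelse(d$group == "trt", 0.5, 1))
dd <- datadist(d); options(datadist = "dd")

f <- orm(y ~ group, data = d)    # ordinal logistic (proportional-odds) regression
anova(f)                         # Wald test, closely related to Wilcoxon/Kruskal-Wallis

M   <- Mean(f)                   # builds a function estimating the conditional mean
qu  <- Quantile(f)
med <- function(lp) qu(0.5, lp)  # conditional median
Predict(f, group, fun = M)       # distribution-free estimate of E(Y|group)
Predict(f, group, fun = med)     # distribution-free estimate of the median per arm
```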
If ordinal logistic regression is a regression, and multinomial logistic regression is a regression, then why is the "ordinary" logistic regression NOT a regression? Can you see the nonsense in denying its regression nature?
Q: But Adrian, logistic regression returns a probability, which is used for classification, so the true nature of the logistic regression is a classifier anyway!
A: OK, but the fact that something exhibits "some nature" doesn't invalidate its "original nature", especially since the "original nature" drove its invention.
In the range of (roughly) 0.2-0.8, the sigmoid curve can be approximated by a linear segment, for instance one obtained from the... linear regression. You can treat the resulting prediction as a probability and use it for classification (in the past this was indeed done, under the name "Linear Probability Model"; Link 1, Link 2). Does that make linear regression a classifier? (check-mate, ML?) Well, probably... yes, partially, in the given range - because why not? Does it mean that "linear regression is NOT a regression"? I guess not. See the sketch below.
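A quick sanity check of that claim, as a sketch on simulated data: in the middle of the probability range, a plain lm() on the 0/1 outcome (the Linear Probability Model) tracks the logistic probabilities closely.

```r
set.seed(3)
x <- runif(500, 0, 10)
y <- rbinom(500, 1, plogis(-2.5 + 0.5 * x))

p_logit <- predict(glm(y ~ x, family = binomial), type = "response")
p_lpm   <- predict(lm(y ~ x))         # Linear Probability Model

mid <- p_logit > 0.2 & p_logit < 0.8  # the roughly-linear part of the sigmoid
summary(abs(p_logit - p_lpm)[mid])    # discrepancies are small in this range
```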
It only shows that one out of many applications is possible.
Regression-related applications of the logistic regression (and its friends)
Many times I have mentioned that logistic regression is used by me and other statisticians for non-classification, regression tasks. Believe me, there is NO difference from any other regression!
In my field, clinical trials, I use the logistic regression on an almost daily basis for:
Well, definitely - very "non-regression" applications. All "misnomers" - "misnomers everywhere"...
Friends of the logistic regression
Logistic regression has many friends that were invented to address various regression-related problems. Let us enumerate and briefly describe them:
... be my sweet model ...
I really like the name "model". It's a very... "inclusive" name. A model can have many purposes. I use the logistic model for regression related tasks: inference about the model effects and predictions. I also use it to check the MCAR (missing completely at random) missing data pattern.
You can use it to derive a classifier (that could be derived also from perceptron, for instance).
My colleague uses it for propensity score matching in observational studies.
It's also part of the Inverse Probability Weighting (IPW) method, which itself has multiple applications (e.g. handling monotone dropout with GEE estimation).
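Both of those uses rely on the fitted probabilities, not on class labels. A minimal sketch (entirely hypothetical variable names and model):

```r
set.seed(7)
d <- data.frame(age = rnorm(300, 50, 10), female = rbinom(300, 1, 0.5))
d$treated <- rbinom(300, 1, plogis(-3 + 0.05 * d$age + 0.3 * d$female))

# Logistic regression of treatment assignment on baseline covariates
ps_model <- glm(treated ~ age + female, family = binomial, data = d)
d$ps <- predict(ps_model, type = "response")              # propensity score
d$w  <- ifelse(d$treated == 1, 1 / d$ps, 1 / (1 - d$ps))  # IPW weights
```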
Maybe we should call it a Poisson model? A logistic model? A linear model?
... ... ...
PS: but still, the logistic model provides the E(Y|X=x), doesn't it?
Literature
I will populate this chapter with textual references later. For now, find the "collage" of covers. And believe me, none of these books will say that "logistic regression is not a regression" :)
+ recently found an excellent one:
Other authors also prove it can be done properly:
Ad hoc comments from my readers
A: Of course they did! It's a book about machine learning, so this kind of application is of interest and highly expected. BUT they never said it's not a regression model. They both also wrote a series of articles on applying proportional hazards models and the logistic regression in biostatistical settings (they worked in a division of biostatistics) in the regression manner (assessment of prognostic factors, assessment of the treatment effect), and there they call it a regression model.
Also, in the book you mention, on pages 121-122 and in the examples that follow, they say: "Logistic regression models are used mostly as a data analysis and inference tool, where the goal is to understand the role of the input variables in explaining the outcome. Typically many models are fit in a search for a parsimonious model involving a subset of the variables, possibly with some interaction terms."
A:
A: ChatGPT will repeat what it was trained on. Don't rely on it too strictly when you are learning a new topic, because what you will be told strongly depends on how you ask. It was trained on a mix of good and bad resources, so sometimes the valid one is "allowed to speak", but just a few questions later it may be messing things up again. This pertains to ANY kind of topic, not only statistics. ALWAYS verify the responses of any AI-based system if you are going to learn from it, pass your exams or an interview, or do your job.
PS: I was told that the newest version of ChatGPT is much better, so give it a try.
A: Either use the name "logistic classifier", to highlight that it uses the regression "engine" under the hood, or state it precisely, for example: "Although logistic regression was originally invented to solve regression problems (McFadden, Cox, Nelder, Wedderburn, Hastie, Tibshirani) and is used this way by statisticians nowadays (for example in experimental research), Machine Learning specialists use it exclusively for classification purposes, adding one more step - a conditional decision rule based on a threshold - and turning the predicted conditional expectation into a classifier".
OK, just to summarize:
I hope that after reading this story, the "inner temptation" to repeat "logistic regression is not a regression" can be silenced once and for all.