A dramatic explanation of convex optimization and gradient descent to a group of ER physicians in a language they understand all too well

Q: Most algorithms in medicine are carefully hand-crafted rules built on years of research. How on earth does “AI” “learn” the optimal solution on its own?

A: Imagine that a diabetic patient involved in a road traffic accident is rushed into the ER in shock - severely hypotensive (low blood pressure), hypoglycemic (low blood sugar), anemic (low red cell count), and dehydrated. This patient obviously needs an “unknown” combination of life-saving fluids - saline, dextrose, plasma expanders, maybe even a blood transfusion, with a cautious amount of epinephrine. These are the levers (weights and biases) the network can adjust autonomously to achieve its objective. Too much or too little could result in the loss of the patient (meaning lawsuits flying left and right), so it has to get it just right (bias-variance tradeoff). 

Before the fluids are given, you set up a central line to continuously measure the difference between the current (low) arterial blood pressure and the expected/normal BP (the loss function). The training objective is to lower (minimize) this gap. 
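For the programmatically curious, here is a minimal sketch of that gap written as a loss function. The target BP of 120 and the squared gap are my own illustrative choices, not clinical guidance:

```python
# A toy loss: the squared gap between the current BP and the expected/normal BP.
# Squaring keeps the loss positive and makes the curve bowl-shaped (convex).

def bp_gap_loss(current_bp: float, target_bp: float = 120.0) -> float:
    return (current_bp - target_bp) ** 2

print(bp_gap_loss(80.0))    # far from target  -> large loss (1600.0)
print(bp_gap_loss(118.0))   # close to target  -> small loss (4.0)
```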

The “AI doctor” begins with a random combination of fluids (random weight initialization) — NEVER do this in a real clinical setting — and checks the blood pressure, repeating this check after every update to the fluid combination, ready to adjust the quantity and speed of the fluids (weights) based on the movement of the blood pressure difference (loss). To prevent overly drastic changes that could put the patient at risk, we set a tolerance value (the learning rate) that modulates each change.
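In code, that setup might look like the toy snippet below. The fluid names, dose ranges, and the learning rate value are all invented for illustration:

```python
import random

random.seed(0)  # reproducible randomness for the example

# Random starting combination of fluids (random weight initialization).
fluids = {
    "saline_ml_per_hr": random.uniform(0.0, 500.0),
    "dextrose_ml_per_hr": random.uniform(0.0, 200.0),
    "epinephrine_mcg_per_min": random.uniform(0.0, 10.0),
}

# The "tolerance value" that keeps any single adjustment small.
learning_rate = 0.01

print(fluids)
```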

The “training” process begins by infusing this random combination of fluids. The blood pressure gap predictably worsens on the first read. An urgent adjustment (weight update) to the combination is paramount.

Q: But how do machines know how to make the correct kind of change? Do they make countless random changes (brute force) hoping to get it right? 

A: No. The changes are more systematic and intelligent. Recall that our goal (training objective) is to keep tweaking the fluid combination progressively until the blood pressure equals the expected blood pressure (of a healthy person), that is, until the difference between actual and expected is zero (the global minimum). A change that shrinks this gap (towards zero) is a desirable one, while a change that widens the gap is not. If a desirable change is achieved, we want to keep moving in that direction. An undesirable change should trigger a move in the opposite direction.

If you plot all possible amounts of adrenaline on the x-axis and the resulting BP gap on the y-axis, you see that as the dose increases from zero, the BP gap gradually improves towards zero (negative slope), plateaus at some point (zero slope), and then begins to increase or worsen (positive slope) as the dose keeps rising. From left to right this traces a V or U shape: improvement (a shrinking BP gap) down to zero, followed by a worsening (widening) gap. A loss function with this behavior is said to be convex. In mathematics, the systematic process of reaching the bottom of this convex surface is called Convex Optimization.
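To make the U-shape concrete, here is a toy model in which the simulated BP rises linearly with the adrenaline dose and the loss is the squared gap to a target of 120. Both the dose-response and the numbers are made up purely to reproduce the curve described above:

```python
def simulated_bp(dose: float) -> float:
    return 60.0 + 12.0 * dose               # hypothetical dose-response

def bp_gap_loss(dose: float, target_bp: float = 120.0) -> float:
    return (simulated_bp(dose) - target_bp) ** 2

for dose in [0, 2, 4, 5, 6, 8, 10]:
    print(f"dose={dose:>2}  loss={bp_gap_loss(dose):7.1f}")

# The loss falls towards zero around dose=5 and then rises again:
# a convex, U-shaped curve.
```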

After each update to our fluid combination, if the resulting blood pressure gap lands on either arm of the V (positive or negative slope), we want the next step to carry us further down that slope (descent) rather than jumping around randomly in the hope of hitting the bottom. We therefore check the slope (or gradient) at the end of each change in fluid combination (called an iteration).

The magic of machine learning therefore rests on a simple concept in calculus called the slope (gradient). Explained in clinical terms: if I add a very, very tiny amount of adrenaline (a weight update), what is the resulting change (delta) in the blood pressure gap (the loss)? The tiny change in the blood pressure gap divided by the tiny change in adrenaline gives you the slope (gradient), the same high school “rise over run”, or change in y over change in x.
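Here is the same idea as a few lines of code, reusing the invented dose-response from the sketch above (toy numbers only):

```python
def bp_gap_loss(dose: float, target_bp: float = 120.0) -> float:
    bp = 60.0 + 12.0 * dose                   # hypothetical dose-response
    return (bp - target_bp) ** 2

def slope(dose: float, eps: float = 1e-6) -> float:
    # Rise over run: tiny change in loss divided by tiny change in dose.
    return (bp_gap_loss(dose + eps) - bp_gap_loss(dose)) / eps

print(slope(2.0))   # negative: left arm of the U, keep increasing the dose
print(slope(8.0))   # positive: right arm of the U, back the dose off
```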

Through the rise over run principle, we know that if the resulting slope is positive, we are on the right (upward) arm of the V and should reduce the adrenaline to head downhill (the opposite direction). Conversely, if the slope is negative, we are on the left (downward) arm of the V and should keep increasing the adrenaline gradually, pushing the BP gap down towards zero (the global minimum). This slope-driven process keeps adjusting the amount of adrenaline until the expected BP is reached. This is gradient descent, sketched in code below.

[Figure omitted: gradient descent illustration. Image credit: https://rasbt.github.io/mlxtend/user_guide/general_concepts/gradient-optimization/]
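Putting it all together, a bare-bones gradient descent loop over the toy loss looks like this. It is a sketch of the technique, not a treatment protocol; the starting dose, learning rate, and dose-response are invented:

```python
def bp_gap_loss(dose: float, target_bp: float = 120.0) -> float:
    bp = 60.0 + 12.0 * dose                   # hypothetical dose-response
    return (bp - target_bp) ** 2

def slope(dose: float, eps: float = 1e-6) -> float:
    return (bp_gap_loss(dose + eps) - bp_gap_loss(dose)) / eps

dose = 0.5             # random-ish starting dose
learning_rate = 0.001  # keeps each adjustment small

for step in range(200):
    g = slope(dose)
    dose -= learning_rate * g   # positive slope -> reduce dose; negative -> increase

print(round(dose, 3), round(bp_gap_loss(dose), 6))  # dose -> ~5.0, loss -> ~0.0
```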

At or near the target BP gap of zero, we have reached the bottom of the V- or U-shaped curve and the slope is zero, so there is no downhill direction left to move in and training effectively “stalls”. It is typical to interrupt training at this point, indicating that the optimal fluid combination has been found.
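In code, that stopping rule amounts to updating only while the slope is meaningfully non-zero. The tolerance below is an arbitrary choice for the sketch:

```python
def bp_gap_loss(dose: float, target_bp: float = 120.0) -> float:
    bp = 60.0 + 12.0 * dose                    # hypothetical dose-response
    return (bp - target_bp) ** 2

def slope(dose: float, eps: float = 1e-6) -> float:
    return (bp_gap_loss(dose + eps) - bp_gap_loss(dose)) / eps

dose, learning_rate, tol = 0.5, 0.001, 0.01
steps = 0
while abs(slope(dose)) > tol:      # slope ~ 0 means we are at the bottom
    dose -= learning_rate * slope(dose)
    steps += 1

print(steps, round(dose, 4))       # training "stalls" once the gap is closed
```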

Typically, this optimal fluid combination (a complex formula, or function) is stored for future reuse on similar patients. It also represents a much better starting point for training on dissimilar patients than a fresh random initialization of fluids, significantly reducing the time required for future training. Initializing training with previously learned parameters in this way is called transfer learning.
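As a final toy illustration of the warm-start idea: train on one “patient”, then reuse the learned dose as the starting point for a similar patient instead of starting from scratch. Both patients' parameters are invented for the sketch:

```python
def make_loss(baseline_bp: float, response: float, target_bp: float = 120.0):
    # Returns a BP-gap loss for a hypothetical patient.
    return lambda dose: (baseline_bp + response * dose - target_bp) ** 2

def train(loss, dose: float, lr: float = 0.001, tol: float = 0.01):
    slope = lambda d: (loss(d + 1e-6) - loss(d)) / 1e-6
    steps = 0
    while abs(slope(dose)) > tol:
        dose -= lr * slope(dose)
        steps += 1
    return dose, steps

learned_dose, _ = train(make_loss(60.0, 12.0), dose=0.5)           # first patient
_, cold_steps = train(make_loss(65.0, 11.0), dose=0.5)             # random-ish start
_, warm_steps = train(make_loss(65.0, 11.0), dose=learned_dose)    # warm start

print(cold_steps, warm_steps)  # the warm start converges in far fewer steps
```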

