A dramatic explanation of convex optimization and gradient descent to a group of ER physicians in a language they understand all too well

Q: Most algorithms in medicine are carefully hand-crafted rules built on years of research. How on earth does “AI” “learn” the optimal solution on its own?

A: Imagine that a diabetic patient involved in a road traffic accident is rushed into the ER in shock - severely hypotensive (low blood pressure), hypoglycemic (low blood sugar), anemic (low red cell count), and dehydrated. This patient obviously needs an “unknown” combination of life-saving fluids - saline, dextrose, plasma expanders, maybe even a blood transfusion, with a cautious amount of epinephrine. These are the levers (weights and biases) the network can adjust autonomously to achieve its objective. Too much or too little could result in the loss of the patient (meaning lawsuits flying left and right), so it has to get it just right (bias-variance tradeoff). 

Before the fluids are given, you set up a central line to continuously measure the difference between the current (low) arterial blood pressure and the expected/normal BP (the loss function). The training objective is to lower (minimize) this gap. 
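For the programmatically curious, here is a minimal sketch of that gap written as a loss function. The target BP of 120 and the squared gap are my own illustrative choices, not clinical guidance:

```python
# A toy loss: the squared gap between the current BP and the expected/normal BP.
# Squaring keeps the loss positive and makes the curve bowl-shaped (convex).

def bp_gap_loss(current_bp: float, target_bp: float = 120.0) -> float:
    return (current_bp - target_bp) ** 2

print(bp_gap_loss(80.0))    # far from target  -> large loss (1600.0)
print(bp_gap_loss(118.0))   # close to target  -> small loss (4.0)
```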

The “AI doctor” begins with a random combination of fluids (random weight initialization) — NEVER do this in a real clinical setting — and checks the blood pressure, repeating this check after every update to the fluid combination, ready to adjust the quantity and speed of the fluids (weights) based on the movement of the blood pressure difference (loss). To prevent overly drastic changes that could put the patient at risk, we set a tolerance value (the learning rate) that modulates each change.
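In code, that setup might look like the toy snippet below. The fluid names, dose ranges, and the learning rate value are all invented for illustration:

```python
import random

random.seed(0)  # reproducible randomness for the example

# Random starting combination of fluids (random weight initialization).
fluids = {
    "saline_ml_per_hr": random.uniform(0.0, 500.0),
    "dextrose_ml_per_hr": random.uniform(0.0, 200.0),
    "epinephrine_mcg_per_min": random.uniform(0.0, 10.0),
}

# The "tolerance value" that keeps any single adjustment small.
learning_rate = 0.01

print(fluids)
```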

The “training” process begins by infusing this random combination of fluids. The blood pressure gap predictably worsens on the first read. An urgent adjustment (weight update) to the combination is paramount.

Q: But how do machines know how to make the correct kind of change? Do they make countless random changes (brute force) hoping to get it right? 

A: No. The changes are more systematic and intelligent. Recall that our goal (training objective) is to keep tweaking the fluid combination progressively until the blood pressure equals the expected blood pressure (of a healthy person), that is, until the difference between actual and expected is zero (the global minimum). A change that shrinks this gap (towards zero) is a desirable one, while a change that widens the gap is not. If a desirable change is achieved, we want to keep moving in that direction. An undesirable change should trigger a move in the opposite direction.

If you plot all possible amounts of adrenaline on the x-axis and the resulting BP gap on the y-axis, you see that as the dose increases from zero, the BP gap gradually improves towards zero (negative slope), plateaus at some point (zero slope), and then begins to increase or worsen (positive slope) as the dose keeps rising. From left to right this traces a V or U shape: improvement (a shrinking BP gap) down to zero, followed by a worsening (widening) gap. A loss function with this behavior is said to be convex. In mathematics, the systematic process of reaching the bottom of this convex surface is called Convex Optimization.
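To make the U-shape concrete, here is a toy model in which the simulated BP rises linearly with the adrenaline dose and the loss is the squared gap to a target of 120. Both the dose-response and the numbers are made up purely to reproduce the curve described above:

```python
def simulated_bp(dose: float) -> float:
    return 60.0 + 12.0 * dose               # hypothetical dose-response

def bp_gap_loss(dose: float, target_bp: float = 120.0) -> float:
    return (simulated_bp(dose) - target_bp) ** 2

for dose in [0, 2, 4, 5, 6, 8, 10]:
    print(f"dose={dose:>2}  loss={bp_gap_loss(dose):7.1f}")

# The loss falls towards zero around dose=5 and then rises again:
# a convex, U-shaped curve.
```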

After each update to our fluid combination, if the resulting blood pressure gap lands on either arm of the V (positive or negative slope), we want the next step to carry us further down that slope (descent) rather than jumping around randomly in the hope of hitting the bottom. We therefore check the slope (or gradient) at the end of each change in fluid combination (called an iteration).

The magic of machine learning therefore rests on a simple concept in calculus called the slope (gradient). Explained in clinical terms: if I add a very, very tiny amount of adrenaline (a weight update), what is the resulting change (delta) in the blood pressure gap (the loss)? The tiny change in the blood pressure gap divided by the tiny change in adrenaline gives you the slope (gradient), the same high school “rise over run”, or change in y over change in x.
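Here is the same idea as a few lines of code, reusing the invented dose-response from the sketch above (toy numbers only):

```python
def bp_gap_loss(dose: float, target_bp: float = 120.0) -> float:
    bp = 60.0 + 12.0 * dose                   # hypothetical dose-response
    return (bp - target_bp) ** 2

def slope(dose: float, eps: float = 1e-6) -> float:
    # Rise over run: tiny change in loss divided by tiny change in dose.
    return (bp_gap_loss(dose + eps) - bp_gap_loss(dose)) / eps

print(slope(2.0))   # negative: left arm of the U, keep increasing the dose
print(slope(8.0))   # positive: right arm of the U, back the dose off
```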

Through the rise over run principle, we know that if the resulting slope is positive, we are on the right (upward) arm of the V and should reduce the adrenaline to head downhill (the opposite direction). Conversely, if the slope is negative, we are on the left (downward) arm of the V and should keep increasing the adrenaline gradually, pushing the BP gap down towards zero (the global minimum). This slope-driven process keeps adjusting the amount of adrenaline until the expected BP is reached. This is gradient descent, sketched in code below.

[Figure omitted: gradient descent illustration. Image credit: https://rasbt.github.io/mlxtend/user_guide/general_concepts/gradient-optimization/]
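Putting it all together, a bare-bones gradient descent loop over the toy loss looks like this. It is a sketch of the technique, not a treatment protocol; the starting dose, learning rate, and dose-response are invented:

```python
def bp_gap_loss(dose: float, target_bp: float = 120.0) -> float:
    bp = 60.0 + 12.0 * dose                   # hypothetical dose-response
    return (bp - target_bp) ** 2

def slope(dose: float, eps: float = 1e-6) -> float:
    return (bp_gap_loss(dose + eps) - bp_gap_loss(dose)) / eps

dose = 0.5             # random-ish starting dose
learning_rate = 0.001  # keeps each adjustment small

for step in range(200):
    g = slope(dose)
    dose -= learning_rate * g   # positive slope -> reduce dose; negative -> increase

print(round(dose, 3), round(bp_gap_loss(dose), 6))  # dose -> ~5.0, loss -> ~0.0
```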

At or near the target BP gap of zero, we have reached the bottom of the V- or U-shaped curve and the slope is zero, so there is no downhill direction left to move in and training effectively “stalls”. It is typical to interrupt training at this point, indicating that the optimal fluid combination has been found.
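In code, that stopping rule amounts to updating only while the slope is meaningfully non-zero. The tolerance below is an arbitrary choice for the sketch:

```python
def bp_gap_loss(dose: float, target_bp: float = 120.0) -> float:
    bp = 60.0 + 12.0 * dose                    # hypothetical dose-response
    return (bp - target_bp) ** 2

def slope(dose: float, eps: float = 1e-6) -> float:
    return (bp_gap_loss(dose + eps) - bp_gap_loss(dose)) / eps

dose, learning_rate, tol = 0.5, 0.001, 0.01
steps = 0
while abs(slope(dose)) > tol:      # slope ~ 0 means we are at the bottom
    dose -= learning_rate * slope(dose)
    steps += 1

print(steps, round(dose, 4))       # training "stalls" once the gap is closed
```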

Typically, this optimal fluid combination (a complex formula, or function) is stored for future reuse on similar patients. It also represents a much better starting point for training on dissimilar patients than a fresh random initialization of fluids, significantly reducing the time required for future training. Initializing training with previously learned parameters in this way is called transfer learning.
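As a final toy illustration of the warm-start idea: train on one “patient”, then reuse the learned dose as the starting point for a similar patient instead of starting from scratch. Both patients' parameters are invented for the sketch:

```python
def make_loss(baseline_bp: float, response: float, target_bp: float = 120.0):
    # Returns a BP-gap loss for a hypothetical patient.
    return lambda dose: (baseline_bp + response * dose - target_bp) ** 2

def train(loss, dose: float, lr: float = 0.001, tol: float = 0.01):
    slope = lambda d: (loss(d + 1e-6) - loss(d)) / 1e-6
    steps = 0
    while abs(slope(dose)) > tol:
        dose -= lr * slope(dose)
        steps += 1
    return dose, steps

learned_dose, _ = train(make_loss(60.0, 12.0), dose=0.5)           # first patient
_, cold_steps = train(make_loss(65.0, 11.0), dose=0.5)             # random-ish start
_, warm_steps = train(make_loss(65.0, 11.0), dose=learned_dose)    # warm start

print(cold_steps, warm_steps)  # the warm start converges in far fewer steps
```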

