#26) The Butterfly Effect, The Ripple Effect--sigh...
My faithful and loyal grasshoppers. Last time, Dave Sensei promised you butterflies, and frickin' butterflies is what you are going to get. Nothing is too good for my students, and I spare no verbal expense in filling your hearts with joy. As my father used to say, God rest his soul, "Let us go forth. Or maybe even fifth..."
5.3) Defining The Chain Rule: It's Like The Butterfly Effect
OK. Here we go. Remember that the key overall problem we are trying to solve is, "How much do I adjust the 16 weights of syn0 and syn1 in order to synch each value perfectly with the other 15 bowling pins, to keep them all in the air and minimize the l2_error so I can arrive at the castle and the meaning of life?"
Let's break that big question down into tiny little pieces. Instead of looking at 16 bowling pins--er, I mean weights--at a time, let's just look at 1 of the 16--syn0 (1,1), which we have nicknamed syn0,1 for simplicity.
There, that's better. Now my question is this: How do I adjust syn0,1 in an orderly, interdependent manner so that all 16 bowling pins don't fall out of my hands while I am reducing the l2_error as much as possible?
The answer to that question is the chain rule, which is kind of like The Butterfly Effect, where the butterfly flapping its wings in New Mexico sets off a chain of events that results in a hurricane in China. (Did you notice? "chain of events" sounds a tad like "chain rule" in calculus, no? Oh Dave, so very clever...). Let's look at a picture of how this analogy might apply to the series of ratios we must calculate in back propagation:
So: a butterfly flaps its wings once, setting off an escalating chain reaction. In New Mexico, the tiny air current from the wing flap combines with a "perfect storm" of other events to create a gust of wind in neighboring Nevada, which combines with other weather to create heavy winds in L.A., which creates a thunderstorm in Hawaii...you get the idea.
Now, let's walk through the math of our butterfly analogy: When we increase or decrease the value of syn0,1, that's like the butterfly in New Mexico flapping its wings. For simplicity's sake, let's say we just increased our syn0,1 to 3.66. This increase will now ripple through our chain of events--our "perfect storm" of other weights combining perfectly to reduce the l2_error.
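To see the wing flap in actual numbers, here is a minimal sketch. It is not the network code from this series: it assumes a single made-up training example `x`, a target `y` of 1, and evenly spaced placeholder weights (all hypothetical). It sets syn0,1 (that's `syn0[0, 0]` in NumPy) to 3.66, leaves the other 15 weights alone, and prints each downstream value before and after the flap.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, syn0, syn1, y):
    """One forward pass, returning every stop along the chain of events."""
    l1_LH = x @ syn0          # input times the first set of weights
    l1 = sigmoid(l1_LH)       # hidden layer
    l2_LH = l1 @ syn1         # hidden layer times the second set of weights
    l2 = sigmoid(l2_LH)       # the prediction
    l2_error = y - l2         # how far off the prediction is
    return l1_LH, l1, l2_LH, l2, l2_error

# Assumptions for illustration only: one example, placeholder weights.
x = np.array([1.0, 0.0, 1.0])
y = 1.0
syn0 = np.linspace(-1, 1, 12).reshape(3, 4)   # 12 weights
syn1 = np.linspace(-1, 1, 4).reshape(4, 1)    # 4 weights, 16 in total

before = forward(x, syn0, syn1, y)

# The butterfly flaps: change syn0,1 only.
syn0_flapped = syn0.copy()
syn0_flapped[0, 0] = 3.66
after = forward(x, syn0_flapped, syn1, y)

names = ["l1_LH", "l1", "l2_LH", "l2", "l2_error"]
for name, b, a in zip(names, before, after):
    # np.ravel(...)[0] picks the one entry that syn0,1 feeds into
    print(f"{name}: {np.ravel(b)[0]:.4f} -> {np.ravel(a)[0]:.4f}")
```

Depending on the other weights, a flap can push l2_error up just as easily as down. Working out which direction to flap, and how hard, is the job of the chain rule.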
One can never have too much beauty in the world, so now let's combine our butterfly effect with another beautiful analogy, the ripple effect:
5.4) The Butterfly Effect Meets Our 5 Ratios of Change: the Ripple Effect
Loyal friends, you are no doubt aware that the butterfly effect is merely an example of a more general concept, the ripple effect. The diagram below will joyfully combine our butterfly effect with the ripple effect of how one ratio of change affects the next. This diagram looks complicated, but it actually is made up of only three things:
- The same white circles we saw above, which describe the butterfly effect from left-to-right;
- At the bottom of the picture, there is a row of squares which contain the ratios of change in the back prop chain rule. These squares move backwards from right-to-left, i.e., a change in the right-most ratio ripples through the other ratios to affect the left-most ratio.
- Some artful colored arrows connect the two in a rather pleasing manner.
Ripple 1, the first ripple effect of our tweak to syn0,1 (the butterfly wing flap), will cause l1_LH to change by a certain proportion, aka a ratio of change. That's the "gust of wind in Nevada" (see the grey line below).
Since l1_LH is the input of our sigmoid function and l1 is its output, to calculate the ratio of change between l1_LH and l1 (aka "to take the slope") is to measure Ripple 2, the "heavy winds in L.A." (purple line below).
Then, l2_LH will obviously be affected in proportion to the change in l1 and its subsequent multiplication by syn1,1 (which does not change-- we're leaving syn1,1 and the other 14 weights unchanged for now, to simplify our example), so measuring that ratio of change will give us Ripple 3, the "thunderstorm in Hawaii." (yellow line below)
You can probably guess l2 will change in proportion to the change in l2_LH, so taking the slope of the sigmoid at l2_LH will give us Ripple 4, the "storm over the Pacific" (green line below).
Finally, when this new l2 is subtracted from our target y value, the remainder, which is l2_error, will change, and this is Ripple 5, the "hurricane in China" (light blue line below).
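The five ripples above can be written out as five separate numbers. This is a rough sketch under stated assumptions, not the code from this series: one made-up training example, placeholder weights, and the standard sigmoid slope formula `out * (1 - out)`. The names `ripple_1` through `ripple_5` are mine, for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumptions for illustration only: one example, placeholder weights.
x = np.array([1.0, 0.0, 1.0])
y = 1.0
syn0 = np.linspace(-1, 1, 12).reshape(3, 4)
syn1 = np.linspace(-1, 1, 4).reshape(4, 1)

# Forward pass, so every stop along the chain has a value.
l1_LH = x @ syn0
l1 = sigmoid(l1_LH)
l2_LH = l1 @ syn1
l2 = sigmoid(l2_LH)
l2_error = y - l2

# Each ripple is one local ratio of change, for syn0,1 = syn0[0, 0].
ripple_1 = x[0]                    # l1_LH per unit of syn0,1 (gust in Nevada)
ripple_2 = l1[0] * (1 - l1[0])     # l1 per unit of l1_LH (winds in L.A.)
ripple_3 = syn1[0, 0]              # l2_LH per unit of l1 (thunderstorm in Hawaii)
ripple_4 = l2[0] * (1 - l2[0])     # l2 per unit of l2_LH (storm over the Pacific)
ripple_5 = -1.0                    # l2_error per unit of l2, since l2_error = y - l2

for i, r in enumerate([ripple_1, ripple_2, ripple_3, ripple_4, ripple_5], start=1):
    print(f"Ripple {i}: {r:.4f}")
```

Notice that Ripple 1 is just the input value and Ripple 3 is just the weight syn1,1: when a value gets multiplied by something fixed, that fixed something is the ratio of change.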
Our goal is to calculate the ratio by which each ripple ripples, in order to know the amount we want to increase/decrease syn0,1 in order to minimize l2_error on our next iteration (the dark blue arc line points to both the input and the output of our chain rule function). When we say our neural network "learns," we really mean it reduces l2_error with each iteration such that the network's predictions become more and more accurate each time. So, tweaking syn0,1 is like tweaking the flap of the butterfly's wings, which ripples through a chain of events right up to the hurricane in China, which in our example is the reduction of l2_error.
Since we're working backwards, we would say, "How much the hurricane l2_error changes depends on how much l2 changes, which depends on how much l2_LH changes, which depends on how much l1 changes, which depends on how much l1_LH changes, which depends on how much our butterfly, syn0,1, changes." The unspecified rule used in this example is that the size of the step is equal to the slope.
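That "depends on, which depends on" sentence is the chain rule: multiply the ratios of change together, hurricane first, butterfly last. Here is a hedged sketch that does the multiplication and then double-checks it by giving syn0,1 a tiny nudge and measuring what actually happens to l2_error. The single training example, the placeholder weights, and the helper name `l2_error_for` are all my assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_error_for(syn0, syn1, x, y):
    """Forward pass that returns only the final error as a plain number."""
    l1 = sigmoid(x @ syn0)
    l2 = sigmoid(l1 @ syn1)
    return (y - l2)[0]

# Assumptions for illustration only: one example, placeholder weights.
x = np.array([1.0, 0.0, 1.0])
y = 1.0
syn0 = np.linspace(-1, 1, 12).reshape(3, 4)
syn1 = np.linspace(-1, 1, 4).reshape(4, 1)

l1 = sigmoid(x @ syn0)
l2 = sigmoid(l1 @ syn1)

# Chain rule: multiply the five ratios, working backwards from the hurricane.
chain = (
    -1.0                      # l2_error per unit of l2
    * l2[0] * (1 - l2[0])     # l2 per unit of l2_LH
    * syn1[0, 0]              # l2_LH per unit of l1
    * l1[0] * (1 - l1[0])     # l1 per unit of l1_LH
    * x[0]                    # l1_LH per unit of syn0,1
)

# Sanity check: flap the wings a tiny bit and measure the hurricane directly.
eps = 1e-6
syn0_nudged = syn0.copy()
syn0_nudged[0, 0] += eps
measured = (l2_error_for(syn0_nudged, syn1, x, y)
            - l2_error_for(syn0, syn1, x, y)) / eps

print("chain rule:", chain)
print("measured:  ", measured)
```

The two printed numbers should agree to several decimal places. That agreement is the whole point: the chain rule tells us how much the hurricane changes per flap without having to re-run the weather for every one of the 16 weights.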
That's all well and good, but what I really want you to understand is how our Python code synchs up with these ratios of change that ripple from a change in syn0,1 all the way through to a change in l2_error. So tomorrow, let's take a look at which lines of code align with which ratios of change in our chain rule function. Why? Because I've grown fond of you over the weeks. And also, ya just gotta frickin' learn it.
Ta ta for now, but here's a little something to remember me by--this Peruvian Alpaca may look a tad stupid, I acknowledge that. But don't be fooled by that straw hanging out of his mouth--he actually holds a PhD in computer science and is an AI genius! Never judge a book by its cover...