Lessons in A.I. from a Budding Machine Learning Engineer — Getting to the Core — Part II
Introduction
In our last article we discussed prediction and how it relates to, yet differs from, inference in Frequentist Statistics. Some readers suggested that it would be helpful to explain in more depth how prediction applies to Machine Learning. Let’s quickly revisit prediction and connect it to Linear Algebra, including matrices, matrix equations, and Matrix Algebra.
Prediction — Core Ingredient to Machine Learning
As mentioned, in our previous articles on Linear Algebra we introduced the matrix equation:
Ax = b
where A is a matrix, x is a vector of unknowns, and b is a known vector. In Machine Learning it is common to rewrite this formula as:
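To make this concrete, here is a minimal sketch of solving Ax = b numerically with NumPy; the matrix and vector values below are made up purely for illustration.

```python
import numpy as np

# A made-up 2x2 matrix A and known vector b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Solve the matrix equation Ax = b for the unknown vector x.
x = np.linalg.solve(A, b)
print(x)  # [1. 3.]
```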
Xβ = y
We refer to X as the design matrix, β (beta) as the weight vector, and y as the vector of actual output values. The simplest relation between two variables x and y is the linear equation y = β0 + β1x; with multiple inputs this generalizes to:
y = β0 + β1x1 + … + βnxn.
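To see how this equation and Xβ = y line up, here is a small sketch (the feature and weight values are invented for illustration) that builds a design matrix with a leading column of ones for β0 and computes the predictions as a single matrix-vector product.

```python
import numpy as np

# Three samples with two input features each (made-up values).
x_features = np.array([[1.0, 2.0],
                       [2.0, 0.5],
                       [3.0, 1.0]])

# Prepend a column of ones so the first weight acts as the bias beta_0.
X = np.column_stack([np.ones(len(x_features)), x_features])

# A weight vector beta = (beta_0, beta_1, beta_2), also made up.
beta = np.array([0.5, 2.0, -1.0])

# y = X beta, i.e. y_i = beta_0 + beta_1*x_i1 + beta_2*x_i2 for each sample.
y = X @ beta
print(y)  # [0.5 4.  5.5]
```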
The training data, or samples, often come as points (x1, y1), …, (xn, yn) that lie close to some linear or non-linear curve. Note that we could also have multiple input variables (x1, x2, …, xn) or multiple output values (y1, y2, …, yn). Our goal is to produce, or predict, a curve (or family of curves) that is as close to the underlying data as computationally possible. You may also have noticed that the coefficient β0 has no input value x next to it. This is because β0 is formally called the bias (or intercept): it shifts the predicted curve so it does not have to pass through the origin, allowing the model to fit data that are not centered at zero. The sketch below ties these pieces together.
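Putting the pieces together, here is a minimal end-to-end sketch of prediction: generate training points that scatter around a line, recover β with ordinary least squares (one common way to fit the weights, not the only one), and predict y for new inputs. The specific numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training samples (x_i, y_i) scattered around the line y = 1.0 + 2.0 * x.
x_train = np.linspace(0.0, 5.0, 20)
y_train = 1.0 + 2.0 * x_train + rng.normal(scale=0.3, size=x_train.shape)

# Design matrix: a column of ones for the bias beta_0, then the inputs.
X = np.column_stack([np.ones_like(x_train), x_train])

# Ordinary least squares: find beta minimizing ||X beta - y||^2.
beta, *_ = np.linalg.lstsq(X, y_train, rcond=None)
print(beta)  # roughly [1.0, 2.0]

# Predict on new, unseen inputs using the same design-matrix construction.
x_new = np.array([6.0, 7.0])
X_new = np.column_stack([np.ones_like(x_new), x_new])
print(X_new @ beta)  # predictions near 13 and 15
```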