Several ways to use the lm function
Henry T.H. Tu
Specialised in operational data analytics, cloud big data analytics, customer analytics, geospatial data analytics, and natural language processing, with a computing/statistical background, in the EY Singapore TD team
We are given data with columns (random variables), and we want to tell whether there is a relationship or dependency between the variables. The simplest relationship is the linear relationship that people usually think of first.
However, linear models can be extended to non-linear and generalized linear relationships. This means we can do many amazing things just by understanding the linear model. Let us use the lm function in the R language to demonstrate what I mean.
Learning (a, b) of the model Y = a*X + b from data
In the simplest case, we want to estimate the values of a and b from the data, and we can use the lm function to do exactly that.
https://raw.githubusercontent.com/tutrunghieu/sharing/master/lm1/lm1-xy.R
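A minimal sketch of this step (the true values a = 2 and b = 5 below are assumed for illustration and may differ from the linked script):

```r
# Simulate data from Y = a*X + b with noise, using a = 2 and b = 5
set.seed(42)
X <- runif(100, 0, 10)
Y <- 2 * X + 5 + rnorm(100, sd = 0.5)

# Fit the linear model; lm estimates the slope (a) and intercept (b)
fit <- lm(Y ~ X)
coef(fit)  # slope close to 2, intercept close to 5
```

The fitted coefficients recover (a, b) up to noise, which is exactly what the model above asks for.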
Learning (a, b, c) of the model Y = a*X1 + b*X2 + c from data
When we have two input variables, the lm function still works and gives the linear coefficients of the model.
https://raw.githubusercontent.com/tutrunghieu/sharing/master/lm1/lm2-x1x2-y.R
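The two-variable case can be sketched the same way (the values a = 3, b = -2, c = 1 are assumed for illustration):

```r
# Simulate data from Y = a*X1 + b*X2 + c with noise
set.seed(42)
X1 <- runif(200)
X2 <- runif(200)
Y <- 3 * X1 - 2 * X2 + 1 + rnorm(200, sd = 0.1)

# The formula Y ~ X1 + X2 fits one coefficient per input plus an intercept
fit <- lm(Y ~ X1 + X2)
coef(fit)  # roughly (c, a, b) = (1, 3, -2)
```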
Learning (a, b, c) of the generalized linear model Y = exp(a*X1+b*X2 + c) from data #output-transformation
This is no longer a linear model; it is a non-linear model. If you apply lm directly to the dataset, the fitting error will be very high.
However, we can still make it linear by taking the log of both sides. Then we have log(Y) = a*X1 + b*X2 + c instead of the original model, and we see that log(Y) depends linearly on X1 and X2.
As a result, we fit on (X1, X2, log(Y)) instead of (X1, X2, Y), and then predict by taking exp(y), where y is the predicted value of log(Y). This is the output transformation.
https://raw.githubusercontent.com/tutrunghieu/sharing/master/lm1/lm3-logY.R
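A sketch of the log trick (the coefficients a = 0.5, b = 1.5, c = 0.2 are assumed for illustration):

```r
# Simulate data from Y = exp(a*X1 + b*X2 + c), with multiplicative noise
set.seed(42)
X1 <- runif(200)
X2 <- runif(200)
Y <- exp(0.5 * X1 + 1.5 * X2 + 0.2 + rnorm(200, sd = 0.05))

# Fit the linearized model: log(Y) = a*X1 + b*X2 + c
fit <- lm(log(Y) ~ X1 + X2)
coef(fit)  # roughly (c, a, b) = (0.2, 0.5, 1.5)

# To predict Y, undo the transformation with exp()
pred <- exp(predict(fit, newdata = data.frame(X1 = 0.5, X2 = 0.5)))
```

The noise is added on the log scale so that log(Y) is exactly linear plus Gaussian noise; with additive noise on Y itself, the log-linear fit would only be approximate.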
Learning (a, b, c, d) of the model Y = a*X^3 + b*X^2 + c*X + d from data #input-transformation
We can use the lm function to learn polynomial parameters as well. In this case, we add the derived input variables X2 = X^2 and X3 = X^3, so the input data has columns (X, X2, X3, Y) instead of (X, Y). This is called input transformation: the input (X) is transformed into (X, X2, X3).
https://raw.githubusercontent.com/tutrunghieu/sharing/master/lm1/lm4-poly.R
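The input transformation can be written directly in the formula using I() (the coefficients a = 1, b = -2, c = 3, d = 4 are assumed for illustration):

```r
# Simulate data from Y = a*X^3 + b*X^2 + c*X + d with noise
set.seed(42)
X <- runif(200, -2, 2)
Y <- 1 * X^3 - 2 * X^2 + 3 * X + 4 + rnorm(200, sd = 0.1)

# I() makes lm treat X^2 and X^3 as extra input columns;
# poly(X, 3, raw = TRUE) is an equivalent shortcut
fit <- lm(Y ~ I(X^3) + I(X^2) + X)
coef(fit)  # roughly (d, a, b, c) = (4, 1, -2, 3)
```

The model stays linear in the coefficients even though it is cubic in X, which is why lm can still fit it.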
Conclusion
The linear model is very simple, yet very effective. It can help us understand not only the linear relationships between variables, but also non-linear relationships if we use it wisely.
The linear model is also a very effective building block for more complicated learning models with many layers, as seen in model combining and averaging techniques.