Several ways to use the lm function
Henry T.H. Tu
Specialised in operational data analytics, cloud big data analytics, customer analytics, geospatial data analytics, and natural language processing, with a computing/statistical background, in the EY Singapore TD team
We are given data with columns (random variables), and we want to tell whether there is a relationship or dependency between the variables. The simplest relationship is the linear relationship that people usually think of first.
However, linear models can be extended to non-linear and generalized linear relationships. This means we can do many amazing things just by understanding the linear model. Let us use the lm function in the R language to demonstrate what I mean.
Learning (a, b) of the model Y = a*X + b from data
In the simplest case, we want to estimate the values of a and b from the data, and we can use the lm function to do exactly that.
https://raw.githubusercontent.com/tutrunghieu/sharing/master/lm1/lm1-xy.R
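A minimal sketch of this step (the true values a = 2 and b = 5 below are assumed for illustration and may differ from the linked script):

```r
# Simulate data from Y = a*X + b with noise, using a = 2 and b = 5
set.seed(42)
X <- runif(100, 0, 10)
Y <- 2 * X + 5 + rnorm(100, sd = 0.5)

# Fit the linear model; lm estimates the slope (a) and intercept (b)
fit <- lm(Y ~ X)
coef(fit)  # slope close to 2, intercept close to 5
```

The fitted coefficients recover (a, b) up to noise, which is exactly what the model above asks for.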
Learning (a, b, c) of the model Y = a*X1 + b*X2 + c from data
When we have two input variables, the lm function still works and gives the linear coefficients of the model.
https://raw.githubusercontent.com/tutrunghieu/sharing/master/lm1/lm2-x1x2-y.R
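The two-variable case can be sketched the same way (the values a = 3, b = -2, c = 1 are assumed for illustration):

```r
# Simulate data from Y = a*X1 + b*X2 + c with noise
set.seed(42)
X1 <- runif(200)
X2 <- runif(200)
Y <- 3 * X1 - 2 * X2 + 1 + rnorm(200, sd = 0.1)

# The formula Y ~ X1 + X2 fits one coefficient per input plus an intercept
fit <- lm(Y ~ X1 + X2)
coef(fit)  # roughly (c, a, b) = (1, 3, -2)
```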
Learning (a, b, c) of the generalized linear model Y = exp(a*X1+b*X2 + c) from data #output-transformation
This is no longer a linear model; it is a non-linear model. If you apply lm directly to the dataset, the fitting error will be very high.
However, we can still make it linear by taking the log of both sides. Then we have log(Y) = a*X1 + b*X2 + c instead of the original model, and we see that log(Y) depends linearly on X1 and X2.
As a result, we fit on (X1, X2, log(Y)) instead of (X1, X2, Y), and then predict by taking exp(y), where y is the predicted value of log(Y). This is the output transformation.
https://raw.githubusercontent.com/tutrunghieu/sharing/master/lm1/lm3-logY.R
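A sketch of the log trick (the coefficients a = 0.5, b = 1.5, c = 0.2 are assumed for illustration):

```r
# Simulate data from Y = exp(a*X1 + b*X2 + c), with multiplicative noise
set.seed(42)
X1 <- runif(200)
X2 <- runif(200)
Y <- exp(0.5 * X1 + 1.5 * X2 + 0.2 + rnorm(200, sd = 0.05))

# Fit the linearized model: log(Y) = a*X1 + b*X2 + c
fit <- lm(log(Y) ~ X1 + X2)
coef(fit)  # roughly (c, a, b) = (0.2, 0.5, 1.5)

# To predict Y, undo the transformation with exp()
pred <- exp(predict(fit, newdata = data.frame(X1 = 0.5, X2 = 0.5)))
```

The noise is added on the log scale so that log(Y) is exactly linear plus Gaussian noise; with additive noise on Y itself, the log-linear fit would only be approximate.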
Learning (a, b, c, d) of the model Y = a*X^3 + b*X^2 + c*X + d from data #input-transformation
We can use the lm function to learn polynomial parameters as well. In this case, we add the derived input variables X2 = X^2 and X3 = X^3, so the input data has columns (X, X2, X3, Y) instead of (X, Y). This is called input transformation: the input (X) is transformed into (X, X2, X3).
https://raw.githubusercontent.com/tutrunghieu/sharing/master/lm1/lm4-poly.R
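The input transformation can be written directly in the formula using I() (the coefficients a = 1, b = -2, c = 3, d = 4 are assumed for illustration):

```r
# Simulate data from Y = a*X^3 + b*X^2 + c*X + d with noise
set.seed(42)
X <- runif(200, -2, 2)
Y <- 1 * X^3 - 2 * X^2 + 3 * X + 4 + rnorm(200, sd = 0.1)

# I() makes lm treat X^2 and X^3 as extra input columns;
# poly(X, 3, raw = TRUE) is an equivalent shortcut
fit <- lm(Y ~ I(X^3) + I(X^2) + X)
coef(fit)  # roughly (d, a, b, c) = (4, 1, -2, 3)
```

The model stays linear in the coefficients even though it is cubic in X, which is why lm can still fit it.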
Conclusion
The linear model is very simple, yet very effective. It can help us understand not only the linear relationships between variables, but also non-linear relationships if we use it wisely.
The linear model is also a very effective building block for more complicated learning models with many layers, as seen in model combining and averaging techniques.