Recalling Linear Regression!
Amal Dhivahar S.
Project Associate @SGBC, IIT Madras | Medical Imaging | Image Processing | Computational Morphometrics | Biotechnology
Recently, I stumbled upon an intriguing course titled "Introduction to Machine Learning" from Great Learning Academy, taught by Dr. Abhinanda Sarkar, which ignited my curiosity about this dynamic field. As someone inherently fascinated by technology, I couldn't resist the urge to dive in and understand the fundamentals.
In one particular module, "Introduction to Machine Learning and Linear Regression," I found myself transported back to my college days, revisiting the statistical concepts I once studied. Surprisingly, the memories of linear regression came flooding back, highlighting its relevance in machine learning.
I wanted to share this experience as a humble reminder of the continuous learning process we all embark on. While my understanding of machine learning is still evolving, revisiting linear regression served as a valuable milestone worth sharing.
Machine Learning from a Novice Learner's POV:
Machine Learning (ML) fundamentally revolves around the capability to execute tasks using underlying models, which are the product of iterative learning processes fueled by vast real-world datasets. ML algorithms meticulously analyze this data to discern patterns encompassing trends, cycles, associations, and more. Subsequently, these patterns are translated into mathematical representations, such as probabilities or polynomial equations. I also learned the difference between the supervised and unsupervised categories of machine learning: supervised methods learn from labelled examples, while unsupervised methods look for structure in unlabelled data.
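To make the supervised vs unsupervised distinction concrete, here is a minimal sketch of my own (not from the course), using scikit-learn and a tiny made-up dataset. The supervised model learns from labelled pairs (X, y); the unsupervised one is handed X alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # labels are available

# Supervised: learn the mapping from inputs X to known targets y.
reg = LinearRegression().fit(X, y)
print("prediction for x = 6:", reg.predict([[6.0]]))

# Unsupervised: no labels; look for structure (here, 2 clusters) in X alone.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments:", labels)
```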
In the sphere of data science, the process of ML unfolds in several critical stages: gathering and preparing the data (steps 1 to 3), building the model (step 4), and then training and evaluating it (steps 5 and 6).
While steps 1 to 3 demand the lion's share, encompassing around 80-85% of the project timeline, step 4 typically consumes approximately 5%. The final phases of model training and evaluation, steps 5 and 6 respectively, account for the remaining 10%. Notably, time-saving resources like a Model-Ready Data (MRD) set expedite the process by providing pre-processed data, facilitating a streamlined entry directly into step 4.
Regression Reminiscence:
“Regression” fundamentally means going back to the mean, and in practice it means predicting a real-valued number. “Linear regression” is a method that models the response as a linear combination of the explanatory variables. This can be expressed as:
response (y) = intercept (β) + slope (α) * explanatory variable (x)
A simple linear regression is always additive in nature. E.g., y = x1 + x2 + x3 represents a linear relationship, whereas y = x1 + x2 * x3 (with a multiplicative term) is non-linear.
∴ when β = 0 and α = 1, y = x; when x = 0, y = β; when α is positive, y increases as x increases; when α is negative, y decreases as x increases. Note that the sign of α gives the direction of the relationship, while its magnitude tells us how steeply y changes with x, not how strong the correlation between them is.
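As a quick numerical illustration (the values of β and α below are made up), the sign of α flips the direction of the relationship:

```python
import numpy as np

x = np.linspace(0, 10, 5)
beta, alpha = 2.0, 1.5

y_pos = beta + alpha * x      # alpha > 0: y rises as x rises
y_neg = beta - alpha * x      # alpha < 0: y falls as x rises

print(y_pos)  # [ 2.    5.75  9.5  13.25 17.  ]
print(y_neg)  # [  2.   -1.75 -5.5  -9.25 -13. ]
```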
Correlation is closely related to covariance, with one small difference (“Same same but different”). Covariance depends on the units of x and y, whereas the correlation coefficient is unit-free, making it easy to express and compare. The coefficient of correlation, a.k.a. Pearson’s coefficient ρ(x, y), can be expressed for a sample set as:
rxy = Cov(x,y) / (σx * σy)
where σ is the standard deviation.
∴ when r is near +1, the correlation between x and y is positive; when r is near -1, the correlation is negative; when r = 0, there is no linear correlation between x and y (other forms might exist). There are models that exhibit the relationship between their variables in a non-linear fashion.
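As a sanity check on the formula above, here is a small sketch (made-up numbers, NumPy as my tool of choice) that computes r from the sample covariance and standard deviations and compares it with NumPy's built-in:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Sample covariance and sample standard deviations (ddof=1).
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(r)                        # close to +1: strong positive correlation
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in gives the same value
```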
LPM & LDM Cases:
To understand this better, let's delve into the concepts of the Linear Probabilistic Model (LPM) and the Linear Deterministic Model (LDM). In an LDM, no element of uncertainty exists: for a given value of x, exactly one value of y exists, and if x changes, y has to change correspondingly, so every point lies on the line and the model only has to deal with the regression variation (SSR), with no residual error. This can be expressed as:
y = β + α * x
where β is the y-intercept (the value of y when x = 0), and α is the slope (the change in y when x increases by 1 unit).
Whereas in the case of an LPM, for a given value of x, there could be more than one value of y (not a fixed value), leading to uncertainty. The model has to deal with both SSR and the residual errors (SSE), captured by an error term ε. In simpler terms,
response (y) = intercept (β) + slope (α) * explanatory variable (x) + error (ε)
where ε is the unexplainable error, or the consolidated residual error: for each data point, ε = y − ŷ, the gap between the actual value y and the value ŷ predicted by the line.
Residual errors can be minimized by achieving the best-fit line, which is the regression line, where the error is minimal across all data points.
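Here is a minimal best-fit sketch (made-up data again), using NumPy's least-squares np.polyfit; the residuals are the per-point ε values, and their squared sum is the SSE the fit minimizes:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Degree-1 polyfit returns the least-squares slope (alpha) and intercept (beta).
alpha, beta = np.polyfit(x, y, deg=1)
y_hat = beta + alpha * x       # points on the regression line

residuals = y - y_hat          # per-point unexplained error, epsilon
sse = np.sum(residuals ** 2)   # consolidated residual error (SSE)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, SSE={sse:.4f}")
```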
Significance of R²:
So how do we determine the best fit line?
Answer: using R², the coefficient of determination of a regression model.
When a line is best fit with the least sum of squared errors across all data points:
R² ≈ 1
When R² is closer to 1, we've achieved the best possible fit (ideally). But there is always an ε.
Total variation (SST) = SSE + SSR, so R² = SSR / SST = 1 - SSE / SST.
When SSE = 0, SST = SSR: the best scenario possible, where R² = 1.
SSE is the controllable part, which is the difference between the actual data value and the value predicted by the regression model/equation. SSR, on the other hand, is the difference between the predicted value and ȳ, the mean of all values of the dependent variable. SSR is not minimized during the calculations, as it cannot be directly controlled; fitting the line minimizes SSE alone.
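A short numerical check (same made-up data as above) that the decomposition SST = SSE + SSR holds for a least-squares line, and that R² = SSR / SST:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

alpha, beta = np.polyfit(x, y, deg=1)
y_hat = beta + alpha * x
y_bar = y.mean()

sse = np.sum((y - y_hat) ** 2)      # residual (unexplained) variation
ssr = np.sum((y_hat - y_bar) ** 2)  # regression (explained) variation
sst = np.sum((y - y_bar) ** 2)      # total variation

print(np.isclose(sst, sse + ssr))   # True: SST = SSE + SSR
print("R^2 =", ssr / sst)           # equivalently 1 - SSE / SST
```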
Cons of Linear Regression:
Though linear regression is simple to implement and its output coefficients are easy to interpret, it has its cons: it assumes a linear relationship between the x and y variables and independence among its attributes. Outliers can also have a huge effect on the fitted model.
Once the regression equation is fixed, it can be tested/interpreted, but only within the range of x values observed in the original data set; extrapolating beyond that range is unreliable.
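To see the outlier sensitivity in action, here is a tiny sketch (made-up data) where corrupting a single point drags the least-squares slope far from the true value of 2:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # perfectly linear, slope = 2

slope_clean, _ = np.polyfit(x, y, deg=1)

y_outlier = y.copy()
y_outlier[-1] = 30.0                        # one corrupted measurement
slope_out, _ = np.polyfit(x, y_outlier, deg=1)

print(slope_clean, slope_out)  # 2.0 vs 6.0: one outlier triples the slope
```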
I trust you found this helpful! Here's to embracing the learning journey and discovering unexpected connections along the way! #MachineLearning #LinearRegression #LearningJourney #Reflections