Must-know Machine Learning Interview Questions – Linear Regression

It is common practice in interviews to test data science aspirants on widely used machine learning algorithms, such as linear regression, logistic regression, clustering and decision trees. Data scientists are expected to possess in-depth knowledge of these algorithms. We consulted hiring managers and data scientists from various organisations to learn which ML questions they typically ask in an interview. Based on their extensive feedback, a set of questions and answers was prepared to help aspiring data scientists in their conversations. Q&As on these algorithms will be provided in a series of four blog posts.

Each blog post will cover one of the following topics:

  1. Linear Regression
  2. Logistic Regression
  3. Clustering
  4. Decision Trees and Questions which pertain to all algorithms

Let’s get started with linear regression!

1. What is linear regression?

In simple terms, linear regression is a method of finding the straight line that best fits the given data, i.e. finding the best linear relationship between the independent and dependent variables.

In technical terms, linear regression is a machine learning algorithm that finds the best linear-fit relationship between the independent and dependent variables in a given data set. The fit is usually obtained by the least squares method, which minimises the sum of squared residuals.
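As a minimal sketch of the least squares idea for a single independent variable, the snippet below computes the slope and intercept that minimise the sum of squared residuals. The data values and the helper name `fit_line` are made up for illustration; they are not from any real data set.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line for 1-D data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Closed-form least squares estimates for simple linear regression
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical data scattered around the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

intercept, slope = fit_line(x, y)
residuals = y - (intercept + slope * x)   # observed minus fitted values
ssr = float(np.sum(residuals ** 2))       # the quantity being minimised

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
# prints: intercept=1.06, slope=1.97
```

Any other choice of intercept and slope would give a larger sum of squared residuals on this data than `ssr`.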

2. State the assumptions in a linear regression model.

There are three main groups of assumptions in a linear regression model:

  1. The assumption about the form of the model:
     - It is assumed that there is a linear relationship between the dependent and independent variables. It is known as the ‘linearity assumption’.
  2. Assumptions about the residuals:
     - Normality assumption: It is assumed that the error terms, ε(i), are normally distributed.
     - Zero mean assumption: It is assumed that the residuals have a mean value of zero.
     - Constant variance assumption: It is assumed that the residual terms have the same (but unknown) variance, σ². This assumption is also known as the assumption of homogeneity or homoscedasticity.
     - Independent error assumption: It is assumed that the residual terms are independent of each other, i.e. their pair-wise covariance is zero.
  3. Assumptions about the independent variables:
     - The independent variables are measured without error.
     - The independent variables are linearly independent of each other, i.e. there is no multicollinearity in the data.
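Some of the residual assumptions can be given a rough numerical check. The sketch below is only illustrative: it uses synthetic data generated to satisfy the assumptions, and the particular checks (halves of the data for variance, lag-1 correlation for independence) are simple stand-ins chosen here, not formal tests. Normality is not checked; that is usually done with a Q-Q plot or a dedicated test.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic data that satisfies the assumptions: y = 3 + 0.5x + normal noise
n = 200
x = np.linspace(0.0, 10.0, n)
y = 3.0 + 0.5 * x + rng.normal(loc=0.0, scale=1.0, size=n)

# Ordinary least squares fit with an intercept column
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Zero mean: residuals of a least squares fit with an intercept average to ~0
mean_resid = residuals.mean()

# Constant variance: compare residual variance in the lower and upper half of x
var_ratio = residuals[: n // 2].var() / residuals[n // 2 :].var()

# Independence: correlation between neighbouring residuals (ordered by x)
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

# Residuals should also be uncorrelated with the fitted values
resid_fit_corr = np.corrcoef(residuals, fitted)[0, 1]

print(f"mean residual:        {mean_resid:.2e}")
print(f"variance ratio:       {var_ratio:.2f}")
print(f"lag-1 correlation:    {lag1:.2f}")
print(f"corr(resid, fitted):  {resid_fit_corr:.2e}")
```

A variance ratio far from 1, or a lag-1 correlation far from 0, would be a hint that the constant variance or independence assumption deserves a closer look.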

Explanation:

  1. This is self-explanatory.
  2. If the residuals are not normally distributed, they are no longer purely random noise, which suggests that the model has not fully explained the relation in the data.

Also, the mean of the residuals should be zero.

Y(i) = β0 + β1x(i) + ε(i)

This is the assumed linear model, where ε is the residual term.

E(Y(i)) = E(β0 + β1x(i) + ε(i))

        = E(β0 + β1x(i)) + E(ε(i))

If the expectation (mean) of the residuals, E(ε(i)), is zero, the expectation of the target variable equals the expectation of the model, which is one of the aims of the model.

The residuals (also known as error terms) should be independent. This means that there is no correlation between the residuals and the predicted values, or among the residuals themselves. If some correlation is present, it implies that there is some relation that the regression model has not been able to identify.

3. If the independent variables are not linearly independent of each other, the uniqueness of the least squares solution (or normal equation solution) is lost.
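The loss of uniqueness can be seen in a small constructed example. In the sketch below, the second independent variable is deliberately made an exact multiple of the first, so the design matrix loses a rank and two different coefficient vectors give identical predictions. All values and variable names are hypothetical.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2.0 * x1                        # exactly collinear with x1
X = np.column_stack([np.ones_like(x1), x1, x2])
y = 1.0 + 3.0 * x1                   # target generated without noise

# X has 3 columns but only 2 of them are linearly independent
rank = np.linalg.matrix_rank(X)

# Two different coefficient vectors (intercept, coef of x1, coef of x2)
beta_a = np.array([1.0, 3.0, 0.0])   # 1 + 3*x1
beta_b = np.array([1.0, 1.0, 1.0])   # 1 + x1 + x2 = 1 + 3*x1

# Both reproduce y exactly, so least squares cannot prefer one over the other
same_fit = np.allclose(X @ beta_a, X @ beta_b)

print(f"columns={X.shape[1]}, rank={rank}, same fit={same_fit}")
```

Because the rank is lower than the number of columns, the normal equations have infinitely many solutions rather than a single one.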

Check out 23 more questions and answers on linear regression here.




