Ordinary Least Squares
Marcin Majka
Project Manager | Business Trainer | Business Mentor | Doctor of Physics
Ordinary Least Squares (OLS) holds a particularly prominent position due to its foundational role in the estimation of linear relationships between variables. OLS, a method formulated on the principles of minimizing the sum of the squared differences between observed and predicted values, provides an efficient and unbiased approach to parameter estimation in linear models.
The significance of OLS is underscored by its historical development and widespread application in empirical research. First published by Adrien-Marie Legendre and developed independently by Carl Friedrich Gauss, OLS has evolved into a cornerstone of modern statistical theory and practice. Its mathematical simplicity and the interpretability of its results make OLS an indispensable tool for researchers and practitioners aiming to uncover and quantify relationships among variables.
The primary objective of this article is to offer a comprehensive understanding of Ordinary Least Squares regression, elucidating its mathematical underpinnings, assumptions, interpretation of results, and practical applications. By delving into the theoretical constructs that govern OLS, we aim to equip the reader with a robust framework for applying this method to real-world data. Moreover, we will address the common assumptions underlying OLS to ensure valid inferences, as well as highlight potential limitations and pitfalls that may arise in practice.
Through a detailed exploration of the linear regression model, we will dissect the OLS estimators, showcasing their derivation and properties. This will be followed by a discussion on the interpretation of OLS results, emphasizing key metrics such as coefficients, R-squared, p-values, and confidence intervals. By illustrating a practical example using real-world data, we will demonstrate the application of OLS in a step-by-step manner, leveraging statistical software to facilitate the analysis.
Additionally, this article will venture into the limitations inherent in OLS regression, such as sensitivity to outliers, the impact of multicollinearity, and issues arising from non-linearity and heteroscedasticity. Understanding these limitations is crucial for any practitioner aiming to apply OLS effectively and to avoid potential biases in their analyses.
Finally, this article will touch upon advanced topics and variations of OLS, including Weighted Least Squares (WLS), Generalized Least Squares (GLS), and robust regression methods. These extensions serve to address some of the limitations of traditional OLS and expand its applicability to a broader range of data structures and research questions.
What is Ordinary Least Squares?
Ordinary Least Squares is a statistical method used to estimate the parameters of a linear regression model. OLS seeks to find the best-fitting line through a set of data points by minimizing the sum of the squared differences between observed values and the values predicted by the linear model. Mathematically, for a given set of observations \((X_i, Y_i)\), where \(X_i\) represents the independent variable and \(Y_i\) the dependent variable, OLS aims to determine the coefficients \(\beta_0\) and \(\beta_1\) in the linear equation:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
Here, \(\varepsilon_i\) denotes the error term, capturing the deviation of the observed value from the predicted value. The OLS estimators for \(\beta_0\) and \(\beta_1\) are derived by solving the normal equations, which result from setting the partial derivatives of the sum of squared residuals with respect to the coefficients equal to zero. This method provides an unbiased and efficient means of parameter estimation, assuming the underlying assumptions of the linear regression model are satisfied.
The genesis of OLS can be traced back to the early 19th century, with pivotal contributions from mathematicians Adrien-Marie Legendre and Carl Friedrich Gauss. Legendre introduced the method of least squares in 1805 as a means of solving the problem of determining the orbits of celestial bodies. Gauss independently developed a similar approach, formalizing it within the context of the Gaussian distribution and providing a probabilistic interpretation. The culmination of these efforts was Gauss's publication of the "Theoria Motus" in 1809, which not only solidified the mathematical foundation of OLS but also extended its application to a broader range of problems in astronomy and geodesy. Over time, OLS has been rigorously studied and refined, becoming a fundamental component of statistical theory. The development of computational algorithms in the 20th century further propelled the widespread adoption of OLS, enabling its application to increasingly complex datasets and models.
The versatility and robustness of OLS have cemented its status as a cornerstone of empirical research across a multitude of disciplines. In economics, OLS is extensively employed to model relationships between economic variables, such as consumption and income, or inflation and unemployment. Econometricians utilize OLS to estimate demand and supply curves, test economic theories, and forecast future trends. In engineering, OLS aids in the calibration of models predicting system behavior under varying conditions, from structural analysis to signal processing. For instance, in civil engineering, OLS can be used to determine the relationship between load and deflection in beams, while in electrical engineering, it assists in noise reduction and signal estimation.
The social sciences also heavily rely on OLS to unravel complex human behaviors and societal trends. Sociologists might use OLS to explore the impact of education on income levels, while psychologists could employ it to assess the relationship between cognitive scores and various environmental factors. In public health, OLS models help identify determinants of health outcomes, such as the effect of lifestyle choices on cardiovascular risk. The method's ability to handle large datasets and provide interpretable results makes it particularly valuable in these fields, where understanding and quantifying relationships between variables are crucial for informed decision-making and policy development.
The Mathematics Behind OLS
To derive the OLS estimators, we begin by expressing the residual \(\varepsilon_i\) as the difference between the observed value \(Y_i\) and the predicted value \(\hat{Y}_i\):

\[ \varepsilon_i = Y_i - \hat{Y}_i = Y_i - \beta_0 - \beta_1 X_i \]

The sum of the squared residuals is given by:

\[ S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2 \]

To find the estimators \(\hat{\beta}_0\) and \(\hat{\beta}_1\), we need to minimize \(S(\beta_0, \beta_1)\) with respect to \(\beta_0\) and \(\beta_1\). This involves taking the partial derivatives of \(S\) with respect to \(\beta_0\) and \(\beta_1\), and setting them to zero:

\[ \frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right) = 0, \qquad \frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i \left( Y_i - \beta_0 - \beta_1 X_i \right) = 0 \]

Solving these normal equations simultaneously provides the OLS estimators:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]

where \(\bar{X}\) and \(\bar{Y}\) represent the means of the independent and dependent variables, respectively. These formulas show that \(\hat{\beta}_1\) is the sample covariance of \(X\) and \(Y\) divided by the sample variance of \(X\), and \(\hat{\beta}_0\) is the intercept obtained by adjusting \(\bar{Y}\) by the product of \(\hat{\beta}_1\) and \(\bar{X}\).
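As a minimal sketch, the closed-form estimators above can be computed directly with NumPy; the data below are made-up illustrative values, and the result is cross-checked against numpy.polyfit.

import numpy as np

# Illustrative data (hypothetical values, for demonstration only)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# beta1_hat = sample covariance of X and Y divided by sample variance of X
x_bar, y_bar = X.mean(), Y.mean()
beta1_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
# beta0_hat = Y_bar - beta1_hat * X_bar
beta0_hat = y_bar - beta1_hat * x_bar
print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")

# Cross-check: np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)
print(f"polyfit: intercept = {intercept:.4f}, slope = {slope:.4f}")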
The OLS estimators \(\hat{\beta}_0\) and \(\hat{\beta}_1\) possess several desirable properties under the Gauss-Markov assumptions. These properties include unbiasedness, efficiency, and consistency. An estimator is said to be unbiased if its expected value equals the true parameter value, implying that, on average, the estimator hits the target. Mathematically, this is expressed as:

\[ E[\hat{\beta}_0] = \beta_0 \quad \text{and} \quad E[\hat{\beta}_1] = \beta_1 \]
Efficiency refers to the estimator having the smallest variance among all unbiased estimators, a property guaranteed by the Gauss-Markov theorem which states that OLS estimators are the Best Linear Unbiased Estimators (BLUE). Consistency means that as the sample size increases, the estimators converge in probability to the true parameter values, ensuring the reliability of the estimators in large samples.
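As an informal illustration of unbiasedness, the following sketch simulates many datasets from a known linear model (with made-up true coefficients) and averages the resulting slope estimates; the average should land very close to the true slope.

import numpy as np

rng = np.random.default_rng(42)
true_beta0, true_beta1 = 1.0, 2.0   # hypothetical true parameters
n, n_sims = 50, 5000
slope_estimates = []

for _ in range(n_sims):
    X = rng.uniform(0, 10, n)
    Y = true_beta0 + true_beta1 * X + rng.normal(0, 1, n)
    x_bar, y_bar = X.mean(), Y.mean()
    b1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
    slope_estimates.append(b1)

# The mean of the estimates should be very close to true_beta1 = 2.0
print(np.mean(slope_estimates))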
Assumptions of OLS
Ordinary Least Squares regression relies on a set of underlying assumptions to ensure the validity and reliability of its estimators. These assumptions, often referred to as the Gauss-Markov assumptions, are critical for the OLS estimators to possess desirable statistical properties such as unbiasedness, efficiency, and consistency. When these assumptions are satisfied, the OLS estimators are the Best Linear Unbiased Estimators (BLUE). Understanding these assumptions is essential for proper application and interpretation of OLS results.
The first assumption is that the relationship between the dependent variable (Y) and the independent variable (X) is linear. This means that (Y) can be expressed as a linear function of (X) plus an error term. This assumption simplifies the model and allows the use of linear algebra techniques to derive the OLS estimators. If the true relationship is not linear, the OLS model may provide biased estimates, leading to incorrect inferences. Therefore, it is crucial to verify the linearity assumption by plotting the data and examining residual plots for any signs of non-linearity.
The second assumption is that the error terms (εi) are independent of each other. This implies that the value of one error term does not provide any information about the value of another error term. In practical terms, this means that there should be no correlation between the residuals of the regression model. Violation of this assumption, often referred to as autocorrelation, is particularly common in time series data where observations are sequentially dependent. Autocorrelation can lead to inefficient estimates and invalid statistical tests. To detect autocorrelation, one can use the Durbin-Watson test or examine autocorrelation plots.
The third assumption is homoscedasticity, which means that the variance of the error terms is constant across all levels of the independent variable (X). Homoscedasticity ensures that the precision of the OLS estimates is consistent across observations. When this assumption is violated, resulting in heteroscedasticity, the OLS estimates remain unbiased, but their standard errors are biased, leading to unreliable hypothesis tests and confidence intervals. Detecting heteroscedasticity can be done through visual inspection of residual plots or using statistical tests such as the Breusch-Pagan test.
The fourth assumption is that there is no perfect multicollinearity among the independent variables. Perfect multicollinearity occurs when one independent variable is a perfect linear combination of other independent variables. This makes it impossible to isolate the individual effect of each variable on the dependent variable, leading to infinite or undefined OLS estimates. While perfect multicollinearity is rare in practice, high multicollinearity can still pose problems by inflating the variances of the OLS estimates and making them highly sensitive to changes in the model. To diagnose multicollinearity, one can examine the variance inflation factor (VIF) for each independent variable.
The final assumption is that the error terms are normally distributed, which is particularly important for inference purposes. Normality of the errors is not required for the unbiasedness or efficiency of the OLS estimates, but it is crucial for conducting valid hypothesis tests and constructing confidence intervals. When the errors are normally distributed, the OLS estimators follow a normal distribution, allowing the use of t-tests and F-tests for statistical inference. To check for normality, one can use graphical methods such as Q-Q plots or formal tests like the Shapiro-Wilk test.
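As a hedged sketch of how these checks might be run in Python, the snippet below assumes a fitted statsmodels OLS results object named model (such as the one produced in the practical example later in this article); the tests mirror those mentioned above.

from scipy.stats import shapiro
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Independence of errors: Durbin-Watson statistic (values near 2 suggest little autocorrelation)
print("Durbin-Watson:", durbin_watson(model.resid))

# Homoscedasticity: Breusch-Pagan test (a small p-value suggests heteroscedasticity)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print("Breusch-Pagan p-value:", lm_pvalue)

# Multicollinearity: variance inflation factor for each column of the design matrix
exog = model.model.exog
print("VIFs:", [variance_inflation_factor(exog, i) for i in range(exog.shape[1])])

# Normality of errors: Shapiro-Wilk test (a small p-value suggests non-normal residuals)
print("Shapiro-Wilk p-value:", shapiro(model.resid).pvalue)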
Interpreting OLS Results
Interpreting the results of an Ordinary Least Squares regression involves understanding the estimated coefficients, assessing the overall fit of the model, and evaluating the statistical significance of the predictors. This section delves into the key components of OLS output, including the coefficients, (R^2) and adjusted (R^2), p-values, hypothesis testing, confidence intervals, and residual analysis, each of which provides critical insights into the relationship between the dependent and independent variables.
The primary output of an OLS regression analysis is the estimated coefficients, \(\hat{\beta}_0\) and \(\hat{\beta}_1\). The intercept, \(\hat{\beta}_0\), represents the expected value of the dependent variable \(Y\) when the independent variable \(X\) is zero. This value provides a baseline level of \(Y\) in the absence of \(X\). The slope, \(\hat{\beta}_1\), indicates the expected change in \(Y\) for a one-unit increase in \(X\). For example, if \(\hat{\beta}_1 = 2\), it suggests that for every additional unit of \(X\), \(Y\) is expected to increase by 2 units, holding all else constant. Interpreting these coefficients requires understanding the context of the variables and the data range to ensure meaningful and realistic interpretations.
The \(R^2\) value, or coefficient of determination, measures the proportion of variance in the dependent variable that is explained by the independent variable(s). It is a key indicator of the goodness-of-fit of the regression model. An \(R^2\) value of 0.75, for instance, means that 75% of the variability in \(Y\) is accounted for by \(X\). However, \(R^2\) alone can be misleading, especially when dealing with multiple predictors, as it tends to increase with the addition of more variables, regardless of their relevance. Adjusted \(R^2\) addresses this issue by adjusting for the number of predictors in the model, providing a more accurate measure of model fit. It is particularly useful in comparing models with different numbers of predictors, as it penalizes the inclusion of non-significant variables.
P-values in OLS regression assess the statistical significance of each coefficient. They test the null hypothesis that a particular coefficient is equal to zero, implying no effect of the corresponding independent variable on the dependent variable. A low p-value (typically less than 0.05) indicates strong evidence against the null hypothesis, suggesting that the variable has a significant effect on \(Y\). For example, if the p-value for \(\hat{\beta}_1\) is 0.03, we reject the null hypothesis and conclude that \(X\) significantly affects \(Y\). It is important to consider the context and the potential for Type I errors, especially in models with multiple predictors, where the risk of false positives increases.
Confidence intervals provide a range of values within which the true population parameter is expected to lie with a certain level of confidence, typically 95%. For the slope coefficient \(\hat{\beta}_1\), a 95% confidence interval might be \([1.5, 2.5]\), indicating that we are 95% confident that the true effect of \(X\) on \(Y\) lies between 1.5 and 2.5. Confidence intervals are crucial for understanding the precision and reliability of the estimated coefficients. Narrow intervals suggest precise estimates, while wide intervals indicate greater uncertainty. They also provide insight into the practical significance of the predictors, complementing the p-value analysis.
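As a brief sketch, all of these quantities can be read directly off a fitted statsmodels results object (here assumed to be named model, as in the practical example below):

# Assumes a fitted statsmodels OLS results object named `model`
print(model.params)                 # estimated coefficients (intercept and slope)
print(model.rsquared)               # R-squared
print(model.rsquared_adj)           # adjusted R-squared
print(model.pvalues)                # p-values for H0: coefficient = 0
print(model.conf_int(alpha=0.05))   # 95% confidence intervals for the coefficients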
Residual analysis is essential for validating the assumptions of the OLS model and assessing the adequacy of the fit. Residuals are the differences between observed values and predicted values. Analyzing residual plots can reveal patterns that suggest violations of the OLS assumptions, such as non-linearity, heteroscedasticity, and autocorrelation. A residual plot displaying a random scatter of points around zero supports the assumptions of linearity and homoscedasticity. Conversely, a funnel-shaped pattern indicates heteroscedasticity, while systematic patterns suggest non-linearity or autocorrelation. Additionally, Q-Q plots can be used to assess the normality of residuals, an important assumption for valid inference.
Practical Example of OLS
To elucidate the practical application of Ordinary Least Squares (OLS) regression, we will consider a step-by-step example using a real dataset. This example will demonstrate the process of fitting an OLS model, interpreting the results, and validating the assumptions. We will use a dataset that examines the relationship between the number of hours studied (independent variable, (X)) and the score achieved on an exam (dependent variable, (Y)) by a group of students.
The dataset comprises observations on 30 students, recording the number of hours they studied for an exam and the corresponding scores they achieved. The data are listed in the code snippet below, where \(X_i\) represents the hours studied and \(Y_i\) denotes the exam scores. The primary objective is to determine the nature and strength of the relationship between study hours and exam performance, leveraging the OLS regression technique.
We utilize Python to perform the OLS regression analysis. The necessary libraries for data manipulation and analysis include pandas, numpy, and statsmodels. The following Python code snippet demonstrates how to construct the dataset, fit the OLS model, and output the results.
import pandas as pd
import numpy as np
import statsmodels.api as sm
# Load the dataset
data = pd.DataFrame({
'Hours_Studied': [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5, 3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8, 2.3, 5.5, 2.5, 8.6, 7.7],
'Exam_Score': [21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41, 42, 17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86, 30, 60, 26, 78, 75]
})
# Define the independent and dependent variables
X = data['Hours_Studied']
Y = data['Exam_Score']
# Add a constant to the independent variable
X = sm.add_constant(X)
# Fit the OLS model
model = sm.OLS(Y, X).fit()
# Print the summary of the regression results
print(model.summary())
The output provides a detailed summary of the OLS regression results, including the estimated coefficients, \(R^2\) value, p-values, and confidence intervals. The estimated regression equation can be written as:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]
Assuming the output reveals the estimates \(\hat{\beta}_0 = 2.5\) and \(\hat{\beta}_1 = 9.7\), the regression equation becomes:

\[ \hat{Y} = 2.5 + 9.7\,X \]
This implies that for each additional hour studied, the exam score is expected to increase by 9.7 points, holding all other factors constant. The intercept of 2.5 suggests that a student who does not study at all would be predicted to score 2.5 points on the exam, though this value is often not meaningful outside the observed range of data.
The \(R^2\) value indicates the proportion of variance in the dependent variable that is explained by the independent variable. If the \(R^2\) value is 0.89, for example, 89% of the variability in exam scores can be explained by the number of hours studied. The p-value for \(\hat{\beta}_1\) is likely to be very small (e.g., <0.001), indicating that the relationship between hours studied and exam score is statistically significant.
The confidence interval for \(\hat{\beta}_1\) might be \((8.5, 10.9)\), suggesting that we are 95% confident that the true effect of studying one additional hour on the exam score lies between 8.5 and 10.9 points. This interval provides insight into the precision of our estimate.
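As a small follow-up sketch, the fitted model can also be used to predict scores for new study-hour values (the values below are hypothetical):

import numpy as np
import statsmodels.api as sm

# Hypothetical new observations of hours studied
new_hours = np.array([3.0, 6.0, 9.0])
new_X = sm.add_constant(new_hours, has_constant='add')  # match the design matrix used when fitting
print(model.predict(new_X))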
To ensure the validity of the OLS results, we must check the underlying assumptions. We can use diagnostic plots to evaluate these assumptions. For example, plotting the residuals against the fitted values helps assess homoscedasticity and the linearity of the relationship. A residual plot displaying a random scatter around zero supports these assumptions. Additionally, a Q-Q plot of the residuals can be used to check for normality. If the residuals follow a straight line in the Q-Q plot, it indicates that the errors are normally distributed.
import matplotlib.pyplot as plt
import scipy.stats as stats
# Residual plot
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.title('Residual plot')
plt.show()
# Q-Q plot
stats.probplot(model.resid, dist="norm", plot=plt)
plt.title('Q-Q plot')
plt.show()
Limitations of OLS
One major limitation of OLS is its sensitivity to outliers. Outliers are data points that deviate significantly from the other observations in the dataset. Because OLS minimizes the sum of squared residuals, outliers can exert a disproportionate influence on the estimated coefficients. This can lead to misleading results, where the model fit is unduly affected by a few extreme values rather than reflecting the overall trend of the data. For example, in a dataset of student study hours and exam scores, a single student who studied an exceptionally high number of hours but achieved a very low score could skew the regression line. To mitigate this issue, it is essential to conduct thorough exploratory data analysis and consider robust regression methods or transformation techniques that reduce the impact of outliers.
Multicollinearity occurs when two or more independent variables in a regression model are highly correlated, making it difficult to isolate the individual effect of each variable on the dependent variable. This can lead to inflated standard errors for the coefficient estimates, making it harder to determine the statistical significance of the predictors. Multicollinearity can also result in unstable estimates that are highly sensitive to small changes in the data. For instance, in an economic model predicting consumer spending, if income and wealth are highly correlated, it becomes challenging to disentangle their separate effects. Detecting multicollinearity typically involves examining the correlation matrix or calculating the variance inflation factor (VIF) for each predictor. Addressing multicollinearity may require removing or combining highly correlated variables, or employing principal component analysis to create uncorrelated factors.
OLS assumes a linear relationship between the independent and dependent variables. However, real-world data often exhibit non-linear patterns that a simple linear model cannot capture. If the true relationship is non-linear, the OLS estimates will be biased, leading to inaccurate predictions and conclusions. This issue can be addressed by transforming the variables, such as taking logarithms or applying polynomial regression, to better capture the non-linear relationship. Additionally, OLS assumes that the variance of the error terms is constant across all levels of the independent variable (homoscedasticity). When this assumption is violated, resulting in heteroscedasticity, the standard errors of the coefficients are biased, leading to unreliable hypothesis tests and confidence intervals. Heteroscedasticity can be detected through residual plots or formal tests like the Breusch-Pagan test, and remedied by using weighted least squares (WLS) or transforming the dependent variable.
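As a hedged sketch of these remedies, the snippet below uses the statsmodels formula interface on a hypothetical DataFrame df with columns X and Y (names chosen purely for illustration) to fit a log-transformed model and a quadratic model:

import numpy as np
import statsmodels.formula.api as smf

# `df` is a hypothetical DataFrame with columns 'X' and 'Y' (Y assumed positive for the log transform)
log_model = smf.ols('np.log(Y) ~ X', data=df).fit()       # log-transform the dependent variable
poly_model = smf.ols('Y ~ X + I(X**2)', data=df).fit()    # add a quadratic term to capture curvature

print(log_model.summary())
print(poly_model.summary())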
Omitted variable bias occurs when a relevant variable that influences the dependent variable is left out of the regression model. This omission can lead to biased and inconsistent estimates for the included variables, as the effect of the omitted variable is incorrectly attributed to the included variables. For example, in a regression model predicting wages based solely on education, omitting work experience could bias the estimated effect of education, as work experience is likely correlated with both education and wages. Identifying and including all relevant variables is crucial to mitigate omitted variable bias. However, this can be challenging due to data limitations or theoretical considerations. Researchers must carefully consider the model specification and use domain knowledge to ensure that key variables are not omitted.
Advanced Topics and Variations
While Ordinary Least Squares is a fundamental tool in regression analysis, advanced topics and variations of this method address the limitations and expand its applicability to a broader range of data structures and research questions. These advanced techniques include Weighted Least Squares (WLS), Generalized Least Squares (GLS), robust regression methods, and logistic regression for binary outcomes. Each of these methods provides tailored solutions to specific challenges encountered in real-world data analysis, enhancing the robustness and flexibility of regression modeling.
Weighted Least Squares is an extension of OLS designed to handle heteroscedasticity, a condition where the variance of the error terms is not constant across observations. In WLS, each observation is assigned a weight inversely proportional to the variance of its error term, effectively giving more importance to observations with smaller error variances. The WLS estimator minimizes the weighted sum of squared residuals, thereby providing more efficient estimates in the presence of heteroscedasticity. Mathematically, if the weights are denoted by \(w_i\), the WLS criterion is:

\[ S_w(\beta_0, \beta_1) = \sum_{i=1}^{n} w_i \left( Y_i - \beta_0 - \beta_1 X_i \right)^2 \]
This method improves the precision of the estimates by accounting for the varying reliability of different observations. WLS is particularly useful in econometrics and medical research, where data often exhibit heteroscedasticity due to inherent variability in economic indicators or measurement errors in clinical trials.
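A minimal sketch of WLS with statsmodels, using simulated data whose error standard deviation grows with X and weights assumed inversely proportional to the resulting error variance:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = np.linspace(1, 10, 100)
Y = 2.0 + 3.0 * X + rng.normal(scale=0.5 * X)   # error standard deviation increases with X

X_design = sm.add_constant(X)
weights = 1.0 / X**2   # weights inversely proportional to the assumed error variance

wls_model = sm.WLS(Y, X_design, weights=weights).fit()
print(wls_model.params)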
Generalized Least Squares (GLS) further generalizes OLS by allowing for correlated and non-constant variances in the error terms. GLS is applicable when the assumptions of OLS are violated, particularly in time series data where autocorrelation is common. In GLS, the error term covariance matrix, \(\Sigma\), is not assumed to be a scalar multiple of the identity matrix, as in OLS, but rather a general positive-definite matrix. The GLS estimator minimizes the quadratic form:

\[ (Y - X\beta)^{\top} \Sigma^{-1} (Y - X\beta) \]
where \(X\) is the matrix of independent variables, \(Y\) is the vector of observations on the dependent variable, and \(\Sigma\) is the covariance matrix of the error terms. By appropriately modeling the structure of \(\Sigma\), GLS provides efficient and unbiased estimates in the presence of autocorrelation and heteroscedasticity. This method is widely used in econometric modeling, such as in the analysis of panel data and time series data with complex error structures.
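A hedged sketch of GLS in statsmodels, assuming (for illustration only) an AR(1) error covariance structure with correlation rho^|i-j| between errors i and j:

import numpy as np
import statsmodels.api as sm

n, rho = 100, 0.6
rng = np.random.default_rng(1)
X = sm.add_constant(np.linspace(1, 10, n))
Y = X @ np.array([2.0, 3.0]) + rng.normal(size=n)   # illustrative data only

# Assumed AR(1) covariance structure: Sigma[i, j] = rho**|i - j|
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

gls_model = sm.GLS(Y, X, sigma=Sigma).fit()
print(gls_model.params)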
Robust regression methods address the sensitivity of OLS to outliers by employing techniques that reduce the influence of anomalous observations. These methods include M-estimators, R-estimators, and S-estimators, each providing different ways to downweight or eliminate the impact of outliers. M-estimators, for instance, minimize a function of the residuals that grows more slowly than the quadratic function used in OLS, thereby reducing the effect of large residuals. The Huber loss function is a common choice in M-estimators:

\[ \rho(r_i) = \begin{cases} \tfrac{1}{2} r_i^2 & \text{if } |r_i| \le c, \\ c\,|r_i| - \tfrac{1}{2} c^2 & \text{if } |r_i| > c, \end{cases} \]
where \(r_i\) are the residuals and \(c\) is a tuning constant. Robust regression is essential in fields such as finance and environmental science, where data often contain outliers due to market anomalies or measurement errors.
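A short sketch of an M-estimator fit with the Huber norm in statsmodels, reusing (as an assumption) the design matrix X (with constant) and response Y from the practical example above:

import statsmodels.api as sm

# Huber M-estimator; X and Y are assumed to be the variables from the practical example
rlm_model = sm.RLM(Y, X, M=sm.robust.norms.HuberT()).fit()
print(rlm_model.params)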
Logistic regression is a specialized form of regression analysis used when the dependent variable is binary, taking on only two possible outcomes (e.g., success/failure, yes/no). Unlike OLS, logistic regression models the probability that the dependent variable equals a particular value using the logistic function. The logistic regression model is specified as:

\[ P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}} \]
where \(P(Y = 1 \mid X)\) is the probability that the dependent variable equals 1 given the independent variable \(X\). The coefficients \(\beta_0\) and \(\beta_1\) are estimated using maximum likelihood estimation rather than the least squares criterion. Logistic regression is widely used in medical research for disease prediction, in marketing for customer segmentation, and in the social sciences for behavior modeling. It provides a probabilistic framework for binary outcomes, facilitating the interpretation of results in terms of odds ratios and probabilities.
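As a hedged sketch, logistic regression can be fit with statsmodels on an illustrative binary outcome derived from the earlier exam data; the pass threshold of 50 is an assumption made purely for demonstration:

import numpy as np
import statsmodels.api as sm

# Binary outcome: pass = 1 if the exam score is at least 50 (hypothetical threshold)
passed = (data['Exam_Score'] >= 50).astype(int)
X_logit = sm.add_constant(data['Hours_Studied'])

logit_model = sm.Logit(passed, X_logit).fit()
print(logit_model.summary())

# Coefficients are on the log-odds scale; exponentiate to obtain odds ratios
print(np.exp(logit_model.params))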
Conclusion
Ordinary Least Squares regression remains a fundamental and indispensable tool in the arsenal of statistical methods, widely employed across various disciplines for its simplicity, interpretability, and effectiveness in modeling linear relationships. The theoretical underpinnings of OLS, based on minimizing the sum of squared residuals, provide unbiased and efficient estimators under the Gauss-Markov assumptions. These properties make OLS a reliable method for parameter estimation in linear models, ensuring that the resulting coefficients accurately reflect the underlying relationships within the data.
However, the robustness of OLS is contingent upon several critical assumptions, including linearity, independence of errors, homoscedasticity, absence of perfect multicollinearity, and normality of the error terms for inference purposes. Adherence to these assumptions guarantees the validity and reliability of the OLS estimates, enabling accurate statistical inference and prediction. Violations of these assumptions, on the other hand, can lead to biased, inefficient, and inconsistent estimates, thereby undermining the conclusions drawn from the analysis. It is therefore essential for researchers to rigorously evaluate and address these assumptions through diagnostic tests and appropriate remedial measures.
The practical application of OLS was demonstrated through a detailed example, illustrating the step-by-step process of fitting a regression model, interpreting the results, and validating the assumptions using real-world data. This example underscores the importance of understanding the context of the data, the relevance of the variables, and the proper use of statistical software to perform regression analysis. By carefully interpreting the output, including the coefficients, \(R^2\) values, p-values, and confidence intervals, researchers can derive meaningful insights and make informed decisions based on their findings.
Despite its widespread utility, OLS is not without limitations. Sensitivity to outliers, the impact of multicollinearity, issues with non-linearity and heteroscedasticity, and the potential for omitted variable bias are significant challenges that can affect the reliability of OLS estimates. Advanced topics and variations of OLS, such as Weighted Least Squares (WLS), Generalized Least Squares (GLS), robust regression methods, and logistic regression, provide sophisticated solutions to these challenges. These methods enhance the robustness and flexibility of regression analysis, allowing for the accurate modeling of complex data structures and the handling of various anomalies in the data.
Weighted Least Squares addresses heteroscedasticity by assigning appropriate weights to observations, thereby improving the efficiency of the estimates. Generalized Least Squares extends this approach to handle correlated and non-constant variances in the error terms, making it particularly useful for time series and panel data analysis. Robust regression methods mitigate the influence of outliers, ensuring that the estimates are not unduly affected by anomalous observations. Logistic regression, on the other hand, adapts the principles of regression analysis to binary outcomes, providing a probabilistic framework for modeling and interpreting binary dependent variables.
In conclusion, while OLS serves as a foundational technique in regression analysis, a comprehensive understanding of its assumptions, limitations, and advanced variations is crucial for its effective application. By recognizing the scenarios where OLS may falter and employing alternative methods when necessary, researchers can enhance the accuracy and reliability of their statistical analyses. The continuous development and refinement of regression techniques reflect the evolving nature of empirical research, driven by the need to address increasingly complex data and sophisticated research questions. Mastery of these techniques empowers researchers to derive robust conclusions and contribute valuable insights to their respective fields, underscoring the enduring significance of regression analysis in scientific inquiry.