Polynomial Regression
Harry Thapa
Introduction
Polynomial Regression represents a sophisticated approach to modelling the relationship between one or more independent variables and a dependent variable, wherein this relationship is expressed as an nth degree polynomial. This technique is particularly useful for capturing the nuances of curvilinear relationships that linear regression may not adequately represent.
The core methodology for fitting Polynomial Regression models is the method of least squares. This method, grounded in the principles of the Gauss-Markov Theorem, aims to minimise the sum of the squares of the differences between observed and predicted values. By doing so, it ensures the estimation of coefficients that bring about the best possible fit to the data under the constraints of the theorem.
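Concretely, for m observations the least squares criterion picks the coefficients $\boldsymbol{\beta}$ that minimise the residual sum of squares:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{m} \left(y_i - \hat{y}_i\right)^2$$

where $\hat{y}_i$ is the model's prediction for the i-th observation.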
What sets Polynomial Regression apart as a special case of Linear Regression is its ability to model data with a curvilinear relationship through the use of polynomial equations. This adaptability allows it to capture complex patterns and trends in the data, making it a powerful tool for predictive modelling where the relationship between variables is not strictly linear.
Let's take the example of a quadratic polynomial model:

$$y = a x^2 + b x + c$$

The curve is convex when $a > 0$ and concave when $a < 0$ (its second derivative is $2a$).
Polynomial Model Assumption
Assumptions underlying Polynomial Regression describe the conditions necessary for its effective application and provide a framework for understanding how the dependent variable's behaviour is influenced by one or more independent variables. Key assumptions include:
The behaviour of the dependent variable can be described by a polynomial (additive) relationship between it and the independent variables.
The errors are independent of one another.
The errors are normally distributed with mean zero.
The errors have constant variance (homoscedasticity).
Linear Interpolation
Linear interpolation estimates an unknown value by fitting a straight line between two known data points.
For two known points $(x_1, y_1)$ and $(x_2, y_2)$, the value of the dependent variable at a point $x$ between them is:

$$y = y_1 + (y_2 - y_1)\,\frac{x - x_1}{x_2 - x_1}$$
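As a quick illustration with made-up points (not from the article's dataset), NumPy's np.interp applies exactly this formula:

import numpy as np
# two known points (x1, y1) = (1, 2) and (x2, y2) = (3, 10); interpolate at x = 2
x1, y1 = 1.0, 2.0
x2, y2 = 3.0, 10.0
x = 2.0
y_manual = y1 + (y2 - y1) * (x - x1) / (x2 - x1)  # formula above -> 6.0
y_numpy = np.interp(x, [x1, x2], [y1, y2])        # same result via np.interp
print(y_manual, y_numpy)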
Matrix Representation of Polynomial Regression Equations
In polynomial regression, we approximate the relationship between the independent variable x and the dependent variable y using a polynomial function of degree n.
The general form of a polynomial model of degree n is:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \varepsilon$$

Matrix Notation: stacking the m observations gives

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where $\mathbf{X}$ is the design (Vandermonde) matrix whose i-th row is $[1, x_i, x_i^2, \dots, x_i^n]$.

Least Squares Solution: the coefficient vector that minimises the residual sum of squares is

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$$

Matrix Operations: in practice this means forming $\mathbf{X}^\top\mathbf{X}$ and $\mathbf{X}^\top\mathbf{y}$ and solving the resulting normal equations for $\hat{\boldsymbol{\beta}}$.
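A minimal NumPy sketch of these operations, using made-up sample data (the scikit-learn code later in the article achieves the same thing with PolynomialFeatures and LinearRegression):

import numpy as np
# made-up sample roughly following y = 1 + 2x + 0.5x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.4, 7.2, 11.9, 17.5])
n = 2                                             # polynomial degree
X = np.vander(x, n + 1, increasing=True)          # design matrix with columns [1, x, x^2]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares solution
print(beta_hat)                                   # estimated [beta_0, beta_1, beta_2]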
Why do we need Polynomial Regression?
When we develop a predictive model and observe its performance, we might find that it significantly underperforms. This discrepancy often becomes apparent when comparing the actual values to the predicted outcomes, especially if we notice that the actual data points form a curved pattern, while our prediction model—assuming it's linear—generates a straight line that fails to approximate the central tendency of these points accurately. This misalignment signals the need for a more sophisticated approach, and that's where polynomial regression comes into play.
Polynomial regression extends beyond the limitations of linear regression by accommodating the curved relationships between the independent (predictors) and dependent (outcome) variables. Unlike linear regression, which presupposes a straight-line relationship between variables, polynomial regression can model data with inherent curvature. This capability makes it a powerful alternative when linear models fall short, unable to capture the complex dynamics within the dataset.
The fundamental advantage of polynomial regression lies in its flexibility to fit a wide range of curvatures, thereby providing a more accurate representation of the data. This method does not require the relationship between variables to be linear, marking a critical distinction from linear regression. Consequently, polynomial regression is particularly valuable in scenarios where the data points deviate from a linear pattern, offering a clearer, more precise modeling of intricate relationships.
Feature Engineering
Loading the libraries and our advertising dataset.
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
df = pd.read_csv('advertising.csv')
df[:5]
We are using the same dataset that we used earlier for the linear regression model.
#feature selection
X = df['TV'].values.reshape(-1, 1)
X[:5]
#target
y = df['Sales'].values
y[:5]
# splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Now we have selected TV as the feature and Sales as the target from our columns. Next, we split the data into training and testing sets: 80 percent for training and 20 percent for testing, with a random_state value of 42. This value is used as a seed for the random number generator and controls the shuffling applied to the data before it is split.
Model Selection
# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
# predictions for plotting linear regression
X_train_range = np.arange(X_train.min(), 300, 1)[:, np.newaxis]
y_linear = lr.predict(X_train_range)
We create a prediction range from the minimum value of the training set up to 300, which is the maximum value of the X-axis scale, so predictions cover the full span of the plot.
pr_quad = LinearRegression()
pr_cubic = LinearRegression()
pr_quartic = LinearRegression()
pr_quintic = LinearRegression()
quadratic = PolynomialFeatures(degree=2)
cubic = PolynomialFeatures(degree=3)
quartic = PolynomialFeatures(degree=4)
quintic = PolynomialFeatures(degree=5)
X_train_quad = quadratic.fit_transform(X_train)
X_train_cubic = cubic.fit_transform(X_train)
X_train_quartic = quartic.fit_transform(X_train)
X_train_quintic = quintic.fit_transform(X_train)
X_test_quad = quadratic.transform(X_test)
X_test_cubic = cubic.transform(X_test)
X_test_quartic = quartic.transform(X_test)
X_test_quintic = quintic.transform(X_test)
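# Quick sanity check (illustrative, not part of the original pipeline): for a
# single made-up value x = 3, the degree-2 expansion returns the bias column,
# x and x^2, i.e. [[1., 3., 9.]].
demo = PolynomialFeatures(degree=2)
print(demo.fit_transform(np.array([[3.0]])))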
pr_quad.fit(X_train_quad, y_train)
pr_cubic.fit(X_train_cubic, y_train)
pr_quartic.fit(X_train_quartic, y_train)
pr_quintic.fit(X_train_quintic, y_train)
y_quad_train = pr_quad.predict(X_train_quad)
y_cubic_train = pr_cubic.predict(X_train_cubic)
y_quartic_train = pr_quartic.predict(X_train_quartic)
y_quintic_train = pr_quintic.predict(X_train_quintic)
plt.figure(figsize=(10, 5))
plt.scatter(X_train, y_train, label="Training data points", color='black')
plt.scatter(X_test, y_test, label="Testing data points", color='red')
X_train_range = np.arange(X_train.min(), X_train.max(), 1)[:, np.newaxis]
y_quad_range = pr_quad.predict(quadratic.fit_transform(X_train_range))
y_cubic_range = pr_cubic.predict(cubic.fit_transform(X_train_range))
y_quartic_range = pr_quartic.predict(quartic.fit_transform(X_train_range))
y_quintic_range = pr_quintic.predict(quintic.fit_transform(X_train_range))
plt.plot(X_train_range, y_quad_range, label="Quadratic fit", color='red', lw=2)
plt.plot(X_train_range, y_cubic_range, label="Cubic fit", color='blue', lw=2)
plt.plot(X_train_range, y_quartic_range, label="Quartic fit", color='green', lw=2)
plt.plot(X_train_range, y_quintic_range, label="Quintic fit", color='orange', lw=2)
plt.xlabel("TV")
plt.ylabel("Sales")
plt.legend()
plt.title("Polynomial Regression")
plt.grid(True, ls='--', alpha=0.5)
plt.show()
Although the quartic and quintic models may visually appear to provide a better fit to the data, we must be careful about overfitting. Further statistical analysis, including cross-validation and performance metrics (such as R-squared, RMSE and MAE), is necessary to choose the model that provides the best trade-off between bias and variance, ensuring both accuracy and generalisability.
Model Evaluation
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
Now we are going to evaluate the models, that is, measure how accurate and how robust our predictions are.
r2_quad = r2_score(y_train, y_quad_train)
mse_quad = mean_squared_error(y_train, y_quad_train)
rmse_quad = np.sqrt(mse_quad)
mae_quad = mean_absolute_error(y_train, y_quad_train)
r2_cubic = r2_score(y_train, y_cubic_train)
mse_cubic = mean_squared_error(y_train, y_cubic_train)
rmse_cubic = np.sqrt(mse_cubic)
mae_cubic = mean_absolute_error(y_train, y_cubic_train)
r2_quartic = r2_score(y_train, y_quartic_train)
mse_quartic = mean_squared_error(y_train, y_quartic_train)
rmse_quartic = np.sqrt(mse_quartic)
mae_quartic = mean_absolute_error(y_train, y_quartic_train)
r2_quintic = r2_score(y_train, y_quintic_train)
mse_quintic = mean_squared_error(y_train, y_quintic_train)
rmse_quintic = np.sqrt(mse_quintic)
mae_quintic = mean_absolute_error(y_train, y_quintic_train)
print("Polynomial Regression Metrics:")
print("Quadratic:")
print(f"R-squared: {r2_quad:.4f}")
print(f"MSE: {mse_quad:.4f}")
print(f"RMSE: {rmse_quad:.4f}")
print(f"MAE: {mae_quad:.4f}")
print("\nCubic:")
print(f"R-squared: {r2_cubic:.4f}")
print(f"MSE: {mse_cubic:.4f}")
print(f"RMSE: {rmse_cubic:.4f}")
print(f"MAE: {mae_cubic:.4f}")
print("\nQuartic:")
print(f"R-squared: {r2_quartic:.4f}")
print(f"MSE: {mse_quartic:.4f}")
print(f"RMSE: {rmse_quartic:.4f}")
print(f"MAE: {mae_quartic:.4f}")
print("\nQuintic:")
print(f"R-squared: {r2_quintic:.4f}")
print(f"MSE: {mse_quintic:.4f}")
print(f"RMSE: {rmse_quintic:.4f}")
print(f"MAE: {mae_quintic:.4f}")
Cross Validation
5-fold cross-validation follows the same principles as general k-fold cross-validation, with k fixed at 5: the training data are split into five folds, and each fold is held out once for validation while the model is fitted on the remaining four.
k=5 cross-validation offers a robust estimate of model performance by addressing variability in evaluation caused by random data partitioning. It ensures that each data point is used for both training and testing, reducing bias in the evaluation process.
Compared to leave-one-out cross-validation (LOOCV), which can be computationally expensive for large datasets, 5-fold cross-validation strikes a balance between computational cost and reliability of the estimation.
from tabulate import tabulate
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

model_names = ["Quadratic", "Cubic", "Quartic", "Quintic"]
degrees = [2, 3, 4, 5]
results = []
for name, degree in zip(model_names, degrees):
    # wrap the polynomial expansion and the linear model in a pipeline so the
    # feature transform is refitted inside each cross-validation fold
    pipeline = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='r2')
    results.append([name] + [round(score, 4) for score in cv_scores])
headers = ["Model", "CV 1", "CV 2", "CV 3", "CV 4", "CV 5"]
print(tabulate(results, headers=headers))
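As an optional follow-up (not in the original post), each model's five scores can be condensed into a mean and standard deviation, which makes the comparison easier to read:

for row in results:
    scores = np.array(row[1:], dtype=float)
    print(f"{row[0]}: mean R^2 = {scores.mean():.4f} (std = {scores.std():.4f})")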
Residual Analysis
residuals_linear = y_train - lr.predict(X_train)
residuals_quad = y_train - y_quad_train
residuals_cubic = y_train - y_cubic_train
residuals_quartic = y_train - y_quartic_train
residuals_quintic = y_train - y_quintic_train
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(10, 10))
ax1.scatter(y_quad_train, residuals_quad, color='black', alpha=0.5)
ax1.set_xlabel('Fitted values')
ax1.set_ylabel('Residuals')
ax1.set_title('Residual Plot for Quadratic Regression')
ax1.axhline(y=0, color='red', linestyle='--')
ax1.grid(True, ls='--', alpha=0.5)
ax2.scatter(y_cubic_train, residuals_cubic, color='blue', alpha=0.5)
ax2.set_xlabel('Fitted values')
ax2.set_ylabel('Residuals')
ax2.set_title('Residual Plot for Cubic Regression')
ax2.axhline(y=0, color='red', linestyle='--')
ax2.grid(True, ls='--', alpha=0.5)
ax3.scatter(pr_quartic.predict(X_train_quartic), residuals_quartic, color='green', alpha=0.5)
ax3.set_xlabel('Fitted values')
ax3.set_ylabel('Residuals')
ax3.set_title('Residual Plot for Quartic Regression')
ax3.axhline(y=0, color='red', linestyle='--')
ax3.grid(True, ls='--', alpha=0.5)
ax4.scatter(pr_quintic.predict(X_train_quintic), residuals_quintic, color='orange', alpha=0.5)
ax4.set_xlabel('Fitted values')
ax4.set_ylabel('Residuals')
ax4.set_title('Residual Plot for Quintic Regression')
ax4.axhline(y=0, color='red', linestyle='--')
ax4.grid(True, ls='--', alpha=0.5)
plt.tight_layout()
plt.show()
Residual Distribution Plots
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(10, 8))
sns.histplot(residuals_quad, kde=True, color='black', alpha=0.5, ax=ax1)
ax1.set_title('Histogram Plot of Residuals for Quadratic Regression')
ax1.set_xlabel('Residuals')
ax1.set_ylabel('Frequency')
ax1.grid(True, ls='--', alpha=0.5)
sns.histplot(residuals_cubic, kde=True, color='blue', alpha=0.5, ax=ax2)
ax2.set_title('Histogram Plot of Residuals for Cubic Regression')
ax2.set_xlabel('Residuals')
ax2.set_ylabel('Frequency')
ax2.grid(True, ls='--', alpha=0.5)
sns.histplot(residuals_quartic, kde=True, color='green', alpha=0.5, ax=ax3)
ax3.set_title('Histogram Plot of Residuals for Quartic Regression')
ax3.set_xlabel('Residuals')
ax3.set_ylabel('Frequency')
ax3.grid(True, ls='--', alpha=0.5)
sns.histplot(residuals_quintic, kde=True, color='orange', alpha=0.5, ax=ax4)
ax4.set_title('Histogram Plot of Residuals for Quintic Regression')
ax4.set_xlabel('Residuals')
ax4.set_ylabel('Frequency')
ax4.grid(True, ls='--', alpha=0.5)
plt.tight_layout()
plt.show()
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(8, 8))
stats.probplot(residuals_quad, dist="norm", plot=ax1)
ax1.set_title('Q-Q Plot of Residuals for Quadratic Regression')
ax1.set_xlabel('Theoretical Quantiles')
ax1.set_ylabel('Sample Quantiles')
ax1.grid(True, ls='--', alpha=0.5)
stats.probplot(residuals_cubic, dist="norm", plot=ax2)
ax2.set_title('Q-Q Plot of Residuals for Cubic Regression')
ax2.set_xlabel('Theoretical Quantiles')
ax2.set_ylabel('Sample Quantiles')
ax2.grid(True, ls='--', alpha=0.5)
stats.probplot(residuals_quartic, dist="norm", plot=ax3)
ax3.set_title('Q-Q Plot of Residuals for Quartic Regression')
ax3.set_xlabel('Theoretical Quantiles')
ax3.set_ylabel('Sample Quantiles')
ax3.grid(True, ls='--', alpha=0.5)
stats.probplot(residuals_quintic, dist="norm", plot=ax4)
ax4.set_title('Q-Q Plot of Residuals for Quintic Regression')
ax4.set_xlabel('Theoretical Quantiles')
ax4.set_ylabel('Sample Quantiles')
ax4.grid(True, ls='--', alpha=0.5)
plt.tight_layout()
plt.show()
All Regressions: The Q-Q plots should ideally show all points falling on the diagonal line, which would indicate that the residuals are normally distributed. In all polynomial models, most points follow the line, but there are deviations at the ends, suggesting the presence of outliers or heavy tails in the distribution of residuals.
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(8, 10))
sns.boxplot(y=residuals_quad, color='black', ax=ax1)
ax1.set_title('Box Plot of Residuals for Quadratic Regression')
ax1.set_xlabel('Residuals')
ax1.grid(True)
sns.boxplot(y=residuals_cubic, color='blue', ax=ax2)
ax2.set_title('Box Plot of Residuals for Cubic Regression')
ax2.set_xlabel('Residuals')
ax2.grid(True)
sns.boxplot(y=residuals_quartic, color='green', ax=ax3)
ax3.set_title('Box Plot of Residuals for Quartic Regression')
ax3.set_xlabel('Residuals')
ax3.grid(True)
sns.boxplot(y=residuals_quintic, color='orange', ax=ax4)
ax4.set_title('Box Plot of Residuals for Quintic Regression')
ax4.set_xlabel('Residuals')
ax4.grid(True)
plt.tight_layout()
plt.show()
Quadratic to Quintic Regressions: All box plots show a relatively symmetric distribution of residuals around the median. However, there are outliers present in all models, as evidenced by the points beyond the whiskers. The box plot for the quintic regression seems to indicate a larger spread of residuals compared to others.
Conclusion and Recommendations
References
Polynomial models: https://www.slideserve.com/ifama/polynomial-models
PolynomialFeatures (scikit-learn documentation): https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html