Building and Evaluating a Linear Regression Model for AAPL Closing Stock Prices Using Vertex AI Notebooks
Bhargava Naik Banoth
Data analytics | Data scientist | Generative Ai Developer | Freelancer | Trainer
Overview
In this lab, you'll build and evaluate a simple linear regression model to predict AAPL's closing stock prices using Scikit-Learn and BigQuery within Vertex AI Notebooks.
Objectives
By the end of this lab, you will be able to:
Step-by-Step Instructions
1. Sign In to Google Cloud Console
2. Enable Vertex AI APIs
3. Launch Vertex AI Notebooks
To create and launch a Vertex AI Notebooks instance:
Once the instance is created, it may take a few minutes to be ready. When it appears in the Workbench page, click Open JupyterLab to open the JupyterLab interface in a new tab.
Building a Regression Model for a Financial Dataset
In this notebook, you will build a simple linear regression model to predict the closing AAPL stock price. The lab objectives are:
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst
!pip install --user google-cloud-bigquery==3.4.1
%%bash
bq mk -d ai4f
bq load --autodetect --source_format=CSV ai4f.AAPL10Y gs://cloud-training/ai4f/AAPL10Y.csv
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
plt.rc('figure', figsize=(12, 8.0))
Pull Data from BigQuery
In this section we'll use a magic function to query a BigQuery table and then store the output in a Pandas dataframe. A magic function is just an alias to perform a system command. To see documentation on the "bigquery" magic function execute the following cell:
The query below selects everything you'll need to build a regression model to predict the closing price of AAPL stock. The model will be very simple for the purposes of demonstrating BQML functionality. The only features you'll use as input into the model are the previous day's closing price and a three day trend value. The trend value can only take on two values, either -1 or +1. If the AAPL stock price has increased over any two of the previous three days then the trend will be +1. Otherwise, the trend value will be -1.
Note, the features you'll need can be generated from the raw table ai4f.AAPL10Y using Pandas functions. However, it's better to take advantage of the serverless-ness of BigQuery to do the data pre-processing rather than applying the necessary transformations locally.
%%bigquery df
WITH
raw AS (
SELECT
date,
close,
LAG(close, 1) OVER(ORDER BY date) AS min_1_close,
LAG(close, 2) OVER(ORDER BY date) AS min_2_close,
LAG(close, 3) OVER(ORDER BY date) AS min_3_close,
LAG(close, 4) OVER(ORDER BY date) AS min_4_close
FROM
`ai4f.AAPL10Y`
ORDER BY
date DESC ),
raw_plus_trend AS (
SELECT
date,
close,
min_1_close,
IF (min_1_close - min_2_close > 0, 1, -1) AS min_1_trend,
IF (min_2_close - min_3_close > 0, 1, -1) AS min_2_trend,
IF (min_3_close - min_4_close > 0, 1, -1) AS min_3_trend
FROM
raw ),
train_data AS (
SELECT
date,
close,
min_1_close AS day_prev_close,
IF (min_1_trend + min_2_trend + min_3_trend > 0, 1, -1) AS trend_3_day
FROM
raw_plus_trend
ORDER BY
date ASC )
SELECT
*
FROM
train_data
View the first five rows of the query's output. Note that the object df containing the query output is a Pandas Dataframe.
领英推荐
print(type(df))
df.dropna(inplace=True)
df.head()
Visualize data
The simplest plot you can make is to show the closing stock price as a time series. Pandas DataFrames have built in plotting funtionality based on Matplotlib.
df.plot(x='date', y='close');
You can also embed the trend_3_day variable into the time series above.
start_date = '2018-06-01'
end_date = '2018-07-31'
plt.plot(
'date', 'close', 'k--',
data = (
df.loc[pd.to_datetime(df.date).between(start_date, end_date)]
)
)
plt.scatter(
'date', 'close', color='b', label='pos trend',
data = (
df.loc[df.trend_3_day == 1 & pd.to_datetime(df.date).between(start_date, end_date)]
)
)
plt.scatter(
'date', 'close', color='r', label='neg trend',
data = (
df.loc[(df.trend_3_day == -1) & pd.to_datetime(df.date).between(start_date, end_date)]
)
)
plt.legend()
plt.xticks(rotation = 90);
Build a Regression Model in Scikit-Learn
In this section you'll train a linear regression model to predict AAPL closing prices when given the previous day's closing price day_prev_close and the three day trend trend_3_day. A training set and test set are created by sequentially splitting the data after 2000 rows.
features = ['day_prev_close', 'trend_3_day']
target = 'close'
X_train, X_test = df.loc[:2000, features], df.loc[2000:, features]
y_train, y_test = df.loc[:2000, target], df.loc[2000:, target]
# Create linear regression object
regr = linear_model.LinearRegression(fit_intercept=False)
# Train the model using the training set
regr.fit(X_train, y_train)
# Make predictions using the testing set
y_pred = regr.predict(X_test)
# The mean squared error
print('Root Mean Squared Error: {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred))))
# Explained variance score: 1 is perfect prediction
print('Variance Score: {0:.2f}'.format(r2_score(y_test, y_pred)))
Root Mean Squared Error: 2.99 Variance Score: 0.98
plt.scatter(y_test, y_pred)
plt.plot([140, 240], [140, 240], 'r--', label='perfect fit')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.legend();
The model's predictions are more or less in line with the truth. However, the utility of the model depends on the business context (i.e. you won't be making any money with this model). It's fair to question whether the variable trend_3_day even adds to the performance of the model:
print('Root Mean Squared Error: {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, X_test.day_prev_close))))
Root Mean Squared Error: 2.98
Indeed, the RMSE is actually lower if we simply use the previous day's closing value as a prediction! Does increasing the number of days included in the trend improve the model? Feel free to create new features and attempt to improve model performance!
Don't miss out! ?? (Subscribe on LinkedIn https://www.dhirubhai.net/build-relation/newsletter-follow?entityUrn=7175221823222022144)
Follow me on LinkedIn: www.dhirubhai.net/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=bhargava-naik-banoth-393546170
Follow me on Medium: https://medium.com/@bhargavanaik24/subscribe
Follow me on Twitter : https://x.com/bhargava_naik