登录查看更多内容

Building and Evaluating a Linear Regression Model for AAPL Closing Stock Prices Using Vertex AI Notebooks

Bhargava Naik Banoth

Data analytics | Data scientist | Generative Ai Developer | Freelancer | Trainer

发布日期: 2025年1月25日

Overview

In this lab, you'll build and evaluate a simple linear regression model to predict AAPL's closing stock prices using Scikit-Learn and BigQuery within Vertex AI Notebooks.

Objectives

By the end of this lab, you will be able to:

Load data from BigQuery into a Pandas DataFrame.
Build a linear regression model in Scikit-Learn.
Work within Vertex AI Notebooks.

Step-by-Step Instructions

1. Sign In to Google Cloud Console

After starting the lab, note your lab credentials (Username and Password).
Open the Google Cloud Console in an incognito window.
Sign in using your provided credentials. Make sure to use the lab credentials, not your personal ones, to avoid errors or unexpected charges.
Skip the recovery page, and accept the terms of service.

2. Enable Vertex AI APIs

In the Google Cloud Console, navigate to Vertex AI via the Navigation Menu.
Click on Enable All Recommended APIs to enable the required APIs for Vertex AI.

3. Launch Vertex AI Notebooks

To create and launch a Vertex AI Notebooks instance:

From the Navigation Menu, select Vertex AI > Workbench.

On the User-Managed Notebook page, if the Notebooks API isn’t enabled yet, click Enable Notebooks API.
Click Create New in the New instance menu.
Choose the latest version of TensorFlow Enterprise 2.6 (LTS) as your environment.

Name your notebook instance.
Set the Region and Zone based on your preferred location.
Leave other settings at their defaults, then click Create.

Once the instance is created, it may take a few minutes to be ready. When it appears in the Workbench page, click Open JupyterLab to open the JupyterLab interface in a new tab.

Building a Regression Model for a Financial Dataset

In this notebook, you will build a simple linear regression model to predict the closing AAPL stock price. The lab objectives are:

Pull data from BigQuery into a Pandas dataframe
Use Matplotlib to visualize data
Use Scikit-Learn to build a regression model

!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst

!pip install --user google-cloud-bigquery==3.4.1

%%bash

bq mk -d ai4f
bq load --autodetect --source_format=CSV ai4f.AAPL10Y gs://cloud-training/ai4f/AAPL10Y.csv

%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

plt.rc('figure', figsize=(12, 8.0))

Pull Data from BigQuery

In this section we'll use a magic function to query a BigQuery table and then store the output in a Pandas dataframe. A magic function is just an alias to perform a system command. To see documentation on the "bigquery" magic function execute the following cell:

The query below selects everything you'll need to build a regression model to predict the closing price of AAPL stock. The model will be very simple for the purposes of demonstrating BQML functionality. The only features you'll use as input into the model are the previous day's closing price and a three day trend value. The trend value can only take on two values, either -1 or +1. If the AAPL stock price has increased over any two of the previous three days then the trend will be +1. Otherwise, the trend value will be -1.

Note, the features you'll need can be generated from the raw table ai4f.AAPL10Y using Pandas functions. However, it's better to take advantage of the serverless-ness of BigQuery to do the data pre-processing rather than applying the necessary transformations locally.

%%bigquery df
WITH
  raw AS (
  SELECT
    date,
    close,
    LAG(close, 1) OVER(ORDER BY date) AS min_1_close,
    LAG(close, 2) OVER(ORDER BY date) AS min_2_close,
    LAG(close, 3) OVER(ORDER BY date) AS min_3_close,
    LAG(close, 4) OVER(ORDER BY date) AS min_4_close
  FROM
    `ai4f.AAPL10Y`
  ORDER BY
    date DESC ),
  raw_plus_trend AS (
  SELECT
    date,
    close,
    min_1_close,
    IF (min_1_close - min_2_close > 0, 1, -1) AS min_1_trend,
    IF (min_2_close - min_3_close > 0, 1, -1) AS min_2_trend,
    IF (min_3_close - min_4_close > 0, 1, -1) AS min_3_trend
  FROM
    raw ),
  train_data AS (
  SELECT
    date,
    close,
    min_1_close AS day_prev_close,
    IF (min_1_trend + min_2_trend + min_3_trend > 0, 1, -1) AS trend_3_day
  FROM
    raw_plus_trend
  ORDER BY
    date ASC )
SELECT
  *
FROM
  train_data

View the first five rows of the query's output. Note that the object df containing the query output is a Pandas Dataframe.

领英推荐

The March 2024 MinIO Newsletter

MinIO 11 个月前

Reasoning with Big Knowledge Graphs: Choices, Pitfalls…

Ontotext 2 年前

Marvelous MLOps #56: Streamlining ML Model Monitoring…

Marvelous MLOps 3 个月前

print(type(df))
df.dropna(inplace=True)
df.head()

Visualize data

The simplest plot you can make is to show the closing stock price as a time series. Pandas DataFrames have built in plotting funtionality based on Matplotlib.

df.plot(x='date', y='close');

You can also embed the trend_3_day variable into the time series above.

start_date = '2018-06-01'
end_date = '2018-07-31'

plt.plot(
    'date', 'close', 'k--',
    data = (
        df.loc[pd.to_datetime(df.date).between(start_date, end_date)]
    )
)

plt.scatter(
    'date', 'close', color='b', label='pos trend', 
    data = (
        df.loc[df.trend_3_day == 1 & pd.to_datetime(df.date).between(start_date, end_date)]
    )
)

plt.scatter(
    'date', 'close', color='r', label='neg trend',
    data = (
        df.loc[(df.trend_3_day == -1) & pd.to_datetime(df.date).between(start_date, end_date)]
    )
)

plt.legend()
plt.xticks(rotation = 90);

Build a Regression Model in Scikit-Learn

In this section you'll train a linear regression model to predict AAPL closing prices when given the previous day's closing price day_prev_close and the three day trend trend_3_day. A training set and test set are created by sequentially splitting the data after 2000 rows.

features = ['day_prev_close', 'trend_3_day']
target = 'close'

X_train, X_test = df.loc[:2000, features], df.loc[2000:, features]
y_train, y_test = df.loc[:2000, target], df.loc[2000:, target]

# Create linear regression object
regr = linear_model.LinearRegression(fit_intercept=False)

# Train the model using the training set
regr.fit(X_train, y_train)

# Make predictions using the testing set
y_pred = regr.predict(X_test)

# The mean squared error
print('Root Mean Squared Error: {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred))))

# Explained variance score: 1 is perfect prediction
print('Variance Score: {0:.2f}'.format(r2_score(y_test, y_pred)))

Root Mean Squared Error: 2.99 Variance Score: 0.98

plt.scatter(y_test, y_pred)
plt.plot([140, 240], [140, 240], 'r--', label='perfect fit')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.legend();

The model's predictions are more or less in line with the truth. However, the utility of the model depends on the business context (i.e. you won't be making any money with this model). It's fair to question whether the variable trend_3_day even adds to the performance of the model:

print('Root Mean Squared Error: {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, X_test.day_prev_close))))

Root Mean Squared Error: 2.98

Indeed, the RMSE is actually lower if we simply use the previous day's closing value as a prediction! Does increasing the number of days included in the trend improve the model? Feel free to create new features and attempt to improve model performance!

Don't miss out! ?? (Subscribe on LinkedIn https://www.dhirubhai.net/build-relation/newsletter-follow?entityUrn=7175221823222022144)

Follow me on LinkedIn: www.dhirubhai.net/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=bhargava-naik-banoth-393546170

Follow me on Medium: https://medium.com/@bhargavanaik24/subscribe

Follow me on Twitter : https://x.com/bhargava_naik

The Future of Work with AI

264 位关注者

要查看或添加评论，请登录

Bhargava Naik Banoth的更多文章

The Cost-Benefit Analysis of Process Automation: How Long Does It Take to Save Time?

2025年2月24日

The Cost-Benefit Analysis of Process Automation: How Long Does It Take to Save Time?

In today's fast-paced business world, many companies are considering automation as a way to improve efficiency, reduce…
Advanced Financial Models: Expanding the Toolkit for Modern Finance

2025年2月23日

Advanced Financial Models: Expanding the Toolkit for Modern Finance

While the foundational models like Geometric Brownian Motion (GBM), Black-Scholes, and GARCH are indispensable, modern…
A Comprehensive Guide to Financial Modeling: Techniques, Applications, and Best Practices

2025年2月23日

A Comprehensive Guide to Financial Modeling: Techniques, Applications, and Best Practices

Financial modeling is a cornerstone of modern finance, enabling professionals to predict asset prices, manage risk, and…
Effortless Form Filling and Submission with Python: No Selenium Required

2025年2月17日

Effortless Form Filling and Submission with Python: No Selenium Required

1. Using requests and BeautifulSoup (for simple forms) If the form submission is straightforward (e.
Streamlining Web Form Submissions with Python: Excel-Driven Automation

2025年2月17日

Streamlining Web Form Submissions with Python: Excel-Driven Automation

To automate the process of filling out web forms and submitting them using Python, you can use libraries like for…
The Role of Outliers in Machine Learning: Should You Keep or Remove Them?

2025年1月28日

The Role of Outliers in Machine Learning: Should You Keep or Remove Them?

In machine learning, outliers are data points that differ significantly from most other data points. For instance, if…
The Evolution of Language Models: From Word2Vec to Transformers and Beyond

2025年1月27日

The Evolution of Language Models: From Word2Vec to Transformers and Beyond

Language modeling has come a long way over the years, from early attempts at representing text to the sophisticated…
Understanding Endogenous and Exogenous Factors in Trading

2025年1月22日

Understanding Endogenous and Exogenous Factors in Trading

In the world of financial markets, the forces that drive asset prices are shaped by a combination of internal market…
Trading Fundamentals: Quant Theory, Arbitrage, and Backtesting

2025年1月22日

Trading Fundamentals: Quant Theory, Arbitrage, and Backtesting

In the world of financial markets, trading strategies play a crucial role in the success of investors and institutions…
Comprehensive Approaches to Financial Fraud Detection: Methods and Techniques

2025年1月4日

Comprehensive Approaches to Financial Fraud Detection: Methods and Techniques

Financial fraud is an ever-evolving threat to individuals, businesses, and financial institutions. As technology…

See all articles

Building and Evaluating a Linear Regression Model for AAPL Closing Stock Prices Using Vertex AI Notebooks

Bhargava Naik Banoth

Data analytics | Data scientist | Generative Ai Developer | Freelancer | Trainer

Overview

Objectives

Step-by-Step Instructions

1. Sign In to Google Cloud Console

2. Enable Vertex AI APIs

3. Launch Vertex AI Notebooks

Building a Regression Model for a Financial Dataset

Pull Data from BigQuery

领英推荐

Visualize data

Build a Regression Model in Scikit-Learn

The Future of Work with AI

264 位关注者

Bhargava Naik Banoth的更多文章

社区洞察

其他会员也浏览了

Dashboards for different stages of the ML project + other resources

FiftyOne Computer Vision Aggregation Tips and Tricks — Nov 25, 2022

Gemini in BigQuery

Analytics and Data Science News for the Week of March 24; Updates from Alteryx, Power BI, Tellius & More

CRISP-DM, CD4ML or ModelOps: looking beyond just data

What Are Some Essential Tools And Technologies For Data Science?

12 Things I Learned During My First Year as a Machine Learning Engineer

24 Ultimate Data Science (ML) projects to work on in 2022.

Overview

Objectives

Step-by-Step Instructions

1. Sign In to Google Cloud Console

2. Enable Vertex AI APIs

3. Launch Vertex AI Notebooks

Building a Regression Model for a Financial Dataset

Pull Data from BigQuery

领英推荐

Visualize data

Build a Regression Model in Scikit-Learn

The Future of Work with AI

264 位关注者

Bhargava Naik Banoth的更多文章

The Cost-Benefit Analysis of Process Automation: How Long Does It Take to Save Time?

Advanced Financial Models: Expanding the Toolkit for Modern Finance

A Comprehensive Guide to Financial Modeling: Techniques, Applications, and Best Practices

Effortless Form Filling and Submission with Python: No Selenium Required

Streamlining Web Form Submissions with Python: Excel-Driven Automation

The Role of Outliers in Machine Learning: Should You Keep or Remove Them?

The Evolution of Language Models: From Word2Vec to Transformers and Beyond

Understanding Endogenous and Exogenous Factors in Trading

Trading Fundamentals: Quant Theory, Arbitrage, and Backtesting

Comprehensive Approaches to Financial Fraud Detection: Methods and Techniques

社区洞察

其他会员也浏览了

Dashboards for different stages of the ML project + other resources

FiftyOne Computer Vision Aggregation Tips and Tricks — Nov 25, 2022

Gemini in BigQuery

Analytics and Data Science News for the Week of March 24; Updates from Alteryx, Power BI, Tellius & More

CRISP-DM, CD4ML or ModelOps: looking beyond just data

What Are Some Essential Tools And Technologies For Data Science?

12 Things I Learned During My First Year as a Machine Learning Engineer

24 Ultimate Data Science (ML) projects to work on in 2022.