Python for Financial Analysis: Technical Indicator Insights for Index Investing
Takatsuyo Ishida, MBA
Accounting Senior| EOS Accountants LLP | Hult IBS MBA(STEM) Class of 2024
Introduction
This article uses Python to analyze the CNNpred: CNN-based stock market prediction using a diverse set of variables dataset. The stock market is a complex system influenced by various factors. Understanding stock price movements can help investors make informed decisions. This report analyzes the S&P 500 using a dataset from 2010 to 2017. The primary goal is to use technical analysis to understand market trends and identify potential trading signals. The data contains 1985 instances and 84 features. The analysis focuses specifically on the S&P 500, utilizing technical indicators such as moving averages, volume, MACD, RSI, and Bollinger Bands to gain insights into market behavior and identify potential trading signals.
Data Import
To begin the analysis, the dataset is imported using Python's pandas library, a powerful tool for data manipulation and analysis. The CSV file containing the dataset is read, and the first few rows are displayed to understand its structure. This step verifies that the data has been correctly loaded and provides an initial view of its contents.
The following code snippet reads the CSV file using the pd.read_csv function and stores it in a DataFrame named df. The df.head() function displays the first few rows of the dataset. This initial inspection shows the various features and their respective values, which will be useful for the subsequent analysis.
Next, the dataset will be explored further to understand its characteristics, handle any missing values, and perform descriptive statistical analysis to summarize the data.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Read the CSV file
df = pd.read_csv('/content/Processed_S&P.csv')
# Set the index to Date column
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# Extract the necessary columns
date = df.index
price = df['Close']
# Display the first few rows of the dataframe
df.head()
In this code, pandas is used to read the CSV file. The 'Date' column is converted to a date-time format and set as the index of the DataFrame. The necessary columns, including the index (date) and 'Close' price, are extracted. Finally, the df.head() function displays the first few rows of the DataFrame.
Handling Missing Values
Handling missing values is crucial for ensuring the accuracy of the analysis. Missing data can lead to biased results and reduce the overall effectiveness of the models. The steps below outline the approach to check for missing values and handle them accordingly.
First, columns with missing values were identified and the number of missing entries in each was counted.
# Check for missing values
missing_values = df.isnull().sum()
print("Missing values before filling:")
print(missing_values[missing_values > 0])
The code above calculates the number of missing values in each column and prints the columns that contain missing values. This step helps in understanding the extent of missing data in the dataset.
Next, all numeric columns were checked for negative values, and columns with negative values were identified.
# Check for negative values in all numeric columns
numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns
# Identify columns with negative values
negative_values = (df[numeric_columns] < 0).sum()
print("Number of negative values in each column:")
print(negative_values[negative_values > 0])
Then, negative values in the Volume column were converted to positive values to ensure data integrity, as negative trading volumes are not possible.
# Convert negative values in the Volume column to positive values
df['Volume'] = df['Volume'].abs()
Missing values in numeric columns were then filled with the mean of each column.
# Fill missing values in numeric columns with the mean of the column
df[numeric_columns] = df[numeric_columns].fillna(df[numeric_columns].mean())
To ensure all rows now have no missing values, the total number of rows was compared to the number of rows without any missing values.
# Total number of rows in the dataframe
total_rows = df.shape[0]
# Number of rows with no missing values
rows_no_missing = df.dropna(how='any').shape[0]
# Check if all rows have no missing values
if total_rows == rows_no_missing:
print("All rows have no missing values.")
else:
print(f"Not all rows have no missing values. Total rows: {total_rows}, Rows with no missing values: {rows_no_missing}")
Finally, it was verified that there are no negative values in the Volume column.
# Verify that there are no negative values in the Volume column
negative_values_after = (df['Volume'] < 0).sum()
# Display results in an intuitive format
if negative_values_after == 0:
print("No negative values in the Volume column.")
else:
print("Negative values found in the Volume column.")
This comprehensive approach ensures that the dataset is free of missing values and inappropriate negative values in the Volume column, which is essential for maintaining the integrity of the analysis and improving the quality of the subsequent models.
Data Visualization
Visualizing the data helps in understanding trends and patterns that might not be immediately evident from raw data. A basic line graph is plotted to visualize the closing prices of the S&P 500 over time.
After setting up the data by reading the CSV file, setting the 'Date' column as the index, and extracting the necessary columns, the closing prices are plotted using the matplotlib library.
The resulting graph provides a clear visual representation of the S&P 500's performance over the specified period. The line graph below shows the closing prices of the S&P 500 from 2010 to 2017.
# Plot the basic line graph
plt.figure(figsize=(14, 7))
plt.plot(date, price, label='S&P500')
plt.title('S&P500', loc='center')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
This visualization helps in observing overall market trends, such as periods of growth or decline, which are essential for identifying potential trading signals.
Moving Averages
Moving averages are widely used in technical analysis to smooth out price data and identify trends. In this section, three different moving averages are calculated and plotted: 5-day, 25-day, and 50-day moving averages.
First, the moving averages were calculated using the rolling method in pandas. The rolling method allows us to compute the mean over a specified window of time.
# Calculate moving averages
df['sma01'] = df['Close'].rolling(window=5).mean()
df['sma02'] = df['Close'].rolling(window=25).mean()
df['sma03'] = df['Close'].rolling(window=50).mean()
# Display the first few rows to verify the calculations
df[['Close', 'sma01', 'sma02', 'sma03']].head()
The code above calculates the 5-day (sma01), 25-day (sma02), and 50-day (sma03) simple moving averages (SMAs) for the closing prices. The first few rows of the DataFrame are then displayed to verify that the calculations are correct.
To ensure that the moving averages are computed correctly over the entire dataset, the limit on display rows is removed, and the first 100 rows are shown.
# Remove the limit on display rows
pd.set_option('display.max_rows', None)
# Display the first 100 rows
df.head(100)
Next, the closing prices are plotted along with the moving averages to visualize the data.
# Plot the graph with moving averages
plt.figure(figsize=(14, 7))
plt.plot(date, price, label='S&P500')
plt.plot(date, df['sma01'], label='5-Day SMA')
plt.plot(date, df['sma02'], label='25-Day SMA')
plt.plot(date, df['sma03'], label='50-Day SMA')
plt.title('S&P500 with Moving Averages', loc='center')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
The graph above displays the S&P 500 closing prices along with the 5-day, 25-day, and 50-day SMAs. This visualization helps in identifying trends and potential trading signals. For instance, when a shorter-term moving average crosses above a longer-term moving average, it might indicate a buying opportunity, while a cross below could signal a selling opportunity.
Golden and Dead Crosses
Golden crosses and dead crosses are significant indicators in technical analysis. A golden cross occurs when a short-term moving average crosses above a long-term moving average, indicating a potential upward trend. Conversely, a dead cross occurs when a short-term moving average crosses below a long-term moving average, suggesting a potential downward trend.
First, the golden and dead crosses were identified by comparing the 5-day moving average (sma01) with the 50-day moving average (sma03).?
# Identify golden and dead crosses
golden_crosses = (df['sma01'] > df['sma03']) & (df['sma01'].shift(1) <= df['sma03'].shift(1))
dead_crosses = (df['sma01'] < df['sma03']) & (df['sma01'].shift(1) >= df['sma03'].shift(1))
The code above creates boolean masks for golden crosses and dead crosses. The shift method was used to compare the current and previous day's moving averages, identifying where the crosses occur.
Next, the closing prices were plotted along with the moving averages, and the golden and dead crosses were marked on the graph.
# Plot the graph with golden and dead crosses
plt.figure(figsize=(14, 7))
plt.plot(date, price, label='S&P500')
plt.plot(date, df['sma01'], label='5-Day SMA')
plt.plot(date, df['sma02'], label='25-Day SMA')
plt.plot(date, df['sma03'], label='50-Day SMA')
plt.scatter(date[golden_crosses], price[golden_crosses], marker='^', color='g', label='Golden Cross', alpha=1, s=100)
plt.scatter(date[dead_crosses], price[dead_crosses], marker='v', color='r', label='Dead Cross', alpha=1, s=100)
plt.title('S&P500 with Moving Averages and Crosses', loc='center')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
The graph displays the S&P 500 closing prices along with the 5-day, 25-day, and 50-day SMAs. Golden crosses are marked with green upward-pointing arrows (^), while dead crosses are marked with red downward-pointing arrows (v). These markers help visually identify the potential buy and sell signals.
Additionally, the dates where these crosses occur were extracted and displayed:
# Display dates where golden crosses (buy signals) and dead crosses (sell signals) occur
golden_dates = df.index[golden_crosses]
dead_dates = df.index[dead_crosses]
print("Golden Cross Dates (Buy signals):")
print(golden_dates)
print("Dead Cross Dates (Sell signals):")
print(dead_dates)
# Create a DataFrame to hold the dates
signals_df = pd.DataFrame({
'Golden Cross Dates (Buy signals)': golden_dates,
'Dead Cross Dates (Sell signals)': dead_dates
})
# Display the DataFrame
print("\nSignal Dates DataFrame:")
print(signals_df)
This code extracts the dates of golden and dead crosses and displays them. A DataFrame, signals_df, is created to hold these dates, making it easier to review and analyze the signals.
Signal Dates DataFrame:
Golden Cross Dates (Buy signals) Dead Cross Dates (Sell signals)
0 2010-07-23 2010-05-07
1 2010-09-07 2010-08-17
2 2011-03-29 2011-03-15
3 2011-04-20 2011-04-18
4 2011-07-05 2011-05-25
5 2011-07-19 2011-07-18
6 2011-10-12 2011-08-01
7 2011-12-02 2011-11-23
8 2011-12-22 2011-12-16
9 2012-04-18 2012-04-13
10 2012-04-20 2012-04-19
11 2012-04-27 2012-04-23
12 2012-07-02 2012-05-08
13 2012-12-11 2012-10-24
14 2013-07-09 2013-06-24
15 2013-09-11 2013-08-21
16 2013-10-14 2013-10-08
17 2014-02-13 2014-01-28
18 2014-04-21 2014-04-11
19 2014-08-19 2014-08-04
20 2014-10-29 2014-10-01
21 2014-12-18 2014-12-17
22 2015-02-06 2015-01-07
23 2015-03-17 2015-03-13
24 2015-04-09 2015-03-31
25 2015-06-22 2015-06-08
26 2015-07-16 2015-06-29
27 2015-08-03 2015-07-27
28 2015-10-09 2015-08-06
29 2016-03-01 2015-12-11
30 2016-05-26 2016-05-18
31 2016-06-22 2016-06-17
32 2016-07-05 2016-06-27
33 2016-11-11 2016-09-13
34 2017-04-25 2017-04-13
35 2017-08-31 2017-08-21
For example, in 2013, notable golden cross (buy signals) and dead cross (sell signals) dates were identified:
Golden Cross Dates (Buy signals) in 2013:
领英推荐
Dead Cross Dates (Sell signals) in 2013:
These dates indicate potential opportunities for buying and selling based on the intersection of the short-term and long-term moving averages.
Trading Volume
Trading volume is an important indicator in technical analysis, as it often precedes price movements. High trading volumes can indicate strong investor interest and potential price changes.
To visualize the trading volume, a bar graph is created using the matplotlib library. This graph helps in identifying periods of high and low trading activity.
# Plot the volume bar graph
plt.figure(figsize=(14, 7))
plt.bar(df.index, df['Volume'], label='Volume', color='grey', alpha=1.0, width=1.0)
plt.title('S&P500 Volume')
plt.xlabel('Date')
plt.ylabel('Volume')
# Adjust the margins
plt.tight_layout(pad=2.0)
plt.legend()
plt.show()
The bar graph above displays the trading volume for the S&P 500 over the specified period. By analyzing the volume trends, insights into market behavior can be gained. For example, spikes in trading volume often precede significant price movements, indicating potential buy or sell signals. This information is crucial for making informed trading decisions.
Comparing moving averages with the volume bar graph provides insights into the relationship between price movements and trading activity. High trading volumes often precede significant price changes. When high volume coincides with moving average crossovers, it strengthens the buy (golden cross) or sell (dead cross) signals. Increased volume during a short-term moving average crossing above a long-term moving average suggests a strong upward trend while crossing below indicates a strong downward trend. This combined analysis helps in understanding market trends and investor behavior more accurately.
Technical Analysis Preparation
In trading, decisions are rarely made based solely on moving averages. Therefore, this report also covers the Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), and Bollinger Bands indicators. Before delving into these technical analyses, the environment needs to be prepared by installing TA-Lib, a library that provides functions for technical analysis.
First, the TA-Lib source code is downloaded and extracted:
# Download TA-Lib source code
!wget https://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz
# Extract the source code
!tar -xzf ta-lib-0.4.0-src.tar.gz
# Change directory to the source code folder
%cd ta-lib
# Configure and build
!./configure --prefix=/usr
!make
!make install
The above commands download the TA-Lib source code, extract it, navigate to the source code directory, and then configure and build the library.
Next, the Python wrapper for TA-Lib is installed:
# Install the Python wrapper for TA-Lib
!pip install TA-Lib
This command installs the TA-Lib Python wrapper, enabling us to use TA-Lib functions within Python scripts. With TA-Lib installed, the environment is now prepared to perform comprehensive technical analysis using MACD, RSI, and Bollinger Bands.
MACD Histogram
The MACD is a powerful indicator used to identify changes in the strength, direction, momentum, and duration of a trend in a stock's price. The MACD histogram represents the difference between the MACD line and the signal line, providing insights into the momentum of the market.
First, the MACD values were calculated using TA-Lib, and then the MACD histogram was plotted.
import talib as ta
# Calculate MACD using TA-Lib
df['macd'], df['macdsignal'], df['macdhist'] = ta.MACD(df['Close'], fastperiod=12, slowperiod=26, signalperiod=9)
# Display the first 100 rows of the MACD related columns to check calculations
df[['Close', 'macd', 'macdsignal', 'macdhist']].head(100)
The above code reads the CSV file, sets the 'Date' column as the index, and calculates the MACD, signal line, and histogram using TA-Lib. The calculations are then verified by displaying the first 100 rows and the last few rows of the DataFrame.
Next, the MACD histogram is plotted to visualize the momentum.
# Define the start and end for the hlines function
start = date.min()
end = date.max()
# Plot the MACD histogram
plt.figure(figsize=(14, 7))
plt.fill_between(date, df['macdhist'], color='grey', alpha=0.5, label='MACD_hist')
plt.hlines(0, start, end, "gray", linestyles='dashed')
plt.title('MACD Histogram')
plt.xlabel('Date')
plt.ylabel('MACD')
plt.legend()
plt.show()
The MACD histogram is plotted with the fill_between method, showing the difference between the MACD line and the signal line over time. The dashed horizontal line at zero indicates the point where the MACD line crosses the signal line, which can signal potential buy or sell opportunities.
RSI Calculation and Analysis
RSI is a momentum oscillator that measures the speed and change of price movements. It oscillates between 0 and 100 and is used to identify overbought or oversold conditions in a market.
First, the RSI was calculated using a 25-day window.
# Calculate RSI
window_length = 25
# Get the difference in price from previous step
delta = df['Close'].diff()
# Make the positive gains (up) and negative gains (down) Series
up, down = delta.copy(), delta.copy()
up[up < 0] = 0
down[down > 0] = 0
# Calculate the EWMA (Exponential Weighted Moving Average)
roll_up = up.ewm(span=window_length).mean()
roll_down = down.abs().ewm(span=window_length).mean()
# Calculate the RSI based on EWMA
RS = roll_up / roll_down
df['RSI'] = 100.0 - (100.0 / (1.0 + RS))
# Display the first few rows of the RSI column to check calculations
df[['Close', 'RSI']].head()
The code above calculates the RSI by first finding the difference in price from the previous step (delta). The positive gains (up) and negative gains (down) are then separated. The Exponential Weighted Moving Average (EWMA) of these gains is calculated to smooth out the values. The RSI is then computed using these smoothed values.
Next, the RSI was plotted to visualize the momentum.
# Plot the RSI
plt.figure(figsize=(14, 7))
plt.plot(date, df['RSI'], label='RSI', color='gray')
plt.axhline(30, linestyle='--', alpha=0.5, color='red')
plt.axhline(70, linestyle='--', alpha=0.5, color='red')
plt.axhline(50, linestyle='--', alpha=0.5, color='blue')
plt.title('RSI')
plt.xlabel('Date')
plt.ylabel('RSI')
plt.legend()
plt.show()
The plot shows the RSI over time, with horizontal lines at 30, 50, and 70. The lines at 30 and 70 are particularly significant:
By analyzing the RSI alongside the price data, insights into potential market reversals and the strength of the current trend can be gained.
Bollinger Bands Calculation and Analysis
Bollinger Bands are a type of statistical chart characterizing the prices and volatility over time. They consist of a moving average (middle band) and two standard deviation lines (upper and lower bands) that form an envelope around the price. The width of this envelope indicates the volatility of the asset; wider bands suggest higher volatility, while narrower bands indicate lower volatility.
First, the Bollinger Bands were calculated using a 25-day.
# Calculate Bollinger Bands
window = 25
df['rolling_mean'] = price.rolling(window).mean()
df['rolling_std'] = price.rolling(window).std()
df['bollinger_upper'] = df['rolling_mean'] + (df['rolling_std'] * 2)
df['bollinger_lower'] = df['rolling_mean'] - (df['rolling_std'] * 2)
# Display the first 30 rows of the Bollinger Bands related columns to check calculations
df[['Close', 'rolling_mean', 'bollinger_upper', 'bollinger_lower']].head(30)
The above code calculates the rolling mean (middle band) and rolling standard deviation over a 25-day window. The upper and lower bands are then determined by adding and subtracting twice the standard deviation from the rolling mean, respectively.
Next, the S&P 500 close prices were plotted with the Bollinger Bands.
# Plot the Close price with Bollinger Bands
plt.figure(figsize=(14, 7))
plt.plot(date, price, label='S&P500', color='green', linewidth=0.5)
plt.fill_between(date, df['bollinger_upper'], df['bollinger_lower'], color='grey', alpha=0.3, label='Bollinger Bands')
plt.title('S&P500 Close Price with Bollinger Bands', loc='center')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
The plot shows the S&P 500 close prices along with the Bollinger Bands. The area between the upper and lower bands is shaded, indicating the range within which the price typically moves. Key insights from the Bollinger Bands include:
By analyzing the Bollinger Bands along with the price movements, traders can gain valuable insights into the market's volatility and potential reversal points.
Insights Summary
This analysis focused on the S&P 500, utilizing technical indicators such as moving averages, volume, MACD, RSI, and Bollinger Bands to gain insights into market behavior and identify potential trading signals. The graphs presented in this article are once again displayed below.
The comprehensive technical analysis of the S&P 500 provided valuable insights into market behavior. By combining indicators such as moving averages, volume, MACD, RSI, and Bollinger Bands, potential trading signals that can guide future investment strategies were identified. This multi-faceted approach enhances the understanding of market dynamics and helps in making informed trading decisions.