A Novel Method for Pricing SPY Options via Brownian Motion and Black-Scholes Equations
Since the CFR aims to provide novel methodologies for empirical asset pricing, formulating a method for pricing American options naturally arises as a goal.
This methodology is predicated on collecting historical data and then using it to simulate potential price paths for the underlying stock. Each path is driven by its own randomly drawn sequence of daily returns (calibrated to the historical mean and volatility), and prices are simulated for 100 days, beginning April 20th and ending July 28th.
After the paths are simulated, we prune them every 10 days, removing the paths with the greatest average distance from the actual path; this ensures that the paths we use to calculate the price of the option are as accurate as we can make them.
For this strategy we will use the call option with the lowest available strike price and/or the put option with the highest available strike price, i.e., deep in-the-money contracts.
Our first example will be a strike price of $150 with an expiry of June 20th, 2025. The actual price of the option is $402.65.
Let's see how close we can get.
We begin by loading the necessary packages:
if (!require(quantmod)) install.packages("quantmod")
if (!require(TTR)) install.packages("TTR")
if (!require(ggplot2)) install.packages("ggplot2")
if (!require(pracma)) install.packages("pracma")
library(quantmod)
library(TTR)
library(ggplot2)
library(pracma)
After this, we enter the code to collect the historical data for SPY, pulling daily prices from March 2024 through the present.
stock_symbol <- "SPY"
start_date <- "2024-03-01"
end_date <- Sys.Date()
getSymbols(stock_symbol, src = "yahoo", from = start_date, to = end_date)
Next, we subset our data and set the cutoff date for calibrating the price paths. Returns are calculated from consecutive daily closing prices; volatility is a 20-day rolling standard deviation. The calibration window begins at the date set earlier (2024-03-01) and ends 90 days before the current date.
simulation_data_end_date <- as.Date(end(SPY)) - 90
SPY_subset <- SPY[index(SPY) <= simulation_data_end_date]
SPY_returns <- dailyReturn(Cl(SPY_subset))
SPY_volatility <- runSD(SPY_returns, n = 20)
Next, we set the parameters for our simulated paths:
mu <- mean(SPY_returns, na.rm = TRUE)
sigma <- sd(SPY_returns, na.rm = TRUE)
S0 <- as.numeric(Cl(SPY_subset)[length(Cl(SPY_subset))])
After this, we enter the code to set the terms for simulating each of the future price paths.
# Function to simulate future prices
simulate_prices <- function(start_price, mu, sigma, n, seed) {
  set.seed(seed)
  random_returns <- rnorm(n, mean = mu, sd = sigma)
  future_prices <- numeric(n)
  future_prices[1] <- start_price
  for (i in 2:n) {
    future_prices[i] <- future_prices[i - 1] * exp(random_returns[i - 1])
  }
  return(future_prices)
}
Our next step is to enter the code for simulating the prices for each day of each path; we will simulate 100 paths and 100 days for each path.
# Simulate prices for each path
n <- 100 # Number of simulation days
simulated_prices_list <- lapply(1:100, function(i) {
simulate_prices(S0, mu, sigma, n, seed = i)
})
Our simulated paths begin 90 days before the present (in late April) and end 100 days later, near the end of July. S0 is the closing price of the underlying asset on the last trading day before the simulated paths begin.
# Calculate the new start date for the simulation (90 days before the end of SPY data)
simulation_start_date <- as.Date(end(SPY)) - 90
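As a quick optional sanity check (not part of the original workflow), we can confirm that S0 is the last available close on or before the simulation start date:
# The simulation window opens the day after the last calibration close
print(simulation_start_date)
print(tail(Cl(SPY_subset), 1)) # should equal S0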
After this, we create a data frame that allows us to plot the price paths.
# Convert the historical and simulated prices to a data frame for plotting
historical_prices_df <- data.frame(
Date = index(SPY),
Price = as.numeric(Cl(SPY)),
Path = "Historical"
)
The 100 simulated paths are then converted to data frames, dated from the simulation start, and combined with the historical data:
# Create simulated prices data frames with the new start date
simulated_prices_dfs <- lapply(seq_along(simulated_prices_list), function(i) {
data.frame(
Date = seq.Date(from = simulation_start_date + 1, by = "days", length.out = n),
Price = simulated_prices_list[[i]],
Path = paste0("Path ", i)
)
})
# Combine historical and simulated data for plotting
combined_df <- rbind(
historical_prices_df,
do.call(rbind, simulated_prices_dfs)
)
We can now calculate the average distance of each simulated path from the actual path.
# Define the function to calculate average distance to actual prices
calculate_average_distance_to_actual <- function(actual_prices_df, simulated_prices_df) {
# Merge simulated prices with actual prices on Date
merged_df <- merge(actual_prices_df, simulated_prices_df, by = "Date", suffixes = c("_actual", "_simulated"))
# Calculate the absolute distance between actual and simulated prices
distances <- abs(merged_df$Price_actual - merged_df$Price_simulated)
# Return the average distance
return(mean(distances, na.rm = TRUE))
}
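As a quick illustration (not part of the original write-up), the function can be applied to a single simulated path before any pruning takes place:
# Example: average absolute distance of the first simulated path from the actual prices
calculate_average_distance_to_actual(historical_prices_df, simulated_prices_dfs[[1]])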
We then begin to prune the paths, removing at each step the 25% of paths with the greatest average absolute distance from the actual price path. Pruning is done every 10 days.
# Determine the last date of actual values
last_actual_date <- max(historical_prices_df$Date)
# Function to prune the paths (note: prune_date steps forward from the global simulation_start_date)
prune_paths <- function(simulated_prices_dfs, actual_prices_df, last_actual_date, prune_interval = 10) {
  remaining_paths <- simulated_prices_dfs
  pruned_paths <- list()
  iterations <- 0
  while (length(remaining_paths) > 2) {
    iterations <- iterations + 1
    prune_date <- simulation_start_date + iterations * prune_interval
    # Cap prune_date at the last actual date
    if (prune_date > last_actual_date) {
      prune_date <- last_actual_date
    }
    # Print average distances before pruning
    avg_distances <- sapply(remaining_paths, function(path) {
      path_up_to_interval <- path[path$Date <= prune_date, ]
      calculate_average_distance_to_actual(actual_prices_df, path_up_to_interval)
    })
    print(paste("Iteration:", iterations))
    print(paste("Prune Date:", prune_date))
    print("Average distances from actual path to remaining paths:")
    print(avg_distances)
    # Rank paths by average distance
    ranked_paths <- order(avg_distances, decreasing = TRUE)
    # Determine paths to prune (top 25%)
    paths_to_prune <- ranked_paths[1:ceiling(0.25 * length(ranked_paths))]
    # Truncate pruned paths at the prune date and move them to the pruned list
    for (path_idx in paths_to_prune) {
      path <- remaining_paths[[path_idx]]
      path_pruned <- path[path$Date <= prune_date, ]
      pruned_paths <- c(pruned_paths, list(path_pruned))
    }
    remaining_paths <- remaining_paths[-paths_to_prune]
    # Print average distances after pruning
    avg_distances_after <- sapply(remaining_paths, function(path) {
      path_up_to_interval <- path[path$Date <= prune_date, ]
      calculate_average_distance_to_actual(actual_prices_df, path_up_to_interval)
    })
    print("Average distances from actual path to remaining paths after pruning:")
    print(avg_distances_after)
    # Stop if fewer than 3 paths remain
    if (length(remaining_paths) <= 2) {
      break
    }
    # Stop pruning once prune_date has reached the last actual date
    if (prune_date == last_actual_date) {
      break
    }
  }
  list(remaining_paths = remaining_paths, pruned_paths = pruned_paths)
}
# Perform pruning
pruning_results <- prune_paths(simulated_prices_dfs, historical_prices_df, last_actual_date, prune_interval = 10)
# Separate pruned and remaining paths
remaining_paths <- do.call(rbind, pruning_results$remaining_paths)
pruned_paths <- do.call(rbind, pruning_results$pruned_paths)
Our next step is to calculate the daily percentage change for the remaining simulated paths (the same function can be applied to the actual path, as shown in the short check below):
calculate_percentage_change <- function(prices) {
  daily_returns <- diff(prices) / head(prices, -1) * 100
  return(daily_returns)
}
# Keep the remaining paths as a list of data frames for the per-path calculations that follow
remaining_paths_list <- pruning_results$remaining_paths
remaining_paths_changes <- lapply(remaining_paths_list, function(path_df) {
  calculate_percentage_change(path_df$Price)
})
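For completeness, the same function applied to the historical closes gives the daily percentage changes of the actual path (a small optional check, not in the original code):
# Daily percentage changes of the actual (historical) path
actual_path_changes <- calculate_percentage_change(historical_prices_df$Price)
summary(actual_path_changes)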
We can now plot the simulated paths vs the actual. We add a second plot to make the remaining paths easier to visualize.
# Define the line thickness for the blue line
blue_line_thickness <- 2 # Adjust this value as needed
default_thickness <- 0.5 # Default thickness for other lines
# Set axis limits
plot_start_date <- simulation_start_date - 30 # 30 days before the simulated paths start
plot_end_date <- as.Date(max(sapply(simulated_prices_dfs, function(df) max(df$Date))), origin = "1970-01-01") # End date of the simulated paths
x_limits <- as.Date(c(plot_start_date, plot_end_date))
y_limits <- c(min(combined_df$Price), max(combined_df$Price))
# Plot the historical and simulated prices with adjustable thickness for the blue line
ggplot() +
geom_line(data = subset(combined_df, Path == "Historical"), aes(x = Date, y = Price, color = Path), size = blue_line_thickness) +
geom_line(data = remaining_paths, aes(x = Date, y = Price, color = Path), size = 0.5) +
geom_line(data = pruned_paths, aes(x = Date, y = Price, color = Path), linetype = "dashed", size = default_thickness) +
labs(title = "Historical and Simulated SPY Prices", x = "Date", y = "Price") +
theme_minimal() +
scale_color_manual(values = c("Historical" = "blue")) +
scale_x_date(limits = x_limits) +
scale_y_continuous(limits = y_limits)
# Separate the remaining paths for plotting
remaining_paths_df <- do.call(rbind, pruning_results$remaining_paths)
# Create a dataframe for the actual path (historical prices)
actual_path_df <- subset(combined_df, Path == "Historical")
# Plot only the remaining paths and the actual path
ggplot() +
geom_line(data = actual_path_df, aes(x = Date, y = Price, color = Path), size = blue_line_thickness) +
geom_line(data = remaining_paths_df, aes(x = Date, y = Price, color = Path), size = 0.5) +
labs(title = "Historical Prices and Remaining Simulated Paths", x = "Date", y = "Price") +
theme_minimal() +
scale_color_manual(values = c("Historical" = "blue")) + # Color for actual path
scale_x_date(limits = x_limits) +
scale_y_continuous(limits = y_limits)
We then take the remaining paths and (re) calculate their daily percentage change, creating histograms for each of the paths:
# Define remaining_paths_list (make sure this matches your pruning results)
remaining_paths_list <- pruning_results$remaining_paths
# Function to calculate daily percentage change
calculate_percentage_change <- function(prices) {
  daily_returns <- diff(prices) / head(prices, -1) * 100
  return(daily_returns)
}
# Calculate daily returns for remaining paths
remaining_paths_changes <- lapply(remaining_paths_list, function(path_df) {
calculate_percentage_change(path_df$Price)
})
# Create a data frame for plotting histograms
remaining_paths_hist_df <- do.call(rbind, lapply(seq_along(remaining_paths_changes), function(i) {
data.frame(
Path = paste0("Path ", i),
DailyReturn = remaining_paths_changes[[i]],
stringsAsFactors = FALSE
)
}))
# Plot histograms for the daily returns of each remaining path
ggplot(remaining_paths_hist_df, aes(x = DailyReturn, fill = Path)) +
geom_histogram(bins = 30, alpha = 0.7, position = "identity") +
facet_wrap(~ Path, scales = "free_y") +
labs(title = "Histograms of Daily Returns for Each Remaining Path", x = "Daily Return (%)", y = "Frequency") +
theme_minimal() +
theme(legend.position = "bottom")
The same daily returns are then used to create density plots for each of the remaining simulated paths:
# Create a data frame for plotting density plots
remaining_paths_density_df <- do.call(rbind, lapply(seq_along(remaining_paths_changes), function(i) {
data.frame(
Path = paste0("Path ", i),
DailyReturn = remaining_paths_changes[[i]],
stringsAsFactors = FALSE
)
}))
# Plot density curves for the daily returns of each remaining path
ggplot(remaining_paths_density_df, aes(x = DailyReturn, color = Path, fill = Path)) +
geom_density(alpha = 0.3) +
labs(title = "Density Plots of Daily Returns for Each Remaining Path", x = "Daily Return (%)", y = "Density") +
theme_minimal() +
theme(legend.position = "bottom")
More than halfway done!
Our next step is to use the five remaining daily-return distributions (one per surviving path) to simulate the potential returns of the underlying asset over the next year, using 10,000 simulations.
set.seed(123) # For reproducibility
num_days <- 252 # Assuming 252 trading days in a year
num_simulations <- 10000
We then extract each remaining path's daily returns to serve as its empirical distribution:
distributions <- lapply(unique(remaining_paths_density_df$Path), function(path) {
remaining_paths_density_df$DailyReturn[remaining_paths_density_df$Path == path]
})
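A quick optional check (not in the original code) confirms that we have one empirical distribution per remaining path, five in the run above:
# Number of extracted distributions and the number of daily returns in each
print(length(distributions))
print(sapply(distributions, length))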
We can now simulate the cumulative return of the underlying asset over the next year. For each simulated day, we randomly select one of the remaining distributions and draw that day's return from it:
simulate_cumulative_return <- function() {
  cumulative_return <- 100 # Starting value
  for (day in 1:num_days) {
    selected_distribution <- sample(distributions, 1)[[1]]
    daily_return <- sample(selected_distribution, 1)
    cumulative_return <- cumulative_return * (1 + daily_return / 100)
  }
  return(cumulative_return)
}
We then complete our simulation and plot the distribution of the potential returns.
cumulative_returns <- replicate(num_simulations, simulate_cumulative_return())
cumulative_returns_percentage <- (cumulative_returns / 100 - 1) * 100
cumulative_returns_df <- data.frame(CumulativeReturnPercentage = cumulative_returns_percentage)
ggplot(cumulative_returns_df, aes(x = CumulativeReturnPercentage)) +
geom_density(fill = "blue", alpha = 0.3) +
labs(title = "Distribution of Possible Cumulative Returns Over One Year", x = "Cumulative Return (%)", y = "Density") +
scale_x_continuous(labels = scales::percent_format(scale = 1)) +
theme_minimal()
To gauge the probability that the underlying asset finishes the year higher, we calculate the area under the density curve to the right of zero.
density_data <- density(cumulative_returns_percentage)
auc_above_zero <- trapz(density_data$x[density_data$x > 0], density_data$y[density_data$x > 0])
print(paste("Area under the curve above 0:", auc_above_zero))
Which gives us the following:
> # Print the AUC above 0
> print(paste("Area under the curve above 0:", auc_above_zero))
[1] "Area under the curve above 0: 0.521118832745085"
We also need to calculate the volatility of each remaining simulated path; the average across paths will be passed to our pricing function:
# Function to calculate volatility (standard deviation) of daily returns
calculate_volatility <- function(daily_returns) {
  return(sd(daily_returns, na.rm = TRUE))
}
# Calculate daily returns for remaining paths
remaining_paths_changes <- lapply(remaining_paths_list, function(path_df) {
calculate_percentage_change(path_df$Price)
})
# Calculate volatility for each path
remaining_paths_volatility <- sapply(remaining_paths_changes, function(daily_returns) {
calculate_volatility(daily_returns)
})
# Compute average volatility of remaining paths
average_volatility <- mean(remaining_paths_volatility, na.rm = TRUE)
print(paste("Average Volatility of Remaining Paths:", round(average_volatility, 4)))
And we get the following results:
> # Compute average volatility of remaining paths
> average_volatility <- mean(remaining_paths_volatility, na.rm = TRUE)
> print(paste("Average Volatility of Remaining Paths:", round(average_volatility, 4)))
[1] "Average Volatility of Remaining Paths: 0.6925"
Finally, we put the pieces together: we rebuild the empirical return distributions from the remaining paths, define a function that simulates a full year of daily prices by sampling from those distributions, and wrap everything in a Monte Carlo pricing function, which we first test on a short-dated $455 call:
# Function to calculate daily percentage change
calculate_percentage_change <- function(prices) {
  daily_returns <- diff(prices) / head(prices, -1) * 100
  return(daily_returns)
}
# Assuming 'remaining_paths_list' is a list of data frames with the column 'Price'
# Calculate daily returns for remaining paths
remaining_paths_changes <- lapply(remaining_paths_list, function(path_df) {
calculate_percentage_change(path_df$Price)
})
# Combine the distributions into one data frame
remaining_paths_density_df <- do.call(rbind, lapply(seq_along(remaining_paths_changes), function(i) {
data.frame(
Path = paste0("Path ", i),
DailyReturn = remaining_paths_changes[[i]],
stringsAsFactors = FALSE
)
}))
# Extract the five distributions
distributions <- lapply(unique(remaining_paths_density_df$Path), function(path) {
remaining_paths_density_df$DailyReturn[remaining_paths_density_df$Path == path]
})
# Function to simulate one stock price path over one year using the empirical distribution
simulate_stock_path <- function(S, T, daily_returns_distributions, num_days) {
  stock_price <- numeric(num_days + 1)
  stock_price[1] <- S
  for (day in 2:(num_days + 1)) {
    # Randomly select one of the daily return distributions
    selected_distribution <- sample(daily_returns_distributions, 1)[[1]]
    # Draw a random daily return from the selected distribution
    daily_return <- sample(selected_distribution, 1) / 100
    # Update the stock price
    stock_price[day] <- stock_price[day - 1] * (1 + daily_return)
  }
  return(stock_price)
}
# Option pricing via Monte Carlo simulation: average payoff discounted at the risk-free rate
# (the Black-Scholes-style discounting step). Note that sigma is kept in the signature for
# familiarity, but the volatility actually enters through the empirical return distributions,
# and the simulation horizon is fixed at 252 trading days; T is used only for discounting.
monte_carlo_option_pricing <- function(S, K, r, T, sigma, type = "call", num_simulations = 10000) {
  num_days <- 252 # Assuming 252 trading days in a year
  simulated_prices <- replicate(num_simulations, simulate_stock_path(S, T, distributions, num_days)[num_days + 1])
  if (type == "call") {
    payoffs <- pmax(simulated_prices - K, 0)
  } else if (type == "put") {
    payoffs <- pmax(K - simulated_prices, 0)
  } else {
    stop("Invalid option type. Please use 'call' or 'put'.")
  }
  option_price <- mean(payoffs) * exp(-r * T)
  return(option_price)
}
# Define parameters (a preliminary short-dated example; the $150 June 2025 call is priced below)
S <- 548.94 # Current stock price
K <- 455 # Strike price
r <- 0.042 # Risk-free interest rate
T <- 0.008219 # Time to maturity in years (roughly three calendar days)
sigma <- average_volatility # Average daily volatility (%) of the remaining paths
# Calculate the price of a call option using Monte Carlo simulation
call_price_mc <- monte_carlo_option_pricing(S, K, r, T, sigma, type = "call")
print(paste("Call Option Price (Monte Carlo): ", round(call_price_mc, 2)))
We'll redo a few of the calculations next to make sure we are working with percentage changes from the combined distribution; first we have to reconstruct it, starting with our percentage-change function:
calculate_percentage_change <- function(prices) {
  daily_returns <- diff(prices) / head(prices, -1) * 100
  return(daily_returns)
}
After this, we calculate the daily returns for the remaining paths and then combine the distributions into one data frame:
remaining_paths_changes <- lapply(remaining_paths_list, function(path_df) {
calculate_percentage_change(path_df$Price)
})
remaining_paths_density_df <- do.call(rbind, lapply(seq_along(remaining_paths_changes), function(i) {
data.frame(
Path = paste0("Path ", i),
DailyReturn = remaining_paths_changes[[i]],
stringsAsFactors = FALSE
)
}))
Now we simulate the cumulative returns over one year so that we can calculate the payoffs of the option and discount them:
simulate_stock_path <- function(S, T, daily_returns_distributions, num_days) {
  stock_price <- numeric(num_days + 1)
  stock_price[1] <- S
  for (day in 2:(num_days + 1)) {
    selected_distribution <- sample(daily_returns_distributions, 1)[[1]]
    daily_return <- sample(selected_distribution, 1) / 100
    stock_price[day] <- stock_price[day - 1] * (1 + daily_return)
  }
  return(stock_price)
}
Our penultimate step is to use Monte Carlo methods to define a function for estimating the price of the call option (the same function can also price puts):
monte_carlo_option_pricing <- function(S, K, r, T, sigma, type = "call", num_simulations = 10000) {
  num_days <- 252
  simulated_prices <- replicate(num_simulations, simulate_stock_path(S, T, distributions, num_days)[num_days + 1])
  if (type == "call") {
    payoffs <- pmax(simulated_prices - K, 0)
  } else if (type == "put") {
    payoffs <- pmax(K - simulated_prices, 0)
  } else {
    stop("Invalid option type. Please use 'call' or 'put'.")
  }
  option_price <- mean(payoffs) * exp(-r * T)
  return(option_price)
}
Almost done; we just need to define our parameters and then we can calculate the price of the option:
# Define parameters
S <- 548.94 # Current stock price
K <- 150 # Strike price
r <- 0.042 # Risk-free interest rate
T <- 0.91781 # Time to maturity in years (expiry June 20th, 2025)
sigma <- average_volatility # Average daily volatility (%) of the remaining paths
# Calculate the price of a call option using Monte Carlo simulation
call_price_mc <- monte_carlo_option_pricing(S, K, r, T, sigma, type = "call")
print(paste("Call Option Price (Monte Carlo): ", round(call_price_mc, 2)))
Finally, we get our first call option price:
> # Define parameters
> S <- 548.94 # Current stock price
> K <- 150 # Strike price
> r <- 0.042 # Risk-free interest rate
> T <- 0.91781 # Time to maturity in years
> sigma <- average_volatility # Average daily volatility (%) of the remaining paths
>
> # Calculate the price of a call option using Monte Carlo simulation
> call_price_mc <- monte_carlo_option_pricing(S, K, r, T, sigma, type = "call")
> print(paste("Call Option Price (Monte Carlo): ", round(call_price_mc, 2)))
[1] "Call Option Price (Monte Carlo): 391.18"
With the actual price at $402.65 and our estimate at $391.18, the ratio $391.18/$402.65 gives an accuracy of roughly 97.15%.
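For reproducibility, the accuracy ratio quoted above can be computed directly (a small helper, not part of the original code; the market price is hard-coded from the quote above):
# Accuracy of the Monte Carlo call price relative to the quoted market price
actual_call_price <- 402.65
call_accuracy <- call_price_mc / actual_call_price
print(paste("Call pricing accuracy:", round(call_accuracy * 100, 2), "%"))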
Now let's try a put option with the same expiry: the strike price is $830, the expiry is June 20th, 2025, and the actual price is $281.54.
Our results:
> # Define parameters
> S <- 548.94 # Current stock price
> K <- 830 # Strike price
> r <- 0.042 # Risk-free interest rate
> T <- 0.91781 # Time to maturity in years
> sigma <- average_volatility # Average daily volatility (%) of the remaining paths
>
> # Calculate the price of a put option using Monte Carlo simulation
> put_price_mc <- monte_carlo_option_pricing(S, K, r, T, sigma, type = "put")
> print(paste("Put Option Price (Monte Carlo): ", round(put_price_mc, 2)))
[1] "Put Option Price (Monte Carlo): 261.85"
Comparing predicted to actual, $261.85/$281.54 gives an accuracy of 93.01%.
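For context, these Monte Carlo estimates can be compared against the standard closed-form Black-Scholes prices for European options. This is a textbook benchmark rather than part of the pipeline above, and it assumes the annualized-volatility conversion sketched earlier:
# Closed-form Black-Scholes price for a European call or put (benchmark only)
black_scholes_price <- function(S, K, r, T, sigma, type = "call") {
  d1 <- (log(S / K) + (r + 0.5 * sigma^2) * T) / (sigma * sqrt(T))
  d2 <- d1 - sigma * sqrt(T)
  if (type == "call") {
    return(S * pnorm(d1) - K * exp(-r * T) * pnorm(d2))
  } else {
    return(K * exp(-r * T) * pnorm(-d2) - S * pnorm(-d1))
  }
}
# Assumed annualized volatility derived from the average daily figure above
sigma_annual <- (average_volatility / 100) * sqrt(252)
print(black_scholes_price(S = 548.94, K = 150, r = 0.042, T = 0.91781, sigma = sigma_annual, type = "call"))
print(black_scholes_price(S = 548.94, K = 830, r = 0.042, T = 0.91781, sigma = sigma_annual, type = "put"))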
There we have it: using basic statistical principles, we have built a modified, Monte Carlo-based take on Black-Scholes pricing with an accuracy of roughly 97.15% for our call option and 93.01% for our put option. While the parameters (the data-gathering period, the rolling-volatility window, the number of simulated days, and the number of simulated price paths) may vary from stock to stock, we appear to have a promising model that warrants further evaluation.
Thank you for reading the CFR.