Ethereum (ETH) Price Prediction and Visualization (ggplot2) using Machine Learning (Caret) in R
This tutorial provides a straightforward example for beginners to learn the essentials of predicting Ethereum prices using linear regression in R. For a deeper exploration into machine learning in R and a comprehensive guide suitable for everyone, consider checking out the book "Machine Learning in R for Everyone", available at https://www.amazon.com/Machine-Learning-Everyone-Jonathan-Wayne-ebook/dp/B0CHR9FZ1G/ref=sr_1_6?crid=M0GXZ7YT2JHX&keywords=machine+learning+in+R&qid=1702962040&sprefix=machine+learning+in+r%2Caps%2C84&sr=8-6.
Introduction:
Cryptocurrency price prediction is challenging, yet it matters to investors. In this article, we predict the price of Ethereum (ETH) from historical data, fit a linear regression with cross-validation, and visualize the predictions over an extended timeline.
Data Collection and Cleaning:
We begin by loading the required libraries: caret, glmnet, rvest, and dplyr (ggplot2 is loaded later, in the plotting step). The historical prices of Ethereum are scraped from Yahoo Finance. The collected data is then cleaned by selecting the relevant columns, dropping the last row of the scraped table, converting the 'Date' column to the Date type, and handling missing values.
# Load libraries
library(caret)
library(glmnet)
library(rvest)
library(dplyr)
# Define the URL for Ethereum on Yahoo Finance
url <- "https://finance.yahoo.com/quote/ETH-USD/history"
# Read HTML content from the URL
page <- read_html(url)
# Extract historical prices using a more flexible approach
btc_data <- page %>%
  html_nodes(xpath = '//*[@id="Col1-1-HistoricalDataTable-Proxy"]//table') %>%
  html_table(header = TRUE, fill = TRUE) %>%
  as.data.frame()
# Clean up the data
btc_data <- btc_data[, c("Date", "Close.")]
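# Drop the last row of the scraped table; the parentheses make the
# subtraction apply to the row count rather than to the whole 1:n sequence
btc_data <- btc_data[seq_len(nrow(btc_data) - 1), ]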
# Convert Date column to Date type
btc_data$Date <- as.Date(btc_data$Date, format = "%b %d, %Y")
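# Treat a bare "-" placeholder as a missing price and drop those rows
btc_data$Close.[btc_data$Close. == "-"] <- NA
btc_data <- na.omit(btc_data)
# Remove thousands separators before converting to numeric
btc_data$Close. <- as.numeric(gsub(",", "", btc_data$Close.))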
# Arrange the data by Date in descending order
btc_df <- arrange(btc_data, desc(Date))
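To see what the cleaning steps do without contacting Yahoo Finance, here is a minimal sketch in base R on a few hand-written rows. The sample values are made up for illustration, and the assumption that a missing price appears as a bare "-" is inferred from the cleaning code above rather than from any Yahoo documentation.

```r
# Hypothetical sample rows mimicking the scraped table (values are made up)
raw <- data.frame(
  Date   = c("Jan 03, 2024", "Jan 02, 2024", "Jan 01, 2024"),
  Close. = c("1,234.56", "-", "1,250.00"),
  stringsAsFactors = FALSE
)

# Flag placeholder dashes as missing, then drop those rows
raw$Close.[raw$Close. == "-"] <- NA
clean <- na.omit(raw)

# Parse dates (%b expects English month abbreviations, as in an English or C locale)
clean$Date <- as.Date(clean$Date, format = "%b %d, %Y")

# Strip thousands separators before converting to numeric
clean$Close. <- as.numeric(gsub(",", "", clean$Close.))

str(clean)
```

Only two of the three rows survive, and both columns end up with the types the model expects: Date and numeric.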
Model Training:
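The data is split into training and testing sets using the createDataPartition function. A linear regression model is then trained with 5-fold cross-validation using the train function from the caret package. With method = "lm" there is effectively nothing to tune by default, so cross-validation here serves to estimate out-of-sample error rather than to select hyperparameters.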
# Split data into training and testing sets
set.seed(123)
split_index <- createDataPartition(btc_df$Close., p = 0.9, list = FALSE)
train_data <- btc_df[split_index, ]
test_data <- btc_df[-split_index, ]
# Train the model with cross-validation
model <- train(
  Close. ~ Date,
  data = train_data,
  method = "lm",
  trControl = trainControl(method = "cv", number = 5)
)
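For readers who want to see what 5-fold cross-validation does under the hood, the following is a rough base R sketch of the idea. It uses a synthetic series rather than real ETH prices, and it assigns folds purely at random, which is an assumption and not necessarily how caret builds its folds internally.

```r
set.seed(123)

# Synthetic stand-in for the price series (not real ETH data)
n  <- 100
df <- data.frame(Date = as.Date("2023-01-01") + 0:(n - 1))
df$Close. <- 2000 + 3 * seq_len(n) + rnorm(n, sd = 50)

# Assign each row to one of 5 folds at random
k     <- 5
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other four folds, score on the held-out fold
fold_rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(Close. ~ Date, data = df[folds != i, ])
  pred <- predict(fit, newdata = df[folds == i, ])
  sqrt(mean((df$Close.[folds == i] - pred)^2))
})

fold_rmse        # one RMSE per fold
mean(fold_rmse)  # averaged RMSE across folds
```

The averaged value is the same kind of quantity that caret reports as RMSE for a cross-validated model, although the exact numbers will differ because the folds are drawn differently.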
Prediction and Visualization:
The timeline is extended for predictions, and the model is used to predict both the test set and future dates. Actual, predicted, and future data are combined for visualization using ggplot2.
# Extend the timeline for predictions (e.g., next 8 days)
future_dates <- seq(max(btc_df$Date) + 1, by = "days", length.out = 8)
future_data <- data.frame(Date = future_dates)
# Make predictions on the extended timeline
test_predictions <- predict(model, newdata = test_data)
future_predictions <- predict(model, newdata = future_data)
# Combine actual, predicted, and future data
combined_data <- rbind(
  data.frame(Date = test_data$Date, Value = test_data$Close., Type = "Actual"),
  data.frame(Date = test_data$Date, Value = test_predictions, Type = "Predicted"),
  data.frame(Date = future_data$Date, Value = future_predictions, Type = "Future Predictions")
)
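Point forecasts alone hide how uncertain a straight-line extrapolation is. As a hedged illustration, the sketch below fits a plain lm (not the caret model from this tutorial) to a synthetic series and asks predict for 95% prediction intervals over the next 8 days. The data and the choice of a 95% level are assumptions made for the example.

```r
set.seed(123)

# Synthetic stand-in for the price series (not real ETH data)
n  <- 60
df <- data.frame(Date = as.Date("2023-01-01") + 0:(n - 1))
df$Close. <- 2000 + 3 * seq_len(n) + rnorm(n, sd = 50)

fit <- lm(Close. ~ Date, data = df)

# Next 8 days after the last observed date, as in the tutorial
future_data <- data.frame(
  Date = seq(max(df$Date) + 1, by = "days", length.out = 8)
)

# 95% prediction intervals; the result has columns fit, lwr, upr
future_pi <- predict(fit, newdata = future_data,
                     interval = "prediction", level = 0.95)

cbind(future_data, round(future_pi, 2))
```

The lwr and upr columns could be drawn as a ribbon around the forecast line to show the range alongside the point estimate.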
Plotting with ggplot2:
A visually appealing plot is created using ggplot2 to compare actual, predicted, and future Ethereum prices.
# Plot with ggplot2 for better control over aesthetics
library(ggplot2)
library(ggrepel) # For label repelling
ggplot(combined_data, aes(x = Date, y = Value, color = Type, linetype = Type)) +
  # linewidth replaces size for line geoms in ggplot2 >= 3.4.0
  geom_line(linewidth = 2) +
  geom_point(size = 3, shape = 16) +
  labs(title = "ETH Price Prediction",
       x = "Date",
       y = "Price",
       subtitle = "Actual, Predicted, and Future Predictions") +
  theme_minimal() +
  theme(legend.position = "top") +
  scale_color_manual(values = c("Actual" = "blue", "Predicted" = "red", "Future Predictions" = "green")) +
  scale_linetype_manual(values = c("Actual" = "solid", "Predicted" = "solid", "Future Predictions" = "dashed")) +
  # Label each future prediction with its rounded value
  geom_text_repel(data = subset(combined_data, Type == "Future Predictions"), aes(label = round(Value, 2)),
                  box.padding = 0.5, max.overlaps = Inf, segment.color = "green") +
  theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 14),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12),
        legend.title = element_blank(),
        legend.text = element_text(size = 12))
Conclusion:
In this tutorial, we covered the simple process of predicting Ethereum prices, from data collection and cleaning to model training, prediction, and visualization. This serves as a foundational example, and users are encouraged to explore additional features and advanced modeling techniques for more accurate predictions. Remember that cryptocurrency markets are highly volatile, and predictions should be approached with caution. Happy coding!