Visualizing London's Cycle Hire Points with ggplot2 and QGIS: From Data to GeoPackage
Geospatial analysis is becoming increasingly important in many industries, from urban planning to environmental monitoring. Many professionals use QGIS, an open-source Geographic Information System (GIS), to visualize and analyze spatial data. Another popular tool among data scientists is ggplot2, a data visualization package for R. While ggplot2 is great for creating plots, and can draw spatial data through geom_sf(), it has no built-in way to export the layers of a plot to a geospatial file format. In this post, I'll show you how to convert the layers of a ggplot visualization into the GeoPackage (GPKG) format, a modern and efficient geospatial data storage format, and use them in QGIS. This will allow you to bring your ggplot2 visualizations into your geospatial analyses and workflows. Let's dive in!
In this tutorial, we'll be using the spData package in R, which provides a variety of geospatial datasets for analysis and visualization. We'll be working with data on cycle hire points across London, as well as a map of London boroughs. Our goal is to create a visualization that combines the cycle hire points, a map of the boroughs, and a density plot of the cycle hire points to identify areas with high and low concentrations of hire points. We'll start by creating this visualization in ggplot2, then convert it into the GPKG format and import it into QGIS for further analysis and exploration. Whether you're a geospatial analyst or a data scientist, this workflow will help you combine two powerful tools for data visualization and analysis.
We will call three libraries: ggplot2 for visualization, sf for dealing with spatial data, and spData to source the datasets.
library(ggplot2)
library(sf)
library(spData)
The data frame lnd contains the administrative boundaries of London's boroughs, and cycle_hire contains cycle hire points across London.
data(cycle_hire)
data(lnd)
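Before plotting, it can help to confirm what was loaded. The short check below is a sketch, assuming the sf and spData packages are installed; it only prints the class, size, geometry type, and coordinate reference system of each dataset.

```r
library(sf)
library(spData)

data(cycle_hire)
data(lnd)

# Both objects should be sf data frames with a geometry column
class(cycle_hire)
class(lnd)

# Number of features in each dataset
nrow(cycle_hire)
nrow(lnd)

# Geometry type of each dataset (points for hire stations, polygons for boroughs)
st_geometry_type(cycle_hire, by_geometry = FALSE)
st_geometry_type(lnd, by_geometry = FALSE)

# Coordinate reference system of each dataset
st_crs(cycle_hire)
st_crs(lnd)
```

Knowing the coordinate reference system up front matters later, when the density contours have to be given a CRS by hand.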
We will visualize the cycle hire points, a map of the boroughs, and a density plot of the cycle hire points.
# Add the X and Y coordinates of each cycle hire point to the "cycle_hire" dataset
cycle_hire = cbind(cycle_hire, st_coordinates(cycle_hire))
# Initialize a ggplot object
plot1 = ggplot() +
# Add the spatial data of London to the plot
geom_sf(data = lnd) +
# Add the cycle hire points to the plot with a transparency level of 0.6
geom_sf(data = cycle_hire, alpha = 0.6) +
# Create a 2D density plot using the X and Y coordinates of the cycle hire points
stat_density_2d( data = cycle_hire, mapping = aes(x=X, y=Y)) +
# Restrict the spatial extent of the plot to the specified limits
coord_sf(xlim = c(-0.3, 0.1), ylim = c(51.43, 51.55)) +
# Add a title to the plot
labs(title="Density of Bike Stations") +
# Apply the void theme, which leaves the plot without axes or background
theme_void()
# Print the plot to the console or the plotting device
plot1
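The first step in the block above, binding st_coordinates() onto the sf object, is what gives stat_density_2d() plain X and Y columns to work with. Here is a minimal sketch of that step on a made-up two-point object (the toy_points name and its coordinates are illustrative, not taken from the dataset):

```r
library(sf)

# Two hypothetical points in WGS 84 (longitude, latitude)
toy_points <- st_sf(
  id = 1:2,
  geometry = st_sfc(
    st_point(c(-0.10, 51.50)),
    st_point(c(-0.20, 51.52)),
    crs = 4326
  )
)

# For point geometries, st_coordinates() returns a matrix with columns X and Y
coords <- st_coordinates(toy_points)

# cbind() on an sf object keeps the geometry and adds X and Y as ordinary columns
toy_points_xy <- cbind(toy_points, coords)
names(toy_points_xy)
```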
The function ggplot_build() is often used to extract the data underlying a ggplot object after all transformations and statistics have been applied. This data can be different from the original data used to create the plot because it includes all the modifications that are made to the data during the process of building the plot.
data_plot = ggplot_build(plot1)$data
The result is a list with 3 dataframes, each corresponding to the ggplot data layers: geom_sf(data = lnd), geom_sf(data = cycle_hire, alpha = 0.6), and stat_density_2d(data = cycle_hire, mapping = aes(x=X, y=Y)).
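To see what such a list looks like without any spatial data involved, here is a small sketch using the faithful dataset that ships with R. The toy_plot and built names are illustrative; the point is only that each layer gets its own data frame, and that the density layer carries computed x, y, group, and level columns.

```r
library(ggplot2)

# A small non-spatial plot with two layers: points and 2D density contours
toy_plot <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  stat_density_2d()

# One data frame per layer, in the order the layers were added
built <- ggplot_build(toy_plot)$data
length(built)

# The contour layer holds the computed coordinates plus grouping columns
head(built[[2]][, c("x", "y", "group", "level")])
```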
While the first two layers produce an sf object that can be easily saved as a layer for a gpkg file, the last one (stat_density_2d) produces a simple dataframe.
In the code below, we use the sf package to convert the last data frame, the output of stat_density_2d, into a spatial object. The data frame describes the contours of the density plot: the x and y columns hold the coordinates of the contour lines, and the group column indicates which coordinates belong to the same line. Our goal is to create a spatial object that represents these contour lines, which we can use for further spatial analysis or visualization.
To accomplish this, we first use the split function to separate the data frame into a list of smaller data frames, one per contour line (grouped by the group column). For each of these, we use the st_linestring function from the sf package to create a LineString object from the x and y coordinates. We then use the st_sfc function to collect these LineString objects into a geometry column. Finally, we use the st_sf function to create an sf object, the standard format for working with spatial data in the sf package. This sf object can be used for spatial analysis, visualization, or export to other spatial data formats.
# Convert the dataframe into a list of LineString objects, grouped by the "group" column
lines <- lapply(split(data_plot[[3]], data_plot[[3]]$group), function(group_df) {
st_linestring(as.matrix(group_df[, c("x", "y")]))
})
# Create an sf object from the list of LineString objects
density_sf <- st_sf(geometry = st_sfc(lines))
# Assign the coordinate reference system used by the plot (that of lnd, the first sf layer)
st_crs(density_sf) = st_crs(lnd)
# Plot the sf object
ggplot() + geom_sf(data = density_sf)
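One thing the conversion above leaves behind is the level column that stat_density_2d() computes for each contour, which can be handy for styling the lines in QGIS. The sketch below shows one possible way to carry it along. It uses a small hand-made data frame (toy_contours and its values are hypothetical) in place of data_plot[[3]], and assumes every row of a group shares the same level.

```r
library(sf)

# Hypothetical contour data: two closed lines, each with its own density level
toy_contours <- data.frame(
  x = c(0, 1, 1, 0, 0, 0.2, 0.8, 0.8, 0.2, 0.2),
  y = c(0, 0, 1, 1, 0, 0.2, 0.2, 0.8, 0.8, 0.2),
  group = rep(c("-1-001", "-1-002"), each = 5),
  level = rep(c(10, 20), each = 5)
)

# Split into one data frame per contour line
pieces <- split(toy_contours, toy_contours$group)

# Build one LineString per contour line
geoms <- lapply(pieces, function(piece) {
  st_linestring(as.matrix(piece[, c("x", "y")]))
})

# Take the level of each contour line from its first row
contour_levels <- vapply(pieces, function(piece) piece$level[1], numeric(1))

# Combine attributes and geometry into an sf object
contour_sf <- st_sf(
  group = names(pieces),
  level = unname(contour_levels),
  geometry = st_sfc(unname(geoms), crs = 4326)
)
contour_sf
```

With the level stored as an attribute, the contour layer can be given a graduated style in QGIS instead of a single line colour.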
The first line of code below writes the boroughs layer of the built plot, which holds the boundaries of London's boroughs, to a GeoPackage file named london_cycle.gpkg and names the layer "boroughs".
The second line writes the cycle hire points layer to the same GeoPackage file and names the layer "cycle_hire_points". The append argument tells the function to add the data to the existing GeoPackage file rather than replacing it, so the points end up as a new layer alongside the boroughs.
The third line writes the density data to the GeoPackage file and names the layer "density". Again, append is set so that the data is added as a new layer in the existing file. This layer holds the contour lines of the density plot, which show areas with high and low concentrations of hire points.
The quiet argument in each line suppresses the messages that st_write would otherwise print to the console.
st_write(data_plot[[1]], dsn = 'london_cycle.gpkg', layer = "boroughs", quiet = TRUE)
st_write(data_plot[[2]], dsn = 'london_cycle.gpkg', layer = "cycle_hire_points", append = TRUE, quiet = TRUE)
st_write(density_sf, dsn = 'london_cycle.gpkg', layer = "density", append = TRUE, quiet = TRUE)
The result is a GeoPackage file that can be opened and visualized in QGIS.
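Before switching to QGIS, the file can be checked from R. The sketch below writes two stand-in layers (toy_points and toy_line, both hypothetical) to a temporary GeoPackage, then lists the layers and reads one back; it assumes the GDAL build behind sf includes the GPKG driver, which is normally the case.

```r
library(sf)

# Hypothetical stand-in layers written to a temporary GeoPackage
gpkg_path <- tempfile(fileext = ".gpkg")
toy_points <- st_sf(
  id = 1:2,
  geometry = st_sfc(st_point(c(-0.10, 51.50)), st_point(c(-0.20, 51.52)), crs = 4326)
)
toy_line <- st_sf(
  id = 1,
  geometry = st_sfc(st_linestring(rbind(c(-0.10, 51.50), c(-0.20, 51.52))), crs = 4326)
)

st_write(toy_points, dsn = gpkg_path, layer = "points", quiet = TRUE)
st_write(toy_line, dsn = gpkg_path, layer = "line", append = TRUE, quiet = TRUE)

# List the layers stored in the file
layer_names <- st_layers(gpkg_path)$name
layer_names

# Read one layer back as an sf object
points_back <- st_read(gpkg_path, layer = "points", quiet = TRUE)
nrow(points_back)
```

For the file created in this post, the equivalent calls would be st_layers('london_cycle.gpkg') and st_read('london_cycle.gpkg', layer = "density").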