Highcharting Jobs Friday
Today, in honor of last week’s jobs report from the Bureau of Labor Statistics (BLS), we will visualize jobs data with ggplot2 and then, more extensively with highcharter. Full post with the interactive charts is also available here. Our aim is to explore highcharter and its similarity with ggplot and to create some nice interactive visualizations. In the process, we will cover how to import BLS data from FRED and then wrangle it for visualization. We won’t do any modeling or statistical analysis today, though it wouldn’t be hard to extend this script into a forecasting exercise. One nice thing about today’s code flow is that it can be refreshed and updated on each BLS release date.
Let’s get to it!
We will source our data from FRED and will use the tq_get() function from tidyquant which enables us to import many data series at once in tidy, tibble format. We want to get total employment numbers, ADP estimates, and the sector-by-sector numbers that make up total employment. Let’s start by creating a tibble to hold the FRED codes and more intuitive names for each data series.
library(tidyverse)
library(tidyquant)
codes_names_tbl <- tribble(
~ symbol, ~ better_names,
"NPPTTL", "ADP Estimate",
"PAYEMS", "Nonfarm Employment",
"USCONS", "Construction",
"USTRADE", "Retail/Trade",
"USPBS", "Prof/Bus Serv",
"MANEMP", "Manufact",
"USFIRE", "Financial",
"USMINE", "Mining",
"USEHS", "Health Care",
"USWTRADE", "Wholesale Trade",
"USTPU", "Transportation",
"USINFO", "Info Sys",
"USLAH", "Leisure",
"USGOVT", "Gov",
"USSERV", "Other Services"
)
Now we pass the symbol column to tq_get().
fred_empl_data <-
tq_get(codes_names_tbl$symbol,
get = "economic.data",
from = "2007-01-01")
We have our data but look at the symbol column.
fred_empl_data %>%
group_by(symbol) %>%
slice(1)
# A tibble: 15 x 3
# Groups: symbol [15]
symbol date price
<chr> <date> <dbl>
1 MANEMP 2007-01-01 14008
2 NPPTTL 2007-01-01 115437.
3 PAYEMS 2007-01-01 137497
4 USCONS 2007-01-01 7725
5 USEHS 2007-01-01 18415
6 USFIRE 2007-01-01 8389
7 USGOVT 2007-01-01 22095
8 USINFO 2007-01-01 3029
9 USLAH 2007-01-01 13338
10 USMINE 2007-01-01 706
11 USPBS 2007-01-01 17834
12 USSERV 2007-01-01 5467
13 USTPU 2007-01-01 26491
14 USTRADE 2007-01-01 15443.
15 USWTRADE 2007-01-01 5969.
The symbols are the FRED codes, which are unrecognizable unless you have memorized how those codes map to more intuitive names. Let’s replace them with the better_names column of codes_names_tbl. We will do this with a left_join(). (This explains why I labeled our original column as symbol - it makes the left_join() easier.) Special thanks to Jenny Bryan for pointing out this code flow!
fred_empl_data %>%
left_join(codes_names_tbl,
by = "symbol" ) %>%
select(better_names, everything(), -symbol) %>%
group_by(better_names) %>%
slice(1)
# A tibble: 15 x 3
# Groups: better_names [15]
better_names date price
<chr> <date> <dbl>
1 ADP Estimate 2007-01-01 115437.
2 Construction 2007-01-01 7725
3 Financial 2007-01-01 8389
4 Gov 2007-01-01 22095
5 Health Care 2007-01-01 18415
6 Info Sys 2007-01-01 3029
7 Leisure 2007-01-01 13338
8 Manufact 2007-01-01 14008
9 Mining 2007-01-01 706
10 Nonfarm Employment 2007-01-01 137497
11 Other Services 2007-01-01 5467
12 Prof/Bus Serv 2007-01-01 17834
13 Retail/Trade 2007-01-01 15443.
14 Transportation 2007-01-01 26491
15 Wholesale Trade 2007-01-01 5969.
That looks much better, but we now have a column called price, that holds the monthly employment observations, and a column called better_names, that holds the more intuitive group names. Let’s change those column names to employees and sector.
fred_empl_data <-
fred_empl_data %>%
left_join(codes_names_tbl,
by = "symbol" ) %>%
select(better_names, everything(), -symbol) %>%
rename(employees = price, sector = better_names)
head(fred_empl_data)
# A tibble: 6 x 3
sector date employees
<chr> <date> <dbl>
1 ADP Estimate 2007-01-01 115437.
2 ADP Estimate 2007-02-01 115527.
3 ADP Estimate 2007-03-01 115647
4 ADP Estimate 2007-04-01 115754.
5 ADP Estimate 2007-05-01 115809.
6 ADP Estimate 2007-06-01 115831.
fred_empl_data has the names and organization we want, but it still has the raw number of employees per month. We want to visualize the month-to-month change in jobs numbers, which means we need to perform a calculation on our data and store it in a new column. We use mutate() to create the new column and calculate monthly change with value - lag(value, 1). We are not doing any annualizing or seasonality work here - it’s a simple substraction. For yearly change, it would be value - lag(value, 12).
empl_monthly_change <-
fred_empl_data %>%
group_by(sector) %>%
mutate(monthly_change = employees - lag(employees, 1)) %>%
na.omit()
Our final data object empl_monthly_change is tidy, has intuitive names in the group column, and has the monthly change that we wish to visualize. Let’s build some charts.
We will start at the top and use ggplot to visualize how total non-farm employment (Sorry farmers. Your jobs don’t count, I guess) has changed since 2007. We want an end-user to quickly glance at the chart and find the months with positive jobs growth and negative jobs growth. That means we want months with positive jobs growth to be one color, and those with negative jobs growth to be another color. There is more than one way to accomplish this, but I like to create new columns and then add geoms based on those columns. (Check out this post by Freddie Mac’s Len Kiefer for another way to accomplish this by nesting ifelsestatements in ggplot's aesthetics. In fact, if you like data visualization, check out all the stuff that Len writes.)
Let’s walk through how to create columns for shading by positive or negative jobs growth. First, we are looking at total employment here, so we call filter(sector == "Nonfarm Employment") to get only total employment.
Next, we create two new columns with mutate(). The first is called col_pos and is formed by if_else(monthly_change > 0, monthly_change,...). That logic is creating a column that holds the value of monthly change if monthly change is positive, else it holds NA. We then create another column called col_neg using the same logic.
empl_monthly_change %>%
filter(sector == "Nonfarm Employment") %>%
mutate(col_pos =
if_else(monthly_change > 0,
monthly_change, as.numeric(NA)),
col_neg =
if_else(monthly_change < 0,
monthly_change, as.numeric(NA))) %>%
dplyr::select(sector, date, col_pos, col_neg) %>%
head()
# A tibble: 6 x 4
# Groups: sector [1]
sector date col_pos col_neg
<chr> <date> <dbl> <dbl>
1 Nonfarm Employment 2007-02-01 85 NA
2 Nonfarm Employment 2007-03-01 214 NA
3 Nonfarm Employment 2007-04-01 59 NA
4 Nonfarm Employment 2007-05-01 153 NA
5 Nonfarm Employment 2007-06-01 77 NA
6 Nonfarm Employment 2007-07-01 NA -30
Have a quick look at the col_pos and col_neg columns and make sure they look right. col_pos should have only positive and NA values, col_neg shoud have only negative and NA values.
Now we can visualize our monthly changes with ggplot, adding a separate geom for those new columns.
empl_monthly_change %>%
filter(sector == "Nonfarm Employment") %>%
mutate(col_pos =
if_else(monthly_change > 0,
monthly_change, as.numeric(NA)),
col_neg =
if_else(monthly_change < 0,
monthly_change, as.numeric(NA))) %>%
ggplot(aes(x = date)) +
geom_col(aes(y = col_neg),
alpha = .85,
fill = "pink",
color = "pink") +
geom_col(aes(y = col_pos),
alpha = .85,
fill = "lightgreen",
color = "lightgreen") +
ylab("Monthly Change (thousands)") +
labs(title = "Monthly Private Employment Change",
subtitle = "total empl, since 2008",
caption = "inspired by @lenkiefer") +
scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5),
plot.caption = element_text(hjust=0))
That plot is nice, but it’s static!
Let’s head to highcharter and create an interactive chart that responds when we hover on it. By way of brief background, highcharter is an R hook into the fantastic highcharts JavaScript library. It’s free for personal use but a license is required for commercial use.
One nice feature of highcharter is that we can use very similar aesthetic logic to what we used for ggplot. It’s not identical, but it’s similar and let’s us work with tidy data.
Before we get to the highcharter logic, we will add one column to our tibble to hold the color scheme for our positive and negative monthly changes. Notice how this is different from the ggplot flow above where we create one column to hold our positive changes for coloring and one column to hold our negative changes for coloring.
I want to color positive changes light blue and negative changes pink, and put the rgb codes for those colors directly in the new column. The rgb code for light blue is “#6495ed” and for pink is “#ffe6ea”. Thus we use ifelse to create a column called color_of_bars that holds “#6495ed” (light blue) when monthly_change is postive and “#ffe6ea” (pink) when it’s negative.
total_employ_hc <-
empl_monthly_change %>%
filter(sector == "Nonfarm Employment") %>%
mutate(color_of_bars = ifelse(monthly_change > 0, "#6495ed", "#ffe6ea"))
head(total_employ_hc)
# A tibble: 6 x 5
# Groups: sector [1]
sector date employees monthly_change color_of_bars
<chr> <date> <dbl> <dbl> <chr>
1 Nonfarm Employment 2007-02-01 137582 85 #6495ed
2 Nonfarm Employment 2007-03-01 137796 214 #6495ed
3 Nonfarm Employment 2007-04-01 137855 59 #6495ed
4 Nonfarm Employment 2007-05-01 138008 153 #6495ed
5 Nonfarm Employment 2007-06-01 138085 77 #6495ed
6 Nonfarm Employment 2007-07-01 138055 -30 #ffe6ea
Now we are ready to start the highcharter flow.
We start by calling hchart to pass in our data object. Note the similarity to ggplot where we started with ggplot.
Now, intead of waiting for a call to geom_col, we set type = "column" to let hchart know that we are building a column chart. Next, we use hcaes(x = date, y = monthly_change, color = color_of_bars) to specify our aesthetics. Notice how we can control the colors of the bars from values in the color_of_bars column.
We also supply a name = "monthly change" because we want monthly change to appear when a user hovers on the chart. That wasn’t a consideration with ggplot.
library(highcharter)
hchart(total_employ_hc,
type = "column",
pointWidth = 5,
hcaes(x = date,
y = monthly_change,
color = color_of_bars),
name = "monthly change") %>%
hc_title(text = "Monthly Employment Change") %>%
hc_xAxis(type = "datetime") %>%
hc_yAxis(title = list(text = "monthly change (thousands)")) %>%
hc_exporting(enabled = TRUE)
Let’s stay in the highcharter world and visualize how each sector changed in the most recent month, which is July of 2018.
First, we isolate the most recent month by filtering on the last date. We also don’t want the ADP Estimate and filter that out as well.
empl_monthly_change %>%
filter(date == (last(date))) %>%
filter(sector != "ADP Estimate")
# A tibble: 14 x 4
# Groups: sector [14]
sector date employees monthly_change
<chr> <date> <dbl> <dbl>
1 Nonfarm Employment 2018-07-01 149128 157
2 Construction 2018-07-01 7242 19
3 Retail/Trade 2018-07-01 15944 7.1
4 Prof/Bus Serv 2018-07-01 21019 51
5 Manufact 2018-07-01 12751 37
6 Financial 2018-07-01 8568 -5
7 Mining 2018-07-01 735 -4
8 Health Care 2018-07-01 23662 22
9 Wholesale Trade 2018-07-01 5982. 12.3
10 Transportation 2018-07-01 27801 15
11 Info Sys 2018-07-01 2772 0
12 Leisure 2018-07-01 16371 40
13 Gov 2018-07-01 22334 -13
14 Other Services 2018-07-01 5873 -5
That filtered flow has the data we want, but we have two more tasks. First, we want to arrange this data so that it goes from smallest to largest. If we did not do this, our chart would still “work”, but the column heights would not progress from lowest to highest.
Second, we need to create another column to hold colors for negative and positive values, with the same ifelse() logic as we used before.
emp_by_sector_recent_month <-
empl_monthly_change %>%
filter(date == (last(date))) %>%
filter(sector != "ADP Estimate") %>%
arrange(monthly_change) %>%
mutate(color_of_bars = if_else(monthly_change > 0, "#6495ed", "#ffe6ea"))
Now we pass that object to hchart, set type = "column", and choose our hcaes values. We want to label the x-axis with the different sectors and do that with hc_xAxis(categories = emp_by_sector_recent_month$sector).
last_month <- lubridate::month(last(empl_monthly_change$date),
label = TRUE,
abbr = FALSE)
hchart(emp_by_sector_recent_month,
type = "column",
pointWidth = 20,
hcaes(x = sector,
y = monthly_change,
color = color_of_bars),
showInLegend = FALSE) %>%
hc_title(text = paste(last_month, "Employment Change", sep = " ")) %>%
hc_xAxis(categories = emp_by_sector_recent_month$sector) %>%
hc_yAxis(title = list(text = "Monthly Change (thousands)"))
Finally, let’s compare the ADP Estimates to the actual Nonfarm payroll numbers since 2017. We start with filtering again.
adp_bls_hc <-
empl_monthly_change %>%
filter(sector == "ADP Estimate" | sector == "Nonfarm Employment") %>%
filter(date >= "2017-01-01")
We create a column to hold different colors, but our logic is not whether a reading is positive or negative. We want to color the ADP and BLS reports differently.
adp_bls_hc <-
adp_bls_hc %>%
mutate(color_of_bars =
ifelse(sector == "ADP Estimate", "#ffb3b3", "#4d94ff"))
head(adp_bls_hc)
# A tibble: 6 x 5
# Groups: sector [1]
sector date employees monthly_change color_of_bars
<chr> <date> <dbl> <dbl> <chr>
1 ADP Estimate 2017-01-01 123253. 245. #ffb3b3
2 ADP Estimate 2017-02-01 123533. 280. #ffb3b3
3 ADP Estimate 2017-03-01 123655 122. #ffb3b3
4 ADP Estimate 2017-04-01 123810. 155. #ffb3b3
5 ADP Estimate 2017-05-01 124012. 202. #ffb3b3
6 ADP Estimate 2017-06-01 124166. 154. #ffb3b3
tail(adp_bls_hc)
# A tibble: 6 x 5
# Groups: sector [1]
sector date employees monthly_change color_of_bars
<chr> <date> <dbl> <dbl> <chr>
1 Nonfarm Employment 2018-02-01 148125 324 #4d94ff
2 Nonfarm Employment 2018-03-01 148280 155 #4d94ff
3 Nonfarm Employment 2018-04-01 148455 175 #4d94ff
4 Nonfarm Employment 2018-05-01 148723 268 #4d94ff
5 Nonfarm Employment 2018-06-01 148971 248 #4d94ff
6 Nonfarm Employment 2018-07-01 149128 157 #4d94ff
And now we pass that object to our familiar hchart flow.
hchart(adp_bls_hc,
type = 'column',
hcaes(y = monthly_change,
x = date,
group = sector,
color = color_of_bars),
showInLegend = FALSE
) %>%
hc_title(text = "ADP v. BLS") %>%
hc_xAxis(type = "datetime") %>%
hc_yAxis(title = list(text = "monthly change (thousands)")) %>%
hc_add_theme(hc_theme_flat()) %>%
hc_exporting(enabled = TRUE)
That’s all for today. This post is also available on at Reproducible Finance where you can hover on the highcharter visualizations and see how the interactivity works
See you next time when we'll breakdown a GDP report and visualize with highcharter. Happy coding!