登录查看更多内容

Highcharting Jobs Friday

Jonathan Regenstein

AI in Financial Services

发布日期: 2018年8月11日

Today, in honor of last week’s jobs report from the Bureau of Labor Statistics (BLS), we will visualize jobs data with ggplot2 and then, more extensively with highcharter. Full post with the interactive charts is also available here. Our aim is to explore highcharter and its similarity with ggplot and to create some nice interactive visualizations. In the process, we will cover how to import BLS data from FRED and then wrangle it for visualization. We won’t do any modeling or statistical analysis today, though it wouldn’t be hard to extend this script into a forecasting exercise. One nice thing about today’s code flow is that it can be refreshed and updated on each BLS release date.

Let’s get to it!

We will source our data from FRED and will use the tq_get() function from tidyquant which enables us to import many data series at once in tidy, tibble format. We want to get total employment numbers, ADP estimates, and the sector-by-sector numbers that make up total employment. Let’s start by creating a tibble to hold the FRED codes and more intuitive names for each data series.

library(tidyverse)
library(tidyquant)

codes_names_tbl <- tribble(
        ~ symbol, ~ better_names,
        "NPPTTL", "ADP Estimate",
        "PAYEMS", "Nonfarm Employment",
        "USCONS", "Construction",
        "USTRADE",   "Retail/Trade",
        "USPBS",  "Prof/Bus Serv",
        "MANEMP",    "Manufact",
        "USFIRE",    "Financial",
        "USMINE",   "Mining",
        "USEHS",    "Health Care",
        "USWTRADE",    "Wholesale Trade",
        "USTPU",    "Transportation",
        "USINFO",    "Info Sys",
        "USLAH",    "Leisure",
        "USGOVT",    "Gov",
        "USSERV",    "Other Services"
)

Now we pass the symbol column to tq_get().

fred_empl_data <- 
  tq_get(codes_names_tbl$symbol,                         
         get = "economic.data",             
         from = "2007-01-01")

We have our data but look at the symbol column.

fred_empl_data %>% 
  group_by(symbol) %>% 
  slice(1)
# A tibble: 15 x 3
# Groups:   symbol [15]
   symbol   date         price
   <chr>    <date>       <dbl>
 1 MANEMP   2007-01-01  14008 
 2 NPPTTL   2007-01-01 115437.
 3 PAYEMS   2007-01-01 137497 
 4 USCONS   2007-01-01   7725 
 5 USEHS    2007-01-01  18415 
 6 USFIRE   2007-01-01   8389 
 7 USGOVT   2007-01-01  22095 
 8 USINFO   2007-01-01   3029 
 9 USLAH    2007-01-01  13338 
10 USMINE   2007-01-01    706 
11 USPBS    2007-01-01  17834 
12 USSERV   2007-01-01   5467 
13 USTPU    2007-01-01  26491 
14 USTRADE  2007-01-01  15443.
15 USWTRADE 2007-01-01   5969.

The symbols are the FRED codes, which are unrecognizable unless you have memorized how those codes map to more intuitive names. Let’s replace them with the better_names column of codes_names_tbl. We will do this with a left_join(). (This explains why I labeled our original column as symbol - it makes the left_join() easier.) Special thanks to Jenny Bryan for pointing out this code flow!

fred_empl_data %>% 
  left_join(codes_names_tbl, 
            by = "symbol" ) %>% 
  select(better_names, everything(), -symbol) %>% 
  group_by(better_names) %>% 
  slice(1)
# A tibble: 15 x 3
# Groups:   better_names [15]
   better_names       date         price
   <chr>              <date>       <dbl>
 1 ADP Estimate       2007-01-01 115437.
 2 Construction       2007-01-01   7725 
 3 Financial          2007-01-01   8389 
 4 Gov                2007-01-01  22095 
 5 Health Care        2007-01-01  18415 
 6 Info Sys           2007-01-01   3029 
 7 Leisure            2007-01-01  13338 
 8 Manufact           2007-01-01  14008 
 9 Mining             2007-01-01    706 
10 Nonfarm Employment 2007-01-01 137497 
11 Other Services     2007-01-01   5467 
12 Prof/Bus Serv      2007-01-01  17834 
13 Retail/Trade       2007-01-01  15443.
14 Transportation     2007-01-01  26491 
15 Wholesale Trade    2007-01-01   5969.

That looks much better, but we now have a column called price, that holds the monthly employment observations, and a column called better_names, that holds the more intuitive group names. Let’s change those column names to employees and sector.

fred_empl_data <- 
fred_empl_data %>% 
  left_join(codes_names_tbl, 
            by = "symbol" ) %>% 
  select(better_names, everything(), -symbol) %>% 
  rename(employees = price, sector = better_names)

head(fred_empl_data)
# A tibble: 6 x 3
  sector       date       employees
  <chr>        <date>         <dbl>
1 ADP Estimate 2007-01-01   115437.
2 ADP Estimate 2007-02-01   115527.
3 ADP Estimate 2007-03-01   115647 
4 ADP Estimate 2007-04-01   115754.
5 ADP Estimate 2007-05-01   115809.
6 ADP Estimate 2007-06-01   115831.

fred_empl_data has the names and organization we want, but it still has the raw number of employees per month. We want to visualize the month-to-month change in jobs numbers, which means we need to perform a calculation on our data and store it in a new column. We use mutate() to create the new column and calculate monthly change with value - lag(value, 1). We are not doing any annualizing or seasonality work here - it’s a simple substraction. For yearly change, it would be value - lag(value, 12).

empl_monthly_change <- 
  fred_empl_data  %>% 
  group_by(sector) %>% 
  mutate(monthly_change = employees - lag(employees, 1)) %>% 
  na.omit()

Our final data object empl_monthly_change is tidy, has intuitive names in the group column, and has the monthly change that we wish to visualize. Let’s build some charts.

We will start at the top and use ggplot to visualize how total non-farm employment (Sorry farmers. Your jobs don’t count, I guess) has changed since 2007. We want an end-user to quickly glance at the chart and find the months with positive jobs growth and negative jobs growth. That means we want months with positive jobs growth to be one color, and those with negative jobs growth to be another color. There is more than one way to accomplish this, but I like to create new columns and then add geoms based on those columns. (Check out this post by Freddie Mac’s Len Kiefer for another way to accomplish this by nesting ifelsestatements in ggplot's aesthetics. In fact, if you like data visualization, check out all the stuff that Len writes.)

Let’s walk through how to create columns for shading by positive or negative jobs growth. First, we are looking at total employment here, so we call filter(sector == "Nonfarm Employment") to get only total employment.

Next, we create two new columns with mutate(). The first is called col_pos and is formed by if_else(monthly_change > 0, monthly_change,...). That logic is creating a column that holds the value of monthly change if monthly change is positive, else it holds NA. We then create another column called col_neg using the same logic.

empl_monthly_change %>% 
  filter(sector == "Nonfarm Employment") %>% 
   mutate(col_pos = 
           if_else(monthly_change > 0, 
                  monthly_change, as.numeric(NA)),
         col_neg = 
           if_else(monthly_change < 0, 
                  monthly_change, as.numeric(NA))) %>% 
  dplyr::select(sector, date, col_pos, col_neg) %>% 
  head()
# A tibble: 6 x 4
# Groups:   sector [1]
  sector             date       col_pos col_neg
  <chr>              <date>       <dbl>   <dbl>
1 Nonfarm Employment 2007-02-01      85      NA
2 Nonfarm Employment 2007-03-01     214      NA
3 Nonfarm Employment 2007-04-01      59      NA
4 Nonfarm Employment 2007-05-01     153      NA
5 Nonfarm Employment 2007-06-01      77      NA
6 Nonfarm Employment 2007-07-01      NA     -30

Have a quick look at the col_pos and col_neg columns and make sure they look right. col_pos should have only positive and NA values, col_neg shoud have only negative and NA values.

Now we can visualize our monthly changes with ggplot, adding a separate geom for those new columns.

empl_monthly_change %>% 
  filter(sector == "Nonfarm Employment") %>% 
   mutate(col_pos = 
           if_else(monthly_change > 0, 
                  monthly_change, as.numeric(NA)),
         col_neg = 
           if_else(monthly_change < 0, 
                  monthly_change, as.numeric(NA))) %>%
  ggplot(aes(x = date)) +
  geom_col(aes(y = col_neg),
               alpha = .85,
               fill = "pink",
               color = "pink") +
  geom_col(aes(y = col_pos),
               alpha = .85,
               fill = "lightgreen",
               color = "lightgreen") +
  ylab("Monthly Change (thousands)") +
  labs(title = "Monthly Private Employment Change",
       subtitle = "total empl, since 2008",
       caption = "inspired by @lenkiefer") +
  scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        plot.caption = element_text(hjust=0))

That plot is nice, but it’s static!

Let’s head to highcharter and create an interactive chart that responds when we hover on it. By way of brief background, highcharter is an R hook into the fantastic highcharts JavaScript library. It’s free for personal use but a license is required for commercial use.

One nice feature of highcharter is that we can use very similar aesthetic logic to what we used for ggplot. It’s not identical, but it’s similar and let’s us work with tidy data.

Before we get to the highcharter logic, we will add one column to our tibble to hold the color scheme for our positive and negative monthly changes. Notice how this is different from the ggplot flow above where we create one column to hold our positive changes for coloring and one column to hold our negative changes for coloring.

I want to color positive changes light blue and negative changes pink, and put the rgb codes for those colors directly in the new column. The rgb code for light blue is “#6495ed” and for pink is “#ffe6ea”. Thus we use ifelse to create a column called color_of_bars that holds “#6495ed” (light blue) when monthly_change is postive and “#ffe6ea” (pink) when it’s negative.

total_employ_hc <- 
  empl_monthly_change %>% 
  filter(sector == "Nonfarm Employment") %>% 
  mutate(color_of_bars = ifelse(monthly_change > 0, "#6495ed", "#ffe6ea"))

head(total_employ_hc)
# A tibble: 6 x 5
# Groups:   sector [1]
  sector             date       employees monthly_change color_of_bars
  <chr>              <date>         <dbl>          <dbl> <chr>        
1 Nonfarm Employment 2007-02-01    137582             85 #6495ed      
2 Nonfarm Employment 2007-03-01    137796            214 #6495ed      
3 Nonfarm Employment 2007-04-01    137855             59 #6495ed      
4 Nonfarm Employment 2007-05-01    138008            153 #6495ed      
5 Nonfarm Employment 2007-06-01    138085             77 #6495ed      
6 Nonfarm Employment 2007-07-01    138055            -30 #ffe6ea

Now we are ready to start the highcharter flow.

We start by calling hchart to pass in our data object. Note the similarity to ggplot where we started with ggplot.

Now, intead of waiting for a call to geom_col, we set type = "column" to let hchart know that we are building a column chart. Next, we use hcaes(x = date, y = monthly_change, color = color_of_bars) to specify our aesthetics. Notice how we can control the colors of the bars from values in the color_of_bars column.

We also supply a name = "monthly change" because we want monthly change to appear when a user hovers on the chart. That wasn’t a consideration with ggplot.

library(highcharter)
hchart(total_employ_hc,  
                type = "column", 
                pointWidth = 5,
                hcaes(x = date,
                      y = monthly_change,
                      color = color_of_bars),
                name = "monthly change") %>%
  hc_title(text = "Monthly Employment Change") %>%
  hc_xAxis(type = "datetime") %>%
  hc_yAxis(title = list(text = "monthly change (thousands)")) %>%
  hc_exporting(enabled = TRUE)

Let’s stay in the highcharter world and visualize how each sector changed in the most recent month, which is July of 2018.

First, we isolate the most recent month by filtering on the last date. We also don’t want the ADP Estimate and filter that out as well.

empl_monthly_change %>% 
filter(date == (last(date))) %>%
filter(sector != "ADP Estimate")
# A tibble: 14 x 4
# Groups:   sector [14]
   sector             date       employees monthly_change
   <chr>              <date>         <dbl>          <dbl>
 1 Nonfarm Employment 2018-07-01   149128           157  
 2 Construction       2018-07-01     7242            19  
 3 Retail/Trade       2018-07-01    15944             7.1
 4 Prof/Bus Serv      2018-07-01    21019            51  
 5 Manufact           2018-07-01    12751            37  
 6 Financial          2018-07-01     8568            -5  
 7 Mining             2018-07-01      735            -4  
 8 Health Care        2018-07-01    23662            22  
 9 Wholesale Trade    2018-07-01     5982.           12.3
10 Transportation     2018-07-01    27801            15  
11 Info Sys           2018-07-01     2772             0  
12 Leisure            2018-07-01    16371            40  
13 Gov                2018-07-01    22334           -13  
14 Other Services     2018-07-01     5873            -5

That filtered flow has the data we want, but we have two more tasks. First, we want to arrange this data so that it goes from smallest to largest. If we did not do this, our chart would still “work”, but the column heights would not progress from lowest to highest.

Second, we need to create another column to hold colors for negative and positive values, with the same ifelse() logic as we used before.

emp_by_sector_recent_month <- 
  empl_monthly_change  %>% 
  filter(date == (last(date))) %>%
  filter(sector != "ADP Estimate") %>% 
  arrange(monthly_change) %>% 
  mutate(color_of_bars = if_else(monthly_change > 0, "#6495ed", "#ffe6ea"))

Now we pass that object to hchart, set type = "column", and choose our hcaes values. We want to label the x-axis with the different sectors and do that with hc_xAxis(categories = emp_by_sector_recent_month$sector).

last_month <- lubridate::month(last(empl_monthly_change$date),
                                 label = TRUE, 
                                 abbr = FALSE)

hchart(emp_by_sector_recent_month,  
                type = "column", 
                pointWidth = 20,
                hcaes(x = sector,
                      y = monthly_change,
                      color = color_of_bars),
                showInLegend = FALSE) %>% 
  hc_title(text = paste(last_month, "Employment Change", sep = " ")) %>%
  hc_xAxis(categories = emp_by_sector_recent_month$sector) %>%
  hc_yAxis(title = list(text = "Monthly Change (thousands)"))

Finally, let’s compare the ADP Estimates to the actual Nonfarm payroll numbers since 2017. We start with filtering again.

adp_bls_hc <- 
  empl_monthly_change %>% 
  filter(sector == "ADP Estimate" | sector == "Nonfarm Employment") %>% 
  filter(date >= "2017-01-01")

We create a column to hold different colors, but our logic is not whether a reading is positive or negative. We want to color the ADP and BLS reports differently.

adp_bls_hc <- 
  adp_bls_hc %>% 
  mutate(color_of_bars = 
           ifelse(sector == "ADP Estimate", "#ffb3b3", "#4d94ff"))

head(adp_bls_hc)
# A tibble: 6 x 5
# Groups:   sector [1]
  sector       date       employees monthly_change color_of_bars
  <chr>        <date>         <dbl>          <dbl> <chr>        
1 ADP Estimate 2017-01-01   123253.           245. #ffb3b3      
2 ADP Estimate 2017-02-01   123533.           280. #ffb3b3      
3 ADP Estimate 2017-03-01   123655            122. #ffb3b3      
4 ADP Estimate 2017-04-01   123810.           155. #ffb3b3      
5 ADP Estimate 2017-05-01   124012.           202. #ffb3b3      
6 ADP Estimate 2017-06-01   124166.           154. #ffb3b3      
tail(adp_bls_hc)
# A tibble: 6 x 5
# Groups:   sector [1]
  sector             date       employees monthly_change color_of_bars
  <chr>              <date>         <dbl>          <dbl> <chr>        
1 Nonfarm Employment 2018-02-01    148125            324 #4d94ff      
2 Nonfarm Employment 2018-03-01    148280            155 #4d94ff      
3 Nonfarm Employment 2018-04-01    148455            175 #4d94ff      
4 Nonfarm Employment 2018-05-01    148723            268 #4d94ff      
5 Nonfarm Employment 2018-06-01    148971            248 #4d94ff      
6 Nonfarm Employment 2018-07-01    149128            157 #4d94ff

And now we pass that object to our familiar hchart flow.

hchart(adp_bls_hc, 
       type = 'column', 
       hcaes(y = monthly_change,
             x = date,
             group = sector, 
             color = color_of_bars),
       showInLegend = FALSE
       ) %>% 
  hc_title(text = "ADP v. BLS") %>%
  hc_xAxis(type = "datetime") %>%
  hc_yAxis(title = list(text = "monthly change (thousands)")) %>%
  hc_add_theme(hc_theme_flat()) %>%
  hc_exporting(enabled = TRUE)

That’s all for today. This post is also available on at Reproducible Finance where you can hover on the highcharter visualizations and see how the interactivity works

See you next time when we'll breakdown a GDP report and visualize with highcharter. Happy coding!

要查看或添加评论，请登录

Jonathan Regenstein的更多文章

If you want to understand how to succeed in the hedge fund world, read these two books

2019年12月1日

If you want to understand how to succeed in the hedge fund world, read these two books

Happy holidays to all! I wanted to share brief thoughts on two excellent books that found their way to my book stack…

9 条评论
Momentum investing with R

2019年5月30日

Momentum investing with R

After an extended hiatus, Reproducible Finance is back! We’ll celebrate by changing focus a bit and coding up an…
Some Sector Analysis with R

2019年2月18日

Some Sector Analysis with R

Welcome to the second installment of Reproducible Finance 2019! In the previous post, we looked back on the daily…

2 条评论
Looking back on 2018 Reproducible Finance with R part 1

2019年1月14日

Looking back on 2018 Reproducible Finance with R part 1

Welcome to Reproducible Finance 2019! It’s a new year, a new beginning, the Earth has completed one more trip around…
Fund Flows via the FT

2019年1月9日

Fund Flows via the FT

There’s a great piece on the FT today about algos and fund flows. It’s author, Robin Wigglesworth, has become one of my…

3 条评论
Rolling Origin Sampling

2018年12月28日

Rolling Origin Sampling

Today, we continue our work on sampling so that we can run models on subsets of our data and then test the accuracy of…

2 条评论
Rsampling Fama French + Quant Finance Contest

2018年12月17日

Rsampling Fama French + Quant Finance Contest

Today we will continue our work on Fama French factor models, but more as a vehicle to explore some of the awesome…
Fama French with R: Managing Multiple Models + Cyber Monday Book Discount

2018年11月26日

Fama French with R: Managing Multiple Models + Cyber Monday Book Discount

Today, we will return to the Fama French (FF) model of asset returns and use it as a proxy for fitting and evaluating…

1 条评论
Reproducible Finance, the book!

2018年10月29日

Reproducible Finance, the book!

I’m thrilled to announce the release of my new book Reproducible Finance with R: Code Flows and Shiny Apps for…

12 条评论
Visualizing GDP with R

2018年8月30日

Visualizing GDP with R

Today we will take a look at the GDP data that is released every quarter or so by the Bureau of Economic Analysis BEA…

See all articles

Highcharting Jobs Friday

Jonathan Regenstein

AI in Financial Services

Jonathan Regenstein的更多文章

社区洞察

其他会员也浏览了

Random Thoughts on Analyst Firm Research and why you must thread carefully

Qlik - one to rule them all!

EMPLOYERS/HIRING MANAGERS EXPLOITING JOB APPLICANTS

Tips, Hints and Insights into all things data..

How to create your Data Science portfolio

Forget About Being Data Driven in Recruitment

Rethinking Labor Market Metrics

?? Power BI Interview QnA Series - Day 13! ??

?? Power BI Interview QnA Series - Day 22! ??

?? Power BI Interview QnA Series - Day 14! ??

Jonathan Regenstein的更多文章

If you want to understand how to succeed in the hedge fund world, read these two books

Momentum investing with R

Some Sector Analysis with R

Looking back on 2018 Reproducible Finance with R part 1

Fund Flows via the FT

Rolling Origin Sampling

Rsampling Fama French + Quant Finance Contest

Fama French with R: Managing Multiple Models + Cyber Monday Book Discount

Reproducible Finance, the book!

Visualizing GDP with R

社区洞察

其他会员也浏览了

Random Thoughts on Analyst Firm Research and why you must thread carefully

Qlik - one to rule them all!

EMPLOYERS/HIRING MANAGERS EXPLOITING JOB APPLICANTS

Tips, Hints and Insights into all things data..

How to create your Data Science portfolio

Forget About Being Data Driven in Recruitment

Rethinking Labor Market Metrics

?? Power BI Interview QnA Series - Day 13! ??

?? Power BI Interview QnA Series - Day 22! ??

?? Power BI Interview QnA Series - Day 14! ??