HR Analytics Starter Kit Part 2 - Intro to R
Richard Rosenow
Keeping the People in People Analytics | VP, Strategy at One Model | Speaker, Podcast Guest, Advisor
Welcome to part 2 of the HR Analytics Starter Kit! My goal in writing these has been to share with you some of the most substantial articles and resources I’ve found while exploring the HR analytics space to help you get up to speed. For those of you arriving here at part 2 first, there’s no particular sequence to reading these, but here is the link to part 1 in case you feel compelled to start there.
In part 1 I walked through some of the most substantial articles I had found in three groupings that I labeled Excitement (get excited about HR analytics), Implementation (how to think about developing an HR analytics function), and Example (examples of HR analytics in action). Here in part 2 I’d like to review some of the tools to help you start performing HR analytics with a focus on R.
Tools for Performing HR Analytics
I’m a huge Excel nerd, but Excel starts to slow down when you analyze large datasets. Setting up multiple statistical tests in Excel can also be time-consuming and complicated. If you don’t have embedded analytics within your current HR management software, you might want to think about picking up a more robust tool to analyze, interpret, and visualize your data.
Some of the most popular tools that I’ve seen HR practitioners use are Excel, Tableau, R, SAS, SPSS, and Python, with a few recent articles stating that R and Python are leading the pack in popularity. There’s no gold standard for analytics work, and in the end it’s a question of capability, cost, and even more so preference. My recommendation to someone just starting down this path would be to take a look at R.
R is an open source statistical programming language. To rephrase, R is FREE software that you could download today that lets you run analysis on large datasets quickly. Just to reiterate – free! I want to pause on that and recognize how rare it is for HR to get access to a cutting edge tool at no cost. It’s one of the benefits of HR being a last mover into the data science space. Well known companies have been using R for a long time now (Facebook, Twitter, and Google, to name a few popular ones) and the applications of R across the business are profound.
I understand that asking you to consider learning a programming language to analyze data sounds intimidating and I'll admit that R has a steep learning curve. So in the sections below, I want to lay out an on-ramp of resources to ease you into R, point you in the direction of some early technical resources, and then review the few substantial technical examples I have found that apply R to HR solutions.
Introduction to R
We're starting off with two videos: the first is 90 seconds long and the second is 45 minutes. Watch the first one now if you're relatively new to R and bookmark the other for later.
This first one is put together by Revolution Analytics (which was recently purchased by Microsoft) and covers R at a high level. This puts some context around why you might be interested in learning or working with it. If this is one of the first times you’re hearing about R, it’d be worth watching before moving forward.
This second video is more technical and the speaker John Cook provides a great history and context for the language. As the title suggests he frames up the good, the bad, and the ugly of R and this mindset sets the stage well for someone starting down this path. He also provides some of the best advice I’ve heard for working with R's steep learning curve. He says that:
“R is a domain specific language, and so you have to understand the domain. Learning R purely as a language would be like learning PHP as a language without being interested in Web Development.”
In other words, if you just learn R for R’s sake, you’re probably going to run out of motivation. My advice would be to find an HR problem you want to tackle, maybe one you’ve already worked through in Excel, and try to apply R to it to get to the same results.
If I can add anything to his explanation, I would say that whatever it is you want to do in R, you can do it. The beauty of R being open source and modular is that experts in every field are working with the same system and adding to it every day. On top of that, Google is your best friend (e.g., “How do I run a regression in R?”). There’s a massive and incredibly helpful community out there working through every question you could possibly run into.
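To give a feel for what the answer to a question like that tends to look like, here is a minimal sketch of a simple linear regression in base R. The data frame and its column names (tenure_years, salary_k) are made up purely for illustration; lm(), summary(), and predict() are standard base R functions.

```r
# Hypothetical data: tenure (years) and salary (in thousands)
# for eight employees. These numbers are invented for illustration.
employees <- data.frame(
  tenure_years = c(1, 2, 3, 4, 5, 6, 7, 8),
  salary_k     = c(42, 45, 47, 52, 54, 59, 60, 65)
)

# Fit a simple linear regression: salary explained by tenure.
model <- lm(salary_k ~ tenure_years, data = employees)

# Print coefficients, standard errors, and R-squared.
print(summary(model))

# Predict salary for a hypothetical employee with 10 years of tenure.
predicted <- predict(model, newdata = data.frame(tenure_years = 10))
print(predicted)
```

If you have already built the same trend line in Excel, comparing the slope and intercept from summary(model) against your spreadsheet is a quick way to check that you are getting to the same results.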
Code School is a fascinating website. They’ve built the coding prompts right into the lessons, which allows for a seamless learning experience. For someone who just wants to see what coding in R is like, the introduction they provide is a low-commitment learning tool that makes the language accessible and reduces some of the fears around picking it up.
The best technical resource for learning R is to download the software and poke around. R is free to download and use and quick to set up. By itself, R will do what you need, but RStudio is a program that sits on top of R and provides you with more point and click options for analysis instead of just running code through text entry like you would in base R. From what I've seen, RStudio is recommended almost across the board for R among beginners and experts.
Technical Resources
For those of you interested in potentially picking up R, these are some of the best early resources I found.
R for Data Science by Hadley Wickham
Hadley Wickham is nothing short of famous in the R world. He's the Chief Scientist at RStudio and creator of the tidyverse set of packages. His (graciously) open source book on using R for Data Science is one of the best places to start if you're exploring this tool.
How to transition from Excel to R by Tony Ojeda
One of the most helpful articles when I got started. I mentioned I love Excel, but it’s more of an addiction than anything else. Above all this article helped assure me that I could still do in R what I already knew how to do in Excel.
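As a rough sketch of what that reassurance looks like in practice (this is my own illustration, not code from the article), here are a few familiar spreadsheet tasks in base R. The staff data frame and its values are hypothetical; aggregate(), table(), and bracket subsetting are standard base R.

```r
# Hypothetical employee records, as they might sit in a spreadsheet.
staff <- data.frame(
  department = c("Sales", "Sales", "HR", "HR", "IT", "IT"),
  salary     = c(50000, 60000, 45000, 55000, 70000, 80000)
)

# Like a pivot table: average salary per department.
avg_salary <- aggregate(salary ~ department, data = staff, FUN = mean)
print(avg_salary)

# Like COUNTIF: headcount per department.
headcount <- table(staff$department)
print(headcount)

# Like a column filter: keep rows where salary exceeds 55,000.
high_earners <- staff[staff$salary > 55000, ]
print(high_earners)
```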
The R Podcast by Eric Nantz
The R Podcast is thorough in its explanations. Eric Nantz is a statistician with an unmatched knack for explaining the nuances and verbiage that go along with learning R. I spent a few months struggling through R before coming across his podcast. Listening to his breakdown of the core functions of the system helped pull the disparate pieces into place. He gives a great history of R, context for the work, and his website includes follow-along video applications to help bring you up to speed.
R Programming by Roger Peng
For a beginner there’s not a better book on R. This book by Roger D. Peng breaks down the fundamentals of R in an accessible format and his follow-along lessons are incredible. It’s available in print at Lulu.com or you can download it by donation on Leanpub.
R Programming Coursera - Johns Hopkins
Every "Starting R" resource or message board that I’ve seen recommends the Johns Hopkins Coursera course as the gold standard for learning R. Somewhat of a duplicate of the last resource, this course is taught by Roger Peng and his lessons are accessible and easy to follow. The benefit of the video course is getting to see the code and then execute along with the video. Sample exercises and a capstone project take it to the next level.
I personally have a love – procrastinate relationship with Coursera. I have started so many courses only to let them drift. I'm sure it's beneficial to sign up and go through the entire course, but picking out the parts you find most interesting or lectures covering troubles you're having is also a very valid option. I’ve started this one again on March 15th and I’m hoping with this public display of commitment I can push through it.
HR examples of R
Here's the core of why this is in the HR Analytics Starter Kit. From what I can tell there are unfortunately not many walkthrough examples of using R in HR that are publicly available, but that might be a question to explore for another post. Until then, here are the best ones I've found.
People Analytics - An Example Using R by Lyndon Sundmark
We start with the best. Lyndon Sundmark put together a mock data set to develop some examples of HR analytics in application. I thank him for the effort because it's leaps and bounds the best public example of applying R to an HR problem that I've come across. His post includes links to his R code, a walk-through of the steps he took, and the .csv original data file.
For anyone looking to understand what performing HR analytics could look like with R, this is the first example to check out. For anyone looking to produce an HR analytics example to help the community learn the techniques, this article sets the bar for what should be included.
Criss-Crossing the Org Chart: Predicting Colleague Interactions at Facebook
On the other hand this public presentation from Facebook is probably the best example of a company applying R to explore an HR issue. It doesn't get as deep into explaining how or why they developed the code, but it still shows deep technical details for a real-world application. Many large corporations are applying R to problems like this, but very few have talked openly about it.
Is there a Gender Gap in Florida's Government? by Charles McGuinness
RPubs is a public site where you can publish your work from R. I’m not sure if Charles McGuinness expected attention for it when he posted this to RPubs, but when I stumbled across it I bookmarked it immediately. Charles takes a public dataset on Florida state employees, then cleans, analyzes, and visualizes the data. I thought this was an excellent and fairly straightforward example of how to use R to perform basic analysis on an employee dataset.
R Helps with Employee Churn by Pasha Roberts
This article reviewing the work, along with the original article by Pasha Roberts, gives a glimpse into what you can do with HR analytics when you start getting more rigorous with your analysis. Pasha's article on the methodology behind Employee Churn and the corresponding GitHub repository give us a taste of what a predictive analytics consulting firm like Talent Analytics is able to produce.
Predictive Analytics World: Workforce
To wrap up R resources for HR, I want to touch on Predictive Analytics World: Workforce. If you’re interested in HR analytics and want to attend a conference to network and learn about advancements, there are an amazing number of meetups and conferences, and it would take a much longer post to cover all of them in even a few sentences each. PAW: Workforce stands out to me as one of the only conferences with a dedicated track for analysts, designed to provide them with the cutting edge technical skills needed to produce predictive insights.
The Predictive Analytics World conferences as a whole are data science oriented. In addition to the main conference track for PAW Workforce, there are standalone full day sessions on R, predictive modeling, and even uplift modeling (a technique I fully believe HR needs to steal from marketing), all taught by industry experts. If you’re going to be at PAW Workforce or if you're thinking about going, let me know. I look forward to meeting up with you out there.
Wrap-up
That wraps up this section of the HR Analytics Starter Kit for now. If you were new to R to start and you're now interested in R, I'm excited for you! Being open source, there's a huge community and a wealth of other resources out there for you to take advantage of in your learning. I'm still getting started on my journey here as well, so if you find other resources that worked well for you that you think should be included in this part of the Starter Kit, I'd love to hear about them in the comments.
As far as R in HR goes, I've found that there are still very few public resources available. There's a chance I've missed some that are hanging out in the corners of the web, and if I have, please send them over. I'd love to hear more. There's a lot of talk about HR analytics right now, but not a lot of step-by-step practical examples or public datasets for new learners looking to build their skill set. If anyone reading this feels like they have the resources or knowledge to create more examples, please contact me. I'd love to help you publish.
To everyone, a huge thank you for the support on my first few articles. I hope you took something away that was helpful and please forward this to a friend or teammate who might enjoy it.
I'm also looking forward to hearing your general thoughts on these articles and the blog in general, so please reach out to me here or on Twitter with any comments, suggestions, or questions.
- Richard Rosenow
https://www.dhirubhai.net/in/richardrosenow
https://www.twitter.com/RichardRosenow
My other posts:
- Analyzing Employee Turnover - Predictive Methods
- Analyzing Employee Turnover - Descriptive Methods
- In Defense of Middle Measures: The Use of Constructs in HR Analytics
- HR Analytics Starter Kit Part 1 - Intro to HR analytics
- HR Analytics Starter Kit - Part 3 - Podcasts
- HR has Last Mover Advantage in HR Analytics