
bReakfast: Who Wants More R?

Mark Niemann-Ross

Author of "Stupid Machine" and educator at LinkedIn Learning

Published: May 24, 2019

Slice and Dice the StackOverflow Developer Survey

Stack Overflow surveys developers every year and their analysis is recommended reading. In addition, they publish the complete (anonymized) data set for further research.

Since I teach R (and Raspberry Pi), I'm interested in anything related to R learners. In the next couple of posts, I'll show you how I researched the R-related data in the survey. (Are you interested in how to do this with the tidyverse? See how fellow LinkedIn Learning author Martin John Hadley accomplishes this task.)

In the last post, I imported the survey and found the R programmers in a semicolon-delimited text string using grep and regular expressions.
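Here is a minimal sketch of that kind of matching, run on made-up strings rather than the survey itself. It uses the same pattern that appears later in this post. One thing it shows is that the pattern only matches R when a semicolon sits next to it, so an answer of just "R" is not matched. The name pattern_with_solo is a hypothetical variant for comparison, not what produced the numbers in this post.

```r
# Made-up answers in the survey's semicolon-delimited format (not real data).
desire <- c("R;Python", "Python;R;SQL", "SQL;R", "R", "Rust;Ruby", NA)

# The pattern used later in this post: R at the start, middle, or end of a list.
pattern <- "^R;|;R;|;R$"
matches <- grepl(pattern, desire)

# Hypothetical variant that also accepts an answer of "R" and nothing else.
pattern_with_solo <- "^R$|^R;|;R;|;R$"
matches_with_solo <- grepl(pattern_with_solo, desire)

# Side-by-side view of which answers each pattern picks up.
data.frame(desire, matches, matches_with_solo)
```

Note that "Rust;Ruby" is not matched by either pattern, and grepl returns FALSE for the NA answer.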

Which Programmers Want To Learn More R?

I'm interested in creating awareness of my LinkedIn Courses on R programming. To do that, I'm going to advertise on a website - but which website will be the most valuable? I'd like to generate a table showing the amount of interest in R against languages people already know.

It's really important to clearly state the question. Which seems like a silly thing to point out - but if you aren't clear on where you're trying to go, how are you going to get there?

What I intend to do is ...

Calculate how many programmers use each language
Of the subset of programmers who want to learn more R, calculate how many programmers use each language
Divide the count from the second step by the count from the first step. This will give me the percentage of programmers who want to learn R, listed by their currently used language.
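The three steps above can be sketched on a tiny, made-up survey before touching the real one. Everything in toy is hypothetical; only the column names mirror the survey's. I fix the language levels with factor() so both counts line up in the same order before dividing.

```r
# Hypothetical mini-survey; column names mirror the real survey's.
toy <- data.frame(
  LanguageWorkedWith     = c("Python;SQL", "R;SQL", "Python", "Java;SQL"),
  LanguageDesireNextYear = c("Python;R", "R;SQL", "Go", "Java;R"),
  stringsAsFactors = FALSE
)

# Step 1: how many respondents use each language.
used_by_all <- table(unlist(strsplit(toy$LanguageWorkedWith, ";")))
langs <- names(used_by_all)

# Step 2: the same count, restricted to respondents who want R next year.
wants_r <- grepl("^R;|;R;|;R$", toy$LanguageDesireNextYear)
used_by_wants_r <- table(factor(
  unlist(strsplit(toy$LanguageWorkedWith[wants_r], ";")),
  levels = langs  # same languages, same order as step 1
))

# Step 3: divide step 2 by step 1 and express it as a percentage.
pct <- 100 * as.vector(used_by_wants_r) / as.vector(used_by_all)
names(pct) <- langs
pct
```

In this toy data, two respondents use Python but only one of them wants R, so Python comes out at 50.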

So I need a list of languages used by developers from the StackOverflow developer survey.

# I'll start with ...

survey_results_public$LanguageWorkedWith

# ...then split it at the string delimiter ";"...

strsplit(survey_results_public$LanguageWorkedWith, ";")

# ...this produces a list which I'll flatten...

unlist(strsplit(survey_results_public$LanguageWorkedWith, ";"))

# ...then count per language ...

table(unlist(strsplit(survey_results_public$LanguageWorkedWith, ";")))

# ...then convert to a data.frame and store in "totPop_Lang"

totPop_Lang <- as.data.frame(table(unlist(strsplit(survey_results_public$LanguageWorkedWith, ";"))))
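One assumption is hiding in that chain: strsplit only accepts character vectors. Before R 4.0.0, read.csv turned text columns into factors by default, so depending on how the survey was imported, a conversion may be needed first. A minimal sketch on made-up values (worked_with, pieces, and counts are hypothetical names):

```r
# Hypothetical column that was read in as a factor rather than character.
worked_with <- factor(c("Python;SQL", "R;SQL", NA))

# strsplit(worked_with, ";") would stop with "non character argument",
# so convert first. An NA answer stays NA after the split.
pieces <- strsplit(as.character(worked_with), ";")

# table() leaves NA out by default, so skipped questions are not counted.
counts <- as.data.frame(table(unlist(pieces)), stringsAsFactors = FALSE)
counts
```

The resulting data.frame has the default column names Var1 and Freq.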

Here are links to video lessons on unlist, table, and as.data.frame.

totPop_Lang is now a data.frame that contains ...

Assembly Bash/Shell/PowerShell                     C                    C# 
    5833                 31991                 18017                 27097 

     C++               Clojure                  Dart                Elixir 
   20524                  1254                  1683                  1260 

  Erlang                    F#                    Go              HTML/CSS 
     777                   973                  7201                 55466
 
    Java            JavaScript                Kotlin           Objective-C 
   35917                 59219                  5620                  4191 

Other(s):                   PHP                Python                     R 
     7920                 23030                 36443                  5048 

    Ruby                  Rust                 Scala                   SQL 
    7331                  2794                  3309                 47544 

   Swift            TypeScript                   VBA           WebAssembly 
    5744                 18523                  4781                  1015

Next I'll count the languages used by programmers that want to learn more R. In this example, I've stored each intermediate result in its own variable. This is the sort of thing best done with pipelining (i.e. %>%). I didn't use it here because I'm trying to keep the example clear.

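# keep LanguageWorkedWith for respondents whose LanguageDesireNextYear lists R
step3a <- survey_results_public[grep("^R;|;R;|;R$", survey_results_public$LanguageDesireNextYear), "LanguageWorkedWith"]
# split at ";", flatten the list, count per language, convert to a data.frame
step3b <- strsplit(step3a, ";")
step3c <- unlist(step3b)
step3d <- table(step3c)
step3e <- as.data.frame(step3d)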
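For comparison, here is a sketch of the same steps as a pipeline. It swaps in base R's native pipe |> (which needs R 4.1 or later) instead of magrittr's %>%, so no package is required. The toy data frame is made up. One difference to watch for: because table() no longer receives a variable called step3c, the first column comes out as Var1, so a later merge would need by.y = "Var1".

```r
# Hypothetical mini-survey standing in for survey_results_public.
toy <- data.frame(
  LanguageWorkedWith     = c("Python;SQL", "R;SQL", "Python", "Java;SQL"),
  LanguageDesireNextYear = c("Python;R", "R;SQL", "Go", "Java;R"),
  stringsAsFactors = FALSE
)

# Row numbers of respondents who list R among the languages they want.
wants_r <- grep("^R;|;R;|;R$", toy$LanguageDesireNextYear)

# Same five steps as step3a..step3e, chained with the native pipe.
piped <- toy[wants_r, "LanguageWorkedWith"] |>
  strsplit(";") |>
  unlist() |>
  table() |>
  as.data.frame()

piped
```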

step3e now contains a count by language, but only for programmers that want to learn R...

 Assembly Bash/Shell/PowerShell                     C                    C# 
      615                  2864                  1767                  1887 

      C++               Clojure                  Dart                Elixir 
     1871                   114                   145                   121 

   Erlang                    F#                    Go              HTML/CSS 
      110                   129                   467                  4139 

     Java            JavaScript                Kotlin           Objective-C 
     2755                  4017                   293                   269 

Other(s):                   PHP                Python                     R 
      684                  1662                  3987                  2541 

     Ruby                  Rust                 Scala                   SQL 
      524                   158                   341                  4454 

    Swift            TypeScript                   VBA           WebAssembly 
      314                  1032                   758                   108

Next, I merge the two data sets into one data.frame. I'm doing this to simplify the example so someone else has a chance of understanding what I'm doing...

lang_tot_R <- merge(totPop_Lang, step3e, 
                    by.x = "Var1", by.y = "step3c")

# then I clean up the names
names(lang_tot_R) <- c("Language", "worked with", "desire")
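A small sketch of what merge is doing here, on made-up counts shaped like totPop_Lang and step3e (all_users and r_fans are hypothetical names). By default merge keeps only keys found in both tables, which is worth knowing if a language ever shows up in one table and not the other.

```r
# Made-up counts shaped like totPop_Lang and step3e (not survey numbers).
all_users <- data.frame(Var1 = c("Go", "Python", "R"), Freq = c(10, 40, 5))
r_fans    <- data.frame(step3c = c("Python", "R"),     Freq = c(4, 3))

# Default merge is an inner join: "Go" has no row in r_fans, so it is dropped.
inner <- merge(all_users, r_fans, by.x = "Var1", by.y = "step3c")

# all.x = TRUE keeps every language from the first table; the gap becomes NA.
left <- merge(all_users, r_fans, by.x = "Var1", by.y = "step3c", all.x = TRUE)

# Both inputs have a Freq column, so merge() names them Freq.x and Freq.y.
names(inner)
```

Those Freq.x and Freq.y names are why renaming the columns right after the merge makes the result easier to read.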

Here are video lessons on merge and names.

I'm interested in the relative interest among users of each language for learning R. So I divide the second count by the first...

# divide "desire" by "total population" and store in "quotient"
lang_tot_R$quotient <- lang_tot_R$desire / lang_tot_R$`worked with`

# sort by interest (quotient)
lang_tot_R <- lang_tot_R[order(lang_tot_R$quotient, decreasing = TRUE), ]

# convert the quotient to a percentage
lang_tot_R$quotient <- lang_tot_R$quotient * 100
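As a quick sanity check, the same arithmetic can be run by hand on a few rows. The counts below are copied from the two tables earlier in this post; the data.frame check and its column names are hypothetical. Using names without spaces also avoids the backticks that `worked with` needs.

```r
# Counts copied from the two tables earlier in this post.
check <- data.frame(
  Language    = c("R", "VBA", "Kotlin"),
  worked_with = c(5048, 4781, 5620),
  desire      = c(2541, 758, 293)
)

# Same arithmetic as above, done in one step and rounded for reading.
check$pct <- round(100 * check$desire / check$worked_with, 1)
check
```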

And presto - I have the result...

                Language worked with desire  quotient
20                     R        5048   2541 50.336767
27                   VBA        4781    758 15.854424
9                 Erlang         777    110 14.157014
10                    F#         973    129 13.257965
19                Python       36443   3987 10.940373
28           WebAssembly        1015    108 10.640394
1               Assembly        5833    615 10.543460
23                 Scala        3309    341 10.305228
3                      C       18017   1767  9.807404
8                 Elixir        1260    121  9.603175
24                   SQL       47544   4454  9.368164
5                    C++       20524   1871  9.116157
6                Clojure        1254    114  9.090909
2  Bash/Shell/PowerShell       31991   2864  8.952518
17             Other(s):        7920    684  8.636364
7                   Dart        1683    145  8.615567
13                  Java       35917   2755  7.670462
12              HTML/CSS       55466   4139  7.462229
18                   PHP       23030   1662  7.216674
21                  Ruby        7331    524  7.147729
4                     C#       27097   1887  6.963871
14            JavaScript       59219   4017  6.783296
11                    Go        7201    467  6.485210
16           Objective-C        4191    269  6.418516
22                  Rust        2794    158  5.654975
26            TypeScript       18523   1032  5.571452
25                 Swift        5744    314  5.466574
15                Kotlin        5620    293  5.213523

...and a barplot of the results...

par(mar=c(11,4,4,4)) #increase margin
barplot(lang_tot_R$quotient,
        names.arg = lang_tot_R$Language,
        ylab = "% wanting to learn more R",
        main = "Who wants to learn more R?",
        las=2)
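If the plot needs to survive outside an R session, one option is to draw it to a file. This is a minimal sketch, assuming a PDF in the session's temporary directory is acceptable; out_file and its file name are hypothetical, and the three values are rounded from the result table above.

```r
# Three rows rounded from the result table above, just to have something to draw.
pct <- c(R = 50.3, VBA = 15.9, Erlang = 14.2)

# Hypothetical output location inside the session's temporary directory.
out_file <- file.path(tempdir(), "who_wants_r.pdf")

pdf(out_file, width = 7, height = 5)  # open a PDF graphics device
par(mar = c(11, 4, 4, 4))             # same enlarged bottom margin as above
barplot(pct,                          # names of pct become the bar labels
        ylab = "% wanting to learn more R",
        main = "Who wants to learn more R?",
        las = 2)
dev.off()                             # close the device so the file is written
```

The par() setting applies only to the open PDF device, so it does not change the margins of later on-screen plots.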

...that's the code. The plot is shown at the top of this article, but here it is in case it gets munged up.

Here are takeaways from this chart...

About 50% of R programmers want to learn more R
The next interesting groups are VBA, Erlang, and F#, at roughly 13% to 16%
Python programmers are a contented bunch - only about 11% feel a need to learn R

So - perhaps I should be advertising on R-centric sites, followed by sites catering to VBA, Erlang, and F#.

bReakfast is an ongoing look over my shoulder as I use R to explore data.

#rstats #linkedinlearning
