R Challenge #1: World Population

I'm proud to say I've joined the esteemed ranks of Code Challenge Authors. In this and future articles, I'll share some backstory behind each challenge. My peers have supplied other code challenges you might find interesting: JavaScript ... Python ... Java ... GitHub ... HTML ... SQL ... SQL for Data Science ... PHP

R Challenge #1: Import the World Population database

Importing a CSV file into an R data object would seem straightforward - but you quickly run into nuances that will foul your data. Data types, factors, missing data, incomplete lines and more can turn importing into a nightmare. In this episode, I challenge you to import a CSV file.
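
To make those nuances concrete, here is a minimal sketch using a tiny inline CSV. The data, the column names, and the "n/a" missing-value marker are all invented for illustration; they are not taken from the UN file.

```r
# Hypothetical inline data standing in for a real CSV file.
csv_text <- "id,name,variant,pop
1,Alpha,Medium,10.5
2,Beta,High,n/a
3,Gamma,Medium,7"

# Default import: the "n/a" marker keeps pop from being read as a number.
# pop comes back as character in R 4.0+ and as a factor in older versions.
naive <- read.csv(text = csv_text)
class(naive$pop)

# Explicit import: declare the missing-value marker and every column's type.
typed <- read.csv(text = csv_text,
                  na.strings = c("NA", "n/a"),
                  colClasses = c("integer", "character", "factor", "numeric"))
sapply(typed, class)
```

With the types spelled out, the unparseable entry becomes a proper NA instead of silently changing the type of the whole column.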

Here's My Solution

# import the United Nations world population database
# all fields should be integer or numeric
# except variant = factor, location = character

worldPop <- read.csv("https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_TotalPopulationBySex.csv",
                     colClasses = c("integer", "character", "integer", "factor", "integer",
                                    "numeric", "numeric", "numeric", "numeric", "numeric"))

What You Don't See

Writing courses for LinkedIn (and other online learning portals) requires sample data to be both public domain and accessible - AND - it should be interesting to work with. The UN world population data meets all three criteria, but MAN, it was difficult to find. I spent more time looking for the dataset than I did actually writing the code.

One option would be to download the data and include it with the example files, but that would contribute to a multi-gigabyte download. Some of you don't have that kind of patience and bandwidth, so it's easier to have the code grab the file directly.
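
A middle ground is to download the file once and reuse the local copy on later runs. The sketch below assumes a hypothetical helper named fetch_once; it is not part of the course code, just one way the idea could look in base R.

```r
# Hypothetical helper: download a file only if no local copy exists yet.
# Returns the local path either way, so it can be passed straight to read.csv().
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" avoids line-ending translation on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}
```

Calling fetch_once() with the UN URL and a local file name, then handing the returned path to read.csv(), means the network is only touched on the first run.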

An Alternate Solution

I have an ongoing debate with my peers regarding base R vs. the tidyverse. Should I teach the tidyverse (read_csv)? Or should I teach base R (read.csv)? So far, I've focused on base R - but I'm aware there are cleaner solutions available in the tidyverse. For example, compare this code to the above.



library(readr)

WPP2019_TotalPopulationBySex <- read_csv("https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_TotalPopulationBySex.csv",
                                         col_types = "icifnnnnnn")

It looks similar, but read_csv is typically faster, never converts strings to factors unless you ask, and handles missing data more predictably. Plus, it returns a tibble, an object tailored to other routines in the tidyverse.
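
The compact col_types string is easier to read on a small example. This is a sketch on invented inline data (the values and the four column names are assumptions, not rows from the UN file), and it requires the readr package to be installed.

```r
library(readr)

# Hypothetical inline data; a string containing newlines is read as literal CSV.
csv_text <- "LocID,Location,Variant,PopTotal
1,Alpha,Medium,10.5
2,Beta,High,7"

# One letter per column: i = integer, c = character, f = factor, n = number.
sample_pop <- read_csv(csv_text, col_types = "icfn")

class(sample_pop)          # includes "tbl_df": readr returns a tibble
sapply(sample_pop, class)  # one class per column, in file order
```

Each letter does the job of one entry in the base R colClasses vector, so the two approaches express the same decisions in different notation.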

Choosing a Problem

Choosing data is one aspect of authoring a course. Choosing a problem is another. Good challenges require some thought, possibly resulting in failure the first time through. I believe it's possible to import the data within two or three attempts.

And... good challenges are focused. I hope to illustrate one concept - maybe two. If a challenge requires several separate solutions, an early failure obscures the remaining lessons. Simpler challenges keep the focus on a single concept - but they may be less interesting to solve.

How about you?

Do you have some opinions on this challenge? Please share them in the comments below.

mnr


Joe Casabona

I help busy solopreneur parents save 12+ hours per week by putting systems in place and automating more of their work.

3 years ago

Welcome to the pack! I love this course format!

Monika Wahi

Epidemiology & Biostatistics Consultant a/k/a Data Scientist | Exclusive and innovative solutions for data science challenges in public health, research and education

3 years ago

Hey Mark Niemann-Ross I love your videos! They are so fun! Hey Daniel Wanjiru - take a look at this video to see the R version of one of those SAS data steps where you use "input" and "cards". Hey healthcare analytics people: If you want to play with smaller datasets, try copying and pasting from this site with data about hospitals in the US: https://www.ahd.com/state_statistics.html

Yinghui Liu

PhD, Researcher

3 years ago

Since the .csv file is relatively large (20 MB), I tried the following code to download the file into the web browser's default download folder, so that it can then be read locally:

url <- "https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_TotalPopulationBySex.csv"
browseURL(url)

I hope it helps people who have limited internet bandwidth or need to access the file frequently.

Ray Villalobos

Generative AI, Prompt Engineering and Full Stack Development. LinkedIn Top Voice. Senior Staff Instructor at LinkedIn, Instructor at Stanford University.

3 years ago

I think it's Scotty's turn to show you the secret handshake.

Barron Stone

Product Manager | Engineer | Instructor | Veteran

3 years ago

Congrats Mark! Welcome to the club!!! I had Metallica blasting in the background when I started playing your first challenge video. I misheard your opening words of "data science" as "data sucks." Seemed like an odd way to kick off an R video!
