Using the readr and tidyr packages from the R tidyverse
Danilo Torres
Continuing our series of posts on the R language: to start coding, we need to install R from https://cran.r-project.org/bin/windows/base/ (this link is for Windows, so pick the download that matches your operating system). There is also a very good IDE called RStudio. Its Desktop edition is free and open source, and it can be downloaded from https://posit.co/download/rstudio-desktop/.
With R installed, let's proceed with the demonstration of using two essential tidyverse packages called 'readr' and 'tidyr'.
# We install the whole tidyverse, but we could have installed only 'readr' and 'tidyr'
install.packages("tidyverse")
# We load the readr package
library(readr)
# We load the .csv file into a data frame (it works like a table).
# We use read_csv2() because our file uses a semicolon as the separator;
# read_csv2() also treats the comma as the decimal mark.
# For files that use a comma as the separator, use read_csv() instead.
companies <- read_csv2("C:/.../COMPANIES.csv")
If your file uses a different delimiter, use the 'read_delim()' function and pass the separator character through its 'delim' argument.
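To make the 'read_delim()' option concrete, here is a minimal sketch. The pipe-delimited file and its NAME and REVENUE columns are made up for illustration and are not part of the COMPANIES.csv file used in this post.

```r
library(readr)

# Hypothetical sample data (not the article's COMPANIES.csv), written to a temporary file
pipe_file <- tempfile(fileext = ".txt")
writeLines(c("NAME|REVENUE", "Acme|1200.5", "Globex|980"), pipe_file)

# Column types are declared explicitly so nothing depends on type guessing
types <- cols(NAME = col_character(), REVENUE = col_double())

# read_delim() receives the separator through the 'delim' argument
pipe_data <- read_delim(pipe_file, delim = "|", col_types = types)

print(pipe_data)
```

Declaring 'col_types' is optional. Without it, readr guesses the type of each column from the data.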
# We can view a sample of the newly loaded data using the head() function
head(companies)
# Let's load the tidyr package to clean and organize the data
library(tidyr)
# With the 'separate()' function we can split one column into two or more.
# Here we split the 'ADDRESS' column at each comma into five new columns.
companies <- companies %>%
  separate(ADDRESS, into = c("NUMBER", "STREET", "CITY", "STATE", "COUNTRY"), sep = ",")
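Since the real file is not shown here, the sketch below applies the same split to a single made-up row. The company name and address are hypothetical. It also assumes the address parts are separated by a comma followed by a space, so it uses a regular expression as the separator to avoid leftover leading spaces in the new columns.

```r
library(tidyr)

# Hypothetical one-row data frame standing in for the article's 'companies'
sample_companies <- data.frame(
  NAME = "Acme",
  ADDRESS = "100, Main Street, Springfield, IL, USA",
  stringsAsFactors = FALSE
)

# 'sep' is interpreted as a regular expression: ",\\s*" matches a comma
# plus any whitespace after it, so the pieces carry no leading spaces
split_companies <- separate(
  sample_companies,
  ADDRESS,
  into = c("NUMBER", "STREET", "CITY", "STATE", "COUNTRY"),
  sep = ",\\s*"
)

print(split_companies)
```

By default 'separate()' drops the original 'ADDRESS' column and returns the new columns as text. Passing 'convert = TRUE' asks it to convert columns such as 'NUMBER' to a more suitable type.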
The tidyr package has several other functions that can be explored. For more information, I suggest accessing the link https://livro.curso-r.com/7-3-tidyr.html
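As one example of those other functions, 'pivot_longer()' turns several columns into name/value pairs. This is only a small sketch with invented revenue figures, and the column names are assumptions, not part of the COMPANIES.csv file.

```r
library(tidyr)

# Hypothetical data: one revenue column per year (wide format)
sample_revenue <- data.frame(
  NAME = c("Acme", "Globex"),
  Y2022 = c(10, 20),
  Y2023 = c(15, 25)
)

# Stack the yearly columns into a 'YEAR' column and a 'REVENUE' column (long format)
long_revenue <- pivot_longer(
  sample_revenue,
  cols = c(Y2022, Y2023),
  names_to = "YEAR",
  values_to = "REVENUE"
)

print(long_revenue)
```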
We can now view our data after these manipulations:
# The 'View()' function opens the data frame in a table viewer
View(companies)
Each of these packages has a lot more to explore, so feel free to browse https://livro.curso-r.com/ and try some other functions. In the next post about the R language, we will talk about two more essential tidyverse packages, dplyr and ggplot2. Stay tuned!