UPDATE - Cleaning addresses (with a new package)
Photo by Jac Alexandru via Unsplash

If you have ever written code to simplify or clean street addresses, geocode them, or round the resulting coordinates, this package may be the one for you!


You may have seen my previous article on getting started simplifying street address fields prior to grouping them. Since then, I have worked on improving the code's functionality and flexibility, which resulted in a package I have named "cleanAddresses".

In addition to the function for simplifying street addresses, I chose to include a function to geocode addresses and a function to round the geocoded coordinates (to protect individuals' private information).

Let's take a look.

The package can be installed from GitHub using the "remotes" package:

install.packages("remotes") # install remotes if needed
library(remotes)

remotes::install_github("bell-samantha/Packages/cleanAddresses")
library(cleanAddresses)

Once installed, you can access all three of the functions within cleanAddresses:

  1. simplify_street()
  2. add_coord()
  3. round_coord()


simplify_street()

This function takes in a character vector of address text and returns a same-length vector of simplified address text.

simplify_street(street, numWords)

myData$newField <- simplify_street(street = myData$rawStreetField, numWords = 2)

Each input value MUST start with a street number; city, state, and zip may be included or left out. Any city, state, or zip is cut off in the simplified version. The result can be assigned directly to a new column in a dataset if desired.

  • The parameter "street" takes in a character vector of street addresses (usually a column from a dataset)
  • The parameter "numWords" takes in the number of full words the user would like to allow to follow the street number and direction.
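
To make the idea of "numWords" more concrete, here is a minimal base R sketch of this kind of simplification. It is not the code inside cleanAddresses: the helper name simplify_sketch(), the list of direction words, and the upper-casing and punctuation stripping are all my own assumptions for illustration.

```r
# Hypothetical helper, NOT the cleanAddresses implementation.
# Keeps the street number, an optional one-word direction, and up to
# `numWords` further words; everything after that is dropped.
simplify_sketch <- function(street, numWords = 2) {
  # Assumed list of direction words; a real cleaner may use a different one
  directions <- c("N", "S", "E", "W", "NE", "NW", "SE", "SW",
                  "NORTH", "SOUTH", "EAST", "WEST")
  vapply(street, function(x) {
    if (is.na(x)) return(NA_character_)
    # Upper-case, replace punctuation with spaces, and split on whitespace
    cleaned <- trimws(gsub("[[:punct:]]", " ", toupper(x)))
    words <- strsplit(cleaned, "\\s+")[[1]]
    # Keep one extra word when a direction follows the street number
    has_dir <- length(words) >= 2 && words[2] %in% directions
    keep <- 1 + has_dir + numWords
    paste(head(words, keep), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# Made-up example addresses
raw <- c("123 N. Main Street, Lansing, MI 48933",
         "45 Oak Ave Apt 2 Mason MI",
         NA)
simplified <- simplify_sketch(raw, numWords = 2)
print(simplified)
```

With numWords = 2, the first address keeps its number, its direction, and two more words, while the city, state, and zip fall away. A smaller numWords groups addresses more aggressively, at the cost of merging streets that share a first word.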


add_coord()

This function creates a simple address tibble that can be passed through censusxy::cxy_geocode() from the package "censusxy" to get x and y coordinates for each record.

This is best used after cleaning the street field with cleanAddresses::simplify_street(). The result can be joined directly into a dataset.

  • The parameter "identifier" takes in the column containing unique record ids
  • The parameter "street" takes in the column containing street name and number
  • The parameter "city" takes in the column containing city name
  • The parameter "state" takes in the column containing state name
  • The parameter "zip" takes in the column containing zip codes

myCoordinates <- add_coord(
  street = myData$newField, 
  city = myData$cityField, 
  state = myData$stateField, 
  zip = myData$zipField, 
  identifier = myData$Id
)
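
For anyone curious what the geocoding step looks like on its own, here is a rough sketch of passing a small address table straight to censusxy::cxy_geocode(). The column names and the made-up addresses are assumptions for illustration, and this is not necessarily how add_coord() is written internally. The geocoder calls the US Census Bureau web service, so the call is switched off by default.

```r
# Toy input; the column names and addresses are illustrative assumptions.
addresses <- data.frame(
  id     = c(1, 2),
  street = c("123 N MAIN STREET", "45 OAK AVE"),
  city   = c("Lansing", "Mason"),
  state  = c("MI", "MI"),
  zip    = c("48933", "48854"),
  stringsAsFactors = FALSE
)

# Geocoding needs the censusxy package and an internet connection,
# so it only runs when this flag is set to TRUE.
run_geocode <- FALSE
if (run_geocode && requireNamespace("censusxy", quietly = TRUE)) {
  geocoded <- censusxy::cxy_geocode(
    addresses,
    id = "id", street = "street", city = "city", state = "state", zip = "zip",
    return = "locations", class = "dataframe", output = "simple"
  )
  print(head(geocoded))
}
```

Keeping a unique id column in the table is what makes it easy to join the coordinates back onto the original dataset afterwards.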


round_coord()

This function takes in two vectors of coordinate values: one for latitude and one for longitude. The result is a 4-column tibble with the same length and order as the input values. The new coordinates can be accessed in columns 3 and 4.

  • The parameter "lattitude" takes in the vector of latitude values (usually a column from a dataset).
  • The parameter "longitude" takes in the vector of longitude values (usually a column from a dataset).
  • The parameter "distance" takes in the number of decimal places (in degrees) to round each coordinate to. Zero decimal places, or whole degrees, rounds to within approximately 111 km. Each additional decimal place is 10 times more precise in distance.

# To round coordinates to within approximately 1.11 km.
# Returns 4 columns (two original, and two rounded)
round_coord(lattitude = myData$lat, longitude = myData$long, distance = 2)

# To get the rounded latitude column only
round_coord(lattitude = myData$lat, longitude = myData$long, distance = 2)[,4]
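
To see how the "distance" value maps to real-world precision, here is a small base R sketch. It uses round() directly rather than round_coord(), and the coordinates are made up, so treat it as an illustration of the arithmetic and not as the package's internals.

```r
# Hypothetical illustration with base R; not the cleanAddresses implementation.
lat  <- c(42.732535, 42.579180)    # made-up latitude values
long <- c(-84.555535, -84.443590)  # made-up longitude values
distance <- 2                      # number of decimal places to keep

lat_rounded  <- round(lat, distance)
long_rounded <- round(long, distance)
print(lat_rounded)
print(long_rounded)

# One degree of latitude is roughly 111 km, so each extra decimal
# place divides the north-south precision by 10.
precision_km <- 111 / 10^(0:3)
names(precision_km) <- paste0("distance_", 0:3)
print(precision_km)

# East-west, a degree of longitude shrinks with the cosine of the latitude,
# so the same rounding is somewhat finer away from the equator.
km_per_degree_long <- 111 * cos(lat_rounded[1] * pi / 180)
print(km_per_degree_long)
```

A value of 2 is often a reasonable middle ground: points stay in roughly the right neighborhood for mapping, but no longer identify a single household.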


I hope this helps!

Let me know if you find this package useful. Perhaps sharing my thoughts will inspire you to make some functions of your own!

Feedback and suggestions are much appreciated, as this is always a work in progress :-)

Happy Programming!
