parzer: Parse Messy Geographic Coordinates
parzer is a new package for handling messy geographic coordinates. The first version is now on CRAN, with binaries hopefully coming soon (see the installation notes below). The package recently completed rOpenSci review.
parzer motivation
The idea for this package started with a tweet from Noam Ross (https://twitter.com/noamross/status/1070733367522590721) about 15 months ago.
The idea is that geographic coordinates sometimes come in a messy format, or in many different formats at once. You can think of parzer as doing for geographic coordinates what lubridate does for dates.
I started off thinking about wrapping a JavaScript library with Jeroen's V8 R package, but then someone showed me (or I found; I can't remember which) some C++ code from 2006 that seemed appropriate. I went with C++ instead of JavaScript because I expected to get better performance out of C++, and it would mean slightly fewer installation headaches for users.
Package installation
The package is on CRAN, so you can use install.packages():
install.packages("parzer")
However, since this package requires compilation, you probably want a binary. Binaries are not yet available on CRAN, but you can install one from the rOpenSci dev repository:
install.packages("parzer", repos = "https://dev.ropensci.org/")
library(parzer)
Check out the package documentation to get started: https://docs.ropensci.org/parzer/
Package basics
The following is a summary of the functions in the package and what they do:
Parse latitude or longitude separately
- parse_lat
- parse_lon
Parse latitudes and longitudes at the same time
- parse_lon_lat
Parse into separate parts of degrees, minutes, seconds
- parse_parts_lat
- parse_parts_lon
Pull out separately degrees, minutes, seconds, or hemisphere
- pz_degree
- pz_minute
- pz_second
- parse_hemisphere
Add/subtract degrees, minutes, seconds
- pz_d
- pz_m
- pz_s
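To make the hemisphere idea concrete, here is a small base R sketch. It is not parzer's implementation: toy_hemisphere() is a hypothetical helper that assumes the coordinates have already been parsed into signed decimal degrees, and it treats zero as north (latitude) or east (longitude).

```r
# Hypothetical helper (not part of parzer): derive a two-letter hemisphere
# code from signed decimal degrees. Assumes numeric, already-parsed input
# and treats 0 as north (latitude) or east (longitude).
toy_hemisphere <- function(lon, lat) {
  ns <- ifelse(lat >= 0, "N", "S")
  ew <- ifelse(lon >= 0, "E", "W")
  paste0(ns, ew)
}

toy_hemisphere(lon = c(-45.90393, 74.123), lat = c(45.07096, -12))
#> [1] "NW" "SE"
```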
Some examples:
Parse latitudes and longitudes
lats <- c("40.123°", "40.123N74.123W", "191.89", 12, "N45 04.25764")
parse_lat(lats)
#> Warning in pz_parse_lat(lat): invalid characters, got: 40.123n74.123w
#> Warning in pz_parse_lat(lat): not within -90/90 range, got: 191.89
#> check that you did not invert lon and lat
#> [1] 40.12300 NaN NaN 12.00000 45.07096
longs <- c("45W54.2356", "181", 45, 45.234234, "-45.98739874N")
parse_lon(longs)
#> Warning in pz_parse_lon(lon): invalid characters, got: -45.98739874n
#> [1] -45.90393 181.00000 45.00000 45.23423 NaN
The examples above mix valid and invalid coordinate values: invalid inputs trigger a warning and come back as NaN. They also mix input types, since both character strings and numbers are accepted.
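To give a feel for what parsing one of these formats involves, here is a deliberately simplified base R sketch that handles only the "whole degrees plus decimal minutes with one hemisphere letter" form seen above (for example "N45 04.25764" and "45W54.2356"). toy_parse_dm() is a hypothetical helper, not parzer's C++ parser: it returns NaN for anything it does not recognize and does no range checking.

```r
# Hypothetical helper (not parzer's parser): handles only strings made of one
# hemisphere letter (N/S/E/W), whole degrees and decimal minutes, e.g.
# "N45 04.25764" or "45W54.2356". Anything else becomes NaN.
toy_parse_dm <- function(x) {
  vapply(x, function(s) {
    s <- toupper(trimws(s))
    hemi <- gsub("[^NSEW]", "", s)  # keep only hemisphere letters
    # Replace the hemisphere letter with a space, then split on whitespace
    parts <- strsplit(trimws(gsub("[NSEW]", " ", s)), "\\s+")[[1]]
    nums <- suppressWarnings(as.numeric(parts))
    if (nchar(hemi) != 1 || length(nums) != 2 || anyNA(nums)) return(NaN)
    value <- nums[1] + nums[2] / 60  # degrees + minutes / 60
    if (hemi %in% c("S", "W")) -value else value
  }, numeric(1), USE.NAMES = FALSE)
}

toy_parse_dm(c("N45 04.25764", "45W54.2356", "40.123"))
#> [1]  45.07096 -45.90393       NaN
```

For the first two inputs this agrees with what parse_lat() and parse_lon() return above; parzer itself accepts many more formats and validates ranges.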
Sometimes you may want to parse a geographic coordinate into its component parts; parse_parts_lat and parse_parts_lon are what you need:
x <- c("191.89", 12, "N45 04.25764")
parse_parts_lon(x)
#> Warning in pz_parse_parts_lon(scrub(str)): invalid characters, got: n45 04.25764
#>   deg min      sec
#> 1 191  53 23.99783
#> 2  12   0  0.00000
#> 3  NA  NA      NaN
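The arithmetic behind splitting decimal degrees into parts is straightforward: the integer part gives the degrees, the fractional part times 60 gives the minutes, and what is left of the minutes times 60 gives the seconds. Below is a minimal base R sketch; toy_parts() is a hypothetical helper (not parzer's code) that assumes non-negative input. Because of floating-point differences, its seconds need not match parzer's in the trailing digits (for 191.89 it gives very nearly 24 rather than 23.99783).

```r
# Hypothetical helper (not part of parzer): split non-negative decimal
# degrees into whole degrees, whole minutes and (fractional) seconds.
toy_parts <- function(x) {
  x <- as.numeric(x)
  deg <- trunc(x)
  minutes <- (x - deg) * 60     # fractional degrees -> minutes
  mins <- trunc(minutes)
  secs <- (minutes - mins) * 60 # fractional minutes -> seconds
  data.frame(deg = deg, min = mins, sec = secs)
}

toy_parts(c(191.89, 12))
```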
Taking a cue from lubridate, we thought it would be useful to make it easier to add or subtract degrees, minutes, and seconds. Three functions help with this, pz_d, pz_m, and pz_s:
pz_d(31)
#> 31
pz_d(31) + pz_m(44)
#> 31.73333
pz_d(31) - pz_m(44)
#> 30.26667
pz_d(31) + pz_m(44) + pz_s(59)
#> 31.74972
pz_d(-121) + pz_m(1) + pz_s(33)
#> -120.9742
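The values above are consistent with the usual conversion decimal = degrees + minutes/60 + seconds/3600. That includes the last example, where the minutes and seconds are added to -121 rather than moving the value further from zero. A base R sketch with a hypothetical to_decimal() helper (this only reproduces the arithmetic; it is not how pz_d, pz_m, and pz_s are implemented):

```r
# Hypothetical helper (not part of parzer): combine degrees, minutes and
# seconds into decimal degrees. Minutes and seconds are simply added, so a
# negative degree value is not pushed further from zero.
to_decimal <- function(deg, minutes = 0, seconds = 0) {
  deg + minutes / 60 + seconds / 3600
}

to_decimal(31, 44)       # prints 31.73333
to_decimal(31, -44)      # prints 30.26667
to_decimal(31, 44, 59)   # prints 31.74972
to_decimal(-121, 1, 33)  # prints -120.9742
```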
Use cases
Check out the parzer use cases vignette on the docs site. Get in touch if you have a use case that might be good to add to that vignette.
Thanks
Thanks to the reviewers, Maria Munafó and Julien Brun, for the time they invested in improving the package.
To Do
There's more to do. We are thinking about dropping the Rcpp dependency, supporting the parsing of strings that contain both latitude and longitude, improving error messages, and more.