parzer: Parse Messy Geographic Coordinates

[This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers.]

parzer is a new package for handling messy geographic coordinates. The
first version is now on CRAN, with binaries hopefully coming soon (see
the note about installation below). The package recently completed
rOpenSci review.

parzer motivation

The idea for this package started with a tweet from Noam Ross
(https://twitter.com/noamross/status/1070733367522590721) about 15
months ago.

The idea is that sometimes you have geographic coordinates in a messy
format, or in many different formats. You can think of parzer as being
to geographic coordinates what lubridate is to dates.

I started off thinking about wrapping a Javascript library with Jeroen’s
V8 R package, but then someone showed me or I found (can’t remember)
some C++ code from back in 2006 that seemed appropriate. I went down the
C++ track instead of the Javascript track because I figured I could
likely get better performance out of C++ and give users slightly fewer
install headaches.

Package installation

The package is on CRAN, so you can use install.packages:

install.packages("parzer")

However, since this package requires compilation, you probably want a
binary. Binaries are not available on CRAN yet. You can install a binary
like this:

install.packages("parzer", repos = "https://dev.ropensci.org/")

library(parzer)

Check out the package documentation to get started:
https://docs.ropensci.org/parzer/

Package basics

The following is a summary of the functions in the package and what they
do:

Parse latitude or longitude separately

  • parse_lat
  • parse_lon

Parse latitudes and longitudes at the same time

  • parse_lon_lat

Parse into separate parts of degrees, minutes, seconds

  • parse_parts_lat
  • parse_parts_lon

Pull out separately degrees, minutes, seconds, or hemisphere

  • pz_degree
  • pz_minute
  • pz_second
  • parse_hemisphere

Add/subtract degrees, minutes, seconds

  • pz_d
  • pz_m
  • pz_s

Some examples:

Parse latitudes and longitudes:

lats <- c("40.123°", "40.123N74.123W", "191.89", 12, "N45 04.25764")
parse_lat(lats)

#> Warning in pz_parse_lat(lat): invalid characters, got: 40.123n74.123w

#> Warning in pz_parse_lat(lat): not within -90/90 range, got: 191.89
#>   check that you did not invert lon and lat

#> [1] 40.12300      NaN      NaN 12.00000 45.07096

longs <- c("45W54.2356", "181", 45, 45.234234, "-45.98739874N")
parse_lon(longs)

#> Warning in pz_parse_lon(lon): invalid characters, got: -45.98739874n

#> [1] -45.90393 181.00000  45.00000  45.23423       NaN

The examples above mix valid and invalid coordinate values: anything
that cannot be parsed comes back as NaN, with a warning explaining why.
Inputs can be given as numbers or as strings.
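Because failed parses come back as NaN, it is easy to find the inputs that need manual attention. The sketch below is base R only: the parsed values are copied from the printed latitude output above so that it runs without parzer installed, and the names parsed, failed and report are just illustrative.

```r
# Inputs from the latitude example above; c() coerces the bare 12 to "12".
lats <- c("40.123°", "40.123N74.123W", "191.89", 12, "N45 04.25764")

# Parsed values copied from the printed output above (an assumption made
# so this runs standalone). In practice: parsed <- parzer::parse_lat(lats)
parsed <- c(40.12300, NaN, NaN, 12.00000, 45.07096)

# Failed parses are NaN, so is.nan() marks them.
failed <- is.nan(parsed)

# Pair each input with its result and keep only the failures.
report <- data.frame(input = lats, parsed = parsed, failed = failed)
report[report$failed, ]
```

Here the two flagged inputs are the combined latitude/longitude string and the out-of-range value, matching the two warnings shown above.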

Sometimes you may want to parse a geographic coordinate into its
component parts; parse_parts_lat and parse_parts_lon are what you
need:

x <- c("191.89", 12, "N45 04.25764")
parse_parts_lon(x)

#> Warning in pz_parse_parts_lon(scrub(str)): invalid characters, got: n45 04.25764

#>   deg min      sec
#> 1 191  53 23.99783
#> 2  12   0  0.00000
#> 3  NA  NA      NaN
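To make the deg/min/sec columns concrete, here is a minimal base R sketch of the decomposition for inputs that are already decimal degrees. The helper dd_to_parts is hypothetical and is not how parzer does it internally (parzer does this in C++ and also handles messy strings).

```r
# Hypothetical helper, not part of parzer: split decimal degrees into
# whole degrees, whole minutes, and decimal seconds.
dd_to_parts <- function(x) {
  deg <- trunc(x)                      # whole degrees (sign is lost for -1 < x < 0)
  min_dec <- (abs(x) - abs(deg)) * 60  # fractional degrees expressed as minutes
  min <- trunc(min_dec)                # whole minutes
  sec <- (min_dec - min) * 60          # leftover fraction of a minute as seconds
  data.frame(deg = deg, min = min, sec = sec)
}

dd_to_parts(c(191.89, 12))
```

For 191.89 this gives 191 degrees, 53 minutes and very nearly 24 seconds; the small difference from the 23.99783 printed by parzer is presumably a floating-point precision effect.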

Taking a cue from lubridate, we thought it would be useful to make it
easier to add or subtract numbers for coordinates. Three functions help
with this:

pz_d(31)
#> 31
pz_d(31) + pz_m(44)
#> 31.73333
pz_d(31) - pz_m(44)
#> 30.26667
pz_d(31) + pz_m(44) + pz_s(59)
#> 31.74972
pz_d(-121) + pz_m(1) + pz_s(33)
#> -120.9742
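The results above are ordinary decimal-degree arithmetic: a minute is 1/60 of a degree and a second is 1/3600 of a degree. The base R sketch below reproduces the same numbers; dms_to_dd is a hypothetical helper for illustration, not a parzer function.

```r
# Hypothetical helper, not part of parzer: combine degrees, minutes and
# seconds into decimal degrees.
dms_to_dd <- function(d = 0, m = 0, s = 0) {
  d + m / 60 + s / 3600
}

dms_to_dd(31, 44)        # same value as pz_d(31) + pz_m(44)
#> [1] 31.73333
dms_to_dd(31, 44, 59)    # same value as pz_d(31) + pz_m(44) + pz_s(59)
#> [1] 31.74972
dms_to_dd(-121, 1, 33)   # minutes and seconds are added as positive amounts
#> [1] -120.9742
```

Note that with a negative degree value the minutes and seconds move the result toward zero, just as in the pz_d(-121) example above.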

Use cases

Check out the parzer use cases vignette on the docs site. Get in touch
if you have a use case that might be good to add to that vignette.

Thanks

Thanks to the reviewers Maria Munafó and
Julien Brun for their time invested in
improving the package.

To Do

There’s more to do. We are thinking about dropping the Rcpp dependency,
supporting parsing of strings that have both latitude and longitude
together, making error messages better, and more.
