2017-01 Authored Package Updates

January 8, 2017
By

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

The rest of the month is going to be super-hectic and it’s unlikely I’ll be able to do any more to help the push to CRAN 10K, so here’s a breakdown of CRAN and GitHub new packages & package updates that I felt were worth raising awareness on:

epidata

I mentioned this one last week but it wasn’t really a package announcement post. epidata is now on CRAN and is a package to pull data from the Economic Policy Institute (U.S. gov economic data, mostly). Their “hidden” API is well thought out and the data has been nicely curated (and seems to update monthly). It makes it super easy to do things like the following:

library(epidata)
library(tidyverse)
library(stringi)
library(hrbrmisc) # devtools::install_github("hrbrmstr/hrbrmisc")

us_unemp <- get_unemployment("e")

glimpse(us_unemp)
## Observations: 456
## Variables: 7
## $ date            <date> 1978-12-01, 1979-01-01, 1979-02-01, 1979-03-0...
## $ all             <dbl> 0.061, 0.061, 0.060, 0.060, 0.059, 0.059, 0.05...
## $ less_than_hs    <dbl> 0.100, 0.100, 0.099, 0.099, 0.099, 0.099, 0.09...
## $ high_school     <dbl> 0.055, 0.055, 0.054, 0.054, 0.054, 0.053, 0.05...
## $ some_college    <dbl> 0.050, 0.050, 0.050, 0.049, 0.049, 0.049, 0.04...
## $ college         <dbl> 0.032, 0.031, 0.031, 0.030, 0.030, 0.029, 0.03...
## $ advanced_degree <dbl> 0.021, 0.020, 0.020, 0.020, 0.020, 0.020, 0.02...

us_unemp %>%
  gather(level, rate, -date) %>%
  mutate(level=stri_replace_all_fixed(level, "_", " ") %>%
           stri_trans_totitle() %>%
           stri_replace_all_regex(c("Hs$"), c("High School")),
         level=factor(level, levels=unique(level))) -> unemp_by_edu

col <- ggthemes::tableau_color_pal()(10)

ggplot(unemp_by_edu, aes(date, rate, group=level)) +
  geom_line(color=col[1]) +
  scale_y_continuous(labels=scales::percent, limits =c(0, 0.2)) +
  facet_wrap(~level, scales="free") +
  labs(x=NULL, y="Unemployment rate",
       title=sprintf("U.S. Monthly Unemployment Rate by Education Level (%s)", paste0(range(format(us_unemp$date, "%Y")), collapse=":")),
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_hrbrmstr(grid="XY")

us_unemp %>%
  select(date, high_school, college) %>%
  mutate(date_num=as.numeric(date)) %>%
  ggplot(aes(x=high_school, xend=college, y=date_num, yend=date_num)) +
  geom_segment(size=0.125, color=col[1]) +
  scale_x_continuous(expand=c(0,0), label=scales::percent, breaks=seq(0, 0.12, 0.02), limits=c(0, 0.125)) +
  scale_y_reverse(expand=c(0,100), label=function(x) format(as_date(x), "%Y")) +
  labs(x="Unemployment rate", y="Year ↓",
       title=sprintf("U.S. monthly unemployment rate gap (%s)", paste0(range(format(us_unemp$date, "%Y")), collapse=":")),
       subtitle="Segment width shows the gap between those with a high school\ndegree and those with a college degree",
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_hrbrmstr(grid="X") +
  theme(panel.ontop=FALSE) +
  theme(panel.grid.major.x=element_line(size=0.2, color="#2b2b2b25")) +
  theme(axis.title.x=element_text(family="Arial", face="bold")) +
  theme(axis.title.y=element_text(family="Arial", face="bold", angle=0, hjust=1, margin=margin(r=-14)))

(right edge is high school, left edge is college…I’ll annotate it better next time)

censys

Censys is a search engine by one of the cybersecurity research partners we publish data to at work (free for use by all). The API is moderately decent (it’s mostly a thin shim authentication layer to pass on Google BigQuery query strings to the back-end) and the R package to interface to it censys is now on CRAN.

waffle

The seminal square pie chart package waffle has been updated on CRAN to work better with recent ggplot2 2.x changes and has some additional parameters you may want to check out.

cdcfluview

The viral package cdcfluview has had some updates on the GitHub version to add saner behaviour when specifying dates and had to be updated as the CDC hidden API switched to all https URLs (major push in .gov-land to do that to get better scores on their cyber report cards). I’ll be adding some features before the next CRAN push to enable retrieval of additional mortality data.

sergeant

If you work with Apache Drill (if you don’t, you should), the sergeant package (GitHub) will help you whip it into shape. I’ve mentioned it before on the blog but it has a nigh-complete dplyr interface now that works pretty well. It also has a direct REST API interface and RJDBC interface plus many helper utilities that help you avoid typing SQL strings to get cluster status info. Once I add the ability to create parquet files with it I’ll push it up to CRAN.

The one thing I’d like to do with this package is support any user-defined functions (UDFs in Drill-speak) folks have written. So, if you have a UDF you’ve written or use and you want it wrapped in the package, just drop an issue and I’ll layer it in. I’ll be releasing some open source cybersecurity-related UDFs via the work github in a few weeks.

zkcmd

Drill (in non-standalone mode) relies on Apache Zookeeper to keep everything in sync and it’s sometimes necessary to peek at what’s happening inside the zookeeper cluster, so sergeant has a sister package zkcmd that provides an R interface to zookeeper instances.

ggalt

Some helpful folks tweaked ggalt for better ggplot2 2.x compatibility (#ty!) and I added a new geom_cartogram() (before you ask if it makes warped shapefiles: it doesn’t) that restores the old (and what I believe to be the correct/sane/proper) behaviour of geom_map(). I need to get this on CRAN soon as it has both fixes and many new geoms folks will want to play with in a non-GitHub context.

FIN

There have been some awesome packages released by others in the past month+ and you should add R Weekly to your RSS feeds if you aren’t following it already (there are other things you should have there for R updates as well, but that’s for another blog). I’m definitely looking forward to new packages, visualizations, services and utilities that will be coming this year to the R community.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)