Update of overviewR with new functions!

[This article was first published on R-post | Cosima Meyer, and kindly contributed to R-bloggers.]

We have updated (and extended) overviewR with three new functions:

  • overview_plot
  • overview_heat
  • overview_na

A detailed overview of all functions is also available in the overviewR CheatSheet.

overview_plot

overview_plot visualizes the information generated by overview_table as a ggplot graphic. All scope objects (e.g., countries) are listed on the y-axis, and horizontal lines indicate their coverage across the entire time frame of the data (x-axis). This helps to spot gaps in the data for specific scope objects and shows at what points in time they occur.

library(overviewR)
data(toydata)
# One horizontal line per country (ccode) across the observed years
overview_plot(dat = toydata, id = ccode, time = year)
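To make the idea of "coverage with gaps" concrete, here is a minimal base R sketch that collapses the observed years of each scope object into consecutive runs, which is roughly the information the horizontal lines convey. It is not how overviewR works internally; the data frame `panel` and the helper `collapse_years` are hypothetical names used only for illustration.

```r
# Hypothetical mini panel: country "B" has no observations for 1992 and 1993
panel <- data.frame(
  ccode = c("A", "A", "A", "A", "B", "B", "B"),
  year  = c(1990, 1991, 1992, 1993, 1990, 1991, 1994)
)

# Collapse a vector of years into "start-end" runs of consecutive years
collapse_years <- function(years) {
  years <- sort(unique(years))
  # A new run starts wherever the distance to the previous year exceeds 1
  run_id <- cumsum(c(TRUE, diff(years) > 1))
  runs <- vapply(split(years, run_id), function(y) {
    if (length(y) == 1) as.character(y) else paste0(min(y), "-", max(y))
  }, character(1))
  paste(runs, collapse = ", ")
}

# One coverage string per country; a comma marks a gap in the time series
coverage <- vapply(split(panel$year, panel$ccode), collapse_years, character(1))
print(coverage)
```

In this sketch, a country with an uninterrupted series yields a single run, while a country with a gap yields several runs separated by commas, which corresponds to a broken line in the plot.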

overview_heat

overview_heat takes a closer look at the time and scope conditions by visualizing the data coverage for each time and scope combination in a ggplot heat map. The function is best explained with an example. Suppose you have a dataset with monthly data for different countries and want to know whether data is available for each country in every month. overview_heat shows this by plotting a heat map in which each cell indicates the coverage for that specific combination of time and scope (e.g., country-year). The darker the cell, the higher its coverage. The plot also labels each cell with its relative or absolute coverage. For instance, a coverage of 75% for Angola (“AGO”) in 1991 means that only 9 of the 12 possible months in that year are covered.

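# Randomly drop 64 rows to create gaps in the coverage
toydata_red <- toydata[-sample(seq_len(nrow(toydata)), 64), ]

# exp_total = 12: twelve monthly observations are expected per country-year
overview_heat(toydata_red,
              ccode,
              year,
              perc = TRUE,
              exp_total = 12)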
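The percentage in each cell is simply the number of observed rows divided by the expected total. The following base R sketch reproduces that arithmetic for the 9-out-of-12 example; it is an illustration under stated assumptions rather than overviewR's actual implementation. The data frame `monthly` and the columns `n_obs` and `coverage_pct` are hypothetical, and `exp_total` merely mirrors the argument name used above.

```r
# Hypothetical monthly panel: 12 months for "AGO" in 1990, only 9 in 1991
monthly <- data.frame(
  ccode = "AGO",
  year  = c(rep(1990, 12), rep(1991, 9)),
  month = c(month.abb, month.abb[1:9])
)

exp_total <- 12  # expected number of observations per country-year

# Count observed rows per country-year, then express them as a share of exp_total
counts <- aggregate(month ~ ccode + year, data = monthly, FUN = length)
names(counts)[names(counts) == "month"] <- "n_obs"
counts$coverage_pct <- 100 * counts$n_obs / exp_total

print(counts)
```

With `perc = FALSE`, the analogous quantity would be the raw count (`n_obs` in this sketch) instead of the percentage.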

overview_na

overview_na is a simple function that provides information about the content of all variables in your data, not only the time and scope conditions. It returns a horizontal ggplot bar plot that shows the amount of missing data (NAs) for each variable (on the y-axis). You can choose whether to display the share of NAs for each variable as a percentage (the default) or the total number of NAs.

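library(dplyr)  # provides the %>% pipe and mutate()
# Introduce NAs into three variables for illustration
toydata_with_na <- toydata %>%
  dplyr::mutate(year = ifelse(year < 1992, NA, year),
                month = ifelse(month %in% c("Jan", "Jun", "Aug"), NA, month),
                gdp = ifelse(gdp < 20000, NA, gdp))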

# Share of NAs per variable in percent (the default)
overview_na(toydata_with_na)

# Total number of NAs per variable
overview_na(toydata_with_na, perc = FALSE)
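For a quick numeric check of what the bars represent, the same two quantities can be computed in base R. This is a minimal sketch on made-up data, not the package's own code; the data frame `df` and the vectors `na_count` and `na_pct` are hypothetical names.

```r
# Hypothetical data with some missing values
df <- data.frame(
  ccode = c("A", "A", "B", "B"),
  year  = c(1990, NA, 1990, 1991),
  gdp   = c(NA, NA, 15000, NA)
)

na_count <- colSums(is.na(df))         # absolute number of NAs per variable
na_pct   <- 100 * colMeans(is.na(df))  # share of NAs per variable, in percent

print(data.frame(variable = names(df), na_count, na_pct, row.names = NULL))
```

The percentage view makes variables comparable regardless of the number of rows, while the count view is handy when you need to know how many observations are actually affected.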
