Metadata : from PostgreSQL comments to R labels

April 18, 2019
By

(This article was first published on r.iresmi.net, and kindly contributed to R-bloggers)

Metadata are an essential part of a robust data science workflow ; they record the meaning of each variable : its units, quality, allowed range, how we collect it, when it’s been recorded etc. Data without metadata are practically worthless. Here we will show how to transfer the metadata from PostgreSQL to R.

In PostgreSQL metadata can be stored in comments with the statements COMMENT ON COLUMN... or COMMENT ON TABLE.... So I hope your tables have this nice comments, and you can see them in PgAdmin for example. But what about R ?

In R metadata can be assigned as attributes of any object and mainly as « labels ». We will use the Hmisc package which provides functions to manage labels. Another interesting package is sjlabelled. You may also have seen labels when importing labelled data from SPSS for example.

library(Hmisc)
library(tidyverse)
library(RPostgreSQL)
library(glue)

cnx <- dbConnect(dbDriver("PostgreSQL"),
                 user = "***",
                 password = "***",
                 host = "***",
                 dbname = "***",
                 port = 5432
)

Usually we got some data this way :

cog <- dbGetQuery(cnx,
"SELECT
  *
FROM ref_cog.france2018
LIMIT 10")

We can create a function that will query the metadata of the table in information_schema.columns and add it to the data frame ; the function expects a data frame, the name of the schema.table from which we get the comments and a connection handler. It will return the data frame with labels :

#' Add attributes to a dataframe from metadata read in the PostgreSQL database
#'
#' @param df dataframe
#' @param schema_table "schema.table" from which to read the comments
#' @param cnx a database connexion from RPostgreSQL::dbConnect()
#'
#' @return a dataframe with attributes
#'
#' @examples \dontrun{add_metadata(iris, "public.iris", cnx)}
add_metadata <- function(df, schema_table, cnx) {
  
  meta <- str_match(schema_table, "^(.*)\\.(.*)$") %>% 
    glue_sql(
      "SELECT 
        column_name,    
        pg_catalog.col_description(
          format('%s.%s', isc.table_schema, isc.table_name)::regclass::oid,
                 isc.ordinal_position) AS column_description
      FROM information_schema.columns AS isc
      WHERE isc.table_schema = {s[2]}
        AND isc.table_name = {s[3]};",
      s = .,
      .con = cnx) %>% 
    dbGetQuery(cnx, .)
  
  label(df, self = FALSE) <- colnames(df) %>% 
    enframe() %>% 
    left_join(meta, by = c("value" = "column_name")) %>% 
    pull(column_description)
  
  df
}

Now we would do :

cog <- dbGetQuery(cnx,
  "SELECT
    *
  FROM ref_cog.france2018
  LIMIT 10") %>% 
  add_metadata("ref_cog.france2018", cnx)

You can see the metadata in the column headings of the RStudio viewer with View(cog) or with contents(cog). Or with :

cog %>% 
  label() %>%
  enframe()

Or lastly, for one column :

label(cog$cheflieu)

We can also search for information in the variable names or in the labels with another function :

search_var <- function(df, keyword) {
  df %>% 
    label() %>%
    enframe() %>% 
    rename(variable = name,
           metadata = value) %>% 
    filter_all(any_vars(str_detect(., regex(keyword, ignore_case = TRUE))))
}

search_var(cog, "canton")

To leave a comment for the author, please follow the link and comment on their blog: r.iresmi.net.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)