Take Care If Trying the RPostgres Package

March 16, 2018
By

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Take care if trying the new RPostgres database connection package. By default it returns some non-standard types that code developed against other database drivers may not expect, and may not be ready to defend against.


Danger

Danger, Will Robinson!

Trying the new package

One can try the newer RPostgres as a drop-in replacement for the usual RPostgreSQL.

That starts out okay. We can connect to the database and and pull a summary about remote data to R.

db <- DBI::dbConnect(
  RPostgres::Postgres(),
  host = 'localhost',
  port = 5432,
  user = 'johnmount',
  password = '')
## Warning: multiple methods tables found for 'dbQuoteLiteral'
d <- DBI::dbGetQuery(
  db, 
  "SELECT COUNT(1) FROM pg_catalog.pg_tables")
print(d)
##   count
## 1   177
ntables <- d$count[[1]]
print(ntables)
## integer64
## [1] 177

The result at first looks okay.

class(ntables)
## [1] "integer64"
typeof(ntables)
## [1] "double"
ntables + 1L
## integer64
## [1] 178
ntables + 1
## integer64
## [1] 178
is.numeric(ntables)
## [1] TRUE

But it is only okay, until it is not.

pmax(1L, ntables)
## [1] 8.744962e-322
pmin(1L, ntables)
## [1] 1
ifelse(TRUE, ntables, ntables)
## [1] 8.744962e-322
for(ni in ntables) {
  print(ni)
}
## [1] 8.744962e-322
unclass(ntables)
## [1] 8.744962e-322

If your code, or any package code you are using, perform any of the above calculations, your results will be corrupt and wrong. It is quite likely any code written before December 2017 (RPostgres‘s first CRAN distribution) would not have been written with the RPostgres "integer64 for all of my friends" design decision in mind.

Also note, RPostgres does not currently appear to write integer64 back to the database.

DBI::dbWriteTable(db, "d", d, 
                  temporary = TRUE, 
                  overwrite = TRUE)
DBI::dbGetQuery(db, "
  SELECT 
     column_name, 
     data_type, 
     numeric_precision, 
     numeric_precision_radix,
     udt_name
  FROM 
     information_schema.columns 
  WHERE 
     table_name = 'd'
  ")
##   column_name data_type numeric_precision numeric_precision_radix udt_name
## 1       count      real                24                       2   float4
DBI::dbDisconnect(db)

The work-around

The work-around is: add the argument bigint = "numeric" to your dbConnect() call. This is mentioned in the manual, but not the default and not called out in the package description or README. Or, of course, you could use RPostgreSQL.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)