How to write an R Data Frame to an SQL Table

[This article was first published on Michael Toth's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Frequently I find I need to perform an analysis that requires querying some values in a database table based on a series of IDs that might vary depending on some input. As an example, assume we have the following:

  • A table that contains historical stock prices for 2000 stocks for the last 30 years
  • Some input that contain’s a user’s portfolio of stock tickers

Often, we’ll want to pull the price history over a certain date range for all stocks in the user’s portfolio. We could of course query all values in the stock prices table and then subset, but this is incredibly inefficient and also means we can’t make use of any SQL aggregation functions in our query. Something I’ve done before when working in an SQL IDE is to create a temp table where I insert a list of the IDs that I am trying to look up, and then join on that table for my query. This is an ideal solution when we’re talking about looking up more than a few securities. It took me a while to find an easy way to do this directly in R, but it turns out to be quite simple. I’m hoping my solution helps anybody else who might have this same issue.

Assumptions:

  • A table called stock_prices that contains stock price history
  • A data frame called tickers that contains a list of stock tickers (column name = ticker)
  • Here I am using PostgreSQL, but this should work essentially the same for any SQL variant

Code

Start by setting up the connection:

library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, user='user', password='password', dbname='my_database', host='host')


Next create the temp table and insert values from our data frame. The key here is the dbWriteTable function which allows us to write an R data frame directly to a database table. The data frame's column names will be used as the database table's fields

# Drop table if it already exists
if (dbExistsTable(con, "temp_tickers"))
    dbRemoveTable(con, "temp_tickers")

# Write the data frame to the database
dbWriteTable(con, name = "temp_tickers", value = tickers, row.names = FALSE)


Finally, join the stock prices table to the table we just created and query the subsetted values

sql <- " 
    select sp.ticker, sp.date, sp.price
    from stock_prices sp
    join temp_tickers tt on sp.ticker = tt.ticker
    where date between '2000-01-01' and '2015-07-08'
"

results <- dbGetQuery(con, sql)

# Free up resources
dbDisconnect(con)
dbUnloadDriver(drv)


And that's it. It turned out to not be very complicated, and many may already know this, but it took me a while to figure out how this should be done. I spent a lot of time messing around with INSERT statements before scrapping that idea and coming up with this solution. Let me know if you find this helpful or if you have any ideas on how to do this better!

To leave a comment for the author, please follow the link and comment on their blog: Michael Toth's Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)