UPDATE Multiple postgreSQL Table Records in Parellel

February 27, 2013
By

(This article was first published on NERD PROJECT » R project posts, and kindly contributed to R-bloggers)

Unfortunately the RpostgreSQL package (I’m pretty sure other SQL DBs as well) doesn’t have a provision to UPDATE multiple records (say a whole data.frame) at once or allow placeholders making the UPDATE a one row at a time ordeal, so I built a work around hack to do the job in parellel.  The big problem was that you have to open and close the connections with every iteration or you will exceed max connections since it goes through every row.

First the function for connecting, updating, and closing the DB:

update <- function(i) {
    drv <- dbDriver("PostgreSQL")
    con <- dbConnect(drv, dbname="db_name", host="localhost", port="5432", user="chris", password="password")
    txt <- paste("UPDATE data SET column_one=",data$column_one[i],",column_two=",data$column_two[i]," where id=",data$id[i])
    dbGetQuery(con, txt)
    dbDisconnect(con)
}

Then run the query:

library("foreach")
library("doMC")

registerDoMC()

foreach(i = 1:length(data$column_one), .inorder=FALSE,.packages="RPostgreSQL")%dopar%{
    update(i)
}

QED


To leave a comment for the author, please follow the link and comment on his blog: NERD PROJECT » R project posts.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.