Big data (useR! 2011)

Posted on August 18, 2011 by csgillespie in R bloggers

[This article was first published on Why? » R, and kindly contributed to R-bloggers.]

Unfortunately, I missed the first and last talks.

My notes from a session on Thursday morning

J. Demmler – Challenges of working with a large database of routinely collected health data

The SAIL data bank holds over 1.9 billion (anonymous) entries. To use the data for research, they need to ensure that proper data security is observed, for example secure data transport. All analysis is done within a secure environment. Files are moved into the environment via an FTP client.

Why R? No advanced SQL options are available in the DB2 database, so using R allows constructs such as loops. R is also great for data pre-cleaning and is suitable for the heavy analysis. To connect to the SAIL database, they use the RODBC package. SQL queries are run from within R; however, the SQL scripts are kept in separate files since they are “reviewed”.
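
As a rough illustration of that workflow (my own sketch, not code from the talk): the query lives in its own .sql file so it can be reviewed separately, and R reads the file and sends it through RODBC. The helper names read_sql_file and run_sql_file, the DSN "my_dsn" and the file name are all hypothetical placeholders; only odbcConnect, sqlQuery and odbcClose are real RODBC functions.

```r
# Read a (separately reviewed) SQL script and return it as a single query string.
read_sql_file <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# Run the query stored in `path` against the ODBC data source `dsn`.
# `dsn` is a placeholder; the real data source name is site specific.
run_sql_file <- function(path, dsn) {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("the RODBC package is required")
  }
  channel <- RODBC::odbcConnect(dsn)
  on.exit(RODBC::odbcClose(channel))  # always release the connection
  RODBC::sqlQuery(channel, read_sql_file(path))
}

# Hypothetical usage (needs a configured ODBC data source):
# results <- run_sql_file("reviewed_query.sql", dsn = "my_dsn")
```

Keeping the SQL text out of the R code means the reviewed file is exactly what gets executed, while loops and post-processing stay on the R side.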

Lots of errors in data, e.g. units.

John Bryant – Demographic: classes and methods for data about populations

Existing data structures for population type data:

array: messy code;
data frames: not that natural for this type of code;
demography package: not really extensible.

Target audience for this new package: applied statisticians and social scientists, not programmers. Core to this package is the Demographic class: an S4 object that is a specialized array with associated metadata.
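
To make the idea of "an S4 array with metadata" concrete, here is a minimal sketch. It is not the package's actual class: the class name PopCounts, the dimtypes slot and the accessor are names I made up for illustration, and the real Demographic class may look quite different.

```r
library(methods)

# A hypothetical S4 class: an ordinary array plus one piece of metadata
# recording what each dimension represents (e.g. "age", "sex", "time").
setClass(
  "PopCounts",
  contains = "array",
  representation(dimtypes = "character")
)

# Validity check: there must be exactly one dimtype per array dimension.
setValidity("PopCounts", function(object) {
  if (length(object@dimtypes) != length(dim(object))) {
    "dimtypes needs one entry per array dimension"
  } else {
    TRUE
  }
})

# Simple accessor for the metadata.
setGeneric("dimtypes", function(x) standardGeneric("dimtypes"))
setMethod("dimtypes", "PopCounts", function(x) x@dimtypes)

# Made-up counts, purely to show construction.
counts <- array(
  1:8,
  dim = c(2, 2, 2),
  dimnames = list(age = c("0-4", "5-9"), sex = c("F", "M"),
                  year = c("2010", "2011"))
)
pop <- new("PopCounts", counts, dimtypes = c("age", "sex", "time"))
```

Because the class extends array, existing array code still works on pop, while the validity check catches metadata that does not match the data.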

