Monthly Archives: July 2014

Creating Reproducible Software Environments with Packrat

July 15, 2014

Open science has grown tremendously in the past few years. While there’s still a long way to go, the availability of data, software, and other materials is making it possible to re-use these products to expand upon previous work and apply them to new areas. Through responsible conduct of research (RCR) …

Preparing Big Data for Analysis in R

July 15, 2014

by Yaniv Mor, Co-founder & CEO of Xplenty

How do you get Big Data ready for R? Gigabytes or terabytes of raw data may need to be combined, cleaned, and aggregated before they can be analyzed. Processing such large amounts of data used to require installing Hadoop on a cluster of servers, not to mention coding MapReduce jobs in...

Presentations and video of the 5th meeting

July 15, 2014

The 5th MilanoR meeting was a great success. The speakers' presentations are available at the links below. Please leave a comment!

Welcome Presentation by Nicola Sturaro, consultant at Quantide
Singular Spectrum Analysis With Rssa by Maurizio Sanarico, Chief Data Scientist at SDG consulting …

Average dissertation and thesis length, take two

July 15, 2014

About a year ago I wrote a post describing the average length of dissertations at the University of Minnesota. I've been meaning to expand that post by adding data from master's theses, since the methods for gathering and parsing the records are transferable. This post provides some graphics and links to R code for evaluating dissertation (doctorate) and

Consistent naming conventions in R

July 15, 2014

Naming conventions in R are famously anarchic, with no clear winner and multiple conventions in use simultaneously in the same package. This has been written about before: in a lucid article in the R Journal, in a detailed exploration of names in R source code hosted on CRAN, and in general discussion on Stack Overflow. Basically, there are 5 naming...

Simple user interface in R to get login details

July 15, 2014

Occasionally I have to connect to services from R that ask for login details, such as databases. I don't like to store my login details in the R source code file; instead, I would prefer to enter my login details when I execute the code. Fortunately,...

Parallel Distance Matrix Calculation with RcppParallel

July 14, 2014

The RcppParallel package includes high-level functions for doing parallel programming with Rcpp. For example, the parallelFor function can be used to convert the work of a standard serial “for” loop into a parallel one. This article describes using RcppParallel to compute pairwise distances for each row in an input data matrix and return an n x n lower-triangular distance matrix which...

Computing an Inner Product with RcppParallel

July 14, 2014

The RcppParallel package includes high-level functions for doing parallel programming with Rcpp. For example, the parallelReduce function can be used to aggregate values from a set of inputs in parallel. This article describes using RcppParallel to parallelize the inner-product example previously posted to the Rcpp Gallery.

Serial Version

First the serial version of computing the inner product. For this we use a...

implementing reproducible research [short book review]

July 14, 2014

As promised, I got back to this book, Implementing reproducible research (after the pigeons had their say). I looked at it this morning while monitoring my students taking their last-chance R exam (definitely last chance as my undergraduate R course is not reconducted next year). The book is in fact an edited collection of papers

Implementing mclapply() on Windows: a primer on embarrassingly parallel computation on multicore systems with R

July 14, 2014

An easy way to run R code in parallel on a multicore system is with the mclapply() function. Unfortunately, mclapply() does not work on Windows machines because its implementation relies on forking, which Windows does not support. For me, this is something of a headache because I am used to using mclapply(), and
