Bias in Observational Studies – Sensitivity Analysis with R package episensr

April 18, 2015
When it’s time to interpret the results of your observational study, you have to judge whether the effect measure you obtained is the truth, whether it’s due to bias (systematic error, which affects the effect measure’s validity), or whether it’s due to chance (random error, which affects the effect measure’s precision) (Rothman and Greenland, 2008, pp. 115-134). Every study …

Parallel R with BatchJobs

March 28, 2015
Parallelizing R with BatchJobs – An example using k-means. By Gord Sissons and Feng Li. Many simulations in R are long-running. Analysis of statistical algorithms can generate workloads that run for hours, if not days, tying up a single computer. Given the amount of time R programmers can spend waiting for results, getting acquainted with parallelism makes ...

Growing some Trees

March 18, 2015
Consider here the dataset used in a previous post, about visualising a classification (with more than 2 features),
> MYOCARDE=read.table(
+ "http://freakonometrics.free.fr/saporta.csv",
+ header=TRUE,sep=";")
The default classification tree is
> library(rpart)       # rpart(), rpart.control()
> library(rpart.plot)  # rpart.plot()
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> rpart.plot(arbre,type=4,extra=6)
We can change the options here, such as the minimum number of observations per node
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+ control=rpart.control(minsplit=10))
> rpart.plot(arbre,type=4,extra=6)
or...

How to Make a Histogram with ggplot2

March 12, 2015
In our previous post you learned how to make histograms with the hist() function. You can also make a histogram with ggplot2, “a plotting system for R, based on the grammar of graphics”. This post will focus on making a histogram with ggplot2. Step One. Check That The post
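As a rough illustration of the kind of plot this post is about, here is a minimal sketch. It assumes the ggplot2 package is installed and uses R's built-in faithful dataset; the dataset, the bin width, and the colours are illustrative choices and are not taken from the original post.

```r
library(ggplot2)

# Built-in dataset: eruption durations (in minutes) of the Old Faithful geyser.
# The binwidth of 0.25 is an arbitrary choice for illustration.
p <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(binwidth = 0.25, fill = "grey70", colour = "black") +
  labs(title = "Old Faithful eruption durations",
       x = "Eruption duration (minutes)",
       y = "Count")

# Draw the plot on the active graphics device.
print(p)
```

The base-graphics counterpart mentioned in the excerpt would be hist(faithful$eruptions); with ggplot2 the bin width is set explicitly through the binwidth argument of geom_histogram().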

Visualising a Classification in High Dimension

March 6, 2015
So far, when discussing classification, we’ve been playing on my toy-dataset (actually, I should not claim it’s mine, it is inspired by the one used in the introduction of Boosting, by Robert Schapire and Yoav Freund). But in real life, there are more observations, and more explanatory variables. With more than two explanatory variables, it starts to be more complicated...

Getting a statistics education: Review of the MSc in Statistics (Sheffield)

February 14, 2015
Some background: I started using statistics for my research sometime in 1999 or 2000. I was a student at Ohio State, Linguistics, and I had just gotten interested in psycholinguistics. I knew almost nothing ...

R package “fishdynr”

February 1, 2015
The fishdynr package allows for the construction of some basic population dynamics models commonly used in fisheries science. Included are models of a single cohort, cohortSim, and a more complex iterative model that incorporates a stock-recruitment re...

Top 77 R posts for 2014 (+R jobs)

January 7, 2015
R-bloggers.com is 5 years old this month! In celebration, this post shares links to the top 77 most-read R posts of 2014 (+ stats on R-bloggers, + top R jobs for the beginning of 2015)

Canonical Correlation Analysis on Imaging

January 5, 2015
In imaging, we deal with multivariate data, such as arrays with several spectral bands, and interpreting the correlations across their dimensions is very challenging, if not impossible. For example, let's recall the number of s...

The average Stripe employee! Congrats to Alyssa!

January 2, 2015
Recently, my colleague and fellow blogger Alyssa Frazee accepted a job at Stripe. All of us at JHU Biostat are happy for her, yet sad to see her go. While perusing Stripe’s website, I found the About page, where each employee has a photo of themselves. I’ve been playing around with some PCA and decompositions,