Monthly Archives: April 2013

Travis CI for R?

April 7, 2013

I'm always worried about CRAN: a system maintained through FTP and emails from real humans (basically one of Uwe, Kurt or Prof Ripley). I'm worried for two reasons: the number of R packages is growing exponentially; time and time again I see frustrations ...

Read more »

Guide to accessing MS SQL Server and MySQL server on Mac OS X

April 6, 2013

Native GUI client access to MS-SQL and MySQL: we can use Oracle SQL Developer with the jTDS driver to access Microsoft SQL Server. Note: jTDS version 1.3.0 did not work for me; I had to use version 1.2.6. Detailed instructions can be found here. We can use MySQL Workbench to access MySQL server. Setup is...

Read more »

Mortality after paediatric heart surgery using public domain data

April 6, 2013

This post comes with some big health warnings. The recent events in Leeds highlight the difficulties faced in judging the results of surgery by individual hospital. A clear requirement is timely access to data in a form easily digestible by the public. Here I’ve scraped the publicly available data from the central cardiac audit database

Read more »

Retirement : simulating wealth with random returns, inflation and withdrawals – Shiny web application

April 6, 2013

Today, I want to share the Retirement : simulating wealth with random returns, inflation and withdrawals – Shiny web application (code at GitHub). This application was developed and contributed by Pierre Chretien; I only made minor updates. This application is a great example of how easy it is to convert your R script into

Read more »
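
The simulation idea behind the app can be sketched in a few lines. This is a minimal Python sketch, not the app's R/Shiny code: it assumes normally distributed annual returns and inflation, a starting withdrawal that grows with inflation each year, and wealth that stays at zero once it is exhausted. All function and parameter names are hypothetical.

```python
import random
import statistics


def simulate_wealth(start_capital, annual_withdrawal, years,
                    mean_return, sd_return, mean_inflation, sd_inflation,
                    n_sims, seed=None):
    """Simulate n_sims wealth paths; each path lists end-of-year wealth."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_sims):
        wealth = float(start_capital)
        withdrawal = float(annual_withdrawal)
        path = []
        for _ in range(years):
            # Assumption: this year's return and inflation are normal draws
            r = rng.gauss(mean_return, sd_return)
            inflation = rng.gauss(mean_inflation, sd_inflation)
            # Apply the return, then take out this year's withdrawal;
            # once the money runs out, wealth stays at zero
            wealth = max(wealth * (1.0 + r) - withdrawal, 0.0)
            # Next year's withdrawal rises with this year's inflation
            withdrawal *= 1.0 + inflation
            path.append(wealth)
        paths.append(path)
    return paths


def summarize(paths):
    """Return (share of paths that ran out of money, median final wealth)."""
    finals = [path[-1] for path in paths]
    ruin_share = sum(1 for w in finals if w == 0.0) / len(finals)
    return ruin_share, statistics.median(finals)


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000 start, 40,000/yr withdrawals, 30 years
    paths = simulate_wealth(1_000_000, 40_000, 30,
                            mean_return=0.05, sd_return=0.12,
                            mean_inflation=0.025, sd_inflation=0.015,
                            n_sims=2_000, seed=1)
    ruin, median_final = summarize(paths)
    print(f"Ran out of money: {ruin:.1%}; median final wealth: {median_final:,.0f}")
```

Reporting the share of paths that run out of money, rather than a single deterministic projection, is the main reason to simulate with random returns; the normal distributions are only a convenient assumption.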

Worry about correctness and repeatability, not p-values

April 5, 2013

In data science work you often run into cryptic sentences like the following: Age adjusted death rates per 10,000 person years across incremental thirds of muscular strength were 38.9, 25.9, and 26.6 for all causes; 12.1, 7.6, and 6.6 for cardiovascular disease; and 6.1, 4.9, and 4.2 for cancer (all P < 0.01 for linear …

Read more »

Reconstructing Principal Component Analysis Matrix

April 5, 2013

PCA is a widely used method for finding patterns in high-dimensional data. Whether you use it to compress a large matrix or to remove one of the principal components in biological datasets, you’ll end up with the task of performing a series of …

Read more »
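
Reconstructing a data matrix from a subset of its principal components can be sketched with NumPy. This is a minimal sketch that assumes PCA on column-centred (not scaled) data computed via a thin SVD; the original post works in R, and the helper names here are hypothetical.

```python
import numpy as np


def pca_reconstruct(X, k):
    """Approximate X (n_samples x n_features) using its first k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on column-centred data
    # Thin SVD: rows of Vt are the principal axes, s holds the singular values
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]        # coordinates on the first k axes
    return scores @ Vt[:k] + mean    # map back and restore the column means


def pca_remove_component(X, j):
    """Rebuild X from every principal component except component j (0-based)."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    keep = [i for i in range(len(s)) if i != j]
    return (U[:, keep] * s[keep]) @ Vt[keep] + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 6))
    for k in (1, 3, 6):
        err = np.linalg.norm(X - pca_reconstruct(X, k))
        print(f"k={k}: reconstruction error {err:.3f}")
```

Keeping all components reproduces X up to rounding error, while `pca_remove_component` covers the case of removing a single component before mapping the data back.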

Organise your data

April 5, 2013

Use R to specify factors, recode variables and begin by-group analyses. This file (lap_hernia) contains data on pain scores after laparoscopic vs. open hernia repair; age, gender and primary/recurrent hernia are also included. The ultimate aim is to work out which of these factors are associated with more pain after this operation.

Read more »

Properly “internationalized” regular expressions in R

April 5, 2013

We should pay special attention to writing truly portable code that works the same way under different locales and character encodings. Currently, R has two regex engines: ERE (via TRE) and PRE (via PCRE). Surprisingly, they…

Read more »
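
The post concerns R's TRE and PCRE engines, which this does not reproduce. As a loose analogy using Python's `re` module instead (a swapped-in example, not R), the same pattern can match or fail on the same word depending on a flag, which is one reason to spell out the intended character set rather than rely on defaults.

```python
import re

word = "żółw"  # every character is a letter; three of the four are non-ASCII

# For str patterns, Python's \w is Unicode-aware by default
unicode_match = re.fullmatch(r"\w+", word)

# The same pattern with re.ASCII treats only [a-zA-Z0-9_] as word characters
ascii_match = re.fullmatch(r"\w+", word, flags=re.ASCII)

# An explicit ASCII range never covers these accented letters, whatever the flags
range_match = re.fullmatch(r"[a-z]+", word)

print(bool(unicode_match), bool(ascii_match), bool(range_match))  # prints: True False False
```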

Security in R: RAppArmor package & paper updates

April 5, 2013

This week version 0.8.3 of RAppArmor appeared on CRAN. RAppArmor is a package to dynamically enforce security policies and hardware restrictions in R on Linux systems. It currently supports Ubuntu 12.04+, Debian 7 and openSUSE 12.1+. The README page has more info and helpful video tutorials to get you started. One important change in the ...

Read more »

Multiple pairwise comparisons for categorical predictors

April 5, 2013

Dale Barr (@datacmdr) recently had a nice blog post about coding categorical predictors, which reminded me to share my thoughts about multiple pairwise comparisons for categorical predictors in growth curve analysis. As Dale pointed out in his post, the R default is to treat the reference level of a factor as a...

Read more »
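
A simplified sketch of how treatment coding relates to pairwise comparisons, written in Python rather than R. It assumes a single categorical predictor fitted by ordinary least squares, where R's default treatment coding gives the reference level's mean as the intercept and each other coefficient as a difference from it. It is not the growth curve model from the post, ignores standard errors, and uses a plain Bonferroni correction as a stand-in for whatever adjustment the post discusses. Function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def treatment_coefficients(data, reference):
    """One categorical predictor, treatment coding with `reference` as baseline.

    With a single factor fitted by ordinary least squares, the intercept is the
    reference level's mean and each other coefficient is that level's mean minus it.
    """
    ref_mean = mean(data[reference])
    coefs = {"(Intercept)": ref_mean}
    for level, values in data.items():
        if level != reference:
            coefs[level] = mean(values) - ref_mean
    return coefs


def all_pairwise_differences(coefs, reference):
    """Recover every pairwise difference of level means from the coefficients."""
    effects = {reference: 0}  # the baseline's own effect is zero by construction
    effects.update({k: v for k, v in coefs.items() if k != "(Intercept)"})
    return {(a, b): effects[b] - effects[a]
            for a, b in combinations(sorted(effects), 2)}


def bonferroni_alpha(n_levels, alpha=0.05):
    """Per-comparison threshold when testing all n_levels * (n_levels - 1) / 2 pairs."""
    n_pairs = n_levels * (n_levels - 1) // 2
    return alpha / n_pairs


if __name__ == "__main__":
    # Hypothetical toy data: three conditions, three observations each
    data = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [10, 11, 12]}
    coefs = treatment_coefficients(data, reference="A")
    print(coefs)
    print(all_pairwise_differences(coefs, reference="A"))
    print(bonferroni_alpha(len(data)))
```

Only comparisons against the reference appear directly as coefficients; the remaining pairs (here B vs. C) have to be derived, and testing all of them is what raises the multiple-comparison issue.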