Building a statistical significance testing web service powered by R

October 18, 2012

(This article was first published on Gary Sieling » R, and kindly contributed to R-bloggers)

R is a programming language focused on statistical and mathematical computation. R programs often operate on large, in-memory data sets, which feels somewhat similar to database programming. Examples in the R Cookbook bear a resemblance to functional programming in Clojure, as others have noted.

I’ve been exploring the language to gain insight into related but disparate technologies that I use regularly (e.g. Postgres), but for this to be really useful, I’d like to see R behind a web service. Looking through the official website, there are many defunct attempts at using R in this manner, often abandoned once the maintainer finished their master’s degree.

A couple have survived, notably Rook and rApache. Rook is a web server inside of R, and rApache, as you might guess, is an Apache module that calls R. I’ve chosen rApache, as I’d like to have a battle-tested front-end for this – while R seems to have very committed maintainers, there do not seem to be very many of them, and I have yet to find examples of anyone running this as a production application.

Inspired by WolframAlpha’s APIs, I built a small web service to test statistical significance. In the future I intend to do tests on performance and security, as well as available JSON libraries.

Here is the installation procedure:

apt-get update
apt-get upgrade
apt-get install r-base r-base-dev
apt-get install apache2-mpm-prefork apache2-prefork-dev
apt-get install git-core
git clone https://github.com/jeffreyhorner/rapache.git
cd rapache
./configure
make
make test
make install
vi /etc/apache2/httpd.conf

Apache configuration settings:

 
LoadModule R_module /usr/lib/apache2/modules/mod_R.so
 
<Location /RApacheInfo>
SetHandler r-info
</Location>
 
ROutputErrors
 
<Directory /var/www/R>
        SetHandler r-script
        RHandler sys.source
</Directory>

Restart Apache to load the module:

/etc/init.d/apache2 restart

And these are the contents of ws.R:

 
setContentType("application/json")
 
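# z statistic for two proportions; the "/" ends the first line so R
# keeps parsing the expression instead of stopping after (p-pc)
zscore<-function(p, pc, N, Nc){ (p-pc) /
     sqrt(p * (1-p) / N + pc * (1-pc) / Nc) }
# TRUE when z exceeds 1.65 (about the one-sided 95% cutoff)
significant<-function(p, pc, N, Nc){
     zscore(p, pc, N, Nc) > 1.65 }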
 
# reject missing parameters (NULL) as well as overly long ones
valid<-function(x){ !is.null(x) && nchar(x) < 10 }
 
if (!valid(GET$pc)
 || !valid(GET$p)
 || !valid(GET$N)
 || !valid(GET$Nc)) {
  cat('error:arg length')
} else {
cat(significant(as.numeric(GET$p),
                as.numeric(GET$pc),
                as.numeric(GET$N),
                as.numeric(GET$Nc)))
}
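
Written out, the statistic computed by zscore is the two-proportion z statistic with an unpooled standard error. The reading of p and N as the observed rate and sample size of one group, and pc and Nc as those of a comparison (control) group, is an assumption, since the script does not name them:

```latex
z = \frac{p - p_c}{\sqrt{\dfrac{p\,(1 - p)}{N} + \dfrac{p_c\,(1 - p_c)}{N_c}}}
```

The cutoff of 1.65 is close to 1.645, the one-sided 95% critical value of the standard normal distribution, so significant returns TRUE only when p exceeds pc at roughly that level.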
 

For instance, the output of http://localhost:8080/R/ws.R?p=.15&pc=.10&N=1000&Nc=1100 is “TRUE”.
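
To check that result without going through Apache, the same calculation can be run in a plain R session. This is a minimal sketch that restates the zscore formula from ws.R as a standalone function. The one-sided p-value via pnorm is an addition for illustration and is not something the service returns.

```r
# Standalone restatement of the zscore formula used in ws.R.
zscore <- function(p, pc, N, Nc) {
  (p - pc) / sqrt(p * (1 - p) / N + pc * (1 - pc) / Nc)
}

# Parameter values taken from the example URL.
z <- zscore(p = 0.15, pc = 0.10, N = 1000, Nc = 1100)

print(z)         # the z statistic, about 3.46
print(z > 1.65)  # the comparison the service reports

# Illustrative extra: one-sided p-value under the standard normal.
print(pnorm(z, lower.tail = FALSE))
```

Since z is well above 1.65, the comparison prints TRUE, which matches the response from the web service.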
