Applications of R at Google

July 13, 2012

At a talk I saw at the useR!2012 conference last month, Googler Karl Millar estimated that there are at least 200 active R users at Google, plus another 300+ occasional users participating in Google's internal R support list. But what are all these Google employees doing with R? A post from the Google Research team published on Google+ yesterday...

Read more »

influence.ME updated to version 0.9

July 13, 2012

Influence.ME is an extension package for R that provides tools for detecting influential data in multilevel regression models. It is developed by Rense Nieuwenhuis (that’s me), Manfred te Grotenhuis, and Ben Pelzer. Recently, a new version (0.9) was uploaded ...

Read more »

Analysing time course microarray data using Bioconductor: a case study using yeast2 Affymetrix arrays

July 13, 2012

A few years ago I was involved in analysing some time-course microarray data. Our biological collaborators were interested in how we analysed their data, so this led to the creation of a tutorial, which in turn led to a paper. When we submitted the paper, one of the referees “suggested” that we write the paper using Sweave;

Read more »

Dynamical systems: Mapping chaos with R

July 13, 2012

Chaos. Hectic, seemingly unpredictable, complex dynamics. In a word: fun. I usually stick to the warm and fuzzy world of stochasticity and probability distributions, but this post will be (almost) entirely devoid of randomness. While chaotic dynamics are entirely deterministic, their sensitivity to initial conditions can trick the observer into seeing iid. In ecology, chaotic

Read more »
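
A minimal sketch of what "deterministic but sensitive to initial conditions" means in practice. The excerpt does not name a particular system, so the logistic map with r = 4 is used here purely as an assumed stand-in; the function name logistic_orbit and the starting values are hypothetical.

```r
# Iterate the logistic map x[t+1] = r * x[t] * (1 - x[t]).
# Assumption: the logistic map is only an illustrative choice,
# not necessarily the system used in the post.
logistic_orbit <- function(x0, r = 4, n = 100) {
  x <- numeric(n)
  x[1] <- x0
  for (t in 2:n) {
    x[t] <- r * x[t - 1] * (1 - x[t - 1])
  }
  x
}

# Two orbits whose starting points differ by 1e-10.
a <- logistic_orbit(0.2)
b <- logistic_orbit(0.2 + 1e-10)
gap <- abs(a - b)

# The gap starts tiny and grows until the two orbits are unrelated.
print(gap[c(1, 10, 20, 30)])
print(max(gap))
```

Running the same starting value twice gives identical orbits, which is the deterministic part; the growth of the gap between nearly identical starts is what can make the series look random to an observer.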

Examples and resources on association rule mining with R

July 13, 2012

by Yanchang Zhao, RDataMining.com The technique of association rules is widely used for retail basket analysis, as well as in other applications to find associations between itemsets and between sets of attribute-value pairs. It can also be used for classification … Continue reading →

Read more »

R for Ecologists: Making MATLAB-like Graphs in R

July 13, 2012

I’ve decided that my blog should become a brain dump for my experience/troubles/solutions to programming in R. I’ve met many people who want to learn but don’t have the time or patience to sit down and figure it out from … Continue reading →

Read more »

1-Month Reversal Strategy

July 12, 2012

Today I want to show a simple example of the 1-Month Reversal Strategy. Each month we will buy 20% of losers and short sell 20% of winners from the S&P 500 index. The losers and winners are measured by prior 1-Month returns. I will use this post to set the stage for my next post

Read more »
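
The selection step described above can be sketched roughly as follows. This is not the post's code: the function name reversal_weights, the equal weighting within each leg, and the ten made-up tickers and returns are all assumptions for illustration.

```r
# Given prior 1-month returns, go long the worst `frac` of names and
# short the best `frac`. Equal weights within each leg are an assumption.
reversal_weights <- function(prior_returns, frac = 0.2) {
  n <- length(prior_returns)
  k <- floor(n * frac + 1e-8)          # names per leg (guard against rounding)
  stopifnot(k >= 1, 2 * k <= n)
  ord <- order(prior_returns)          # ascending: losers first
  w <- setNames(numeric(n), names(prior_returns))
  w[ord[seq_len(k)]] <- 1 / k          # long the losers
  w[ord[(n - k + 1):n]] <- -1 / k      # short the winners
  w
}

# Hypothetical prior-month returns for ten made-up tickers.
prior <- c(A = -0.08, B = 0.03, C = 0.10, D = -0.01, E = 0.05,
           F = -0.04, G = 0.07, H = 0.01, I = -0.02, J = 0.02)
w <- reversal_weights(prior)
print(w[w != 0])
```

With ten names and frac = 0.2, each leg holds two names, so the long and short legs offset each other and the portfolio is dollar neutral.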

RcppArmadillo 0.3.2.4

July 12, 2012

Conrad released version 3.2.4 of Armadillo yesterday. It contains a workaround for g++ 4.7.0 and 4.7.1 which have a regression triggered by the Armadillo codebase for small fixed-sized matrices. The corresponding RcppArmadillo package 0.3.2.4 arrived ...

Read more »

Napa Valley wine tasting map: interactive version

July 12, 2012

Got some great reactions to the Napa Valley wine tasting map made with the ggmap package I posted on Monday. A couple of people asked if similar maps could be made for other wine regions (like Australia's Hunter Valley, or the Walla Walla region in Washington): provided you have a list of winery addresses, tweaks to the same R...

Read more »

Using discrete-event simulation to simulate hospital processes

July 12, 2012

Discrete-event simulation is a very useful tool when it comes to simulating alternative scenarios for current or future business operations. Let’s take the following case: patients of an outpatient diabetes clinic are complaining about long waiting times, which seems to have an adverse effect on patient satisfaction and patient retention.

Read more »

GenABEL: an annoying error after the import of PLINK data format

July 12, 2012

In the previous post we saw how convenient GenABEL can be for managing genotypic/phenotypic data. We introduced the import of genotypic data from an Illumina format file: > convert.snp.illumina(inf = "gen.illu", out = "gen.raw", strand = "file") … Continue reading →

Read more »

Creating Williams designs with even number of products

July 12, 2012

A Williams design is a special Latin square with the additional property of first-order carryover balance (each product is followed equally often by each other product). In R, the package crossdes can be used to create them. > williams(4) ...

Read more »
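
For readers who want to see the property itself, here is a base-R sketch of the standard cyclic construction for an even number of products. It is not the crossdes implementation, and its rows and labels need not match the output of williams(4); the names williams_even and carryover_counts are hypothetical.

```r
# Cyclic construction for even n: the first row interleaves
# 0, 1, n-1, 2, n-2, ...; every further row adds 1 modulo n.
williams_even <- function(n) {
  stopifnot(n >= 2, n %% 2 == 0)
  k <- seq_len(n)
  first <- ifelse(k %% 2 == 0, k / 2, n - (k - 1) / 2)
  first[1] <- 0
  # Rows are sequences, columns are periods, products are labelled 1..n.
  outer(0:(n - 1), first, function(shift, base) (shift + base) %% n) + 1
}

# Count how often product i is immediately followed by product j.
carryover_counts <- function(design) {
  n <- nrow(design)
  prev <- as.vector(design[, -ncol(design)])
  nxt <- as.vector(design[, -1])
  table(factor(prev, levels = 1:n), factor(nxt, levels = 1:n))
}

d <- williams_even(4)
print(d)
print(carryover_counts(d))
```

In the resulting count table every off-diagonal cell is 1 and the diagonal is 0, which is the first-order carryover balance described above.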

R scripts for downloading iButton Thermochron dataloggers

July 11, 2012

Last time, I posted some R code to help quickly launch many iButton Thermochron temperature dataloggers with the same mission parameters. The R code makes use of a publicly-available command line utility released by the iButton’s manufacturer, Maxim.  Of course, Maxim also has a command line utility for downloading the data from those iButtons that

Read more »

A primer on R2OpenBUGS using the simple linear regression example.

July 11, 2012

I make using OpenBUGS fun (and easier)! I've been a BUGS, WinBUGS and OpenBUGS user for some time now (20 years and counting!). The combination of R and OpenBUGS using the R2OpenBUGS package allows the user to bring together data preparation...

Read more »

Rcpp is smoking fast for agent-based models in data frames

July 11, 2012

In a previous post, I discussed different approaches to speeding up some loops in data frames. In particular, R data frames provide a simple framework for representing large cohorts of agents in stochastic epidemiological models, such as those representing disease … Continue reading →

Read more »

Bridget Riley exhibition in London

July 11, 2012

The other day I saw a fantastic exhibition of work by Bridget Riley. Karsten Schubert, who is Riley's main agent, has some of her most famous and influential artwork from 1960 - 1966 on display, including the seminal Moving Squares from 1961. Photo of...

Read more »

Health Care Costs – Part 2, "Unhealthy Things Not Related to the Problem"

July 11, 2012

Lighting Up Way back in the day, folks believed that smoking was not only cool but also completely safe. As Marcel Danesi states in his book Of Cigarettes, High Heels, and Other Interesting Things, Second Edition: An Introduction to Semiotics ...

Read more »

In case you missed it: June 2012 Roundup

July 11, 2012

In case you missed them, here are some articles from June of particular interest to R users. The FDA goes on the record that it's OK to use R for drug trials. A review of talks at the useR! 2012 conference. Using the negative binomial distribution to convert monthly fecundity into the chances of having a baby in a...

Read more »

Getting numpy data into R — Take Two

July 10, 2012

A couple of days ago, I posted a short Python script to convert numpy files into a simple binary format which R can read quickly. Nice, but still needing an extra file. Shortly thereafter, I found Carl Rogers' cnpy library which makes reading and writing numpy files from C++ a breeze, and I quickly wrapped this up into a new package...

Read more »

This is *huge*: SAScii package

July 10, 2012

http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html Q: How do you make a hairless primate? Answer 1: Take a hairy primate, wait a few million years and see if Darwin was right. Answer 2: Make them work i...

Read more »

Importing public data with SAS instructions into R

July 10, 2012

Many public agencies release data in fixed-width ASCII (FWF) format. But with the data all packed together without separators, you need a "data dictionary" defining the column widths (and metadata about the variables) to make sense of them. Unfortunately, many agencies make such information available only as a SAS script, with the column information embedded in a PROC...

Read more »
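
To make the role of the column widths concrete, here is a small sketch using base R's read.fwf. The three records, the widths c(2, 3, 5) and the column names are invented for illustration; in a real case they would come from the agency's data dictionary or SAS script.

```r
# Three made-up fixed-width records: 2 chars state, 3 chars age, 5 chars income.
records <- c("CA03412500",
             "NY02798000",
             "TX04563250")

# Write them to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(records, path)

# The widths play the role of the data dictionary: without them the
# lines cannot be split into variables.
survey <- read.fwf(path,
                   widths = c(2, 3, 5),
                   col.names = c("state", "age", "income"),
                   stringsAsFactors = FALSE)
unlink(path)
print(survey)
```

Changing any width shifts every later column, which is why the column definitions embedded in a SAS script matter so much for files like these.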

introduction to R: learning by doing (part 2: plots)

July 10, 2012

Let's go on with the second part of learning R by doing R (you will find the first part here). As we have used vectors, matrices and loops in the first part, we will concentrate on graphics in this one. But first we will need data to plot: Sometimes you will need several plots in

Read more »

simulation, an ubiquitous tool

July 10, 2012

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers) After struggling for quite a while on that AMSI public lecture talk, and dreading its loss with the problematic Macbook, I managed to complete a first draft last night in Adelaide, downloading a final set of images from the Web...

Read more »

Visualizing Graphical Models

July 10, 2012

I'm anticipating presenting research of mine based on Bayesian graphical models to an audience that might not be familiar with them. When presenting ordinary regression results, there's already the sort of statistical sniper questions along the lines o...

Read more »

SAS Beats R on July 2012 TIOBE Rankings

July 10, 2012

The TIOBE Community Programming Index ranks the popularity of programming languages, but from a programming language perspective rather than as analytical software (http://www.tiobe.com). It extracts measurements from blogs, entries in Wikipedia, books on Amazon, search engine results, etc. and combines them into a single index. … Continue reading →

Read more »

2nd CFP: the 10th Australasian Data Mining Conference (AusDM 2012)

July 10, 2012

The Tenth Australasian Data Mining Conference (AusDM 2012) Sydney, Australia, 5-7 December 2012 http://ausdm12.togaware.com/ The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. This year’s conference, AusDM’12, co-hosted … Continue reading →

Read more »

Data Mining In Excel: Lecture Notes and Cases

July 10, 2012

by Yanchang Zhao, RDataMining.com It is a 270-page book on data mining with Excel. It can be downloaded as a PDF file at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.1393&rep=rep1&type=pdf. Below is its table of contents. - Overview of the Data Mining Process - Data Exploration … Continue reading →

Read more »

Sourcing Code from GitHub

July 10, 2012

In previous posts I described how to input data stored on GitHub directly into R. You can do the same thing with source code stored on GitHub. Hadley Wickham has actually made the whole process easier by combining the getURL, textConnection, and source commands into one function: source_url. This is in his devtools...

Read more »

Package JM — version 1.0-0

July 10, 2012

(by Dimitris Rizopoulos) Dear R-users, I’d like to announce the release of version 1.0-0 of package JM (already available from CRAN) for the joint modeling of longitudinal and time-to-event data using shared parameter models. These models are applicable in mainly two settings. First, when the focus is on the survival outcome and we wish to account for the effect of an...

Read more »