Blog Archives

Analysing time course microarray data using Bioconductor: a case study using yeast2 Affymetrix arrays

July 13, 2012

A few years ago I was involved in analysing some time-course microarray data. Our biological collaborators were interested in how we analysed their data, so this led to the creation of a tutorial, which in turn led to a paper. When we submitted the paper, one of the referees “suggested” that we write the paper using Sweave;

Read more »

UK R Courses – 2012

September 17, 2011

The School of Mathematics & Statistics at Newcastle University (UK) are again running some R courses. In January 2012, we will run: January 16th: Introduction to R; January 17th: Programming with R; January 18th & 19th: Advanced graphics with R. The courses aren’t aimed at teaching statistics, rather they aim to go through the fundamental

Read more »

Development of R (useR! 2011)

August 19, 2011

Michael Rutter – R for Ubuntu. Ubuntu 10.10 uses R 2.10.1. Backports are newer versions of software for old releases. R backports are available from CRAN (link). Launchpad is a website for users to develop and maintain software (Canonical). One of Launchpad’s services is the personal package archive (PPA). This allows users to upload .deb source files, allowing

Read more »

Simon Urbanek – R Graphics: supercharged

August 18, 2011

New features: rasterImage() (R 2.11): bitmap raster drawing; have maps as data backdrops. Polygons with holes: polypath() (R 2.12). At present there is no way to tell when to actually show the plot. For example: plot(x); lines(x). Should we display the plot after plot() or after lines()? Solution: dev.hold() and dev.flush(). Better performance and useful for animations –

Read more »

Kaleidoscope IIIb (useR! 2011)

August 18, 2011

O. Mersmann – The microbenchmark package. Slides and code (link). SURGEON GENERAL’s WARNING: Microbenchmarks can lead to a distorted view of reality and massive loss of productivity. For a higher-order benchmarking package, check out the rbenchmark package on CRAN (suggestion from the speaker). Why do we need micro-benchmarking? A simple example showed that it is currently very

Read more »

Big data (useR! 2011)

August 18, 2011

Unfortunately, I missed the first and last talks. These are my notes from a session on Thursday morning. J. Demmler – Challenges of working with a large database of routinely collected health data. The SAIL data bank holds over 1.9 billion (anonymous) entries. To use the data for research, they need to ensure that proper data security is

Read more »

Programming (useR! 2011)

August 17, 2011

Ray Brownrigg – Tips and Tricks for young R programmers. Problem: calculate the distribution function of a bivariate Kolmogorov-Smirnov statistic. Essentially three loops. Basic exhaustive search is O(N^3). Fortran gives a single order of magnitude speed-up. Restructuring in R using a single loop is an order of magnitude faster than Fortran. Further improvements make the algorithm

Read more »

Kaleidoscope IIb (useR! 2011)

August 17, 2011

L. Collingwood – RTextTools. RTextTools is a machine learning library for automated text classification. This package builds on previous packages such as tm and random forests. Use case: an undergrad labels congressional bills but then quits. Using the previously labelled data, automatically classify the remaining documents. The speaker gave a nice overview of machine learning techniques, but I

Read more »

Lee E. Edlefsen – Scalable Data Analysis in R (useR! 2011)

August 17, 2011

The RevoScaleR package isn’t open source, but it is free for academic users. Collecting and storing data has outpaced our ability to analyze it. Can R cope with this challenge? The RevoScaleR package is part of Revolution R Enterprise. This package provides data management and data analysis. It uses multiple cores and should scale. Scalability

Read more »

Jonathan Rougier – Nomograms for visualising relationships between three variables (useR! 2011)

August 16, 2011

Background: Donkeys in Kenya. Tricky to find the weight of a donkey in the “field” – no pun intended! So, using a few measurements, estimate the weight. Other covariates include age. Standard practice is to fit: for adult donkeys, and other slightly different models for young/old and ill donkeys. What can a statistician add: Add

Read more »