This summer, we have been told that some financial series broke some records (here, in French). For instance, the French CAC40 had negative returns for 11 consecutive days (which had never been seen before). > library(tseries)> x<-get....
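
A streak like the one described can be counted with run-length encoding. The following is only a minimal sketch: the returns are simulated rather than downloaded, and `longest_negative_run` is a hypothetical helper name, not something taken from the post.

```r
# Simulated daily returns stand in for real index data (assumption).
set.seed(1)
returns <- rnorm(250, mean = 0, sd = 0.01)

# Longest streak of consecutive negative returns, via run-length encoding.
# longest_negative_run is a hypothetical helper, not from the original post.
longest_negative_run <- function(r) {
  runs <- rle(r < 0)                    # runs of TRUE (negative) and FALSE
  neg <- runs$lengths[runs$values]      # keep only the negative runs
  if (length(neg) == 0) 0L else max(neg)
}

longest_negative_run(returns)
```

With real data, `returns` would be replaced by the day-to-day changes of the index series.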

Let’s start this blog off right, with the stupidest R mistake I’ve ever made (I think). In the R package that I write, R/qtl, one of the main file formats is a comma-delimited file, where the blank cells in the second row are important, as they distinguish the initial phenotype columns from the genetic marker

I’m at the useR! Conference in Coventry, UK, this week. It’s been every bit as inspiring, interesting and useful as I’d hoped. Particularly interesting were the Lightning talks: a series of 5-minute presentations with one minute in between, with each presentation having 15 slides of 20 seconds each, moved forward automatically. It worked extremely

"The R-Files" is an occasional series from Revolution Analytics, where we profile prominent members of the R Community. Name: Martyn Plummer Occupation: Statistician at International Agency for Research on Cancer Nationality: British Years Using R: 16 Known for: Member of R core group; member of R Journal editorial board Martyn Plummer is a longtime contributor to the R community...

Ray Brownrigg – Tips and Tricks for young R programmers Problem: Calculate the distribution function of a bivariate Kolmogorov-Smirnov statistic. Essentially three loops. Basic exhaustive search is O(N^3). Fortran gives a single order of magnitude speed-up. Restructuring in R using a single loop is an order of magnitude faster than Fortran. Further improvements make the algorithm
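
To illustrate the general idea of restructuring loops in R, here is a small sketch. It is not the speaker's algorithm: it only computes a bivariate empirical distribution function at each sample point, once with nested loops and once with `outer()`. The function names `ecdf_loop` and `ecdf_vec` are made up for this example.

```r
set.seed(42)
n <- 200
x <- runif(n)
y <- runif(n)

# Loop version: for each point i, count the points j in its lower-left quadrant.
ecdf_loop <- function(x, y) {
  n <- length(x)
  out <- numeric(n)
  for (i in seq_len(n)) {
    count <- 0
    for (j in seq_len(n)) {
      if (x[j] <= x[i] && y[j] <= y[i]) count <- count + 1
    }
    out[i] <- count / n
  }
  out
}

# Vectorized version: outer() builds both comparison matrices in one step.
ecdf_vec <- function(x, y) {
  below <- outer(x, x, "<=") & outer(y, y, "<=")  # below[j, i]: point j <= point i
  colMeans(below)                                 # fraction of points below point i
}

# Both versions should agree.
stopifnot(isTRUE(all.equal(ecdf_loop(x, y), ecdf_vec(x, y))))
```

The vectorized version trades memory (two n-by-n matrices) for speed, so it suits moderate n only.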

L Collingwood – RTextTools RTextTools. A machine learning library for automated text classification. This package builds on previous packages such as tm and random forests. Use case: undergrad labels congressional bills but then quits. Using the previously labelled data, automatically classify the remaining documents. The speaker gave a nice overview of machine learning techniques, but I

The RevoScaleR package isn’t open source, but it is free for academic users. Collecting and storing data has outpaced our ability to analyze it. Can R cope with this challenge? The RevoScaleR package is part of Revolution R Enterprise. This package provides data management and data analysis. Uses multiple cores and should scale. Scalability

RTextTools v1.2 was released today and we're pleased to announce that the package is finally available on CRAN. Additionally, this update brings minor changes to the API, improvements to the GLMNET algorithm, and more comprehensive documentation. Get started by following our installation instructions! Additionally, Loren Collingwood will be giving a Kaleidoscope Session Talk today at the useR! 2011 conference in Coventry, UK....

A couple of days ago we released a package named fun to CRAN, but I did not dare to send an announcement to [email protected] as usual. This package is a collection of some classical computer games (e.g. the Mine sweeper and Five in a row) as well as other funny stuff. Some examples: ## install.packages('fun')

Handling Large Data with R The following experiments are inspired by this excellent presentation by Ryan Rosario: http://statistics.org.il/wp-content/uploads/2010/04/Big_Memory%20V0.pdf. R offers many I/O functions for reading and writing data, such as ‘read.table’ and ‘write.table’ -> http://cran.r-project.org/doc/manuals/R-intro.html#Reading-data-from-files. With data growing larger by the day, many new methodologies are available for faster I/O operations.
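
As a small illustration of the base functions mentioned above, the sketch below writes a toy data frame with `write.table` and reads it back with `read.table`. The data and the temporary file are invented for the example; the `colClasses` and `nrows` arguments are standard `read.table` options that can reduce work on larger files.

```r
# Toy data, invented for this example.
df <- data.frame(id = 1:5,
                 value = c(0.5, 1.5, 2.5, 3.5, 4.5),
                 label = letters[1:5],
                 stringsAsFactors = FALSE)

# Write to a temporary comma-separated file.
path <- tempfile(fileext = ".csv")
write.table(df, path, sep = ",", row.names = FALSE, quote = FALSE)

# Declaring colClasses and nrows up front spares read.table from guessing
# column types, which tends to matter more as files grow.
back <- read.table(path, header = TRUE, sep = ",",
                   colClasses = c("integer", "numeric", "character"),
                   nrows = 5)

unlink(path)  # remove the temporary file
str(back)
```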

R Core member Professor Brian Ripley from Oxford University gave the first keynote presentation of useR! 2011 today, and gave some insights into what goes on behind the scenes to create two updates to R (plus several patches) every year. He began with some facts about the history of R (noting that if they'd known R would take off...

I gave my talk to the useR! 2011 conference this morning: The R Ecosystem. The goal of the talk was to show R in context: that the combination of the R project and its leadership, the R userbase, and the companies supporting and using R makes for a thriving ecosystem and is indicative of an extremely successful open source...

Background: Donkeys in Kenya. Tricky to find the weight of a donkey in the “field” – no pun intended! So using a few measurements, estimate the weight. Other covariates include age. Standard practice is to fit: for adult donkeys, and other slightly different models for young/old and ill donkeys. What can a statistician add: Add

Example: Car seat occupation: Algorithm must decide whether airbag opens: Must open for an adult but not for a small child or if the seat is empty, plus a few others I missed. Key questions are: What type of design: 32 run regular fractional factorial Response measurement – depends on dummy position, so repeat for 3 different dummy

Willem Ligtenberg – GPU computing and R Why GPU computing – theoretical GFLOPs for a GPU are three times greater than for a CPU. Use GPUs for single instruction, multiple data (SIMD) problems. Initially GPUs were developed for texture problems. For example, a wall smashed into lots of pieces. Each core handled a single piece. CUDA

These are my rough notes on the Kaleidoscope Ic session. David Smith – The R Ecosystem (useR! 2011) David Smith works for Revolution Analytics. Quick overview of the R project – useR, r-journal, and r-forge. Social media starting to play a part in R – Google+, twitter, stackoverflow, and the traditional R mailing list. The

These are my notes on the User2011 invited talk. Brian Ripley has been a member of R core since 1998. The R Development Process – An insideR’s view R Timeline: JCGS paper submitted in 1995. 1997: CRAN(Mar), Core team(Aug), CVS (Sept) R 1.0.0 Feb 2000 – 2.8MB. Many people don’t take 0.X.X seriously R 2.0.0 Oct