Brian Ripley – The R Development Process (useR! 2011)

August 16, 2011

(This article was first published on Why? » R, and kindly contributed to R-bloggers)

These are my notes on the useR! 2011 invited talk. Brian Ripley has been a member of R core since 1998.

The R Development Process – An insideR’s view

R Timeline:

  • JCGS paper submitted in 1995.
  • 1997: CRAN (Mar), core team (Aug), CVS (Sept).
  • R 1.0.0, Feb 2000: 2.8MB. Many people don’t take 0.X.X versions seriously.
  • R 2.0.0, Oct 2004: 10MB (really 1.10.0).
  • R 2.14.0, Oct 2011: est. 22MB.
  • Roughly 4000 repo commits per year.

Looking ahead, 2.15.0 is scheduled for March 2012. R 3.0.0 has been discussed for a few years, but keeping legacy support could be tricky, since there are currently around 3200 packages. So there are no plans for 3.0.0 in the near future. R-core has 20 members, but several are inactive and only a handful are actively developing R (the others contribute in other valuable ways). There are currently 80 successful CRAN submissions per week.

CRAN

CRAN is around 70GB, with 1.9GB for the current source packages. The projection is 10,000 packages by Christmas 2016. The submission process is handled almost entirely by Kurt Hornik. Checking packages can be very time-consuming: there are 110 packages submitted per week.

In 2004 CRAN was replaced by “repos”. However, few public repositories have emerged. Binary packages are kept for two versions.

The R Development Process

The R core team meets in person only every couple of years. R core has total control over R. A rough criterion for membership is:

when it was more work to have someone out than in

Normal day-to-day business is done by email, as members are spread over a variety of time zones. The R Foundation is the legally constituted body, whose (voting) members are R-core plus a small number of others.

Getting features into R

“R was principally developed for the benefit of the core team. Only they have votes.” Most of what you see in R is there because core members wanted it for research, teaching, support for other projects, or to develop R itself. For example, the lm function is there because of a 1998 course in regression. Since almost all of R core are mathematicians, they decided to build very general solutions rather than specific ones.

If a core member accepts a contribution, they are committing themselves and R-core to supporting it for many years. R-core have regretted accepting some (even small) contributions. So most new features should go into a package, not into the core.

Timescales

  • Short: psnice, list.dirs(recursive=FALSE).
  • Year or two: Internationalization.
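
As a rough illustration of the two short-timescale features named above, here is a minimal sketch of what list.dirs(recursive=FALSE) and psnice do. The directory names are made up for the example, and psnice is assumed to be the function of that name in the tools package.

```r
# Build a small throwaway directory tree under tempdir().
# The names "a", "b" and "nested" are hypothetical, chosen for illustration.
root <- file.path(tempdir(), "listdirs-demo")
dir.create(file.path(root, "a", "nested"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "b"), showWarnings = FALSE)

# recursive = TRUE (the default) descends through every level of the tree.
all_dirs <- list.dirs(root, full.names = FALSE, recursive = TRUE)

# recursive = FALSE stops at the immediate subdirectories.
top_dirs <- list.dirs(root, full.names = FALSE, recursive = FALSE)

print(sort(top_dirs))
print(all_dirs)

# psnice() with no new value only reports the current process priority
# (it may be NA on platforms without support).
print(tools::psnice())
```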

Portability

R core is trying to phase out bash, sh and Makefiles for ease of use, maintenance, and performance. The parser for Rd2 was written in bison, but all the conversion scripts are in R. Fortran is also becoming a problem, since neither Apple nor Microsoft supports it in their SDKs. A legacy of R’s 32-bit beginnings is that there is only a single integer type. Longer integers have been on the horizon for years, but still seem tricky. They could be in 3.0.0.
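
The single integer type is easy to see from the console. The sketch below is my own illustration, not something shown in the talk: it pushes an integer past the 32-bit limit and compares that with doing the same arithmetic in double precision.

```r
# The largest value R's one integer type can hold (32-bit signed).
big <- .Machine$integer.max

# Adding 1L stays in integer arithmetic, so it overflows to NA (with a warning).
overflowed <- suppressWarnings(big + 1L)

# Adding a plain 1 promotes the result to double, which can hold the value.
as_double <- big + 1

print(big)
print(overflowed)
print(typeof(as_double))
```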

Performance

For a long time, performance issues could be solved by waiting six months for a new computer. However, this isn’t true any more. Instead, we have multiple cores. A new package, parallel, will support multi-core processors in the next version of R.
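
For readers who have not seen the parallel package, here is a minimal sketch of spreading work over two worker processes. It assumes the package's makeCluster/parLapply interface; the worker count and the toy squaring function are arbitrary choices for illustration, not taken from the talk.

```r
library(parallel)

# Start two local worker processes (the count is an arbitrary choice here).
cl <- makeCluster(2)

# Apply a toy function to each element, with the work shared among the workers.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Always shut the workers down when finished.
stopCluster(cl)

print(unlist(squares))
```

On a real workload the pay-off depends on each task being expensive enough to outweigh the cost of starting the workers and shipping data to them.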

The future

R is heavily dependent on a small group of altruistic people who can feel that their contributions are not treated with respect. People have lives outside R, and circumstances and health do change.

Other future developments are low-level support for threading, GUIs, vector types, replacing library() with use(), and moving to a yearly release schedule.

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot – just let me know! The above paragraph was stolen from Allyson Lister, who makes excellent notes when she attends conferences.

