Development of R (useR! 2011)

This article was first published on Why? » R, and kindly contributed to R-bloggers.

Michael Rutter – R for Ubuntu

Ubuntu 10.10 uses R 2.10.1. Backports are newer versions of software built for older releases. R backports are available from CRAN.

Launchpad is a website, run by Canonical, where users develop and maintain software. One of Launchpad’s services is the personal package archive (PPA). This lets users upload .deb source files, which makes it easy to build packages for multiple Ubuntu releases and architectures.

Workflow:

Dirk creates source file -> Michael gets source file -> packages built on Launchpad -> posted on CRAN using apt-mirror.

There’s also a PPA available. PPAs are easier to add to the user’s system. Ubuntu has about 75 r-cran packages available in the main repository. A PPA could build many more if the .deb source packages were available. Could we use cran2deb?

cran2deb: no longer works, since maintaining the (virtual) machines to build the packages is time-consuming. Use Launchpad instead.

cran2deb4ubuntu (PPA): contains most of the packages and dependencies from CRAN, 1107 in total. Any of these packages can be installed with: sudo apt-get install r-cran-foo

  • Exceptions: packages with non-free licences, Windows/Mac-only packages, and packages whose dependencies are not available to Launchpad (e.g. CUDA);
  • Problems(?): r-cran-foo can only be installed outside the current R session. Can we get install.packages("foo") to look for r-cran-foo first?
  • Benefits: automatic updates to packages, and easier creation of R instances in the cloud;
  • Issues: c2d4u is only available for Ubuntu 11.04. There are naming and building issues for future versions. Space limitations on Launchpad may limit support for previous versions.

Andrew Runnalls – The CXXR project

The CXXR project is progressively re-engineering the fundamental parts of the R interpreter from C into C++. It started in 2007, and the current release shadows R 2.12.1. The aim of the project is to make the R interpreter more accessible to developers and researchers.

  • Improve documentation;
  • Encapsulation;
  • Move to an object-oriented structure;
  • Express internal algorithms.

RObjects

In CR (the standard R interpreter), a C union is used to implement R objects. This has a few disadvantages:

  • The compiler doesn’t know which of the 23 types is at an address;
  • Debugging at the C level is tricky;
  • Adding a new type of R object means modifying a data definition at the heart of the interpreter.
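
To make these disadvantages concrete, here is a minimal sketch of a tagged union. It is not CR's actual data definition: the names `NodeType`, `Node`, `int_value` and `real_value` are invented for illustration, and only two types are shown rather than 23.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a union-based object node.
// None of these names come from the CR sources.
enum NodeType { INT_NODE, REAL_NODE };

struct Node {
    NodeType type;   // run-time tag: the only record of what is stored
    union {
        int    i;
        double d;
    } u;
};

inline Node make_int(int v)     { Node n; n.type = INT_NODE;  n.u.i = v; return n; }
inline Node make_real(double v) { Node n; n.type = REAL_NODE; n.u.d = v; return n; }

// Each accessor must check the tag by hand. The compiler cannot verify
// that the member being read is the member that was last written.
inline int int_value(const Node& n) {
    assert(n.type == INT_NODE);    // run-time check only
    return n.u.i;
}

inline double real_value(const Node& n) {
    assert(n.type == REAL_NODE);   // run-time check only
    return n.u.d;
}
```

Adding a third kind of node here would mean editing both `NodeType` and the union inside `Node`, which is the "data definition at the heart of the interpreter" problem in miniature.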

CXXR maps R objects to a particular C++ class.

Objectives:

  • Move program code relating to a datatype into one place;
  • Use the C++ public/protected/private mechanism to encapsulate implementation details;
  • Allow developers to extend the class hierarchy.
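
As a rough illustration of these objectives, the sketch below shows a small class hierarchy. It is not CXXR's real hierarchy: `RObjectSketch`, `IntVectorSketch`, `LabelSketch` and `describe` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every object type derives from it.
class RObjectSketch {
public:
    virtual ~RObjectSketch() {}
    virtual std::string typeName() const = 0;
};

// All code relating to integer vectors lives in this one class.
class IntVectorSketch : public RObjectSketch {
public:
    explicit IntVectorSketch(std::vector<int> values)
        : m_values(std::move(values)) {}
    std::string typeName() const override { return "integer"; }
    std::size_t size() const { return m_values.size(); }
    int at(std::size_t i) const {
        assert(i < m_values.size());
        return m_values[i];
    }
private:
    std::vector<int> m_values;   // storage is hidden from client code
};

// A developer extends the hierarchy without editing the base class.
class LabelSketch : public RObjectSketch {
public:
    explicit LabelSketch(std::string text) : m_text(std::move(text)) {}
    std::string typeName() const override { return "label"; }
    const std::string& text() const { return m_text; }
private:
    std::string m_text;
};

// Generic code works through the base-class interface only.
inline std::string describe(const RObjectSketch& obj) {
    return "object of type " + obj.typeName();
}
```

Here the compiler knows the static type of each object, and `describe` keeps working unchanged when a new derived class is added.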

Illustrative example: write a package to handle large integers

The GNU MP library defines a C++ class, mpz_class, to represent an arbitrarily large integer, but it has no notion of NAs. In CXXR, NAs are added with a single line of C++. Another line of code is used to create a vector of BigInts. It’s then straightforward to add binary operations.
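
The idea can be sketched as follows. This is an assumption-laden illustration, not CXXR's code: `WithNA` is a hypothetical wrapper template, and `long long` (aliased as `BigValue`) stands in for mpz_class so that the example needs no external library.

```cpp
#include <cassert>
#include <vector>

// Hypothetical wrapper: adds a "not available" state to any value type T.
template <typename T>
class WithNA {
public:
    WithNA() : m_value(), m_na(true) {}                // default-constructed is NA
    WithNA(const T& v) : m_value(v), m_na(false) {}    // implicit: plain values convert
    bool isNA() const { return m_na; }
    const T& value() const {
        assert(!m_na);   // reading the value of an NA is a programming error
        return m_value;
    }
private:
    T m_value;
    bool m_na;
};

// Stand-in for GNU MP's mpz_class (assumption made for self-containment).
typedef long long BigValue;

typedef WithNA<BigValue> BigInt;            // one line adds NA support
typedef std::vector<BigInt> BigIntVector;   // one more line gives a vector type

// A binary operation that propagates NA, as R arithmetic does.
inline BigInt operator+(const BigInt& a, const BigInt& b) {
    if (a.isNA() || b.isNA()) return BigInt();
    return BigInt(a.value() + b.value());
}
```

With these definitions, `BigInt(3) + BigInt(4)` holds 7, while adding a default-constructed `BigInt` to anything yields NA.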

Subscripting in R

R is renowned for the power of its subscripting operations. In the CR interpreter, around 2000 C-language statements implement these facilities. But this C code is locked up: it has no API and is hard-wired around CR’s built-in data types. This is buried treasure.

CXXR makes these facilities available through an API. The API abstracts away from the type of the elements and of the container. Result: adding subscripting operations to a new vector type is fairly simple.
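
A minimal sketch of what "abstracting away from the type of elements and container" can look like is given below. It is not CXXR's API: `subset_sketch` is a hypothetical function, it handles only non-negative integer indices, and the caller supplies the NA value explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic subsetting. Works for any container that offers
// size(), operator[], push_back() and a value_type typedef.
template <typename Container>
Container subset_sketch(const Container& source,
                        const std::vector<std::size_t>& indices,  // 1-based, as in R
                        const typename Container::value_type& na)
{
    Container result;
    for (std::size_t k = 0; k < indices.size(); ++k) {
        const std::size_t idx = indices[k];
        if (idx == 0)
            continue;                            // R drops zero indices
        if (idx <= source.size())
            result.push_back(source[idx - 1]);   // convert to 0-based
        else
            result.push_back(na);                // out of range yields NA
    }
    assert(result.size() <= indices.size());
    return result;
}
```

Because the function is a template, the same code serves a `std::vector<int>`, a `std::vector<double>`, or a vector of some user-defined element type, without any per-type subsetting logic.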

Current problem: no serialization, so there is no provision for BigIntVectors to be saved across sessions.

Claudia Beleites – Google Summer of Code 2011

Students work on open-source software coding projects. Results can be used as part of a thesis or article.

  • Student stipend: US$5000. Mentoring organization: US$500;
  • Project topics: 7 on GUI/images/visualisation, 4 on optimization, 1 on high-performance computing;
  • Aims: introduce students to the R developer community and push forward their project. roxygen and cran2deb were previous GSoC projects;
  • Communication channels: email, IM, Skype, personal meetings.

Experiences:

  • Two mentors per student. The two admins ping projects every now and again;
  • Timelines are based on US summer holidays;
  • Mentors and students sometimes vanish.

Advice for Mentors:

  • Start to look early (January) for students. Look for a co-mentor;
  • Plan the time carefully;
  • Remember that coding time is also holiday time and students range from 1st year to PhD students.
