A Short Side-by-side Comparison of the R and NumPy Array Types

March 22, 2011

(This article was first published on BioStatMatt » R, and kindly contributed to R-bloggers)

Feature                        NumPy   R
contiguous (virtual) memory    ✔       ✔
'view' memory model            ✔       ✘
subset-assignment              ✔       ✔
vectorized operations          ✔       ✔
memory-mapping                 ✔       ✘*
broadcasting rules             ✔       ✔
index arrays                   ✔       ✔

This comparison reflects R 2.13.0, NumPy 1.4.1, and other web resources available at the time of writing. Because this post was motivated by a recent article (cited below) that promotes the NumPy array, the comparison above may seem one-sided. To be fair, I welcome corrections and additions to the feature table.

The NumPy Array: A Structure for Efficient Numerical Computation. Comput. Sci. Eng. 13, 22 (2011). http://link.aip.org/link/?CSENFA/13/22/1

contiguous (virtual) memory

Contiguous (virtual) memory means that the memory used by an array is allocated as a single block and that the array's elements are stored adjacently. This kind of storage enables efficient operations on the array: any element can be located by simple offset arithmetic, and elements can be traversed sequentially. The 'virtual' qualifier signifies that the memory need only appear contiguous to the executing process; it may be noncontiguous in physical memory.
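To illustrate the offset arithmetic that contiguous storage allows, the sketch below locates a matrix element using only its position in the underlying block. It relies on R's column-major storage order for matrices; the helper offset() is hypothetical and exists only for this illustration.

```r
# A 3 x 4 matrix: R stores its 12 elements as one column-major block
m <- matrix(1:12, nrow = 3, ncol = 4)

# Hypothetical helper: 1-based position of element [i, j] within that block
offset <- function(i, j, nrow) (j - 1) * nrow + i

# Matrix indexing and plain offset arithmetic on the block agree
m[2, 3]                                # [1] 8
as.vector(m)[offset(2, 3, nrow(m))]    # [1] 8
```

Because the elements sit in one block, no per-element bookkeeping is needed: two multiplications and an addition are enough to find any element.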

'view' memory model

A 'view' memory model allows certain operations (matrix transpose, many kinds of subsetting, reshaping) to present an array's data differently without copying the memory where that data is stored. The NumPy array has a 'view' memory model, but the R array generally does not. However, a 'view' memory model may be viable for R arrays, since the memory model is mostly invisible to the user.
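The sketch below shows R's behavior from the user's side: a subset behaves as an independent copy, so modifying it leaves the original untouched. By contrast, a basic NumPy slice is a view, so assigning into the slice would also change the original array.

```r
x <- c(1, 2, 3, 4)
y <- x[2:3]   # y behaves as a copy of part of x, not a view into it
y[1] <- 100   # modifying y ...
x             # ... leaves x unchanged: [1] 1 2 3 4
y             # y is now c(100, 3)
```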

subset-assignment

Subset assignment refers to assignments that modify one or more elements of an array. For example:

> x <- c(1,2,3,4)
> x[1] <- 100
> x
[1] 100   2   3   4
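The example above assigns to a single element. Subset assignment in R also accepts vectors of positions, logical conditions, and whole matrix rows or columns, as in this short sketch.

```r
x <- c(1, 2, 3, 4)
x[c(2, 4)] <- 0     # assign to several positions at once: x is now 1 0 3 0
x[x > 2] <- -1      # assign via a logical index: x is now 1 0 -1 0

m <- matrix(0, nrow = 2, ncol = 2)
m[1, ] <- c(5, 6)   # assign a whole row of a matrix
m[, 2] <- 9         # a single value is recycled down column 2
m                   # row 1 is now (5, 9), row 2 is (0, 9)
```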

vectorized operations

Vectorized operations refer to expressions where an element-wise operation is implicit. Consider this R code:

> x <- c(1,2,3,4)
> x * 3
[1]  3  6  9 12

Here, x * 3 implicitly specifies that each element of x be multiplied by 3. Vectorized operations remove the need for explicit loops in many cases.
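For comparison, the sketch below writes out the explicit loop that x * 3 replaces and checks that both give the same result. It also shows an element-wise operation between two vectors of equal length.

```r
x <- c(1, 2, 3, 4)

# The explicit loop that the vectorized x * 3 replaces
y <- numeric(length(x))
for (i in seq_along(x)) y[i] <- x[i] * 3

identical(y, x * 3)     # [1] TRUE

# Element-wise operations also apply between vectors of equal length
x + c(10, 20, 30, 40)   # [1] 11 22 33 44
```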

memory-mapping

Memory mapping refers to the ability to map a file into a program's (virtual) memory, so that a large array stored on disk may be manipulated without loading the entire array into memory. *R does not offer a memory-mapping facility for arrays. However, some memory-mapping functionality is provided by the bigmemory and mmap extension packages. R also provides a well-developed interface to database management systems (DBMSs; see the R Data Import/Export manual), enabling random access to data stored on disk.
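Base R can at least approximate random access to an array stored on disk with seek() and readBin(). The sketch below is not memory mapping: each read copies bytes into R's memory. It only shows reading one element without loading the whole file. The helper read_element() is hypothetical, and the 8-byte element size assumes the array was written as doubles in R's default native binary format.

```r
# Write 100,000 doubles to a temporary binary file
path <- tempfile(fileext = ".bin")
writeBin(as.numeric(1:1e5), path)

# Hypothetical helper: read the k-th double (1-based) without loading the file
read_element <- function(path, k) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (k - 1) * 8)   # each double occupies 8 bytes
  readBin(con, what = "double", n = 1)
}

read_element(path, 12345)          # [1] 12345
```

A memory-mapping package avoids the explicit seek and copy; this sketch merely shows the access pattern such packages make convenient.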

broadcasting rules

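Broadcasting rules govern the behavior of binary operations ('+', '*', etc.) on arrays of different dimensions; without such rules, these operations may be undefined. Both R and NumPy arrays have broadcasting rules, but the rules differ: R recycles the shorter operand along the column-major storage order of the longer one, whereas NumPy aligns shapes starting from the trailing dimension and stretches any dimension of length 1.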
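The sketch below shows R's recycling on a 2 x 3 matrix. For contrast: in NumPy, adding a length-3 array to a 2 x 3 array adds it to every row, while adding a length-2 array fails because the trailing dimensions (3 and 2) do not match. R handles both cases by recycling down the columns.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3, stored column-major as 1, 2, ..., 6

# c(10, 20) is recycled along the storage order, i.e. down each column,
# so 10 is added to row 1 and 20 to row 2
m + c(10, 20)
#      [,1] [,2] [,3]
# [1,]   11   13   15
# [2,]   22   24   26

# A length-3 vector is also recycled down the columns, not across rows
m + c(1, 2, 3)
```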

index arrays

An index array is an array whose elements are used as indices into another array. For example:

> x <- array(rnorm(9), c(3, 3))
> y <- array(c(1, 1, 1, 2), c(2, 2))
> x[y]
[1] -0.9345381  0.5509239

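However, the rules for index arrays differ between R and NumPy. In R, a numeric matrix with one column per dimension of x, such as y above, selects one element per row of the index matrix (here x[1, 1] and x[1, 2]). In NumPy, an integer index array applies to a single axis, so indexing a 3 x 3 array with a 2 x 2 integer array returns a 2 x 2 x 3 array of rows.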
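Because the example above uses rnorm(), its output changes from run to run. The sketch below repeats it with deterministic values so that the selected elements can be checked: each row of the two-column index matrix is one (row, column) pair.

```r
x <- matrix(1:9, nrow = 3)   # deterministic values in place of rnorm(9)

# A two-column index matrix: each row is one (row, column) pair
y <- rbind(c(1, 1),
           c(1, 2))

x[y]                  # [1] 1 4
c(x[1, 1], x[1, 2])   # [1] 1 4  (the same two elements)
```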
