hash-1.99.x

February 17, 2010
(This article was first published on Open Data Group » R, and kindly contributed to R-bloggers)

hash-2.0.0 has been released; please read about it here:

Earlier today, hash-1.99.x was released to CRAN. This is a stable release that fixes some bugs, adds a few functions to an already full-featured hash implementation, and improves performance and stability. You can read about the hash package in my previous blog post, The hash package: hashes come to R. All of the changes came in response to users who wrote in and contributed thoughts, ideas and use cases. Keep the good ideas coming. Two of the major changes are summarized below.

Matthias Buch-Kromann of the Copenhagen Business School requested the ability to access multiple keys in a single call, and even to access the same key multiple times. This was previously allowed using the [[ method, but was deprecated. By convention, the [[ method returns only one value. (You can read about the conventions of this and other R accessors in my previous blog post, R Accessors Explained.) This behavior has returned in hash-1.99.x through the values method and its optional keys argument:


h <- hash( c('a','b','c'), 1:3 )
values(h)
values(h, keys=c('a','b','c','a','b','c' ) )

Matthias suggested calling the method mget, but that name conflicted with the mget function in base. The generic function that I needed just wouldn't play nicely with base::mget.
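
For comparison, here is a minimal base-R sketch of the same idea: retrieving several keys, including repeats, from an environment in a single call. It does not use the hash package at all; the helper name get_values and the store variable are hypothetical and only illustrate the lookup pattern.

```r
# Hypothetical helper, not part of the hash package: look up several keys
# (repeats allowed) in an environment used as a key-value store.
get_values <- function(env, keys) {
  vapply(keys, function(k) get(k, envir = env, inherits = FALSE), numeric(1))
}

store <- new.env()
assign("a", 1, envir = store)
assign("b", 2, envir = store)
assign("c", 3, envir = store)

# The same key may be requested more than once.
res <- get_values(store, c("a", "b", "c", "a"))
print(res)
```

The result is a named vector with one element per requested key, in the order requested, which is roughly the shape that values(h, keys=...) is described as returning above.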

Another change in the behavior was prompted by Mohammad Fahim of the Department of Computer Engineering and Computer Science at the University of Louisville. He wrote me to ask if there is a way to suppress warnings when trying to access non-existent keys. When accessing hashes hundreds of thousands of times, it becomes a drag to continually see:

key: xxxx not found in the hash : hash_table_name

I have refactored the behavior to be more R-like by following na.action-type conventions. Now the default behavior is to return NA when trying to access non-existing keys.


> library(hash)
> h <- hash( c('a','b','c'), 1:3 )
> h[ letters[1:5] ]
<hash> containing 6 key-value pair(s).
a : 1
b : 2
c : 3
d : NA
e : NA

The behavior is also controllable through the na.hash.action option. Functions are provided for the most common use cases:

  • na.default.hash (default) returns NA silently,
  • na.fail.hash (old default) raises an error on non-existing keys,
  • na.warn.hash returns NA but issues a warning.

A behavior is selected by setting the na.hash.action option. For example, to restore the old default behavior:


> options( na.hash.action = na.fail.hash )
> h$d
Error: key, d, not found in hash.
> h[[ 'd' ]]
Error: key, d, not found in hash.

And, for the [ and [[ methods, this behavior can be declared at access time:


> h[[ 'd', na.action=na.warn.hash ]]
Warning: key, d, not found in hash.
d
NA
> h[[ 'd', na.action=na.fail.hash ]]
Error: key, d, not found in hash.
> h[[ 'd', na.action=na.default.hash ]]
d
NA

If you don’t like these hash-key-miss behaviors, you are free to write your own. A handler function should, at a minimum, accept the hash and the key as arguments.
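
To make the handler idea concrete, here is a rough base-R sketch of the na.action-style pattern, using an environment as the store. It is not the hash package's implementation, and every name in it (lookup, on_miss, miss_default, miss_warn, miss_fail, miss_zero) is hypothetical; the exact signature the package expects from a custom handler may differ.

```r
# Hypothetical sketch in base R; none of these names come from the hash package.
# Each handler receives the store and the missing key, as described above.
miss_default <- function(store, key) NA
miss_warn <- function(store, key) {
  warning("key, ", key, ", not found")
  NA
}
miss_fail <- function(store, key) stop("key, ", key, ", not found")
miss_zero <- function(store, key) 0  # a custom policy: treat misses as zero

# Look up one key, delegating to the handler when the key is absent.
lookup <- function(store, key, on_miss = miss_default) {
  if (exists(key, envir = store, inherits = FALSE)) {
    get(key, envir = store, inherits = FALSE)
  } else {
    on_miss(store, key)
  }
}

store <- new.env()
assign("a", 1, envir = store)

lookup(store, "a")                       # found: returns the stored value
lookup(store, "d")                       # missing: NA, silently
lookup(store, "d", on_miss = miss_zero)  # missing: the custom handler decides
```

Passing the handler as an argument mirrors the access-time form shown earlier, while a session-wide default could be kept in options() in the same spirit as na.hash.action.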

Thanks to both Matthias and Mohammad for your feedback.

New features are on their way, notably the ability to use any object as a key and to preserve the order of the hash. These are sometimes called Indexed Hashes. Look for that in the hash-2.00.x release. If you would like to see features added, contact me at cbrown -at- opendatagroup.com
