R Site Search with the ‘sos’ Package

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

REvolution Computing is a sponsor of the Bay Area R User Group, which gathered on Tuesday for its monthly meeting.  This month’s meeting was in a cozy setting at San Jose State University, hosted by David Czerwinski, who teaches data mining there.  The topic at hand was search: finding R packages that complement the functionality developers need.  Spencer Graves led the discussion on the most recent release of sos.  The sos package replaces the recently deprecated RSiteSearch package and helps narrow the growing list of available packages down to a manageable set that suits a developer’s needs.
The main capability of this package is the findFn function, which scans the “function” entries in Jonathan Baron’s “R site search” database and returns the matches in a data.frame of class findFn (Baron, 2009).
Here is an illustration using Petal.Length, a variable in Fisher’s well-known iris data set.
To look for this data set, one might first try the help.search function. Unfortunately, this function returns nothing in this case:

> help.search('petal.length')

No help files found with alias or concept or title matching ‘petal.length’ using regular expression matching.

help.search only searches the packages you’ve already installed, and, as the message above indicates, it matches only aliases, concepts and titles.  The findFn function, by contrast, returns a data frame that identifies all packages with help pages matching the search term.

> library(sos)

> (PL <- findFn('Petal.Length'))
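Before printing the result, it can help to look at its structure. The lines below are a minimal sketch, assuming the sos package is installed and an internet connection is available; the column names returned may differ between sos versions, so none are hard-coded here.

```r
library(sos)

# Assign without the outer parentheses so that nothing is printed yet
PL <- findFn('Petal.Length')

# The result is a data frame that also carries the class "findFn"
class(PL)

# One row per matching help page
dim(PL)

# Inspect the available columns rather than assuming their names
names(PL)
```

Wrapping the assignment in parentheses, as in the call above, also prints the object; for a findFn object, printing displays the table of matches in a web browser rather than in the console.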

Think about a difficult question you would like to answer with R.  If you are looking for general capabilities for a program design, this is a valuable tool for finding them.  The Comprehensive R Archive Network lists over 2000 active packages, a number that is growing quickly.  Searching the help pages of these packages lets you view summary statistics on the help pages that contain your phrase, so you can make an informed decision about where to start your work.  In addition, if you are preparing a talk on R, sos is a useful tool for identifying the packages written by the other speakers or by members of the audience.
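One way to get such package-level statistics is sketched below. It assumes sos is installed, an internet connection is available, and that the result has a Package column, which is the case in the sos versions I am aware of but is checked in the code rather than taken for granted.

```r
library(sos)

PL <- findFn('Petal.Length')

# Assumption: the result has a 'Package' column naming the package of each hit
stopifnot('Package' %in% names(PL))

# Count matching help pages per package, most hits first (base R only)
hits_per_package <- sort(table(PL$Package), decreasing = TRUE)
head(hits_per_package, 10)

# sos also provides a summary method for findFn objects
summary(PL)
```

Packages with many matching help pages are often a reasonable place to start reading, although a single well-matched page in a small package can be just as relevant.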

Now suppose you conduct more than one search for variations on a phrase.  Since the search returns exact matches to the search phrase, it may be useful to run several searches and combine the outcomes.  You can manipulate the search results as you would any other data frame.  Suppose you are working on a project that requires analyzing financial derivatives.  You can combine the results of two searches with the | operator to create the union of the two data frames:

> fd <- findFn('financial derivative')

> fo <- findFn('financial option')

> fdo <- fd | fo
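The union has a natural counterpart: sos also defines the & operator for findFn objects, which keeps only the help pages found by both searches. The following sketch, which assumes sos is installed and an internet connection is available, compares the sizes of the individual searches with their union and intersection. The names either and both are chosen here for illustration.

```r
library(sos)

fd <- findFn('financial derivative')
fo <- findFn('financial option')

either <- fd | fo  # union: help pages matched by at least one search
both   <- fd & fo  # intersection: help pages matched by both searches

# Compare the number of hits in each result
c(fd = nrow(fd), fo = nrow(fo), either = nrow(either), both = nrow(both))
```

The union is useful for casting a wide net over variations of a phrase, while the intersection narrows the list to pages relevant to both phrasings.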

The sos package is available now from your favorite CRAN mirror.  The code above is described in depth in the package vignette.


Spencer Graves will discuss search capabilities in R, as well as good practices in package development and remote collaboration, on September 15, 2010 for the San Francisco Bay Chapter of the Association for Computing Machinery.  Be sure to pencil in the date.  The time and location are to be determined.
