An efficient way to do dataset intersection

July 27, 2011

The main message is to use "match" to get the index of the needed rows and then select the rows by that index, instead of selecting by row names, which is much slower. Here is an example: In the example above, we know that rows with the same value in the 2nd column have the same values in columns 4 through the end. So, instead...

Read more »
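
The post's own example is not reproduced in this excerpt, so the following is only a minimal sketch of the idea, assuming a toy data frame. The names `big`, `id`, `value` and `wanted` are hypothetical and do not come from the original post.

```r
# Toy data: 'big' is a hypothetical data frame keyed by an 'id' column.
set.seed(1)
n <- 10000
big <- data.frame(id = paste0("g", seq_len(n)),
                  value = rnorm(n),
                  stringsAsFactors = FALSE)
rownames(big) <- big$id

# The subset of keys we want to pull out (1000 ids in random order).
wanted <- sample(big$id, 1000)

# Pattern the post advises against: select rows by row name.
by_name <- big[wanted, ]

# Pattern the post recommends: compute integer positions once with
# match(), then index the rows by position.
idx <- match(wanted, big$id)
by_index <- big[idx, ]

# Both approaches return the same rows in the same order.
stopifnot(identical(by_name$id, by_index$id))
```

Wrapping each selection in `system.time()` is one way to compare the two on your own data; the size of the difference will depend on the number of rows and keys.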

Social reception for R enthusiasts at Joint Statistics Meetings

July 27, 2011

At the JSM 2011 conference in Miami on Monday, August 1, Revolution Analytics will be hosting a cocktail reception for R users and anyone interested in R. From 5:30-7:30 at Emeril's Miami Beach House (in the JSM host hotel), we'll have appetizers, drinks, and the opportunity to socialize with R users from around the world. There will also be...

Read more »

Word Cloud in R

July 27, 2011

A word cloud (or tag cloud) can be a handy tool when you need to highlight the most commonly cited words in a text using a quick visualization. Of course, you can use one of the several on-line services, such as wordle or tagxedo,...

Read more »
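
A word cloud is driven by word frequencies, so the counting step can be sketched in base R. This is an illustration only, not the post's code; the sample sentence is made up.

```r
# Hypothetical sample text; any character string would do.
text <- "the quick brown fox jumps over the lazy dog the fox"

# Lower-case, split on anything that is not a letter, drop empty tokens.
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[nzchar(words)]

# Count occurrences and sort from most to least frequent.
freq <- sort(table(words), decreasing = TRUE)
head(freq, 3)
```

The resulting names and counts are the kind of input a plotting function such as `wordcloud()` from the CRAN package wordcloud expects; that package is not loaded here.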

RStudio available at cloudnumbers.com

RStudio™ is an integrated development environment (IDE) for the statistical software R (www.r-project.org). It combines an intuitive user interface with powerful coding tools to help you get the most out of R. cloudnumbers.com provides researchers and companies with access to resources to perform high performance calculations in the cloud. One often-used application at cloudnumbers.com’s

Read more »

Bayesian Core and loose logs

July 26, 2011

Jean-Michel (aka Jean-Claude!) Marin came for a few days so that we could make late progress on the revision of our book Bayesian Core towards a Use R! version. In one of the R programs in the mixture chapter, we were getting improbable answers, until we found an R mistake in the shape of which

Read more »

The Luck and Skill of Scrabble

July 26, 2011

Scrabble is a game that involves both skill and luck. There's skill in knowing the words you can play and — especially — the most advantageous ways to play them. But there's also luck in the tiles you draw randomly from the bag: get saddled with a rack containing four I's and there's usually not much you can do....

Read more »

Two-way CRAN

July 26, 2011

Sooner or later, every useR will manage to exhaust R’s built-in capabilities and land on CRAN looking for his dream needle in a haystack of 3k+ contributed packages. Probably most of you already know stuff like Task Views or rseek, which make finding something relevant a bit easier than digging through the full list or googling; however, all methods

Read more »

JAGS 3.0.0 is released

July 25, 2011

Somewhat later than I originally planned, JAGS 3.0.0 is released and can be downloaded from Sourceforge. The corresponding R interface package rjags has been uploaded to CRAN. There are no new modules or features in this release. I have been …

Read more »

Rahul Dravid – a legend in Test Cricket

July 25, 2011

Rahul Dravid is a fantastic cricketer, and a role model for the younger generation: focused, hardworking and humble. Dravid recently became the 2nd-highest scorer in Test Cricket (Sachin Tendulkar is the leading scorer). His contribution to India...

Read more »

Really useful R package: sas7bdat

July 25, 2011

For SAS users, one hassle in trying things in R, let alone migrating, is the difficulty of getting data out of SAS and into R. In our book (section 1.2.2) and in a blog entry we've covered getting data out of SAS native data sets. Unfortunately, for ...

Read more »

Getting to know multivariate data

July 25, 2011

psych::pairs.panels and corrgram::corrgram using mtcars data. Core ideas: multivariate modeling is challenging; pair plots make it easy to get a quick understanding of each variable and the relationships between them. Multivariate analysis and modeling can be really challenging. Getting the job done well requires you to know your data really well. People often use the

Read more »

Scatterplot matrices in R

July 25, 2011

I just discovered a handy function in R to produce a scatterplot matrix of selected variables in a dataset. The base graphics function is pairs(). Producing these plots can be helpful in exploring your data, especially using the second method below. Try...

Read more »
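
The excerpt stops before the post's own calls, so here is a minimal sketch of two common ways to call pairs(), assuming the built-in mtcars data. Whether either matches the post's "second method" cannot be told from the excerpt.

```r
# Select a few numeric columns from the built-in mtcars data set.
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Draw to a temporary PDF so the sketch also runs non-interactively.
out <- tempfile(fileext = ".pdf")
pdf(out)

# Data-frame interface: one panel per pair of columns.
pairs(vars, main = "mtcars: selected variables")

# Formula interface: columns are named explicitly, and panel.smooth
# adds a smoothed trend line to each panel.
pairs(~ mpg + disp + hp + wt, data = mtcars, panel = panel.smooth)

dev.off()
```

In an interactive session the pdf() and dev.off() calls can be dropped so the plots appear on the screen device.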

Nick Stoke’s Improvements

July 25, 2011

Fast on the heels of getting RomanM’s code up and running in RghcnV3, Nick Stokes whipped out a version of his approach, which he covered on his blog. We exchanged code and a few mails thrashing through details, and I’m now in a position to start the integration work of his approach into RghcnV3. In

Read more »

The Road To Default: Still No Agreement

July 25, 2011

Another day and yet another failed agreement. The U.S. is in trouble, folks. Even if a debt deal is reached, there is still widespread consensus that the U.S. will lose its AAA credit rating. More to come later. Keep dancin', Steven J.

Read more »

The R-Files: Jeff Ryan

July 25, 2011

"The R-Files" is an occasional series from Revolution Analytics, where we profile prominent members of the R Community.

Name: Jeff Ryan
Profession: Owner/Principal at Lemnica; Committee Member at R/Finance
Nationality: American
Years Using R: 8
Known for: R/Finance Conference, quantmod and xts packages

Jeffrey Ryan is a Chicago-based quantitative software analyst and avid R user. He is perhaps best...

Read more »

sas7bdat reader ported to ActionScript

July 25, 2011

By Brian Kimball: http://code.google.com/p/sasquatch

Read more »

Creating svg graphics for web publishing

July 25, 2011

Thanks to the nice post from Revolution Analytics I was finally able to get an svg device working on my Windows OS version of R. It took some additional tips from a fellow user of blogger to...

Read more »

Welcome to the CV blog!

July 25, 2011

It is almost a year since CrossValidated was launched. Today we start a new activity at CrossValidated: a community blog. It is the fourth place (after the main site, meta and chat) for getting in touch with the community and contributing to it. To get started, we plan to post a series of posts about the

Read more »

Ternary sorting

July 24, 2011

The last Le Monde puzzle made me wonder about the ternary version of the sorting algorithms, which all seem to be binary (compare x and y, then…). The problem is, given (only) a blackbox procedure that returns the relative order of three arbitrary numbers, how many steps are necessary to sort a series of n

Read more »

Crazy RUT

July 24, 2011

I have noticed that the Russell 2000 (RUT) acts very differently from most of the other indexes that I have studied.  If we apply the system shown in Shorting Mebane Faber to RUT and then extend it with a simple slope, we notice something very dif...

Read more »

PC-Axis with R: pxR

PC-Axis is a software family consisting of a number of programs for the Windows and Internet environment used to present statistical information. It is used by national and international institutions to publish statistical data. Programs in the PC-Axis family use a particular data file format (see the full PX-Axis data format description). Now the pxR

Read more »

BSE Bhavcopy with Delivery Quantity

July 24, 2011

One of my TI forum members, IV, had a requirement for BSE quotes along with delivery quantity. This led me to use the "merge" function in R (thanks to the great work done by the people behind various packages and the guidance available on the R mailing lists)...

Read more »
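
The excerpt does not show how the two data sets were combined, so the following is only a sketch of merge() on two small tables. The data frames `quotes` and `delivery`, their column names, and every value in them are made up for illustration and are not real BSE data.

```r
# Hypothetical toy data: names and numbers are invented for illustration.
quotes <- data.frame(code = c(500010, 500112, 500325),
                     close = c(650.5, 2300.0, 870.25))
delivery <- data.frame(code = c(500112, 500325, 532540),
                       delivery_qty = c(12000, 45000, 8000))

# Inner join on the shared key: only codes present in both tables survive.
both <- merge(quotes, delivery, by = "code")

# Left join: keep every quote, with NA where delivery data is missing.
all_quotes <- merge(quotes, delivery, by = "code", all.x = TRUE)
```

Choosing `all.x = TRUE` keeps quotes that have no delivery record, which may or may not be what a given report needs.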

Parallel JAGS RNGs

July 23, 2011

As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples

Read more »

RcppArmadillo 0.2.26 and 0.2.27

July 23, 2011

Earlier this week, Conrad Sanderson issued a minor bug fix release 2.0.2 of his Armadillo library which provides templated C++ code for linear algebra. We wrapped that into a new RcppArmadillo release 0.2.26 and shipped it to CRAN. Due to it being su...

Read more »

Passing non-graphical parameters to graphical functions using …

July 23, 2011

Argument passing via ... is a great feature of the R language, allowing you to write wrappers around existing functions that do not need to list all the arguments of the wrapped function. ... is used extensively in S3 methods and in passing graphical parameters on to graphical functions. When writing your own plot methods, using ... allows the...

Read more »
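
The excerpt ends before the post's solution, so this is a sketch of one common approach rather than the author's method: name the non-graphical argument in the wrapper's signature so that it is consumed there and only the remaining arguments travel on through `...`. The function `my_plot` and its `digits` argument are hypothetical.

```r
# Hypothetical wrapper: 'my_plot' and 'digits' are illustrative names.
my_plot <- function(x, y, digits = 2, ...) {
  # 'digits' is a named formal, so it is used here and never reaches plot().
  label <- paste("r =", round(cor(x, y), digits))
  # Everything left in ... (col, pch, main, and so on) is forwarded as-is.
  plot(x, y, sub = label, ...)
  invisible(label)
}

# Draw to a temporary PDF so the sketch also runs non-interactively.
out <- tempfile(fileext = ".pdf")
pdf(out)
res <- my_plot(mtcars$wt, mtcars$mpg, digits = 3, col = "blue", pch = 19)
dev.off()
```

Here `col` and `pch` are passed through to plot(), while `digits` only affects the correlation label.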

RomanM’s Method

July 22, 2011

I’ve succeeded in getting a version of RomanM and JeffId’s Thermal hammer working with version 1.3 of RghcnV3. This is going to be a long post because there is a lot of ground to cover. First, some errata: the “Globe” demo in V1.3 appears to have a missing line. It looks like an editor bug, so

Read more »

Clustering U.S. Senators using roll call voting data

July 22, 2011

For our forthcoming book on machine learning for hackers, John Myles White and I will discuss clustering, and various methods for doing so. One common method for clustering observations

Read more »

IBM Netezza: Embrace open source analytics

July 22, 2011

Earlier this month Thomas Dinsmore, solutions architect for IBM Netezza’s Advanced Analytics team, had a great blog post on why companies should embrace R as an analytics platform. He says: There are three main reasons R should be part of your enterprise analytics architecture: R has capabilities not available in commercial analytics software; usage of R by analysts is...

Read more »