## Merging Two Different Datasets Containing a Common Column With R and R-Studio

August 2, 2011
Another way for the database challenged (such as myself!) to merge two datasets that share at least one common column… This recipe uses the cross-platform stats analysis package, R. I use R via the R-Studio client, which provides an IDE wrapper around the R environment. So for example, here’s how to merge a couple of
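
As a rough illustration of the kind of merge this excerpt describes, here is a minimal sketch using base R's `merge()`. The data frames `left` and `right` and the shared column `id` are made up for the example and are not taken from the original post.

```r
# Two small, hypothetical data frames sharing the column "id"
left  <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), score = c(10, 20, 30))

# Inner join: keep only the ids present in both data frames
inner <- merge(left, right, by = "id")

# Full outer join: keep every id, filling the gaps with NA
outer <- merge(left, right, by = "id", all = TRUE)

print(inner)
print(outer)
```

Setting `all.x = TRUE` or `all.y = TRUE` instead of `all = TRUE` would give a left or right join respectively.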

## JSM 2011 [3]

August 2, 2011
Monday August 01 was the first full day of JSM 2011 and full is the appropriate word to describe the day! It started for me at 7am with a round table run by Marc Suchard on parallel computing (or at 3am if I am considering the time I woke up!). I was rather out of

## Dividend Quartiles with Kenneth French Data

August 1, 2011
Based on my perception of the last 3 years, I would have expected high dividend stocks to have substantially underperformed low and zero dividend stocks.  Fortunately, just like with size and momentum in Beating Kenneth French Small – High, we c...

## LaTeX Typesetting – Basic Mathematics

August 1, 2011
LaTeX is very strong for typesetting mathematical equations. Other useful resources are provided on the Supplementary Material page.

## Google Trends, R, and the NFL

August 1, 2011
A week or so ago I saw a tweet about how the NFL lockout was affecting the search traffic for “fantasy football” on Google (using Google Trends). Basically, the search traffic (properly normalized on Google Trends) was down prior to …

## On not going viral

August 1, 2011
This week the reader is directed to Messy Matters to read up on research conducted by Sharad Goel, Duncan Watts and Dan Goldstein in which they hunted for traces of "viral" diffusion on Twitter, Facebook, Yahoo!, and beyond. The results run counter to mainstream intuition.

## ISMB coverage on Twitter? It’s possible there was…

July 31, 2011
Peter writes: I wonder if part of the drop off is live bloggers moving to platforms like Twitter? I can tell you it seemed like there were almost as many tweets for one SIG (#bosc2011) as for the whole of #ISMB / #ECCB2011, and I personally didn’t post anything to FriendFeed but posted lots on

## CRU Data in RghcnV3

July 31, 2011
As many have noted, CRU posted their data a while ago, and the usual gang has started to put it through the various “engines” for calculating a global temperature index. Thanks to others who worked this problem ahead of me, getting the data into RghcnV3 was not that hard, but it was not without

## Welcome to rOpenSci

July 31, 2011
rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science. So what is Open Science?  Open science is the practice of making various elements of scientific research — data & methods, code & software, and results & publications — readily accessible to anyone. While this has great potential for advancing research (in

## Taking August off!

July 31, 2011
We'll be back with recharged batteries and lots of new entries in September. Have a great summer*! As usual, please send any questions you have about using SAS or R. *Not valid in the southern hemisphere.

## Your Data is Never the Right Shape

July 31, 2011
One of the recurring frustrations in data analytics is that your data is never in the right shape. Worst case: you are not aware of this and every step you attempt is more expensive, less reliable and less informative than you would want. Best case: you notice this and have the tools to reshape your

## Getting My Eye In Around F1 Quali Data – Parallel Coordinate Plots, Sort of…

July 30, 2011
Looking over the sector times from the qualifying session for tomorrow’s Hungarian Grand Prix, I noticed that Vettel was only fastest in one of the sectors. Whilst looking for an easy way of shaping an R data frame so that I could plot categorical values sector1, sector2, sector3 on the x-axis, and then a line
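
One way to get sector names onto a categorical x-axis is to reshape the data frame from wide to long. The sketch below uses base R's `stack()`; the data frame `quali`, its driver labels and its times are invented for illustration, and this is not necessarily the approach the original post settles on.

```r
# Hypothetical wide-format sector times (values are made up)
quali <- data.frame(
  driver  = c("A", "B"),
  sector1 = c(28.1, 28.3),
  sector2 = c(29.0, 28.8),
  sector3 = c(22.5, 22.7)
)

sectors <- c("sector1", "sector2", "sector3")

# stack() puts all sector columns into one "values" column plus an "ind"
# factor naming the source column; the driver column is repeated to match
long <- data.frame(
  driver = rep(quali$driver, times = length(sectors)),
  stack(quali[, sectors])
)
names(long) <- c("driver", "time", "sector")

print(long)
```

In the long form, `sector` is a factor that a plotting function can place on the x-axis, with one line per `driver`.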

July 30, 2011
This past Friday, the web portal to the US Federal government, USA.gov, organized hackathons across the US for programmers and data scientists to work with and analyze the data from their link-shortening service. It turns out that if you shorten a web link with bit.ly, the shortened link looks like 1.usa.gov/V6NpL (that one goes to

## RStudio 0.94.92 visited

July 30, 2011
I just updated my RStudio version to the latest, v.0.94.92 (will this asymptotically approach 1, or actually get to 1?). It was nice to see the number of improvements the development team has implemented, based I’m sure on community feedback. The team has, in my experience, been extraordinarily responsive to user feedback, and I’m sure

## Forking Myself

July 29, 2011
I’ve spent some time forking myself. Over the past few days when I could steal away an hour here  or there I decided to make a big change to the package. But it’s a good change. First some book-keeping. The Romantest.R file has a minor bug in it. Not really a bug, I just pulled

## Splitting Vectors of Uneven Strings

July 29, 2011
Suppose you have a vector of names such that the first three words of each element contain relevant information, but there is a bunch of extraneous stuff. Our goal is to collapse the first three words into one contiguous string (without the ...
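
A minimal sketch of this idea in base R follows. The input vector `x` and the helper `collapse_first_three` are hypothetical, and the sketch assumes that "contiguous" means joining the three words with no separator, which the truncated excerpt does not confirm.

```r
# Hypothetical input: three useful words followed by extraneous text
x <- c("John Q Public extra stuff here", "Mary Ann Smith and more")

# Split one string on runs of spaces, keep the first three words,
# and paste them together with no separator (an assumption)
collapse_first_three <- function(s) {
  words <- strsplit(s, " +")[[1]]
  paste(head(words, 3), collapse = "")
}

out <- vapply(x, collapse_first_three, character(1), USE.NAMES = FALSE)
print(out)
```

`vapply()` is used rather than `sapply()` so that the result is guaranteed to be a character vector with one element per input string.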

## Text Editors in The Lord of the Rings

July 29, 2011
Prompted by a passing thought about TextMate, I thought I'd make a comprehensive, accurate, unbiased, and irrefutable survey of text editors by way of comparison to locations in The Lord of the Rings. TextMate: Minas Tirith A once-great but now decaying city. Only the King has the power to renew it, but he is long absent, indeed...

## NppToR 2.6.0 beta 2

July 29, 2011
http://sourceforge.net/projects/npptor/files/npptor%20installer/NppToR-2.6.0.beta2.exe/download I’ve released beta 2 of NppToR 2.6.0. Please take a look and report any problems. This improves the installer and the uninstaller and fixes a few bugs that popped up from the transition to UNICODE.

## The Road to Default: Whaa???

July 29, 2011
Okay so here is what has been happening: The yield curve has been going through a mad flattening, indicating that investors are "flying to safety" and that a recession may be looming around the corner. Why has it been flattening? Well, a string of bad n...

## multi-platform real-time ‘intro’ in R using rdyncall

July 29, 2011
Guest post by Daniel Adler. Below is a real-time audio-visual multimedia demonstration – or in short ‘an intro’ – written in 100% pure R. It requires no compilation and runs across major platforms via the package rdyncall and preinstalled precompiled standard libraries such as OpenGL and SDL libraries. This ‘happy-birthday’ production runs about 3 minutes

## Financial Engineering with R

July 29, 2011
At the InformationManagement blog, Steve Miller talks about the applications of R to financial engineering, and reviews David Ruppert's book Statistics and Data Analysis for Financial Engineering. InformationManagement: Statistics and Financial Engineering

## Infovis vs. statgraphics: A clear example of their different goals

July 29, 2011
I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives. Here’s the image (link from Tyler Cowen): That’s the infovis. The statgraphic version would simply be a dotplot, something like this: (I purposely used the default settings in R with only minor modifications here to demonstrate what

## [R][ggplot2][R-bloggers]RcmdrPlugin.KMggplot2_0.0-3 is on CRAN now

July 28, 2011
RcmdrPlugin.KMggplot2 (CRAN) I posted an Rcmdr plug-in for a “ggplot2” GUI front-end on CRAN. This version supports Kaplan-Meier plots and other plots as follows: Kaplan-Meier plot Show no. at risk on inside Show no. at risk table on outside Histogram Colo

## Text Editors in The Lord of the Rings

July 28, 2011
Prompted by a passing thought about TextMate, I thought I’d make a comprehensive, accurate, unbiased, and irrefutable survey of text editors by way of comparison to locations in The Lord of the Rings. TextMate: Minas Tirith A once-great but now decaying city. Only the King has the power to renew it, but he is long absent, indeed...

## Challenge alert — material identification

July 28, 2011
We start yet another series of posts: challenge alerts. This series is intended to share news about machine learning or data mining challenges which may be interesting to the members of our community, possibly with some brief introduction to the problem. So if you hear about some contest, notify us on Skewed distribution. Today

## Le Monde puzzle [#29]

July 28, 2011
This week, the puzzle from the weekend edition of Le Monde was easy to state: in the sequence (8+17n), is there a 6th power? a 7th? an 8th? If so, give the first occurrence. So I first wrote R code for a function testing whether an integer is any power: (The function returns the
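
The puzzle can also be attacked from the other direction, and the sketch below does that. It is not the power-testing function the post describes: it searches over bases k for the smallest k with k^p ≡ 8 (mod 17). Because k^p mod 17 depends only on k mod 17, checking k = 1, …, 17 is enough to decide whether any solution exists. The function name `first_power_in_sequence` and its arguments are hypothetical, and the sketch assumes n ≥ 0 and 0 ≤ a < m.

```r
# Smallest p-th power of the form a + m * n (n >= 0), or NULL if none exists.
# Assumes 0 <= a < m; k^p stays well below 2^53 here, so doubles are exact.
first_power_in_sequence <- function(p, a = 8, m = 17) {
  for (k in 1:m) {
    value <- k^p
    if (value %% m == a) {
      return(c(k = k, value = value, n = (value - a) / m))
    }
  }
  NULL  # no residue class works, so no p-th power appears in the sequence
}

r6 <- first_power_in_sequence(6)  # k = 6:  6^6  = 46656     = 8 + 17 * 2744
r7 <- first_power_in_sequence(7)  # k = 15: 15^7 = 170859375 = 8 + 17 * 10050551
r8 <- first_power_in_sequence(8)  # NULL: 8th powers mod 17 are only 0, 1 or 16
```

Under these assumptions the sequence contains a 6th and a 7th power but no 8th power.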