Comparing two-dimensional data sets in R; take II

March 10, 2011

David commented on yesterday's post and suggested putting the continuous fitted distribution in the background and the discrete, empirical distribution in the foreground. This looks quite nice, although there's a slight optical illusion that makes the c...

Read more »

Beeswarm Boxplot (and plotting it with R)

March 10, 2011

(The image above is called a “Beeswarm Boxplot”; the code for producing this image is provided at the end of this post.) The above plot is implemented under different names in different software packages. This “Scatter Dot Beeswarm Box Violin – plot” (for lack of an agreed-upon term) is a one-dimensional scatter plot

Read more »

R: Drop factor levels in a dataset

March 9, 2011

R has factors, which are very cool (and somewhat analogous to labeled levels in Stata). Unfortunately, the factor list sticks around even if you remove some data such that no examples of a particular level still exist. After creating some fake data x and subsetting it, levels(x) still shows the same levels, and table(x) lists one level with 0 entries! The solution is...
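
The excerpt stops before the fix. As a minimal sketch, and not necessarily the author's solution, two base-R ways to discard unused levels are `droplevels()` and re-applying `factor()`. The vector `x` and its values are made up for illustration.

```r
# Hypothetical example data (not from the original post)
x <- factor(c("a", "b", "c", "a", "b"))

# Remove every observation of level "c"
x <- x[x != "c"]

levels(x)  # "c" is still listed as a level
table(x)   # ... and appears with a count of 0

# Option 1: droplevels() removes levels that no longer occur
x_dropped <- droplevels(x)

# Option 2: re-creating the factor has the same effect
x_refactored <- factor(x)

levels(x_dropped)
```

`droplevels()` also accepts a data frame, in which case it drops unused levels from every factor column.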

Read more »

Howling Winds and Stochastic Tones

March 9, 2011

My greatest pleasures in mathematics come from observing--and here, listening to--the interplay of simple and complex. With a few axioms and definitions you can create surprising worlds, and in what seems like a mess you can find beautiful regularities. It's damn sexy, frankly. Here, I use a simple recursive equation to directly generate my sounds

Read more »

The R-Files: Call for Nominations

March 9, 2011

We run an occasional series here on Revolutions called The R-Files, in which we profile members of the R community. Our intention with this series is to call out noteworthy work being done for open-source R and popular CRAN packages, and shine a light on some of the noteworthy individuals that make up what is now a broad community...

Read more »

Special issue of TOMACS

March 9, 2011

TOMACS (ACM Transactions on Modeling and Computer Simulation) is issuing a call for papers. The special topic is Monte Carlo Methods in Statistics, and Arnaud Doucet and I are the special issue editors. Here are the details: Over the last two decades Monte Carlo methods have attracted much attention from statisticians as they provide

Read more »

Tips on installing R extension for Rapidminer on Mac OS X

March 9, 2011

Rapidminer is a cool toy to play with machine-learning/data-mining algorithms and it can interface with R. However, it was a bit problematic for me to get the R extension working properly on Mac OS X Leopard for R 2.11. Here is what works for me at the...

Read more »

In case you missed it: February Roundup

March 9, 2011

In case you missed them, here are some articles from February of particular interest to R users. Revolution R Enterprise 4.2 is now available to subscribers, and for free download to academics. A brief report from the Strata: Working with Data conference, and a comprehensive review from Ted Leung. A profile of prolific R contributor Dirk Eddelbuettel. A list...

Read more »

Comparing two-dimensional data sets in R

March 9, 2011

I wanted to fit a continuous function to a discrete 2D distribution in R. I managed to do this by using nls, and wanted to display the data. I discovered a nice way to compare the actual data and the fit using ggplot2, where the background is the real ...
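
To illustrate the kind of fit described, here is a minimal sketch using `nls` on simulated data. The 2D Gaussian form, the grid, the parameter names (`A`, `mx`, `my`, `sx`, `sy`) and the starting values are all assumptions; the excerpt does not say which function the author fitted.

```r
set.seed(1)

# Hypothetical discrete 2D data: a noisy Gaussian bump on an integer grid
grid <- expand.grid(x = -5:5, y = -5:5)
truth <- c(A = 10, mx = 0.5, my = -0.5, sx = 1.5, sy = 2.5)
bump <- truth[["A"]] * exp(-((grid$x - truth[["mx"]])^2 / (2 * truth[["sx"]]^2) +
                             (grid$y - truth[["my"]])^2 / (2 * truth[["sy"]]^2)))
grid$z <- bump + rnorm(nrow(grid), sd = 0.05)

# Nonlinear least squares fit of the same functional form
fit <- nls(
  z ~ A * exp(-((x - mx)^2 / (2 * sx^2) + (y - my)^2 / (2 * sy^2))),
  data  = grid,
  start = list(A = 8, mx = 0, my = 0, sx = 2, sy = 2)
)

# Fitted surface evaluated on the same grid, ready for plotting
grid$fitted <- predict(fit)
est <- coef(fit)
est
```

With `grid$z` and `grid$fitted` in one data frame, the observed and fitted values can be drawn as separate layers of the same plot.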

Read more »

Forest plots using R and ggplot2

March 9, 2011

Abhijit over at Stat Bandit posted some nice code for making forest plots using ggplot2 in R. You see these often in meta-analyses, or as seen in the BioVU demonstration paper. The idea is simple - on the x-axis you have the odds ratio (or what...

Read more »

My First Few Days with RStudio

March 9, 2011

As most readers are probably aware, the free IDE for R, called RStudio, was recently released for general use and it immediately made huge waves within the R community. IDE stands for Integrated Development Environment. IDEs typically provide a rich set of tools for developing in some target language. For standard programming languages like C++ (Visual Studio) and Java (Eclipse or NetBeans),...

Read more »

Playing with quantiles, part 2

March 8, 2011

It is common to look at the best time in the marathon, or perhaps at the distribution of the top 100, as done by John Myles White on his blog here (data can be found there), as in the graph below, with the density of the time for the first 100 men (in blue) a...

Read more »

Playing with quantiles, part 1

March 8, 2011

A standard idea in extreme value theory (see e.g. here, in French unfortunately) is that to estimate the 99.5% quantile (say), we just need to estimate a quantile of level 95% for observations exceeding the 90% quantile. In extreme value theory,...
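
The arithmetic behind this: an observation lies above the target quantile if it exceeds the 90% threshold (probability 0.10) and then falls in the top 5% of the exceedances, and 0.10 × 0.05 = 0.005 = 1 − 0.995. A quick empirical check on simulated data (the standard normal sample and the variable names are assumptions for illustration):

```r
set.seed(42)
x <- rnorm(1e5)

# Threshold: the 90% quantile of the full sample
u <- quantile(x, 0.90)

# Observations exceeding the threshold
exceed <- x[x > u]

# Direct estimate versus the two-step estimate
q_direct   <- quantile(x, 0.995)
q_two_step <- quantile(exceed, 0.95)

c(direct = unname(q_direct), two_step = unname(q_two_step))
```

With purely empirical quantiles the two routes agree up to interpolation between neighbouring order statistics.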

Read more »

Ascii code table in R

March 8, 2011

A quick method to enumerate the printable ASCII characters with their hex & decimal values. The following code relies on taking the "raw" value of a base 10 int (this gives a hex value), and then using the builtin function rawToChar, which gives a character. You can of course change the range (up to 255). Not sure and haven't tested,...
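
The post's own code is not included in the excerpt. A minimal sketch of the approach it describes, assuming the printable range 32 to 126 and a hypothetical data frame named `ascii`:

```r
# Printable ASCII codes: 32 (space) through 126 (~)
codes <- 32:126
bytes <- as.raw(codes)

ascii <- data.frame(
  dec  = codes,
  hex  = as.character(bytes),                # raw values print as two-digit hex
  char = rawToChar(bytes, multiple = TRUE),  # one character per byte
  stringsAsFactors = FALSE
)

head(ascii)
```

Passing `multiple = TRUE` makes `rawToChar` return one string per byte instead of a single concatenated string.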

Read more »

Can one beat a Random Walk– IMPOSSIBLE (you say?)

March 8, 2011

Firstly, apologies for the long absence as I've been busy with a few things.  Secondly, apologies for the horrific use of caps in the title (for the grammar monitors).  Certainly, you'll gain something useful from today's musing, as it's a pr...

Read more »

Challenge: Visualizing the US Federal Budget

March 8, 2011

Google today announced a Data Visualization Challenge that is well suited to the graphical capabilities of R. The goal is to visualize the US Federal budget from the point of view of the taxes an individual pays. The data are available from whatwepayfor.com -- their FAQ gives details about the source of the data and the philosophy of making...

Read more »

New R IDE

March 8, 2011

I'm always looking for ways to improve my workflow and overall academic efficiency. I've tried a variety of text editors, GUIs, and integrated development environments (IDEs) for R. I have some preferences but I haven't found anything that I'm complete...

Read more »

A Short Return to the Age-Earnings Profile

March 8, 2011

Two posts ago I mentioned the age-earnings profile but did not provide a regression of log earnings on wage. I also offered, without evidence, that fitting a simple linear regression would be inappropriate. How do I know that? How could …

Read more »

Splitting a Dataset Revisited: Keeping Covariates Balanced Between Splits

March 8, 2011

In my previous post I showed you how to randomly split up a dataset into training and testing datasets. (Thanks to all those who emailed me or left comments letting me know that this could be done using other means. As things go with R, it's sometimes ...
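
The previous post's method is not shown in the excerpt. One common way to do a plain random split, sketched here on the built-in `iris` data with an arbitrary 70/30 ratio (both are assumptions for illustration):

```r
set.seed(123)

# iris ships with R; the 70/30 ratio is an arbitrary choice
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))

train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Compare a covariate across the splits to see how balanced they came out
table(train$Species)
table(test$Species)
```

A purely random split like this gives no guarantee that covariates such as `Species` are evenly represented in both parts, which is the problem the post's title refers to.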

Read more »

Blackbox trading Strategy using Rapidminer and R II

March 8, 2011

It has been a long time since I updated the blog, for lack of time (again) due to new professional and personal challenges. Continuing with the black-box strategy: following recommendations from several readers, and lacking the time to write a good tutorial on the model, I’m going to make available the file with...

Read more »

R Studio

March 8, 2011

If you think that R is The EnvironmentForStatisticalAnalysisAndGraphics but you do not think that Vim is The Editor for text files you might want to have a look at R Studio. It works on Windows, MacOS and Linux. I tried it out on my Ubuntu

Read more »

In case you missed it: January Roundup

March 8, 2011

Catching up on roundups today. February roundup will follow soon, but in the meantime enjoy this trip down memory lane - DS. In case you missed them, here are some articles from January of particular interest to R users. Revolution Analytics is now offering annual sponsorship grants for local R user groups worldwide. Issue 2 of the R Journal...

Read more »

Video Tutorial on Instrumental Variables in R

March 8, 2011

Update: I have replaced this video tutorial with a video tutorial on a newer, easier-to-use IV regression command. Check out that command here. In this video, I show how to use my instrumental variables function in R, ivreg(), along with its companion ...

Read more »

IV Regression

March 8, 2011

Here is my code from a previous post that performs IV regression. This may be easier to copy into an R script. I will post a video tutorial using this code shortly.
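
The author's code sits behind the link and is not reproduced here. For orientation only, the following sketch shows the textbook two-stage least squares idea with plain `lm` on simulated data. The data-generating process and all variable names are assumptions, and this is not the author's function.

```r
set.seed(2011)
n <- 5000

z <- rnorm(n)                      # instrument
u <- rnorm(n)                      # unobserved confounder
x <- 0.8 * z + u + rnorm(n)        # endogenous regressor
y <- 1 + 2 * x + 3 * u + rnorm(n)  # true effect of x on y is 2

# Naive OLS is biased because x is correlated with the omitted u
ols <- coef(lm(y ~ x))[["x"]]

# Stage 1: project x on the instrument
x_hat <- fitted(lm(x ~ z))

# Stage 2: regress y on the stage-1 fitted values
iv <- coef(lm(y ~ x_hat))[["x_hat"]]

c(ols = ols, iv = iv)
```

The point estimate from the second stage is the 2SLS estimate, but the standard errors reported by that second `lm` call are not valid, which is one reason to use a dedicated IV routine in practice.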

Read more »