## Forest plots using R and ggplot2

March 9, 2011
By

Abhijit over at Stat Bandit posted some nice code for making forest plots using ggplot2 in R. You see these often in meta-analyses, or in the BioVU demonstration paper. The idea is simple: on the x-axis you have the odds ratio (or what...
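
A minimal sketch of the idea, not Abhijit's code: the study names, log odds ratios and standard errors below are hypothetical, and the intervals are plain Wald 95% intervals computed on the log scale. It assumes ggplot2 3.3.0 or later, where `geom_pointrange()` draws horizontal intervals when given `xmin`/`xmax`.

```r
library(ggplot2)

# Hypothetical study-level estimates: log odds ratios and their standard errors
studies <- data.frame(
  study  = c("Study A", "Study B", "Study C", "Study D"),
  log_or = c(0.25, -0.10, 0.40, 0.05),
  se     = c(0.10, 0.15, 0.20, 0.08)
)

# Odds ratios with 95% Wald intervals, computed on the log scale and back-transformed
z <- qnorm(0.975)
studies$or    <- exp(studies$log_or)
studies$lower <- exp(studies$log_or - z * studies$se)
studies$upper <- exp(studies$log_or + z * studies$se)

p <- ggplot(studies, aes(x = or, y = study, xmin = lower, xmax = upper)) +
  geom_pointrange() +                                # point estimate with horizontal CI
  geom_vline(xintercept = 1, linetype = "dashed") +  # line of no effect
  scale_x_log10() +                                  # ratios are symmetric on a log scale
  labs(x = "Odds ratio (95% CI)", y = NULL)

print(p)
```

The log-scaled x-axis keeps an odds ratio of 0.5 and one of 2 equally far from the dashed line at 1.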

## My First Few Days with RStudio

March 9, 2011
By

As most readers are probably aware, the free IDE for R, called RStudio, was recently released for general use, and it immediately made huge waves within the R community. IDE stands for Integrated Development Environment. IDEs typically provide a rich set of tools for developing in some target language. For standard programming languages like C++ (Visual Studio) and Java (Eclipse or NetBeans),...

## Playing with quantiles, part 2

March 8, 2011
By

It is common to look at the best time in the marathon, or perhaps at the distribution of the top 100 times, as John Myles White did on his blog here (the data can be found there), as in the graph below, with the density of the times for the first 100 men (in blue) a...

## Playing with quantiles, part 1

March 8, 2011
By

A standard idea in extreme value theory (see e.g. here, in French unfortunately) is that to estimate the 99.5% quantile (say), we just need to estimate a quantile of level 95% for observations exceeding the 90% quantile. In extreme value theory,...
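
A small simulation of that identity in R. Because P(X > q) = P(X > u) * P(X > q | X > u), taking u as the 90% quantile and then the 95% quantile of the exceedances leaves 0.10 * 0.05 = 0.005 in the upper tail, which is the 99.5% quantile. The exponential sample is an arbitrary choice; actual extreme value methods go further and fit a generalized Pareto distribution to the exceedances, which this sketch does not do.

```r
set.seed(2011)
x <- rexp(1e5, rate = 1)  # simulated sample; the distribution is an arbitrary choice

# Direct empirical 99.5% quantile
q_direct <- quantile(x, 0.995, names = FALSE)

# Two-step version: keep observations above the 90% quantile,
# then take the 95% quantile of those exceedances
u <- quantile(x, 0.90, names = FALSE)
q_tail <- quantile(x[x > u], 0.95, names = FALSE)

# Compare both estimates with the exact exponential quantile
c(direct = q_direct, two_step = q_tail, exact = qexp(0.995))
```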

## Ascii code table in R

March 8, 2011
By

A quick method to enumerate the printable ASCII characters with their hex and decimal values. The following code relies on taking the "raw" value of a base-10 integer (which is displayed as a hex value), and then using the built-in function rawToChar, which gives a character. You can of course change the range (up to 255). Not sure and haven't tested,...
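
A sketch of that approach, not the post's own code: `as.raw()` stores each integer as a byte (printed in hex), `as.character()` on a raw vector gives the two-digit hex string, and `rawToChar()` turns the byte back into a character. The range 32:126 is the printable ASCII block; 0 is excluded because `rawToChar()` rejects embedded nul bytes, and values above 127 are not ASCII, so what they display depends on the session's encoding.

```r
# Printable ASCII runs from 32 (space) to 126 (~)
codes <- 32:126

ascii_table <- data.frame(
  dec  = codes,
  hex  = as.character(as.raw(codes)),  # raw bytes rendered as two-digit hex strings
  char = vapply(codes, function(i) rawToChar(as.raw(i)), character(1)),
  stringsAsFactors = FALSE
)

head(ascii_table)
```

For the reverse lookup, `utf8ToInt("A")` returns the code point of a character.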

## Can one beat a Random Walk– IMPOSSIBLE (you say?)

March 8, 2011
By

Firstly, apologies for the long absence, as I've been busy with a few things. Secondly, apologies for the horrific use of caps in the title (for the grammar monitors). Certainly, you'll gain something useful from today's musing, as it's a pr...

## Challenge: Visualizing the US Federal Budget

March 8, 2011
By

Google today announced a Data Visualization Challenge that is well suited to the graphical capabilities of R. The goal is to visualize the US Federal budget from the point of view of the taxes an individual pays. The data are available from whatwepayfor.com -- their FAQ gives details about the source of the data and the philosophy of making...

## New R IDE

March 8, 2011
By

I'm always looking for ways to improve my workflow and overall academic efficiency. I've tried a variety of text editors, GUIs, and integrated development environments (IDEs) for R. I have some preferences but I haven't found anything that I'm complete...

## A Short Return to the Age-Earnings Profile

March 8, 2011
By

Two posts ago I mentioned the age-earnings profile but did not provide a regression of log earnings on age. I also offered, without evidence, that fitting a simple linear regression would be inappropriate. How do I know that? How could …
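
The post's data and model are not shown in this excerpt, so here is a hedged sketch with simulated data and made-up coefficients. It compares a straight-line fit of log earnings on age with the common alternative of adding a squared age term, which lets the profile rise and then flatten or fall.

```r
set.seed(1)
n <- 2000
age <- runif(n, 18, 65)
# Hypothetical concave profile: earnings rise early in a career, then flatten and fall
log_earn <- 9 + 0.08 * age - 0.0009 * age^2 + rnorm(n, sd = 0.5)

linear    <- lm(log_earn ~ age)
quadratic <- lm(log_earn ~ age + I(age^2))

summary(quadratic)$coefficients
AIC(linear, quadratic)  # lower is better

# Age at which the fitted quadratic profile peaks: -b1 / (2 * b2)
b <- coef(quadratic)
peak_age <- -b[["age"]] / (2 * b[["I(age^2)"]])
peak_age
```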

## Splitting a Dataset Revisited: Keeping Covariates Balanced Between Splits

March 8, 2011
By

In my previous post I showed you how to randomly split up a dataset into training and testing datasets. (Thanks to all those who emailed me or left comments letting me know that this could be done using other means. As things go with R, it's sometimes ...
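
One way to keep a categorical covariate balanced is to sample the same fraction of rows within each of its levels (stratified sampling). This is a base-R sketch under that assumption, not necessarily the method the post uses; the data frame and the helper `stratified_split()` are hypothetical, and it balances only a single factor.

```r
# Hypothetical data: 'group' is the covariate we want balanced between splits
set.seed(123)
df <- data.frame(
  group = factor(rep(c("a", "b", "c"), times = c(100, 300, 600))),
  x = rnorm(1000)
)

# Hypothetical helper: sample the same fraction of row indices within each level
stratified_split <- function(data, strata, train_frac = 0.7) {
  idx_by_level <- split(seq_len(nrow(data)), data[[strata]])
  train_idx <- unlist(lapply(idx_by_level, function(idx) {
    # index into idx so a level with a single row is handled correctly
    idx[sample.int(length(idx), size = round(train_frac * length(idx)))]
  }), use.names = FALSE)
  list(train = data[train_idx, ], test = data[-train_idx, ])
}

parts <- stratified_split(df, "group", train_frac = 0.7)

# Level proportions should match across the two splits
prop.table(table(parts$train$group))
prop.table(table(parts$test$group))
```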

## Blackbox trading Strategy using Rapidminer and R II

March 8, 2011
By

It has been a long time since I updated the blog, for lack of time (again) due to new professional and personal challenges. Continuing with the black-box strategy: following recommendations from several readers, and since I lack the time to write a good tutorial on the model, I’m going to make available the file with...

## R Studio

March 8, 2011
By

If you think that R is The EnvironmentForStatisticalAnalysisAndGraphics but you do not think that Vim is The Editor for text files, you might want to have a look at R Studio. It works on Windows, MacOS and Linux. I tried it out on my Ubuntu

## In case you missed it: January Roundup

March 8, 2011
By

Catching up on roundups today. February roundup will follow soon, but in the meantime enjoy this trip down memory lane - DS. In case you missed them, here are some articles from January of particular interest to R users. Revolution Analytics is now offering annual sponsorship grants for local R user groups worldwide. Issue 2 of the R Journal...

## Video Tutorial on Instrumental Variables in R

March 8, 2011
By

Update: I have replaced this video tutorial with a video tutorial on a newer, easier-to-use IV regression command. Check out that command here. In this video, I show how to use my instrumental variables function in R, ivreg(), along with its companion ...

## IV Regression

March 8, 2011
By

Here is my code from a previous post that performs IV regression. This may be easier to copy into an R script. I will post a video tutorial using this code shortly.
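
The author's ivreg() function is not reproduced in this excerpt. As a hedged illustration of what IV regression does, here is two-stage least squares written out by hand with two `lm()` calls on simulated data (all variables hypothetical). The point estimate is the 2SLS estimate, but the standard errors printed by the second-stage `lm()` are not valid 2SLS standard errors.

```r
set.seed(99)
n <- 10000
z <- rnorm(n)               # instrument: affects y only through x
u <- rnorm(n)               # unobserved confounder
x <- z + u + rnorm(n)       # endogenous regressor, correlated with u
y <- 2 * x + u + rnorm(n)   # true effect of x on y is 2

ols <- lm(y ~ x)  # biased: x is correlated with the error term (u + noise)

# Stage 1: regress the endogenous regressor on the instrument
stage1 <- lm(x ~ z)
x_hat <- fitted(stage1)

# Stage 2: regress y on the first-stage fitted values
stage2 <- lm(y ~ x_hat)

c(ols = coef(ols)[["x"]], tsls = coef(stage2)[["x_hat"]])
```

With this setup OLS drifts toward 2 + 1/3, because cov(x, u) / var(x) = 1/3, while the two-stage estimate stays near the true value of 2.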

## Machine Learning Ex3 – multivariate linear regression

March 8, 2011
By

Exercise 3 is about multivariate linear regression. The first part is about finding a good learning rate (alpha), and the second part is about implementing linear regression using the normal equations instead of the gradient descent algorithm.

Data: as usual, hosted in Google Docs:

```r
mydata = read.csv("http://spreadsheets.google.com/pub?key=0AnypY27pPCJydExfUzdtVXZuUWphM19vdVBidnFFSWc&output=csv", header = TRUE)
# show last 5 rows
tail(mydata, 5)
```

```
   area bedrooms price
43 2567 ...
```
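
A sketch of both parts of the exercise, under stated assumptions: it uses simulated house data with hypothetical coefficients (so it runs without the spreadsheet), standardizes the features before gradient descent, and uses alpha = 0.1. The helper `gradient_descent()` is illustrative, not the exercise's reference solution. Both methods should land on the same coefficients.

```r
set.seed(42)
m <- 50
area     <- round(runif(m, 800, 4500))
bedrooms <- sample(1:5, m, replace = TRUE)
price    <- 50000 + 120 * area + 8000 * bedrooms + rnorm(m, sd = 20000)

# Standardize the features, then prepend a column of ones for the intercept
X <- cbind(1, scale(cbind(area, bedrooms)))
y <- price

# Batch gradient descent on J(theta) = (1 / (2m)) * sum((X theta - y)^2)
gradient_descent <- function(X, y, alpha, iters) {
  theta <- rep(0, ncol(X))
  J <- numeric(iters)
  for (i in seq_len(iters)) {
    resid <- X %*% theta - y
    J[i] <- sum(resid^2) / (2 * nrow(X))
    theta <- theta - alpha * drop(t(X) %*% resid) / nrow(X)
  }
  list(theta = theta, J = J)
}

gd <- gradient_descent(X, y, alpha = 0.1, iters = 1500)

# Normal equations: solve (X'X) theta = X'y directly, no learning rate needed
theta_ne <- drop(solve(crossprod(X), crossprod(X, y)))

rbind(gradient_descent = gd$theta, normal_equations = theta_ne)
```

Re-running with several values of alpha and plotting `gd$J` against the iteration number is one way to do the learning-rate search; if alpha is too large, J grows instead of shrinking.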

## An enhanced Kaplan-Meier plot

March 8, 2011
By

We often see, in publications, a Kaplan-Meier survival plot, with a table of the number of subjects at risk at different time points aligned below the figure. I needed this type of plot (or really, matrices of such plots) for an upcoming publication. Of course, my preferred toolbox was R and the ggplot2 package. There
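
The post builds its figure with ggplot2, and that code is not in this excerpt. As a hedged sketch of where the numbers for the "at risk" table come from, here is the survival package (a recommended package shipped with R) on its built-in `aml` data: `summary()` of a `survfit` object with a `times` argument reports the number still at risk at each requested time. The time points chosen here are arbitrary, and the plot uses base graphics rather than ggplot2.

```r
library(survival)  # provides Surv(), survfit() and the aml data

fit <- survfit(Surv(time, status) ~ x, data = aml)

# Number at risk in each group at chosen time points
s <- summary(fit, times = c(0, 12, 24, 36, 48))
risk_table <- data.frame(
  group  = s$strata,
  time   = s$time,
  n_risk = s$n.risk
)
risk_table

# Base-graphics survival curves; the table above is what gets aligned beneath them
plot(fit, lty = 1:2, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(aml$x), lty = 1:2)
```

By default, `summary()` drops requested times beyond a group's last follow-up time, so groups can have different numbers of rows.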

## Our Friend the Age-Earnings Profile

March 7, 2011
By

I like Labor Economics. Partially because it has a nice mix of theory and practical empiricism, but mostly because it seems to be a sub-field with a number of agreed-upon stylized facts that grow not out of micro theory …

## Alabama is a foreign country

March 7, 2011
By

Faculty and students of the Iowa State University Department of Statistics published online an analysis of the data on the 2009 distribution of US stimulus funds under the American Recovery and Reinvestment Act. (The analysis was published in March last year as part of the Design for America competition, but I only recently came across it.) The analyses and associated charts...

## Basic Plots in R

March 7, 2011
By

Here's a tutorial I recorded on producing basic plots in R. I lost the script file I used to create the video to a horrifying black screen of death, but I used the data from the previous post (available here). Hopefully, the video is clear enough that ...

## Visualizing the Language Used by Academics when Protected by Anonymity

March 7, 2011
By

Those in the political science discipline probably remember their first encounter with poliscijobrumors.com. For those outside, you have probably never heard of this particular message board, and you would have no reason to. As the URL suggests, the board specializes in rumor, gossip, backbiting, mudslinging, and the occasional lucid thread on the political science

## Example 8.29: Risk ratios and odds ratios

March 7, 2011
By

When can you safely think of an odds ratio as being similar to a risk ratio? Many people find odds ratios hard to interpret, and thus would prefer to have risk ratios. In response to this, you can find several papers that purport to convert an odds rat...
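
A small illustration with hypothetical counts of why the answer depends on how common the outcome is. The helper `rr_or()` is illustrative. The conversion `or_to_rr()` is one commonly cited formula, RR = OR / (1 - p0 + p0 * OR), where p0 is the risk in the unexposed group. For a crude 2x2 table it recovers the risk ratio exactly. Whether it is appropriate for adjusted odds ratios is the contested part, and this sketch does not settle that.

```r
# Risk ratio and odds ratio from the counts of a 2x2 table
rr_or <- function(events_exp, n_exp, events_unexp, n_unexp) {
  p1 <- events_exp / n_exp      # risk in the exposed group
  p0 <- events_unexp / n_unexp  # risk in the unexposed group
  c(rr = p1 / p0,
    or = (p1 / (1 - p1)) / (p0 / (1 - p0)),
    p0 = p0)
}

rare   <- rr_or(10, 1000, 5, 1000)     # 1% vs 0.5%: OR is close to RR
common <- rr_or(400, 1000, 200, 1000)  # 40% vs 20%: RR is 2 but OR is 8/3

# Conversion from OR back to RR, given the baseline risk p0
or_to_rr <- function(or, p0) or / (1 - p0 + p0 * or)

rbind(rare = rare, common = common)
or_to_rr(common[["or"]], common[["p0"]])
```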

## R Tutorial Series: ANOVA Pairwise Comparison Methods

March 7, 2011
By

When we have a statistically significant effect in ANOVA and an independent variable of more than two levels, we typically want to make follow-up comparisons. There are numerous methods for making pairwise comparisons and this tutorial will demonstrate...
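
The tutorial's data and its exact list of methods are not in this excerpt. As a hedged sketch, here are the pairwise options available in base R, applied to the built-in `PlantGrowth` data (three groups) as a stand-in: unadjusted and Bonferroni-adjusted pairwise t-tests, and Tukey's honest significant difference.

```r
# One-way ANOVA: plant weight by treatment group (three levels)
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Pairwise t-tests using the pooled SD (the default), without and with adjustment
pw_none <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group, p.adjust.method = "none")
pw_bonf <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group, p.adjust.method = "bonferroni")
pw_none
pw_bonf

# Tukey's HSD: all pairwise differences with family-wise confidence intervals
tk <- TukeyHSD(fit)
tk
```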