Monthly Archives: June 2011

Don Quijote — Word Statistics

June 4, 2011

Using Project Gutenberg’s free text of Don Quijote and the Unix for Poets approach, here are the most frequently used (non-short) words in Miguel de Cervantes’ famous work: 2167 Quijote, 2145 Sancho, 1331 porque, 1053 respondió, 1027 había, 900 merced, 813 vuestra, 79...

Read more »
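The post builds its counts with Unix command-line tools. As a rough Python equivalent, the sketch below splits a text into words, drops short ones and tallies the rest with `collections.Counter`. The function name `top_words` and the `min_len=5` cutoff are hypothetical choices for illustration. The excerpt does not say how "non-short" was defined, so this will not necessarily reproduce the counts above.

```python
import re
from collections import Counter


def top_words(text, min_len=5, n=10):
    """Return the n most frequent words that have at least min_len characters.

    Hypothetical helper: matching is case-sensitive, so "Quijote" and
    "quijote" are counted separately. Python 3 regexes are Unicode-aware by
    default, so accented words such as "respondió" stay in one piece.
    """
    words = re.findall(r"\w+", text)
    counts = Counter(w for w in words if len(w) >= min_len)
    return counts.most_common(n)


sample = "Sancho dijo a Sancho que don Quijote había respondido"
print(top_words(sample))
```

For the full novel you would pass in the downloaded text, e.g. `top_words(open("quijote.txt", encoding="utf-8").read())`, where the file name is only a placeholder.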

searching ITIS and fetching Phylomatic trees

June 3, 2011

I am writing a set of functions to search ITIS for taxonomic information (more databases to come) and functions to fetch plant phylogenetic trees from Phylomatic. The code is on GitHub. Also, see the examples in the demos folder on the GitHub site above.

Read more »

Visualizing small-scale paired data – combining boxplots, stripcharts, and confidence-intervals in R

June 3, 2011

Sometimes when working with small paired datasets, it is nice to see (and show) all the data in a structured form. For example, when looking at pre-post comparisons, connected dots are a natural way to show which data points belong together. In R, this can easily be combined with boxplots showing the overall distribution of the data. This

Read more »

Using R for Stata to CSV Conversion

June 3, 2011

I recently found myself in the unpleasant situation of needing to read a Stata .dta file without having Stata readily available. Normally, I’d open the file in a text editor and pick it apart, but Stata saves its data in a proprietary binary format, so much of the file’s content shows up as garbage in a text editor.

Read more »

Example 8.39: calculating Cramer’s V

June 3, 2011

Cramer's V is a measure of association for nominal variables. Effectively, it is the Pearson chi-square statistic rescaled to take values between 0 and 1, as follows: V = sqrt(X^2 / (n * (min(r, c) - 1))), where X^2 is the Pearson chi-square, n...

Read more »
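A minimal Python sketch of the calculation, assuming the standard definition V = sqrt(X^2 / (n * (min(r, c) - 1))), with n the total count and r, c the numbers of rows and columns. The function name `cramers_v` is hypothetical. The table is a plain list of rows of counts, and every row and column total is assumed to be non-zero, since otherwise an expected count would be zero.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
    # where expected = row total * column total / n under independence
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Rescale by n * (min(r, c) - 1) so the result lies between 0 and 1
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


print(cramers_v([[10, 0], [0, 10]]))    # 1.0 (perfect association)
print(cramers_v([[10, 10], [10, 10]]))  # 0.0 (no association)
```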

Simulating CMYK mis-registration printing

June 3, 2011

I recently came across a poster advertising a children's production of Shakespeare's The Tempest that deliberately used an effect mimicking mis-registration in CMYK printing. You have probably seen this before as a slight offset in one of t...

Read more »
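The excerpt does not show how the effect was produced, so this is not necessarily the author's method. One way to sketch it is to split an image into ink planes, shift one plane by a few pixels, and recombine. The sketch below assumes the naive RGB/CMYK conversion formula (no colour profiles) and an image stored as nested lists of (r, g, b) floats in [0, 1]. The helper names `rgb_to_cmyk`, `cmyk_to_rgb` and `misregister` are hypothetical.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB -> CMYK conversion for floats in [0, 1] (no colour management)."""
    k = 1 - max(r, g, b)
    if k == 1:  # pure black: only the key (K) ink is used
        return 0.0, 0.0, 0.0, 1.0
    return ((1 - r - k) / (1 - k), (1 - g - k) / (1 - k), (1 - b - k) / (1 - k), k)


def cmyk_to_rgb(c, m, y, k):
    """Inverse of the naive conversion above."""
    return ((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))


def misregister(image, channel, dx, dy=0):
    """Shift one ink plane (0=C, 1=M, 2=Y, 3=K) by (dx, dy) pixels.

    `image` is a list of rows of (r, g, b) tuples. Pixels that the shifted
    plane no longer covers get no ink from that plane.
    """
    height, width = len(image), len(image[0])
    cmyk = [[list(rgb_to_cmyk(*px)) for px in row] for row in image]
    plane = [[cmyk[i][j][channel] for j in range(width)] for i in range(height)]
    for i in range(height):
        for j in range(width):
            si, sj = i - dy, j - dx  # source pixel that lands here after the shift
            inside = 0 <= si < height and 0 <= sj < width
            cmyk[i][j][channel] = plane[si][sj] if inside else 0.0
    return [[cmyk_to_rgb(*px) for px in row] for row in cmyk]


# A cyan pixel followed by two white ones; shift the cyan plane one pixel right.
image = [[(0.0, 1.0, 1.0), (1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]]
print(misregister(image, channel=0, dx=1))
```

On a real picture, shifting a single plane by a couple of pixels leaves coloured fringes along edges, which is the look the poster imitated.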

The residuals of crime

June 3, 2011

Real-estate search website Trulia has a new tool to help you choose a new home: crime maps. With local police forces now much better about sharing data, crime maps are nothing new, but Trulia takes it to the next level with a slick user interface for navigating US cities, a beautiful heat-map visualization of crime hot-spots...

Read more »

Always learn and never know

June 3, 2011

I have been using R for about two years, with no previous coding background, so I feel as the title says: “always learn and never know”. This time, I decided to use R to study a simple, non-statistical problem that came up some time ago. Consider the exponential function 2^x and the parabola x^2. One

Read more »
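The excerpt is cut off before the problem is stated, so this is a guess: a natural question about 2^x and x^2 is where the two curves cross, that is, the roots of f(x) = 2^x - x^2. A minimal bisection sketch in Python (the helper names `f` and `bisect` are hypothetical) finds all three crossings: x = 2 and x = 4 exactly, plus a negative one near x = -0.767.

```python
def f(x):
    # Difference between the exponential and the parabola; zero where they cross
    return 2 ** x - x ** 2


def bisect(func, lo, hi, tol=1e-12):
    """Find a root of func in [lo, hi], assuming func(lo) and func(hi) differ in sign."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign between lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half-interval on which the sign change (or an exact zero) lies
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Brackets chosen so that f changes sign inside each one
for lo, hi in [(-1, 0), (1, 3), (3, 5)]:
    print(bisect(f, lo, hi))
```

Bisection is used here only because it is short and guaranteed to converge once a sign change is bracketed; Newton's method would converge faster but needs the derivative and a good starting point.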

Merge all files in a directory using R into a single dataframe

June 3, 2011

In this post, I provide a simple script for merging a set of files in a directory into a single, …

Read more »
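The post's script is in R and is not shown in the excerpt. A rough Python analogue, assuming every file is a CSV with the same header row, reads each file with the standard `csv` module and stacks the rows. Here a list of dicts stands in for a data frame. The function name `merge_csv_dir` and the extra `source_file` column are hypothetical additions.

```python
import csv
from pathlib import Path


def merge_csv_dir(directory, pattern="*.csv"):
    """Stack every file matching `pattern` in `directory` into one list of row dicts.

    Assumes all files share the same header row. Each row also records the
    file it came from in a hypothetical 'source_file' column.
    """
    merged = []
    header = None
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                raise ValueError(f"{path.name} has columns {reader.fieldnames}, expected {header}")
            for row in reader:
                row["source_file"] = path.name
                merged.append(row)
    return merged
```

With pandas installed, concatenating the results of `pandas.read_csv` with `pandas.concat` would be the more usual route to an actual data frame.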

Optmatch and RItools — New homes and techniques

June 2, 2011

Co-developers Jake Bowers, Ben Hansen and I are happy to announce that our R packages optmatch and RItools have new homes on GitHub. We had previously been managing development in private Subversion repositories and managed the projects through an ad-h...

Read more »