Don Quijote — Word Statistics

June 4, 2011

Using Project Gutenberg’s free text of Don Quijote and Unix for Poets, here are the most used (non-short) words in Miguel de Cervantes’ famous work: 2167 Quijote, 2145 Sancho, 1331 porque, 1053 respondió, 1027 había, 900 merced, 813 vuestra, 79...
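For readers who want to try a similar count without the Unix tools the post relies on, here is a minimal Python sketch. It is not the author's pipeline: the function name `top_words`, the `min_length` threshold (the post does not say what counts as "non-short"), and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def top_words(text, min_length=5, n=10):
    """Return the n most frequent words that have at least min_length letters.

    min_length=5 is an assumed cutoff, not the one used in the post.
    Counting is case-sensitive, so "Sancho" and "sancho" are different words.
    """
    # [^\W\d_]+ matches runs of Unicode letters, so accented words
    # such as "había" are kept whole.
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(n)

# Made-up sample text, only to show the call; use the full book text in practice.
sample = ("Sancho respondió a don Quijote, y Quijote respondió a Sancho "
          "porque Sancho quiso.")
for word, count in top_words(sample, n=3):
    print(count, word)
```

To run it on the real text, read the downloaded file with `open(path, encoding="utf-8").read()` and pass the result to `top_words`.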

Read more »

searching ITIS and fetching Phylomatic trees

June 3, 2011

I am writing a set of functions to search ITIS for taxonomic information (more databases to come) and functions to fetch plant phylogenetic trees from Phylomatic. Code at GitHub. Also, see the examples in the demos folder on the GitHub site above.

Read more »

Visualizing small-scale paired data – combining boxplots, stripcharts, and confidence-intervals in R

June 3, 2011

Sometimes when working with small paired data sets it is nice to see/show all the data in a structured form. For example, when looking at pre-post comparisons, connected dots are a natural way to visualize which data points belong together. In R this can easily be combined with boxplots expressing the overall distribution of the data. This

Read more »

Using R for Stata to CSV Conversion

June 3, 2011

I recently found myself in the unpleasant situation of needing to read a Stata .dta file without having Stata readily available. Normally, I’d fire up a text editor and deconstruct the file, except Stata saves its data in a proprietary binary format, meaning it garbles some of the content of the file.

Read more »

Example 8.39: calculating Cramer’s V

June 3, 2011

Cramer's V is a measure of association for nominal variables. Effectively it is the Pearson chi-square statistic rescaled to have values between 0 and 1, as follows: V = sqrt(X^2 / (n * (min(rows, cols) - 1))), where X^2 is the Pearson chi-square, n...
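As an illustration of the standard definition of Cramér's V, here is a minimal Python sketch. It is not the code from the post: the function name `cramers_v` and the list-of-rows input are assumptions, and it uses the uncorrected Pearson chi-square and expects every row and column total to be non-zero.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes a rectangular table with at least two rows and two columns
    and no empty row or column.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square without continuity correction.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Rescale by n times (smaller table dimension minus one).
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Perfect association gives 1; independence gives 0.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```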

Read more »

Simulating CMYK mis-registration printing

June 3, 2011

I recently came across a poster advertising a children's production of Shakespeare's The Tempest where they purposely used an effect to mimic a mis-registration in CMYK printing. You have probably seen this before as a slight offset in one of t...

Read more »

The residuals of crime

June 3, 2011

Real-estate search website Trulia has a new tool to help you in your choice of a new home: crime maps. With local police forces being much better about sharing data, crime maps are nothing new, but Trulia takes it to the next level with a slick user interface for navigating US cities, a beautiful heat-map visualization of crime hot-spots...

Read more »

Always learn and never know

June 3, 2011

I have been using R for about two years, with no previous coding background. So I feel as the title says: “always learn and never know”. This time, I decided to use R to study a simple, non-statistical problem that came up some time ago. Consider the exponential function 2^x and the parabola x^2. One

Read more »

Merge all files in a directory using R into a single dataframe

June 3, 2011

In this post, I provide a simple script for merging a set of files in a directory into a single, …

Read more »

Optmatch and RItools — New homes and techniques

June 2, 2011

Co-developers Jake Bowers, Ben Hansen and I are happy to announce that our R packages optmatch and RItools have new homes on GitHub. We had previously been managing development on private subversion repositories and managed the projects through an ad-h...

Read more »

A Quantstrat to Build On

June 2, 2011

THIS IS NOT INVESTMENT ADVICE.  PLEASE DO NOT TRADE THIS SYSTEM AS IT CAN LOSE SIGNIFICANT AMOUNTS OF MONEY.  YOU ARE RESPONSIBLE FOR YOUR OWN GAINS AND LOSSES. Some R finance powerhouses have been banging away on the quantstrat package for q...

Read more »

Highlights from R/Finance 2011 presentations

June 2, 2011

Patrick Burns offers his selections from the presentations at the R/Finance 2011 conference. Check out his post for overviews of some great presentations (and truly, there's some awesome content available to download). I'll add another of my favourites: Bryan Lewis's presentation of his interface from R to the betfair betting market. (But if you use it to automate bets...

Read more »

Australians and Americans, 10 years after 9/11

June 2, 2011

With Lynn Vavreck at UCLA, I ran parallel public opinion surveys in Australia and the United States, measuring attitudes on security, the fight against terrorism, the wars in Afghanistan, etc., some 10 years after the 9/11 attacks. Full report here (gene...

Read more »

Sweave diagram, following Knuth’s original

June 2, 2011

In preparation for a talk, I updated Knuth's original diagram in Donald E. Knuth. Literate programming. The Computer Journal, 27(2):97–111, May 1984. The new diagram is Sweave specific. Click the Sweave diagram for a PDF version, or right-click and select 'save image as' for the PNG version. Permission is granted for any use of the

Read more »

Selections from the R/Finance conference

June 2, 2011

The R/Finance conference happened in Chicago at the end of April. If, like me, you weren’t there, you can still benefit from it because slides from many of the talks are now online. Here is a quick synopsis (in chronological order) of some of the talks I found most interesting. Michael Kane: Michael Kane and …

Read more »

Annual Returns by State of the US Economy

June 1, 2011

Sometimes it is fun to just look at annual returns, especially as the financial world has shifted its focus to microseconds in a world of inconceivable macro imbalances.  St. Louis Fed (USREC) offers a binary state of the economy with 1=recession ...

Read more »

Kaggle Competition Walkthrough: Wrapup

June 1, 2011

The Kaggle Don't Overfit competition is over, and I took 11th place! Additionally, I tied with tks for contributing the most to the forum, so thanks to everyone who voted for me! I voted for tks, and I'm very happy to share the prize with him, as most...

Read more »

R in a nutshell

June 1, 2011

I got this book as a reference for my work with R and do like it. Just after browsing the chapters I already found some useful hints about loading and manipulating data, e.g., loading of fixed-width data files!

Read more »

New version of analogue (0.7-0)

June 1, 2011

Last week I pushed an update of my analogue package to CRAN. The last release (0.6-23) was on CRAN sometime in March 2010, so an update was well overdue. This (0.7-0) is a major update to analogue containing lots of …

Read more »

Drawing Grids in R

June 1, 2011

Here's an example of how to draw a grid in R and how to fill it. I used the grid package and its functions to display species cover values in the squares of a recording frame...

Read more »

Day #54 Major bugfix

June 1, 2011

It seems that Rserve works in a 1-byte character system. This was giving us strange results. The errors began when I read in data from KNIME containing unusual characters such as µ. Every time I used the character “µ”, my output in Rserve would be:...

Read more »

Using R in Excel

I got to know a very cool tool for using R in Excel, named RExcel. Basically, it provides an integration solution so that users can get data and run commands in Excel the same way as in R, which is presumably good and convenient to present results to your coll...

Read more »

Reduce Memory Use for Large Datasets

One key limiting factor for automated text classification is memory consumption. As you accumulate more news articles, bills, and legal opinions, the term-document matrices used to represent the data grow quickly. RTextTools provides two algorithms, support vector machines and maximum entropy, that can handle large datasets with very little memory. Luckily, these two algorithms tend to be the most...

Read more »

Minor update to Vegan (1.17-10)

June 1, 2011

I overlooked blogging about this at the time, but Jari released a minor update to our Vegan package to fix a few issues following release of R 2.13-0. As far as the user is concerned, this mainly affects capscale(). metaMDSrotate(), a helper function for rotating nMDS solutions from function metaMDS() can now handle missing values via argument na.rm =...

Read more »

A dubious statistics

May 31, 2011

Following a link on R-bloggers, I ended up on this page (with a completely useless graph that only contained the pieces of information 5% in 1900 and 55% in 2000). The author (Ralph Keeney) reports on “A remarkable 55 percent of deaths for people age 15 to 64 can be attributed to decisions with readily

Read more »