Drafting the Documentation for RTextTools

In preparation for The 4th Annual Conference of the Comparative Policy Agendas Project in Catania, Sicily, our development team has been busy drafting the documentation for RTextTools. In addition to standard documentation of functions, we want to provide quick-start guides, sample datasets, example scripts, and

How to fit power laws

June 7, 2011

A new paper out in Ecology by Xiao and colleagues (in press, here) compares the use of log-transformation to non-linear regression for analyzing power laws. They suggest that the error distribution should determine which method performs better. When you...
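
A minimal R sketch of the two approaches being compared, run on simulated data. The parameter values, noise level, and variable names are assumptions chosen for illustration; none of them come from the paper.

```r
set.seed(1)  # reproducible simulated data (hypothetical, not from the paper)
x <- seq(1, 50, length.out = 100)
# Power law y = a * x^b with a = 2, b = 1.5 and multiplicative log-normal error
y <- 2 * x^1.5 * exp(rnorm(length(x), sd = 0.2))

# Approach 1: linear regression on log-transformed data
fit_log <- lm(log(y) ~ log(x))
a_log <- exp(coef(fit_log)[[1]])  # back-transform the intercept
b_log <- coef(fit_log)[[2]]       # slope is the exponent

# Approach 2: non-linear least squares on the original scale,
# started from the log-log estimates
fit_nls <- nls(y ~ a * x^b, start = list(a = a_log, b = b_log))

# Side-by-side estimates of a and b
rbind(log_lm = c(a = a_log, b = b_log), nls = coef(fit_nls))
```

With multiplicative error, as simulated here, the log-log fit matches the error structure; with additive error on the original scale, `nls()` would be the better match.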

A Quantstrat to Build on Part 2

June 7, 2011

As I explore additional functionality of quantstrat and make changes to my original post A Quantstrat to Build On, I will write multiple posts, and hopefully, the finished product will not be so overwhelming to comprehend.  Also, it might highligh...

The ‘Big Analytics’ Revolution Starts with R: Webinar June 14

June 7, 2011

On Tuesday next week I'll be teaming up with Revolution Analytics' Mike Minelli to give a 30-minute webinar to introduce executives to R, Big Data, and applications of advanced analytics. If there's someone in your company who needs to know about the impact of R on getting value out of data, they can register here. Here's the agenda: The...

R books are now showing up in the dollar bin. That’s a good…

June 7, 2011

R books are now showing up in the dollar bin. That’s a good sign!

K-Means Clustering on Big Data

June 7, 2011

In this post Joseph Rickert demonstrates how to build a classification model on a large data set with the RevoScaleR package. A script file for use with Revolution R Enterprise to recreate the analysis below is at the end of the post, and can also be downloaded here -- ed. The k-means (Lloyd) algorithm, an intuitive way to explore...

The pros and cons of robust data characterizations

Over the years, I have looked at a lot of data contaminated with outliers, the subject of Chapter 7 of Exploring Data in Engineering, the Sciences, and Medicine.  That chapter adopts the definition of an outlier presented by Barnett and Lewis in their book Outliers in Statistical Data 2nd Edition

Fittestmodel.com: A user-friendly way to conduct empirical research together

June 6, 2011

(A guest post by Camiel de Koning) When trying to replicate, verify, or extend the empirical research of others, a researcher generally encounters many time-consuming barriers, and there are often many prerequisites. Fittestmodel aims to overcome many of these problems by presenting a web application that allows users to: use R without having to install it; quickly incorporate...

R for Data Mining

June 6, 2011

Statistics and data mining often get bundled together, but (in my opinion), they're generally different practices with different goals. As a language designed for statistics, much of R's core functionality is focused on exploring and understanding data: model design, inference, and visualization. But when your goal is simply to get the best predictions from a big data set (without...

In case you missed it: May Roundup

June 6, 2011

In case you missed them, here are some articles from May of particular interest to R users. A review of "R Cookbook", a new how-to book for R programmers. A detailed example of using the RevoScaleR package to analyze a large airline data set. A new guide for R beginners, "How to Learn R", provides links to R resources,...

Shared Ecological Modelling References

June 6, 2011

05.06.2011 Today I started to create a list of books and articles about ecological modelling. In this list you will find not only general books about modelling but also books about spatial analysis, image analysis, and other (in my opinion) important techniques useful in the context of ecological modelling. For the collection I use “Zotero”

10 R One Liners to Impress Your Friends

June 5, 2011

Following the trend of one liners for various languages (Haskell, Scala, Python), here are some examples in R.

Multiply Each Item in a List by 2
#lists
lapply(list(1:4),function(n){n*2})
# otherwise
(1:4)*2

Sum a List of Numbers
#lists
lapply(list(1:4),sum)
# oth...

Conway’s Game of Life in R with ggplot2 and animation

June 5, 2011

In undergrad I had a computer science professor who piqued my interest in applied mathematics, beginning with Conway’s Game of Life. At first, the Game of Life (not the board game) appears to be quite simple, perhaps too simple, but it has been widely explored and is useful for modeling systems over time. It has been...

An application of aggregate() and merge()

June 5, 2011

Today, I encountered an interesting problem while processing a data set of mine. My data have observations on businesses that are repeated over time. My data set also contains information on longitude and latitude of the business location, but unfort...

Testing Different Methods for Merging a set of Files into a Dataframe

June 5, 2011

I previously posted a method I used for merging a set of files into a dataframe. It wasn’t long before …

Environments in R

June 4, 2011

One interesting thing about R is that you can get down into the insides fairly easily. You're allowed to see more of how things are put together than in most languages. One of the ways R does this is by having first-class environments. At first glance, environments are simple enough. An environment...
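
A short sketch of what "first-class" means here: an environment is an ordinary object that can be created, passed to a function, and modified in place. The names `e` and `bump` are hypothetical, chosen only for illustration.

```r
# Create a fresh environment and bind a value in it
e <- new.env()
assign("x", 10, envir = e)

# Environments are passed by reference, so a function can change
# the caller's environment without returning it
bump <- function(env) {
  env$x <- env$x + 1
  invisible(NULL)
}
bump(e)

get("x", envir = e)  # 11
ls(e)                # names bound in e: "x"
```

This differs from vectors and lists, which R copies when a function modifies them.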

Don Quijote — Word Statistics

June 4, 2011

Using the Gutenberg Project’s free text of Don Quijote + Unix for Poets, here are the most used (non-short) words in Miguel de Cervantes’ famous work: 2167 Quijote 2145 Sancho 1331 porque 1053 respondió 1027 había  900 merced  813 vuestra  79...

searching ITIS and fetching Phylomatic trees

June 3, 2011

I am writing a set of functions to search ITIS for taxonomic information (more databases to come) and functions to fetch plant phylogenetic trees from Phylomatic. Code at GitHub. Also, see the examples in the demos folder on the GitHub site above.

Visualizing small-scale paired data – combining boxplots, stripcharts, and confidence-intervals in R

June 3, 2011

Sometimes when working with small paired data sets it is nice to see/show all the data in a structured form. For example, when looking at pre-post comparisons, connected dots are a natural way to visualize which data points belong together. In R this can easily be combined with boxplots expressing the overall distribution of the data. This

Using R for Stata to CSV Conversion

June 3, 2011

I recently found myself in the unpleasant situation of needing to read a Stata .dta file, but not having Stata readily available to me. Normally, I’d fire up a text editor and deconstruct the file, except Stata saves its data in a proprietary binary format, meaning it garbles some of the content of the file.

Example 8.39: calculating Cramer’s V

June 3, 2011

Cramer's V is a measure of association for nominal variables. Effectively it is the Pearson chi-square statistic rescaled to have values between 0 and 1, as follows: V = sqrt(X^2 / ) where X^2 is the Pearson chi-square, n...
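
The formula in the excerpt above is cut off, so the sketch below uses the standard textbook definition instead: V = sqrt(X^2 / (n * (min(r, c) - 1))), where n is the total count and r and c are the numbers of rows and columns. The original post's notation may differ. The function name `cramers_v` and the example table are hypothetical.

```r
# Cramer's V for a two-way contingency table (standard definition)
cramers_v <- function(tab) {
  tab <- as.matrix(tab)
  # correct = FALSE gives the plain Pearson chi-square (no Yates correction);
  # warnings about small expected counts are suppressed for this sketch
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)                        # total number of observations
  k <- min(nrow(tab), ncol(tab)) - 1   # smaller table dimension minus one
  sqrt(chi2 / (n * k))
}

# Hypothetical 2 x 2 table of counts, for illustration only
tab <- matrix(c(10, 20, 30, 40), nrow = 2)
cramers_v(tab)
```

A table with perfect association gives V = 1, and a table whose rows are exactly proportional gives V = 0.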

Simulating CMYK mis-registration printing

June 3, 2011

I recently came across a poster advertising a children's production of Shakespeare's The Tempest where they purposely used an effect to mimic a mis-registration in CMYK printing. You have probably seen this before as a slight offset in one of t...

The residuals of crime

June 3, 2011

Real-estate search website Trulia has a new tool to help you in your choice of a new home: crime maps. With local police forces being much better about sharing data, crime maps are nothing new, but Trulia takes it to the next level with a slick user interface for navigating US cities, a beautiful heat-map visualization of crime hot-spots...

Always learn and never know

June 3, 2011

I have been using R for about two years, with no previous coding background. So, I feel like the title says, “always learn and never know”. This time, I decided to use R to study a simple, non-statistical problem that came up some time ago. Consider the exponential function 2^x and the parabola x^2. One

Merge all files in a directory using R into a single dataframe

June 3, 2011

In this post, I provide a simple script for merging a set of files in a directory into a single, …
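
One common way to do this is sketched below, assuming the files are CSVs that share the same columns and should be stacked row-wise. The original post's script may work differently. The function name `merge_csv_files` and its arguments are hypothetical.

```r
# Read every CSV in a directory and stack the rows into one data frame.
# Assumes all files have identical column names.
merge_csv_files <- function(dir, pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  frames <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, frames)  # returns NULL if no files matched
}

# Usage with two small files written to a temporary directory
d <- tempfile("csvdir")
dir.create(d)
write.csv(data.frame(a = 1:2, b = c("x", "y")),
          file.path(d, "one.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:5, b = c("p", "q", "r")),
          file.path(d, "two.csv"), row.names = FALSE)
merged <- merge_csv_files(d)
```

If the files instead share a key column and carry different variables, `merge()` applied pairwise (for example through `Reduce()`) would be the closer fit.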

Optmatch and RItools — New homes and techniques

June 2, 2011

Co-developers Jake Bowers, Ben Hansen and I are happy to announce that our R packages optmatch and RItools have new homes on GitHub. We had previously been managing development on private Subversion repositories and managed the projects through an ad-h...

A Quantstrat to Build On

June 2, 2011

THIS IS NOT INVESTMENT ADVICE.  PLEASE DO NOT TRADE THIS SYSTEM AS IT CAN LOSE SIGNIFICANT AMOUNTS OF MONEY.  YOU ARE RESPONSIBLE FOR YOUR OWN GAINS AND LOSSES. Some R finance powerhouses have been banging away on the quantstrat package for q...
