Damn scoping in R

February 17, 2011

OK, R is very well considered in certain respects, but there are also some things that annoy me... This time it's scoping...

Backtesting in Excel and R

February 17, 2011

This post is the introduction to a series that will illustrate how to backtest the same strategy in Excel and R. The impetus for this series was this tweet by Jared Woodard at Condor Options. After Soren Macbeth introduced us, Jare...

Aligning labels in circular igraph layouts

February 17, 2011

The folks at IPE at UNC have produced this nice animated gif of some network data on increasing financial integration in the run-up to the 2008 crisis. They used a small trick I pointed to a while ago (just using a pipe, nothing fancy) that lets you...

Le Monde puzzle [#6]

February 17, 2011

A simple challenge in Le Monde this week: find the group of four primes such that any sum of three terms in the group is prime and the overall sum is minimised. Here is a quick exploration by simulation, using the schoolmath package (with its imperfections):

    A=primes(start=1,end=53)
    lengthA=length(A)
    res=4*53
    for (t in 1:10^4){
      B=sample(A,4,prob=1/(1:lengthA))
      sto=is.prim(sum(B))
      ...
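
As a cross-check on the simulation idea, here is a minimal sketch of an exhaustive search in base R. It is not the post's code: it assumes the four primes must be distinct, reuses the upper bound of 53 from the excerpt, and replaces the schoolmath functions with a small hypothetical helper, is_prime, based on trial division.

```r
# Trial-division primality test (base R only; an assumed stand-in for schoolmath)
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  if (n %% 2 == 0) return(FALSE)
  d <- 3
  while (d * d <= n) {
    if (n %% d == 0) return(FALSE)
    d <- d + 2
  }
  TRUE
}

# Candidate primes up to 53, the same bound used in the excerpt
candidates <- Filter(is_prime, 1:53)

# Every group of four distinct candidates, one group per column
groups <- combn(candidates, 4)

# A group qualifies when each sum of three terms (total minus one term) is prime
all_triples_prime <- function(g) all(sapply(sum(g) - g, is_prime))
valid <- groups[, apply(groups, 2, all_triples_prime), drop = FALSE]

# Keep the qualifying group with the smallest overall sum
best <- valid[, which.min(colSums(valid))]
print(best)
print(sum(best))
```

With only 16 candidate primes there are 1820 groups to test, so enumeration is cheap here; random sampling as in the excerpt becomes more attractive only when the candidate set is much larger.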

Student travel grants for useR! 2011

February 17, 2011

For students planning to attend the annual worldwide R user conference, useR! 2011, travel grants are available to help defray the cost of attending the conference in the UK. CRISM is offering bursaries for accommodation and conference fees, and Revolu...

RSI(2) and the pre 80s Market

February 17, 2011

In his detailed research on the RSI(2) indicator, MarketSci emphasized several times that contrarian strategies based on RSI(2) didn’t start working until the 80s. I remembered this observation recently when I observed another interesting anomaly … In statistics, an important initial step in studying time series data is to consider the autocorrelation

R: Given column name in a Data Frame, Get the Index

February 17, 2011

Had a mental block today trying to figure out how to get the indices of columns in a data frame given their names. Simple task but difficult to search Google for an answer. Thanks to jashapiro, Matt, and Vince for giving me a heads up on the which() fu...
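
A minimal sketch of the lookup the excerpt describes, using a made-up data frame (df, its column names, and the wanted vector are all hypothetical). It shows which() combined with %in%, plus match() as an alternative when the order of the requested names matters.

```r
# Hypothetical example data frame
df <- data.frame(id = 1:3, height = c(1.6, 1.7, 1.8), weight = c(60, 70, 80))

# Column names whose positions we want
wanted <- c("weight", "id")

# which() with %in% returns positions in the data frame's column order
idx_which <- which(names(df) %in% wanted)

# match() returns positions in the order the names were requested
idx_match <- match(wanted, names(df))

print(idx_which)  # 1 3
print(idx_match)  # 3 1
```

Note that match() yields NA for a name that is not a column, whereas which() silently drops it, so the choice depends on whether a missing name should be detectable.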

Book Review: R Graphs Cookbook

February 17, 2011

Book Information: Mittal, H. (2011). R graphs cookbook. Birmingham, UK: Packt Publishing Ltd. Audience: The book's stated audience is anyone who is familiar with the basics of R, as well as expert users who are looking for a graphical reference. However, it...

Stata or R – How to create dynamic variables in R?

February 16, 2011

As we dig deeper into the Stata or R debate, a few questions have come up. Question 1: One of the things Stata does well is the way it constructs new variables (see example below). How to do this in R? We can rewrite it as-is using for loops in R...

Regional Variation in Law Enforcement Deaths – Part B

February 16, 2011

I would like to thank Tal Galili for establishing and maintaining the blog aggregator at R-bloggers (http://www.r-bloggers.com/). This site has been added to their directory, and new posts tagged with R will now appear on their feed. In Part A, I presented a series of barplots which showed that the plurality of police

Top 15 Daily Tweeters of #25bahman for the Past Five Days

February 16, 2011

My friend Michael Bommarito has been doing the data community quite a service, capturing and sharing all of the traffic on Twitter related to the Iranian protests. Specifically, he has collected all of the tweets containing the #25bahman hashtag and made them available for anyone to download. I am unable to resist the temptation to explore a

Silver and Russell 2000

February 16, 2011

When I find a chart that looks like this, I always like to explore a little further (chart via StockCharts.com). I pull it into R and try to find anything worthwhile. I do not find anything, except that I do not want to be trading both in the same direc...

Summarize Missing Data for all Variables in a Data Frame in R

February 16, 2011

Something like this probably already exists in an R package somewhere out there, but I needed a function to summarize how much missing data I have in each variable of a data frame in R. Pass a data frame to this function and for each variable it'll give you the number of missing values, the total N, and the...
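
The function itself is not shown in the excerpt, so the following is only a sketch of what such a helper could look like, not the author's code. The name summarize_missing and the example data are hypothetical, and because the excerpt is cut off after the first two outputs, the percentage column is an assumption.

```r
# Hypothetical helper; a sketch, not the original author's function
summarize_missing <- function(df) {
  # is.na() on a data frame gives a logical matrix; colSums counts TRUE per column
  n_missing <- colSums(is.na(df))
  data.frame(
    variable    = names(df),
    n_missing   = as.integer(n_missing),
    n_total     = nrow(df),
    pct_missing = 100 * as.numeric(n_missing) / nrow(df),  # assumed third column
    row.names   = NULL
  )
}

# Hypothetical example data with missing values in both columns
example <- data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"))
result <- summarize_missing(example)
print(result)
```

Returning a data frame rather than printing keeps the summary easy to sort or filter, for example to list only variables above some missingness threshold.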

RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

February 16, 2011

RHIPE: An Interface Between Hadoop and R. Presented by Saptarshi Guha. About the video: I filmed the event using LectureMaker’s live event recording technique. One special feature I add to my R video recordings is the addition of my own R source code …

RcppArmadillo 0.2.12

February 16, 2011

A new version 1.1.2 of Conrad Sanderson's Armadillo templated C++ library for linear algebra came out a couple of days ago. This has now been wrapped into a new version 0.2.12 of RcppArmadillo, our Rcpp-based integration into R. The short NEWS fil...

Take the ggplot2 user survey

February 16, 2011

The author of the ggplot2 graphics package for R, Hadley Wickham, is looking for feedback from ggplot2 users. If you've used ggplot2, fill out his short survey at the link below. WuFoo: ggplot2 survey

The Egyptian Revolution, in tweets

February 16, 2011

Twitter played a significant role in the recent uprising in Egypt, with protesters communicating via tweets marked with the #25bahman hashtag (25 Bahman in the Persian calendar, which corresponds to February 14) to plan and rally for the demonstration. Michael Bommarito downloaded all such tweets and plotted their frequency over time using R's ggplot2 library. Not surprisingly, the activity peaked on February 14. The...

Pre-processing text: R/tm vs. python/NLTK

February 16, 2011

Let’s say that you want to take a set of documents and apply a computational linguistic technique. If your method is based on the bag-of-words model, you probably need to pre-process these documents first by segmenting, tokenizing, stripping, stopwording, and …

Twin Cities R User Group Meeting Tonight!

February 16, 2011

TCRUG will be having a meeting TONIGHT (2/16) at 5:30 PM. We will meet in ROOM 29 in Willey Hall. Willey Hall is located on the West Bank of the Minneapolis campus. See the Google map at http://goo.gl/tnRnU. Erik Iverson will be giving a talk ...

Mapping London’s Population Change 1801-2030

February 16, 2011

Buried in the London Datastore are the population estimates for each of the London Boroughs between 2001 and 2030. They predict a declining population for most boroughs with the exception of a few to the east. I was surprised by this general decline and also by the numbers involved; I expected larger changes from one year to ...

Regional Variation in Law Enforcement Deaths – Part A

February 15, 2011

In recent months, there has been a series of high profile incidents in the United States where police officers were killed. While such events are unfortunate, the data suggests that it is extremely rare for an officer to be harmed or killed while on duty. In this post, I examine whether there are significant regional

Mixed models – Part 2: lme lmer

February 15, 2011

Getting more into mixed models, I’ve been playing around with both nlme::lme and lme4::lmer. http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3345.html was quite a good post explaining the differences, which from what I gather are largely performance-based when using crossed or partially crossed models. In the models I am tinkering with at the moment I am noticing differences in

Boxplots and Beyond III: Violin Plots

This post is the third in a series of four on boxplots and closely related data visualization techniques for comparing subsets of a dataset, or comparing different datasets that we hope or expect to be similarly distributed.  The previous two post...