Stata or R – How to create dynamic variables in R?

February 16, 2011

As we dig deeper into the Stata or R debate, a few questions have come up. Question 1: One of the things Stata does well is the way it constructs new variables (see example below). How can we do this in R? We can rewrite it as-is using for loops in R...

Read more »

Regional Variation in Law Enforcement Deaths – Part B

February 16, 2011

I would like to thank Tal Galili for establishing and maintaining the blog aggregator at R-bloggers. This site has been added to their directory, and new posts tagged with R will now appear on their feed. http://www.r-bloggers.com/ In Part A, I presented a series of barplots which showed that the plurality of police

Read more »

Top 15 Daily Tweeters of #25bahman for the Past Five Days

February 16, 2011

My friend Michael Bommarito has been doing the data community quite a service, capturing and sharing all of the traffic on Twitter related to the Iranian protests. Specifically, he has all of the tweets containing the #25bahman hashtag, and has made them available for anyone to download. I am unable to resist the temptation to explore a

Read more »

Silver and Russell 2000

February 16, 2011

When I find a chart that looks like this (via StockCharts.com), I always like to explore a little further. I pull it into R and try to find anything worthwhile. I do not find anything, except that I do not want to be trading both in the same direc...

Read more »

Summarize Missing Data for all Variables in a Data Frame in R

February 16, 2011

Something like this probably already exists in an R package somewhere out there, but I needed a function to summarize how much missing data I have in each variable of a data frame in R. Pass a data frame to this function and for each variable it'll give you the number of missing values, the total N, and the...

Read more »

RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

February 16, 2011

RHIPE: An Interface Between Hadoop and R, presented by Saptarshi Guha. About the video: I filmed the event using LectureMaker’s live event recording technique. One special feature I add to my R video recordings is the addition of my own R source code …

Read more »

RcppArmadillo 0.2.12

February 16, 2011

A new version 1.1.2 of Conrad Sanderson's Armadillo templated C++ library for linear algebra came out a couple of days ago. This has now been wrapped into a new version 0.2.12 of RcppArmadillo, our Rcpp-based integration into R. The short NEWS fil...

Read more »

Take the ggplot2 user survey

February 16, 2011

The author of the ggplot2 graphics package for R, Hadley Wickham, is looking for feedback from ggplot2 users. If you've used ggplot2, fill out his short survey at the link below. WuFoo: ggplot2 survey

Read more »

The Egyptian Revolution, in tweets

February 16, 2011

Twitter played a significant role in the recent uprising in Egypt, with protesters communicating via tweets marked with the #25bahman hashtag (February 14 in the Persian calendar) to plan and rally for the demonstration. Michael Bommarito downloaded all such tweets and plotted their frequency over time using R's ggplot2 library. Not surprisingly, the activity peaked on February 14. The...

Read more »

Pre-processing text: R/tm vs. python/NLTK

February 16, 2011

Let’s say that you want to take a set of documents and apply a computational linguistic technique. If your method is based on the bag-of-words model, you probably need to pre-process these documents first by segmenting, tokenizing, stripping, stopwording, and …

Read more »

Twin Cities R User Group Meeting Tonight!

February 16, 2011

TCRUG will be having a meeting TONIGHT (2/16) at 5:30 PM. We will meet in ROOM 29 in Willey Hall. Willey Hall is located on the West Bank of the Minneapolis campus. See the Google map at http://goo.gl/tnRnU. Erik Iverson will be giving a talk ...

Read more »

Mapping London’s Population Change 1801-2030

February 16, 2011

Buried in the London Datastore are the population estimates for each of the London Boroughs between 2001 and 2030. They predict a declining population for most boroughs, with the exception of a few to the east. I was surprised by this general decline and also by the numbers involved; I expected larger changes from one year to ...

Read more »

Regional Variation in Law Enforcement Deaths – Part A

February 15, 2011

In recent months, there has been a series of high profile incidents in the United States where police officers were killed. While such events are unfortunate, the data suggests that it is extremely rare for an officer to be harmed or killed while on duty. In this post, I examine whether there are significant regional

Read more »

Mixed models – Part 2: lme lmer

February 15, 2011

Getting more into mixed models, I’ve been playing around with both nlme::lme and lme4::lmer. http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3345.html was quite a good post explaining the differences, which from what I gather are largely performance-based when using crossed or partially crossed models. In the models I am tinkering with at the moment I am noticing differences in

Read more »

Boxplots and Beyond III: Violin Plots


This post is the third in a series of four on boxplots and closely related data visualization techniques for comparing subsets of a dataset, or comparing different datasets that we hope or expect to be similarly distributed.  The previous two post...

Read more »

ABC in London

February 15, 2011

After the very exciting and I think quite successful ABC in Paris meeting two years ago, Michael Stumpf from Imperial College London suggested a second edition in London along the same lines. Michael kindly associated me with the planning of this meeting. It is (logically) called ABC in London (or ABCiL) and will take place

Read more »

Statistical Graphics – Edward Tufte

February 15, 2011

The work of Edward Tufte is worth reading if you are interested in designing meaningful graphs and removing chart junk from your displays.

Read more »

Example 8.25: more latent class models (plus a graphical display)

February 15, 2011

In recent entries (here, here, here and here), we've been fitting a series of latent class models using SAS and R. One of the most commonly used and powerful packages for latent class model estimation is Mplus. In this entry, we demonstrate how to use...

Read more »

Rcpp 0.9.1

February 15, 2011

A new release 0.9.1 of Rcpp went to CRAN and Debian yesterday. This version contains mostly bug fixes and rather few enhancements. The changes are largely 'internal fixes' and not user-facing; they mostly address some issues in memory management w...

Read more »

R 2.12.2 scheduled for February 25

February 15, 2011

The next release of R is scheduled for February 25, and R 2.12.2 will likely be the final bug-fix release of the 2.12 series before R 2.13 is released in April. According to the NEWS file in the latest daily build, 2.12.2 will improve complex-arithmetic support on some rare platforms that don't support complex types in C99, and...

Read more »

Reaching 1000

February 14, 2011

This is the 1000th post on the ‘Og! Here are the entries that have had more than 1,000 views (not viewers) so far:
In{s}a(ne)!!: 5,353
“simply start over and build something better”: 4,345
Julien on R shortcomings: 1,966
Sudoku via simulated annealing: 1,762
Of black swans and bleak prospects: 1,462
Do we need an integrated Bayesian/likelihood

Read more »

Extracting all Crime Data for England and Wales using R and MYSQL

February 14, 2011

Last week I started creating some data extraction code for the new England and Wales crime maps website using the R software / language. Although there is an API, a more efficient way of accessing all of the data (and without causing stress to their API server) is to download the CSV files located here

Read more »

R-commander installation in openSUSE

February 14, 2011

Thanks to this post I was able to install R-commander in openSUSE. I've modified the recipe a bit and don't want to search for it the next time. You have to perform several steps: Install R-base and R-base-devel packages from here. Install gfortran :~&...

Read more »

Modern Science and the Bayesian-Frequentist Controversy

February 14, 2011

The Bayesian-Frequentist debate reflects two different attitudes to the process of doing science, both quite legitimate. Bayesian statistics is well-suited to individual researchers, or a research group, trying to use all the information at its disposal to make the quickest possible progress. In pursuing progress, Bayesians tend to be aggressive and optimistic with their modeling

Read more »

Stack Exchange: Quantitative Finance in public beta

February 14, 2011

The Quantitative Finance Stack Exchange community entered public beta last week.  To quote the FAQ: The Quantitative Finance Stack Exchange is intended specifically for professionals and traders working in investment banking, and aca...

Read more »

OkCupid: Finding your Valentine with R

February 14, 2011

Free dating site OkCupid (which was recently acquired by match.com) collects a lot of data. With over 3 million members, many of whom have provided extensive information about their personal details including preferences, lifestyle, sexuality and hobbies via their dating profiles, they have a wealth of information upon which to identify trends about the love lives of a typical...

Read more »