Permutation tests in R

May 21, 2012

Permutation tests (also called randomization or re-randomization tests) have been around for a long time, but it took the advent of high-speed computers to make them practical. They can be particularly useful when your data are sampled from unknown …
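
As a rough illustration of the idea in this teaser, here is a minimal sketch of a two-sample permutation test on the difference in means. The data values, the function name `perm_test`, and the number of permutations are all made up for illustration; this is not the code from the linked post.

```r
# Hypothetical two-group data; the values are invented for this sketch.
set.seed(42)
group_a <- c(12.1, 14.3, 11.8, 15.2, 13.9, 14.8)
group_b <- c(10.4, 11.9, 12.2, 10.8, 11.1, 12.6)

# perm_test() is an illustrative helper, not a function from any package.
perm_test <- function(x, y, n_perm = 9999) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  idx_x <- seq_along(x)
  # Shuffle the pooled values and recompute the statistic each time.
  perm_stats <- replicate(n_perm, {
    shuffled <- sample(pooled)
    mean(shuffled[idx_x]) - mean(shuffled[-idx_x])
  })
  # Two-sided Monte Carlo p-value; the +1 counts the observed labelling.
  p_value <- (sum(abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

result <- perm_test(group_a, group_b)
print(result)
```

The test makes no assumption about the shape of the underlying distribution; it only assumes that group labels are exchangeable under the null hypothesis.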

RcppArmadillo 0.3.2.0

A new stable release, 3.2.0, of Armadillo is now available. As usual, we have wrapped this into a new RcppArmadillo package, now at 0.3.2.0, and this version is now available via CRAN. The short NEWS entry follows below. For those interested in follo...

Project Euler — problem 2

May 21, 2012

It's almost time for bed, so I just wrote a quick solution to the second problem of Project Euler. Here it is. Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, …
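
The recurrence described in this teaser can be sketched as below. The helper name `fib_terms` is hypothetical, and the sketch only generates terms starting from 1 and 2; it does not attempt the rest of the problem, which is cut off above.

```r
# fib_terms() is an illustrative helper, not the linked post's solution.
# It returns the first n terms of the sequence that starts with 1 and 2.
fib_terms <- function(n, first = 1, second = 2) {
  stopifnot(n >= 2)
  terms <- numeric(n)
  terms[1:2] <- c(first, second)
  if (n > 2) {
    for (i in 3:n) {
      # Each new term is the sum of the previous two.
      terms[i] <- terms[i - 1] + terms[i - 2]
    }
  }
  terms
}

print(fib_terms(10))
# 1 2 3 5 8 13 21 34 55 89
```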

The Simple Gibbs example in Julia

May 21, 2012

The Gibbs sampler discussed on Darren Wilkinson's blog and also on Dirk Eddelbuettel's blog has been implemented in several languages, the first of which was R. In preparation for a session at useR!2012 on "What other languages should R user...

Example 9.32: Multiple testing simulation

May 21, 2012

In examples 9.30 and 9.31 we explored corrections for multiple testing and then extracted p-values adjusted by the Benjamini and Hochberg (or FDR) procedure. In this post we'll develop a simulation to explore the impact of "strong" and "weak" control of the family-wise error rate offered in multiple comparison corrections. Loosely put, weak control procedures...
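
A simulation of this kind might look like the sketch below, which is not the code from the linked post. It covers only the case where every null hypothesis is true, so the p-values are uniform. The number of simulations, the number of tests, and the threshold are arbitrary choices; `p.adjust()` is the base R function from the stats package.

```r
set.seed(1)
n_sims  <- 2000   # simulated "studies" (arbitrary)
n_tests <- 20     # hypotheses tested per study (arbitrary)
alpha   <- 0.05

# Under the global null, every p-value is uniform on (0, 1).
p_matrix <- matrix(runif(n_sims * n_tests), nrow = n_sims)

# Proportion of studies with at least one rejection, per adjustment method.
methods <- c(none = "none", bonferroni = "bonferroni", BH = "BH")
fwer <- sapply(methods, function(m) {
  mean(apply(p_matrix, 1, function(p) any(p.adjust(p, method = m) <= alpha)))
})

print(round(fwer, 3))
```

With no adjustment, the chance of at least one false rejection among 20 independent tests is about 1 - 0.95^20, or roughly 0.64, whereas both adjusted procedures should stay near 0.05 in this all-nulls-true setting.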

National Registry of Exonerations charts with R

May 21, 2012

According to recent news (dallasnews.com) there is a new release of a public national database for wrongful convictions.  There are plenty of details in the public list including Age, Race, and how the conviction was overturned.  According to...

Visualizing the #nonato Twitter hashtag – time series and top users

May 21, 2012

The NATO summit is currently being held in Chicago, and, as is typical for NATO or G# summits, the streets and tweets are full of dissent. In the spirit of my past investigations of online dissent (#jan25, #25bahman, #12fev,…

Charting Twitter time series data with tweet and unique user counts

May 21, 2012

Let’s say you’ve used my Python script to automate the download of a hashtag or search phrase from Twitter (in a Unicode-safe way, unlike within R). Now let’s say you want to visualize the number of tweets over time. Easy…

Aspirational & Useful: deck.rb with RStudio/knitr & Go2Shell

May 21, 2012

There has been some interest in the recent release of RStudio 0.96, and especially in the ability to combine its knitr Markdown functionality with Pandoc to integrate R and a variety of different document types. I just wanted to add two quick things ...

R-NOLD 2012-05-21 04:46:00

May 21, 2012

Mapping Philippines earthquake data from January 2011 to January 2012, collected by PHIVOLCS, using the R ggplot package. I tried to use ggplot2 to recreate the earthquake map of the Philippines that was created using maptool and the R plot function. Earthquake map ...

Time-Series Policy Evaluation in R

May 21, 2012

Quantifying the success of government policies is clearly important. Randomized control trials, like those conducted by drug companies, are often described as the ‘gold-standard’ for policy evaluation. Under these, a policy is implemented in/to one area/group (treatment), but not in/to another (control). The difference in outcomes between the two areas or groups represents the effectiveness

CambR and other upcoming events

May 21, 2012

New events: CambR (Cambridge UK R user group), 2012 May 29, 6:30 PM for 7:00 PM start. Pat Burns, “Inferno-ish R”. Abstract: While R is wonderful, it is not uniformly wonderful. We highlight a few things generally found to be confusing, and outline the forces that have driven such imperfections. Markus Gesmann, “Interactive charts with …

Another look at over-representation analysis interpretation

May 21, 2012

Interpreting a list of differentially regulated genes can take many forms. One of the most widely used methods is to look for enrichment of functional groups of genes compared to a random sampling of genes from the same universe, namely an over-representation analysis (ORA). The point I want to explore today is what is the best way to interpret the results...

A Monty Hall Monte Carlo, Part 1? (Oh God)

May 20, 2012

While I dig into conjugacy and the calculation of Bayesian credibility intervals, I figured it’d be good to put some of my other little rabbit holes up here on the off chance they’re interesting to someone. For some reason I...

CFP: the 10th Australasian Data Mining Conference (AusDM 2012)

May 20, 2012

The Tenth Australasian Data Mining Conference (AusDM 2012), Sydney, Australia, 5-7 December 2012, http://ausdm12.togaware.com/ Data mining, the art and science of intelligent analysis of (usually large) data sets for meaningful (and previously unknown) insights, is now being actively applied in …

Alternate way of plotting means and errors

May 20, 2012

Last month, I wrote a post discussing dynamite plots, noting that they're not considered to be especially good at presenting information. I got a little bit of flak for it, from people for and against dynamite plots. This post shows a different method of showing a point and an error bar. If you're going to do it, why...

Bayes on drugs (guest post)

May 20, 2012

This post is written by Julien Cornebise. Last week in Aachen was the 3rd edition of the Bayes(Pharma) workshop. Its distinguishing feature: half-and-half industry/academic participants and speakers, all in pharmaceutical statistics, with great care taken to welcome newcomers to Bayes, so as to spread the love as much as possible where it will actually be used.

End User Computing and why R can help meeting Solvency II

May 20, 2012

John D. Cook gave a great talk about 'Why and how people use R'. The talk resonated with me and highlighted why R is such a great tool for end user computing, a topic which has become increasingly important in the European insurance industry. John's mai...

Cleveland Indians’ Attendance

May 20, 2012

Recently, Chris Perez, the closer for the Indians, displayed some frustration with the fans for not supporting the team. Currently, they have the lowest attendance in the majors -- by a decent margin. The Indians are averaging about 15,000 fans per hom...

Better R support in pygments by monkey patching SLexer

May 20, 2012

I started using knitr with reStructuredText today and I found that the syntax highlighting with pygments (used by rst2html.py) was not as nice as the output of pandoc. So I ended up doing some monkeypatching. Try adding the following to rst2html.py: Before: After: Note: I assume you already added pygments’ rst-directive.py to rst2html.py.

Another cut at market randomness

May 20, 2012

I have some background in computer security and one day found myself tasked with assessing the quality of randomness for session id tokens generated by popular web frameworks (namely Java and .NET). As it turns out, NIST have developed a series of tests for just this purpose, detailed here. As a non-believer in the absolute randomness of markets, I thought...

My two favorite IDE’s for R – tips & tricks

May 20, 2012

The two IDEs that I use for R are RStudio and Eclipse with StatET. They complement each other nicely: RStudio works out of the box, while getting Eclipse & StatET going is slightly challenging (I have previously shown how; you can find it here). RStudio: I use RStudio for all my statistics where I don't want to create...

Births and week-ends, in France

May 19, 2012

This week, I have seen on the internet (sorry, I cannot find proper references) the graph produced here on the right: which birthday is most likely? The fact that I have no further information is important, since I do not know in which country suc...

Interestingness comparisons

In three previous posts (April 3, 2011, April 12, 2011, and May 21, 2011), I have discussed interestingness measures, which characterize the distributional heterogeneity of categorical variables. Four specific measures are discussed in Chapter 3 of Exploring Data in Engineering, the Sciences and Medicine: the Bray measure, the Gini measure, the Shannon measure, and the Simpson measure. All four of...

Map of divorce in Mexico

May 19, 2012

Keeping with this week's divorce theme, here's a map of the Mexican states where marriages are most likely to end in divorce. Perhaps not surprisingly, there seems to be an inverse correlation with the state percentage of the population that is catho...

Average Annual Population Growth Rate of Tawi-Tawi

May 19, 2012

R Codes
library(ggplot2)
TawiTawiGrowthRate <- as.numeric(c(2.6, 3.3, 5.9, 1.6, 1.8, 5.5, 5))
CensalYear <- c("1948-1960","1960-1970","1970-1980","1980-1990","1990-1995","1995-2000","2000-2007")
qplot(CensalYear, TawiTawiGrowthRate, xlab ...

[R-bloggers]RcmdrPlugin.KMggplot2_0.1-0 is on CRAN now

May 18, 2012

I posted a new version of the “RcmdrPlugin.KMggplot2” package, which is an Rcmdr plug-in for a “ggplot2” GUI front-end. This package helps you make “ggplot2” graphics. RcmdrPlugin.KMggplot2 (CRAN) NEWS Changes in version 0.1-0 (2012-05-18) Restructu

My experiences with Rcpp

May 18, 2012

For the last seven days, up to Tuesday, I have been working on converting the code of my master's thesis from scripted R (statistics) to compiled C++ using the Rcpp package from Dirk Eddelbuettel. Despite the initial effort necessary to …

R is to SAS as Java is to COBOL

May 18, 2012

An interview with Revolution Analytics CEO Dave Rich was published this week by BeyeNetwork. During the interview, Dave was asked about how statistical modeling platforms have changed over the decades: People have been doing statistical modeling and predictive analytics for 50 years now, SAS and SPSS have been around since the early ‘70s. What’s different now -- what’s...
