## Some Code for Dumping Data from Twitter Gardenhose

March 30, 2010
Gardenhose is a Streaming API feed that continuously sends a sample (roughly 15% according to Ryan Sarver at the 140tc in September 2009) of all tweets to feed recipients. This is some code for dumping the tweets to files named by date and hour. It is in PHP which is not my favorite language, but works nonetheless. I received...

## TTR_0.20-2 on CRAN

March 30, 2010
An updated version of TTR is now on CRAN. It fixes a couple bugs and includes a couple handy tweaks. Here's the full contents of the CHANGES file: TTR version 0.20-2. Changes from version 0.20-1. NEW FEATURES: Added VWAP and VWMA (thanks to Brian Peterson...

## Scientists misusing Statistics

March 30, 2010
In ScienceNews this month, there's a controversial article exposing the fact that results claimed to be "statistically significant" in scientific articles aren't always what they're cracked up to be. The article -- titled "Odds Are, It's Wrong" -- is interesting, but I take a bit of an issue with the sub-headline, "Science fails to face the shortcomings of Statistics". As it...

## Example 7.30: Simulate censored survival data

March 30, 2010
To simulate survival data with censoring, we need to model the hazard functions for both time to event and time to censoring. We simulate both event times from a Weibull distribution with a shape parameter of 1 (this is equivalent to an exponential ra...
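The idea in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the original post: the function name `simulate_censored`, the two scale values, and the absence of covariates are all choices made here. Each subject gets an event time and a censoring time, both drawn from a Weibull distribution with shape 1, and we observe whichever comes first.

```python
import random


def simulate_censored(n, event_scale=1.0, censor_scale=2.0, seed=0):
    """Return a list of (observed_time, event_indicator) pairs.

    event_scale and censor_scale are illustrative values (assumptions),
    not parameters taken from the original post.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # weibullvariate(scale, shape): shape 1 gives an exponential variate
        t_event = rng.weibullvariate(event_scale, 1.0)
        t_censor = rng.weibullvariate(censor_scale, 1.0)
        observed = min(t_event, t_censor)
        # indicator is 1 if the event was seen, 0 if the subject was censored
        data.append((observed, int(t_event <= t_censor)))
    return data


sample = simulate_censored(1000)
censored_fraction = 1 - sum(d for _, d in sample) / len(sample)
print(f"censored fraction: {censored_fraction:.2f}")
```

With exponential event and censoring times, the expected censored fraction is the censoring rate divided by the sum of the two rates, which is 1/3 for the scales assumed here.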

## Smoothing time series with R

March 29, 2010
Smoothing is a statistical technique that helps you to spot trends in noisy data, and especially to compare trends between two or more fluctuating time series. It's a useful visualization tool that I'm pleased to see cropping up more and more in statistical graphics on the Web -- it's now a staple in econometric charts and is heavily used...

## Looking for Software Paths in Windows Registry

March 28, 2010
When we want to call external programs in R under Windows, we often need to know the paths of these programs. For instance, we may want to know where ImageMagick is installed, as we need the convert (convert.exe) utility to convert images to other formats, or where OpenBUGS is installed because we need this path

## Example 7.29: Bubble plots colored by a fourth variable

March 27, 2010
In Example 7.28, we generated a bubble plot showing the relationship among CESD, age, and number of drinks, for women. An anonymous commenter asked whether it would be possible to color the circles according to gender. In the comments, we showed simp...

## Finance::YahooQuote 0.24

March 26, 2010
Having espoused rule number one in regression testing in the post about yesterday's bug fix upload 0.23, we can now add rule number zero: Do not introduce a new error by omitting the trailing semicolon. I guess it shows that I don't really program in...

## Rcpp 0.7.11

March 26, 2010
A new version, 0.7.11, of Rcpp is awaiting inclusion into CRAN and Debian. It is also available from here. This version fixes a somewhat serious bug uncovered by Doug Bates when working with vectors of strings. We also added a few new accessor function...

## ‘R’ = dna.translate("AGG"). A custom C function for R, My notebook.

March 26, 2010
In the following post, I will show how I've implemented a custom C function for R. This C function will translate a DNA sequence into a protein. I'm very new to 'R' so feel free to make any comment about the code. C code: The data in 'R' are stored in an opaque stru...
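To make the translation step in the excerpt above concrete, here is a minimal Python sketch of codon-by-codon translation with the standard genetic code. It is not the post's C code or its R interface; the names `CODON_TABLE` and `translate`, and the choice to mark unknown codons with `X` and to ignore trailing bases, are assumptions made for this illustration.

```python
# Standard genetic code, laid out in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to a one-letter amino acid ('*' marks a stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon (hypothetical helper).

    Trailing bases that do not fill a codon are ignored, and codons
    containing letters other than T, C, A, G are reported as 'X'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate("AGG"))  # R
```

The codon AGG encodes arginine, whose one-letter code is R, which is the pun in the post's title.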

## Code Highlights in WordPress

March 26, 2010
I’ve come across a very useful WordPress plugin called WP-Syntax, which highlights code in posts using GeSHi. This plugin is easy to use and adds highlights simply by putting the appropriate tags around code blocks. For instance, we can make the following R code much more readable by using WP-Syntax. ## Generate 100

## Predicting Pizza

March 26, 2010
What's the secret to the best pizza in New York? That's what statistical consultant and R user Jared Lander sought to find out, by analyzing the rankings of NY pizza joints at MenuPages.com, and building a regression model for ratings based on variables like location, price, number of reviews, and pizza-oven type (gas, coal or wood). Here's a scatterplot...

## Summarising data using dot plots

March 26, 2010
A dot plot is a type of display that compares counts, frequencies, totals or other summary measures for a series of categories. The dot plot can be arranged with the categories on either the vertical or horizontal axis of the display to allow comparison between the different categories as well as comparison within categories where

## BioMart (and biomaRt)

March 26, 2010
I’ve been vaguely aware of BioMart for a few years. Inexplicably, I’ve only recently started to use it. It’s one of the most useful applications I’ve ever used. The concept is simple. You have a set of identifiers that describe a biological object, such as a gene. These are called filters. They have values –

## Finance::YahooQuote 0.23

March 25, 2010
Rule number one in regression testing is to not depend on volatile data. Which I seem to have violated in file t/02simple.t in the Perl package Finance::YahooQuote. Which led the automated Perl test scripts to remind me for a few days now that the f...

## How Misinformed are Tea Party Protesters About Tax Policy?

March 25, 2010
For those of you used to reading about international relations, I apologize for the following brief foray into American politics. It appears that the American Enterprise Institute and David Frum have decided to (abruptly) part ways. Before David left, however, he and his team of interns provided some interesting statistical insight into the

## R plotting fun

March 25, 2010
Not easy to produce cool looking graphs in R, but it can be done. The results of some messing around are above. Here is the code I used:

## Future of Open Source Survey – Results

March 25, 2010
The results of the 2010 Future of Open Source survey were presented at last week's Open Source Business Conference in San Francisco, and here they are in slide format: While I was at the presentation I captured a few additional tidbits that weren't in the slides. The continued growth of open-source generally was a prevalent...

## A von Mises variate…

March 25, 2010
Inspired by an email that followed the previous random generation post, the following question arose: how to draw random variates from the von Mises distribution? First of all let’s check the pdf of the probability rule; it is $f(x \mid \mu, \kappa) = \frac{e^{\kappa \cos(x - \mu)}}{2\pi I_0(\kappa)}$, for $x$ in an interval of length $2\pi$. Ok, I admit that Bessel functions can be a bit frightening, but
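One way to draw such variates without evaluating any Bessel function is plain acceptance-rejection with a uniform proposal. The sketch below is an illustration of that idea, not necessarily the method used in the original post. It relies only on the von Mises density being proportional to exp(κ cos(x − μ)), and the names `rvonmises` and `circular_mean` are made up here. Python's standard library also ships `random.vonmisesvariate`, which is the practical choice.

```python
import math
import random


def rvonmises(n, mu=0.0, kappa=1.0, seed=0):
    """Draw n von Mises variates on [-pi, pi] by acceptance-rejection.

    A uniform candidate x is accepted with probability
    exp(kappa * (cos(x - mu) - 1)), the density divided by its maximum,
    so the normalising constant with the Bessel function cancels out.
    """
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        x = rng.uniform(-math.pi, math.pi)
        if rng.random() <= math.exp(kappa * (math.cos(x - mu) - 1.0)):
            draws.append(x)
    return draws


def circular_mean(angles):
    """Mean direction of a sample of angles, in radians."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


draws = rvonmises(5000, mu=0.5, kappa=2.0)
print(f"sample mean direction: {circular_mean(draws):.3f}")
```

The acceptance rate of this simple scheme falls as κ grows, because the density concentrates around μ while the proposal stays uniform, so it is best suited to small or moderate concentration.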

## Create odf, pdf and html report from a single Sweave document

March 25, 2010
A lot of us know about Sweave and LaTeX, and they work very well for creating elegant dynamic reports from R computations. However, sometimes we would also like to produce a word processing document for a colleague or an HTML version of the same report. Now there are tools for producing these like odfWeave. But

## NetLogo & R Extension

March 25, 2010
I'm a heavy user of R, and was one long before I started doing any agent-based modeling. So the first thing I looked for in any software package for ABM was some automated link to R (much like spgrass6 for GRASS and R for GIS). I thought Repast Simphony was the way to go, since...

## Matlab and R (getting started)

March 24, 2010
Matlab and R are two popular languages for data analysis and visualization. The similarity between the two languages is high. Both are interpreted languages that run in a shell-like environment (while also allowing you to run scripts or functions written off-line). Both tend to be slow if your code contains many loops but are fast when