### Resampling Hierarchically Structured Data Recursively

April 4, 2012 |

That's a mouthful! I presented this topic to a group of Vandy statisticians a few days ago. My notes (essentially reproduced in this post) are recorded at the Dept. of Biostatistics wiki: HowToBootstrapCorrelatedData. The presentation covers some bootstrap strategies for hierarchically structured (correlated) data, but focuses on the multi-stage bootstrap; ... [Read more...]
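For readers who want a feel for the idea before reading the notes, here is a minimal sketch of a two-stage bootstrap for clustered data. It is not the code from the presentation: the function names, the toy data, and the restriction to two levels are assumptions made for illustration. Stage 1 resamples clusters with replacement; stage 2 resamples observations with replacement within each selected cluster.

```python
import random
from statistics import mean, stdev


def multistage_bootstrap(clusters, rng):
    """Draw one two-stage bootstrap replicate (hypothetical helper).

    clusters: list of lists, one inner list of observations per cluster.
    """
    replicate = []
    for _ in range(len(clusters)):
        cluster = rng.choice(clusters)                           # stage 1: pick a cluster
        replicate.append(rng.choices(cluster, k=len(cluster)))   # stage 2: resample within it
    return replicate


def bootstrap_se_of_mean(clusters, n_reps=2000, seed=1):
    """Bootstrap standard error of the overall mean under two-stage resampling."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        replicate = multistage_bootstrap(clusters, rng)
        estimates.append(mean(x for cluster in replicate for x in cluster))
    return stdev(estimates)


# Toy data (made up): three clusters of unequal size.
toy = [[5.1, 4.9, 5.3], [6.8, 7.1], [3.9, 4.2, 4.0, 4.4]]
se = bootstrap_se_of_mean(toy)
print(f"bootstrap SE of the mean: {se:.3f}")
```

Deeper hierarchies can be handled by applying the same resampling step recursively at each level, which is the sense of "recursively" in the title.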

### useR! 2012 Simple Abstract Helper

January 3, 2012 |

useR! 2012 has issued a call for abstracts! I've extended the WebSweave concept to offer a tool to create simple abstracts online, including those with markup, which may then be submitted at the conference website. Use the following link for the Simple Abstract Helper. [Read more...]

### Mortgage Refinance Calculator

December 20, 2011 |

Mortgage rates are low, considering historical rates for the last 50 years. It may be timely to consider a mortgage refinance. The image above links to a simple tool for exploring mortgage refinance, built using rapache and the yet-to-be-archived yarr package for R. Hence, there are now two mortgage-related calculators on ... [Read more...]

### New Powerball (lottery) Rules Will Cost You More

December 16, 2011 |

The popular news outlets are reporting [1,2,3,4,5] that the Multi-State Lottery Association (MUSL) will change the rules for its lottery game Powerball, effective Jan. 15, 2012. I sent an email to the MUSL (at 8:00am, Dec. 14th) asking for the new official rules, but haven't received a response yet (as of 10:30am, Dec. 16th). ... [Read more...]

### Why balloons are better than balls (in urn schemes)

November 18, 2011 |

The following is taken from a work in progress: The Polya urn is a heuristic associated with Dirichlet process mixtures. We present the scheme in a modified format, using balloons instead of balls, where the probability of drawing a balloon from the urn is proportional to its volume. Balloons are ... [Read more...]
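To make the volume-proportional draw concrete, here is a small simulation sketch. The excerpt only states that a balloon is drawn with probability proportional to its volume; the rest is an assumption borrowed from the standard Polya urn for a Dirichlet process: a "base" balloon of fixed volume `alpha` that, when drawn, adds a new unit-volume balloon, and an inflate-by-one rule when an existing balloon is drawn. The function name and parameters are hypothetical.

```python
import random


def balloon_urn(n, alpha, seed=0):
    """Simulate n draws from a balloon version of the Polya urn (illustrative only).

    Returns (labels, volumes): labels[i] is the balloon drawn at step i,
    volumes[k] is the final volume of balloon k.
    """
    rng = random.Random(seed)
    volumes = []   # one entry per colored balloon
    labels = []
    for _ in range(n):
        # Candidates: every existing balloon, plus the base balloon (last weight).
        weights = volumes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(volumes):
            volumes.append(1.0)   # assumed rule: base balloon drawn, add a new unit balloon
        else:
            volumes[k] += 1.0     # assumed rule: inflate the drawn balloon by one unit
        labels.append(k)
    return labels, volumes


labels, volumes = balloon_urn(n=50, alpha=1.0)
print(len(volumes), "distinct balloons after 50 draws")
```

With unit increments this reduces to the usual ball-counting urn; the appeal of balloons is that volumes need not be integers.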

### Bayesian vs. Frequentist Intervals: Which are more natural to scientists?

November 17, 2011 |

I don't know, of course, because the evidence at hand is based on my experience. But, I'll leave the reader to consider whether these observations generalize. Proponents of Bayesian statistical inference argue that Bayesian credible intervals are more intuitive than the frequentist confidence intervals, because the Bayesian inference is a ... [Read more...]

### Parameter vs. Observation Dimension?

October 24, 2011 |

Bill Bolstad's response to Xi'an's review of his book Understanding Computational Bayesian Statistics included the following comment, which I found interesting: Frequentist p-values are constructed in the parameter dimension using a probability distribution defined only in the observation dimension. Bayesian credible intervals are constructed in the parameter dimension using a ... [Read more...]

### Another Mystery: sas7bdat != sd2

October 14, 2011 |

I received an email from a very inconvenienced statistician a few weeks ago. The problem was an old data file with the extension .sd2. Apparently, this is an obsolete data storage format used by past versions of SAS. A quick glance at the file contents revealed that this sd2 formatted ... [Read more...]

### A Note on Antoniak’s Approximation for Dirichlet Processes

September 21, 2011 |

Antoniak's 1974 article titled Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems (Annals of Statistics 2(6):1152-1174) is a fundamental work for most modern developments in this area. The article gives two expressions for the expected number of distinct values in a sample of size n, drawn from a Dirichlet ... [Read more...]
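As a quick numerical companion, the sketch below compares the two expressions usually quoted from this literature for a Dirichlet process with precision parameter `alpha`: the exact sum, E[K_n] = sum over i = 1..n of alpha / (alpha + i - 1), and the logarithmic approximation alpha * log((n + alpha) / alpha). Treat the correspondence with the article's two expressions as an assumption on my part; the function names are mine.

```python
import math


def expected_distinct_exact(n, alpha):
    """Exact expected number of distinct values among n draws."""
    return sum(alpha / (alpha + i) for i in range(n))


def expected_distinct_approx(n, alpha):
    """Logarithmic approximation to the expected number of distinct values."""
    return alpha * math.log((n + alpha) / alpha)


for n in (10, 100, 1000):
    exact = expected_distinct_exact(n, alpha=1.0)
    approx = expected_distinct_approx(n, alpha=1.0)
    print(f"n={n:5d}  exact={exact:.3f}  approx={approx:.3f}")
```

Because each term alpha / (alpha + i) is decreasing in i, the exact sum is never smaller than the integral that produces the logarithmic form, so the approximation is biased downward.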

### More sas7bdat progress

September 13, 2011 |

The development version of the read.sas7bdat function (in the sas7bdat package) now reads field labels and formats. In addition, errors of the type "found subheaders where 1 expected" are now a thing of the past. These improvements are largely due to work by Clint Cummins. The function also ... [Read more...]

### The Open Governance Index: Results for The R Project

August 24, 2011 |

Just over two weeks ago, I invited readers to complete the Open Governance Index (OGI) Questionnaire regarding The R Project. The OGI evaluates several facets of governance in open source projects (OGI publication). The OGI questionnaire is reproduced below, and each question is linked from the table of useR responses. ... [Read more...]

### tty Connection + sas7bdat: useR! 2011 Presentation Slides

August 21, 2011 |

Experimenting with a tty Connection for R: I presented twice at this year's useR!. The first was a regular talk on the tty connection patch for R. The talk went smoothly, despite a live demonstration using the DLP-232PC data acquisition module (datasheet). The slides for this presentation are here: ... [Read more...]

### The Open Governance Index: How open is the R project?

August 8, 2011 |

The Open Governance Index is a new measure, developed by VisionMobile, that rates open-source projects on their governance process. The index has four facets, described thoroughly in the "Open Governance Index" publication, and briefly below. access - These criteria assess the availability of source code, a permissive license, developer support ... [Read more...]

### Outlier Detection with DPM Slides from JSM 2011

August 5, 2011 |

Here are the 14 slides I used during my talk at the Joint Statistical Meetings 2011: shotwell-jsm-2011.pdf. I'm trying hard to minimize the text in my presentation slides. But, this usually requires that I practice more. Hence, you will know which talks I have practiced thoroughly by the amount of text ... [Read more...]

July 25, 2011 |

### Prepping for useR! 2011 – tty connection update

July 22, 2011 |

I'm putting together my presentation for useR! 2011 titled "Experimenting with a tty connection for R". Hence, I've updated the tty connection patch to work with R versions 2.13.0 and 2.13.1. And, instead of re-listing the patch files and re-writing instructions on their application, I've devoted a small portion of my Code page ... [Read more...]

### Slides for Reproducible Research Talk at Interface 2011

July 20, 2011 |

I gave a talk at the Interface Symposium on reproducible research in practice. I went first in the session, so the slides have a bit more background and philosophy. It was a great session: Sergey Fomel, one of Jon Claerbout's colleagues and a founding author of Madagascar, spoke; Sorin Mitran from ... [Read more...]

June 14, 2011 |

An earlier post (1216) introduced a compatibility study (i.e. reverse engineering) of the sas7bdat database file format. The code and documentation for this are here: http://github.com/biostatmatt/sas7bdat. I've recently restructured the code as an R package, and added some functionality. Look for the sas7bdat ... [Read more...]

### David Banks on Reproducible Research

June 8, 2011 |

Just got an email linking to Reproducible Research: A Range of Response, in the new journal Statistics, Politics, and Policy 2(1) by David Banks, who is also the journal's editor. Interestingly, the commentary doesn't mention the journal's policy (if one exists) on the reproducibility of research submitted there. Banks' writing is ... [Read more...]

### Sweave diagram, following Knuth’s original

June 2, 2011 |

In preparation for a talk, I updated Knuth's original diagram in Donald E. Knuth. Literate programming. The Computer Journal, 27(2):97–111, May 1984. The new diagram is Sweave specific. Click the Sweave diagram for a PDF version, or right-click and select 'save image as' for the PNG version. Permission is granted for any ... [Read more...]
