MLB Baseball Pitching Matchups ~ grabbing pitcher and/or batter codes by specify game date using R XML

June 1, 2010

MLB Gameday stores its game data in XML format, with the players denoted by ID numbers. To find out who is who, look in each game's pitchers.xml or batters.xml, where the codes are stored. My DownloadPitchFX.R script can download the ID numbers, but it doesn’t look up who each ID belongs to because of the extra

Data mining with R

June 1, 2010

Good long weekend... Ready to get back into R and data mining... After a personal appointment this morning, I'm now reading an early draft of Data Mining with R: Learning with Case Studies (Chapman & Hall/CRC Data Mining and Knowledge Discovery Seri...

Recent picture of my niece Lily

June 1, 2010

Access attribute_hidden Functions in R Packages

June 1, 2010

Maybe the title should have been prepended with “Don’t…” The source code of R is littered with “attribute_hidden” declarations. These declarations attempt to ensure that the variable or function may only be accessed by code in the core R distribution, and not by R extension packages. Generally there is a good reason for this. For

Vanilla Rao-Blackwellisation [re]revised

May 31, 2010

Although the revision is quite minor, it took us two months to complete from the time I received the news in the Atlanta airport lounge… The vanilla Rao-Blackwellisation paper with Randal Douc has thus been resubmitted to the Annals of Statistics. And rearXived. The only significant change is the inclusion of two tables detailing computing

MLB Baseball Pitching Matchups ~ manipulating pitch f/x data using the RMySQL package in R

May 31, 2010

After downloading some pitch f/x data using my R script, we can finally have some fun. But because the pitch f/x data is very elaborate, R can easily get overwhelmed copying the dataset back and forth in memory as you manipulate it. So the natural progression is to use a relational database system. Here,

R 2.11.1 released

May 31, 2010

It's official: R 2.11.1 is out. Source code and binaries for Windows and MacOS are available at the master CRAN mirror, and will be available for download from your local mirror soon. As anticipated, this is an update release focussing mainly on bugfixes and with just one new feature. According to R core team member Peter Dalgaard, one fix...

highlight 0.2-0

May 31, 2010

I've released version 0.2-0 of highlight to CRAN. This version brings some more additions to the Sweave driver that uses highlight to produce nice-looking vignettes with color-coded R chunks. The driver gains new arguments boxes, bg and border to c...

Bike The Drive 2010

Memorial Day weekend is also time for the annual Bike The Drive in Chicago. This time only half the household got up bright and early and enjoyed Lake Shore Drive free of cars. A highly recommended event.

Betting on Pi

May 31, 2010

I was reading over at math-blog.com about a concept called numeri ritardatari. This sounds a lot like “retarded numbers” in Italian, but apparently “retarded” here is used in the sense of “late” or “behind” and not in the short bus sense. I barely scanned the page, but I think I got the gist of it:

A data visualization manifesto

May 31, 2010

Details matter (at least, they do for me), but we don't yet have a systematic way of going back and forth between the structure of a graph, its details, and the underlying questions that motivate our visualizations. (Cleveland, Wilkinson, and....

JPM Chase Corporate Challenge 2010

It's Memorial Day weekend, so it was time for Chicago's JP Morgan Chase Corporate Challenge on Thursday. The weather was glorious, the usual 20-some thousand runners participated, and a good time was had. Work had arranged for a nice tent, food, mu...

Example 7.39: Nelson-Aalen estimate of cumulative hazard

May 31, 2010

In our previous example, we demonstrated how to calculate the Kaplan-Meier estimate of the survival function for time to event data. A related quantity is the Nelson-Aalen estimate of cumulative hazard. In addition to summarizing the hazard incurred ...
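For readers who want to see the arithmetic behind the estimator, here is a minimal base R sketch. It is not the code from the post itself: the toy `time` and `status` vectors are made up for illustration, and the calculation simply sums the number of events divided by the number at risk at each distinct event time.

```r
# Hypothetical toy data: follow-up times and event indicators
# (1 = event observed, 0 = censored). Not taken from the original post.
time   <- c(2, 3, 3, 5, 7, 8)
status <- c(1, 1, 0, 1, 0, 1)

# Distinct times at which at least one event occurred
event_times <- sort(unique(time[status == 1]))

# d: number of events at each event time; n: number still at risk just before it
d <- sapply(event_times, function(t) sum(time == t & status == 1))
n <- sapply(event_times, function(t) sum(time >= t))

# Nelson-Aalen estimate: cumulative sum of d / n over the event times
na_est <- data.frame(time = event_times, n_risk = n, n_event = d,
                     cumhaz = cumsum(d / n))
print(na_est)
```

If the survival package is installed, its fitted survival curves should give a comparable cumulative hazard; the hand calculation above is only meant to make the definition concrete.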

Simulating a Queue in R

May 30, 2010

In the GCaP class earlier this month, we talked about the meaning of the load average (in Unix and Linux) and simulating a grocery store checkout lane, but I didn't actually do it. So, I decided to take a shot at constructing a discrete-event simulatio...
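As a rough idea of what a single checkout lane looks like in R, the sketch below computes customer waiting times with Lindley's recursion rather than a full event-list simulation, so it is not the construction from the post. The arrival and service rates, the exponential distributions, and all variable names are assumptions chosen for illustration.

```r
set.seed(1)                      # fixed seed so the run is repeatable

n            <- 1000             # number of customers (assumed)
arrival_rate <- 0.8              # customers per unit time (assumed)
service_rate <- 1.0              # services per unit time (assumed)

interarrival <- rexp(n, arrival_rate)  # gap before each customer's arrival
service      <- rexp(n, service_rate)  # each customer's service time

# Lindley's recursion for a single-server, first-come-first-served lane:
# a customer waits for whatever work is left when they arrive, never less than 0.
wait <- numeric(n)
for (i in 2:n) {
  wait[i] <- max(0, wait[i - 1] + service[i - 1] - interarrival[i])
}

mean_wait <- mean(wait)
print(mean_wait)
```

Raising `arrival_rate` toward `service_rate` should make the average wait grow quickly, which is the kind of behaviour a load average is meant to expose.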

Talk at CRiSM

May 30, 2010

This is the talk I am giving at the workshop on model uncertainty organised by the Centre for Research in Statistical Methodology (CRiSM) at the University of Warwick, on May 30-June 1. Careful readers will notice there is not much difference with my previous talk on the topic, as I only included the Savage-Dickey slides

Dynamic Modeling 2: Our First Substantive Model

May 30, 2010

(This is the second of a series of ongoing posts on using Graph Algebra in the Social Sciences.) First-order linear difference equations are powerful, yet simple modeling tools. They can provide useful substantive insights into real-world phenomena. They can have powerful predictive ability when used appropriately. Additionally, they may be classified in any number
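To make the idea concrete, here is a small sketch that iterates a generic first-order linear difference equation, y[t+1] = a * y[t] + b. The function name `iterate_difference` and the parameter values are illustrative assumptions, not the substantive model from the post.

```r
# Iterate y[t+1] = a * y[t] + b for a given number of steps.
# The function name and arguments are hypothetical, chosen for illustration.
iterate_difference <- function(a, b, y0, steps) {
  y <- numeric(steps + 1)
  y[1] <- y0
  for (t in seq_len(steps)) {
    y[t + 1] <- a * y[t] + b
  }
  y
}

path <- iterate_difference(a = 0.5, b = 2, y0 = 0, steps = 50)

# When |a| < 1 the path settles at the fixed point b / (1 - a)
equilibrium <- 2 / (1 - 0.5)
print(c(final = tail(path, 1), equilibrium = equilibrium))
```

With |a| < 1 the series converges to the fixed point; with |a| > 1 it moves away from it, which is the basic qualitative distinction for equations of this kind.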

Notice that even though output is in a log scale, output is…

May 29, 2010

Notice that even though output is on a log scale, output is shooting up in an exponential way. Data from Brad DeLong.

Source Code Files in R

May 29, 2010

R's interactive programming style is similar to what I have seen in other environments (e.g. Ruby's irb and Oracle's SQL*Plus). There are a few commands that you need to be aware of to get up and running with developing R programs. To identify yo...

Weekend art in R (part 1?)

May 29, 2010

As usual click on the image for a full-size version. Code:

par(bg="black")
par(mar=c(0,0,0,0))
plot(c(0,1),c(0,1),col="white",pch=".",xlim=c(0,1),ylim=c(0,1))
iters = 500
for(i in 1:iters) {
  center = runif(2)
  size = rbeta(2,1,50)
  # Let's create random HTML-style colors
  color = sample(c(0:9,"A","B","C","D","E","F"),12,replace=TRUE)
  fill = paste("#", paste(color[1:6],collapse=""),sep="")
  brdr = paste("#", paste(color[7:12],collapse=""),sep="")
  rect(center[1]-size[1], center[2]-size[2], center[1]+size[1], center[2]+size[2], col=fill, border=brdr, density=NA, lwd=1.5)
}

highlight 0.1-9

May 29, 2010

Version 0.1-8 of highlight introduced a small bug in the LaTeX renderer. This is now fixed in version 0.1-9, and the LaTeX renderer also gains an argument "minipage" which wraps the LaTeX code in a minipage environment. I've used this to make...

Syncing files across computers using DropBox

May 29, 2010

Motivation: In the past few months I have been using DropBox for syncing my work files between my home and work computers. It has saved me from numerous mistakes and from sending the files to myself via e-mail. Recently I found this service highly useful for sharing files with 4 other people with whom I am working on a...

An XML Representation of the Keys to Soil Taxonomy?

May 28, 2010

Western Fresno Soil Hierarchy: partial view of the hierarchy within the US Soil Taxonomic system. Maybe this is just craziness, but wouldn't it be neat to have an XML-formatted version of the Keys to Soil Taxonomy? The format might look something like the ...

R: More plotting fun with Poisson

May 28, 2010

Coded as follows:

x = seq(.001,50,.001)
par(bg="black")
par(mar=c(0,0,0,0))
plot(x,sin(1/x)*rpois(length(x),x),pch=20,col="blue")

Tuesday’s child is full of probability puzzles

May 28, 2010

COUNTERINTUITIVE PROBLEM, INTUITIVE REPRESENTATION. Blog posts about counterintuitive probability problems generate lots of opinions with a high probability. Andrew Gelman and readers have been having a lot of fun with the following probability problem: "I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?" The
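One way to check an answer to this puzzle is to enumerate every case in R. The sketch below rests on assumptions that are not stated in the excerpt: each child is equally likely to be a boy or a girl, each weekday is equally likely, the two children are independent, and "one is a boy born on a Tuesday" is read as "at least one of the two is". Other readings of the question give other answers.

```r
# All equally likely (sex, weekday) combinations for two children: 14 * 14 = 196 rows
kids <- expand.grid(sex1 = c("boy", "girl"), day1 = 1:7,
                    sex2 = c("boy", "girl"), day2 = 1:7,
                    stringsAsFactors = FALSE)

tuesday <- 2  # arbitrary label for Tuesday; any fixed weekday gives the same count

# Condition: at least one child is a boy born on a Tuesday (assumed reading)
has_tuesday_boy <- (kids$sex1 == "boy" & kids$day1 == tuesday) |
                   (kids$sex2 == "boy" & kids$day2 == tuesday)
both_boys <- kids$sex1 == "boy" & kids$sex2 == "boy"

p_two_boys <- sum(has_tuesday_boy & both_boys) / sum(has_tuesday_boy)
print(p_two_boys)  # 13 of the 27 qualifying cases
```

Under these assumptions 27 of the 196 cases satisfy the condition and 13 of those have two boys, so the conditional probability is 13/27.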

Dynamic Modeling 1: Linear Difference Equations

May 28, 2010

(This is the first in a series on the use of Graph Algebraic models for Social Science.) Linear Difference models are a hugely important first step in learning Graph Algebraic modeling. That said, linear difference equations are a completely independent thing from Graph Algebra. I’ll get into the Graph Algebra stuff in the next post or

Must Have Software

May 28, 2010

Having worked with Unix (BSD, HPUX, IRIX, Linux and OSX) and Windows (NT4, 2000, XP, Vista and 7) for quite a while, I have seen a lot of different software tools. I would like to quickly exhibit my “must have” list. These are the packages that I find to be the single “must have offerings” in

Creating surface plots

May 28, 2010

A 3D wireframe plot is a type of graph used to display a surface. Geographic data is one example of where this type of graph would be used; it could also be used to display a fitted model with more than one explanatory variable. These plots are related to contour plots which
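As a quick illustration, a surface can be drawn with base R's `persp()`. This is only a sketch: the post itself may use a different plotting function, and the bell-shaped surface and grid size here are made-up example values.

```r
# Grid of x and y values (assumed range and resolution)
x <- seq(-3, 3, length.out = 30)
y <- seq(-3, 3, length.out = 30)

# Height of the surface at every grid point: an illustrative bell-shaped function
z <- outer(x, y, function(x, y) exp(-(x^2 + y^2) / 2))

# Draw the wireframe; theta and phi set the viewing angles
persp(x, y, z, theta = 30, phi = 25, expand = 0.6,
      xlab = "x", ylab = "y", zlab = "z")
```

Calling `contour(x, y, z)` on the same grid gives the related contour view of the surface.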

Because it’s Friday: The dating equation

May 28, 2010

According to internet lore, there's a mathematical equation that governs the lower bound for the socially acceptable age of a potential dating partner: half your age plus 7, or, in mathematical terms, if x is your age then the lower bound is f(x) = x/2 + 7. Seems simple, right? If you're 20, then the minimum socially acceptable age...
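The rule is easy to express as an R function. In this small sketch the function name `min_dating_age` and the example ages are chosen purely for illustration.

```r
# Lower bound from the rule f(x) = x/2 + 7; the function name is hypothetical
min_dating_age <- function(x) x / 2 + 7

ages <- c(20, 30, 40)
bounds <- data.frame(age = ages, lower_bound = min_dating_age(ages))
print(bounds)
```

Because the function is vectorised, it can also be passed straight to `curve(min_dating_age, from = 14, to = 80)` to plot the bound across ages.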
