Blog Archives

The Best Statistical Programming Language is… JavaScript?

April 27, 2012

R-Bloggers has recently been buzzing about Julia, the new kid on the statistical programming block. Julia, however, is hardly the sole contender for the market of R defectors, with the Clojure-based Incanter generating buzz as well. Even with these two making noise, I think there’s a huge point that everyone is missing, and it’s front-and-center on

Read more »

Using wordcloud on search terms & phrases

March 28, 2012

The wordcloud package for R is great, but all the examples I found used the tm package to process a large amount of textual data (web pages, text files, Google Docs, etc.). But what if you have normalized data where you have a word and its frequency? Or,...

Read more »

Musings on Correlation (or yet another reason I fear for those non-methodologically inclined students in my cohort)

August 12, 2011

I’ve been thinking a lot about what it means for two variables to be correlated.  Scientists throw around the term like it’s uniformly understood, but I fear that an understanding of the concept is elusive to substantive researchers who aren’t interested in empirical methods, except as a means by which we can demonstrate that our

Read more »

Measuring the EIU Democracy Index (with Polity IV)

July 12, 2011

Yet again, I have conjured up an (academically) unusual dataset on democracy! This time it’s the Economist Intelligence Unit’s Democracy Index, a weird little gem. The dataset is the basis for a paper the Economist publishes every two years. Because of this biennial schedule, there is data estimating the “Democratic-ness” of the world’s countries for 2006,

Read more »

More fun with the Failed States Index (and the State Fragility Index)

July 9, 2011

So the other day’s experiment with the Failed States Index and the Polity data didn’t yield the linear trend I had originally expected. After all, the two measure fundamentally distinct things. But perhaps there’s another dataset which will match linearly. The same people who made Polity also put out a dataset called the State Fragility

Read more »

Analyzing the Failed States Index (with Polity IV)

July 7, 2011

So, I decided to sit down and have a little fun with that Failed States Index data I put together. To start, I expect that the dataset will be pretty linearly correlated with the Polity IV data. This makes sense: true democracies aren’t failed states, and failed states tend not to be democratic. To test this,

Read more »

Using R for Stata to CSV Conversion

June 3, 2011

I recently found myself in the unpleasant situation of needing to read a Stata .dta file, but not having Stata readily available to me. Normally, I’d fire up a text editor and deconstruct the file, except Stata saves its data in a proprietary binary format, so much of the file’s content shows up garbled in a text editor.

Read more »

Statistical Analysis with R, a Review

February 12, 2011

Long Version: I have a Bachelor’s degree in Computer Science.  I’m pretty handy when it comes to

Read more »

Help! My model fits too well!

October 22, 2010

This is sort of related to my sidelined study of graph algebra. I was thinking about data I could apply a first-order linear difference model to, and the stock market came to mind. After all, despite some black-swan-sized shocks, what better predicts a day’s closing than the previous day’s closing? So,

Read more »

Dynamic Modeling 3: When the first-order difference model doesn’t cut it

June 12, 2010

Data must be selected carefully. The predictive usefulness of the model is grossly diminished if outliers taint the available data. Figure 1, for instance, shows defense spending (as a fraction of the national budget) between 1948 and 1968. Note how the trend curve (as defined by our linear difference model from the last post: see

Read more »