Blog Archives

Possibly slightly better text analysis with lme4

December 16, 2012

lme4 and its cousin arm are extremely useful for a huge variety of modeling applications (see Gelman and Hill’s book), but today we’re going to do something a little frivolous with them. Namely, we’re going to extend our Denver Debat...

Text analysis made too easy with the tm package

December 15, 2012

Today’s Gist takes the CNN transcript of the Denver Presidential Debate, converts paragraphs into a document-term matrix, and does the absolute most basic form of text analysis: a raw word count. There are actually quite a few steps in this proc...

Everything is a Network, featuring the sna package

December 14, 2012

We’ve gotten some requests, through the Ask us anything page, to do some plotting of networks. We may come back to this later, but today’s Gist shows how you can plot pretty much literally anything as a network. First, we go back to our

Fuzzy clustering with fanny()

December 13, 2012

This is kind of a fun example, and you might find the fuzzy clustering technique useful, as I have, for exploratory data analysis. In this Gist, I use the unparalleled breakfast dataset from the smacof package, derive dissimilarities from breakfast it...

Multidimensional metric unfolding with SMACOF

December 12, 2012

SMACOF stands for “Scaling by MAjorizing a COmplicated Function,” and it is a multidimensional scaling algorithm for metric unfolding of, among other things, rectangular ratings matrices. One neat Political Science application of MDS is i...

US State Maps using map_data()

December 11, 2012

Today’s short post will show how to make a simple map using map_data(). Let’s assume you have data in a CSV file that may look like this: Notice the lower case state names; they will make merging the data much easier. The variable of inte...

"Economics-style" graphs with bezier() from Hmisc

December 10, 2012

So, I really think this one is pretty cool. We spend much of our time in R making graphs with data, but what if you have a theory that you’d like to express graphically? Something like what I’ll call “economics-style” graphs, i...

Handling missing data with Amelia

December 9, 2012

So, what if you have data, but some of the observations are missing? Many statistical techniques assume no missingness, so we might want to “fill in” or rectangularize our data, by replacing missing observations with plausible substitutes....

Evaluating term popularity with twitteR

December 8, 2012

I really wanted to put something together for this series on the twitteR package. Unfortunately, at the moment the number of interesting things that can be done with twitteR, as opposed to through API calls and RCurl, is limited. Regardless, I have Ye...

Dot-density maps with spsample()

December 7, 2012

Today’s example is a little odd, in that the code isn’t pretty and the example isn’t really something you’d actually produce in real life — but if you’ll overlook those oddities, you’ll find that the spsample(...
