Crayfish or crawdad? Mapping US dialect variations with R

June 7, 2013
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

I grew up in Australia, where I learned to speak English. Or so I thought: when I moved overseas to the UK, and especially when I moved to the States, I soon learned these are distinct cultures separated by a common language. Words I had never imagined were any different elsewhere, such as "runners" ("sneakers"), "lemonade" (Sprite; is there even a generic name for this?) and "rubber" (eraser), were met with blank stares, confusion or even guffaws.

Having been in the States for over a decade now, I thought I'd gotten most of these variations figured out. But now, thanks to this worldwide survey of English dialects and an analysis of the US data by Joshua Katz of NC State University, I realise (sorry — realize) that I haven't even scratched the surface. For example, what do you call a small freshwater crustacean in the States?

[Map: Crawfish]

As you can see, there's no one answer: depending on where you live it could be "crawfish", "crayfish" or "crawdad". (In Australia, they're called "yabbies".) The regional variations can be quite pronounced: for example, a long sandwich is a "sub" almost everywhere except Pennsylvania and around New Orleans. (In the charts below, blue means few people use the indicated term; red means almost everyone does.)

[Map: Sub-hoagie]

Getting even more specific, for most people in the USA, "The City" refers to New York City. But I can attest that in San Francisco (and apparently also Boston and Chicago), it means something else:

[Map: What is the City]

Joshua used the R language to create these maps from the survey data. To smooth out the individual responses located around the lower 48 states, he used k-nearest neighbor kernel smoothing to color the maps according to the top 3 responses and "other" (all other responses). (Hawaii and Alaska were left out to keep things simple.) You can browse through 122 different dialect variations in this interactive application (made with the "Shiny" package from RStudio). Be sure to click on the "Individual" tab to see the geographic distribution of specific terms (as in "sub" vs "hoagie" above). It's certainly expanded my vocabulary, and I hope to one day learn American as well as English.
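For anyone curious what k-nearest neighbor kernel smoothing might look like in practice, here is a minimal sketch. It is not Joshua's code: it is written in Python rather than R, and it makes several assumptions of my own: a Gaussian kernel, a bandwidth equal to the distance to the k-th nearest response, flat (x, y) coordinates instead of longitude and latitude, and a handful of made-up survey answers. The function names `collapse_to_top` and `knn_kernel_shares` are purely illustrative.

```python
import math
from collections import Counter, defaultdict

# Made-up survey responses: ((x, y) location, answer). Not real data.
SURVEY = [
    ((0.0, 0.0), "crawfish"), ((0.5, 0.2), "crawfish"), ((0.2, 0.6), "crawfish"),
    ((5.0, 5.0), "crayfish"), ((5.5, 4.8), "crayfish"),
    ((9.0, 1.0), "crawdad"), ((9.3, 1.4), "crawdad"),
    ((4.0, 9.0), "mudbug"), ((4.2, 9.1), "yabby"),
]


def collapse_to_top(responses, n_top=3):
    """Relabel every answer outside the n_top most common ones as 'other'."""
    counts = Counter(answer for _, answer in responses)
    top = {answer for answer, _ in counts.most_common(n_top)}
    return [(point, answer if answer in top else "other")
            for point, answer in responses]


def knn_kernel_shares(query, responses, k=3):
    """Estimate the share of each answer at `query` from its k nearest responses."""
    # Keep the k responses closest to the query point (planar distance).
    nearest = sorted(
        (math.dist(query, point), answer) for point, answer in responses
    )[:k]
    # Adaptive bandwidth: the distance to the k-th nearest response
    # (fall back to 1.0 if all k neighbors sit exactly on the query point).
    bandwidth = nearest[-1][0] or 1.0
    weights = defaultdict(float)
    for distance, answer in nearest:
        # Gaussian kernel (an assumption): closer responses count for more.
        weights[answer] += math.exp(-0.5 * (distance / bandwidth) ** 2)
    total = sum(weights.values())
    return {answer: weight / total for answer, weight in weights.items()}


grouped = collapse_to_top(SURVEY)
for spot in [(0.2, 0.2), (4.1, 9.0)]:
    print(spot, knn_kernel_shares(spot, grouped))
```

Evaluating `knn_kernel_shares` on a grid of points and coloring each cell by the share of a given answer would give a smoothed map in the spirit of the ones above. Because the bandwidth follows the k-th neighbor, the smoothing automatically widens where responses are sparse and tightens where they are dense.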

Shiny: Dialect Survey Results (via Business Insider)
