
Crayfish or crawdad? Mapping US dialect variations with R

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

I grew up in Australia, where I learned to speak English. Or so I thought: when I moved overseas to the UK, and especially when I moved to the States, I soon learned these are distinct cultures separated by a common language. Words I had never suspected were any different elsewhere, such as "runners" ("sneakers"), "lemonade" (Sprite; is there even a generic name for this?) and "rubber" (eraser), were met with blank stares, confusion or even guffaws.

Having been in the States for over a decade now, I thought I'd gotten most of these variations figured out. But now, thanks to this worldwide survey of English dialects and an analysis of the US data by Joshua Katz of NC State University, I realise (sorry — realize) that I haven't even scratched the surface. For example, what do you call a small freshwater crustacean in the States?

As you can see, there's no one answer: depending on where you live it could be "crawfish", "crayfish" or "crawdad". (In Australia, they're called "yabbies".) The regional variations can be quite pronounced: for example, a long sandwich is a "sub" almost everywhere except Pennsylvania and around New Orleans. (In the charts below, blue means few people use the indicated term; red means almost everyone does.)


Getting even more specific, for most people in the USA, "The City" refers to New York City. But I can attest that in San Francisco (and apparently also Boston and Chicago), it means something else:

Joshua used the R language to create these maps from the survey data. To smooth out the individual responses scattered across the lower 48 states, he used k-nearest neighbor kernel smoothing, coloring the maps according to the top 3 responses and "other" (all other responses). (Hawaii and Alaska were left out to keep things simple.) You can browse through 122 different dialect variations in this interactive application (made with the "Shiny" package from RStudio). Be sure to click on the "Individual" tab to see the geographic distribution of specific terms (as in "sub" vs "hoagie" above). It's certainly expanded my vocabulary, and I hope to one day learn American as well as English.
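
For the curious, here is a rough idea of what k-nearest neighbor kernel smoothing looks like in base R. This is a minimal sketch and not Joshua's actual code: the toy data, the column names (`lon`, `lat`, `answer`), the helper `knn_kernel_smooth()`, the Gaussian kernel, the extra made-up answer "mudbug" and the choice of k = 25 are all assumptions made for illustration. For each point on a grid, it finds the k nearest respondents, weights them by a kernel whose bandwidth is the distance to the k-th neighbor, and reports the weighted share of each answer.

```r
# Hypothetical toy data: one row per survey respondent (not the real survey).
set.seed(42)
n <- 300
responses <- data.frame(lon = runif(n, -100, -80), lat = runif(n, 30, 45))
terms <- c("crayfish", "crawfish", "crawdad", "mudbug", "yabby")
# Made-up regional pattern: the eastern half prefers "crayfish",
# the western half prefers "crawdad".
responses$answer <- ifelse(
  responses$lon > -90,
  sample(terms, n, replace = TRUE, prob = c(0.60, 0.20, 0.10, 0.05, 0.05)),
  sample(terms, n, replace = TRUE, prob = c(0.10, 0.30, 0.50, 0.05, 0.05))
)

# Keep the 3 most common answers and lump everything else into "other".
top3 <- names(sort(table(responses$answer), decreasing = TRUE))[1:3]
responses$answer <- ifelse(responses$answer %in% top3,
                           responses$answer, "other")

# Weighted share of each answer at one location, using its k nearest
# respondents and a Gaussian kernel with an adaptive bandwidth.
knn_kernel_smooth <- function(lon, lat, data, k = 25) {
  # Plain Euclidean distance in degrees, for simplicity; a real map would
  # use great-circle distances or projected coordinates.
  d <- sqrt((data$lon - lon)^2 + (data$lat - lat)^2)
  nearest <- order(d)[seq_len(k)]
  h <- max(d[nearest])  # bandwidth = distance to the k-th neighbor
  w <- if (h > 0) exp(-0.5 * (d[nearest] / h)^2) else rep(1, k)
  answers <- sort(unique(data$answer))
  vapply(answers,
         function(a) sum(w[data$answer[nearest] == a]),
         numeric(1)) / sum(w)
}

# Evaluate the smoother on a coarse grid; each row of `shares` sums to 1.
grid <- expand.grid(lon = seq(-100, -80, by = 2), lat = seq(30, 45, by = 3))
shares <- t(mapply(knn_kernel_smooth, grid$lon, grid$lat,
                   MoreArgs = list(data = responses, k = 25)))

# The dominant answer at each grid point is what a map would color by.
grid$top_answer <- colnames(shares)[max.col(shares, ties.method = "first")]
head(grid)
```

Because the bandwidth grows wherever respondents are sparse, the estimate stays stable in thinly populated areas while still picking up fine detail in dense ones. A larger k gives smoother maps; a smaller k gives patchier ones.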

Shiny: Dialect Survey Results (via Business Insider)

