US Names by State: Part I (Mary is everywhere!)

July 18, 2014
(This article was first published on Analyst At Large » R, and kindly contributed to R-bloggers)

I was browsing the Social Security Administration’s website and found a link to its open government initiative (http://www.ssa.gov/open/data/). There is a fair amount of interesting data there; I grabbed the names of people born in the US since 1910 (http://www.ssa.gov/oact/babynames/limits.html). Each state has a data file listing, for each year, the number of births under a given name in that state, along with the gender of the child.
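For orientation, here is a minimal sketch of how one of these files can be loaded. The column names are the ones assigned in the code at the end of this post; the three rows and the `toy` object are invented for illustration and are not real SSA counts.

```r
# Made-up rows in the layout of a state file: state, gender, year, name, births
# (illustrative values only, not real SSA counts)
sample_lines <- c("XX,F,1910,Mary,10",
                  "XX,F,1910,Anna,4",
                  "XX,M,1910,John,8")

# The files have no header row, so the column names are supplied by hand.
# Reading Gender as character avoids a bare "F" being parsed as logical FALSE.
toy <- read.csv(text = sample_lines, header = FALSE,
                col.names = c("State", "Gender", "Year", "Name", "People"),
                colClasses = c("character", "character", "integer",
                               "character", "integer"))
str(toy)
```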

There’s a lot of interesting analysis that could be done with this data, but I’m going to start by plotting the most popular name in each state, by gender, across the entire dataset (since 1910).

Here is the plot for males:

[Figure: Most Popular Male Name (since 1910) by State]

We can see that John is most popular in the Mid-Atlantic (PA, NY, etc.). Robert is most popular in the Midwest and the northeastern states. James dominates large portions of the South, while Michael is most popular in the West, the Southwest, and Florida.

Here is the plot for females:

[Figure: Most Popular Female Name (since 1910) by State]

Mary was the most popular name basically everywhere in the country, with the exceptions of CA and NV, where there were more Jennifers.

It’s interesting to see how dominant Mary is across the entire country, while the male names show more regional dominance. This is particularly surprising because states tended to have many more distinct female names than male names.
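The distinct-name comparison can be checked for any single state with a short tally. This is a minimal sketch, assuming a data frame with the Name and Gender columns used in the code at the end of this post; the toy rows are invented for illustration.

```r
# Toy data in the same layout as a state file (invented values)
toy_names <- data.frame(Gender = c("F", "F", "F", "M", "M"),
                        Name   = c("Mary", "Anna", "Mary", "John", "John"),
                        stringsAsFactors = FALSE)

# Number of distinct names per gender
distinct_names <- tapply(toy_names$Name, toy_names$Gender,
                         function(x) length(unique(x)))
distinct_names
```

Running the same tally on each real state file and comparing the F and M entries would show how much larger the pool of female names is.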

More analysis will follow, but here is the code…

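###### Settings
library(plyr)
library(maps)
setwd("C:/Blog/StateName")
# State files are named like "AK.TXT"; escape the dot and anchor the match at the end
files<-list.files(pattern="\\.TXT$")
# Exclude the District of Columbia file
files<-files[files!="DC.TXT"]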
 
###### State structure
regions1=c("alabama","arizona","arkansas","california","colorado","connecticut","delaware",
	"florida","georgia","idaho","illinois","indiana","iowa","kansas",
	"kentucky","louisiana","maine","maryland","massachusetts:main","michigan:south","minnesota",
	"mississippi","missouri","montana","nebraska","nevada","new hampshire","new jersey",
	"new mexico","new york:main","north carolina:main","north dakota","ohio","oklahoma",
	"oregon","pennsylvania","rhode island","south carolina","south dakota","tennessee",
	"texas","utah","vermont","virginia:main","washington:main","west virginia",
	"wisconsin","wyoming")
 
# One row per map region; V2 holds the top male name, V3 the top female name
mat<-data.frame(regions1,V2=NA_character_,V3=NA_character_,stringsAsFactors=FALSE)
 
###### Reading files
for (i in seq_along(files))
	{
	data<-read.csv(files[i],header=FALSE,stringsAsFactors=FALSE)
	colnames(data)<-c("State","Gender","Year","Name","People")
	# Total births per name and gender across all years
	data1<-ddply(data,.(Name,Gender),summarise,SUM=sum(People))
	male1<-data1[data1$Gender=="M",]
	female1<-data1[data1$Gender=="F",]
	male1<-male1[order(male1$SUM,decreasing=TRUE),]
	female1<-female1[order(female1$SUM,decreasing=TRUE),]
 
	# Match the state name exactly, ignoring ":main"-style suffixes;
	# a substring grep on "kansas" would also hit "arkansas"
	state<-tolower(state.name[match(data$State[1],state.abb)])
	idx<-which(sub(":.*","",mat$regions1)==state)
	mat$V2[idx]<-as.character(male1$Name[1])
	mat$V3[idx]<-as.character(female1$Name[1])
	}
 
jpeg("Male.jpeg",width=1200,height=800,quality=100)
map("state",fill=TRUE,col="skyblue")
map.text(add=TRUE,"state",regions=regions1,labels=mat$V2)
title("Most Popular Male Name (since 1910) by State")
dev.off()
 
jpeg("Female.jpeg",width=1200,height=800,quality=100)
map("state",fill=TRUE,col="pink")
map.text(add=TRUE,"state",regions=regions1,labels=mat$V3)
title("Most Popular Female Name (since 1910) by State")
dev.off()
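
One thing to watch when mapping state names onto the region strings used by the maps package: grep() matches substrings, so a pattern such as "kansas" also hits "arkansas". Here is a minimal sketch of the problem and of an exact-match alternative; the variable names are illustrative only.

```r
regions <- c("arkansas", "kansas", "virginia:main", "west virginia")

# Substring matching returns more than one region
grep("kansas", regions)      # 1 2
grep("virginia", regions)    # 3 4

# Exact matching on the part before any ":" suffix returns only one
base_names <- sub(":.*", "", regions)
which(base_names == "kansas")    # 2
which(base_names == "virginia")  # 3
```

Because the state files are processed in alphabetical order, a substring match means a later state can silently overwrite the label of an earlier one.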
