Spring Cleaning Data: 5 of 6- 2 ifelse vs Merge

April 12, 2013
By

(This article was first published on OutLie..R, and kindly contributed to R-bloggers)

This post in the data cleaning series looks at separating out the Federal Reserve Districts. I wanted two additional columns: one holding the name of the city and one holding the number of each district. Since I was on a separation kick, I thought it would be fun to do this with the ifelse() function.

Well, what started out as a fun romp in the fields turned into an exercise in precision and frustration. It did end well, but it took too much time and too many lines of code to do what I wanted.

While I was banging my head against the keyboard in frustration, a thought occurred to me: instead of using ifelse(), create a table with the new columns of data, then merge the original data with that table. Two lines of code for both columns of data; definitely one of those eureka moments.
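Here is a minimal, self-contained sketch of the table-and-merge idea. It builds the lookup table inline with data.frame() instead of reading the downloadable District Data file, and the column names `dist.no` and `dist.city` are assumptions chosen to match the ifelse() code further down; the real CSV may name them differently. The sample `dw` rows are made up for illustration.

```r
# Hypothetical sample data: a few rows carrying the original 'district' labels
dw <- data.frame(
  district = c("Boston (1)", "Chicago (7)", "San Francisco (12)", "Boston (1)"),
  value    = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Lookup table: one row per Federal Reserve District (assumed column names)
cities <- c("Boston", "New York", "Philadelphia", "Cleveland", "Richmond",
            "Atlanta", "Chicago", "St. Louis", "Minneapolis", "Kansas City",
            "Dallas", "San Francisco")
dist <- data.frame(
  district  = paste0(cities, " (", 1:12, ")"),  # e.g. "Boston (1)"
  dist.no   = 1:12,
  dist.city = factor(cities),
  stringsAsFactors = FALSE
)

# A single merge adds both new columns, matching rows on the shared 'district' key
dw <- merge(dw, dist, by = "district")
print(dw)
```

Note that merge() sorts the result by the key column by default (sort = TRUE), so the row order can differ from the original data.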

The lesson in all of this: nested ifelse() calls are fine for a limited number of cases, I would say five or fewer (unless you really like writing them, in which case have fun). When there is a limited set of values, like the 12 districts in this example, the table works very well. What took me 2 hours with the ifelse() function took me 15 minutes with the table method. The code is simpler and easier to understand. Sure, there is an extra table to import, but it is small and very manageable.

I have placed the code below, with the merge code first, followed by the ifelse() code. The table I used can be downloaded from here (District Data). Read the district data in with read.csv(), then merge the two data frames using 'district' as the column they have in common. In ifelse(test, yes, no), the test checks whether the district column equals one of the district labels; if it does, the result is that district's number or city (1 or Boston for 'Boston (1)'), and if not, the next nested ifelse() is tried. At the end there is an 'Error' value, just in case nothing matches.
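To see why the ifelse() version ends with as.numeric(), here is a small sketch using a shortened, three-district version of the nested ifelse(); the `labels` vector is a made-up example. Because the final fallback 'Error' is a character string, ifelse() returns a character vector, so the numbers come back as text, and any unmatched label becomes NA (with a coercion warning) when converted.

```r
# Made-up labels, including one that matches none of the tests
labels <- c("Boston (1)", "Chicago (7)", "Atlanta (6)", "Gotham (13)")

# Shortened nested ifelse(test, yes, no): each 'no' branch holds the next test
tmp <- ifelse(labels == "Boston (1)", 1,
       ifelse(labels == "Atlanta (6)", 6,
       ifelse(labels == "Chicago (7)", 7,
       "Error")))

print(tmp)  # character vector: "1" "7" "6" "Error"

# Converting back to numbers: "Error" cannot be parsed, so it becomes NA
dist.no <- suppressWarnings(as.numeric(tmp))
print(dist.no)
```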

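#Merging the data
#read in the district lookup table (file.choose() opens a file dialog)
dist <- read.csv(file.choose(), header = TRUE)
#one merge adds both new columns, matched on the shared 'district' column
dw <- merge(dw, dist, by = 'district')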
 
 
#re-coding the district data to numbers; because of the 'Error' fallback, ifelse() returns text here
tmp1<-ifelse(dw$district=='Boston (1)', 1,
ifelse(dw$district=='New York (2)', 2,
ifelse(dw$district=='Philadelphia (3)', 3,
ifelse(dw$district=='Cleveland (4)', 4,
ifelse(dw$district=='Richmond (5)', 5,
ifelse(dw$district=='Atlanta (6)', 6,
ifelse(dw$district=='Chicago (7)', 7,
ifelse(dw$district=='St. Louis (8)', 8,
ifelse(dw$district=='Minneapolis (9)', 9,
ifelse(dw$district=='Kansas City (10)', 10,
ifelse(dw$district=='Dallas (11)', 11,
ifelse(dw$district=='San Francisco (12)', 12,
'Error'))))))))))))
 
#convert the text to numbers ('Error' would become NA with a warning)
dw$dist.no <- as.numeric(tmp1)
 
 
#Isolating the city names and converting them to a factor
tmp2<-ifelse(dw$district=='Boston (1)', 'Boston',
ifelse(dw$district=='New York (2)', 'New York',
ifelse(dw$district=='Philadelphia (3)', 'Philadelphia',
ifelse(dw$district=='Cleveland (4)', 'Cleveland',
ifelse(dw$district=='Richmond (5)', 'Richmond',
ifelse(dw$district=='Atlanta (6)', 'Atlanta',
ifelse(dw$district=='Chicago (7)', 'Chicago',
ifelse(dw$district=='St. Louis (8)', 'St. Louis',
ifelse(dw$district=='Minneapolis (9)', 'Minneapolis',
ifelse(dw$district=='Kansas City (10)', 'Kansas City',
ifelse(dw$district=='Dallas (11)', 'Dallas',
ifelse(dw$district=='San Francisco (12)', 'San Francisco',
'Error'))))))))))))
 
dw$dist.city<-as.factor(tmp2)
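The ifelse() code has an 'Error' catch-all for labels that match no district. merge() behaves differently: by default it silently drops rows whose key has no match in the lookup table. A minimal sketch, again with a made-up `dw` and a cut-down lookup table (assumed column names), shows how all.x = TRUE keeps those rows and marks them with NA, which plays the same role as 'Error'.

```r
# Made-up data with one label that is not in the lookup table
dw <- data.frame(district = c("Boston (1)", "Gotham (13)", "Dallas (11)"),
                 stringsAsFactors = FALSE)

# Cut-down lookup table with only two districts (assumed column names)
dist <- data.frame(district = c("Boston (1)", "Dallas (11)"),
                   dist.no  = c(1L, 11L),
                   stringsAsFactors = FALSE)

inner <- merge(dw, dist, by = "district")                # unmatched row dropped
outer <- merge(dw, dist, by = "district", all.x = TRUE)  # unmatched row kept as NA

print(nrow(inner))
print(outer)

# Unmatched labels are easy to find after the merge
unmatched <- outer$district[is.na(outer$dist.no)]
print(unmatched)
```

Checking for NA after the merge gives the same safety net as the 'Error' value, without writing a branch for every district.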

Previous Posts (Part 1, Part 2, Part 3, Part 4)

