Spring Cleaning Data: 2 of 6 - Changing Column Names and Adding a Column

[This article was first published on OutLie..R, and kindly contributed to R-bloggers].

In the first post (found here), we downloaded the data and imported it into R using the gdata package. In this post we will change the column names to make them more reasonable and add a quarter variable. The column names need changing because those in the dw.2010.q1 file are messed up by the formatting done in Excel. Since I was going to have to change one file, I might as well change them all, so I did.

The first chunk of code defines the labels I am going to use as c.label. Then I use the colnames() function to rename the columns of each data frame.

#Defining the new labels
c.label<-c('loan.date', 'mat.date', 'term',
   'repay.date', 'district', 'borrower', 'city',
   'state', 'ABA', 'type.credit', 'i.rate',
   'amount', 'outstanding.credit',
   'total.outstanding', 'collateral',
   'commercial', 'residential.morg',
   'comm.real', 'consumer', 'treasury',
   'municipal', 'corp', 'mbs.cmo',
   'mbs.cmo.other', 'asset.backed',
   'internat', 'tdfd')
 
#Changing the column names
colnames(dw.2010.q3)<-c.label
colnames(dw.2010.q4)<-c.label
colnames(dw.2011.q1)<-c.label
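
A label vector whose length does not match the number of columns will not rename a file as intended, so it can be worth checking the widths before assigning. The following is only a minimal sketch on a small made-up data frame; toy.df and toy.label are illustrative names and are not part of the discount window data.

```r
# Hypothetical stand-in for one of the imported quarterly files
toy.df <- data.frame(V1 = c("2010-07-01", "2010-08-15"),
                     V2 = c("First Bank", "Second Bank"),
                     V3 = c(100, 250),
                     stringsAsFactors = FALSE)

# Hypothetical labels (the real c.label above has 27 entries)
toy.label <- c('loan.date', 'borrower', 'amount')

# Confirm the number of labels matches the number of columns
stopifnot(length(toy.label) == ncol(toy.df))

# Rename the columns, exactly as done for the real files
colnames(toy.df) <- toy.label
print(colnames(toy.df))
```

The same check could be run against each of the three quarterly files before the colnames() calls above.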

I also like to add a few extra variables whenever I see a potential need for them. At this point the files are still separate, so adding a quarter variable now might be helpful later. Sure, I could write a loop to create the new column based on the month of the date, but I like to keep things as simple as possible. Why add complexity when there is no reason to? I used the ABA column to define the length of the data set because it did not have any missing values, while others did. The new column is named qtr, and the rep() function repeats the quarter number once for each entry in the ABA column.

#defining a quarter variable for future use, so I can 
#isolate quarters to compare and contrast
dw.2010.q3$qtr<-rep(3, length(dw.2010.q3$ABA))
dw.2010.q4$qtr<-rep(4, length(dw.2010.q4$ABA))
dw.2011.q1$qtr<-rep(1, length(dw.2011.q1$ABA))
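
For comparison, here is a sketch of two alternatives on a made-up data frame (toy.q3 is an illustrative name, and its values are invented). Because length() counts NA entries as well, nrow() gives the same count without having to pick a column. Base R's quarters() can also derive the quarter from the date, assuming the loan dates have been parsed as Date values, which may not be the case for the imported files.

```r
# Hypothetical stand-in for dw.2010.q3, including a missing ABA value
toy.q3 <- data.frame(
  loan.date = as.Date(c("2010-07-01", "2010-08-15", "2010-09-30")),
  ABA = c(11000015, NA, 21001208)
)

# nrow() counts rows directly; length() of any column gives the same number
toy.q3$qtr <- rep(3, nrow(toy.q3))

# Alternative: derive the quarter from the date ("Q3" becomes 3)
toy.q3$qtr.from.date <- as.numeric(substr(quarters(toy.q3$loan.date), 2, 2))

print(toy.q3)
```

The rep() approach in the post is the simpler of the two when each file covers exactly one quarter; the date-based version is mainly useful once the files have been combined.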
