Looping through factor variables

[This article was first published on Daniel Marcelino » R, and kindly contributed to R-bloggers].

Today I was chatting with a friend who is working on a Ph.D. thesis about my favorite election topic: campaign contributions. We discussed how to analyze the probability that a particular group or category attracts more money than the others; in this case, males versus females within a given district. A preliminary approach is to identify in the data whether each candidate received less or more money than the district average. It is then useful to code those cases as 0 or 1 accordingly, so that you can run a logistic regression and examine the odds ratio for the gender effect. More basic still, we should visualize the difference in average revenue between males and females in each district. This task is easy to perform in any statistical package, and also in Excel. In Stata, for instance, it is enough to type: egen statemu = mean(revenue), by(state gender) while in the object-oriented language R it is enough to type:
state_means <- with(data, aggregate(revenue, by=list(state=state, gender=gender), FUN=mean, na.rm=TRUE)). Problem solved, clean and fast.
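The 0/1 coding and the logistic regression mentioned above could be sketched in R roughly as follows. This is only an illustration: the data frame `contrib` and its values are invented for the example, and the column names `state`, `gender`, and `revenue` are assumed to match the variables used in this post.

```r
# Hypothetical toy data; the values are made up for illustration only
contrib <- data.frame(
  state   = c("AC", "AC", "AC", "AC", "SP", "SP", "SP", "SP"),
  gender  = c("F", "M", "F", "M", "F", "M", "F", "M"),
  revenue = c(100, 300, 400, 200, 50, 150, 250, 350)
)

# District (state) average attached to every candidate row
contrib$district_mean <- ave(contrib$revenue, contrib$state,
                             FUN = function(x) mean(x, na.rm = TRUE))

# 1 if the candidate raised more than the district average, 0 otherwise
contrib$above_avg <- as.integer(contrib$revenue > contrib$district_mean)

# Logistic regression of the 0/1 indicator on gender
fit <- glm(above_avg ~ gender, data = contrib, family = binomial)

# Exponentiated coefficients can be read as odds ratios
odds_ratios <- exp(coef(fit))
print(odds_ratios)
```

With real data one would probably also control for office, incumbency, and similar factors before reading much into the gender odds ratio.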

Although I’m not a “gaúcho”, I’d say: if we can complicate, why simplify? Joking aside, I generally favor the simplest solution available. So why bother with loops across observations and variables? What’s more, loops, especially nested ones over a large-N data set (i.e. one with a large number of cases), can be a suboptimal choice because they reduce computing efficiency. But suppose we are faced with a data set containing a large number of factor variables, each with many categories, ordered or not. In that case, a nested loop across those factors may serve us better. That said, let’s go back to my friend’s case and try a different approach to viewing the average differences: a loop.

Below, I sketch a loop to find the average contribution revenue for females and males in each state. Although the following example is quite simple, as I’m only looping through two factor variables, we can add as many factors as we want. Also, the statistic computed need not be the average; you can choose among the standard error, percentiles, et cetera.

After running the loop below, the averages for males and females will be displayed on your Stata screen as the picture shows.

/* Loop across the groups: electoral districts (states) and gender */
/* Note that levelsof is needed to store the levels contained in the
   variables "state" and "gender"; otherwise I would have to provide
   them by hand */
quietly levelsof state, local(states)
foreach s of local states {
    /* Genders actually observed in this state */
    quietly levelsof gender if state == "`s'", local(genders)
    foreach g of local genders {
        /* meanonly computes just the mean, which is faster */
        quietly summarize revenue if gender == "`g'" & state == "`s'", meanonly
        local mu = r(mean)
        display "Mean of Contributions for `g' in `s' = `mu'"
    }
}

[Figure: Stata screen showing the mean of contributions by gender for each state]
For example, although my data set pools candidates for a variety of offices, in Acre (AC), the first state shown in the picture above, the average contribution for females is 45,983.57, while for males it is around 51,116.96. To refine the outcome, we might introduce other factors such as office, incumbency, et cetera.
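For readers who prefer R, the same nested loop could be sketched roughly as below. It is a hedged illustration rather than a translation of the real analysis: the data frame `contrib` and its values are hypothetical, and the column names are assumed to match the Stata variables `state`, `gender`, and `revenue`.

```r
# Hypothetical toy data standing in for the campaign-contribution data set
contrib <- data.frame(
  state   = factor(c("AC", "AC", "AC", "AC", "SP", "SP", "SP")),
  gender  = factor(c("F", "F", "M", "M", "F", "M", "M")),
  revenue = c(100, 200, 300, NA, 50, 150, 250)
)

# Collect one row per state/gender combination
results <- data.frame()
for (s in levels(contrib$state)) {
  in_state <- contrib[contrib$state == s, ]
  # droplevels() keeps only the genders observed in this state,
  # mirroring the inner levelsof call in the Stata version
  for (g in levels(droplevels(in_state$gender))) {
    # na.rm = TRUE skips missing revenues, as Stata's summarize does
    mu <- mean(in_state$revenue[in_state$gender == g], na.rm = TRUE)
    cat(sprintf("Mean of contributions for %s in %s = %.2f\n", g, s, mu))
    results <- rbind(results,
                     data.frame(state = s, gender = g, mean_revenue = mu))
  }
}
```

Replacing `mean()` with another summary function, or adding a third loop over a factor such as office, follows the same pattern. For large data sets the one-line `aggregate()` call is still likely to be the faster choice.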
