
Python has some pretty awesome data-manipulation and graphing capabilities. If you're a heavy R user who dabbles in Python like me, you might wonder what the equivalent Python commands are for dataframe manipulation. I was also curious to see how many lines of code it took me to do the same task (load, clean, and graph data) in both R and Python. (I'd like to head off any arguments about efficiency or which language is better here: neither my R code nor my Python code uses super-efficient, optimal programming methods. It is, however, how I do things, so to me that's what matters. I'm also not trying to advocate one language over the other (programmers can be a sensitive bunch); I just wanted to post an example showing how to do equivalent tasks in each language.)

First, R

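library(ggplot2)

# read data
# drop incomplete data (test for NA with is.na(), not against the string 'NA')
feeding <- subset(JapBeet_NoChoice, !is.na(Consumption))
# re-factor to drop unused levels, and recode 33 degrees as 35
feeding$Food_Type <- factor(feeding$Food_Type)
feeding$Temperature[which(feeding$Temperature==33)] <- 35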

# subset
plants <- c('Platanus occidentalis', 'Rubus allegheniensis', 'Acer rubrum', 'Viburnum prunifolium', 'Vitis vulpina')
subDat <- feeding[feeding$Food_Type %in% plants, ]

# make a standard error function for plotting
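seFunc <- function(x){
  se <- sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
  # ymin is the mean minus one SE, ymax the mean plus one SE
  lims <- c(mean(x, na.rm = TRUE) - se, mean(x, na.rm = TRUE) + se)
  names(lims) <- c('ymin', 'ymax')
  return(lims)
}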

# ggplot! (fun = needs ggplot2 >= 3.3.0; older versions use fun.y =)
ggplot(subDat, aes(Temperature, Herb_RGR, fill = Food_Type)) +
  stat_summary(geom = 'errorbar', fun.data = seFunc, width = 0, aes(color = Food_Type), show.legend = FALSE) +
  stat_summary(geom = 'point', fun = mean, size = 3, shape = 21) +
  ylab('Mass Change (g)') +
  xlab(expression('Temperature '*degree*C)) +
  scale_fill_discrete(name = 'Plant Species') +
  theme(
    axis.text = element_text(color = 'black', size = 12),
    axis.title = element_text(size = 14),
    axis.ticks = element_line(color = 'black'),
    legend.key = element_blank(),
    legend.title = element_text(size = 12),
    panel.background = element_rect(color = 'black', fill = NA)
  )


Snazzy!
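As a quick sanity check on what `seFunc` should return, here is the same "mean plus or minus one standard error" calculation sketched in plain Python (standard library only). It is an illustration rather than part of the original analysis: the function name `se_limits` and the input values are hypothetical. The point is that `ymin` should be the mean minus the SE and `ymax` the mean plus the SE, and that R's `sd()` uses the n - 1 denominator, which `statistics.stdev` matches.

```python
import math
import statistics

def se_limits(values):
    """Mean minus/plus one standard error, named like ggplot2's ymin/ymax.

    Hypothetical helper for illustration; None plays the role of R's NA.
    """
    # Drop missing values first, like sd(x, na.rm = TRUE) in R
    clean = [v for v in values if v is not None]
    mean = statistics.mean(clean)
    # statistics.stdev uses the n - 1 denominator, like R's sd()
    se = statistics.stdev(clean) / math.sqrt(len(clean))
    # ymin must be the lower limit and ymax the upper one
    return {"ymin": mean - se, "ymax": mean + se}

# Made-up growth values with one missing entry
limits = se_limits([0.2, 0.4, None, 0.6])
print(limits)
```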

Next, Python!

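import numpy as np
import pandas as pd
import matplotlib.pyplot as py

# read data

# clean up (.copy() so the column assignment below is not a chained assignment)
feeding = JapBeet_NoChoice.dropna(subset = ['Consumption']).copy()
feeding['Temperature'] = feeding['Temperature'].replace(33, 35)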

# subset out the correct plants
keep = ['Platanus occidentalis', 'Rubus allegheniensis', 'Acer rubrum', 'Viburnum prunifolium', 'Vitis vulpina']
feeding2 = feeding[feeding['Food_Type'].isin(keep)]

# calculate means and SEs
group = feeding2.groupby(['Food_Type', 'Temperature'], as_index = False)
# named aggregation (dict-of-functions renaming was removed in pandas 1.0)
sum_stats = group.agg(mean = ('Herb_RGR', 'mean'),
                      SE = ('Herb_RGR', lambda x: x.std() / np.sqrt(x.count())))

# PLOT
for plant in keep:
    sub = sum_stats[sum_stats['Food_Type'] == plant]
    py.errorbar(sub['Temperature'], sub['mean'], yerr = sub['SE'],
                fmt = 'o', ms = 10, capsize = 0, mew = 1, alpha = 0.75,
                label = plant)

py.xlabel(u'Temperature (\u00B0C)')
py.ylabel('Mass Change')
py.xlim([18, 37])
py.xticks([20, 25, 30, 35])
py.legend(loc = 'upper left', prop = {'size':10}, fancybox = True, markerscale = 0.7)
py.show()


Snazzy 2!
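To try the pandas steps without the original beetle data, the sketch below runs the same clean, subset, and summarise sequence on a tiny made-up DataFrame. Only the column names come from the post; the rows, the `raw` name, and the extra plant (`Quercus alba`, included so the `isin` filter has something to drop) are hypothetical. It uses pandas named aggregation with the built-in `'sem'` reducer, which computes the sample standard deviation divided by the square root of the non-missing count, i.e. the same quantity as the `lambda x: x.std() / np.sqrt(x.count())` standard error.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for JapBeet_NoChoice (made-up values, real column names)
raw = pd.DataFrame({
    "Food_Type": ["Acer rubrum"] * 3 + ["Vitis vulpina"] * 3 + ["Quercus alba"],
    "Temperature": [33, 33, 33, 20, 20, 20, 20],
    "Consumption": [1.0, 2.0, 1.5, np.nan, 0.5, 0.7, 0.9],
    "Herb_RGR": [0.10, 0.30, 0.20, 0.90, 0.05, 0.15, 0.40],
})

# Drop rows with missing Consumption; .copy() makes the later assignment safe
feeding = raw.dropna(subset=["Consumption"]).copy()
# Recode 33 as 35 by assigning the result instead of a chained inplace call
feeding["Temperature"] = feeding["Temperature"].replace(33, 35)

# Keep only the plants of interest
keep = ["Acer rubrum", "Vitis vulpina"]
feeding2 = feeding[feeding["Food_Type"].isin(keep)]

# Named aggregation: one output column per (source column, reducer) pair
sum_stats = feeding2.groupby(["Food_Type", "Temperature"], as_index=False).agg(
    mean=("Herb_RGR", "mean"),
    SE=("Herb_RGR", "sem"),  # sample std (ddof=1) / sqrt(non-missing count)
)
print(sum_stats)
```

The row with a missing `Consumption` value is dropped before summarising, so the `Vitis vulpina` group ends up with two observations rather than three.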

So the two versions take roughly the same number of lines (excluding module and library imports), with Python being slightly more concise (barely). For what it's worth, I showed these two graphs to a friend and asked which he liked more, and he chose Python immediately. Personally, I like them both, and it's hard for me to pick one over the other. The curious can see my much older, waaayyy less efficient, much more hideous version of this graph in my paper, but I warn you... it isn't pretty. And the code was a nightmare (it was pre-ggplot2 for me, so it was made with R's base plotting commands, which are a beast for this kind of graph).