**Quantum Forest » rblogs**, and kindly contributed to R-bloggers)

A few days ago I came across Jack Heinemann and collaborators’ article (Sustainability and innovation in staple crop production in the US Midwest, Open Access) comparing the agricultural sectors of USA and Western Europe^{‡}. While the article is titled around the word sustainability, the main comparison stems from the use of Genetically Modified crops in USA versus the absence of them in Western Europe.

I was curious about part of the results and discussion which, in a nutshell, suggest that *“GM cropping systems have not contributed to yield gains, are not necessary for yield gains, and appear to be eroding yields compared to the equally modern agroecosystem of Western Europe”*. The authors relied on several crops for the comparison (Maize/corn, rapeseed/canola, soybean and cotton); however, I am going to focus on a single one (corn) for two reasons: 1. I can’t afford a lot of time for blog posts when I should be preparing lectures and 2. I like eating corn.

When the authors of the paper tackled corn the comparison was between the USA and Western Europe, using the United Nations definition of Western Europe (i.e. Austria, Belgium, France, Germany, Liechtenstein, Luxembourg, Monaco, Netherlands, Switzerland). Some large European corn producers like Italy are not there because of the narrow definition of Western.

I struggled with the comparison used by the authors because, in my opinion, there are potentially so many confounded effects (different industry structures, weather, varieties, etc.) that it can’t provide the proper counterfactual for GM versus non-GM crops. Anyway, I decided to have a look at the same data to see if I would reach the same conclusions. The article provides a good description of where the data came from, as well as how the analyses were performed. Small details to match exactly the results were fairly easy to figure out. I downloaded the FAO corn data (3.7 MB csv file) for all countries (so I can reuse the code and data later for lectures and assignments). I then repeated the plots using the following code:

# Default directory setwd('~/Dropbox/quantumforest') # Required packages require(ggplot2) require(labels) # Reading FAO corn data FAOcorn = read.csv('FAOcorn.csv') # Extracting Area FAOarea = subset(FAOcorn, Element == 'Area Harvested', select = c('Country', 'Year', 'Value')) names(FAOarea)[3] = 'Area' # and production FAOprod = subset(FAOcorn, Element == 'Production', select = c('Country', 'Year', 'Value')) names(FAOprod)[3] = 'Production' # to calculate yield in hectograms FAOarea = merge(FAOarea, FAOprod, by = c('Country', 'Year')) FAOarea$Yield = with(FAOarea, Production/Area*10000) # Subsetting only the countries of interest (and years to match paper) FAOarticle = subset(FAOarea, Country == 'United States of America' | Country == 'Western Europe') # Plot with regression lines ggplot(FAOarticle, aes(x = Year, y = Yield, color = Country)) + geom_point() + stat_smooth(method = lm, fullrange = TRUE, alpha = 0.1) + scale_y_continuous('Yield in hectograms', limits = c(0, 100000), labels = comma) + opts(legend.position="top") |

I could obtain pretty much the same regression model equations as in the article by expressing the years as deviation from 1960 as in:

# Expressing year as a deviation from 1960, so results # match paper FAOarticle$NewYear = with(FAOarticle, Year - 1960) usa.lm = lm(Yield ~ NewYear, data = FAOarticle, subset = Country == 'United States of America') summary(usa.lm) #Call: #lm(formula = Yield ~ NewYear, data = FAOarticle, subset = Country == # "United States of America") # #Residuals: # Min 1Q Median 3Q Max #-18435.4 -1958.3 338.3 3663.8 10311.0 # #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) 38677.34 1736.92 22.27 <2e-16 *** #NewYear 1173.83 59.28 19.80 <2e-16 *** #--- #Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # #Residual standard error: 6049 on 48 degrees of freedom #Multiple R-squared: 0.8909, Adjusted R-squared: 0.8887 weu.lm = lm(Yield ~ NewYear, data = FAOarticle, subset = Country == 'Western Europe') summary(weu.lm) #Call: #lm(formula = Yield ~ NewYear, data = FAOarticle, subset = Country == # "Western Europe") # #Residuals: # Min 1Q Median 3Q Max #-14726.6 -3205.8 346.4 4000.6 10289.5 # #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) 31510.14 1665.90 18.91 <2e-16 *** #NewYear 1344.42 56.86 23.65 <2e-16 *** #--- #Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # #Residual standard error: 5802 on 48 degrees of freedom #Multiple R-squared: 0.9209, Adjusted R-squared: 0.9193 #F-statistic: 559.1 on 1 and 48 DF, p-value: < 2.2e-16 |

Heinemann and collaborators then point out the following:

…the slope in yield increase by year is steeper in W. Europe (y = 1344.2x + 31512, R² = 0.92084) than the United States (y = 1173.8x + 38677, R² = 0.89093) from 1961 to 2010 (Figure 1).

This shows that in recent years W. Europe has had similar and even slightly higher yields than the United States despite the latter’s use of GM varieties.

However, that interpretation using all data assumes that both ‘countries’ are using GMO all the time. An interesting thing is that USA and Western Europe were in different trends **already before the introduction of GM corn**. We can state that because we have some idea of when GM crops were introduced in the USA. This information is collected by the US Department of Agriculture in their June survey to growers and made publicly available at the State level (GMcornPenetration.csv):

cornPenetration = read.csv('GMcornPenetration.csv') ggplot(cornPenetration, aes(x = Year, y = PerAllGM)) + geom_line() + facet_wrap(~ State) + scale_y_continuous('Percentage of GM corn') + opts(axis.text.x = theme_text(angle=90)) |

This graph tells us that by the year 2000 the percentage of planted corn was way below 50% in most corn producing states (in fact, it was 25% at the country level). From that time on we have a steady increase reaching over 80% for most states by 2008. Given this, it probably makes sense to assume that, at the USA level, yield reflects non-GM corn until 1999 and progressively reflects the effect of GM genotypes from 2000 onwards. This division is somewhat arbitrary, but easy to implement.

We can repeat the previous analyzes limiting the data from 1961 until, say, 1999:

usa.lm2 = lm(Yield ~ NewYear, data = FAOarticle, subset = Country == 'United States of America' & Year < 2000) summary(usa.lm2) #Call: #lm(formula = Yield ~ NewYear, data = FAOarticle, subset = Country == # "United States of America" & Year < 2000) # #Residuals: # Min 1Q Median 3Q Max #-17441 -2156 1123 3989 9878 # #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) 39895.57 2084.81 19.14 < 2e-16 *** #NewYear 1094.82 90.84 12.05 2.25e-14 *** #--- #Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # #Residual standard error: 6385 on 37 degrees of freedom #Multiple R-squared: 0.797, Adjusted R-squared: 0.7915 #F-statistic: 145.2 on 1 and 37 DF, p-value: 2.245e-14 weu.lm2 = lm(Yield ~ NewYear, data = FAOarticle, subset = Country == 'Western Europe' & Year < 2000) summary(weu.lm2) #Call: #lm(formula = Yield ~ NewYear, data = FAOarticle, subset = Country == # "Western Europe" & Year < 2000) # #Residuals: # Min 1Q Median 3Q Max #-10785 -3348 -34 3504 11117 # #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) 29802.17 1813.79 16.43 <2e-16 *** #NewYear 1454.48 79.03 18.40 <2e-16 *** #--- #Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # #Residual standard error: 5555 on 37 degrees of freedom #Multiple R-squared: 0.9015, Adjusted R-squared: 0.8988 #F-statistic: 338.7 on 1 and 37 DF, p-value: < 2.2e-16 |

These analyses indicate that Western Europe started with a lower yield than the USA (29802.17 vs 39895.57 hectograms) and managed to increase yield much more quickly (1454.48 vs 1094.82 hectograms per year) **before any use of GM corn** by the USA. Figure 1 shows a messy picture because there are numerous factors affecting yield each year (e.g. weather has a large influence). We can take averages for each decade and see how the two ‘countries’ are performing:

# Aggregating every decade FAOarticle$Decade = cut(FAOarticle$Year, breaks = seq(1959, 2019, 10), labels = paste(seq(1960, 2010, 10), 's', sep = '')) decadeYield = aggregate(Yield ~ Country + Decade, data = FAOarticle, FUN = mean) ggplot(decadeYield, aes(x = Decade, y = Yield, fill = Country)) + geom_bar(stat = 'identity', position = 'dodge') + scale_y_continuous('Yield [hectograms]', expand = c(0, 0)) + opts(legend.position="top") |

This last figure requires more attention. We can *again* see that Western Europe starts with lower yields than the USA; however, it keeps on increasing those yields faster than USA, overtaking it during the 1990s. Again, all this change happened **while both USA and Western Europe were not** using GM corn. The situation **reverses** in the 2000s, when the USA overtakes Western Europe, **while the USA continuously increased the percentage of GM corn**. The last bar in Figure 3 **is misleading** because it includes a single year (2010) and we know that yields in USA went down in 2011 and 2012, affected by a very large drought (see Figure 4).

At least when looking at corn, I can’t say (with the same data available to Heinemann) that there is no place or need for GM genotypes. I do share some of his concerns with respect to the low level of diversity present in staple crops but, in contrast to his opinion, I envision a future for agriculture that includes large-scale operations (either GM or no-GM), as well as smaller operations (including organic ones). I’d like to finish with some optimism looking further back to yield, because the USDA National Agricultural Statistics Service keeps yield statistics for corn since 1866(!) (csv file), although it uses bizarre non-metric units (bushels/acre). As a metric boy, I converted to kilograms per hectare (multiplying by 62.77 from this page) and then to hectograms (100 g) multiplying by 10.

# Reading NASS corn data NASS = read.csv('NASScorn.csv') # Conversion to sensical units (see Iowa State Extension article) # http://www.extension.iastate.edu/agdm/wholefarm/html/c6-80.html NASS$Yield = with(NASS, Value*62.77*10) # Average by decade NASS$Decade = cut(NASS$Year, breaks = seq(1859, 2019, 10), labels = paste(seq(1860, 2010, 10), 's', sep = '')) oldYield = aggregate(Yield ~ Decade, data = NASS, FUN = mean) # Plotting ggplot(oldYield, aes(x = Decade, y = Yield)) + geom_bar(stat = 'identity') + scale_y_continuous('Yield [hectograms]', expand = c(0, 0)) |

It is interesting to see that there was little change until the 1940s, with the advent of the Green Revolution (modern breeding techniques, fertilization, pesticides, etc.). The 2010s decade in Figure 4 includes 2010, 2011 and 2012, with the last two years reflecting extensive droughts. Drought tolerance is one of the most important traits in modern breeding programs.

^{‡} While Prof. Heinemann and myself work for the same university I don’t know him in person.

P.S. Did you know that Norman Borlaug (hero of mine) studied forestry?

P.S.2. Time permitting I’ll have a look at other crops later. I would have liked to test a regression with dummy variables for corn to account for pre-2000 and post-1999, but there are not yet many years to fit a decent model (considering natural variability between years). We’ll have to wait for that one.

P.S.3. I share some of Heinemann’s concerns relating to subsidies and other agricultural practices.

P.S.4. In case anyone is interested, I did write about a GM-fed pigs study not long ago.

**leave a comment**for the author, please follow the link and comment on their blog:

**Quantum Forest » rblogs**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...