I am playing around with Eurostat data and ggplot2 a bit more. As I progress, the plotting gets easier and the data pre-processing a bit simpler, but the surprises in the data remain.

#### Eurostat data

The data used are fish_fleet (number of ships) and fish_pr (production = catch + aquaculture). After selecting the years 1992 and later, I decided to pull the data as csv rather than xls, with number formatting '1 234.56'. The consequence is that the data now come in a tall and skinny layout, which may actually be better. However, the number format and the ':' used for missing values still require a bit of processing.
library(ggplot2)
fleet$Number <- scan(textConnection(gsub(' ', '',fleet$Value)),na.strings=':')
catch$Tonnes <- scan(textConnection(gsub(' ', '',catch$Value)),na.strings=':')
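The number clean-up above can be tried on a few toy strings. This is only a sketch under the assumption that the thousands separator is a plain space and that ':' is the only missing-value marker; the vector `value` is made up for illustration. It uses `as.numeric()` instead of `scan()`, which is an alternative route and not the one used in this post.

```r
# Toy values in the '1 234.56' export format; ':' marks a missing value.
value <- c("1 234.56", "12", ":", "7 890")

# Remove the thousands separator (assumed to be an ordinary space).
cleaned <- gsub(" ", "", value, fixed = TRUE)

# Turn the missing-value marker into NA before converting.
cleaned[cleaned == ":"] <- NA
number <- as.numeric(cleaned)
number
```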
The GEO labels still need to be made a bit shorter.
shortlevels <- function(xx) {
levels(xx) <- gsub('European Economic Area','EEA' ,levels(xx))
levels(xx) <- gsub(' plus IS, LI, NO','+' ,levels(xx),fixed=TRUE)
levels(xx) <- sub(' countries)',')' ,levels(xx),fixed=TRUE)
levels(xx) <- sub(' (under United Nations Security Council Resolution 1244/99)','' ,levels(xx),fixed=TRUE)
levels(xx) <- sub('European Free Trade Association','EFTA' ,levels(xx),fixed=TRUE)
levels(xx) <- sub('Former Yugoslav Republic of Macedonia, the','FYROM' ,levels(xx),fixed=TRUE)
levels(xx) <- sub('including +former GDR','Incl GDR' ,levels(xx))
levels(xx) <- sub('European Union','EU' ,levels(xx))
levels(xx)[grep('Germany',levels(xx))] <- 'Germany'
xx
}
catch$GEO <- shortlevels(catch$GEO)
fleet$GEO <- shortlevels(fleet$GEO)
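To see what the relabelling does, here is a small sketch on a toy factor. The three labels are written in the style of the Eurostat GEO labels but are assumptions for illustration, not taken from the data. With `fixed=TRUE` the pattern is matched literally, so the closing parenthesis is not interpreted as regular-expression syntax.

```r
# Toy labels in the style of the GEO variable (illustrative only).
geo <- factor(c("European Union (27 countries)",
                "Euro area (17 countries)",
                "Denmark"))

# Literal match: ' countries)' is shortened to ')'.
levels(geo) <- sub(" countries)", ")", levels(geo), fixed = TRUE)

# Plain pattern without special characters, so a regex match is fine.
levels(geo) <- sub("European Union", "EU", levels(geo))
levels(geo)
```

Assigning to `levels()` renames the levels in place, so the factor values follow along without touching the data themselves.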

The only preparation needed was to select the Tonnage classes as property and to keep only countries. The aggregates (EFTA, EEA and EU) have a number such as 15, 25 or 27 in their label, which makes them easy to drop.
f2 <- fleet[grep('Tonnage',as.character(fleet$VESSIZE)) ,]
f2 <- f2[-grep('15|25|27',f2$GEO),]
f2 <- f2[complete.cases(f2),]
f2$VESSIZE <- factor(f2$VESSIZE)
f2$GEO <- factor(f2$GEO)
Order the levels of VESSIZE by value for a nice display.
lev <- gsub('(-|\\+).*','',levels(f2$VESSIZE))
nlev <- as.numeric(gsub('^[[:alpha:]]* ','',lev))
f2$VESSIZE <- factor(as.character(f2$VESSIZE),
levels= levels(f2$VESSIZE)[order(nlev,lev)])
levels(f2$VESSIZE) <- gsub('Tonnage ','',levels(f2$VESSIZE))
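The level reordering above can be followed step by step on a toy factor. The class labels below are hypothetical, chosen only to resemble tonnage classes. The lower bound of each class is extracted as a number and used as the sort key; the 'Total' label has no number, becomes NA and therefore ends up last.

```r
# Hypothetical size-class labels, for illustration only.
vessize <- factor(c("Tonnage 500-999", "Tonnage 0-24", "Tonnage 2000+",
                    "Total all Tonnage Classes", "Tonnage 100-499"))

# Drop everything from the first '-' or '+' onwards: 'Tonnage 500-999' -> 'Tonnage 500'.
lev <- gsub("(-|\\+).*", "", levels(vessize))

# Drop the leading word; what is left is the lower bound (NA for the total).
nlev <- suppressWarnings(as.numeric(gsub("^[[:alpha:]]* ", "", lev)))

# Rebuild the factor with levels sorted by lower bound; NA sorts last.
ordered_size <- factor(as.character(vessize),
                       levels = levels(vessize)[order(nlev, lev)])
levels(ordered_size)
```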
The first aim is a dotplot of the last year (2010), with countries ordered by size of fleet.
f3 <- f2[f2$TIME==2010 ,]
f4 <- f3[f3$VESSIZE=='Total all Classes',]
f3$GEO <- factor(as.character(f3$GEO),
levels=as.character(f4$GEO[order(f4$Number)]))
ggplot(f3,
aes(y=GEO,x=Number,colour=VESSIZE))  +
geom_point() +
labs(colour='Tonnage')
It seems Greece had the largest fleet. All my thoughts that the Netherlands was a fishing country have been erased.
For a time-related plot I chose to put the number of vessels on a logarithmic scale. Since there are rather many countries, only those with the biggest fleets are shown.
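Picking the biggest fleets via a quantile cut-off can be sketched on toy numbers. The data frame `mfleet_toy` is invented for illustration. With distinct values, a cut-off at the quantile 1 - 9/n leaves nine rows above it; with ties the count could differ.

```r
# Toy per-country maxima: 20 made-up countries with distinct fleet sizes.
mfleet_toy <- data.frame(GEO = paste0("Country", 1:20),
                         x   = (1:20) * 100)

# Cut-off chosen so that about nine countries lie above it.
cutoff <- quantile(mfleet_toy$x, 1 - 9 / nrow(mfleet_toy))
big <- mfleet_toy$GEO[mfleet_toy$x > cutoff]
length(big)
```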
mfleet <- aggregate(f4$Number,list(GEO=f4$GEO),max)
bigfleet <- mfleet$GEO[mfleet$x>quantile(mfleet$x,1-9/nrow(mfleet))]
ggplot(f2[f2$GEO %in% bigfleet & f2$VESSIZE!='Total all Classes' ,],
aes(x=TIME,y=Number,colour=VESSIZE)) +
geom_line() +
facet_wrap( ~ GEO, drop=TRUE) +
scale_y_log10() +
labs(colour='Tonnage')
The interesting thing about this plot is that the number of vessels is decreasing. That is, except for one category, the biggest, more than 2000 Tonnage. There are only a few tens of those, but they must count for loads of smaller vessels.

#### Catch

Fish caught is probably the same thing. In this case, SPECIES and GEO have far too many levels for a decent display, so only the biggest catches are shown. On top of that, three SPECIES categories are almost the same. These are 'Total', 'Aquatic animals' and 'Finfish and invertebrates'. Finfish probably needs an explanation. To quote Wikipedia:

> Many types of aquatic animals commonly referred to as “fish” are not fish in the sense given above; examples include shellfish, cuttlefish, starfish, crayfish and jellyfish. In earlier times, even biologists did not make a distinction – sixteenth century natural historians classified also seals, whales, amphibians, crocodiles, even hippopotamuses, as well as a host of aquatic invertebrates, as fish. However, according to the definition above, all mammals, including cetaceans like whales and dolphins, are not fish. In some contexts, especially in aquaculture, the true fish are referred to as finfish (or fin fish) to distinguish them from these other animals.

c2010 <- catch[catch$TIME==2010,]
c2010 <- c2010[complete.cases(c2010),]
mcatch <- aggregate(c2010$Tonnes,list(GEO=c2010$GEO),max)
bigcatch <- mcatch$GEO[mcatch$x>quantile(mcatch$x,.5)]
c2010 <- c2010[c2010$GEO %in% bigcatch,]
mcatch <- aggregate(c2010$Tonnes,list(SPECIES=c2010$SPECIES),max)
bigcatch <- mcatch$SPECIES[mcatch$x>quantile(mcatch$x,.5)]
bigcatch <- bigcatch[!(bigcatch %in% c('Aquatic animals','Finfish and invertebrates'))]
c2010 <- c2010[c2010$SPECIES %in% bigcatch,]
c2010$SPECIES <- factor(c2010$SPECIES)

ggplot(c2010,
aes(y=GEO,x=Tonnes,colour=SPECIES))  +
geom_point() +
labs(colour='Tonnes live weight')
The surprise here is Denmark. It is getting loads of fish. The same is true for the UK and Spain.

#### Combination of fleet and catch

Since we have both data sets, they can be combined. The merging ids are GEO and TIME, which means the data have to be reshaped to wide format beforehand. reshape() prefixes the newly created variables with 'Number.' or 'Tonnes.', which I do not need, so the prefix is removed.
tfl <- reshape(fleet,direction='wide',idvar=c('TIME','GEO'),
timevar='VESSIZE',drop=c('Value','Flag.and.Footnotes','UNIT'),v.names='Number')
names(tfl) <- gsub('Number.','',names(tfl),fixed=TRUE)
rca <- reshape(catch,direction='wide',idvar=c('TIME','GEO'),
timevar='SPECIES',drop=c('Value','FISHREG','UNIT'),v.names='Tonnes')
names(rca) <- gsub('Tonnes.','',names(rca),fixed=TRUE)
both <- merge(tfl,rca,by=c('TIME','GEO'))
both2 <- both[-grep('15|25|27',both$GEO),]
ggplot(both2[!(both2$GEO %in% c('Belgium','Bulgaria','Cyprus','Estonia',
'Latvia','Lithuania','Malta','Romania','Slovenia','Poland',
'Germany','Finland','Ireland','Netherlands','Sweden')),],
aes(y=`Total fishery products`,
x=`Total all Tonnage Classes`,colour=TIME))  +
geom_point() +
facet_wrap( ~ GEO, drop=TRUE)
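The reshape-then-merge step above can be sketched on toy data. The data frames `fleet_toy` and `catch_toy` and their values are invented for illustration; only the column names mirror the ones used in this post. Note that `merge()` takes its key columns through the `by` argument.

```r
# Toy long-format fleet data: one row per country, year and size class.
fleet_toy <- data.frame(GEO     = rep(c("A", "B"), each = 2),
                        TIME    = 2010,
                        VESSIZE = rep(c("Small", "Total"), times = 2),
                        Number  = c(10, 15, 20, 28))

# Long to wide: one row per GEO/TIME, one column per size class.
wide_fleet <- reshape(fleet_toy, direction = "wide",
                      idvar = c("TIME", "GEO"),
                      timevar = "VESSIZE", v.names = "Number")

# reshape() names the new columns 'Number.Small', 'Number.Total'; strip the prefix.
names(wide_fleet) <- gsub("Number.", "", names(wide_fleet), fixed = TRUE)

# Toy catch data, already one row per GEO/TIME.
catch_toy <- data.frame(GEO = c("A", "B"), TIME = 2010, Tonnes = c(100, 50))

# Join on the shared keys.
both_toy <- merge(wide_fleet, catch_toy, by = c("TIME", "GEO"))
both_toy
```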
I like very much how ggplot2 defaulted TIME as colour variable. It shows very nicely how catches and fleets are getting smaller. The latter is obviously not true for the biggest ships, as seen above. The plot also shows that Denmark and Iceland have remarkably efficient fleets: small, but catching loads of fish. In contrast, Greece has a big fleet but a small catch. That does not seem economical, but tonnes do not equal euros. Regarding the UK and Spain: yes, Spain is just a bit bigger than the UK, so that pain may exist.

#### Catch per species

As a finale, I wanted to look per species. However, this would be a bit too long for this blog, so I show only one. The plot is made in a function, which takes a bit of string to match against the SPECIES variable. To keep the plot simple, only the six largest countries (by median catch) are taken. facet_wrap does two things here: it puts a title on the plot even if there is only one species, and it makes separate panels if more than one value of SPECIES matches the string.
byspecies <- function(species) {
ca <- catch[grep(species,catch$SPECIES,ignore.case=TRUE),c(-4,-5,-6)]
ca <- ca[complete.cases(ca),]
ca <- ca[!(ca$GEO %in% c('EFTA','EU (15)','EU (27)')),]
ag <- aggregate(ca$Tonnes,list(GEO=ca$GEO),median)
ag <- ag[order(-ag$x),]
ca <- ca[ca$GEO %in% ag$GEO[1:6],]
ca$GEO <- factor(ca$GEO)
ggplot(ca   ,   aes(y=Tonnes,x=TIME,colour=GEO))  +
geom_line() +
facet_wrap( ~ SPECIES, drop=TRUE)
}
byspecies('octop')
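How the string matching inside the function picks species can be seen in a small sketch. The species labels below are hypothetical and serve only as illustration. A partial, case-insensitive match can hit more than one label, which is what would lead to more than one panel.

```r
# Hypothetical SPECIES labels, for illustration only.
species <- c("Common octopus", "Octopuses, etc. nei", "Atlantic cod")

# Case-insensitive partial match, as used inside byspecies().
hits <- grep("octop", species, ignore.case = TRUE)
species[hits]
```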