Earlier this year I had a lot of fun learning how to use the BeautifulSoup and mechanize modules in Python to scrape websites. My goal was to scrape the European Parliament website for information on the activity levels of the different MEPs. I struggled a bit in the beginning with getting to grips with the structure of the website and the syntax of the two Python modules, but once the initial steps were taken the combination of BeautifulSoup and mechanize proved to be remarkably flexible and easy to use. The scraping process itself will have to wait for another post, though. The point is: I managed to get the data from the EP website.
Recently I used this data in a class to show how the activity levels of MEPs are distributed. This is an interesting question, as MEPs are commonly divided into two groups. There are the old-timers who have been given a seat in the EP by their national parties as a reward for loyal service, and there are the young, ambitious politicians who see the EP as a springboard for a career in national politics. Either way, the incentive to do serious work in the EP is seen as low. This is also commonly associated with high levels of absence from voting sessions in the EP (although according to votewatch.eu, on average between 80 and 90 percent of MEPs attend the plenary sessions).
Finding a good metric for "activity level" is daunting, as it is a fairly loose term. The data contain counts of how many speeches an MEP gave in the plenary, how many questions an MEP asked in the plenary, and how many motions, opinions and reports an MEP was responsible for. Given that very few MEPs end up authoring reports, and that the cost of giving speeches or asking questions in the plenary is very low, I opted to look at the number of speeches and questions as indicators of how active MEPs are in the EP. Since the 7th parliament is still in session, I opted to look at the 6th EP, for which we have data for the entire period.
First it is interesting to look at the distribution of questions and speeches. Below there are two plots showing the distribution of the two variables. For each variable the median value has been marked with a dotted line. Both distributions are highly skewed, with many MEPs having asked very few questions and given very few speeches. This is offset by a few MEPs who show an almost supernatural level of activity, with one MEP having asked 2009 questions and one MEP having given 984 speeches in the 6th EP.
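To make the point about skew concrete, here is a small Python sketch (not the R code used for the plots). The counts are made up for illustration and are not the EP data; `summarize` is a hypothetical helper. It shows why the median is the more sensible marker for this kind of distribution: a single extreme value drags the mean far above what a typical MEP does, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical question counts per MEP (made-up numbers, NOT the EP data).
questions = [0, 1, 2, 3, 3, 5, 8, 12, 40, 900]


def summarize(counts):
    """Return the median, mean and maximum of a list of counts."""
    return {
        "median": median(counts),
        "mean": mean(counts),
        "max": max(counts),
    }


summary = summarize(questions)

# In a right-skewed distribution the mean sits well above the median.
print(summary)
```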
It is of course possible that there are MEPs who do not give a lot of speeches but ask many questions, and vice versa. To be fair, we should therefore look at how the variables are jointly distributed in order to identify outliers in the EP with regard to activity levels. In the plot below the two variables have been plotted against each other, with dotted lines showing where the medians intersect.
Here it is clear that there are a few MEPs who are very active with regard to both speeches and questions, and that quite a few MEPs give many speeches but do not ask many questions. To get a better feel for who the true outliers might be, we can show where the 95% quantiles of the two variables intersect and scale the points that lie in the upper-right quadrant (above both quantiles) by their Euclidean distance to the intersection.
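The outlier rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the R code behind the plot: the function names `quantile` and `joint_outliers` are hypothetical, the data are made up, and the quantile uses simple linear interpolation, which is only one of several common definitions and may differ slightly from what a given statistics package returns.

```python
import math


def quantile(values, p):
    """Quantile by linear interpolation between order statistics.

    Assumption: this is one common definition among several.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def joint_outliers(speeches, questions, p=0.95):
    """Return (index, distance) for points strictly above both p-quantiles.

    The distance is the Euclidean distance from the point to the
    intersection of the two quantile lines.
    """
    s_cut = quantile(speeches, p)
    q_cut = quantile(questions, p)
    outliers = []
    for i, (s, q) in enumerate(zip(speeches, questions)):
        if s > s_cut and q > q_cut:
            outliers.append((i, math.hypot(s - s_cut, q - q_cut)))
    return outliers


# Hypothetical speech and question counts for ten MEPs (NOT the EP data).
speeches = [3, 5, 8, 10, 12, 15, 20, 40, 90, 984]
questions = [0, 1, 1, 2, 4, 6, 9, 30, 55, 700]

for index, distance in joint_outliers(speeches, questions):
    print(index, round(distance, 1))
```

The returned distance can then be mapped to point size in whatever plotting tool is used, so that MEPs further from the intersection are drawn larger.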
Here it becomes clear that there are ten MEPs who are outliers on the joint distribution of speeches and questions. That there are only ten such outliers is interesting, as is the fact that the vast majority of MEPs are indeed not very active in terms of questions and speeches in the plenary.
Before drawing any firm conclusions from this very preliminary analysis, it is worth noting that MEPs might be highly active in ways that are difficult to measure, such as discussing politics within the party groups. The measure of activity used here might therefore not be the most appropriate, but with the data that is available it is not a bad shot. With that in mind, it would be interesting to know whether the pattern shown here is present in other parliaments, or whether the EP truly is an outlier.
Here is the R code and data to reproduce the plots: