In my previous post I briefly mentioned an early draft of a working paper (HERE) I’ve written that looks into the possible causes of violence between legislators (like the violence shown in this picture from the Turkish Parliament).
|From The Guardian|
In this post I’m going to briefly discuss how I used Zelig‘s rare events logistic regression (relogit) and ggplot2 in R to simulate and plot the legislative violence probabilities that are in the paper. In this example I am plotting simulated probabilities at fitted values on three variables:
- Age of democracy (Polity IV > 5)
- A dichotomous electoral proportionality variable where 1 is high proportionality, 0 otherwise (see here for more details)
- Governing parties’ majority (as a % of total legislative seats. Data is from DPI.)
Once you have the Zelig object returned from sim() it is simply a matter of extracting the simulation results. The default is to run 1,000 simulations for each fitted value. Zelig stores these in qi$ev in the Zelig object. In this example the fitted values of democratic age (fitted at years 0 through 85) are in a Zelig simulation object that I called Model.DemSim. To extract the simulations of the predicted probabilities use: Now turn the object Model.demAge.e into a data frame and use melt() from Reshape2 to reshape the data so that you can use it in ggplot2. Since the numbers in Variable actually mean something (years of democracy) the final cleanup stage is to remove the "X" prefixes attached to Variable. Now we can use Model.demAge.e as the source of data for geom_point() and stat_smooth() in ggplot! You might want to drop simulation results outside the 2.5 and 97.5 percentiles to keep only the middle 95%. The red bars in the Zelig base plots represent the middle 95%. Right now I prefer keeping all of the simulation results and simply changing the alpha (transparency) of the points. This allows us to see all of the results, both outliers and those within widely accepted, but still somewhat arbitrary 95% bounds. Here is the full code for completely reproducing the last plot above (which is also in the working paper). The last thing to mention is that subsetting the data with complete.cases() to keep only observations with full data on all variables is a crucial step to make before running zelig().