Donor analysis in R – Smith for Congress

June 13, 2011

(This article was first published on Offensive Politics » R, and kindly contributed to R-bloggers)

In a previous post I introduced the Smith for Congress data set. The data set contains roughly 49,000 contributions made by individuals to a congressional campaign over the 2006-2010 electoral cycles. Smith for Congress is not the name of the actual campaign.

Individual contributions are not required to be disclosed by a campaign unless the individual donates more than $200 during a single electoral cycle. The Smith for Congress campaign has, for its own reasons, published every individual contribution. This disclosure gives us an unprecedented look into how a modern campaign raises money. I’ve collected and scrubbed these contributions and published them for research use. In this post I will perform a detailed donor analysis with R to better understand how the Smith for Congress campaign financed its 2010 election. Full code and graphs can be found in the simple-analysis GitHub repository for this post.

Preparation

We need to download the data and load it into R. The latest data can always be downloaded from: Smith for Congress Latest

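library(plyr)     # provides ddply, used for the per-donor summaries
library(ggplot2)  # provides qplot, used for the plots
# latest smith for congress data as of this writing is March 23 2011.
cd <- read.csv("smithforcongress-03232011.csv")
# subset the data to just the 2010 cycle
cd0 <- cd[cd$cycle == 2010,]
# parse the date column on the 2010 subset so ordering and diff() work on Date values
cd0$contribution_date <- as.Date(cd0$contribution_date,format="%m/%d/%Y")
# drop amounts < $1; a logical index is safe even when no row matches
cd0 <- cd0[cd0$amount >= 1,]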

Data for the 2010 electoral cycle consists of 11,717 contributions made by 6,949 individuals, totaling over $770,000. Here is a sample:

personid                   amount  ctd_aggregate  contribution_date  cycle
9zvlnzw1qj9bvq7k1x47v486a      10             20  2009-04-01          2010
iy8xcopedihv9vwqpg3iwmal1       5             35  2009-04-01          2010
1f0lct995ckygk6y4vaxk2q44      20             20  2009-04-01          2010
bf2d43vdjdg07pgfmph6ghy7o      20             20  2009-04-01          2010
7sj05z74r8y10fcctvx4a38pn      20             20  2009-04-01          2010

Data Summary

Since the number of individual donors (6,949) is so much lower than the number of contributions (11,717), we can guess that a good portion of those donors gave multiple times. The long-form contribution data is somewhat difficult to work with when looking at multiple contributions from the same person. We’ll generate a summary data frame to help with our analysis. The following variables will be captured per individual donor:

  • Date of first contribution
  • The total value of all contributions by this individual
  • The total number of contributions by this individual
  • The amounts of the first three contributions (am1 to am3). NA where the donor made fewer than three contributions.
  • The number of days between consecutive contributions (dt1 to dt3). NA where the donor made no later contribution.
summarize.contributions <- function(x) {
  # sort this donor's contributions oldest first
  xo <- x[order(x$contribution_date),]
  # days between consecutive contributions, taken from the sorted rows
  dtx <- as.integer(diff(xo$contribution_date))

  return(data.frame(
    first.contribution=xo$contribution_date[1],
    num.contributions=nrow(xo),
    dt1=dtx[1],
    dt2=dtx[2],
    dt3=dtx[3],
    am1=xo$amount[1],
    am2=xo$amount[2],
    am3=xo$amount[3],
    total.value=sum(xo$amount)
  ))
}
# one summary row per donor
cd0s <- ddply(cd0, "personid", summarize.contributions)

Now the cd0s data frame holds our summary table, which looks like this:

personid                   first.contribution  num.contributions  dt1  dt2  dt3   am1  am2  am3  total.value
1023ryaqqbvz76kh3yq0r2ngq  2010-10-18                          1   NA   NA   NA    25   NA   NA           25
1036lg58hd4skceuyqrr2peb4  2010-03-25                          2  166   NA   NA    35   25   NA           60
106f366ysq6xe9ci731wejh0k  2009-12-11                          4   91  185   63    50   50   50          250
1081wyujzkgninrt1srf79tbo  2009-08-27                          3   58  114   NA    25   30   10           65
1094yhx62fcdx3c012mlpxnex  2009-10-15                          1   NA   NA   NA  1000   NA   NA         1000
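
To see how the per-donor summary behaves on a small case, here is a minimal sketch on made-up data. The donors "a" and "b", the helper name summarize.one, and the use of base R's split() and lapply() in place of ddply are all assumptions for illustration, not part of the original analysis. The rows are deliberately out of date order to show why the sort matters before taking differences.

```r
# Toy data: hypothetical donors "a" and "b", rows deliberately out of date order.
toy <- data.frame(
  personid = c("a", "a", "b", "a"),
  amount = c(25, 10, 50, 30),
  contribution_date = as.Date(c("2010-03-01", "2010-01-15",
                                "2010-02-01", "2010-06-01")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a trimmed-down version of the per-donor summary.
summarize.one <- function(x) {
  xo <- x[order(x$contribution_date), ]          # oldest contribution first
  dtx <- as.integer(diff(xo$contribution_date))  # days between consecutive gifts
  data.frame(personid = xo$personid[1],
             first.contribution = xo$contribution_date[1],
             num.contributions = nrow(xo),
             dt1 = dtx[1], dt2 = dtx[2],          # NA when there is no later gift
             am1 = xo$amount[1], am2 = xo$amount[2],
             total.value = sum(xo$amount))
}

# split() groups rows by donor; do.call(rbind, ...) stacks the one-row summaries.
toy.summary <- do.call(rbind, lapply(split(toy, toy$personid), summarize.one))
toy.summary
```

Donor "a" ends up with three contributions, gaps of 45 and 92 days, and a total of $65; donor "b" has a single contribution, so the gap and second-amount columns are NA.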

Giving Levels

With detailed giving levels we can infer a lot of information about a campaign, and about how the fundraisers are doing their jobs. If most of the giving is in the $15-20 range we can assume they focus on small donors and maybe online contributions. If most of the giving is in the $100-250 range then maybe the campaign throws lots of medium-sized dinners. If most of the donations are close to the legal maximum of $4,800 then the campaign is focused on major donors, and might be ignoring smaller donors altogether.

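Plotting a histogram of the total donation amount per individual will give us better insight into the giving levels.

# histogram of total giving per donor, in $50 bins
qplot(total.value,data=cd0s,geom="histogram",binwidth=50)
# share of individual contributions under $250
nrow(cd0[cd0$amount<250,]) / nrow(cd0)
# minimum, quartiles, mean and maximum of total giving per donor
summary(cd0s$total.value)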

Figure: Giving Levels, Smith for Congress 2010

Min.  1st Qu.  Median  Mean  3rd Qu.  Max.
   1       25      50   111      100  4800

In 2010, at least 75% of contributors gave $100 or less in total to the campaign, since the third quartile is $100. The summary table shows us the median total value donated was $50, while the overall average was $111. The maximum was $4,800, which is also the maximum allowed by law for 2010. We can infer that while there was certainly some major-donor solicitation, the fundraisers were focused on much smaller donors.

Repeat donors

Now that we know more about giving levels, it would be helpful to better understand giving frequency. The amount of repeat giving may give us insight into how involved the fundraisers are getting, and maybe even how often they are asking for money.
We’ll use a histogram and a cross-tab of the total number of contributions by individuals to help us with this analysis:

qplot(num.contributions,data=cd0s,geom="histogram",binwidth=1)
table(cd0s$num.contributions)

Figure: Giving Frequency, Smith for Congress 2010

num.contributions     1     2    3    4    5   6   7  8  9  10  13  14  18  20
donors             4242  1599  621  256  120  60  28  7  7   5   1   1   1   1

Our plot and table show that about three fifths (61%, 4,242) of the contributors to Smith for Congress gave only once, leaving 2,707 people who gave more than once. Most of the people who gave more than once gave twice, but there were still several hundred people who gave 3 or 4 times each.
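
These shares can be re-derived from the cross-tab alone. The following is a minimal sketch that uses only the counts printed in the table above; the variable names are illustrative and do not appear in the original code.

```r
# Counts copied from the giving-frequency table above;
# the names are the number of contributions per donor.
freq <- c("1"=4242, "2"=1599, "3"=621, "4"=256, "5"=120, "6"=60, "7"=28,
          "8"=7, "9"=7, "10"=5, "13"=1, "14"=1, "18"=1, "20"=1)

donors <- sum(freq)                 # total number of individual donors
single <- freq[["1"]]               # donors who gave exactly once
n.repeat <- donors - single         # donors who gave two or more times
# total number of contributions: frequency times number of donors, summed
contributions <- sum(as.integer(names(freq)) * freq)

round(100 * single / donors, 1)     # percent of donors who gave once
round(100 * n.repeat / donors, 1)   # percent of donors who gave more than once
```

With these counts the sketch gives 6,949 donors, 2,707 repeat donors and 11,717 contributions, which matches the totals quoted in the text.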

To understand how important repeat giving might be we need more detailed information. We need to look at the total amount donated by each group of contributors; we’ll also include the cumulative total, cumulative percentage, and individual percentage of total for each group.

gft <- ddply(cd0s,"num.contributions",function(x) { data.frame(total=sum(x$total.value),n=nrow(x))})
gft$percent <- gft$total / sum(gft$total) * 100
gft$running.total <- cumsum(gft$total) 
gft$running.percent <- gft$running.total / sum(gft$total) * 100

Our gft data frame looks like this:

num.contributions   total     n  percent  running.total  running.percent
                1  284043  4242   36.821         284043               37
                2  212697  1599   27.572         496740               64
                3  118998   621   15.426         615738               80
                4   72197   256    9.359         687935               89
                5   43513   120    5.641         731448               95
                6   24428    60    3.167         755876               98
                7    4825    28    0.625         760701               99
                8    3988     7    0.517         764689               99
                9    4340     7    0.563         769029              100
               10     990     5    0.128         770019              100
               13     167     1    0.022         770186              100
               14     675     1    0.088         770861              100
               18     360     1    0.047         771221              100
               20     200     1    0.026         771421              100

We see the campaign raised $284,000 (36.8% of the total raised) from the 4,242 contributors who gave only once, and $213,000 (27.6% of the total raised) from the 1,599 contributors who gave two times. We also see the campaign raised $487,378 from 2,707 repeat donors; that is about 63% of the total value raised for the entire cycle from individuals. It is clear the Smith for Congress campaign is good at attracting small-dollar donors, more than one-third of whom gave more than once. This is a pretty impressive repeat donor rate.
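
The repeat-donor figures follow directly from the per-group totals. As a minimal sketch, assuming only the total column of the gft table above (the vector names are illustrative):

```r
# Totals raised per giving-frequency group, copied from the gft table above.
num.contributions <- c(1:10, 13, 14, 18, 20)
total <- c(284043, 212697, 118998, 72197, 43513, 24428, 4825,
           3988, 4340, 990, 167, 675, 360, 200)

all.raised <- sum(total)                             # everything raised from individuals
repeat.raised <- sum(total[num.contributions > 1])   # raised from repeat donors only

all.raised
repeat.raised
round(100 * repeat.raised / all.raised, 1)           # repeat donors' share, in percent
```

This reproduces the $771,421 cycle total and the $487,378 raised from repeat donors, or about 63.2% of the total.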

Finally I’d like to look at what kind of donations make up each level of giving. We know repeat donors gave $487,000, but we don’t know if that was mostly in $50 donations or in $250 donations. We can use a box and whisker plot to break down each giving level. I’m leaving off contribution levels 8 and above since giving was so sparse at those levels. We’ll draw this box plot with a log transform on the y axis, since a few very large values would otherwise skew the graph and render it mostly useless. I used a trick from a Stack Overflow thread to get the formatting correct on the y axis:

# undo the log10 transform so the y-axis labels read in dollars
formatBack <- function(x) paste(round(10^x, 2), "$", sep=' ')
qplot(factor(num.contributions),log10(total.value),data=cd0s[cd0s$num.contributions < 8,],geom="boxplot",
  ylab="Total Value (log)",xlab="Giving Frequency",main="Giving Levels by Giving Frequency, Smith for Congress 2010") +
  scale_y_continuous(labels=formatBack) # current ggplot2 takes 'labels'; early releases used 'formatter'
# same data, but in table format
ddply(cd0s,"num.contributions",function(x) { data.frame(total=sum(x$total.value),n=nrow(x), min=min(x$total.value),
  mean=mean(x$total.value), median=median(x$total.value),std=sd(x$total.value),max=max(x$total.value))})

Figure: Giving Levels by Giving Frequency, Smith for Congress 2010

num.contributions   total     n  min  mean  median   std   max
                1  284043  4242    1    67      35   149  2400
                2  212697  1599    2   133      70   280  4800
                3  118998   621    4   192     105   299  3800
                4   72197   256   20   282     144   443  3800
                5   43513   120    5   363     175   616  4129
                6   24428    60   30   407     168   749  4700
                7    4825    28   33   172     175   103   475
                8    3988     7   80   570     160  1094  3048
                9    4340     7   90   620     225   627  1450
               10     990     5  100   198     200    72   280
               13     167     1  167   167     167    NA   167
               14     675     1  675   675     675    NA   675
               18     360     1  360   360     360    NA   360
               20     200     1  200   200     200    NA   200

This latest plot and table are both incredibly text heavy, but this is the critical intelligence required to start a fundraising plan.
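
The table can also be read per contribution rather than per donor. The sketch below uses only the total and n columns from the table above, restricted to groups 1 to 6 where there are enough donors to be meaningful, and divides each group's total by the number of contributions it represents. The variable names are illustrative.

```r
# total and n columns for giving frequencies 1 to 6, copied from the table above.
k <- 1:6                                  # contributions per donor in each group
total <- c(284043, 212697, 118998, 72197, 43513, 24428)
n <- c(4242, 1599, 621, 256, 120, 60)     # donors in each group

# Each group holds n * k individual contributions, so this is the
# average size of a single contribution within the group.
per.contribution <- total / (n * k)
round(per.contribution, 2)
```

The resulting averages sit in a narrow band of roughly $64 to $73 per contribution, even though the per-donor mean rises from $67 to $407 across the same groups.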

We see the average total contribution increases with the giving frequency, which makes sense. Up to six contributions the average increases in an approximately linear fashion, which suggests the individual contribution amounts are staying roughly constant. This may be a function of some campaign fundraising tactic, like “donate $35 now for a free T-shirt.” We can also get a sense of how much success the Smith for Congress major donor program enjoys. An individual can legally donate $2,400 each for the primary and the general election, or $4,800 per cycle. We can count how many individuals have maxed out at either level and measure how much impact the major donors have on the total amounts raised:

# how many individuals gave the max for one election ($2,400)
nrow(cd0s[cd0s$total.value == 2400,])
# how many individuals gave the max for the full cycle ($4,800)
nrow(cd0s[cd0s$total.value == 4800,])

We see 7 individuals who gave the maximum for one election, and only 2 individuals who maxed out for the entire cycle. The two fully maxed-out donors make up only 1.2% of total giving; this is very low compared with the average campaign. This tells us major donors aren’t the most important segment to Smith for Congress, but it could also mean that the campaign isn’t able or isn’t willing to ask the max amount from large donors.
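
The major-donor share is simple arithmetic on figures already reported in this post. A minimal sketch, assuming the donor counts above and the $771,421 cycle total from the gft table (variable names are illustrative):

```r
total.raised <- 771421         # cycle total, last running.total in the gft table
full.max <- 2 * 4800           # two donors at the $4,800 cycle limit
one.election.max <- 7 * 2400   # seven donors at the $2,400 single-election limit

# share of all money raised that came from the two full-cycle max donors
round(100 * full.max / total.raised, 2)
# share from all nine donors who hit either limit
round(100 * (full.max + one.election.max) / total.raised, 2)
```

The first figure comes to about 1.24%, matching the 1.2% quoted above; even counting all nine donors the share stays under 4%.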

Take Away

We can take away the following facts from our analysis:

  • About 39% of individual donors gave more than once to Smith for Congress
  • 80% of donors gave $100 or less to the campaign
  • Repeat donors gave $487,000 total to the campaign
  • Two out of 6,949 donors (0.029 percent) gave the maximum amount allowable by law, accounting for 1.2% of the total amount raised

From all this we can infer that Smith for Congress is running a very strong repeat donor program, and isn’t focused on only high-dollar donors. This information could be very useful in a number of different ways. A treasurer for Smith for Congress could use this information to design a 2012 fundraising plan and campaign budget. A candidate similar to Smith, or running in a similar district, could use this same information to plan their own campaign. Or a rival campaign could use this during opposition research and financial planning. Or researchers could use this to build better generic models of US House individual fundraising. I hope this shows that detailed campaign finance analysis is pretty simple when you’ve got access to the relevant data, which unfortunately is very uncommon.

Thanks for reading; questions or comments are always appreciated.
