Donor analysis in R – Smith for Congress

Posted on June 13, 2011 by jjh in R bloggers

[This article was first published on Offensive Politics » R, and kindly contributed to R-bloggers].

In a previous post I introduced the Smith for Congress data set. The data consists of roughly 49,000 contributions made by individuals to a congressional campaign over the 2006-2010 electoral cycles. Smith for Congress is not the name of the actual campaign.

Individual contributions are not required to be disclosed by a campaign unless the individual donates more than $200 during a single electoral cycle. The Smith for Congress campaign has, for its own reasons, published every individual contribution. This disclosure gives us an unprecedented look at how a modern campaign raises money. I’ve collected and scrubbed these contributions and published them for research use. In this post I will perform a detailed donor analysis in R to better understand how the Smith for Congress campaign financed its 2010 election. Full code and graphs can be found in the simple-analysis GitHub repository for this post.

Preparation

We need to download the data and load it into R. The latest data can always be downloaded from: Smith for Congress Latest

library(plyr)     # ddply()
library(ggplot2)  # qplot()

# latest Smith for Congress data as of this writing is March 23, 2011.
cd <- read.csv("smithforcongress-03232011.csv", stringsAsFactors = FALSE)
# convert the date column first, so the 2010 subset inherits real Date values
cd$contribution_date <- as.Date(cd$contribution_date, format = "%m/%d/%Y")
# keep only the 2010 cycle, and drop amounts < $1
cd0 <- subset(cd, cycle == 2010 & amount >= 1)
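A note on dropping rows with the `x[-which(condition), ]` idiom: when no row matches, `which()` returns `integer(0)`, and indexing with `-integer(0)` selects zero rows, so every row is silently dropped. The small sketch below uses a hypothetical toy data frame to show the failure and a safer positive-index alternative (`subset()` behaves the same way as the positive form).

```r
# Hypothetical toy data frame with no amounts under $1
toy <- data.frame(amount = c(5, 25, 100))

# Negative indexing: which() finds nothing, -integer(0) is still integer(0),
# so zero rows are selected and the whole data frame is lost.
dropped_all <- toy[-which(toy$amount < 1), , drop = FALSE]
nrow(dropped_all)  # 0

# Positive indexing on the rows to keep works whether or not any row matches
# (which() also skips NA comparisons instead of producing NA rows).
kept <- toy[which(toy$amount >= 1), , drop = FALSE]
nrow(kept)  # 3
```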

Data for the 2010 electoral cycle consists of 11,717 contributions made by 6,949 individuals, totaling over $770,000. Here is a sample:

personid	amount	ctd_aggregate	contribution_date	cycle
9zvlnzw1qj9bvq7k1x47v486a	10	20	2009-04-01	2010
iy8xcopedihv9vwqpg3iwmal	15	35	2009-04-01	2010
1f0lct995ckygk6y4vaxk2q44	20	20	2009-04-01	2010
bf2d43vdjdg07pgfmph6ghy7o	20	20	2009-04-01	2010
7sj05z74r8y10fcctvx4a38pn	20	20	2009-04-01	2010

Data Summary

Since the number of individual donors (6,949) is so much lower than the number of contributions (11,717), we can guess that a good portion of those donors gave multiple times. The long-form contribution data is somewhat difficult to work with when looking at multiple contributions from the same person, so we’ll generate a summary data frame to help with our analysis. The following variables will be captured per individual donor:

Date of first contribution
The total value of all contributions by this individual
The total number of contributions by this individual
The amounts of the first three contributions. NA where the donor made fewer contributions.
The number of days between successive contributions, for the first three intervals. NA where the donor made fewer contributions.

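# summarize one donor's contributions (x holds all rows for one personid)
summarize.contributions <- function(x) {
  xo <- x[order(x$contribution_date), ]
  # days between successive contributions; must use the sorted rows
  dtx <- as.integer(diff(xo$contribution_date))

  data.frame(
    first.contribution = xo$contribution_date[1],
    num.contributions = nrow(xo),
    dt1 = dtx[1],          # NA when the donor has fewer gaps
    dt2 = dtx[2],
    dt3 = dtx[3],
    am1 = xo$amount[1],    # NA when the donor has fewer contributions
    am2 = xo$amount[2],
    am3 = xo$amount[3],
    total.value = sum(xo$amount)
  )
}
# one summary row per donor
cd0s <- ddply(cd0, "personid", summarize.contributions)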

Now the cd0s data frame holds our summary table, which looks like this:

personid	first.contribution	num.contributions	dt1	dt2	dt3	am1	am2	am3	total.value
1023ryaqqbvz76kh3yq0r2ngq	2010-10-18	1	NA	NA	NA	25	NA	NA	25
1036lg58hd4skceuyqrr2peb4	2010-03-25	2	166	NA	NA	35	25	NA	60
106f366ysq6xe9ci731wejh0k	2009-12-11	4	91	185	63	50	50	50	250
1081wyujzkgninrt1srf79tbo	2009-08-27	3	58	114	NA	25	30	10	65
1094yhx62fcdx3c012mlpxnex	2009-10-15	1	NA	NA	NA	1000	NA	NA	1000
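To make the dt and am columns concrete, here is a base-R sketch of the same per-donor summary on a small hypothetical data set, using `split()` and `lapply()` in place of plyr’s `ddply()`. The function name `summarize_one()` and the toy donors are made up for illustration; the key detail is that each donor’s rows are sorted by date before the gaps are taken.

```r
# Hypothetical base-R version of the per-donor summary
summarize_one <- function(x) {
  xo <- x[order(x$contribution_date), ]
  gaps <- as.integer(diff(xo$contribution_date))  # days between successive gifts
  data.frame(
    personid = xo$personid[1],
    first.contribution = xo$contribution_date[1],
    num.contributions = nrow(xo),
    dt1 = gaps[1], dt2 = gaps[2], dt3 = gaps[3],         # NA when fewer gaps exist
    am1 = xo$amount[1], am2 = xo$amount[2], am3 = xo$amount[3],
    total.value = sum(xo$amount)
  )
}

# Hypothetical donors: "a" gave three times (rows deliberately out of order), "b" once
toy <- data.frame(
  personid = c("a", "b", "a", "a"),
  amount = c(50, 25, 10, 40),
  contribution_date = as.Date(c("2009-06-01", "2009-05-01", "2009-04-01", "2009-07-01")),
  stringsAsFactors = FALSE
)

toy_summary <- do.call(rbind, lapply(split(toy, toy$personid), summarize_one))
toy_summary
```

For donor "a" the sorted amounts are 10, 50 and 40, with gaps of 61 and 30 days and dt3 left as NA; donor "b" has a single row, so every gap and every amount after the first is NA.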

Giving Levels

With detailed giving levels we can infer a lot about a campaign, and about how its fundraisers are doing their jobs. If most of the giving is in the $15-20 range, we can assume the campaign focuses on small donors and perhaps online contributions. If most of the giving is in the $100-250 range, then maybe the campaign holds lots of medium-sized dinners. If most of the donations are close to the legal maximum of $4,800 per cycle, then the campaign is focused on major donors and might be ignoring smaller donors altogether.

Plotting a histogram of total donation amount per individual will give us better insight into the giving levels.

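# histogram of each donor's total giving for the cycle, in $50 bins
qplot(total.value, data = cd0s, geom = "histogram", binwidth = 50)
# share of individual contributions under $250
nrow(cd0[cd0$amount < 250, ]) / nrow(cd0)
# distribution of per-donor totals
summary(cd0s$total.value)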

Giving Levels, Smith for Congress 2010

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
1	25	50	111	100	4800

In 2010, at least 75% of contributors gave $100 or less in total to the campaign (the third quartile is $100). The summary table shows that the median total donated was $50, while the overall average was $111. The maximum was $4,800, which is also the maximum allowed by law for the 2010 cycle. We can infer that while there was certainly some major-donor solicitation, the fundraisers were focused on much smaller donors.
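To put numbers on the giving bands described at the start of this section, one option is to bucket each donor’s total with `cut()`. This is a sketch: the break points and the helper names `giving_bands()` and `share_at_or_below()` are hypothetical, chosen loosely from the ranges mentioned above, and the example runs on made-up totals rather than the real data.

```r
# Hypothetical giving bands; break points are illustrative only.
# Intervals are right-closed, so a $100 total falls in "over $20 to $100".
giving_bands <- function(totals) {
  bands <- cut(totals,
               breaks = c(0, 20, 100, 250, 2400, 4800),
               labels = c("up to $20", "over $20 to $100", "over $100 to $250",
                          "over $250 to $2,400", "over $2,400 to $4,800"))
  table(bands)  # totals outside (0, 4800] become NA and are not counted
}

# Share of donors whose cycle total is at or below a limit (default $100)
share_at_or_below <- function(totals, limit = 100) mean(totals <= limit)

toy_totals <- c(5, 20, 25, 100, 150, 2400, 4800)
giving_bands(toy_totals)
share_at_or_below(toy_totals)
```

On the real summary table the calls would be `giving_bands(cd0s$total.value)` and `share_at_or_below(cd0s$total.value)`; the second gives the exact share of donors at or below $100, which the quartile only bounds from below.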

Repeat donors

Now that we know more about giving levels, it would be helpful to better understand giving frequency. The amount of repeat giving may give us insight into how involved the fundraisers are, and maybe even how often they ask for money.
We’ll use a histogram and a cross-tab of the total number of contributions by individuals to help us with this analysis:

qplot(num.contributions,data=cd0s,geom="histogram",binwidth=1)
table(cd0s$num.contributions)

Giving Frequency, Smith for Congress 2010

1	2	3	4	5	6	7	8	9	10	13	14	18	20
4242	1599	621	256	120	60	28	7	7	5	1	1	1	1

Our plot and table show that about three-fifths (61%, 4,242) of the contributors to Smith for Congress gave only once, leaving 2,707 people who gave more than once. Most of the people who gave more than once gave twice, but several hundred people still gave three or four times each.
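The counts in the frequency table are enough to check these figures by hand. A short sketch, with the donor counts typed in from the table above:

```r
# Donors by number of contributions, copied from the frequency table
freq <- c(`1` = 4242, `2` = 1599, `3` = 621, `4` = 256, `5` = 120, `6` = 60,
          `7` = 28, `8` = 7, `9` = 7, `10` = 5, `13` = 1, `14` = 1, `18` = 1, `20` = 1)
k <- as.integer(names(freq))              # number of contributions per donor

donors <- sum(freq)                       # 6949
contributions <- sum(k * freq)            # 11717
one_time_share <- freq[["1"]] / donors    # about 0.61
repeat_donors <- donors - freq[["1"]]     # 2707
```

The implied contribution count, 11,717, agrees with the figure used under Data Summary, and the repeat-donor share is 1 - 0.61, or just under 40%.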

To understand how important repeat giving might be we need more detailed information. We need to look at the total amount donated by each group of contributors; we’ll also include the cumulative total, cumulative percentage, and individual percentage of total for each group.

gft <- ddply(cd0s,"num.contributions",function(x) { data.frame(total=sum(x$total.value),n=nrow(x))})
gft$percent <- gft$total / sum(gft$total) * 100
gft$running.total <- cumsum(gft$total) 
gft$running.percent <- gft$running.total / sum(gft$total) * 100

Our gft data frame looks like this:

num.contributions	total	n	percent	running.total	running.percent
1	284043	4242	36.821	284043	37
2	212697	1599	27.572	496740	64
3	118998	621	15.426	615738	80
4	72197	256	9.359	687935	89
5	43513	120	5.641	731448	95
6	24428	60	3.167	755876	98
7	4825	28	0.625	760701	99
8	3988	7	0.517	764689	99
9	4340	7	0.563	769029	100
10	990	5	0.128	770019	100
13	167	1	0.022	770186	100
14	675	1	0.088	770861	100
18	360	1	0.047	771221	100
20	200	1	0.026	771421	100

We see the campaign raised $284,000 (36.8% of the total raised) from the 4,242 contributors who gave only once, and $212,000 (27.6% of the total raised) from the 1,599 contributors who gave twice. We also see the campaign raised $487,378 from 2,707 repeat donors; that is about 63% of the total raised from individuals for the entire cycle. Clearly the Smith for Congress campaign is good at attracting small-dollar donors, nearly 40% of whom gave more than once. This is a pretty impressive repeat donor rate.
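The repeat-donor totals can be recomputed directly from the total column of the gft table. A minimal sketch with the values typed in by hand:

```r
# Total raised by giving frequency (1, 2, ..., 20), copied from the gft table
total <- c(284043, 212697, 118998, 72197, 43513, 24428, 4825,
           3988, 4340, 990, 167, 675, 360, 200)

grand_total <- sum(total)                         # 771421
repeat_total <- grand_total - total[1]            # 487378: everyone but one-time donors
repeat_share <- 100 * repeat_total / grand_total  # about 63.2 percent
```

Note that the running.percent value of 64 in the table’s second row is the share from one-time and two-time donors combined (36.8% + 27.6%), not the repeat-donor share.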

Finally, I’d like to look at what kinds of donations make up each level of giving. We know repeat donors gave $487,000, but we don’t know whether that was mostly in $50 donations or in $250 donations. We can use a box-and-whisker plot to break down each giving level. I’m leaving off contribution levels of 8 and above, since giving was so sparse at those levels. We’ll draw the box plot with a log transform on the y axis, since a few very large values would skew the graph and render it mostly useless. I used a trick from this Stack Overflow thread to get the formatting correct on the y axis:

# label log10 axis ticks back in dollars (the Stack Overflow trick)
formatBack <- function(x) paste0("$", round(10^x, 2))
qplot(factor(num.contributions), log10(total.value), data = cd0s[cd0s$num.contributions < 8, ], geom = "boxplot",
      ylab = "Total Value (log)", xlab = "Giving Frequency", main = "Giving Levels by Giving Frequency, Smith for Congress 2010") +
  scale_y_continuous(labels = formatBack)  # `labels` replaces the old `formatter` argument
# same data, but in table format
ddply(cd0s, "num.contributions", function(x) data.frame(total = sum(x$total.value), n = nrow(x), min = min(x$total.value),
      mean = mean(x$total.value), median = median(x$total.value), std = sd(x$total.value), max = max(x$total.value)))

Giving Levels by Giving Frequency, Smith for Congress 2010

num.contributions	total	n	min	mean	median	std	max
1	284043	4242	1	67	35	149	2400
2	212697	1599	2	133	70	280	4800
3	118998	621	4	192	105	299	3800
4	72197	256	20	282	144	443	3800
5	43513	120	5	363	175	616	4129
6	24428	60	30	407	168	749	4700
7	4825	28	33	172	175	103	475
8	3988	7	80	570	160	1094	3048
9	4340	7	90	620	225	627	1450
10	990	5	100	198	200	72	280
13	167	1	167	167	167	NA	167
14	675	1	675	675	675	NA	675
18	360	1	360	360	360	NA	360
20	200	1	200	200	200	NA	200
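One quick check on the linearity claim below is to divide each mean total by its giving frequency: if the per-contribution average stays flat, the mean total grows roughly linearly. A sketch using the means from the table above for frequencies 1 through 7:

```r
# Mean total per donor for giving frequencies 1-7, copied from the table above
freq_k <- 1:7
mean_total <- c(67, 133, 192, 282, 363, 407, 172)

# Average amount per contribution at each frequency, rounded to $0.10
per_gift <- round(mean_total / freq_k, 1)
per_gift
```

Through six contributions the per-gift average stays between about $64 and $73; at seven contributions it falls to roughly $25, so the linear pattern holds only up to six.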

This latest plot and table are both incredibly text heavy, but this is the critical intelligence required to start a fundraising plan.

We see that the average total contribution increases with giving frequency, which makes sense. Through six contributions the average increases in an approximately linear fashion, which suggests the individual contribution amounts stay roughly constant. This may be a function of some campaign fundraising tactic, like “donate $35 now for a free T-shirt.” We can also get a sense of how much success the Smith for Congress major donor program enjoys. An individual can legally donate $2,400 for each of the primary and general elections per cycle. We can count how many individuals have maxed out at $4,800 and measure how much impact the major donors have on the total amount raised:

# how many individuals gave the max for one election ($2,400)
nrow(cd0s[cd0s$total.value == 2400, ])
# how many individuals maxed out for the whole cycle ($4,800)
nrow(cd0s[cd0s$total.value == 4800, ])

We see 7 individuals who gave the maximum for one election, and only 2 individuals who maxed out for the entire cycle. Those two maxed-out donors account for $9,600, only 1.2% of total giving; this is very low for the average campaign. This tells us major donors aren’t the most important segment for Smith for Congress, but it could also mean that the campaign isn’t able or isn’t willing to ask large donors for the maximum amount.
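The max-out arithmetic can be wrapped in a small helper. `maxout_summary()` is a hypothetical name, and it assumes donor totals are whole-dollar values so that exact equality with the limits is safe. Feeding it the counts reported above shows that the 1.2% figure covers only the two full-cycle donors; adding the seven one-election maxes raises the major-donor share to about 3.4%.

```r
# Hypothetical helper: donors at the 2010 per-election ($2,400) and per-cycle
# ($4,800) limits, and the percentage of the grand total they account for.
maxout_summary <- function(totals, grand_total = sum(totals),
                           per_election = 2400, per_cycle = 4800) {
  one_election <- sum(totals == per_election)
  full_cycle <- sum(totals == per_cycle)
  list(
    one_election = one_election,
    full_cycle = full_cycle,
    full_cycle_share = 100 * full_cycle * per_cycle / grand_total,
    any_max_share = 100 * (one_election * per_election + full_cycle * per_cycle) / grand_total
  )
}

# Counts reported above: 7 one-election maxes, 2 full-cycle maxes, $771,421 raised
reported <- maxout_summary(c(rep(2400, 7), rep(4800, 2)), grand_total = 771421)
reported$full_cycle_share  # about 1.24
reported$any_max_share     # about 3.42
```

On the real data the call would simply be `maxout_summary(cd0s$total.value)`.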

Take Away

We can take away the following facts from our analysis:

Nearly 40% of individual donors gave more than once to Smith for Congress
80% of donors gave $100 or less to the campaign
Repeat donors gave $487,000 total to the campaign
Two out of 6,949 (0.029 percent) donors gave the maximum amount allowable by law, accounting for 1.2% of the total amount raised

From all this we can infer that Smith for Congress is running a very strong repeat donor program, and isn’t focused on only high-dollar donors. This information could be very useful in a number of different ways. A treasurer for Smith for Congress could use this information to design a 2012 fundraising plan and campaign budget. A candidate similar to Smith, or running in a similar district, could use this same information to plan their own campaign. Or a rival campaign could use this during opposition research and financial planning. Or researchers could use this to build better generic models of US House individual fundraising. I hope this shows that detailed campaign finance analysis is pretty simple when you’ve got access to the relevant data, which unfortunately is very uncommon.

Thanks for reading; questions or comments are always appreciated.
