Where are $5.4 billion in PA grant funds being spent?

[This article was first published on Normally Skewed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Answer: Mostly in Philly & Pittsburgh.


The Commonwealth of Pennsylvania has a major grant called the Redevelopment Assistance Capital Program. As reported by Jacob Adelman for Philly.com, the Philly area recently nabbed $25.2M for a number of major projects, including the Viaduct Rail Park & renovations at the Gallery at Market East.

Here’s what this grant does (italics my own):

“…administered by the Office of the Budget for the acquisition and construction of regional economic, cultural, civic, recreational, and historical improvement projects. RACP projects are authorized in the Redevelopment Assistance section of a Capital Budget Itemization Act, have a regional or multi-jurisdictional impact, and generate substantial increases or maintain current levels of employment, tax revenues, or other measures of economic activity. RACP projects are state-funded projects that cannot obtain primary funding under other state programs.”

The criteria and guidelines for a successful project are extensively laid out on the website and I’d encourage you, dear (lone) reader, to take a gander. The purpose of the grant is designed to have fund projects that are going to have an immediate and lasting impact on the greater community, so it seems by no means an “easy” win. There’s an extensive [scoring rubric]() for projects that are deemed acceptable candidates. Below are the primary criteria and their respective weights (see the rubric for more detail on each):

  • Jobs created or retained (40%)
  • Community impact (20%)
  • Strategic cluster for development (5%)
  • Financial impact (25%)
  • Construction readiness (10%)

The projects also need to be at least $1 million in cost.

Once a grant is awarded, it essentially works on a reimbursement basis, where the grantee submits expenses from the project which are reviewed and reimbursed by the Commonwealth. Receiving the remaining funds after the Commonwealth conducts a legislatively mandated close-out audit of the completed project (here’s how reimbursement works). Additionally, the awardee is also required to contribute money to the project, so the project effectively is jointly funded. Another important point, private companies (e.g. hospital systems) and public organizations can apply for funds.

And speaking of the data award results are kept in a series of spreadsheets listed on the lower right-hand side of the main page. There’s data from 1986 to 2015 here. I’m going to look at the 1986-2014 data since those releases have had enough time to see disbursements.

Which counties are winning these awards?

I didn’t expect predicting county winners to be this easy, but nope, it is. Just be in Philadelphia or Allegheny (home to Pittsburgh).

## load packages mods = c(‘data.table’,’ggplot2′,’readxl’) # load required packages rbind(lapply(mods, function(i) suppressPackageStartupMessages(require(i,quietly=TRUE,character.only=TRUE)))) ## download data from RACP website ## download files links=c(‘http://www.budget.pa.gov/Programs/RACP/Documents/All%20RACP%20Awards.xlsx’, # awards 1986-2015 ‘http://www.budget.pa.gov/Programs/RACP/Documents/Round%202015%20Submissions-Awards.xlsx’ # awards through 2016 ) invisible(mapply(download.file,links[1:2],c(‘racp_86_14.xlsx’,’racp_15.xlsx’))) # NOTE: these are password-protected .xlsx files, and as such, painful to i/o from R. I just copy-pasted the relevent contents (i.e. stripped out merged header rows) into new excel sheets and saved as .csv’s. Sorry that’s not totally reproducible…
# read in data
# racp_15=fread('racp_15.csv')

# format dollar values
names(racp_86_14)[c(1,3,6)] = c('ME_NO','RELEASE_AMOUNT','RELEASE_DATE')
racp_86_14[,c(index) := lapply(.SD, function(i) gsub('\\$|,','',i)),.SDcols=index]
racp_86_14[,c(index) := lapply(.SD, as.numeric),.SDcols=index]
racp_86_14[COUNTY=='Alleghney',COUNTY:='Allegheny'] #Allegheny misspell't

## summarize county projects
total_county_wins = racp_86_14[!duplicated(ME1),.N,COUNTY] # summarize by county

## summarize $$$ by county
amount_by_county = racp_86_14[,list(TOTAL_AMOUNT=sum(RELEASE_AMOUNT)),COUNTY]

# plot projects by county
labs(x='County Name',y='Number of projects awarded',title='Total Projects Awarded, 1986-2014')+
theme(axis.text.x = element_text(angle=90, vjust=.3, hjust=1, color='black', size=11)
,axis.text.y = element_text(color='black', size=10)
,panel.background = element_blank() )+
theme(axis.line.x = element_line(color='lightgrey',size=1)
,axis.line.y = element_line(color='lightgrey',size=1) )

labs(x='County Name',y='% of $ awarded',title='Proportion of all awarded grants, 1986-2014\n(Total Dollars Awarded in Billions)')+
theme(axis.text.x = element_text(angle=90, vjust=.3, hjust=1, color='black', size=11)
,axis.text.y = element_text(color='black', size=10)
theme(axis.line.x = element_line(color='lightgrey',size=.5)
,axis.line.y = element_line(color='darkgrey',size=.5))+
geom_text(aes(label = paste0('$',round(amount_by_county[,TOTAL_AMOUNT/1e9],2))),vjust=1,fontface='bold',size=3)



Philly and Pittsburgh together have won about 50% of all the awards and all initial funding since 1986. Philly missed getting awards in three of the past 28 years (1996, 2003, 2014). Of course, there may have been no applications from the County in that year.

RACP’s distribution of awards has actually been a point of contention. As the above two graphs indicate, the two largest counties in the Commonwealth have received the lion’s share of funding from this program. An analysis by Adam Millsap (@aa_millsap) at GMU’s Mercatus Center found that the program has spurred few long-term benefits such as sustained job growth and economic activity, and may even have unintended economic consequences (e.g. funding is based on bond sales, which need to be repaid at interest to the holders). Alternatively, see a summary of Millsap’s study, written by Andrew Staub (@andrewstaub) for Norristown’s Times Herald, which describes some of the issues.

The bigger the better

To get an idea of how RACP inherently favors these larger and more economically developed counties, I plotted out the counties’ award amounts in 2000 and 2010 against their populations and size of business in those years.

County Population

Yeah, there’s a pretty clear relationship between the size of the county and how much it receives in awards.

<br data-mce-bogus="1">
<h2>download & unzip population files from US Census
temp <- tempfile() # temp folder
download.file("http://www2.census.gov/prod2/statcomp/usac/zip/POP.zip",temp) # read in
unzip(temp, exdir=workdir) # unzip
unlink(temp) # detach temp folder

## read in county population file
county_pop2010 = data.table(read_excel('POP01.xls',1))
county_pop2010 = county_pop2010[substr(STCOU,1,2)=='42',] # Pennsylvania -- Census FIPS Code 42, baby! I have that tattooed on my chest.
county_pop2010[,County := tstrsplit(Area_name,",")[1]] # get county names...

## Join county pop & funding together for 2000 & 2010
setkeyv(racp_86_14, 'COUNTY',verbose = FALSE); setkeyv(county_pop2010, 'County',verbose = FALSE) # set join keys

# 2000 county population = POP010200D
racp_2000 = racp_86_14[AUTH_YEAR == 2000,][county_pop2010] # join datasets 2000
racp_summary_2000 = unique(racp_2000[,list(sum_release_amt=sum(RELEASE_AMOUNT)

# 2010 county population = POP010210D
racp_2010 = racp_86_14[AUTH_YEAR == 2010,][county_pop2010] # join datasets 2010
racp_summary_2010 = unique(racp_2010[,list(sum_release_amt=sum(RELEASE_AMOUNT)

ggplot(data=racp_summary_2000,aes(x=log(pop),y=log(sum_release_amt), label=COUNTY))+
geom_smooth(method='lm', formula=y~x, se = TRUE, size=.7)+
labs(x='County Population (Logged)',
y='Award Amount (Logged)',
title='Relationship between funding & county population (2000)')

## 2010
ggplot(data=racp_summary_2010,aes(x=log(pop),y=log(sum_release_amt), label=COUNTY))+
geom_smooth(method='lm', formula=y~x, se = TRUE, size=.7)+
labs(x='County Population (Logged)',
y='Award Amount (Logged)',
title='Relationship between funding & county population (2010)')



Organizational capacity

The county population is telling, but looking at organizational capacity (in revenue) in each county is probably a more direct measure of distribution. This one was a little bit fuzzier to get a handle on. While I was able to get total number of businesses by county, I wasn’t able to easily grab total revenues. Organizations need to front part of the cost, and any applicant project needs to be at least $1 million in costs. I’m using total employee payroll instead. This isn’t really capturing what I want it to, but my basic assumption is that a larger total payroll per business indicates generally larger businesses in the region. Also, private businesses aren’t the only organizations who can apply for this. So this is more of a convenient sample than a representative estimate. However, I think it’s still directionally indicative.

<br data-mce-bogus="1">


# BZA010200D - Private nonfarm establishments 2000
# BZA010209D - Private nonfarm establishments 2009
business_establishments = data.table(read_excel('bza_census_data.xls',3))

# BZA210200D - Private nonfarm annual payroll 2000 (sheet 9)
# BZA210209D - Private nonfarm annual payroll 2009 (sheet 10)
business_payroll00 = data.table(read_excel('bza_census_data.xls',9))
business_payroll09 = data.table(read_excel('bza_census_data.xls',10))

lapply(list(business_establishments,business_payroll00,business_payroll09), setkeyv,"STCOU")
lapply(list(racp_2000, racp_2010), setkey, "COUNTY")


business=business[substr(STCOU,1,2)=='42',] # FIPS 42 = PA
business[,County := tstrsplit(Areaname,",")[1]] # get county names...

## join business data to racp 2000 &amp; racp 2010 data
racp_2000 = racp_2000[business[,list(County,BZA210200D,BZA010200D)]]
racp_2010 = racp_2010[business[,list(County,BZA210209D,BZA010209D)]]
racp_2000[,business_payroll := BZA210200D/BZA010200D]
racp_2010[,business_payroll := BZA210209D/BZA010209D]

racp2000_business_summary = unique(racp_2000[,list(sum_release_amt=sum(RELEASE_AMOUNT)

racp2010_business_summary = unique(racp_2010[,list(sum_release_amt=sum(RELEASE_AMOUNT)

## Plot business payroll &amp; award winnings - 2000
ggplot(data=racp2000_business_summary,aes(x=business_payroll,y=log(sum_release_amt), label=COUNTY))+
geom_smooth(method='lm',formula=y~x,se = TRUE,size=.7)+
labs(x='Average Payroll per Business',
y='Award Amount (Logged)',
title='Relationship between funding &amp; business payrolls (2000)')

## Plot business payroll &amp; award winnings - 2010
ggplot(data=racp2010_business_summary, aes(x=business_payroll, y=log(sum_release_amt), label=COUNTY))+
geom_smooth(method='lm',formula=y~x,se = TRUE,size=.7)+
# xlim(c(250,900))+
labs(x='Average Payroll per Business',
y='Award Amount (Logged)',
title='Relationship between funding &amp; business payrolls (2010)')


These graphs reveal that counties with higher payroll per business also tend to receive more funding; and although there isn’t as clear of a relationship as that of total population, I would imagine you’d see a stronger pattern if non-private enterprises were included and we looked at total revenue brought into the county. Note that Geisinger Health System sits in Montour County, which explains the large average payroll in the county.

Conclusion: Thinking about wealth distribution

This was a brief look at how a whole lot of PA taxpayer money gets distributed to organizations around the Commonwealth. As you can see, a lot of it goes to Philadelphia & Pittsburgh. And although that’s not necessarily by design, the grant is structured in such a way that inherently favors locales with sufficient economic activity to meet the grant criteria.

This fact, along with how the grant is funded, has not been lost on some. In 2015 a bill was successfully pushed through the State Legislature to limit borrowing for the program. And hey, if we are throwing that much money around, it’s probably worth a careful examination how equitably it’s being distributed.


To leave a comment for the author, please follow the link and comment on their blog: Normally Skewed.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)