Site icon R-bloggers

City of Fort Collins Spending Analysis

[This article was first published on R-Projects – Stoltzmaniac, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Stoltzmaniac is going local in today’s blog post! I dug into the City of Fort Collins open data and published my findings below.

The data was surprisingly clean and laid out in JSON format on Amazon S3. This post shows my exploration of the data along with some observations.


As always, I loaded the required libraries. I also loaded a custom ‘cleaningData.R’ script, which I wrote specifically for this project.

source("cleaningData.R")  
library(ggplot2)  
library(lubridate)  
library(dplyr)  
library(scales)

I created a custom function to fetch, clean, and merge the data (code provided at the end of this post). I also created columns to help with the time series analysis.

The data is from The City of Fort Collins Website

data = fetchAndCleanData()  
data$year = year(data$gldate)  
data$month = month(data$gldate,label=TRUE)  
data$day = day(data$gldate)  
data$dayOfWeek = wday(data$gldate,label=TRUE)

When are vendors being paid by the City of Fort Collins?

p = ggplot(data, aes(day, fill = factor(year)))

p + geom_bar(position='dodge') +  
  scale_y_continuous(name="Daily Payments",
                     labels = comma) +
  ggtitle('Fig 1. Payments by Day of Month') + 
  scale_fill_discrete(name = "Year")

p = ggplot(data, aes(day, fill = month))  
p + geom_density(alpha = 0.5) +  
  scale_y_continuous(name="% of Payments by Day of Month",
                     labels = percent) +
  ggtitle('Fig 2. Payments by Day of Month')

p = ggplot(data, aes(dayOfWeek,fill=factor(year)))  
p + geom_bar(position='dodge') +  
  scale_y_continuous(name="Payment Count", 
                     labels = comma) +
  scale_fill_discrete(name = "Year") + 
  ggtitle('Fig 3. Payments by Day of the Week 2014 vs 2015') + 
  theme(legend.position='bottom')

Which departments spent the most money?

df = data %>%  
  group_by(year,DEPTNAME) %>%
  summarise(payments = sum(glamount)) %>%
  arrange(desc(payments)) %>%
  top_n(10)
p = ggplot(df,aes(x=reorder(DEPTNAME,payments),y=payments,fill=factor(year)))  
p + geom_bar(stat='identity',position='dodge') +  
  labs(title='Fig 4. Spending By Department - Top 10',x='') + 
  scale_y_continuous(name="", 
                     labels = dollar) + 
  scale_fill_discrete(name = "Year") + 
  coord_flip()

Where was L&P spending all of that money?

df = data %>%  
  filter(DEPTNAME == 'L&P Operations Service Unit') %>%
  group_by(year,glvendor) %>%
  summarise(payments = sum(glamount)) %>%
  arrange(desc(payments)) %>%
  top_n(10)
p = ggplot(df,aes(x=reorder(glvendor,payments),y=payments,fill=factor(year)))  
p + geom_bar(stat='identity',position='dodge') +  
  labs(title='Fig 5. Spending By Vendor - Top 10',x='') + 
  scale_y_continuous(name="", 
                     labels = dollar) + 
  scale_fill_discrete(name = "Year") + coord_flip()

df = data %>%  
  filter(DEPTNAME == 'L&P Operations Service Unit') %>%
  filter(glvendor != 'platte river power authority (') %>%
  group_by(year,glvendor) %>%
  summarise(payments = sum(glamount)) %>%
  arrange(desc(payments)) %>%
  top_n(10)
p = ggplot(df,aes(x=reorder(glvendor,payments),y=payments,fill=factor(year)))  
p + geom_bar(stat='identity',position='dodge') +  
  labs(title='Fig 6. Spending By Vendor - Top 10',x='') + 
  scale_y_continuous(name="", 
                     labels = dollar) + 
  scale_fill_discrete(name = "Year") + coord_flip()

Which vendors received the highest average payments?

vendorSummary = data %>% group_by(year,glvendor) %>%  
  summarize(
    payments = sum(glamount), 
    count = length(glvendor),
    averagePayment = 
      round(sum(glamount)/ length(glvendor),2)) %>%
  arrange(desc(averagePayment))

vendorTop10 = top_n(vendorSummary, 10, payments)

p = ggplot(vendorTop10,  
           aes(x=reorder(glvendor,averagePayment),
               y=averagePayment,fill=factor(year)))
p + geom_bar(stat='identity',position='dodge') +  
  coord_flip() + 
  labs(title='Fig 7 Average Payments to Vendor - Top 10',x='') + 
  scale_y_continuous(name="", 
                     labels = dollar) + 
  scale_fill_discrete(name = "Year") + coord_flip()

After diving into a few different departments and finding relatively unremarkable spending patterns I stumbled across ‘Patrol’ and it piqued my interest.

What vendors had the highest payment totals in the ‘Patrol’ department?

df = data %>%  
  filter(DEPTNAME == 'Patrol') %>%
  group_by(glvendor) %>%
  summarise(payments = sum(glamount)) %>%
  arrange(desc(payments)) %>%
  top_n(20)

p = ggplot(df,aes(x=reorder(glvendor,payments),y=payments))  
p + geom_bar(stat='identity') +  
  labs(title='Fig 8. Payments to Vendor - Top 20',x='') + 
  scale_y_continuous(name="", 
                     labels = dollar) + 
  coord_flip()

Conclusion

Code used in this post is on my GitHub

The fetchAndCleanData() function is below. I had some difficulties with binding the rows upon reading in JSON. Chip Oglesby helped me out, thanks Chip for introducing me to bind_rows!

library(rvest)  
library(jsonlite)  
library(gdata)  
library(dplyr)

fetchAndCleanData = function(){  
  landingPage <- read_html("http://www.fcgov.com/opendata/")

  links <- landingPage %>%
    html_nodes('a') %>%
    html_attr('href')

  glServiceArea <- fromJSON("https://s3.amazonaws.com/fortcollins-opendata/glservicearea.json")
  glFund <- fromJSON("https://s3.amazonaws.com/fortcollins-opendata/glfund.json")
  glDepartment <- fromJSON("https://s3.amazonaws.com/fortcollins-opendata/gldepartment.json")
  glLedgerLinks <- links[grepl("https://s3.amazonaws.com/fortcollins-opendata/generalledger-",links) == TRUE]

  data <- fromJSON(glLedgerLinks[1])

  for(i in 2:length(glLedgerLinks)){
    data <- bind_rows(data, fromJSON(glLedgerLinks[i]))
  }

  # Lowercase column names
  names(data) <- tolower(names(data))

  # Lowercase vendor info
  data$glvendor <- tolower(data$glvendor)
  data$glexplanation <- tolower(data$glexplanation)

  # Clean up data types
  data$glfund <- as.factor(data$glfund)
  data$glamount <- as.numeric(data$glamount)
  data$gldeptno <- as.factor(data$gldeptno)
  data$glserviceareaid <- as.factor(data$glserviceareaid)
  data$glpo <- as.factor(data$glpo)
  data$gldate <- as.Date(data$gldate)

  # Columns are null and can be dropped

  data$glvisadesc <- NULL
  data$glvisacardholder <- NULL

  library(gdata)
  data = merge(data,glServiceArea,by.x='glserviceareaid',by.y='SERVICEAREAID')
  data$SERVICEAREANAME = trim(data$SERVICEAREANAME,'right')
  data = merge(data,glFund,by.x='glfund',by.y='GLFUND')
  data$GLFUND = trim(data$GLFUND,'right')
  data = merge(data,glDepartment,by.x='gldeptno',by.y='DEPTNO')
  data$DEPTNAME = trim(data$DEPTNAME,'right')

  return(data)
}

To leave a comment for the author, please follow the link and comment on their blog: R-Projects – Stoltzmaniac.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.