Dallas Animal Services: Shelter Intake Types vs. Outcomes Analysis

[This article was first published on novyden, and kindly contributed to R-bloggers.]
Thanks to Dallas OpenData, anyone can access the city's animal shelter records. If you have lost or found a pet, it may well have spent some time in a shelter; I have personally taken lost dogs there. Unfortunately, every year tens of thousands of animals find their way to shelters, and a significant fraction never find their way out.

The City of Dallas animal shelter dataset contains 5 types of animals, with dogs holding a solid lead:


Admissions by Animal Types

For consistency and plausibility of the analysis, we will focus on dog records only.

More precisely, each shelter record describes an animal admitted to a shelter with a certain intake type and later discharged with a certain outcome. The top 3 reasons dogs turn up at shelters are Confiscated (abused, no owner, etc.), Owner Surrender (willingly brought in by the owner), and Stray (lost or abandoned).


Dogs Admitted by Intake Types

Dogs leave shelters (either alive or dead) for 4 main reasons (outcomes): Adoption (good), Euthanized (bad), Returned to Owner (good), and Transfer (neutral):



So what is the relationship between the top intake types and outcomes? Which intake types drive outcomes, and to what extent? The good news is that a causal reading is plausible, because each stay begins with an intake type and ends with an outcome.

Let’s begin with a higher-level (in this case) but visually appealing visualization called a Sankey diagram (or just Sankey). It is a specific type of flow diagram in which the width of each arrow is proportional to the flow quantity:




Each dog's shelter stay contributes to the size of one of the pipes flowing from left (an intake type) to right (an outcome). With this we have essentially visualized the conditional probabilities of a dog leaving the shelter with a certain outcome, given its admission with a known intake type.
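To make the link between pipe widths and conditional probabilities concrete, here is a minimal sketch in Python (not the author's R code). The `stays` rows are made up for illustration and are not Dallas data, and the function name `conditional_outcome_probs` is hypothetical. Each pipe width is the count of stays for an (intake, outcome) pair; dividing it by the total for that intake type gives P(outcome | intake).

```python
from collections import Counter, defaultdict

# Hypothetical sample of (intake_type, outcome) pairs, one per dog stay.
# These rows are invented for illustration; they are not Dallas records.
stays = [
    ("Stray", "Adoption"),
    ("Stray", "Euthanized"),
    ("Stray", "Returned to Owner"),
    ("Stray", "Adoption"),
    ("Owner Surrender", "Adoption"),
    ("Owner Surrender", "Euthanized"),
    ("Confiscated", "Returned to Owner"),
    ("Confiscated", "Transfer"),
]


def conditional_outcome_probs(stays):
    """Return {intake: {outcome: P(outcome | intake)}} from raw stay pairs."""
    flows = Counter(stays)  # count per (intake, outcome): the pipe width
    intake_totals = Counter(intake for intake, _ in stays)
    probs = defaultdict(dict)
    for (intake, outcome), n in flows.items():
        probs[intake][outcome] = n / intake_totals[intake]
    return dict(probs)


probs = conditional_outcome_probs(stays)
for intake, row in probs.items():
    for outcome, p in sorted(row.items()):
        print(f"P({outcome} | {intake}) = {p:.2f}")
```

The probabilities in each intake row sum to 1, which is the same as saying that all pipes leaving one left-hand node add up to that node's height.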

Next, we go beyond the total aggregates used in the Sankey (the counts of intakes and outcomes above) and compute correlations. To do so, we aggregated the counts by month to obtain trends (time series). We then computed correlations between the monthly trends of dogs brought into and removed from Dallas animal shelters for each pair of top intake types (Confiscated, Owner Surrender, and Stray) and outcomes (Adoption, Euthanized, Returned to Owner, and Transfer), 12 coefficients in total:
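The aggregation and correlation steps can be sketched as follows, again in Python rather than the author's R. Several things here are assumptions: the event lists are invented, intakes are bucketed by intake date and outcomes by outcome date, and the coefficient is a plain Pearson correlation (the post does not say which coefficient was used). The helper names `monthly_counts`, `pearson`, and `correlation_matrix` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

INTAKES = ["Confiscated", "Owner Surrender", "Stray"]
OUTCOMES = ["Adoption", "Euthanized", "Returned to Owner", "Transfer"]


def monthly_counts(events, category, months):
    """Count events of one category per month; events are (ISO date, category)."""
    counts = Counter(date[:7] for date, cat in events if cat == category)
    return [counts[m] for m in months]  # missing months count as 0


def pearson(x, y):
    """Pearson correlation of two equal-length series; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def correlation_matrix(intake_events, outcome_events, months):
    """One coefficient per (intake type, outcome) pair: 3 x 4 = 12 in total."""
    return {
        (i, o): pearson(
            monthly_counts(intake_events, i, months),
            monthly_counts(outcome_events, o, months),
        )
        for i, o in product(INTAKES, OUTCOMES)
    }


# Invented events for illustration only (not Dallas records).
months = ["2015-01", "2015-02", "2015-03"]
intake_events = [
    ("2015-01-05", "Stray"),
    ("2015-02-02", "Stray"), ("2015-02-17", "Stray"),
    ("2015-03-01", "Stray"), ("2015-03-09", "Stray"), ("2015-03-20", "Stray"),
    ("2015-01-11", "Owner Surrender"), ("2015-03-12", "Owner Surrender"),
]
outcome_events = [
    ("2015-01-20", "Adoption"),
    ("2015-02-21", "Adoption"), ("2015-02-25", "Adoption"),
    ("2015-03-15", "Adoption"), ("2015-03-22", "Adoption"), ("2015-03-30", "Adoption"),
    ("2015-01-30", "Euthanized"), ("2015-02-10", "Euthanized"),
]

matrix = correlation_matrix(intake_events, outcome_events, months)
for (intake, outcome), r in matrix.items():
    print(f"{intake:>15} vs {outcome:<17} r = {r:.2f}")
```

Pairs where one series is constant (for example, an intake type with no events in the sample) come out as NaN rather than 0, since a correlation is undefined there.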



In this case strong correlation implies (at least to some extent) a causal effect, because the temporal relationship, consistency, and plausibility criteria are all present (see here and here). A few observations to note:

But can we do better than correlations of these trends? What if, instead of coefficients (which technically are still sophisticated aggregates), we look at the actual monthly trends? The next visual places the actual time series, instead of correlation coefficients, inside the same matrix grid:


Each row corresponds to an intake type and each column to an outcome (just like the correlation matrix before). Now we can see the trends in volume over time (months), so note the following observations (following the matrix order top down):

I will be back with more analysis (survival analysis) and R code for the data processing, analysis, and visualizations of this dataset.

