# Surviving Shelter: Analysis of Time Spent and Outcome in Dallas Animal Shelters

[This article was first published on **novyden**, and kindly contributed to R-bloggers.]

In the previous post we discovered the Dallas Animal Services data sources (available on Dallas Open Data) and analyzed how animals get admitted to and discharged from the city shelters. We loaded actual shelter records and looked at the types of admittance, the different outcomes, and their relationships. In this post we continue the analysis by focusing on the *time* animals spend and the factors that favor or hinder the *survival* of dogs in the shelters. For consistency and representation, only the admission types **Confiscated**, **Owner Surrender**, and **Stray** and the outcomes **Adoption**, **Died**, **Euthanized**, **Returned to Owner**, and **Transfer** were included. The outcome **Dead on Arrival** was excluded from survival analysis because it preempts outcome conditions before the stay in shelter begins.

### Time Spent in Shelters

Compare the distributions of time spent in shelter for cats and dogs to note both similarities and differences:

Distributions are bimodal with relatively fat tails, but they differ in how the major modes compare to the minor ones. As Wikipedia notes, “a bimodal distribution most commonly arises as a mixture of two different unimodal distributions”, and dissecting the data by admission and outcome types opens the door to further discovery:

Where the former histogram used facets to draw separate plots for cats and dogs, the latter plot switches to dodged bars to pack more information into less space. Some interesting observations:

- **Confiscated** admissions have a distinctly different profile and peaks, presumably attributable to legal obligations to owners;
- **Confiscated** has distinct bimodal distributions when outcomes are either **Returned to Owner** or **Transfer**;
- **Adoption** times are similar for both cats and dogs;
- Most distributions have clear unimodal profiles specific to the types of admission and outcome that vary between dogs and cats in density;
- **Adoption** and, to a lesser degree, **Owner Surrender** distributions are almost indistinguishable between cats and dogs.

Rendering the same data using density curve estimates lets us validate the differences and similarities observed:

The densities demonstrate striking similarity in **Adoption** and the largest differences in **Euthanized** outcome times.

### Sankeys With Average Times

We already used Sankey diagrams to show the flow from admission to discharge by the total number of occurrences of each transition. This time we decided on a novel approach to Sankeys in which thickness reflects the average time spent in shelter. The first diagram is for cats:

And then for dogs:

The thinner the line, the shorter the average stay between the admission and the outcome it connects. And the larger a vertical panel (admission or outcome), the longer an animal spends in shelter after that admission or before that discharge (on average and unweighted).
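The link weights behind such a diagram can be sketched as follows. This is a minimal illustration in plain Python rather than the R notebook's actual code; the record layout, the function names `average_stay_links` and `node_heights`, and the sample values are all assumptions made for the example, not Dallas data.

```python
from collections import defaultdict

# Hypothetical records: (intake type, outcome type, days in shelter).
# The values are made up for illustration only.
records = [
    ("Stray", "Adoption", 12),
    ("Stray", "Adoption", 8),
    ("Stray", "Euthanized", 5),
    ("Owner Surrender", "Adoption", 6),
    ("Owner Surrender", "Euthanized", 2),
    ("Confiscated", "Returned to Owner", 10),
]


def average_stay_links(rows):
    """Mean days in shelter for every (intake, outcome) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of days, count]
    for intake, outcome, days in rows:
        totals[(intake, outcome)][0] += days
        totals[(intake, outcome)][1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


def node_heights(links):
    """Panel height: unweighted sum of the link averages touching a node."""
    intake_height = defaultdict(float)
    outcome_height = defaultdict(float)
    for (intake, outcome), avg in links.items():
        intake_height[intake] += avg
        outcome_height[outcome] += avg
    return dict(intake_height), dict(outcome_height)


links = average_stay_links(records)
intake_heights, outcome_heights = node_heights(links)
for (intake, outcome), avg in sorted(links.items()):
    print(f"{intake} -> {outcome}: {avg:.1f} days")
```

Because each link carries an average rather than a count, a panel's height adds up averages without regard to how many animals took each path, which is what "unweighted" means above.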

### Expected Chance of Not Surviving in Shelter

For the purpose of this analysis any outcome other than **Died** or **Euthanized** means the animal survived to leave the shelter alive (most with outcomes **Adoption**, **Foster**, **Returned to Owner**, or **Transfer**). Remember that we also excluded dogs with intake type **Dead on Arrival** (see introduction).

We begin with rather simple calculations: estimates of the chance of dying in shelter given that an animal satisfies a certain condition. The plot below contains conditional probabilities for dogs (unless cats are specified) of **not** surviving in shelter given a certain factor at the time of admission (intake categories):

Two health conditions stand out with the highest rates: *untreatable* and *unmanageable*, while another health condition, *contagious*, is present in 3 of the top 4 factors.

There is one more factor, *breed*, which has over 200 values for dogs alone. Below we display the chances of dying for the dog breeds with at least 100 recorded admissions:

Note that the probability scale differs between the two plots. Surprisingly, the breed **Chow Chow** took the top spot, with the Pit Bull Terrier breeds **Staffordshire**, **Pit Bull**, **Am Pit Bull Ter**rier, and **American Stafford**shire close behind.

### Survival Analysis

While applying classic survival analysis to animal shelter data presents certain challenges, we apply the approach while ignoring a few details. Any suggestions or comments on how to improve it are welcome. The survival function *S(t)* gives the probability that the subject (a pet admitted to a shelter) survives longer than time *t*. In this case pets survived when discharged with any outcome other than **Died** or **Euthanized**. The time *t* is always in days since the day of admission, and all animal records included in this analysis are for animals that were discharged, so no record has an unknown admission or discharge date. Survival analysis accounts for censored data: those subjects with last known status alive and no later information available. In our case all animal records contain an outcome, and thus all animals discharged alive are treated as censored at the discharge date.

#### Kaplan-Meier Estimator

The Kaplan-Meier (KM) estimate is a non-parametric maximum likelihood estimate of the survival function, *S(t)*, given a univariate categorical factor. It measures the fraction of animals living for a certain number of days *t* after admission and produces a declining step function (the KM curve) that approximates the real survival function from data. Applying this technique to various categories of animals, we compare their survival curves across multiple factor values. KM curves estimate and visualize survival chances over time just as survival functions do: given time *t*, what is the probability that the subject survives at least to that time or longer?

#### Cats vs. Dogs KM Curves

The survival curve plot (top) is augmented with a bar chart of totals by category and survival outcome (bottom) to give a better understanding of the underlying data. Survival chances for cats are never better than those of dogs, and overall cats fare much worse than dogs (see the bar chart). Zooming in on the most critical first days after admission reveals more differences:

The day of admission is the worst for both, but cats fare twice as badly, with 25% lost right away. Days 4 and 5 are critical for dogs, as their survival plummets on these days. After that, survival rates stabilize and follow a similar pattern.
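For readers who want to see the mechanics behind curves like these, here is a minimal Kaplan-Meier sketch in plain Python. It is not the code from the R notebook: the function name `kaplan_meier`, the sample durations, and the sample outcomes are hypothetical, and the handling of ties (deaths on a given day are counted before animals discharged alive that day leave the risk set) is the usual convention, assumed here.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] for each day t with at least one death.

    durations: days from admission to discharge
    events:    True if the outcome was Died or Euthanized,
               False if the animal left alive (censored at discharge)
    """
    deaths = Counter(t for t, died in zip(durations, events) if died)
    leaving = Counter(durations)  # all discharges on day t, dead or alive
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk animals surviving day t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # everyone discharged on day t leaves the risk set
    return curve


# Hypothetical sample, not Dallas data.
days = [0, 2, 2, 4, 5, 5, 8, 10]
outcomes = ["Euthanized", "Adoption", "Died", "Euthanized",
            "Transfer", "Euthanized", "Adoption", "Returned to Owner"]
died = [o in ("Died", "Euthanized") for o in outcomes]

km_curve = kaplan_meier(days, died)
for t, s in km_curve:
    print(f"day {t}: S(t) = {s:.3f}")
```

Running the function once per group (for example, once for cats and once for dogs) gives the separate step curves that are then compared on one plot.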

#### KM Curves by Dog Intake Types

To make further analysis more plausible we include only dog records from this point on. We also exclude pets admitted as **Dead on Arrival** or **Euthanasia Requested** since their outcomes are obvious and immediate.

Confiscated dogs' survival chances are the best in the first 10 days or so, but then they quickly deteriorate, crossing and diving below the 2 other types after 2 weeks. The worst chances, as expected, belong to dogs surrendered by their owners. After 2 weeks all 3 curves cross and become less distinguishable.

#### KM Curves by Dog Origins

Dallas Animal Services also maintains an origin field, assigned at admission, with the 3 most prevalent values being **Field**, **Over the Counter**, and **Sweep**. This is how survival curves differ depending on dog origin:

Again, significant shifts in survival chances happen after 5 days and then after 2-3 weeks, when the fortunes of different origins turn around: after 5 days **Over the Counter** goes from the worst to the 2nd worst (or 2nd best), and then after 3 weeks to the best. Both **Field** and **Sweep** drop after 5 days. In absolute numbers (shown in the bar plots) **Field** dogs survive the worst.

#### Health Conditions at Admission

Unhealthy animals have little chance of surviving shelters, as is evident from the following:

It is no surprise that the survival of unhealthy animals is significantly below that of healthy ones. Also, the vast majority of dogs admitted are in unhealthy condition, which is both unsurprising and unfortunate.

There is more information about unhealthy dogs available from shelter records: treatable vs. untreatable and contagious vs. non-contagious. Unfortunately, these values reside inside a single field, so the survival curves include combinations of the health factors:

It clearly shows how each health factor reduces survival chances: from **Healthy** to **Treatable Rehabilitable** to **Treatable Manageable** to **Unhealthy Untreatable** to, finally, **Unhealthy Untreatable Contagious**.

If we extract and analyze each health factor (ignoring the rest) then these relationships become more apparent:

#### Survival of Dogs with Chips

As of June 17, 2017, all dogs and cats four months and older in the city of Dallas must be microchipped. This relatively new regulation will likely change both the share of chipped dogs in Dallas and the survival curves observed below for 2015 through October 2017:

Still, having a dog microchipped will almost certainly keep survival chances higher.

#### Dog Breeds

Dallas shelters admitted dogs of over 200 different breeds from 2015 through 2017. Among them 56 breeds appeared 100 times or more (over 95% of all admissions):

The top 4 breeds (**Pit Bull**, **Labrador Retriever**, **Chihuahua**, and **German Shepherd**) account for almost 60% of all admissions, with the next breed, **Cairn Terrier**, dropping to just under 3%. The survival curves for these 5 breeds cover almost 2/3 of all dogs admitted to Dallas shelters:

**Pit Bull**s suffer the worst survival rate of the 5 most admitted breeds: it drops below 50% after just over a week at the shelter. **Labrador** and **German Shepherd** reach 50% some time into the 3-week period. Smaller breeds fare much better, as is evident from the **Chihuahua** and **Cairn Terrier** curves.

It turns out there are more breeds closely related to Pit Bull: **American Staff**, **Am Pit Bull Ter**, and **Staffordshire**:

Three of the four breeds in the group follow a similar pattern that differs sharply from the 4th, **American Staff**ordshire, for reason(s) beyond this analysis.
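Statements such as "drops below 50% after just over a week" can be read off a KM step curve programmatically. The sketch below is a plain-Python illustration, not taken from the R notebook; the helper `days_until_below`, the breed labels, and the curve values are hypothetical.

```python
def days_until_below(curve, threshold=0.5):
    """First day on which a step curve [(day, S(day)), ...] falls below
    `threshold`, or None if it never does within the observed window."""
    for day, survival in curve:
        if survival < threshold:
            return day
    return None


# Hypothetical KM step curves; these are not the Dallas estimates.
curves = {
    "Breed A": [(0, 0.90), (4, 0.70), (8, 0.48), (15, 0.40)],
    "Breed B": [(0, 0.95), (5, 0.80), (12, 0.65), (20, 0.55)],
}

crossing_days = {breed: days_until_below(c) for breed, c in curves.items()}
for breed, day in crossing_days.items():
    print(breed, day)
```

A result of `None` means the curve stays at or above the threshold for the whole observed period, which is how a better-surviving breed would show up in such a comparison.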

### Next

In the next and final post on Dallas animal shelters we will apply the Cox proportional hazards model, a semi-parametric statistical method, to assess simultaneously the effect of several factors on survival time and outcome.

### Resources

The R notebook (source code) with the data pipeline and visualizations can be found here, with a knitted version on RPubs.