Almost six months ago (!) I wrote a blog post about the NEISS data set, a sample of accidents reported to emergency rooms in the U.S. that are related to consumer products. Ever since I did that exploration, I have been wanting to ask a bit of a different question from that sample of accidents. How do the accidents that people suffer depend on their demographic characteristics? We can get a bit of a sense of that from looking at the plot with age on the x-axis (or exploring Hadley Wickham’s NEISS Shiny app) but the NEISS data set includes quite a bit more demographic information to interact with.
Before we get started, it is probably good to be reminded that this data set doesn’t necessarily include everything you might think it does. After I published that first post, Henrik Bengtsson asked about hang gliding injuries reported in this data set. There appeared to be none, and I was befuddled until Alison Hill pointed out that the NEISS coding manual says that they don’t include such injuries.
First, let’s get the NEISS data. It’s a pretty big data set so this can take a while.
Now let’s open up the main data set and see what is there.
Each row is a case, i.e. injury. The consumer product(s) implicated in the injury are in prod1 and prod2 as numbers, which can be looked up in another data set, products. Let’s join these data frames together so we have the product names rather than codes.
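A minimal sketch of that join, assuming the product lookup table has columns named `code` and `title`. The two small data frames below are toy stand-ins with invented rows, not the real NEISS tables, and the output column names `product1` and `product2` are my own choice for illustration.

```r
library(dplyr)

# Toy stand-ins for the NEISS tables; rows are invented for illustration.
injuries <- tibble(
  case_num = 1:4,
  age      = c(34, 7, 36, 52),
  prod1    = c(1842, 1233, 464, 1842),
  prod2    = c(NA, NA, 1842, NA)
)
products <- tibble(
  code  = c(464, 1233, 1842),
  title = c("knives, not elsewhere classified", "trampolines", "stairs or steps")
)

# Look up each product code in turn, renaming `title` after each join
# so the two lookups do not collide.
injuries <- injuries %>%
  left_join(products, by = c("prod1" = "code")) %>%
  rename(product1 = title) %>%
  left_join(products, by = c("prod2" = "code")) %>%
  rename(product2 = title)
```

A left join keeps every injury, so cases with no second product simply get `NA` in `product2`.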
What Should I Worry About?
I am a white woman in my (ever later) thirties, so let’s find what the most common injuries are for someone with my demographic characteristics. This is just some basic dplyr.
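Something along these lines would do it. This is a sketch on invented toy rows; the column names (`sex`, `race`, `age`, `product1`) and the category labels `"Female"` and `"White"` are assumptions about how the data is coded, and it counts raw rows only.

```r
library(dplyr)

# Toy data with invented rows; column names and labels are assumptions.
injuries <- tibble(
  age      = c(34, 36, 31, 38, 7, 35, 33),
  sex      = c("Female", "Female", "Female", "Female", "Male", "Male", "Female"),
  race     = c("White", "White", "White", "White", "White", "White", "Black"),
  product1 = c("stairs or steps", "knives, not elsewhere classified",
               "stairs or steps", "floors or flooring materials",
               "trampolines", "stairs or steps", "stairs or steps")
)

# Keep white women in their thirties, then count injuries per product,
# most common first.
common <- injuries %>%
  filter(sex == "Female", race == "White", age >= 30, age < 40) %>%
  count(product1, sort = TRUE)
```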
Let’s make a visualization for this.
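One way to draw it is a horizontal bar chart with ggplot2. The counts below are invented placeholders rather than results from the data, and the column names `product1` and `n` are assumed for illustration.

```r
library(ggplot2)

# Invented placeholder counts, not real NEISS results.
common <- data.frame(
  product1 = c("stairs or steps", "floors or flooring materials",
               "knives, not elsewhere classified"),
  n        = c(120, 95, 80)
)

# Order the bars by count and flip the axes so the long product
# names are readable.
p <- ggplot(common, aes(x = reorder(product1, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Number of injuries")
```

Printing `p` renders the chart.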
Looks like I should really be careful on our basement stairs. (ALSO, KNIVES!!!) There’s still a fair showing for exercise and sports injuries for white women in their (our?) thirties but a lot of this looks very domestic. “Containers, not specified”?! Not sure on that one.
So that means boxes mainly, apparently.
What Should YOU Worry About?
Those are the most common injuries for my demographic, but what about everyone else? I have made a Shiny app where you can explore the NEISS data and see how the most common injuries change with age, sex, and race/ethnicity. Check out the app itself, and the code to make the app on GitHub.
Race/ethnicity and also sex/gender can be fraught categories for people whose identities are not easily categorizable; I have chosen to just use these demographics as reported. It appears that an age is reported for every injury in the data set (all 2.3 million of them), but there is missing information for sex and race/ethnicity.
In the Shiny app, injuries for which these quantities are not reported appear as “None listed”.
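A quick way to check how much is missing is to tally it per column. This sketch uses invented toy rows and assumes that unreported sex and race are stored as the string `"None listed"`, which may not match the actual encoding in the data.

```r
library(dplyr)

# Toy data with invented rows; the "None listed" encoding is an assumption.
injuries <- tibble(
  age  = c(34, 7, 61, 19, 45),
  sex  = c("Female", "Male", "None listed", "Female", "Male"),
  race = c("White", "None listed", "None listed", "Black", "White")
)

# Count the cases and how many lack each demographic field.
missing_summary <- injuries %>%
  summarise(
    n_cases      = n(),
    missing_age  = sum(is.na(age)),
    missing_sex  = sum(sex == "None listed"),
    missing_race = sum(race == "None listed")
  )
```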
The distribution of common injuries changes quite a lot with various demographic indicators. Check out, for example, the shape of the distribution for children of a given sex/race compared to basically any decade of adulthood for the same sex/race. There are also some relative differences by sex and race; compare black and white teenage girls, or male and female children of a given race. The R Markdown file used to make this blog post is available here. I am very happy to hear feedback or questions!