Exploring Aviation Accidents from 1908 through the Present

January 9, 2018

(This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers)

If you can walk away from a landing, it’s a good landing. If you use the airplane the next day, it’s an outstanding landing.
-Chuck Yeager

Ever since the Wright brothers flew the first airplane in 1903, aviation accidents have been an inevitable tragedy as well as a topic of wide interest. In this project, I used a dataset covering the full history of airplane crashes around the world, from 1908 to the present, to analyze aviation accidents by time, aircraft type, airline, and accident summary.



The original data I used contain over 5,000 rows and come from Open Data by Socrata. Each row represents a single accident, and the data consist of the following features:
Date: date the accident happened
Time: time the accident happened
Location: where the accident happened
Operator: operator the crashed aircraft belonged to
Type: type of the crashed aircraft
Aboard: number of people aboard the crashed aircraft
Fatalities: number of people aboard the aircraft who died
Ground: number of people killed who were not aboard the aircraft
Summary: brief accident description
Note that I removed all rows belonging to military operators in order to focus on commercial aircraft.
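The filtering step can be sketched as follows. This is only an illustration in Python, not the author's code: the sample rows are made up, and it assumes that military operators can be recognized by the word "Military" appearing in the Operator field, which the post does not state.

```python
import csv
import io

# Made-up rows in the column layout listed above (not real accident records).
SAMPLE_CSV = """Date,Time,Location,Operator,Type,Aboard,Fatalities,Ground,Summary
01/02/1950,14:30,Exampleville,Example Airways,Example 100,20,5,0,Crashed on landing.
03/04/1960,02:15,Sampletown,Military - Example Air Force,Example 200,6,6,0,Crashed into a mountain.
05/06/1970,,Testburg,Private,Example 300,2,1,1,Engine failure after takeoff.
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per accident."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_military(rows):
    """Keep rows whose Operator does not mention 'military'.

    Assumption: military operators are labeled with the word 'Military'.
    """
    return [r for r in rows if "military" not in r["Operator"].lower()]


rows = load_rows(SAMPLE_CSV)
civil_rows = drop_military(rows)
print(len(rows), len(civil_rows))
```

With the three sample rows, one military row is dropped and two rows remain.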



The first two plots show the air crash count for each year and the fatality count and death ratio for each decade. We can see that after 1970, both the total number of accidents and the number of people who died in them decreased. Because annual flight counts are not available, we can’t simply say that the accident rate is decreasing. However, the death rate went down from over 90% to around 65% over the course of history, which means that passengers involved in an air crash are more likely to survive than before.
The third plot shows the air crash count and death ratio by time of day. Crash counts are distinguished by day and night. Again, due to data limitations, we can’t conclude at which time of day the accident rate is higher, but the death rate is higher at night than during the day (4 of the 5 time periods with the highest death rate fall between 12am and 5am).
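A grouped death ratio of this kind could be computed roughly as below. The sketch assumes the ratio is total fatalities divided by total people aboard within each group, and that "day" means 06:00 to 17:59; neither definition is given in the post, and the records are made up.

```python
from collections import defaultdict

# Made-up records: (year, hour of day or None, aboard, fatalities).
records = [
    (1952, 3, 10, 9),
    (1958, 14, 20, 15),
    (1994, 23, 50, 30),
    (1996, 10, 100, 40),
    (1999, None, 30, 30),
]


def death_ratio_by(records, key):
    """Group records by key(record) and return {group: fatalities / aboard}."""
    aboard = defaultdict(int)
    dead = defaultdict(int)
    for rec in records:
        group = key(rec)
        if group is None:  # skip records that cannot be assigned to a group
            continue
        aboard[group] += rec[2]
        dead[group] += rec[3]
    return {g: dead[g] / aboard[g] for g in aboard if aboard[g] > 0}


def decade(rec):
    """Map a record to the first year of its decade, e.g. 1958 -> 1950."""
    return rec[0] // 10 * 10


def day_or_night(rec):
    """Assumed split: 06:00-17:59 is day, everything else is night."""
    hour = rec[1]
    if hour is None:
        return None
    return "day" if 6 <= hour < 18 else "night"


by_decade = death_ratio_by(records, decade)
by_period = death_ratio_by(records, day_or_night)
print(by_decade)
print(by_period)
```

Records with a missing time are left out of the day/night comparison but still count toward the decade totals.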


Aircraft Type

Next, I analyzed the type of crashed aircraft. First, I wanted to see whether aircraft size affects the death rate. As the first plot shows, the death rate generally decreases as aircraft size increases. This suggests that passengers in larger aircraft have a higher survival rate than those in smaller aircraft.
The next two plots show the air crash count by aircraft model and by aircraft manufacturer. Over the whole time range (the year range can be changed by dragging the slider in the Shiny app), the Douglas DC-3 has the most crashes, far more than the second-place DHC-6. If we consider only accidents after 1964, however, the DHC-6 overtakes the DC-3 as the model with the most accidents, and after 2000 the Cessna 208-B takes the lead.
The second plot makes it easy to compare the share of crashed aircraft from different manufacturers across different periods. The Shiny app also lets us select specific manufacturers to compare; in the subplot, for example, we selected Boeing and Airbus. The plot shows that Boeing aircraft were involved in more accidents than Airbus aircraft over the whole history, but that alone is too crude a basis for concluding which is safer. We would still need more data, such as the total number of aircraft in service per year, the total number of passengers carried, etc.
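The effect of changing the year range on the model ranking can be illustrated with a small sketch. It is written in Python rather than taken from the Shiny app, and the model names and years are placeholders, not values from the dataset.

```python
from collections import Counter

# Made-up (year, aircraft model) pairs.
crashes = [
    (1950, "Model A"), (1955, "Model A"), (1960, "Model A"),
    (1970, "Model B"), (1980, "Model B"), (1975, "Model A"),
    (2005, "Model C"), (1990, "Model B"),
]


def top_models(crashes, start_year, end_year, n=3):
    """Rank aircraft models by crash count within [start_year, end_year]."""
    counts = Counter(
        model for year, model in crashes if start_year <= year <= end_year
    )
    return counts.most_common(n)


# Whole range versus a later window: the leading model changes.
print(top_models(crashes, 1908, 2018))
print(top_models(crashes, 1965, 2018))
```

Restricting the window changes which model leads, which is the same kind of shift described above for the DC-3 and the DHC-6.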




The plots show the air crash count and death rate by airline. Over the whole history, Aeroflot has the most crashes (179 accidents in total), far more than the second-place Air France (67 accidents in total). For accidents after 1991, however, private aircraft overtake Aeroflot as the category with the most accidents.


Accident Summary

Lastly, I put all the accident summaries together and generated a word cloud. Not surprisingly, Crashed is the most frequent word in the summaries. Landing is more frequent than Takeoff, which suggests that more crashes happened during landing than during takeoff. In addition, words such as Mountain, Ground, Runway, Engine, Fuel, and Fire give a rough indication of where the accidents happened and what caused them.
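The word frequencies behind a word cloud like this can be sketched with a simple counter. The summaries and the stop-word list below are made up for illustration, and the tokenization (lowercase alphabetic runs) is an assumption, not the author's method.

```python
import re
from collections import Counter

# Made-up accident summaries.
summaries = [
    "Crashed into a mountain while landing.",
    "Engine fire after takeoff; crashed near the runway.",
    "Crashed short of the runway during landing in fog.",
]

# A tiny, assumed stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "the", "of", "in", "into", "while",
              "after", "near", "during", "short"}


def word_counts(texts):
    """Count lowercase alphabetic words across texts, skipping stop words."""
    words = []
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words)


counts = word_counts(summaries)
print(counts.most_common(3))
```

Comparing `counts["landing"]` with `counts["takeoff"]` gives the kind of landing-versus-takeoff comparison described above.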


Thanks for reading! Please feel free to browse my Shiny app and GitHub via the following links.

Link to Shiny App

Link to GitHub

