# When to fly to get there on time? Six million flights analyzed.

November 6, 2014

(This article was first published on Decision Science News » R, and kindly contributed to R-bloggers)

EVERY U.S. FLIGHT IN 2013 ANALYZED


If you read Decision Science News, you are probably interested in decision making, you probably fly a lot, and you probably like making decisions about flying.

Big data of the type the U.S. Government provides enable us to predict how delayed we will be when we fly at various hours of the day.

To make the plot above, we analyzed every single flight in the United States in 2013 for which there were Bureau of Transportation Statistics (BTS) data. Filtering out flights between midnight and 6AM leaves us with a little over six million flights (6,283,085 flights, to be precise). The BTS defines delay as the difference between the time the plane actually arrived and the time listed in the computerized reservation system. Many flights got in early, but because we’re just interested in delays (not speedups), we replaced negative delays with zeroes.
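The processing described above can be sketched in a few lines. This is a minimal Python illustration, not our R code: the record layout (scheduled hour, delay in minutes), the function name `mean_delay_by_hour`, and the sample numbers are all assumptions made up for the example.

```python
from collections import defaultdict

def mean_delay_by_hour(flights, first_hour=6):
    """Average clamped delay per scheduled hour of the day.

    flights: iterable of (hour, delay_minutes) pairs, a hypothetical
    layout; a negative delay_minutes means the flight was early.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, delay in flights:
        if hour < first_hour:          # drop flights between midnight and 6AM
            continue
        totals[hour] += max(delay, 0)  # early flights count as zero delay
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}

# Invented sample records, for illustration only.
sample = [(3, 40), (7, -12), (7, 6), (18, 45), (18, -5), (18, 20)]
hourly = mean_delay_by_hour(sample)
print(hourly)
```

Clamping before averaging matters: without it, early flights would cancel out late ones and the hourly averages would understate how long delayed passengers actually wait.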

What do we learn?

The later you leave, the greater the average delay you will face, until around 6PM, when things flatten out, and 10PM, when we start to see benefits in leaving later. It makes sense that delays increase as the day goes on because, as we understand it, the primary cause of delays is waiting for the plane to arrive from another city. The first flights out in the morning don’t have this problem.

About 60% of flights had no delay at all (3,726,061/6,283,085 or 59.3% to be precise). This has something to do with padding the expected arrival times in the computerized reservation system. Hence all the “negative” delays.

Leaving at 11PM gives you the same delay as leaving at 11AM. Miracle of miracles. Want a rule of thumb? Try not to leave between 11AM and 11PM.

The arrival and departure curves are quite similar. To save space, we’ll only look at departure delays from here on.

Now, you may be thinking “20 minutes delay if you depart at the worst possible time? That’s not such a big deal.” But remember, these are averages and 60% of the time there will be zero delay. To show you how bad things can get, here we plot the 95th and 75th percentiles of the delay distribution:

If you leave at the worst time of day, 1 time in 4 you’ll be delayed more than 20 minutes, and 1 time in 20 you’ll be delayed more than an hour and a half!
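To see why an average hides the bad cases, here is a small Python sketch of the percentile idea. It uses the nearest-rank definition of a percentile, which is an assumption for the example (R's `quantile` interpolates by default), and the twenty delays are invented so that 60% of them are zero.

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is at or below it (p is an integer 1..100)."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p*n/100
    return ordered[max(rank, 1) - 1]

# Invented delays (minutes) for one departure hour; 12 of 20 are zero.
delays = [0] * 12 + [5, 10, 15, 25, 30, 45, 60, 100]

average = sum(delays) / len(delays)
p75 = percentile(delays, 75)
p95 = percentile(delays, 95)
print(average, p75, p95)
```

With most values at zero, the average sits close to the 75th percentile while the 95th percentile is several times larger, which is the same shape the percentile plot shows.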

Do different airports have differing delay patterns? One might expect them to, due to weather, total number of flights, longitude and the like. We isolate four of the nation’s largest airports below:

Chicago O’Hare (ORD) and Dallas/Fort Worth (DFW) are big connecting airports in the center of the country, both of which suffer from serious delays. New York’s JFK starts out the best and ends up the worst. LAX is the winner, especially in the evening.

In an early analysis, we thought we’d discovered something pretty cool about day-of-week effects. We had chosen two months at random and noticed certain days were predictably worse than others. But then, when we looked at two different months, different days emerged as the worst ones. Digging deeper, we found that the day-of-week effects are attributable mostly to rather random events which change from month to month. Here we look at median (not mean) delays on every day of 2013. Each panel represents one month.

The big spike on April 18, 2013? Five inches of rain in Chicago. December 9th, 2013? Delays are mostly due to winter weather in Texas. These little bumps can really alter the day-of-week findings.
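A toy Python example shows how a single bad day can manufacture a day-of-week effect. Everything here is assumed for illustration: the `(date, delay)` record layout, the helper names, and the delay values, which are invented rather than taken from the BTS data. Only the date of the spike, April 18, 2013 (a Thursday), comes from the text above.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def daily_medians(flights):
    """Median clamped delay for each calendar day."""
    by_day = defaultdict(list)
    for day, delay in flights:
        by_day[day].append(max(delay, 0))
    return {day: median(values) for day, values in sorted(by_day.items())}

def weekday_means(medians):
    """Average the daily medians within each day of the week."""
    by_weekday = defaultdict(list)
    for day, value in medians.items():
        by_weekday[WEEKDAYS[day.weekday()]].append(value)
    return {name: mean(values) for name, values in by_weekday.items()}

# Invented delays: ordinary days versus one storm day.
ordinary, storm = [0, 2, 10], [30, 60, 90]
sample = []
for d in (3, 4, 10, 11, 17, 18, 24, 25):  # Wednesdays and Thursdays, April 2013
    delays = storm if d == 18 else ordinary
    sample += [(date(2013, 4, d), x) for x in delays]

medians = daily_medians(sample)
with_storm = weekday_means(medians)
without_storm = weekday_means(
    {d: m for d, m in medians.items() if d != date(2013, 4, 18)}
)
print(with_storm, without_storm)
```

In this toy month, Thursdays look far worse than Wednesdays only because of the one storm day; drop it and the two weekdays are identical.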

Bon voyage!

R-code, as usual, for those who want it. To get the flight data, just go to … aw heck, I’ll be nice and let you download my cleaned-up copy (25 MB).

This is our first use of Hadley Wickham’s tidyr package. We like it!

