Creating nice tables using R Markdown

[This article was first published on Chester's R blog » R, and kindly contributed to R-bloggers.]

A number of R packages can turn data frames stored in R into polished, readable tables. In what follows, I’ll walk through three of these options (kable, DT, and xtable) using data on flights departing from Seattle and Portland in 2014. (More information and the source code for the pnwflights14 R package that holds these data are available at

We begin by ensuring the needed packages are installed and then load them into our R session.

# List of packages required for this analysis
pkg <- c("dplyr", "knitr", "devtools", "DT", "xtable")

# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% rownames(installed.packages()))]

# If there are any packages in the list that aren't installed,
# install them (supply a CRAN mirror URL in repos)
if (length(new.pkg)) {
  install.packages(new.pkg, repos = "")
}

# Load the packages into R
invisible(lapply(pkg, library, character.only = TRUE))

# Install Chester's pnwflights14 package (if not already)
# (the GitHub repository name below is an assumption)
if (!require(pnwflights14)) {
  devtools::install_github("ismayc/pnwflights14")
}

# Load the flights dataset
data("flights", package = "pnwflights14")

The dataset raises a lot of interesting questions. Here I will delve further into some of the questions I addressed in two recent workshops I led in the Fall 2015 Data @ Reed Research Skills Workshop Series. (Slides available at

The questions I will answer by creating tables are:

  1. Which destinations had the worst arrival delays (on average) from the two PNW airports?
  2. How does the maximum departure delay vary by month for each of the two airports?
  3. How many flights departed for each airline from each of the airports?

The kable function in the knitr package

To address the first question, we will use the dplyr package written by Hadley Wickham as below. We’ll use the top_n function to isolate the 5 worst mean arrival delays.

worst_arr_delays <- flights %>% group_by(dest) %>%
  summarize(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  arrange(desc(mean_arr_delay)) %>%
  top_n(n = 5, wt = mean_arr_delay)
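To make the last two steps of that pipeline concrete, here is a minimal sketch on a made-up data frame (`toy` and its values are invented for illustration and are not part of pnwflights14). It also shows `slice_max`, which dplyr 1.0.0 and later recommend in place of `top_n`; that substitution is a suggestion on my part, not something the analysis above relies on.

```r
library(dplyr)

# Hypothetical toy data: mean delays for six made-up destinations
toy <- data.frame(
  dest = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  mean_arr_delay = c(3.5, 12.0, -1.2, 8.4, 20.1, 0.0)
)

# Same pattern as above: sort descending, then keep the 3 largest values
top3_old <- toy %>%
  arrange(desc(mean_arr_delay)) %>%
  top_n(n = 3, wt = mean_arr_delay)

# slice_max() keeps the rows with the largest values of the given column
top3_new <- toy %>%
  slice_max(mean_arr_delay, n = 3)
```

Both calls keep the same three destinations; `top_n` only filters, which is why the `arrange` step is needed to control the row order.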

This information is helpful, but you may not know which airport each of these FAA airport codes refers to. The pnwflights14 package also includes the airports data set, which lists the airport names. Here we match the codes to the names using the inner_join function in dplyr.

data("airports", package = "pnwflights14")
joined_worst <- inner_join(worst_arr_delays, airports, by = c("dest" = "faa")) %>%
  select(name, dest, mean_arr_delay) %>%
  rename("Airport Name" = name, "Airport Code" = dest, "Mean Arrival Delay" = mean_arr_delay)

Lastly we output this table cleanly using the kable function.

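| Airport Name              | Airport Code | Mean Arrival Delay |
|:--------------------------|:-------------|-------------------:|
| Cleveland Hopkins Intl    | CLE          |          26.150000 |
| William P Hobby           | HOU          |          10.250000 |
| Metropolitan Oakland Intl | OAK          |          10.067460 |
| San Francisco Intl        | SFO          |           8.864937 |
| Bellingham Intl           | BLI          |           8.673913 |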
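The call that renders a table like the one above is a one-liner. The sketch below uses a small stand-in data frame (`demo`, holding two rows copied from the table) so that it runs without pnwflights14; in the analysis itself the argument would be `joined_worst`.

```r
library(knitr)

# Stand-in for joined_worst: two rows copied from the table above
demo <- data.frame(
  "Airport Name" = c("Cleveland Hopkins Intl", "William P Hobby"),
  "Airport Code" = c("CLE", "HOU"),
  "Mean Arrival Delay" = c(26.15, 10.25),
  check.names = FALSE  # keep the spaces in the column names
)

# kable() returns the formatted table as text; printing it shows the table
tab <- kable(demo)
print(tab)
```

Inside an R Markdown document the result of `kable` is rendered as a formatted table in the knitted output.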

Oddly enough, flights to Cleveland (from PDX and SEA) had the worst arrival delays in 2014. Houston also had around a 10 minute delay on average. Surprisingly, the airport in Bellingham, WA (only around 100 miles north of SEA) had the fifth largest mean arrival delay.

The DT package

In order to answer the second question, we’ll again make use of the various functions in the dplyr package.

dep_delays_by_month <- flights %>% group_by(origin, month) %>%
  summarize(max_delay = max(dep_delay, na.rm = TRUE))

The DT package provides a nice interface for viewing data frames in R. I’ve specified a few extra options here to show all 12 months by default and to automatically set the width. Go ahead and play around with the filter boxes at the top of each column too. (An excellent tutorial on DT is available at

datatable(dep_delays_by_month,
          filter = 'top', options = list(
            pageLength = 12, autoWidth = TRUE
          ))

The created table in HTML is available here.

If you click on the max_delay column header, you should see that the maximum departure delay for PDX was in March and for Seattle was in May.
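Sorting in the browser is convenient, but the same answer can also be pulled out in code. A minimal sketch, using an invented stand-in for `dep_delays_by_month` (the object `toy_delays` and its numbers are made up, not the real delays):

```r
library(dplyr)

# Hypothetical stand-in with the same columns as dep_delays_by_month
toy_delays <- data.frame(
  origin = rep(c("PDX", "SEA"), each = 3),
  month = rep(1:3, times = 2),
  max_delay = c(100, 250, 180, 300, 120, 90)
)

# For each airport, keep the row(s) holding the largest maximum delay
worst_month <- toy_delays %>%
  group_by(origin) %>%
  filter(max_delay == max(max_delay)) %>%
  ungroup()
```

Applied to the real `dep_delays_by_month`, the same three lines would return one row per airport (more if there are ties).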

The xtable package to produce nice tables in a PDF

Again, we find ourselves using the extremely helpful dplyr package, this time to answer the third question and to build the data frame behind our table. We merge the flights data with the airlines data to get the names of the airlines from the two-letter carrier code, and then sort by airport and by descending number of flights.

data("airlines", package = "pnwflights14")
by_airline <- flights %>% group_by(origin, carrier) %>%
  summarize(count = n()) %>%
  inner_join(x = ., y = airlines, by = "carrier") %>%
  arrange(origin, desc(count))

The xtable package and its xtable function (and also the kable function you saw earlier) provide the functionality to generate HTML code or LaTeX code to produce a table. We will focus on producing the LaTeX code in this example.

print(xtable(by_airline),
      comment = FALSE)

\begin{table}[ht]
\centering
\begin{tabular}{rllrl}
  \hline
 & origin & carrier & count & name \\
  \hline
1 & PDX & AS & 12844 & Alaska Airlines Inc. \\
  2 & PDX & WN & 11193 & Southwest Airlines Co. \\
  3 & PDX & OO & 9841 & SkyWest Airlines Inc. \\
  4 & PDX & UA & 6061 & United Air Lines Inc. \\
  5 & PDX & DL & 5168 & Delta Air Lines Inc. \\
  6 & PDX & US & 2361 & US Airways Inc. \\
  7 & PDX & AA & 2187 & American Airlines Inc. \\
  8 & PDX & F9 & 1362 & Frontier Airlines Inc. \\
  9 & PDX & B6 & 1287 & JetBlue Airways \\
  10 & PDX & VX & 666 & Virgin America \\
  11 & PDX & HA & 365 & Hawaiian Airlines Inc. \\
  12 & SEA & AS & 49616 & Alaska Airlines Inc. \\
  13 & SEA & WN & 12162 & Southwest Airlines Co. \\
  14 & SEA & DL & 11548 & Delta Air Lines Inc. \\
  15 & SEA & UA & 10610 & United Air Lines Inc. \\
  16 & SEA & OO & 8869 & SkyWest Airlines Inc. \\
  17 & SEA & AA & 5399 & American Airlines Inc. \\
  18 & SEA & US & 3585 & US Airways Inc. \\
  19 & SEA & VX & 2606 & Virgin America \\
  20 & SEA & B6 & 2253 & JetBlue Airways \\
  21 & SEA & F9 & 1336 & Frontier Airlines Inc. \\
  22 & SEA & HA & 730 & Hawaiian Airlines Inc. \\
   \hline
\end{tabular}
\end{table}

If you don’t know LaTeX, I’ve also duplicated a similar table using kable for you to compare:

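| origin | carrier | count | name                   |
|:-------|:--------|------:|:-----------------------|
| PDX    | AS      | 12844 | Alaska Airlines Inc.   |
| PDX    | WN      | 11193 | Southwest Airlines Co. |
| PDX    | OO      |  9841 | SkyWest Airlines Inc.  |
| PDX    | UA      |  6061 | United Air Lines Inc.  |
| PDX    | DL      |  5168 | Delta Air Lines Inc.   |
| PDX    | US      |  2361 | US Airways Inc.        |
| PDX    | AA      |  2187 | American Airlines Inc. |
| PDX    | F9      |  1362 | Frontier Airlines Inc. |
| PDX    | B6      |  1287 | JetBlue Airways        |
| PDX    | VX      |   666 | Virgin America         |
| PDX    | HA      |   365 | Hawaiian Airlines Inc. |
| SEA    | AS      | 49616 | Alaska Airlines Inc.   |
| SEA    | WN      | 12162 | Southwest Airlines Co. |
| SEA    | DL      | 11548 | Delta Air Lines Inc.   |
| SEA    | UA      | 10610 | United Air Lines Inc.  |
| SEA    | OO      |  8869 | SkyWest Airlines Inc.  |
| SEA    | AA      |  5399 | American Airlines Inc. |
| SEA    | US      |  3585 | US Airways Inc.        |
| SEA    | VX      |  2606 | Virgin America         |
| SEA    | B6      |  2253 | JetBlue Airways        |
| SEA    | F9      |  1336 | Frontier Airlines Inc. |
| SEA    | HA      |   730 | Hawaiian Airlines Inc. |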
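For reference, both flavours of kable output can be sketched on a small stand-in data frame (`demo`, holding two rows copied from the table above; in the analysis the argument would be `by_airline`). The `format` argument is part of knitr's `kable`; everything else here is illustrative.

```r
library(knitr)

# Stand-in for by_airline: two rows copied from the table above
demo <- data.frame(
  origin = c("PDX", "SEA"),
  carrier = c("AS", "AS"),
  count = c(12844L, 49616L),
  name = c("Alaska Airlines Inc.", "Alaska Airlines Inc.")
)

# Default output: a plain-text table suitable for HTML documents
md_tab <- kable(demo)

# LaTeX output: a tabular environment, comparable to what xtable prints
tex_tab <- kable(demo, format = "latex")
print(tex_tab)
```

Which one to use mostly depends on whether the R Markdown file is knitted to HTML or to PDF.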

With the originating airport repeated across all of the airlines, it would be nice to reduce this duplication and show PDX and SEA in bold just once each. Conveniently, the rle function in R helps here: it computes the lengths of runs of consecutive identical values in a vector, so it tells us how many rows each airport spans. We then build a call to the LaTeX \multirow command (from the LaTeX multirow package) by pasting the appropriate text together, and pass force as the sanitize function so that xtable leaves the LaTeX markup untouched.
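As a quick illustration of what rle returns, here is a minimal base R sketch on a made-up vector (the vector `origin` below is invented and much shorter than the real column):

```r
# A made-up vector with two runs of repeated values
origin <- c("PDX", "PDX", "PDX", "SEA", "SEA")

runs <- rle(origin)
runs$values   # the value of each run
runs$lengths  # how many consecutive times each value appears

# TRUE at the first occurrence of each value, FALSE elsewhere
first <- !duplicated(origin)
```

Note that `!duplicated()` marks the first occurrence of each value rather than the start of each run; the two coincide here only because the vector is sorted, as the airline table is.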

We add in a few options to make the output of the table a little nicer by specifying horizontal lines and removing the default rownames.

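# Number of consecutive rows for each airport
rle.lengths <- rle(by_airline$origin)$lengths
# TRUE for the first row of each airport
first <- !duplicated(by_airline$origin)
# Blank out the repeated airport codes
by_airline$origin[!first] <- ""
# Wrap each remaining code in \multirow and \textbf
# (needs \usepackage{multirow} in the LaTeX preamble)
by_airline$origin[first] <- paste0("\\multirow{", rle.lengths, "}{*}{\\textbf{",
                                   by_airline$origin[first], "}}")

# 11 is the number of PDX rows, so a line separates the two airports
print(xtable(by_airline),
      comment = FALSE,
      hline.after=c(-1,0,nrow(by_airline), 11),
      sanitize.text.function = force,
      include.rownames = FALSE)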

The resulting table produced by LaTeX can be found at

We see that Alaska Airlines had the most flights out of both airports with Southwest coming in second at both airports.

(The generating R Markdown file for this HTML document—saved in the .Rmd extension—is available here.)
