ggplot2: Histogram

January 18, 2018

(This article was first published on Rsquared Academy Blog, and kindly contributed to R-bloggers)

Introduction

This is the eleventh post in the series Elegant Data Visualization with
ggplot2. In the previous post, we learnt to build box plots. In this post,
we will learn to:

  • build a histogram
  • specify bins
  • modify
    • color
    • fill
    • alpha
    • bin width
    • line type
    • line size
  • map aesthetics to variables

A histogram is a plot that can be used to examine the shape and spread of continuous data. It looks very similar to a bar graph and can be used to detect outliers and skewness in data. The histogram graphically shows the following:

  • center (location) of the data
  • spread (dispersion) of the data
  • skewness
  • outliers
  • presence of multiple modes

To construct a histogram, the data are split into intervals called bins. The intervals may or may not be of equal size. For each bin, the number of data points that fall into it is counted (the frequency). The Y axis of the histogram represents the frequency and the X axis represents the variable.
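
The counting step described above can be sketched in base R. This is only an illustration: the vector x and the break points are made-up values, not taken from the data used in this post.

```r
# Made-up sample values and bin edges (assumptions for illustration only)
x <- c(1, 2, 2, 3, 5, 6, 6, 7, 9, 10)
breaks <- c(0, 2, 4, 6, 8, 10)

# cut() assigns each value to an interval; by default the intervals are
# right-closed: (0,2], (2,4], (4,6], (6,8], (8,10]
bin <- cut(x, breaks = breaks)

# table() counts the data points in each bin (the frequency)
counts <- table(bin)
counts
```

Each count is the height of the corresponding bar, and the bin edges mark where the bar starts and ends on the X axis.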

Libraries, Code & Data

We will use the following libraries in this post: readr and ggplot2.

All the data sets used in this post can be found here
and code can be downloaded from here.

Data

ecom <- readr::read_csv('https://raw.githubusercontent.com/rsquaredacademy/datasets/master/web.csv')
ecom
## # A tibble: 1,000 x 11
##       id referrer device bouncers n_visit n_pages duration country purchase
##                               
##  1     1 google   laptop TRUE          10       1      693 Czech ~ FALSE   
##  2     2 yahoo    tablet TRUE           9       1      459 Yemen   FALSE   
##  3     3 direct   laptop TRUE           0       1      996 Brazil  FALSE   
##  4     4 bing     tablet FALSE          3      18      468 China   TRUE    
##  5     5 yahoo    mobile TRUE           9       1      955 Poland  FALSE   
##  6     6 yahoo    laptop FALSE          5       5      135 South ~ FALSE   
##  7     7 yahoo    mobile TRUE          10       1       75 Bangla~ FALSE   
##  8     8 direct   mobile TRUE          10       1      908 Indone~ FALSE   
##  9     9 bing     mobile FALSE          3      19      209 Nether~ FALSE   
## 10    10 google   mobile TRUE           6       1      208 Czech ~ FALSE   
## # ... with 990 more rows, and 2 more variables: order_items,
## #   order_value

Data Dictionary

  • id: row id
  • referrer: referrer website/search engine
  • os: operating system
  • browser: browser
  • device: device used to visit the website
  • n_pages: number of pages visited
  • duration: time spent on the website (in seconds)
  • n_visit: frequency of visits
  • country: country of origin
  • purchase: whether visitor purchased
  • order_value: order value of visitor (in dollars)

Histogram

To create a histogram, we will use geom_histogram() and specify the variable
name within aes(). In the below example, we create a histogram of the variable
n_visit.

ggplot(ecom) +
  geom_histogram(aes(n_visit))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
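
To see what the histogram layer actually computed, one option is ggplot2's layer_data() function, which returns the binned data behind a plot. The sketch below uses a simulated data frame called visits as a hypothetical stand-in for ecom, so that it runs without downloading anything.

```r
library(ggplot2)

# Hypothetical stand-in for the ecom data: 200 simulated visit counts (0 to 10)
set.seed(42)
visits <- data.frame(n_visit = sample(0:10, 200, replace = TRUE))

p <- ggplot(visits) +
  geom_histogram(aes(n_visit))

# layer_data() returns the data frame computed by the stat: one row per bin
bins <- layer_data(p)
head(bins[, c("xmin", "xmax", "count")])

# Every observation falls into exactly one bin, so the counts add up
# to the number of rows in the data
sum(bins$count)
```

The xmin and xmax columns are the bin edges and count is the bar height, which is a quick way to check whether the default bins suit the data.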

Specify Bins

The default number of bins in ggplot2 is 30. You can modify the number of
bins using the bins argument. In the below example, we create a histogram
with 7 bins.

ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7)

Aesthetics

Now that we know how to create a histogram, let us learn to modify its
appearance. We will begin with the fill color of the bars. Use the fill
argument to modify it. In the below case, we change the fill color of the
histogram to ‘blue’.

ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7, fill = 'blue')

As we have learnt before, the transparency of the fill color can be
modified using the alpha argument. It can take any value between 0 (fully
transparent) and 1 (fully opaque).

ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7, fill = 'blue', alpha = 0.3)

The color of the histogram border can be modified using the color argument.
The color can be specified either using its name or the associated hex code.

ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7, fill = 'white', color = 'blue')
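
As a small sketch of the hex code option, the example below draws the same border using '#0000FF' instead of the name 'blue'. The visits data frame is simulated and only stands in for ecom. col2rgb() from base R is used to confirm that the two specifications describe the same color.

```r
library(ggplot2)

# Hypothetical stand-in for the ecom data
set.seed(42)
visits <- data.frame(n_visit = sample(0:10, 200, replace = TRUE))

# Hex code for pure blue
border_hex <- "#0000FF"

# col2rgb() converts both forms to red/green/blue values for comparison
same_color <- all(col2rgb("blue") == col2rgb(border_hex))
same_color

p <- ggplot(visits) +
  geom_histogram(aes(n_visit), bins = 7, fill = "white", color = border_hex)
p
```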


Putting it all together…

Let us modify the bins, the fill color and the border color of the histogram
in the below example.

ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 7, fill = 'blue', color = 'white')

Bin Width

Another way to control the number of bins in a histogram is by using the
binwidth argument. In this case, we specify the width of the bins instead
of the number of bins. As you can see, in the below example, we do not use
the bins argument when using the binwidth argument. Specify only one of
them: if both are supplied, binwidth overrides bins.

ggplot(ecom) +
  geom_histogram(aes(n_visit), binwidth = 2, fill = 'blue', color = 'black')
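
The relationship between the two arguments can be checked with layer_data(). The sketch below deliberately supplies both bins and binwidth on a simulated data frame (visits, a hypothetical stand-in for ecom) and inspects the bins that result. The ggplot2 documentation describes bins as being overridden by binwidth.

```r
library(ggplot2)

# Hypothetical stand-in for the ecom data
set.seed(42)
visits <- data.frame(n_visit = sample(0:10, 200, replace = TRUE))

# Both arguments supplied on purpose: binwidth is expected to take precedence
p <- ggplot(visits) +
  geom_histogram(aes(n_visit), bins = 30, binwidth = 2)

bins <- layer_data(p)

# Width of each computed bin
widths <- bins$xmax - bins$xmin
unique(round(widths, 8))

# Number of bins actually drawn
nrow(bins)
```

Every bin comes out 2 units wide and there are far fewer than 30 of them, so the bins value has no effect here.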

Line Type

The line type of the histogram border can be modified using the linetype
argument. It can take any integer value between 0 and 6.

ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 5, fill = 'white', 
    color = 'blue', linetype = 3)
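
Line types can also be given by name instead of number. The sketch below lists the standard R names for the integer codes 0 to 6 and redraws the dotted border by name. The visits data frame is simulated and only stands in for ecom.

```r
library(ggplot2)

# Hypothetical stand-in for the ecom data
set.seed(42)
visits <- data.frame(n_visit = sample(0:10, 200, replace = TRUE))

# Integer codes and their names, as documented for base R's lty parameter
linetypes <- c(blank = 0, solid = 1, dashed = 2, dotted = 3,
               dotdash = 4, longdash = 5, twodash = 6)

# Same border style as linetype = 3, written by name
p <- ggplot(visits) +
  geom_histogram(aes(n_visit), bins = 5, fill = "white",
                 color = "blue", linetype = "dotted")
p
```

Note that 0 corresponds to 'blank', which draws no border at all.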

Line Size

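Use the size argument to modify the width of the border of the histogram bins.
It can take any value greater than 0. In ggplot2 3.4.0 and later, the same
setting is passed as linewidth, and using size for lines gives a deprecation
warning.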

ggplot(ecom) +
  geom_histogram(aes(n_visit), bins = 5, fill = 'white', 
    color = 'blue', size = 1.25)
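
Since the name of the border width argument depends on the installed ggplot2 version (size before 3.4.0, linewidth from 3.4.0 onwards), one possible way to write code that runs on both is to pick the argument name at run time. This is only a sketch: border_arg and args are hypothetical helper names, and visits is a simulated stand-in for ecom.

```r
library(ggplot2)

# Hypothetical stand-in for the ecom data
set.seed(42)
visits <- data.frame(n_visit = sample(0:10, 200, replace = TRUE))

# Choose the argument name that matches the installed ggplot2 version
border_arg <- if (utils::packageVersion("ggplot2") >= "3.4.0") {
  "linewidth"
} else {
  "size"
}

# Build the argument list, then add the border width under the chosen name
args <- list(mapping = aes(n_visit), bins = 5, fill = "white", color = "blue")
args[[border_arg]] <- 1.25

p <- ggplot(visits) +
  do.call(geom_histogram, args)
p
```

If only a recent ggplot2 is in use, passing linewidth = 1.25 directly to geom_histogram() is simpler.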

Map Variables

You can map the aesthetics to variables as well. In the below example, we map
fill to the device variable. You can try mapping color, linetype and size to
variables as well.

ggplot(ecom) +
  geom_histogram(aes(n_visit, fill = device), bins = 7)
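
When fill is mapped to a variable, ggplot2 bins each level of that variable separately and, by default, stacks the resulting bars. The sketch below checks this on a simulated data frame (visits, a hypothetical stand-in for ecom with 100 rows per device).

```r
library(ggplot2)

# Hypothetical stand-in for the ecom data: 300 rows, 100 per device
set.seed(42)
visits <- data.frame(
  n_visit = sample(0:10, 300, replace = TRUE),
  device  = rep(c("laptop", "mobile", "tablet"), length.out = 300)
)

p <- ggplot(visits) +
  geom_histogram(aes(n_visit, fill = device), bins = 7)

bins <- layer_data(p)

# One group per device
length(unique(bins$group))

# Counts within each group add up to the rows for that device
tapply(bins$count, bins$group, sum)
```

Because the bars are stacked, the total height of each bar is the overall count for that bin, and the colored segments show how each device contributes to it.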

Summary

In this post, we learnt to:

  • build a histogram
  • specify bins
  • modify
    • color
    • fill
    • alpha
    • bin width
    • line type
    • line size
  • map aesthetics to variables

Up Next

In the next post, we will learn to modify the axes of a plot.
