How to Convert Continuous variables into Categorical by Creating Bins

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers].

A very common task in data processing is transforming numeric variables (continuous, discrete, etc.) into categorical ones by creating bins. For example, it is quite common to convert age into age groups. Let's see how we can easily do that in R.

We will consider a random variable from the Poisson distribution with parameter λ = 20.

library(dplyr)

# Generate 1000 observations from the Poisson distribution
# with lambda equal to 20
df <- data.frame(MyContinuous = rpois(1000, 20))

# Plot the histogram
hist(df$MyContinuous)
  
How to Convert Continuous variables into Categorical by Creating Bins 1

Create specific Bins

Let’s say that you want to create the following bins:

  • Bin 1: (-inf, 15]
  • Bin 2: (15, 25]
  • Bin 3: (25, inf)

We can easily do that with the base R cut() function:

df <- df %>% mutate(MySpecificBins = cut(MyContinuous, breaks = c(-Inf, 15, 25, Inf)))
head(df,10)
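The interval notation in the bin labels follows cut()'s default right = TRUE: each bin is open on the left and closed on the right, so a value of exactly 15 falls in the first bin and exactly 25 in the second. The sketch below checks this on a few hand-picked boundary values (made up for illustration, not taken from df), and shows how right = FALSE flips the closure to [a, b).

```r
# Boundary values chosen to sit exactly on, and just past, the cut points
x <- c(15, 16, 25, 26)
breaks <- c(-Inf, 15, 25, Inf)

# Default right = TRUE: intervals are (a, b], so 15 -> bin 1 and 25 -> bin 2
right_closed <- cut(x, breaks = breaks)
print(as.integer(right_closed))

# right = FALSE gives intervals [a, b), so 15 -> bin 2 and 25 -> bin 3
left_closed <- cut(x, breaks = breaks, right = FALSE)
print(as.integer(left_closed))
```

For age groups this choice matters: it decides whether someone aged exactly 25 belongs to the lower or the upper group.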
 
How to Convert Continuous variables into Categorical by Creating Bins 2

Let’s have a look at the counts of each bin.

df%>%group_by(MySpecificBins)%>%count()
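For a quick check without dplyr, base R's table() gives the same counts. One difference worth knowing: table() reports every level of the factor, including empty bins, whereas dplyr's group_by() drops groups for empty factor levels by default (controlled by its .drop argument). A small sketch on made-up values where the top bin stays empty:

```r
# Made-up values that all fall at or below 25, so the (25, Inf] bin is empty
x <- c(3, 12, 15, 18, 22)
bins <- cut(x, breaks = c(-Inf, 15, 25, Inf))

# table() lists all three levels and reports 0 for the empty one
counts <- table(bins)
print(counts)
```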
 
How to Convert Continuous variables into Categorical by Creating Bins 3

Note that you can also define your own labels with the labels argument of the cut() function.
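A short sketch of the labels argument, reusing the same breaks; the names "Low", "Medium" and "High" and the input values are purely illustrative. labels needs exactly one name per interval, i.e. one fewer than the number of breaks. Adding ordered_result = TRUE returns an ordered factor, so comparisons between groups become meaningful.

```r
x <- c(10, 15, 20, 25, 30)
breaks <- c(-Inf, 15, 25, Inf)

# Four breaks define three intervals, so three (illustrative) labels
labelled <- cut(x, breaks = breaks, labels = c("Low", "Medium", "High"))
print(labelled)

# ordered_result = TRUE makes "Low" < "Medium" < "High" comparable
ordered_bins <- cut(x, breaks = breaks,
                    labels = c("Low", "Medium", "High"),
                    ordered_result = TRUE)
print(ordered_bins > "Low")
```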


Create Bins based on Quantiles

Let's say that you want each bin to contain roughly the same number of observations, for example 4 bins with about 25% of the observations each. We can do it as follows:

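numbers_of_bins <- 4

# Cut points at evenly spaced quantiles (0%, 25%, 50%, 75%, 100% for 4 bins);
# include.lowest = TRUE keeps the minimum value inside the first bin
df <- df %>% mutate(MyQuantileBins = cut(MyContinuous,
                                         breaks = unique(quantile(MyContinuous, probs = seq.int(0, 1, by = 1 / numbers_of_bins))),
                                         include.lowest = TRUE))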

head(df,10)
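The same expression can be wrapped in a small reusable function. quantile_bins() below is a hypothetical helper, not part of base R or dplyr; it simply packages the cut(), quantile() and unique() calls from the snippet above. On the evenly spread values 1 to 100, four quantile bins hold exactly 25 values each.

```r
# Hypothetical helper: equal-frequency bins built from sample quantiles
quantile_bins <- function(x, n_bins) {
  # n_bins + 1 probabilities, from the minimum (0) to the maximum (1)
  probs <- seq.int(0, 1, by = 1 / n_bins)
  # unique() guards against repeated cut points, which cut() rejects
  breaks <- unique(quantile(x, probs = probs))
  # include.lowest = TRUE keeps the minimum value inside the first bin
  cut(x, breaks = breaks, include.lowest = TRUE)
}

bins <- quantile_bins(1:100, 4)
print(table(bins))
```

Inside the pipeline it would be used as mutate(MyQuantileBins = quantile_bins(MyContinuous, 4)).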
 
How to Convert Continuous variables into Categorical by Creating Bins 4

We can check whether the MyQuantileBins groups contain the same number of observations, and also look at their ranges:

df%>%group_by(MyQuantileBins)%>%count()
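With a discrete variable such as our Poisson counts, the bins will usually be only roughly equal, because many observations share the same value and a tied value cannot be split across two bins. When ties are heavy, several quantiles can even coincide, and unique() then quietly leaves fewer bins than requested. A toy vector (made up for illustration) shows both effects, plus the error cut() raises if unique() is left out:

```r
# Made-up data dominated by a single repeated value
x <- c(1, 1, 1, 1, 1, 1, 2, 3)
probs <- seq.int(0, 1, by = 1 / 4)

# The 0%, 25% and 50% quantiles are all 1, so only 3 distinct breaks survive
breaks <- unique(quantile(x, probs = probs))
print(breaks)

# Asking for 4 bins therefore yields just 2, with very unequal counts
bins <- cut(x, breaks = breaks, include.lowest = TRUE)
print(table(bins))

# Without unique(), cut() stops because the breaks are duplicated
err <- tryCatch(cut(x, breaks = quantile(x, probs = probs)),
                error = function(e) conditionMessage(e))
print(err)
```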
 
How to Convert Continuous variables into Categorical by Creating Bins 5

Note that if you want to split your continuous variable into bins with an equal number of observations, you can also use the ntile() function of the dplyr package, but it returns only the bin number, not labels based on the bin ranges.
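A minimal comparison on tied toy data (made up for illustration). ntile() returns integer bucket numbers rather than range labels and, according to the dplyr documentation, it ignores ties: it builds evenly sized buckets even if identical values end up in different buckets. Where the quantile approach above produced just 2 unequal bins for this vector, ntile() produces 4 buckets of 2:

```r
library(dplyr)

# Made-up data with heavy ties
x <- c(1, 1, 1, 1, 1, 1, 2, 3)

# Bucket numbers 1..4, each holding the same number of observations
tiles <- ntile(x, 4)
print(tiles)
print(table(tiles))

# The six identical 1s are spread over three different buckets
print(unique(tiles[x == 1]))
```

Which behaviour is preferable depends on the use: quantile bins with cut() keep equal values together and give readable ranges, while ntile() guarantees equal bucket sizes.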
