
A very common task in data processing is transforming numeric variables (continuous, discrete, etc.) into categorical ones by creating bins. For example, it is quite common to convert age into age groups. Let’s see how we can easily do that in R.

We will consider a random variable from the Poisson distribution with parameter λ = 20.

library(dplyr)
# Generate 1000 observations from the Poisson distribution
# with lambda equal to 20
df<-data.frame(MyContinuous = rpois(1000,20))

# get the histogram
hist(df$MyContinuous)



## Create Specific Bins

Let’s say that you want to create the following bins:

• Bin 1: (-Inf, 15]
• Bin 2: (15, 25]
• Bin 3: (25, Inf)

We can easily do that using the cut function:

df<-df%>%mutate(MySpecificBins = cut(MyContinuous, breaks = c(-Inf,15,25,Inf)))
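A small sketch, using a few hand-picked values that are not part of the simulated data, shows how cut treats values that fall exactly on a break point. By default (right = TRUE) the intervals are closed on the right, as in the bins listed above, while right = FALSE makes them closed on the left.

```r
# Hand-picked values on and just above the break points (illustrative only)
x <- c(15, 16, 25, 26)
breaks <- c(-Inf, 15, 25, Inf)

# Default right = TRUE: intervals (-Inf,15], (15,25], (25,Inf]
right_closed <- cut(x, breaks = breaks)

# right = FALSE: intervals [-Inf,15), [15,25), [25,Inf)
left_closed <- cut(x, breaks = breaks, right = FALSE)

# Bin index of each value: right_closed is 1 2 2 3, left_closed is 2 2 3 3
print(data.frame(x,
                 right_closed = as.integer(right_closed),
                 left_closed = as.integer(left_closed)))
```

So with the default settings a value of exactly 15 lands in Bin 1 and a value of exactly 25 lands in Bin 2.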



Let’s have a look at the counts of each bin.

df%>%group_by(MySpecificBins)%>%count()



Notice that you can also define your own labels through the labels argument of the cut function.
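As a minimal sketch, the bins above can be given readable names by passing one label per interval. The names "Low", "Medium" and "High", the column name MyLabelledBins and the seed are hypothetical choices for illustration; the data are simulated the same way as in the post.

```r
library(dplyr)

# Simulated data as in the post; the seed is added only for reproducibility
set.seed(1)
df <- data.frame(MyContinuous = rpois(1000, 20))

# One label per interval: length(labels) must equal length(breaks) - 1
df <- df %>%
  mutate(MyLabelledBins = cut(MyContinuous,
                              breaks = c(-Inf, 15, 25, Inf),
                              labels = c("Low", "Medium", "High")))

# Counts per labelled bin
print(df %>% count(MyLabelledBins))
```

Adding ordered_result = TRUE to the cut call would return an ordered factor instead, which can be handy when the bins have a natural order.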

## Create Bins based on Quantiles

Let’s say that you want each bin to contain the same number of observations, for example 4 bins with 25% of the observations each. We can easily do it as follows:

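numbers_of_bins <- 4

# Use the sample quantiles as break points; unique() drops repeated break points,
# and include.lowest = TRUE keeps the minimum value in the first bin
df<-df%>%mutate(MyQuantileBins = cut(MyContinuous,
                                     breaks = unique(quantile(MyContinuous, probs = seq.int(0, 1, by = 1/numbers_of_bins))),
                                     include.lowest = TRUE))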
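The unique() call matters for discrete data: when many observations share the same value, several quantiles can coincide, and cut stops with an error because its breaks are not unique. The sketch below uses a small, heavily tied vector made up for illustration (it is not the post's data) to show the error and the fix; note that removing duplicates also leaves fewer bins than requested.

```r
# Hypothetical, heavily tied vector: half of the values are 0
x <- c(0, 0, 0, 0, 0, 1, 2, 3, 4, 5)
numbers_of_bins <- 4

# Quartile break points (0%, 25%, 50%, 75%, 100%); the first two are both 0
qbreaks <- quantile(x, probs = seq.int(0, 1, by = 1 / numbers_of_bins))
print(qbreaks)

# Without unique(), the repeated break point makes cut() fail
res <- tryCatch(cut(x, breaks = qbreaks, include.lowest = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# With unique(), cut() works but returns 3 bins instead of 4
# (with 5, 2 and 3 observations respectively)
bins <- cut(x, breaks = unique(qbreaks), include.lowest = TRUE)
print(table(bins))
```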



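We can check whether the MyQuantileBins contain the same number of observations and also look at their ranges. Since the Poisson variable is discrete and has many tied values, expect the counts to be only roughly equal: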

df%>%group_by(MyQuantileBins)%>%count()



Notice that if you want to split your variable into bins with an equal number of observations, you can also use the ntile function of the dplyr package, but it returns only the bin number (1, 2, ...) rather than labels based on the bin ranges.
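A minimal sketch of the ntile approach on data simulated as in the post (the seed and the column name MyNtileBins are hypothetical) follows; since ntile gives no range labels, the range of each bin is recovered with a grouped summary.

```r
library(dplyr)

# Simulated data as in the post; the seed is added only for reproducibility
set.seed(1)
df <- data.frame(MyContinuous = rpois(1000, 20))

numbers_of_bins <- 4

# ntile() returns the bin number (1 to numbers_of_bins) for each row;
# bin sizes differ by at most one observation
df <- df %>% mutate(MyNtileBins = ntile(MyContinuous, numbers_of_bins))

# Recover the size and range of each bin, since ntile() gives no labels
print(df %>%
        group_by(MyNtileBins) %>%
        summarise(n_obs = n(),
                  lower = min(MyContinuous),
                  upper = max(MyContinuous)))
```

Because ntile ignores ties to keep the bins evenly sized, adjacent bins can share the same boundary value, which never happens with the cut-based quantile bins.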