Create groups based on the lowest and highest values in R?

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers].

The post Create groups based on the lowest and highest values in R? appeared first on finnstats.

To create groups based on the lowest and highest values in R, use the ntile() function from the dplyr package, which divides an input vector into n buckets.

The basic syntax used by this function is as follows.

ntile(x, n)

where:

x: Input vector

n: Number of buckets

Note: The bucket sizes might vary by up to one.
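The note about bucket sizes can be checked directly. The following is a minimal sketch, assuming dplyr is installed; the names buckets and sizes are illustrative and not part of the original post.

```r
library(dplyr)

# 11 values cannot be split evenly into 5 buckets
buckets <- ntile(1:11, 5)

# Count how many values landed in each bucket
sizes <- table(buckets)
sizes

# Difference between the largest and smallest bucket; should be at most 1
max(sizes) - min(sizes)
```

Every value is assigned to exactly one bucket, so the counts always add up to the length of the input vector.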

The practical application of this function is demonstrated in the examples that follow.

Example 1: Use ntile() with a Vector

The ntile() function can be used to divide a vector of 11 elements into 5 groups using the following code.

library(dplyr)

Let’s create a vector

x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)
x
[1] 10 13 14 26 27 18 11 12 15 20 13

and divide the vector into five buckets.

ntile(x, 5)
[1] 1 2 3 5 5 4 1 1 3 4 2

We can see from the result that each component of the original vector has been assigned to one of five bins.

Bucket 1 contains the smallest values, while bucket 5 contains the largest values.

For instance:

The three lowest values, 10, 11, and 12, are assigned to bucket 1.

The two highest values, 26 and 27, are assigned to bucket 5.
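To see which values ended up in every bucket, not just the first and last, one option is to split the vector by its bucket numbers. This is a small sketch using base R's split(); the name by_bucket is only illustrative.

```r
library(dplyr)

x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)

# Group the original values by their assigned bucket number
by_bucket <- split(x, ntile(x, 5))
by_bucket
```

The result is a list with one element per bucket, so by_bucket[["1"]] holds the values 10, 11, and 12, and by_bucket[["5"]] holds 26 and 27.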

Example 2: Use ntile() with a Data Frame

Consider the following R data frame, which displays the points scored by different basketball players:

Let’s create a data frame

df <- data.frame(player=LETTERS[1:9],
                 points=c(102, 109, 57, 122, 824, 528, 125, 159, 195))

Now we can view the data frame

df
  player points
1      A    102
2      B    109
3      C     57
4      D    122
5      E    824
6      F    528
7      G    125
8      H    159
9      I    195

The following code uses the ntile() function to add a new column to the data frame that places each player into one of three buckets based on their total points.

Add a new column that assigns each player to a bucket according to their point total.

df$bucket <- ntile(df$points, 3)

Let’s view the updated data frame

df
  player points bucket
1      A    102      1
2      B    109      1
3      C     57      1
4      D    122      2
5      E    824      3
6      F    528      3
7      G    125      2
8      H    159      2
9      I    195      3

Each player is given a value between 1 and 3 in the new bucket column.

Players who have the fewest points are assigned a value of 1, while those who have the most points are assigned a value of 3.
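The same bucketing can also be written as a dplyr pipeline, which makes it easy to check the range of points each bucket covers. This is a sketch rather than the original author's code; the names df_bucketed and bucket_summary are assumptions made for illustration.

```r
library(dplyr)

df <- data.frame(player = LETTERS[1:9],
                 points = c(102, 109, 57, 122, 824, 528, 125, 159, 195))

# Same bucketing as df$bucket <- ntile(df$points, 3), written with mutate()
df_bucketed <- df %>%
  mutate(bucket = ntile(points, 3))

# Summarise how many players and which point range each bucket covers
bucket_summary <- df_bucketed %>%
  group_by(bucket) %>%
  summarise(players = n(),
            min_points = min(points),
            max_points = max(points))
bucket_summary
```

Because ntile() works on ranks rather than on the values themselves, the large gap between 195 and 528 does not affect the bucket sizes: each bucket still receives three players.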
