(This article was first published on **One Tip Per Day**, and kindly contributed to R-bloggers)

Say I have a vector of values and I cut it at some break points. How do I find the number of values in each interval?

We know the cut() function in R, combined with table(), works for this purpose. For example,

tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)

x <- rep(0:8, tx0)

> x

[1] 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4 5 5 5 5 5 5 5 5 5 5 6

[39] 6 6 6 6 7 7 7 8 8 8 8 8

> table(cut(x, breaks = 8))

(-0.008,0.994]      (0.994,2]          (2,3]          (3,4]          (4,5]
             9              4              6              5             13

         (5,6]       (6,7.01]    (7.01,8.01]
             5              3              5

The cut() documentation includes a note, saying

Instead of `table(cut(x, br))`, `hist(x, br, plot = FALSE)` is more efficient and less memory hungry. Instead of `cut(*, labels = FALSE)`, `findInterval()` is more efficient.
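The second half of that note is easy to check with explicit breaks. Here is a minimal sketch, assuming left-closed intervals (`right = FALSE`) and breaks `0:9` that extend past the largest value, so no value sits on the last break; `codes_cut` and `codes_fi` are names made up for this illustration.

```r
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)

# Assumed break points: one past the maximum, so every value
# falls strictly inside some left-closed interval [b, b + 1)
brk <- 0:9

# Integer interval codes from cut()
codes_cut <- cut(x, breaks = brk, labels = FALSE, right = FALSE)

# The same codes from findInterval(), whose intervals are
# left-closed by default
codes_fi <- findInterval(x, brk)

all(codes_cut == codes_fi)
```

Under these assumptions both calls should return the same interval index for every value. The first half of the note, about `hist()`, is the one that needs more care.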

But if you try what it suggests, you will find that the counts returned look different:

> hist(x, 8, plot=F)

$breaks

[1] 0 1 2 3 4 5 6 7 8

$counts

[1] 13 6 5 3 10 5 3 5

What's wrong?

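Nothing is wrong. An argument is missing, and the two functions treat a single-number `breaks` differently. The cut() documentation says: "When `breaks` is specified as a single number, the range of the data is divided into `breaks` pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If `x` is a constant vector, equal-length intervals are created, one of which includes the single value.)"

hist(), on the other hand, used the integer break points 0 to 8 shown in `$breaks` above, and its `include.lowest` argument defaults to TRUE, so the first interval is [0,1]. That is why its first count is 13: nine 0s plus four 1s.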
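One way to see the difference for yourself is to look at the break points each function actually used. This is a minimal sketch reusing the same `x`; `cut_levels` and `hist_breaks` are names made up for this illustration, and the exact labels printed for `cut()` may depend on your R version.

```r
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)

# Interval labels chosen by cut() when breaks is a single number
cut_levels <- levels(cut(x, breaks = 8))
print(cut_levels)

# Break points chosen by hist() for the same request
hist_breaks <- hist(x, breaks = 8, plot = FALSE)$breaks
print(hist_breaks)
```

If the two sets of break points are not the same, there is no reason to expect the counts to agree.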

The conclusion is:

When `breaks` is a vector, `table(cut(x, breaks = 0:8, include.lowest = TRUE))` gives the same counts as `hist(x, breaks = 0:8, plot = FALSE)$counts`; when `breaks` is a single number, the two can differ.
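The vector case can be checked directly. A small sketch with the same data; `brk`, `cut_counts` and `hist_counts` are names made up for this illustration.

```r
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)

# Explicit break points, so both functions use the same intervals
brk <- 0:8

# Counts via cut() + table(); include.lowest = TRUE keeps the 0s
# in the first interval, [0,1]
cut_counts <- as.vector(table(cut(x, breaks = brk, include.lowest = TRUE)))

# Counts via hist(), which includes the lowest value by default
hist_counts <- hist(x, breaks = brk, plot = FALSE)$counts

print(cut_counts)   # 13 6 5 3 10 5 3 5
print(hist_counts)  # 13 6 5 3 10 5 3 5
all(cut_counts == hist_counts)
```

Without `include.lowest = TRUE`, `cut()` would turn the nine 0s into NA, and the first count would drop to 4.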
