(This article was first published on **Life in Code**, and kindly contributed to R-bloggers.)

### In search of bin counts

I look at histograms and density functions of my data in R on a regular basis. I have some idea of the algorithms behind these, but I’ve never had any reason to go under the hood until now. Lately, I’ve been looking into using bin counts for things like Shannon entropy (in the very nice entropy package). I figured that binning and counting data would either be supported by a native, dedicated R package, or be quite simple to code. Not finding the former, I coded up the latter:

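```r
## Bin x (assumed to lie in [0, 1]) to `dig` decimal places and return the
## fraction of points in each bin.
myhist = function(x, dig = 3) {
  ## trunc() has no digits argument (it is silently ignored), so truncate
  ## by scaling to an integer bin index 0..10^dig instead
  idx = floor(x * 10^dig)
  bb = seq(0, 1, 1/10^dig)          ## bin lower edges
  aa = numeric(length(bb))
  for (ii in seq_along(bb)) {
    ## compare integer indices rather than floating-point bin values with ==
    aa[ii] = sum(idx == ii - 1)
  }
  cbind(bin = bb, dens = aa / length(x))
}

## random variates
test = sort(runif(1e4))
get1 = myhist(test)
```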
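Since the end goal here is Shannon entropy, it helps to see how bin counts feed into it. The entropy package provides this and much more (including bias-corrected estimators); the sketch below is only the plain plug-in estimate, written as a hypothetical base-R helper `shannon_entropy()` and assuming natural-log units (nats).

```r
## Plug-in Shannon entropy of a vector of bin counts, in nats:
## H = -sum(p * log(p)), where p are the bin proportions.
shannon_entropy = function(counts) {
  p = counts / sum(counts)   # counts -> empirical probability mass
  p = p[p > 0]               # drop empty bins: 0 * log(0) would give NaN
  -sum(p * log(p))
}

shannon_entropy(c(5, 5))       # two equally full bins: log(2), about 0.693
shannon_entropy(c(10, 0, 0))   # all mass in one bin: 0
```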

### Trouble in paradise

The plan: truncate the data to a specified precision, and count how many values fall in each bin. Well, first I tried …
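Truncating and counting can also be done without looping over every bin. A minimal base-R sketch, assuming the data lie in [0, 1] as in `myhist()`; `bin_counts()` is a hypothetical helper, not part of any package. It turns each value into an integer bin index and lets `tabulate()` count them in one pass, which also avoids testing floating-point values for equality with `==`.

```r
## Count values per bin by turning each value into an integer bin index.
## Assumes x lies in [0, 1], like the seq(0, 1, 1/10^dig) grid in myhist().
bin_counts = function(x, dig = 3) {
  m = 10^dig                        # number of bins per unit interval
  idx = floor(x * m)                # truncate to `dig` digits: integer index 0..m
  tabulate(idx + 1, nbins = m + 1)  # tabulate() counts the integers 1..nbins
}

## Equality tests against a floating-point grid are fragile:
seq(0, 1, 0.1)[4] == 0.3   # FALSE: 3 * 0.1 is 0.30000000000000004, not 0.3

u = sort(runif(1e4))
dens = bin_counts(u) / length(u)    # empirical probability mass per bin
```

Values lying exactly on a bin edge can still land one bin off, since `x * m` is itself rounded; for continuous draws like `runif()` this rarely matters.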

### Dear Google…

An hour of irritation and confusion later, I asked Google and, small wonder, the second search result linked to the ash package, which contains exactly such a tool (`bin1()`). It runs somewhere between 100 and 1,000 times faster than my version. It doesn’t return the bin boundaries by default, but it’s good enough for a quick-and-dirty empirical probability mass distribution.
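When a binning routine hands back only counts plus the interval it covered, the boundaries and a probability mass function can be rebuilt by hand. A minimal sketch, assuming `length(counts)` equal-width bins spanning `ab = c(a, b)` (an assumption about the bin layout, not something stated here); `counts_to_pmf()` is a hypothetical helper.

```r
## Rebuild bin edges, midpoints, and an empirical probability mass function
## from raw counts, ASSUMING equal-width bins over the interval ab = c(a, b).
counts_to_pmf = function(counts, ab) {
  nbin = length(counts)
  edges = seq(ab[1], ab[2], length.out = nbin + 1)  # nbin + 1 boundaries
  data.frame(
    lower = edges[-(nbin + 1)],
    upper = edges[-1],
    mid   = (edges[-(nbin + 1)] + edges[-1]) / 2,
    p     = counts / sum(counts)                    # probabilities sum to 1
  )
}

pmf = counts_to_pmf(c(2, 6, 2), ab = c(0, 1))
pmf
```

With ash, one could pass the `$nc` counts from `bin1()` along with the same interval, provided its bins really are equal-width over that interval.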

To be fair, there’s something to be said for cooking up a simple solution to a simple problem, and then realizing that, for one reason or another, the problem isn’t quite as simple as one first thought. On the other hand, sometimes we just want answers. When that’s the case, asking Google is a pretty good bet.

```r
## their method
library(ash)
get2 = bin1(test, c(0, 1), 1e3 + 1)$nc   ## $nc holds the bin counts only
```
