Infamous Inf – Part I

[This article was first published on R – R-BAR, and kindly contributed to R-bloggers.]

Have you ever wondered what to do with R’s Inf keyword? If so, this is the first in a series of posts that explore how we can exploit the keyword’s interesting properties to get the answers we need and improve code robustness.

For those unfamiliar with R’s Inf keyword, it represents infinity. For example, dividing a positive or negative number by zero yields positive or negative infinity, respectively.

c(plus_inf = 1/0, minus_inf = -1/0)

# plus_inf minus_inf 
#      Inf      -Inf
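
A couple of properties are what make Inf useful in practice. The short sketch below, using only base R, checks that Inf compares greater than the largest finite double on the machine (and -Inf less than its negative), and that is.infinite() and is.finite() tell infinite values apart from ordinary numbers. The variable names are illustrative only.

```r
# Largest finite double R can represent on this machine
largest <- .Machine$double.xmax

# Inf compares greater than any finite number, -Inf less than any
inf_is_larger   <- Inf > largest
ninf_is_smaller <- -Inf < -largest

# Helpers for telling infinite values apart from finite ones
flags_infinite <- is.infinite(c(1, Inf, -Inf))
flags_finite   <- is.finite(c(1, Inf, -Inf))

print(inf_is_larger)
print(flags_infinite)
```

These comparison rules are what let Inf act as an outer limit that no finite data point can exceed.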

Sounds very theoretical. So how can we make practical use of infinity in R? In this first post, we’ll discuss how Inf can make binning data with cut() a more robust process.

Inf with cut()

Suppose you want to bin the following set of numbers into five discrete levels. Take note of the extreme values on the positive and negative end of the vector.

{-100,-1,1,1,1,2,2,2,5,5,5,10,10,10,100,1000}

You might do something like,

numbers <- c(-100, -1,1,1,1,2,2,2,5,5,5,10,10,10,100,1000)
bins <- cut(numbers, breaks = c(-101,-2,2,5,11, 1001))
xtabs(~bins)

bins Freq
(-101,-2] 1
(-2,2] 7
(2,5] 3
(5,11] 3
(11,1e+03] 2

The above code and data show explicitly how we might bin the data, including the extreme values -100, 100, and 1000. But with real data, extreme values can change when new data is added. For example, what if we received two new data points, -102 and 1002? Below, we’ve added them to our vector and binned it with the same breaks as before to see what happens.

numbers <- c(-102, -100, -1,1,1,1,2,2,2,5,5,5,10,10,10,100,1000,1002)
bins <- cut(numbers, breaks = c(-101,-2,2,5,11, 1001))
xtabs(~bins)

bins Freq
(-101,-2] 1
(-2,2] 7
(2,5] 3
(5,11] 3
(11,1e+03] 2

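Oops! The counts are unchanged: the new data was not included in the binning. Because -102 and 1002 fall outside the breaks, cut() returns NA for them, and xtabs() drops them without any warning. We need to manually expand our limits, or we could perhaps do something clever with the max and min functions.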
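
To see where the new values went, it helps to count the NAs that cut() returns for values outside the breaks. The sketch below does that, and then illustrates one possible max/min workaround. The padding of 1 and the names `lower` and `upper` are assumptions made for illustration, not part of the original code.

```r
numbers <- c(-102, -100, -1, 1, 1, 1, 2, 2, 2, 5, 5, 5, 10, 10, 10, 100, 1000, 1002)

# Fixed outer limits: values outside (-101, 1001] become NA
bins_fixed <- cut(numbers, breaks = c(-101, -2, 2, 5, 11, 1001))
n_missed <- sum(is.na(bins_fixed))
print(n_missed)  # 2: the values -102 and 1002

# Workaround: derive the outer limits from the data itself.
# The padding of 1 keeps min(numbers) inside the left-open first interval.
lower <- min(numbers) - 1
upper <- max(numbers) + 1
bins_minmax <- cut(numbers, breaks = c(lower, -2, 2, 5, 11, upper))
print(sum(is.na(bins_minmax)))  # 0
```

This works, but the outer bin labels now change whenever the data does, and the approach quietly assumes the data extends below -2 and above 11.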

Alternatively, we can solve this problem with R’s Inf keyword by using -Inf and Inf as the outermost values in the breaks argument of the cut function (see line 2 below).

numbers <- c(-102, -100, -1,1,1,1,2,2,2,5,5,5,10,10,10,100,1000,1002)
bins <- cut(numbers, breaks = c(-Inf,-2,2,5,11, Inf))
xtabs(~bins)

bins Freq
(-Inf,-2] 2
(-2,2] 7
(2,5] 3
(5,11] 3
(11, Inf] 3

Now, when new data is collected that is less than -2 or greater than 11, there will be a bin to catch it, no matter how big or how small.
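
As a quick check of that claim, the sketch below reuses the same breaks on some far more extreme values and confirms that none of them fall outside the bins. The values in `new_numbers` are made up for illustration.

```r
breaks <- c(-Inf, -2, 2, 5, 11, Inf)

# Hypothetical new observations, far beyond the original range
new_numbers <- c(-1e12, -102, 0, 1002, 1e12)
new_bins <- cut(new_numbers, breaks = breaks)

# Every value lands in a bin, so no NAs are produced
n_missed <- sum(is.na(new_bins))
print(n_missed)

# Counts per bin, in the order of the breaks
bin_counts <- as.vector(table(new_bins))
print(bin_counts)
```

Missing values (NA) in the input still come back as NA, so this guards against out-of-range values rather than missing ones.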
