Logic will get you from A to B. Imagination will take you everywhere. (Einstein)

R can already take you everywhere. With it we can learn about the minutest particles and the largest galaxies. So, to celebrate the release of R 4.3 (“Already Tomorrow”, on April 21st, 2023), let’s reverse Einstein’s quote and take you from A to B with logic.

### Two modes of comparison

In R, almost all of your data will be stored as a vector. Even if your vector holds a single value, R still treats it as a vector. This is unlike many other languages, and getting comfortable “thinking for the whole vector” pays off in several ways. Your code will be more concise, and it may even run faster than an iterative approach to the same problem.

```r
1:10 # A vector of integers
##  [1]  1  2  3  4  5  6  7  8  9 10
is.vector(1:10)
## [1] TRUE
sum(1:10) # A vectorised computation
## [1] 55

integer(0) # An empty vector of integers
## integer(0)
1L # A single integer, stored as a vector
## [1] 1
```
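
To make the contrast with an iterative approach concrete, here is a small sketch that computes the same quantity (a sum of squares) with a loop and with a vectorised expression. The variable names are ours, chosen purely for illustration.

```r
# Sum the squares of 1..10 in two ways
values <- 1:10

# Iterative approach: update a running total one element at a time
total_loop <- 0
for (value in values) {
  total_loop <- total_loop + value^2
}

# Vectorised approach: square every element, then sum
total_vectorised <- sum(values^2)

total_loop
## [1] 385
total_vectorised
## [1] 385
```

Both give the same answer; the vectorised form simply says it in one line.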

But the conciseness that R’s vectorised operations provide may trip you up unexpectedly. A typical case is when you think you are working with a scalar (a length-1 vector) but you are actually working with an empty or multivalued vector.
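
As a sketch of how this can happen, consider code that expects a single index but gets zero or several depending on the data. The `scores` vector and the `is_scalar()` helper below are hypothetical, made up for this illustration.

```r
scores <- c(alice = 7, bob = 9, carol = 9)

# Intended to be a single index, but the result depends on the data
top <- which(scores == max(scores)) # two people share the top score
length(top)
## [1] 2

above_ten <- which(scores > 10) # nobody scored above 10
length(above_ten)
## [1] 0

# A cheap guard makes the scalar assumption explicit (hypothetical helper)
is_scalar <- function(x) length(x) == 1
is_scalar(top)
## [1] FALSE
is_scalar(which.max(scores)) # which.max() returns the first maximum only
## [1] TRUE
```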

The `logical` values in R (`TRUE`, `FALSE`) are a little bit special. A vector of logical values might be used to represent some quality in a dataset, for example, to select those rows of a dataset that are to be kept in `dplyr::filter()`.

```r
library("tidyverse")
head(diamonds)
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

head(diamonds$cut == "Ideal") # A logical vector
## [1]  TRUE FALSE FALSE FALSE FALSE FALSE
filter(diamonds, cut == "Ideal") # Subsetting a data-frame using a logical vector
## # A tibble: 21,551 × 10
##    carat cut   color clarity depth table price     x     y     z
##    <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.23 Ideal E     SI2      61.5    55   326  3.95  3.98  2.43
##  2  0.23 Ideal J     VS1      62.8    56   340  3.93  3.9   2.46
##  3  0.31 Ideal J     SI2      62.2    54   344  4.35  4.37  2.71
##  4  0.3  Ideal I     SI2      62      54   348  4.31  4.34  2.68
##  5  0.33 Ideal I     SI2      61.8    55   403  4.49  4.51  2.78
##  6  0.33 Ideal I     SI2      61.2    56   403  4.49  4.5   2.75
##  7  0.33 Ideal J     SI1      61.1    56   403  4.49  4.55  2.76
##  8  0.23 Ideal G     VS1      61.9    54   404  3.93  3.95  2.44
##  9  0.32 Ideal I     SI1      60.9    55   404  4.45  4.48  2.72
## 10  0.3  Ideal I     SI2      61      59   405  4.3   4.33  2.63
## # ℹ 21,541 more rows

head(diamonds$carat > 0.3) # Another logical vector
## [1] FALSE FALSE FALSE FALSE  TRUE FALSE
filter(diamonds, carat > 0.3)
## # A tibble: 49,737 × 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  2  0.31 Ideal     J     SI2      62.2    54   344  4.35  4.37  2.71
##  3  0.32 Premium   E     I1       60.9    58   345  4.38  4.42  2.68
##  4  0.31 Very Good J     SI1      59.4    62   353  4.39  4.43  2.62
##  5  0.31 Very Good J     SI1      58.1    62   353  4.44  4.47  2.59
##  6  0.31 Good      H     SI1      64      54   402  4.29  4.31  2.75
##  7  0.33 Ideal     I     SI2      61.8    55   403  4.49  4.51  2.78
##  8  0.33 Ideal     I     SI2      61.2    56   403  4.49  4.5   2.75
##  9  0.33 Ideal     J     SI1      61.1    56   403  4.49  4.55  2.76
## 10  0.32 Good      H     SI2      63.1    56   403  4.34  4.37  2.75
## # ℹ 49,727 more rows
```

But there are places where a multivalued logical vector makes no sense, and could even be dangerous. We use `if (...) {}` and `while (...) {}` statements for flow control in R. The conditional expression in these statements (the `...` in `if (...) {}`) should always evaluate to a logical scalar: either `TRUE` or `FALSE`.

When R 4.2.0 was released, stricter guarantees were placed on the length of these conditional expressions. We mentioned this in an earlier blog post. So in addition to getting an error when the conditional is empty, we now get an error when the conditional is too long:

```r
# Comparison with an empty logical vector:
if (logical(0)) {
  print("I didn't expect to get here")
}
## Error in if (logical(0)) {: argument is of length zero

# Comparison with an over-sized logical vector:
numbers <- c(1, 3, 5, 6)

print(numbers %% 2 == 0) # Determine if even
## [1] FALSE FALSE FALSE  TRUE

if (numbers %% 2 == 0) {
  print("Should we ever be allowed to get here?")
}
## Error in if (numbers%%2 == 0) {: the condition has length > 1
```

Previously, R would use the first entry in a non-scalar conditional vector to decide whether to enter the `if` or `while` block.
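
One way to satisfy the stricter check is to say explicitly how the vector should be collapsed to a single value. The sketch below reuses the `numbers` vector from the example above and uses the base functions `any()` and `all()`; which of the two is right depends on what the code is meant to do.

```r
numbers <- c(1, 3, 5, 6)
is_even <- numbers %% 2 == 0

# Is at least one number even?
if (any(is_even)) {
  print("At least one number is even")
}
## [1] "At least one number is even"

# Are all of the numbers even?
if (all(is_even)) {
  print("Every number is even")
} else {
  print("Not every number is even")
}
## [1] "Not every number is even"
```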

### Strictly comparing

So, we have two main ways of using a logical vector, one of which now requires that the vector is a scalar.

Another place where it is really important to know the length of your vectors is when combining logical values together.

R has a number of ways to combine logical values together that build on the AND and OR operations in Boolean algebra:

• `all` and `any` for combining the values in a single vector (are `all` of the values TRUE; are `any` of the values TRUE)
• `&`, `&&` (representing “AND”), `|`, and `||` (for “OR”) for combining two different vectors

```r
is_april = TRUE
is_r_released = TRUE

# Logical AND within a single vector
all(c(is_april, is_r_released, FALSE))
## [1] FALSE

# Logical OR within a single vector
any(c(is_april, is_r_released, FALSE))
## [1] TRUE

# Logical AND between vectors
is_april & is_r_released
## [1] TRUE
is_april && FALSE
## [1] FALSE

# Logical OR between vectors
is_april | is_r_released
## [1] TRUE
is_april || FALSE
## [1] TRUE
```

For scalars, the single-character operators (`&`, `|`) and the two-character operators (`&&`, `||`) return the same result, although `&&` and `||` only evaluate their right-hand side when it is needed. So why have a pair of operators for each concept?

• `&&` and `||` are intended for use solely with scalars, and they return a single logical value.
• `&` and `|` work with multivalued vectors, and they return a vector whose length matches their input arguments.

Since they always return a scalar logical, you should use `&&` and `||` in your if/while conditional expressions (when needed). If an `&` or `|` is used, you may end up with a non-scalar vector inside `if (...) {}` and R will throw an error.
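
A short sketch of why the scalar operators suit conditionals: `&&` evaluates from left to right and stops as soon as the result is known, so cheap checks can protect later ones. The helper name `is_single_positive()` is hypothetical.

```r
# Hypothetical helper: is `x` a single positive number?
is_single_positive <- function(x) {
  # `x > 0` is only evaluated when `x` is a length-1 numeric,
  # because `&&` stops at the first FALSE
  is.numeric(x) && length(x) == 1 && x > 0
}

is_single_positive(3)
## [1] TRUE
is_single_positive(c(1, 2)) # length check fails; `x > 0` never runs
## [1] FALSE
is_single_positive(numeric(0))
## [1] FALSE
is_single_positive("3")
## [1] FALSE
```

The result is always a single `TRUE` or `FALSE` for these inputs, so it is safe to use directly inside `if (...) {}`.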

To illustrate the difference between the scalar operators and vectorised operators, here’s an example:

```r
x = c(TRUE, TRUE, FALSE, FALSE)
y = c(TRUE, FALSE, TRUE, FALSE)
```

The vectorised operators apply AND/OR on matched pairs of elements:

```r
x & y # c(x[1] && y[1], x[2] && y[2], ...)
## [1]  TRUE FALSE FALSE FALSE

x | y # c(x[1] || y[1], x[2] || y[2], ...)
## [1]  TRUE  TRUE  TRUE FALSE
```

In R 4.2.0, a warning is thrown when a non-scalar input is passed to the scalar operators, but a scalar logical is still returned (here, the result of `x[1] && y[1]`). In earlier versions of R, no warning was printed.

```r
# R 4.2
x && y
[1] TRUE
Warning messages:
1: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
2: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
```

This could lead to hidden bugs. For example, if you used this code in an `if` conditional, a warning would be printed when a non-scalar vector was used but the code would continue happily:

```r
# R 4.2
if (x && y) {
  print("The world can't end today...")
}
[1] "The world can't end today..."
Warning messages:
1: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
2: In x && y : 'length(x) = 4 > 1' in coercion to 'logical(1)'
```

In R 4.3.0, this warning has been elevated to an error and no value is returned:

```r
# R 4.3
x && y
Error in x && y : 'length = 4' in coercion to 'logical(1)'
```

This stricter version of the scalar comparison operators will help catch those bugs where you didn’t realise a logical variable could contain more than one entry.
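
If existing code does pass vectors to `&&` or `||`, the fix depends on what was intended. The sketch below shows three possible rewrites for the `x` and `y` vectors used above; none of them is "the" replacement, and only the last reproduces what `x && y` returned before R 4.3.0.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# "Every pair of elements is TRUE"
all(x & y)
## [1] FALSE

# "At least one pair of elements is TRUE"
any(x & y)
## [1] TRUE

# "Only the first pair matters" (the value `x && y` used to return)
x[1] && y[1]
## [1] TRUE
```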

To check whether the strict comparison operators will affect your existing code, before upgrading to R 4.3.0, you can set an environment variable before running it:

```r
# In R:
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_" = TRUE)
```

### A more logical flow

Where else do we work with scalars in R? Many functions expect certain arguments to be scalars. For example, the `seq()` function complains with non-scalar arguments:

```r
seq(from = 1:3, to = 4)
## Error in seq.default(from = 1:3, to = 4): 'from' must be of length 1

seq(from = 1, to = 4:5)
## Error in seq.default(from = 1, to = 4:5): 'to' must be of length 1
```

There are several other places where R will throw an error if we provide a value that is of the wrong size:

```r
a_data_frame[[column_index]] # column_index must be a scalar
a_matrix[rows, cols] = value # value must match the size of the replaced element(s)
```
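
The matrix case can be sketched with a small example. The matrix `m` is made up for illustration, and `tryCatch()` is used only so that the script keeps running after the error. Note that a length-1 value is still accepted, because it is recycled across the replaced cells.

```r
m <- matrix(0, nrow = 2, ncol = 2)

# Two cells are replaced by two values: the sizes match
m[1:2, 1] <- c(10, 20)
m[, 1]
## [1] 10 20

# A single value is recycled to fill both cells
m[1:2, 2] <- 5
m[, 2]
## [1] 5 5

# Two cells but three values: R refuses, and tryCatch() captures the error
result <- tryCatch(
  {
    m[1:2, 2] <- c(1, 2, 3)
    "assigned"
  },
  error = function(e) "error"
)
result
## [1] "error"
```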

There are other places where R will throw a warning, and try to gracefully handle values that are of an unexpected size:

```r
# R's recycling rules are used to match the size of the vector input
c(1, 3, 5) * c(2, 3) # c(1 * 2, 3 * 3, 5 * 2)
## Warning in c(1, 3, 5) * c(2, 3): longer object length is not a multiple of
## shorter object length
## [1]  2  9 10

# The smaller vector was recycled to match the size of the larger
# c(1, 3, 5) * c(2, 3, 2)
```
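
The warning only appears when the longer length is not a multiple of the shorter one. As a sketch, the first call below recycles silently; if that is not what you want, a length check can be added by hand. The helper `multiply_strict()` is hypothetical and not part of base R.

```r
# Lengths 4 and 2: the shorter vector is recycled exactly twice, no warning
c(1, 3, 5, 7) * c(2, 3)
## [1]  2  9 10 21

# Hypothetical helper that insists on equal lengths before multiplying
multiply_strict <- function(a, b) {
  stopifnot(length(a) == length(b))
  a * b
}

multiply_strict(c(1, 3, 5), c(2, 3, 2))
## [1]  2  9 10
```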

An interesting case is the `:` operator, which, like `seq()`, can be used to create sequences of numbers.

```r
3:5
## [1] 3 4 5
```

If we provide a non-scalar on either side of the operator, R will warn us:

```r
# R 4.2
(1:2) : 5
[1] 1 2 3 4 5
Warning message:
In (1:2):5 : numerical expression has 2 elements: only the first used

# R 4.2
1 : (4:6)
[1] 1 2 3 4
Warning message:
In 1:(4:6) : numerical expression has 3 elements: only the first used
```

Now, because the output should be a single sequence, R has to pick a specific value for the start- and the end-point of that sequence from the arguments provided. It uses the first entry in each argument. So,

• `(1:2) : 5` is equivalent to `1:5`; and
• `1 : (4:6)` is equivalent to `1:4`.

If your code is providing non-scalar arguments to `:`, there may be a bug in your code or the packages that it depends upon. R 4.3.0 has introduced a stricter setting, which will catch the use of non-scalar values when constructing sequences with the `:` operator.

Much like with the stricter logic comparisons described above, the R developers have introduced this as an optional setting. After setting the environment variable `_R_CHECK_LENGTH_COLON_` to a true value, R will throw an error whenever an oversized argument is passed into `a:b`.

```r
# R 4.3
# Without the check enabled:
(1:2) : 5
[1] 1 2 3 4 5
Warning message:
In (1:2):5 : numerical expression has 2 elements: only the first used

# With the strict check enabled:
Sys.setenv("_R_CHECK_LENGTH_COLON_" = TRUE)
(1:2) : 5
Error in (1:2):5 : numerical expression has length > 1
```
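
If you would rather not rely on an environment variable, the same guarantee can be sketched in your own code with an explicit length check. The wrapper `colon_strict()` is hypothetical; it is not part of R.

```r
# Hypothetical wrapper: build from:to, but insist that both ends are scalars
colon_strict <- function(from, to) {
  stopifnot(length(from) == 1, length(to) == 1)
  from:to
}

colon_strict(1, 5)
## [1] 1 2 3 4 5

# A non-scalar end-point is rejected instead of being silently truncated
outcome <- tryCatch(colon_strict(1:2, 5), error = function(e) "rejected")
outcome
## [1] "rejected"
```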

### And finally: Extracting from a pipe

Have you started using the native pipe yet? In our blog post to celebrate the release of R 4.2.0, we showed this example:

```r
mtcars |> lm(mpg ~ disp, data = _)
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Coefficients:
## (Intercept)         disp
##    29.59985     -0.04122
```

Here the pipe `|>` passes the value on its left-hand side into the function on the right. By default that value will be used as the first argument to the right-hand function. But when an underscore is present, the piped-in value will replace that underscore. So the above is equivalent to:

```r
lm(mpg ~ disp, data = mtcars)
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Coefficients:
## (Intercept)         disp
##    29.59985     -0.04122
```

What if you want to extract values that are output by a pipeline? For example, if you want the `coef` entry from the linear model above. One way would be to store the results in a variable and extract the `coef` from that:

```r
model = mtcars |> lm(mpg ~ disp, data = _)
model$coef
## (Intercept)        disp
## 29.59985476 -0.04121512
```

Or you could wrap the pipeline in parentheses:

```r
(
  mtcars |> lm(mpg ~ disp, data = _)
)$coef
## (Intercept)        disp
## 29.59985476 -0.04121512
```

R 4.3.0 provides a neater solution: the underscore placeholder `_` can now be used at the head of an extraction call, such as `_$coef`, to pull a value out of the result of a pipeline:

```r
mtcars |> lm(mpg ~ disp, data = _) |> _$coef
## (Intercept)        disp
## 29.59985476 -0.04121512
```
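
The placeholder can also start a longer chain of extractions. The sketch below pulls out just the slope; it needs R 4.3.0 or later to parse, and the variable name `slope` is ours.

```r
# Requires R >= 4.3.0: `_` as the head of a chain of extractions
slope <- mtcars |> lm(mpg ~ disp, data = _) |> _$coef[["disp"]]
round(slope, 5)
## [1] -0.04122
```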

To avoid the pain of installing the latest development version of R, you can use Docker. The following commands pull and run the `devel` version of R:

```bash
docker pull rstudio/r-base:devel-jammy
docker run --rm -it rstudio/r-base:devel-jammy
```

See the `r-docker` project for more details.