# Where for (loop) ARt Thou?

[This article was first published on **rstats on Irregularly Scheduled Programming**, and kindly contributed to R-bloggers].


I’ve long been interested in exactly *how* R works – not quite enough for me to learn *all* the
internals, but I was surprised that I could not find a clear guide towards exactly how vectorization works at the deepest level.

Let’s say we want to add two vectors which we’ve defined as `x` and `y`

```r
x <- c(2, 4, 6)
y <- c(1, 3, 2)
```

One way to do this (the verbose, elementwise way) would be to add each pair of elements

```r
c(x[1] + y[1], x[2] + y[2], x[3] + y[3])
## [1] 3 7 8
```

but if you are familiar with not repeating yourself, you might write this in a loop. Best practice involves pre-filling the result to the correct size

```r
res <- c(NA, NA, NA)
for (i in 1:3) {
  res[i] <- x[i] + y[i]
}
res
## [1] 3 7 8
```

There, we wrote a `for` loop.

Now, if you’ve ever seen R tutorials or discussions, you’ve probably had it drilled into you that this is “bad” and should be replaced with something else. One of those options is an `apply`-family function

```r
plus <- function(i) {
  x[i] + y[i]
}
sapply(1:3, plus)
## [1] 3 7 8
```

or something ‘tidier’

```r
purrr::map_dbl(1:3, plus)
## [1] 3 7 8
```

(yes, yes, these functions are ‘bad’ because they don’t take `x` or `y` as arguments) but this still performs the operation elementwise. If you’ve seen even more R tutorials/discussions you’ve probably seen that *vectorization* is very handy - the R function `+` knows what to do with objects that aren’t just a single value, and does what you might expect

```r
x + y
## [1] 3 7 8
```

Now, if you’ve really read a lot about R, you’ll know that ‘under the hood’ a `for`-loop is involved in every one of these, but it’s “lower down”, “at the C level”. Jenny Bryan makes the point that “Of course someone has to write loops / It doesn’t have to be you” and for this reason, vectorization in R is of great benefit.

So, there *is* a loop, but where exactly does that happen?

At some point, the computer needs to add the elements of `x` to the elements of `y`, and the simplest version of this happens one element at a time, in a loop. There’s a big sidetrack here about SIMD which I’ll try to avoid, but I will mention that the Microsoft fork of R (the artist formerly known as Revolution R) running on Intel chips can do SIMD via MKL.

So, let’s start at the operator.

```r
`+`
## function (e1, e2)  .Primitive("+")
```

Digging into primitives is a little tricky, but `{pryr}` can help

```r
pryr::show_c_source(.Primitive("+"))
## + is implemented by do_arith with op = PLUSOP
```

We can browse a copy of the source for `do_arith` (in arithmetic.c) here, where we see some logic paths for scalars and vectors. Let’s assume we’re working with our example, which has `length(x) == length(y) > 1`. With two non-scalar arguments

```
if !IS_SCALAR and argc == length(arg) == 2
```

this leads us to call `R_binary`.

Depending on the class of the arguments, we need to call different functions, but for the sake of our example let’s say we have non-integer real numbers, so we fork to `real_binary`. This takes a `code` argument for which type of operation we’re performing, and in our case it’s `PLUSOP` (noted above). There’s a `case` branch for this in which, provided the arguments are of the same length (`n1 == n2`), we call

```c
R_ITERATE_CHECK(NINTERRUPT, n, i, da[i] = dx[i] + dy[i];);
```

That’s starting to look a lot like a loop - there’s an iterator `i` and we’re going to call another macro.

This jumps us over to a different file where we see that `LOOP_WITH_INTERRUPT_CHECK` definitely performs some sort of loop. This takes the body above and the argument `R_ITERATE_CORE` which is finally the actual loop!

```c
#define R_ITERATE_CORE(n, i, loop_body) do { \
    for (; i < n; ++i) { loop_body } \
} while (0)
```

so, that’s where the *actual* loop in a vectorized R call happens! *ALL* that sits behind the innocent-looking `+`.

That was thoroughly satisfying, but I did originally have in mind comparing R to another language - one where loops aren’t frowned upon because of performance, but rather encouraged… How do Julia loops differ?

Julia is not a vectorized language per se, but it has a neat ability to “vectorize” any operation, though in Julia syntax it’s “broadcasting”.

Simple addition can combine scalar values

```julia
3 + 4
## 7
```

Julia actually has scalar values (in R, even a single value is just a vector of length 1) so a single value could be

```julia
typeof(3)
## Int64
```

whereas several values need to be an `Array`, even if it only has 1 dimension

```julia
Vector{Int64}([1, 2, 3])
## 3-element Array{Int64,1}:
##  1
##  2
##  3
```

Trying to add two `Array`s does work

```julia
[1, 2, 3] + [4, 5, 6]
## 3-element Array{Int64,1}:
##  5
##  7
##  9
```

but only because a specific method has been written for this case, i.e.

```julia
methods(+, (Array, Array))
## # 1 method for generic function "+":
## [1] +(A::Array, Bs::Array...) in Base at arraymath.jl:43
```

One thing I particularly like is that we can see *exactly* which method was called using the `@which` macro

```julia
@which [1, 2, 3, 4] + [1, 2, 3, 4]
## +(A::Array, Bs::Array...) in Base at arraymath.jl:43
```

something that I really wish was easier to do in R. The `@edit` macro even jumps us right into the actual code for this dispatched call.

This ‘add vectors’ problem *can* be solved through broadcasting, which performs an operation elementwise

```julia
[1, 2, 3] .+ [4, 5, 6]
## 3-element Array{Int64,1}:
##  5
##  7
##  9
```

A fun fact about this I recently learned is that broadcasting works on *any* operation, even if that’s the pipe itself

```julia
["a", "list", "of", "strings"] .|> [uppercase, reverse, titlecase, length]
## 4-element Array{Any,1}:
##  "A"
##  "tsil"
##  "Of"
##  7
```

Back to our loops, the method for `+` on two `Array`s points us to arraymath.jl (linked to the current relevant line) which contains

```julia
function +(A::Array, Bs::Array...)
    for B in Bs
        promote_shape(A, B) # check size compatibility
    end
    broadcast_preserving_zero_d(+, A, Bs...)
end
```

The last part of that is the meaningful part, and that leads to `Broadcast.broadcast_preserving_zero_d`.

This starts to get out of my depth, but essentially

```julia
@inline function broadcast_preserving_zero_d(f, As...)
    bc = broadcasted(f, As...)
    r = materialize(bc)
    return length(axes(bc)) == 0 ? fill!(similar(bc, typeof(r)), r) : r
end
@inline broadcast(f, t::NTuple{N,Any}, ts::Vararg{NTuple{N,Any}}) where {N} = map(f, t, ts...)
```

involves a `map` operation to achieve the broadcasting.

Perhaps that’s a problem to tackle when I’m better at digging through Julia.

As always, comments, suggestions, and anything else welcome!

## `devtools::session_info()`

```r
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       Pop!_OS 21.04
##  system   x86_64, linux-gnu
##  ui       X11
##  language en_AU:en
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Adelaide
##  date     2022-04-22
##
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib source
##  blogdown      1.8     2022-02-16 [1] CRAN (R 4.1.2)
##  bookdown      0.24    2021-09-02 [1] CRAN (R 4.1.2)
##  brio          1.1.1   2021-01-20 [3] CRAN (R 4.0.3)
##  bslib         0.3.1   2021-10-06 [1] CRAN (R 4.1.2)
##  cachem        1.0.3   2021-02-04 [3] CRAN (R 4.0.3)
##  callr         3.7.0   2021-04-20 [1] CRAN (R 4.1.2)
##  cli           3.2.0   2022-02-14 [1] CRAN (R 4.1.2)
##  crayon        1.5.0   2022-02-14 [1] CRAN (R 4.1.2)
##  desc          1.4.1   2022-03-06 [1] CRAN (R 4.1.2)
##  devtools      2.4.3   2021-11-30 [1] CRAN (R 4.1.2)
##  digest        0.6.27  2020-10-24 [3] CRAN (R 4.0.3)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.2)
##  evaluate      0.14    2019-05-28 [3] CRAN (R 4.0.1)
##  fastmap       1.1.0   2021-01-25 [3] CRAN (R 4.0.3)
##  fs            1.5.0   2020-07-31 [3] CRAN (R 4.0.2)
##  glue          1.6.1   2022-01-22 [1] CRAN (R 4.1.2)
##  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.1.2)
##  jsonlite      1.7.2   2020-12-09 [3] CRAN (R 4.0.3)
##  JuliaCall   * 0.17.4  2021-05-16 [1] CRAN (R 4.1.2)
##  knitr         1.37    2021-12-16 [1] CRAN (R 4.1.2)
##  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.2)
##  magrittr      2.0.1   2020-11-17 [3] CRAN (R 4.0.3)
##  memoise       2.0.0   2021-01-26 [3] CRAN (R 4.0.3)
##  pkgbuild      1.2.0   2020-12-15 [3] CRAN (R 4.0.3)
##  pkgload       1.2.4   2021-11-30 [1] CRAN (R 4.1.2)
##  prettyunits   1.1.1   2020-01-24 [3] CRAN (R 4.0.1)
##  processx      3.5.2   2021-04-30 [1] CRAN (R 4.1.2)
##  ps            1.5.0   2020-12-05 [3] CRAN (R 4.0.3)
##  purrr         0.3.4   2020-04-17 [3] CRAN (R 4.0.1)
##  R6            2.5.0   2020-10-28 [3] CRAN (R 4.0.2)
##  Rcpp          1.0.6   2021-01-15 [3] CRAN (R 4.0.3)
##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.1.2)
##  rlang         1.0.1   2022-02-03 [1] CRAN (R 4.1.2)
##  rmarkdown     2.13    2022-03-10 [1] CRAN (R 4.1.2)
##  rprojroot     2.0.2   2020-11-15 [3] CRAN (R 4.0.3)
##  rstudioapi    0.13    2020-11-12 [3] CRAN (R 4.0.3)
##  sass          0.4.0   2021-05-12 [1] CRAN (R 4.1.2)
##  sessioninfo   1.1.1   2018-11-05 [3] CRAN (R 4.0.1)
##  stringi       1.5.3   2020-09-09 [3] CRAN (R 4.0.2)
##  stringr       1.4.0   2019-02-10 [3] CRAN (R 4.0.1)
##  testthat      3.1.2   2022-01-20 [1] CRAN (R 4.1.2)
##  usethis       2.1.5   2021-12-09 [1] CRAN (R 4.1.2)
##  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.2)
##  xfun          0.30    2022-03-02 [1] CRAN (R 4.1.2)
##  yaml          2.2.1   2020-02-01 [3] CRAN (R 4.0.1)
##
## [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library
```
