Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Antonio Sánchez

Learning to code can be quite hard. Apart from the difficulties of learning a new language, following a book can be quite boring. From my point of view, one of the bests ways to become a good programmer is choosing small and funny experiments oriented to train specific techniques of programming. This is what I usually do in my blog Fronkonstin. In this tutorial, we will learn to combine C++ with R to create efficient loops. We will also learn interesting things about ggplot2, a very important package for data visualization. The excuse to do all this is to create beautiful images using a mathematical concept called Strange Attractors, that we will discuss briefly also. Combining art and coding is often called generative art, so be ready because after following this tutorial you can discover a new use of R to fall for as I am. Here we go!

## Introduction

Iteration is an indispensable technique in every programming language. Mastering it will help you to produce robust, fast, concise and easy to understand code. Iterating is repeating the same task over and over and the first way to do that efficiently is writing a loop, which is a way to resume many lines of code in just a few of them. In R, you can do this using the for function. For example, this loop calculates the mean of each column of a data frame called df:

library(tidyverse)

n <- 1000 # Rows of data frame

# df with 4 columns and n rows sampled from N(0,1)
df <- tibble(
a = rnorm(n),
b = rnorm(n),
c = rnorm(n),
d = rnorm(n)
)

output <- vector("double", ncol(df)) # Empty output
# For loop
for (i in 1:ncol(df)) {
output[[i]] <- mean(df[[i]])
}
output
## [1] -0.0069680221  0.0247380568  0.0014428622 -0.0002675645

In the previous example, we repeated 4 times the task (which is the piece of code inside the curly brackets). Another way to do iterations in R, sometimes more efficient than a for loop, is by using the apply family. This is how the previous loop can be rewritten using the apply function:

output <- apply(df, 2, mean)
output

These approaches have limitations. For example, if you want to use the result of an iteration as the input of the next one, you can not use apply. Another one is that sometimes R is not enough fast, which makes the for function quite inefficient. When you face this kind of situations, a good option is combinnig C++ and R. To illustrate this, let’s briefly introduce a stunning mathematical creature: the strange attractors.

## What are strange attractors?

In mathematics, a dynamical system is a set of functions which describes the time dependence of a point in a geometrical space. An attractor is a set of numerical values towards a dynamical system tend to evolve in discrete time regardless of their starting conditions. Simple attractors can be fixed points, sets of points or limit cycles. More interesting attractors are strange, chaotic or itinerant attractors, which span an array of possible states in which a dynamical system can roam around without repeating itself.

One of the best-known examples of strange attractors was discovered by Kiyohiro Ikeda 1 and bears his name. It is defined by the next equations:

$x_{n+1}=1+u(x_{n}\cos t_{n}-y_{n}\sin t_{n}) \\ y_{n+1}=u(x_{n}\sin t_{n}+y_{n}\cos t_{n})$

with $$u\ge0.6$$ and

$t_{n}=0.4-{\frac {6}{1+x_{n}^{2}+y_{n}^{2}}}$

In this example, our imaginary point moves in a 2D dimensional space, since we just need (x, y) coordinates to describe its movement. In the previous system, time is represented by n. As you can see, the Ikeda’s system only depends on one parameter, called u. The location of the point at n depends only on its location at n-1. For example, this image 2 shows the trajectories of 2000 random points following this previous equation with $$u = 0.918$$:

All points, regardless of their starting place, tend to end following the same path, represented in bright white color. Another interesting property of strange attractors is that points that are very close at some instant, can differ significantly in the next step. This is why strange attractors are also known as chaotic attractors.

Today we will experiment with Clifford attractors, which are defined by these equations:

$x_{n+1} = sin(a \: y_{n}) + c \: cos(a \: x_{n}) \\ y_{n+1} = sin(b \: x_{n}) + d \: cos(b \: y_{n})$

These attractors were discovered by Clifford Pickover, an american scientific divulgator and depend on four parameters; a, b, c and d.

## Populating attractors: an inefficient approach

To represent the trajectory of a particular point, we need to store in a data frame its locations at every single (discrete) moment of time. For example, this is how we can do it using a for loop:

# Parameters setting
a <- -1.25
b <- -1.25
c <- -1.82
d <- -1.91

# Number of jumps
n <- 5

# Initialization of df to store locations of point
df <- data.frame(x = double(n+1), y = double(n+1))

# The point starts at (0,0)
df[1, "x"] <- 0
df[1, "y"] <- 0

# For loop to store locations
for (i in 1:n) {
df[i+1, "x"] <- sin(a*df[i, "y"])+c*cos(a*df[i, "x"])
df[i+1, "y"] <- sin(b*df[i, "x"])+d*cos(b*df[i, "y"])
}

df
##            x          y
## 1  0.0000000  0.0000000
## 2 -1.8200000 -1.9100000
## 3  1.8629450  2.1543131
## 4  0.8172482  0.9946399
## 5 -1.8969558 -1.4673201
## 6  2.2716156  1.1936322

As you can see, our point started at (0,0) and jumped 5 times (the number of jumps is represented by n). You can try to change n from 5 to 10000000 and see what happens (disclaimer: be patient).

A better approach to do it is using purrr, a functional programming package which provides useful tools for iterating through lists and vectors, generalizing code and removing programming redundancies. The purrr tools work in combination with functions, lists and vectors and results in code that is consistent and concise. This can be a purrr approach to our problem:

library(purrrlyr)

n <- 5

# Initialization of our data frame
df <- tibble(x = numeric(n+1),
y = numeric(n+1))

# Convert our data frame into a list
by_row(df, function(v) list(v)[[1L]], .collate = "list")$.out -> df # This function computes current location depending of previous one f <- function(j, k, a, b, c, d) { tibble( x = sin(a*j$y)+c*cos(a*j$x), y = sin(b*j$x)+d*cos(b*j$y) ) } # We apply accumulate on our list to store all steps and convert into a data frame accumulate(df, f, a, b, c, d) %>% map_df(~ .x) -> df df ## # A tibble: 6 x 2 ## x y ## <dbl> <dbl> ## 1 0 0 ## 2 -1.82 -1.91 ## 3 1.86 2.15 ## 4 0.817 0.995 ## 5 -1.90 -1.47 ## 6 2.27 1.19 Even this approach is not efficient since the accumulate function must be applied on an existing object (the df in our case) ## Combining C++ and R: the good way The next paragraph, extracted from Hadley Wickham’s Advanced R, explains the motivations of combining C++ and R: Rcpp makes it very simple to connect C++ to R and provides a clean, approachable API that lets you write high-performance code, insulated from R’s complex C API. Typical bottlenecks that C++ can address include: • Loops that can’t be easily vectorised because subsequent iterations depend on previous ones. • Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R. • Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues. This code populates our data frame using C++: # Load in libaries library(Rcpp) library(tidyverse) # Here comes the C++ code cppFunction('DataFrame createTrajectory(int n, double x0, double y0, double a, double b, double c, double d) { // create the columns NumericVector x(n); NumericVector y(n); x[0]=x0; y[0]=y0; for(int i = 1; i < n; ++i) { x[i] = sin(a*y[i-1])+c*cos(a*x[i-1]); y[i] = sin(b*x[i-1])+d*cos(b*y[i-1]); } // return a new data frame return DataFrame::create(_["x"]= x, _["y"]= y); } ') a <- -1.25 b <- -1.25 c <- -1.82 d <- -1.91 # Call the function starting at (0,0) to store 10M jumps df <- createTrajectory(10000000, 0, 0, a, b, c, d) head(df) ## x y ## 1 0.0000000 0.0000000 ## 2 -1.8200000 -1.9100000 ## 3 1.8629450 2.1543131 ## 4 0.8172482 0.9946399 ## 5 -1.8969558 -1.4673201 ## 6 2.2716156 1.1936322 Note that, apart from Rcpp, we also loaded the tidyverse package. This is because we will use some of its features, as dplyr or ggplot2, later. The function createTrajectory is written in C++ and stores the itinerary of an initial point. You can try it to see that It’s extremely fast. In the previous example, I populated on the fly our data frame with 10 million points starting at (0,0) in less than 5 seconds in a conventional laptop. ## Rendering the plot with ggplot Once we have built our data frame, it’s time to represent it with ggplot. Making graphics is a necessary skill and for this ggplot is an amazing tool, which by the way, I encourage you to learn. Usually, a good visualization is the best way to show your results. R is an extraordinary tool for visualizing data and one of the most important libraries to do it is ggplot2: it is powerful, flexible and is updated to include functionalities regularly. Although its syntax may seem a bit strange at first, here are four important concepts from my point of view to help you out with understanding it: • It’s truly important to have clear the type of graph you want to plot: for each one there’s a different geometry: geom_point to represent points, geom_line to represent time series, geom_bar to do bar chars … Have a look to ggplot2 package, there are many other types. • Each of those geometries needs to define what data is to be used and how. For example, as you will see in our graph, we placed on the x-axis the x column of our data frame df and y on the y-axis, so we did aes(x = x, y = y). • You can also combine geometries. We didn’t do that for our example but you may want, for example, to combine bars and lines on the same plot. To do it you can just use + operator. This operator is very important: it doesn’t only combine geometries, it’s also used to add features to the plot as we did to define its aspect ratio (with coord_equal) or its appearance (with theme_void). • You can modify every detail of the look and feel of your graph but be patient. To begin with, I recommend the use of predefined themes as I did with theme_void. There are many of them and it’s is a quick and easy way to change the appearance of your plot: just a way of making life easier and simpler. The ggplot2 package is a new world by itself and takes part of the tidyverse, a collection of efficient libraries designed to analyze data in a consistent way. This is how we can represent the points of our data frame: ggplot(df, aes(x = x, y = y)) + geom_point(shape = 46, alpha = .01) + coord_equal() + theme_void() -> plot ggsave("strange1.png", plot, height = 5, width = 5, units = 'in') The syntax of our plot is quite simple. Just two comments about it: • Shape is set to 46, the smallest filled point. More info about shapes, here • Since there are a lot of points, I make them transparent with alpha to take advantage on its overlapping and reveal patterns of shadows. ## Playing with parameters Strange attractors are extremely sensitive to their parameters. This feature makes them very entertaining as well as an infinite source of surprises. This is what happens to Clifford attractor when setting a, b, c and d parameters to -1.2, -1.9, 1.8 and -1.6 respectively: a <- -1.2 b <- -1.9 c <- 1.8 d <- -1.6 df <- createTrajectory(10000000, 0, 0, a, b, c, d) ggplot(df, aes(x = x, y = y)) + geom_point(shape = 46, alpha = .01) + coord_equal() + theme_void() -> plot ggsave("strange2.png", plot, height = 5, width = 5, units = 'in') Don’t you feel like playing with parameters? From my point of view, this is a very classy way of spending your time. ## The General 2D-Map Once we know what strange attractors look like, let’s try another family of them. This is the new formula we will experiment with: $x_{n+1} = a_{1} + a_{2}x_{n} + a_{3}y_{n} + a_{4}|x_{n}|^{a_{5}} + a_{6}|y_{n}|^{a_{7}} \\ y_{n+1} = a_{8} + a_{9}x_{n} + a_{10}y_{n} + a_{11}|x_{n}|^{a_{12}} + a_{13}|y_{n}|^{a_{14}}$ These attractors are localized mostly to a small region of the XY plane with tentacles that stretch out to large distances. If any of the exponents are negative and the attractor intersects the line along which the respective variable is zero, a point on the line maps to infinity. However, large values are visited infrequently by the orbit, so many iterations are required to determine that the orbit is unbounded. For this reason most of the attractors in the figures have holes in their interiors where their orbits are precluded from coming too close to their origins. cppFunction('DataFrame createTrajectory(int n, double x0, double y0, double a1, double a2, double a3, double a4, double a5, double a6, double a7, double a8, double a9, double a10, double a11, double a12, double a13, double a14) { // create the columns NumericVector x(n); NumericVector y(n); x[0]=x0; y[0]=y0; for(int i = 1; i < n; ++i) { x[i] = a1+a2*x[i-1]+ a3*y[i-1]+ a4*pow(fabs(x[i-1]), a5)+ a6*pow(fabs(y[i-1]), a7); y[i] = a8+a9*x[i-1]+ a10*y[i-1]+ a11*pow(fabs(x[i-1]), a12)+ a13*pow(fabs(y[i-1]), a14); } // return a new data frame return DataFrame::create(_["x"]= x, _["y"]= y); } ') a1 <- -0.8 a2 <- 0.4 a3 <- -1.1 a4 <- 0.5 a5 <- -0.6 a6 <- -0.1 a7 <- -0.5 a8 <- 0.8 a9 <- 1.0 a10 <- -0.3 a11 <- -0.6 a12 <- -0.3 a13 <- -1.2 a14 <- -0.3 df <- createTrajectory(10000000, 1, 1, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14) ggplot(df) + geom_point(aes(x = x, y = y), shape = 46, alpha = .01, color = "black") + coord_fixed() + theme_void() -> plot ggsave("strange3.png", plot, height = 5, width = 5, units = 'in') Ops! What happened? Why our image is so small now? The reason is because some few points are located quite far from the vast majority (you can think about them as outliers). Let’s remove 5% of extreme points from both tails independently: mx <- quantile(df$x, probs = 0.05)
Mx <- quantile(df$x, probs = 0.95) my <- quantile(df$y, probs = 0.05)
My <- quantile(df\$y, probs = 0.95)

df %>% filter(x > mx, x < Mx, y > my, y < My) -> df

ggplot(df) +
geom_point(aes(x = x, y = y), shape = 46, alpha = .01, color = "black") +
coord_fixed() +
theme_void() -> plot

ggsave("strange3.png", plot, height = 5, width = 5, units = 'in')

Now a stunning image appears.

Sometimes you will need to process the data frame after you generate it since:

• Extreme values can hide the pattern as happened before or even scape to infinite (solution: remove outliers).
• Some images can be quite boring: from my point of view, the more distributed ae points on the XY plane, the nicer is the resulting image (solution: try with a small data frame and keep it if some measure of dispersion over the plane or the individual axis is high enough; in this case, compute the big picture).

To conclude, let’s change the appearance of our plot. We will use the theme function to customize it setting the background color to black and adding a white line as frame.

opt <-  theme(legend.position  = "none",
panel.background = element_rect(fill="black", color="white"),
plot.background  = element_rect(fill="black"),
axis.ticks       = element_blank(),
panel.grid       = element_blank(),
axis.title       = element_blank(),
axis.text        = element_blank())

plot <- ggplot(df) +
geom_point(aes(x = x, y = y), shape = 46, alpha = .01, color = "white") +
coord_fixed() +
opt

ggsave("strange4.png", plot, height = 5, width = 5, units = 'in')

## Summary

More about recursivity and ggplot:

• Hadley’s R for Data Science and Advanced R: the first one covers iteration methods (including functional ones) as well as ggplot and the second one introduces Rcpp to use C++ when R is not fast enough.

1. K.Ikeda, Multiple-valued Stationary State and its Instability of the Transmitted Light by a Ring Cavity System, Opt. Commun. 30 257-261 (1979); K. Ikeda, H. Daido and O. Akimoto, Optical Turbulence: Chaotic Behavior of Transmitted Light from a Ring Cavity, Phys. Rev. Lett. 45, 709–712 (1980)

2. Accelerometer CC BY-SA 3.0