Genetic algorithms: a simple R example

August 1, 2012

(This article was first published on FishyOperations » R, and kindly contributed to R-bloggers)

Genetic algorithm is a search heuristic. GAs can generate a vast number of possible model solutions and use these to evolve towards an approximation of the best solution of the model. Hereby it mimics evolution in nature.

GA generates a population, the individuals in this population (often called chromosomes) have a given state. Once the population is generated, the state of these individuals is evaluated and graded on their value. The best individuals are then taken and crossed-over – in order to hopefully generate 'better' offspring – to form the new population. In some cases the best individuals in the population are preserved in order to guarantee 'good individuals' in the new generation (this is called elitism).

The GA site by Marek Obitko has a great tutorial for people with no previous knowledge on the subject.

To explain the example I will use my version of the Knapsack problem.

You are going to spend a month in the wilderness. You’re taking a backpack with you, however, the maximum weight it can carry is 20 kilograms. You have a number of survival items available, each with its own number of “survival points”. You’re objective is to maximize the number of survival points.

The following table shows the items you can choose from.

item survivalpoints weight
pocketknife 10.00 1.00
beans 20.00 5.00
potatoes 15.00 10.00
unions 2.00 1.00
sleeping bag 30.00 7.00
rope 10.00 5.00
compass 30.00 1.00

In R I have used the package genalg to set-up the model. Later on, ggplot2 will be used to visualize the evolution of the model.

Let's define the dataset and weight constraint;


dataset <- data.frame(item = c("pocketknife", "beans", "potatoes", "unions", 
    "sleeping bag", "rope", "compass"), survivalpoints = c(10, 20, 15, 2, 30, 
    10, 30), weight = c(1, 5, 10, 1, 7, 5, 1))
weightlimit <- 20

Before creating the model we have to set-up an evaluation function. The evaluation function will evaluate the different individuals (chromosomes) of the population on the value of their gene configuration.

An individual can for example have the following gene configuration: 1001100.

Each number in this binary string represents whether or not to take an item with you. A value of 1 refers to putting the specific item in the knapsack while a 0 refers to leave the item at home. Given the example gene configuration we would take the following items;

chromosome = c(1, 0, 0, 1, 1, 0, 0)
dataset[chromosome == 1, ]
##           item survivalpoints weight
## 1  pocketknife             10      1
## 4       unions              2      1
## 5 sleeping bag             30      7

We can check to what amount of surivival points this configuration sums up.

cat(chromosome %*% dataset$survivalpoints)
## 42

Above we gave a value to the gene configuration of a given chromosome. This is exactly what the evaluation function does.

The genalg algorithm tries to optimize towards the minimum value. Therefore, the value is calculated as above and multiplied with -1. A configuration which leads to exceeding the weight constraint returns a value of 0 (a higher value can also be given).

We define the evaluation function as follows.

evalFunc <- function(x) {
    current_solution_survivalpoints <- x %*% dataset$survivalpoints
    current_solution_weight <- x %*% dataset$weight

    if (current_solution_weight > weightlimit) 
        return(0) else return(-current_solution_survivalpoints)

Next, we choose the number of iterations, design and run the model.

iter = 100
GAmodel <- rbga.bin(size = 7, popSize = 200, iters = iter, mutationChance = 0.01, 
    elitism = T, evalFunc = evalFunc)
## GA Settings
##   Type                  = binary chromosome
##   Population size       = 200
##   Number of Generations = 100
##   Elitism               = TRUE
##   Mutation Chance       = 0.01
## Search Domain
##   Var 1 = [,]
##   Var 0 = [,]
## GA Results
##   Best Solution : 1 1 0 1 1 1 1 

The best solution is found to be 1111101. This leads us to take the following items with us on our trip into the wild.

solution = c(1, 1, 1, 1, 1, 0, 1)
dataset[solution == 1, ]
##           item survivalpoints weight
## 1  pocketknife             10      1
## 2        beans             20      5
## 3     potatoes             15     10
## 4       unions              2      1
## 5 sleeping bag             30      7
## 7      compass             30      1

This in turn gives us the total number of survival points.

# solution vs available
cat(paste(solution %*% dataset$survivalpoints, "/", sum(dataset$survivalpoints)))
## 107 / 117

Let's visualize how the model evolves.

animate_plot <- function(x) {
    for (i in seq(1, iter)) {
        temp <- data.frame(Generation = c(seq(1, i), seq(1, i)), Variable = c(rep("mean", 
            i), rep("best", i)), Survivalpoints = c(-GAmodel$mean[1:i], -GAmodel$best[1:i]))

        pl <- ggplot(temp, aes(x = Generation, y = Survivalpoints, group = Variable, 
            colour = Variable)) + geom_line() + scale_x_continuous(limits = c(0, 
            iter)) + scale_y_continuous(limits = c(0, 110)) + geom_hline(y = max(temp$Survivalpoints), 
            lty = 2) + annotate("text", x = 1, y = max(temp$Survivalpoints) + 
            2, hjust = 0, size = 3, color = "black", label = paste("Best solution:", 
            max(temp$Survivalpoints))) + scale_colour_brewer(palette = "Set1") + 
            opts(title = "Evolution Knapsack optimization model")


# in order to save the animation
saveMovie(animate_plot(), interval = 0.1, outdir = getwd())

Animated GA graph

The x-axis denotes the different generations. The blue line shows the mean solution of the entire population of that generation, while the red line shows the best solution of that generation. As you can see, it takes the model only a few generations to hit the best solution, after that it is just a matter of time until the mean of the population of subsequent generations evolves towards the best solution.

For more information on genetic algorithms, check out:

The post Genetic algorithms: a simple R example appeared first on FishyOperations.

To leave a comment for the author, please follow the link and comment on their blog: FishyOperations » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , , ,

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training


CRC R books series

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)