(This article was first published on

In competitive swimming, the medley relay is a team event in which four different swimmers each swim one of the four strokes: freestyle, breaststroke, backstroke, and butterfly. In general, every swimmer might be able swim any given stroke. How can we choose the fastest relay team? Here we solve this by enumerating all possible teams, though a more efficient routine likely exists.**SAS and R**, and kindly contributed to R-bloggers)Some example practice times can be seen on this Google Spreadsheet. Also, using the steps outlined here, the same spreadsheet is available as a CSV file here. (FTR, these are actual practice times for 100 yards for mostly 12-13 year-old swimmers; the names have been changed.)

**SAS**

We first read the data from the URL, using the technique outlined in section 1.1.6. Note that if you cut-and-paste this, you'll need to get the whole URL onto one line-- we break it up here for display only.

filename swimurl url 'https://docs.google.com/spreadsheet

/pub?key=0AvJKgZUzMYLYdE5xTHlEWkNUM3NoOHB1ZTJoTFMzUUE&

single=true&gid=0&output=csv';

proc import datafile=swimurl out=swim dbms=csv;

run;

Next, we use the

`point=`option in nested

`set`statements to generate a single data set with all the possible combinations of names and times. Meanwhile we change the names of the variables so they don't get overwritten in the next

`set`statement. Note the use of the

`nobs`option to find the number of rows in the data set.

data permute

(keep=free freetime fly flytime back backtime breast breasttime);

set swim (rename = (swimmer=free freestyle=freetime)) nobs=nobs;

do i = 1 to nobs;

set swim(rename = (swimmer=fly butterfly=flytime)) point=i;

do j = 1 to nobs;

set swim(rename = (swimmer=back backstroke=backtime)) point=j;

do k = 1 to nobs;

set swim(rename = (swimmer=breast breaststroke=breasttime))

point = k;

output;

end;

end;

end;

run;

The resulting data set has 12^4 rows, and includes rosters with the same swimmer swimming all four legs. In fact, a quick glance will show that Anna has the best time in each stroke, and thus the best "team" based on these practice times would use her for each stroke. This is against the rules, and also probably isn't reflective of performance in a race. We'll remove illegal line-ups using a

`where`statement (section 1.5.1) and also calculate the total team time.

data prep;

set permute;

where free ne back and free ne breast and free ne fly and

back ne breast and back ne fly and breast ne fly;

time = sum(freetime, flytime, backtime, breasttime);

run;

The resulting data set has (12 permute 4) lines. To find the best team, we just sort by the total time and look at the first line. Here the first 10 lines (10 best teams) are shown.

proc sort data=prep; by time; run;

proc print data=prep (obs=10); run;

b

r

f b e

r f a a

e l c b s

e y k r t

f t t b t e t t

r i f i a i a i i

e m l m c m s m m

e e y e k e t e e

Kara 109.3 Dora 126.8 Lara 117.7 Anna 126.9 480.7

Anna 102.8 Dora 126.8 Lara 117.7 Beth 134.6 481.9

Kara 109.3 Anna 120.5 Lara 117.7 Beth 134.6 482.1

Anna 102.8 Dora 126.8 Lara 117.7 Honora 136.4 483.7

Kara 109.3 Jane 129.8 Lara 117.7 Anna 126.9 483.7

Kara 109.3 Anna 120.5 Lara 117.7 Dora 136.4 483.9

Kara 109.3 Anna 120.5 Lara 117.7 Honora 136.4 483.9

Kara 109.3 Lara 123.1 Jane 124.7 Anna 126.9 484.0

Anna 102.8 Dora 126.8 Lara 117.7 Inez 136.8 484.1

Carrie 112.7 Dora 126.8 Lara 117.7 Anna 126.9 484.1

The best time shaves a whole second off the predicted time using the second-best team.

**R**

Since published Google Spreadsheets use

`https`rather than

`http`, we use the

`RCurl`package and its

`getURL()`function. (Note that if you cut-and-paste this, you'll need to get the whole URL onto one line-- we break it up here for display only.) Then we can read the data with

`read.csv()`and

`textConnection()`.

library(RCurl)

swim = getURL("https://docs.google.com/spreadsheet

/pub?key=0AvJKgZUzMYLYdE5xTHlEWkNUM3NoOHB1ZTJoTFMzUUE&

single=true&gid=0&output=csv")

swim2=read.csv(textConnection(swim))

To make an object with the combinations of names, we use the

`expand.grid()`function highlighted in Example 7.22, providing as arguments the swimmers names four times. As in the SAS example, the result has has 12^4 rows. The

`combn()`function might be a better fit here, but was more difficult to use.

test2 = expand.grid(swim2$Swimmer,swim2$Swimmer, swim2$Swimmer, swim2$Swimmer)

It'll be useful to assign these copies of the names to each of the strokes. We'll do that with the

`rename()`function available in the

`reshape`package. (This approach is mentioned in section 1.3.4.). Then we can remove the rows where the same name appears twice using some logic. The logic is nested in the

`with()`function to save some keystrokes and is generally preferable to

`attach()`ing the test2 object.

library(reshape)

test2 = rename(test2, c("Var1" = "free", "Var2" = "fly",

"Var3" = "back", "Var4" = "breast"))

test3 = with(test2, test2[(free != breast) & (free != fly)

& (free != back) & (breast != fly) & (breast != back)

& (fly != back) ,])

Finally, we can use the

`which.min()`function to pick the best team.

> bestteam =

+ test3[which.min(swim2$Freestyle[test3$free]+swim2$Breaststroke[test3$breast] +

+ swim2$Butterfly[test3$fly] + swim2$Backstroke[test3$back]),]

>bestteam

free fly back breast

1631 Kara Dora Lara Anna

For new users of R, this may look very peculiar-- it uses elegant but powerful features of the R language that may be challenging for new users to grasp. Essentially, in

`swim2$Freestyle[test3$free]`we say: from the "freestyle" times in the swim2 object, take the time from the row that has the name in a row of "free" names in the test3 object. The

`which.min()`function replicates this request for every row in the test3 object (which has all of the permutations) in it, returning the row number with that minimum sum. The outer

`test3[rows,columns]`syntax grabs the

*values*in this row. (The number 1631 is the row number, for some reason showing the row in the test2 object created by

`expand.grid()`.)

Now, we might also want the actual times associated with the best team. We can find them by calling the correct rows (names from the best team) and columns (stroke associated with that name) from the original data set.

> times = c(swim2[swim2$Swimmer == bestteam$free,2],

+ swim2[swim2$Swimmer == bestteam$fly,3],

+ swim2[swim2$Swimmer == bestteam$back,4],

+ swim2[swim2$Swimmer == bestteam$breast,5])

> times

[1] 109.3 126.8 117.7 126.9

If instead, one wanted to list the times in order, one approach would be to add columns to the test3 object with the time for each stroke, calculate their sum, and sort on the sum.

To

**leave a comment**for the author, please follow the link and comment on his blog:**SAS and R**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...