(This article was first published on **mickeymousemodels**, and kindly contributed to R-bloggers)

I thought of this after reading this post, and perhaps also this one, on the Cheap Talk blog. Here's the puzzle: in general, being tall does not make you slow, but among professional tennis players the tall athletes do tend to be relatively sluggish. Why does this happen? Cheap Talk gives a perfectly good written explanation, and I thought I'd complement it with something graphical.

Suppose that, in the general population, the distribution of height and speed looks roughly like the scatterplot produced by the first plot in the R code at the end of this post (saved as `height_and_speed.png`).

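To simulate this population, I first give each individual a height H = 4 + U1 + U2 + U3 feet, with the U independently uniform on (0, 1), so heights fall in a rough bell curve from 4 to 7 feet. The next step is to give individuals a maximum footspeed S = 10 + U4 + U5 + U6 mph, with these U independently uniform on (0, 5). By construction, speed is independent of height, and it falls more or less in a bell curve from 10 to 25 mph. Fun anecdote: my population is too slow to include Usain Bolt, whose top footspeed is close to 28 mph.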
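A quick calculation shows where these two bell curves are centered and how wide they are. It uses the height construction from the R code at the end of the post (H = 4 + U1 + U2 + U3, each U uniform on (0, 1)) and the standard fact that a Uniform(0, a) variable has mean a/2 and variance a²/12:

$$
\begin{aligned}
\mathbb{E}[H] &= 4 + 3 \cdot \tfrac{1}{2} = 5.5 \text{ ft}, & \operatorname{SD}(H) &= \sqrt{3 \cdot \tfrac{1}{12}} = 0.5 \text{ ft},\\
\mathbb{E}[S] &= 10 + 3 \cdot \tfrac{5}{2} = 17.5 \text{ mph}, & \operatorname{SD}(S) &= \sqrt{3 \cdot \tfrac{25}{12}} = 2.5 \text{ mph},\\
\operatorname{Cov}(H, S) &= 0 \quad \text{(no } U \text{ is shared between } H \text{ and } S\text{)}.
\end{aligned}
$$

A 28 mph sprinter would sit 4.2 standard deviations above the mean speed and, in any case, beyond the hard 25 mph ceiling of this construction.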

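Back to tennis. Let's imagine that tennis ability increases with both height and speed and, moreover, that those two attributes are substitutable: if you're short (and have a weak serve), you can make up for it by being fast. With that in mind, let's revisit the scatterplot, this time keeping only the people good enough to play professionally (the right-hand panel produced by the R code at the end of this post). They sit in the upper-right corner of the cloud, beyond a boundary that slopes downward, so within that group height and speed are negatively correlated: a short player who made the cut is probably very fast, and a slow one is probably very tall.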
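To put a number on that downward slope, here is a minimal sketch that regenerates the population from the R code at the end of the post and compares correlations. One simplification is assumed: instead of the author's smooth logistic rule, it treats everyone with (height / 6)^2 + (speed / 20)^2 > 2 as a tennis player, and it uses a larger sample so the estimates are stable.

```r
# Assumption: a hard cutoff on (height/6)^2 + (speed/20)^2 replaces the
# smooth logistic selection rule used in the code at the end of the post.
set.seed(1)
n <- 100000
height <- 4 + runif(n) + runif(n) + runif(n)                     # feet, 4 to 7
speed  <- 10 + runif(n, 0, 5) + runif(n, 0, 5) + runif(n, 0, 5)  # mph, 10 to 25

score     <- (height / 6)^2 + (speed / 20)^2
is_player <- score > 2                                 # simplified selection

cor_all     <- cor(height, speed)                        # whole population
cor_players <- cor(height[is_player], speed[is_player])  # selected players only

cat(sprintf("Population:     r = %.3f\n", cor_all))
cat(sprintf("Tennis players: r = %.3f (n = %d)\n", cor_players, sum(is_player)))
```

The population correlation should come out close to zero, while the correlation among the selected players should be clearly negative; the exact values depend on the seed.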

As hinted in the title, you can apply the exact same thinking to fuzziness and techiness among college students, in place of height and speed among professional tennis players. Literary and mathematical ability might be more or less independent in the general population; however, being admitted to university requires that you be excellent at one of them, or solidly above average at both. Just as before, excluding everyone else creates a downward slope, which might explain why fuzziness and techiness are negatively correlated among college students.
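The same mechanism can be sketched for admissions. The rule below is purely hypothetical: abilities are assumed to be independent standard normal scores, "excellent" is taken to mean more than 2 standard deviations above average, and "solidly above average" means more than 1.

```r
# Hypothetical admissions rule (an illustration, not real admissions data):
# admit if excellent at one skill (> 2 SD) or solidly above average at both (> 1 SD).
set.seed(42)
n <- 100000
literary <- rnorm(n)   # assumed independent of math ability
math     <- rnorm(n)
admitted <- literary > 2 | math > 2 | (literary > 1 & math > 1)

cor_everyone <- cor(literary, math)
cor_students <- cor(literary[admitted], math[admitted])

cat(sprintf("Everyone: r = %.3f\n", cor_everyone))
cat(sprintf("Admitted: r = %.3f (%.1f%% of applicants)\n",
            cor_students, 100 * mean(admitted)))
```

Only a few percent of applicants get in under this rule, and among them the two abilities should show a strong negative correlation even though they were drawn independently.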

(Caveat: When I say "are," I should probably say "seem to be," because I have no hard data. Certainly the stereotype is that a CS whiz probably writes bad essays, and that a humanities superstar is likely to struggle with algorithms.)

Some R code for the curious:

```r
n <- 2000

# Height (roughly 4 to 7 feet) and top speed (roughly 10 to 25 mph), drawn independently
df <- data.frame(height = 4 + runif(n) + runif(n) + runif(n),
                 speed  = 10 + runif(n, 0, 5) + runif(n, 0, 5) + runif(n, 0, 5))

# savePlot() only works with some screen devices (Windows, or X11 with type = "cairo"),
# so write straight to PNG files instead
png("height_and_speed.png", width = 6, height = 6, units = "in", res = 100)
plot(df, main = "Height and Speed", xlim = c(4, 7), ylim = c(10, 25),
     col = rgb(60, 120, 180, 30, maxColorValue = 255), pch = 16, cex = 2)
dev.off()

# A(n arbitrary but convenient) function returning the probability of being a good tennis player
df$p <- with(df, 1 / (1 + exp(1 - 20 * (((height / 6)^2 + (speed / 20)^2) - 2))))

# You are unlikely to be a tennis player unless ((height / 6)^2 + (speed / 20)^2) > 2
df$is.tennis.player <- (runif(n) < df$p)

# Two plots side by side
png("height_and_speed_general_population_and_tennis_players.png",
    width = 12, height = 6, units = "in", res = 100)
par(mfrow = c(1, 2))
plot(df[ , c("height", "speed")], main = "General Population", xlim = c(4, 7), ylim = c(10, 25),
     col = rgb(60, 120, 180, 30, maxColorValue = 255), pch = 16, cex = 2)
plot(subset(df, is.tennis.player)[ , c("height", "speed")], main = "Tennis Players", xlim = c(4, 7),
     ylim = c(10, 25), col = rgb(180, 10, 10, 30, maxColorValue = 255), pch = 16, cex = 2)
dev.off()
```
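To see what the selection rule in the code actually does, here is a minimal sketch that evaluates the same logistic formula as a function of x = (height / 6)^2 + (speed / 20)^2. The name `tennis_prob` is a hypothetical helper introduced only for this check. Note that a 6-foot person with a 20 mph top speed sits exactly at x = 2.

```r
# Same formula as df$p in the code above, written as a function of
# x = (height / 6)^2 + (speed / 20)^2; tennis_prob is a hypothetical name.
tennis_prob <- function(x) 1 / (1 + exp(1 - 20 * (x - 2)))

x <- c(1.9, 2.0, 2.05, 2.2)
round(tennis_prob(x), 3)
# [1] 0.047 0.269 0.500 0.953
```

So the probability is about 27% right at x = 2, crosses 50% at x = 2.05, and drops off quickly on either side: the steep factor of 20 makes the logistic curve behave almost like a hard cutoff, which is why the comment about x > 2 is roughly right.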
