Of Height and Speed in Tennis, or Fuzziness and Techiness in College

April 24, 2011

(This article was first published on mickeymousemodels, and kindly contributed to R-bloggers)

I thought of this after reading this post and perhaps also this one, on the Cheap Talk blog. Here’s the puzzle: in general, being tall does not make you slow; but among professional tennis players, the tall athletes do tend to be relatively sluggish. Why does this happen? Cheap Talk gives a perfectly good written explanation, and I thought I’d complement it with something graphical.

Suppose that, in the general population, the distribution of height and speed looks roughly like this:

Where did I get this data? It’s entirely hypothetical. I made it up! That said, I did try to keep it semi-realistic: the heights are generated as H = 4 + U1 + U2 + U3 feet, where the U are independently uniform on (0, 1); the result is a bell-shaped curve bounded on (4, 7) feet, which I prefer to the unbounded (-Inf, +Inf) support of an actual normal distribution. (I’ve created something similar to the N=3 frame in this animation.)
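As a quick sanity check on that construction, here is a minimal sketch in R. The seed, the sample size, and the variable name `height` are arbitrary choices for illustration. For a sum of three independent Uniform(0, 1) draws, the theoretical mean is 1.5 and the standard deviation is sqrt(3/12) = 0.5, so the simulated heights should average about 5.5 feet.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 2000    # illustrative sample size

# Height in feet: 4 plus the sum of three independent Uniform(0, 1) draws
height <- 4 + runif(n) + runif(n) + runif(n)

range(height)  # every value lies strictly inside (4, 7)
mean(height)   # theoretical mean: 4 + 3 * 0.5 = 5.5
sd(height)     # theoretical sd: sqrt(3 / 12) = 0.5
```

Calling `hist(height, breaks=30)` afterwards should show the bell shape with hard limits at 4 and 7.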

The next step is to give individuals a maximum footspeed S = 10 + U4 + U5 + U6 mph, with the U independently uniform on (0, 5). By construction, speed is independent of height, and follows a roughly bell-shaped curve between 10 and 25 mph. Fun anecdote: my population is too slow to include Usain Bolt, whose top footspeed is close to 28 mph.
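To see the independence claim in numbers rather than take it on faith, a small sketch like the following should do. Again, the seed and sample size are arbitrary; the theoretical mean of S is 10 + 3 * 2.5 = 17.5 mph and its standard deviation is sqrt(3 * 25 / 12) = 2.5 mph.

```r
set.seed(2)  # arbitrary seed for reproducibility
n <- 2000    # illustrative sample size

# Height (feet) and speed (mph), drawn independently of each other
height <- 4 + runif(n) + runif(n) + runif(n)
speed  <- 10 + runif(n, 0, 5) + runif(n, 0, 5) + runif(n, 0, 5)

range(speed)  # every value lies strictly inside (10, 25)
mean(speed)   # theoretical mean: 17.5
sd(speed)     # theoretical sd: 2.5

# Independent draws, so the sample correlation should be close to zero
cor(height, speed)
```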

Back to tennis. Let’s imagine that tennis ability increases with both height and speed — and, moreover, that those two attributes are substitutable: if you’re short (and have a weak serve), you can make up for it by being fast. With that in mind, let’s revisit the scatterplot:

There it is: height and speed are independent in the general population, but very much dependent, and negatively correlated, among tennis players. The plot really drives the point home: top athletes will be either very tall, very fast, or nearly both; and excluding everyone else creates a downward slope.
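The same point can be made with a pair of correlation coefficients. This sketch reuses the arbitrary logistic selection rule from the R code at the end of this post; the seed and the names `cor_population` and `cor_players` are additions for illustration. The exact numbers depend on the random draw, but the first correlation should sit near zero and the second should be clearly negative.

```r
set.seed(3)  # arbitrary seed for reproducibility
n <- 2000

df <- data.frame(
  height = 4 + runif(n) + runif(n) + runif(n),
  speed  = 10 + runif(n, 0, 5) + runif(n, 0, 5) + runif(n, 0, 5)
)

# Probability of being a tennis player: the same arbitrary logistic rule
# used in the code at the end of the post
df$p <- with(df, 1 / (1 + exp(1 - 20 * (((height / 6)^2 + (speed / 20)^2) - 2))))
df$is.tennis.player <- runif(n) < df$p

# Correlation in the whole population versus among the selected players
cor_population <- with(df, cor(height, speed))
cor_players    <- with(subset(df, is.tennis.player), cor(height, speed))

sum(df$is.tennis.player)  # size of the selected group
cor_population
cor_players
```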

As hinted in the title, you can apply the exact same thinking to fuzziness (height) and techiness (speed) among college students (professional tennis players). Literary and mathematical ability might be more or less independent in the general population. However, being admitted to university requires that you be excellent at one of them, or solidly above average at both. Just as before, excluding everyone else creates a downward slope, which might explain why fuzziness and techiness are negatively correlated among college students.
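Here is a rough sketch of that admissions story. Everything in it is hypothetical: I assume the two abilities are independent standard normal z-scores, and I invent the cutoffs (above 2 counts as "excellent", above 1 as "solidly above average"). The column names `fuzzy` and `techie` are mine too. Under those assumptions, the correlation among admitted students should come out clearly negative even though it is roughly zero in the population.

```r
set.seed(4)  # arbitrary seed for reproducibility
n <- 10000   # illustrative population size

# Assumption: literary and mathematical ability are independent z-scores
students <- data.frame(fuzzy = rnorm(n), techie = rnorm(n))

# Hypothetical admission rule: excellent at one, or solidly above average at both
excellent <- 2  # made-up cutoff
solid     <- 1  # made-up cutoff
students$admitted <- with(students,
  fuzzy > excellent | techie > excellent | (fuzzy > solid & techie > solid))

# Correlation in the whole population versus among admitted students
cor_everyone <- with(students, cor(fuzzy, techie))
cor_admitted <- with(subset(students, admitted), cor(fuzzy, techie))

mean(students$admitted)  # share of the population admitted
cor_everyone
cor_admitted
```

A hard cutoff like this is cruder than the logistic rule used for the tennis players, but it produces the same kind of downward slope.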

(Caveat: When I say “are,” I should probably say “seem to be,” because I have no hard data. Certainly the stereotype is that a CS whiz probably writes bad essays, and that a humanities superstar is likely to struggle with algorithms.)

Some R code for the curious:

n <- 2000

df <- data.frame(height=4 + runif(n) + runif(n) + runif(n), speed = 10 + runif(n, 0, 5) + runif(n, 0, 5) + runif(n, 0, 5))

dev.new(height=6, width=6)

plot(df, main="Height and Speed", xlim=c(4, 7), ylim=c(10, 25), col=rgb(60, 120, 180, 30, maxColorValue=255), pch=16, cex=2)


# A(n arbitrary but convenient) function returning the probability of being a good tennis player

df$p <- with(df, 1 / (1 + exp(1 - 20 * (((height / 6)^2 + (speed / 20)^2) - 2))))

# You are unlikely to be a tennis player unless ((height / 6)^2 + (speed / 20)^2) > 2

df$is.tennis.player <- (runif(n) < df$p)

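# Two plots side by side

dev.new(height=6, width=12)

# Split the device into one row and two columns; without this, the second plot replaces the first
par(mfrow=c(1, 2))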


plot(df[ , c("height", "speed")], main="General Population", xlim=c(4, 7), ylim=c(10, 25), col=rgb(60, 120, 180, 30, maxColorValue=255), pch=16, cex=2)

plot(subset(df, is.tennis.player)[ , c("height", "speed")], main="Tennis Players", xlim=c(4, 7), ylim=c(10, 25), col=rgb(180, 10, 10, 30, maxColorValue=255), pch=16, cex=2)

