**The Pith of Performance**, and kindly contributed to R-bloggers)

Call me old-fashioned, but I never saw the movie *The Social Network* until last year (at a private screening). In case you also missed it, it’s the Hollywood version of how Facebook.com came into being.

Quite apart from any artistic criticisms, I have a genuine psychological problem with movies like TSN. I keep getting caught up in technical inaccuracies and tend to lose the plot. So, it’s very hard for me to watch such movies as the director intended. It’s the same reason I can’t stand SciFi movies or books: I can’t get past the impossible. It turns out that TSN is generally fairly accurate regarding things like Linux, MySQL, PHP, and so forth, but there is a real clanger: the ranking algorithm used by Facemash—the Facebook precursor.

Mark Zuckerberg wants to rank Harvard women based on crowd-sourced scores. He recalls that his best friend (at the time), Eduardo Saverin, had previously mentioned a ranking formula but Zuck can’t remember how it goes. When Saverin shows up again, Zuck urgently asks him to reveal it. In typical Hollywood style (and perhaps to keep a math-phobic audience visually engaged), Saverin writes the following **ranking equations** on the dorm window for the desperate Zuckerberg. Where else would you write equations? \begin{align} Ea &= \dfrac{1}{1 + 10 (Rb-Ra)/400}, & Eb &= \dfrac{1}{1 + 10 (Ra-Rb)/400} \label{eqn:movie} \end{align} There’s just one slight problem: they’re **wrong**! I wasn’t sure at the time, but as Saverin was writing them, I was thinking to myself, “Why wouldn’t $10/400$ be simplified to $1/40$?” Even a Harvard student could figure that out. And, on continued mulling: “If woman ‘a’ and woman ‘b’ had a rank difference of $-40$, the fractions would blow up! How can you rank infinity?”

You see the kind of mental problems I have. As I continued to try and watch TSN, I decided that perhaps I had misread the equations. The scene goes by quite quickly when you’re not expecting it, and I didn’t want to ask the host to replay that section of the movie just for me. By the time TSN ended, there were some other suspect scenes^{*} and it was getting too late in the evening to review the entire movie.

The next day, however, I checked the web and it wasn’t hard to find references to TSN with the ranking algorithm scene, some of which thankfully included a video clip. To my surprise, nobody else seems to have spotted the error. Sure enough, the equations that Saverin writes are indeed those in \eqref{eqn:movie}. Moreover, I also found the original **movie script**, and on pages 15 and 16 the equations are incorrect there too—written exactly as they appear in \eqref{eqn:movie}. The guy who played Eduardo Saverin must be a good actor because he followed the script exactly.

So, what are the **correct equations**? It turns out the equations come from ranking chess players, which is probably how the real Eduardo Saverin knew about them. That’s how it’s portrayed in TSN. Anyway, Saverin didn’t *invent* them. The correct equations are: \begin{align} Ea &= \dfrac{1}{1 + 10^{ \big( \frac{Rb-Ra}{400} \big) }}, & Eb &= \dfrac{1}{1 + 10^{ \big( \frac{Ra-Rb}{400} \big) }} \label{eqn:chess} \end{align} Note the **exponential** form. This has the effect of giving the ranking equations a complementary sigmoidal shape such that when the score of woman ‘a’ is up, the score of woman ‘b’ is correspondingly down. This fits with the requirement (not mentioned in TSN) that $Ea + Eb = 1$: the relative scores have to add to 100%.

Furthermore, this is not just a simple typographical error due to a missing ^ (caret) exponentiation character in Excel or Word; the most likely applications used for the movie script. The 400 has to go *inside* the exponent of \eqref{eqn:chess}, which means that *more* brackets are needed than shown in \eqref{eqn:movie}. In other words, the movie equations are a bit of a mess, transcribed by someone (presumably the scriptwriter) who had no idea what he was writing. Nothing new, I suppose, but these days Hollywood is known to hire experts (like mathematicians) to verify movie concepts. Not in this case, apparently.

Finally, it also occurred to me that if I replace the $10$ in the chess equations by the number 2.71828 (otherwise denoted by the letter $e$), I get the Fermi-Dirac distribution^{†} over energy range $E$ at a given temperature $T$ (see above plots): \begin{align} Ea &= \dfrac{1}{1 + 10^{ \big( \frac{Rb-Ra}{400} \big) }}, & f(E) &= \dfrac{1}{1 + e^{ \big( \frac{E-\mu}{kT} \big) }} \label{eqn:elofd} \end{align} The Fermi-Dirac function, $f(E)$, is key to understanding how electrons “rank” *themselves* in a **semiconductor** circuit; a different kind of **network**, also made famous in the Silicon Valley.^{‡}

^{*}For example, the scene where the computer science lecturer asks the class an impromptu question. Everyone is stumped. Not too surprising since the question is so hopelessly ill-posed that nobody would be able to provide an answer. Undeterred by such details, the Zuckerberg character superciliously throws off the (“correct”) answer as he storms out of the room in a huff. Not only is the technical question ludicrous but, according to the real-life lecturer of that Harvard class, neither that nor any similar event ever occurred!

^{†}In a twist of irony while writing this blog post, I recalled that in another but not totally unrelated context, the Bose-Einstein distribution has also been applied to the Hollywood movie industry (PDF).

^{‡}Here is the R script for producing the above plots.

# Compariosn b/w chess ranking equations from the movie

# "The Social Network" and the Fermi-Dirac distribution.

# Created by NJG on Monday, January 14, 2013

op <- par(mfrow=c(1,2), pty="s")

# Normalized Chess Ranking profile

Ra <- 1

Rb <- seq(0,2,0.01)

plot(Rb, 1/(1+10^(((Rb/Ra)-1) * (Ra*1000/800))), type="l", ylim=c(0,1), xlab="Rb/Ra", ylab="Ea",col="red")

lines(Rb, 1/(1+10^(((Rb/Ra)-1) * (Ra*1000/400))),col="green")

lines(Rb, 1/(1+10^(((Rb/Ra)-1) * (Ra*1000/1))),col="blue")

title(main="Chess Ranking Dsn")

# Fermi-Dirac distribution

E <- seq(0,2,0.01)

plot(E, 1/(1 + exp(1)^((E-1) * 2)), type="l", ylim=c(0,1), xlab="E", ylab="f(E)",col="red")

lines(E, 1/(1 + exp(1)^((E-1) * 10)),col="green")

lines(E, 1/(1 + exp(1)^((E-1) * 400)),col="blue")

title(main="Fermi-Dirac Dsn")

par(op)

**leave a comment**for the author, please follow the link and comment on his blog:

**The Pith of Performance**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...