(This article was first published on **The Pith of Performance**, and kindly contributed to R-bloggers)

The ability to **visualize data**, enabled by the advent of graphical computer tools, has been a great boon to Cap and Perf. The power derives from the way graphical displays provide an efficient impedance match to the visual system in our brain. The weakness derives from the way graphical displays provide an efficient impedance match to the visual system in our brain. We can get carried away by visual representations alone. Every marketing organization exploits that weakness. Numbers do have poor cognitive impedance, but that doesn’t mean numbers should be ignored altogether. In fact, we often need a combination of both numerical and visual data representations so that we don’t suffer visual miscues and thus jump to the wrong conclusion. The following presents an example of how easily this can happen.

Recently, Guerrilla alumnus Scott J. pointed me at this Chart of the Day showing how Google revenue growth was outpacing both Facebook and Yahoo when compared 7 years after the launch of each company.

Clearly, this chart is intended to be an attention-getter for the Silicon Alley Insider website, but it looks about right, and normally I might have just accepted the claim without giving it any more thought. The notion that *Google growth is dominating* is also consistent with a lot of other things one sees. No surprises there.

#### Exponential doubling period

In this particular case, however, I was struck by the *shape* of the data and curious to find out whether the growth of GOOG and FB revenue follows an exponential trend. Exponential growth is not unexpected because it’s the continuous analog of compound interest. If they are growing exponentially, I can compare their *doubling periods* **numerically** and determine how their growth will look in the future.

The doubling period is an analysis technique that I use in Chapter 8 of my Guerrilla Capacity Planning book to determine the **traffic growth** of major **websites**. In section 8.7.5 the doubling time t_{2} is defined as:

$$t_{2} = \frac{\ln(2)}{A}$$

where A is the growth parameter of the fitted exponential curve (the rate at which it bends upward) and Ln(2) is the natural logarithm of 2 (2 for doubling). The only fly in the ointment is that I don’t have the actual numeric values used in the histogram chart, but that need not be a showstopper. There are only a half dozen data points for each company, so I can estimate them visually. Then, I can use R to fit the exponential models and calculate the respective doubling times.
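As a quick sanity check on this definition, the sketch below assumes a pure exponential y(t) = y0·exp(A·t) and confirms that advancing t by Ln(2)/A doubles the value. The function name `doubling_time` and the sample values of `A` and `y0` are illustrative assumptions, not values taken from the chart or the book.

```r
# Doubling time of an exponential y(t) = y0 * exp(A * t).
# 'A' is the growth parameter per unit time.
doubling_time <- function(A) {
  stopifnot(A > 0)
  log(2) / A            # log() is the natural logarithm in R
}

A  <- 0.5               # illustrative growth rate per year (assumption)
y0 <- 3                 # illustrative starting value (assumption)
y  <- function(t) y0 * exp(A * t)

t2    <- doubling_time(A)
ratio <- y(1 + t2) / y(1)   # ratio of values one doubling time apart
print(c(t2 = t2, ratio = ratio))
```

The ratio does not depend on `y0` or on the starting time, which is why the doubling period is a convenient single number for comparing two exponential curves.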

#### Analysis in R

First, we read the data (as eyeballed from the online chart) into R. Since the amount of data is small, I simply use the `textConnection` trick to write the data in situ, rather than using an external file.

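```r
# Revenue estimates ($ billions) eyeballed from the online chart.
# Columns are whitespace-separated, so read.table's default 'sep' is used.
gd <- read.table(textConnection("Year GOOG FB YAH
1 0.001 0.002 0.001
2 0.01 0.02 0.01
3 0.1 0.2 0.1
4 0.5 0.45 0.3
5 1.5 0.75 0.6
6 3.2 2.0 1.1
7 6.1 4.0 0.75"),
  header=TRUE)
closeAllConnections()
```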
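As an aside, `read.table` also accepts the inline data through its `text` argument, which avoids opening a connection at all. The sketch below is an alternative, not the method used in this post; the name `gd2` is hypothetical and the values are the same eyeballed estimates.

```r
# Same eyeballed estimates ($ billions), read via the 'text' argument
gd2 <- read.table(text = "Year GOOG FB YAH
1 0.001 0.002 0.001
2 0.01 0.02 0.01
3 0.1 0.2 0.1
4 0.5 0.45 0.3
5 1.5 0.75 0.6
6 3.2 2.0 1.1
7 6.1 4.0 0.75", header = TRUE)

str(gd2)   # inspect column names and types
```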

I can now plot those estimated data points and compare them with the original chart.

```r
plot(gd$Year,gd$GOOG,type="b",col="green",lwd=2,lty="dashed",
     main="Annual revenues for GOOG (green), FB (blue), YAH (red)",
     xlab="Years after launch", ylab="$ billions")
points(gd$Year,gd$FB,type="b",col="blue",lwd=2,lty="dashed")
points(gd$Year,gd$YAH,type="b",col="red",lwd=2,lty="dashed")
```

The result looks like this:

The dashed lines simply connect related points together. The two solid lines are produced by performing the corresponding exponential fits to the GOOG and FB data.

```r
# x-values for continuous exp curves
x <- seq(from=1, to=7, by=0.1)

# fit GOOG revenue to g0*exp(g1*Year) and overlay the fitted curve
ggfit <- nls(GOOG ~ g0*exp(g1*Year), data=gd, start=list(g0=1,g1=1))
gc <- coef(ggfit)
lines(x, y=gc[1]*exp(gc[2]*x))

# same exponential model for FB revenue
fbfit <- nls(FB ~ f0*exp(f1*Year), data=gd, start=list(f0=1,f1=1))
fc <- coef(fbfit)
lines(x, y=fc[1]*exp(fc[2]*x))

# report the doubling periods (converted from years to months)
text(1,5.0,sprintf("%2s doubling time: %4.2f months", names(gd)[2],12*log(2)/gc[2]),adj=c(0,0))
text(1,4.5,sprintf("%2s doubling time: %4.2f months", names(gd)[3],12*log(2)/fc[2]),adj=c(0,0))
```

From the R analysis we see that the doubling period for Google (t_{2} = 11.39 months) is slightly **longer** than that for Facebook (t_{2} = 10.94 months). Despite the banner claim made by Silicon Alley Insider, based on these estimated data, Google is growing revenue at a slightly *slower* rate than Facebook. How can that be?
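To see how close the two rates are, the doubling times can be inverted back into growth parameters. The sketch below uses only the two doubling times quoted above; the variable names are my own, and the annual multiplier exp(A) is simply another way of expressing the same fitted rate.

```r
# Doubling times in months, as reported by the fits above
t2_months <- c(GOOG = 11.39, FB = 10.94)

# Invert t2 = log(2) / A to recover the growth parameter A (per year)
A_per_year <- 12 * log(2) / t2_months

# Implied revenue multiplier over one year under each model
annual_factor <- exp(A_per_year)

print(round(rbind(A_per_year, annual_factor), 3))
```

The shorter doubling time corresponds to the larger growth parameter, so in these fitted models FB has the steeper exponential even though its estimated revenues are lower over the first 7 years.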

#### Conclusion

In the original histogram chart, it looks like Google is growing faster than Facebook. Well, **looks can be deceiving**. Your brain can be fooled (easily) by optical illusions, and a chart viewed uncritically can lead it astray just as easily. That’s why we need to do analysis in the first place.

To resolve this paradox, let’s do two things:

- Project the growth models out further than the 7 years associated with the data
- Plot the projected curves on log-linear axes (for reasons that will become clear shortly)
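These two steps can be sketched as follows. The growth rates are recovered from the doubling times reported above, but the prefactors `g0` and `f0` are placeholders that I have assumed for illustration; with the real fits you would substitute `coef(ggfit)[1]` and `coef(fbfit)[1]`, so the exact crossing point in this sketch should not be taken as a result.

```r
# Growth rates per year, recovered from the reported doubling times (months)
g1 <- 12 * log(2) / 11.39   # GOOG
f1 <- 12 * log(2) / 10.94   # FB

# Placeholder prefactors (assumptions, not fitted values)
g0 <- 0.04
f0 <- 0.02

# Project both models well beyond the 7 years of data
years <- seq(from = 1, to = 40, by = 0.5)
goog  <- g0 * exp(g1 * years)
fb    <- f0 * exp(f1 * years)

if (!interactive()) pdf(NULL)   # avoid writing a plot file when run as a script
op <- par(mfrow = c(1, 2))      # two panels side by side

# Linear axes: both curves hug the x-axis until late in the range
plot(years, goog, type = "l", col = "green", lwd = 2, ylim = range(goog, fb),
     main = "Linear axes", xlab = "Years after launch", ylab = "$ billions")
lines(years, fb, col = "blue", lwd = 2)

# Log-scaled y-axis: each exponential becomes a straight line
plot(years, goog, type = "l", col = "green", lwd = 2, ylim = range(goog, fb),
     log = "y", main = "Log-linear axes",
     xlab = "Years after launch", ylab = "$ billions")
lines(years, fb, col = "blue", lwd = 2)

par(op)
```

On the log-scaled panel the slope of each line is its growth rate, so two lines with different slopes must intersect somewhere, and the intersection is easy to locate by eye.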

Here’s the result (you might want to click on the image to magnify it).

The left-hand plot shows that the two curves cross somewhere between 7 years out and 40 years out. Whereas green (Google) is currently on top, according to the data, blue (Facebook) eventually ends up on top according to the exponential models; assuming nothing else changes in the future. The right-hand plot uses a log-scaled y-axis to reveal more clearly that the crossover occurs at t = 23.9 years. Once again, if you rely purely on visuals, you might think the crossover doesn’t occur until **after 30 years** (what looks like a “knee” in the left-hand plot), but you’d be misled. It occurs almost **10 years earlier**.
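The crossover time can also be computed directly rather than read off a plot. Setting g0·exp(g1·t) = f0·exp(f1·t) and solving for t gives t = Ln(g0/f0) / (f1 − g1). The sketch below assumes that form; the function name `crossover_time` and the numeric coefficients are hypothetical, chosen only so the arithmetic is easy to check. With the fits above you would pass `coef(ggfit)` and `coef(fbfit)` instead.

```r
# Time at which two exponential models a0 * exp(a1 * t) intersect.
# g, f: numeric vectors of the form c(prefactor, rate).
crossover_time <- function(g, f) {
  stopifnot(g[2] != f[2])             # equal rates never cross
  unname(log(g[1] / f[1]) / (f[2] - g[2]))
}

g <- c(2, 0.5)   # hypothetical model: larger now, slower growth
f <- c(1, 0.6)   # hypothetical model: smaller now, faster growth

tx <- crossover_time(g, f)

# Both models should give the same value at the crossover time
print(c(tx = tx,
        g_at_tx = g[1] * exp(g[2] * tx),
        f_at_tx = f[1] * exp(f[2] * tx)))
```

Because the difference in rates sits in the denominator, two nearly equal growth rates push the crossover far into the future, which is consistent with the crossing here lying well beyond the 7 years of data.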

If, for example, you are only interested in short-term gains (as Wall St is wont to be), the original visual (histogram) is correct. If, on the other hand, you are in your 20s and investing longer term, e.g., for your retirement, you might get a surprise.

By now, you might be thinking that these projections are not very accurate, and I wouldn’t completely disagree with you. But what *is* accurate here? The original data in the histogram (even the really real actual data) probably aren’t very accurate either; we really can’t know without deeper investigation. And that’s my point: independent of the accuracy of the data, *the numerical analysis can cause you to pay attention to, and possibly ask questions about, something you might otherwise have taken for granted on purely visual grounds*.

*Even wrong expectations are better than no expectations*

I’m a big fan of data visualization, but not to the exclusion of numerical analysis. We need both and we need both to be easily accessible.

*The art is in the science*
