It is "simply" the average value

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers.]

For some obscure reason, simple things are usually supposed to be simple. Recently, on the internet, I saw a lot of posts on the “average time in which you hold a stock”, and two rather different values are mentioned:

  • “Take any stock in the United States. The average time in which you hold a stock is – it’s gone up from 20 seconds to 22 seconds in the last year” (Michael Hudson on http://www.telegraph.co.uk/) or “The founder of Tradebot, in Kansas City, Mo., told students in 2008 that his firm typically held stocks for 11 seconds” (on http://www.nytimes.com/), among many others
  • “Based on the NYSE index data, the mean duration of holding period by US investors was around 7 years in 1940. This stayed the same for the next 35 years.  The average holding period had fallen to under 2 years by the time of the 1987 crash. By the turn of the century it had fallen to below one year. It was around 7 months by 2007” (on http://topforeignstocks.com/) or “Two-thirds [of the managers of more than 800 institutional funds interviewed in a study] had higher turnover than they predicted […] Even though most are judged by performance over three-year horizons, their average holding period was about 17 months, and 19% of the managers held the typical stock for one year or less” (mentioned on http://online.wsj.com/), again among many others

How come that, on the one hand, some people talk about less than 20 sec. for the “average time in which you hold a stock”, and on the other, around a year? How can we have such a difference? We are talking about an average time here, not a rare event probability…

To understand what might be wrong, consider the following case, with a market and two stocks: one is kept for a whole year (52 weeks), while the other is traded (and changes hands) every week (52 times per year). What is the “average time in which you hold a stock”? Is it

  • 26.5 weeks? The average holding time is 52 weeks for the first stock and 1 week for the second one, i.e. 53 over 2
  • 1.96 weeks? Over a year, the first stock was held once while the second one changed hands 52 times, i.e. 104 over 53 (total time over the total number of transactions)
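
The two answers can be reproduced with a few lines of R. This is only a sketch of the arithmetic above; the names `holding`, `per_stock` and `per_transaction` are introduced here for illustration and are not part of the original post.

```r
# Holding periods (in weeks) observed over one year for the two stocks
holding <- list(stock1 = 52,          # held once, for the whole year
                stock2 = rep(1, 52))  # changes hands every week

# First answer: average the per-stock mean holding times, (52 + 1) / 2
per_stock <- mean(sapply(holding, mean))

# Second answer: total time over total number of holding periods, 104 / 53
all_periods <- unlist(holding)
per_transaction <- sum(all_periods) / length(all_periods)

print(c(per_stock = per_stock, per_transaction = per_transaction))
```

The first number weights each stock equally, the second weights each transaction equally, which is why a single heavily traded stock pulls it down so much.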

Obviously, there is a selection bias at work here (the original post links to an illustration of that concept, in French). In order to get a better understanding, consider the following simple model, with a large number of simulated stocks. At each transaction, a stock can be held by one of 3 types of investors,

  • with probability 70%, hold – on average – for 20 sec.
  • with probability 20%, hold – on average – for 15 days
  • with probability 10%, hold – on average – for 10 years
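
As a quick check of what this mixture implies, the sketch below computes the expected holding time per transaction and the share of total holding time attributable to each investor type. All durations are converted to years; the names `p`, `mean_years`, `mean_per_transaction` and `time_share` are assumptions made for this illustration.

```r
# Probability of each investor type at a given transaction
p <- c(A = 0.7, B = 0.2, C = 0.1)

# Mean holding time of each type, in years (20 seconds, 15 days, 10 years)
mean_years <- c(A = 20 / (365 * 24 * 60 * 60), B = 15 / 365, C = 10)

# Expected holding time of a randomly chosen transaction
mean_per_transaction <- sum(p * mean_years)

# Share of the total time during which a stock is held by each type
time_share <- p * mean_years / sum(p * mean_years)

print(mean_per_transaction)
print(round(time_share, 4))
```

Under these assumptions, long-term investors account for only 10% of the transactions but for almost all of the time during which a stock is held, which is the source of the discrepancy discussed in this post.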

As claimed by Warren Buffett, “my favorite time frame for holding a stock is forever”, so it might not be absurd to consider investors who keep a stock for a long period of time. Assume further that the holding time is exponentially distributed (with a rate depending on the kind of investor), and that those stocks are observed over a period of 20 years (which might sound reasonable). Several techniques can be used to estimate the “average time in which you hold a stock”:

  • The first one is to calculate the mean holding time for each stock, and then to average over all the stocks. It is probably a good idea to exclude the last observation (since it is censored at the end of the observation period),
  • The second one is to divide the (total) period of time by the (total) number of investors that held the stock during that time frame (or number of transactions),
  • A third idea might be to use the first method, but instead of removing the last observation, to use an estimator of the mean based on the Kaplan-Meier estimate,
  • A fourth idea is to look at what happened at a specific date (say after 10 years), i.e. which investor had the stock at that date, and how long they kept it.

The code to generate that process is the following

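> set.seed(1)
> ns=1000   # number of simulated transactions; an assumed value, not given in the original
> # each transaction is made by an investor of type A, B or C
> invest=sample(size=ns,c("A","B","C"),
+ prob=c(.7,.2,.1),replace=TRUE)
> # mean holding time (in years) for the investor behind each transaction
> lambda=(invest=="A")*20/(365*24*60*60)+
+        (invest=="B")*15/365+
+        (invest=="C")*10
> # exponential holding times, then transaction dates within the 20-year window
> E=rexp(ns,rate=1/lambda)
> T=cumsum(E)
> T=T[T<20]
> plot(c(T,50),0:length(T),type="s",xlim=c(0,20),col="blue")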

which gives the trajectory of the number of investors who held that specific stock between time 0 and time 20.

Then, the different techniques are the following,

# method 1
> E1=diff(T)    # holding periods between consecutive transactions
> m1=mean(E1)   # mean of E1, not of the full sample E
> M1[s]=m1

for the first one (means of time length, per stock),

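# method 2
> if(length(T)>1){
+ n2=length(T)-1            # number of holding periods between first and last transaction
+ d2=T[length(T)]-T[1]      # time elapsed between first and last transaction
+ N2[s]=n2; D2[s]=d2
+ }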

for the second one (time length and number of transactions),

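# method 3
> library(survival)            # provides Surv() and survfit()
> T3=c(T,20)
> C3=c(rep(0,length(T)-1),1)   # only the last period (cut at time 20) is censored
> km=survfit(Surv(diff(T3), 1-C3)~1)
> m3=summary(km,rmean='individual')$table[5]   # restricted mean
> M3[s]=m3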

for the third one (based on a prediction of the expected mean, from Kaplan-Meier estimate) and

# method 4
> T0=c(0,T,20)
> m4=min(T0[T0>10])-max(T0[T0<10])
> M4[s]=m4

for the fourth one (based on what happened at time 10). Using Monte Carlo simulations, we get very different quantities, all of which can be interpreted as the “average time in which you hold a stock”:

> sum(D2,na.rm=TRUE)/sum(N2,na.rm=TRUE)
[1] 0.3692335
> mean(M1,na.rm=TRUE)
[1] 0.5469591
> mean(M3,na.rm=TRUE)
[1] 1.702908
> mean(M4,na.rm=TRUE)
[1] 12.40229
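
The fragments above are meant to run inside a loop over simulated stocks (indexed by `s`), which is not shown. A minimal self-contained sketch of such a loop is given below, under stated assumptions: the number of stocks `n_stocks`, the number of candidate transactions `ns`, and the names `horizon`, `mean_years`, `times` and `est` are all chosen for this illustration. The third method is left out to keep the sketch free of package dependencies. The figures it prints will not match the ones above exactly, since the simulation sizes and random draws differ.

```r
set.seed(1)
n_stocks <- 2000   # assumed number of simulated stocks
ns       <- 1000   # assumed number of candidate transactions per stock
horizon  <- 20     # observation window, in years

# Mean holding time (in years) for each investor type
mean_years <- c(A = 20 / (365 * 24 * 60 * 60), B = 15 / 365, C = 10)
probs      <- c(0.7, 0.2, 0.1)

M1 <- N2 <- D2 <- M4 <- rep(NA_real_, n_stocks)

for (s in seq_len(n_stocks)) {
  # Investor type and exponential holding time for each transaction
  invest <- sample(names(mean_years), size = ns, replace = TRUE, prob = probs)
  E      <- rexp(ns, rate = 1 / mean_years[invest])

  # Transaction dates falling inside the observation window
  times <- cumsum(E)
  times <- times[times < horizon]

  if (length(times) > 1) {
    # Method 1: mean holding time for this stock (completed periods only)
    M1[s] <- mean(diff(times))
    # Method 2: elapsed time and number of holding periods, pooled later
    N2[s] <- length(times) - 1
    D2[s] <- times[length(times)] - times[1]
  }

  # Method 4: length of the holding period covering time 10,
  # truncated to the observation window [0, horizon]
  T0    <- c(0, times, horizon)
  M4[s] <- min(T0[T0 > 10]) - max(T0[T0 < 10])
}

est <- c(method1 = mean(M1, na.rm = TRUE),
         method2 = sum(D2, na.rm = TRUE) / sum(N2, na.rm = TRUE),
         method4 = mean(M4))
print(est)
```

The ordering of the three estimates, with the pooled ratio smallest and the snapshot at time 10 largest, is the feature to look at, rather than the exact values.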

If we change the probabilities (and assume that high-frequency investors are much more common than long-term ones), e.g.

> invest=sample(size=ns,c("A","B","C"),
+ prob=c(.9,.09,.01),replace=TRUE)

then the first three estimates change substantially (each drops by a factor of roughly 7 to 9), but the last one barely moves.

> sum(D2,na.rm=TRUE)/sum(N2,na.rm=TRUE)
[1] 0.04072227
> mean(M1,na.rm=TRUE)
[1] 0.06393767
> mean(M3,na.rm=TRUE)
[1] 0.2504322
> mean(M4,na.rm=TRUE)
[1] 12.05508
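
One way to read the stability of the last estimate is through length-biased sampling: a holding period that covers a fixed date is picked with probability proportional to its length, so its expected length is E[X²]/E[X] rather than E[X]. The sketch below evaluates that ratio for the two mixtures, using E[X²] = 2m² for an exponential variable with mean m. It is a back-of-the-envelope check, not part of the original post; the function name `length_biased_mean` is hypothetical, and the last line is a rough approximation that ignores short-term holders and assumes a 10-year investor is holding at time 10.

```r
# Mean holding time (in years) for each investor type
mean_years <- c(A = 20 / (365 * 24 * 60 * 60), B = 15 / 365, C = 10)

# Expected length of the holding period covering a fixed date,
# ignoring the truncation to the 20-year window
length_biased_mean <- function(p) {
  second_moment <- sum(p * 2 * mean_years^2)  # E[X^2] for a mixture of exponentials
  first_moment  <- sum(p * mean_years)        # E[X]
  second_moment / first_moment
}

lb1 <- length_biased_mean(c(0.7, 0.2, 0.1))
lb2 <- length_biased_mean(c(0.9, 0.09, 0.01))

# Rough effect of truncation: time since purchase and time until sale are
# each cut at 10 years, with E[min(X, 10)] = 10 * (1 - exp(-1)) for mean 10
trunc_c <- 2 * 10 * (1 - exp(-10 / 10))

print(c(lb1 = lb1, lb2 = lb2, truncated = trunc_c))
```

Both untruncated values are close to 20 years, and the truncated approximation is close to the simulated values of about 12, which is consistent with the last estimate being driven almost entirely by the long-term investors.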

So I have to confess that the “average time in which you hold a stock” can be almost anything from 10 sec. to 10 years: it clearly depends on the way the average is calculated. The second point is that even if the proportion of high-frequency trading is extremely high, it should not affect the last estimate (which is, from my point of view, the most interesting one, and which might also be improved by taking censoring into account there too). So I guess people should be careful when discussing such quantities… And if anyone is willing to share data on that topic, I’d be glad to look at them…
