
It is "simply" the average value

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers.]

For some obscure reason, simple things are usually supposed to be simple. Recently, on the internet, I saw a lot of posts on the “average time in which you hold a stock”, and two rather different values are mentioned.

How come, on the one hand, some people talk about less than 20 seconds for the “average time in which you hold a stock”, and on the other, around a year? How can we have such a difference? We are talking about an average time here, not a rare event probability…

To understand what might be wrong, consider the following case, with a market and two stocks: one is kept over a year (52 weeks) while the other is traded, and exchanged, every week (52 times per year). What is the “average time in which you hold a stock”? Is it the average over the two stocks, or the average over all the holding periods?
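As a quick check on the two-stock example, the two candidate averages can be computed directly. This is only a minimal sketch: the variable names are illustrative and are not taken from the original code.

```r
# Holding periods, in weeks, observed over one year in the two-stock market:
# stock 1 has a single 52-week holding period,
# stock 2 has 52 holding periods of one week each.
stock1 <- 52
stock2 <- rep(1, 52)

# Average per stock: mean of the per-stock average holding times
per_stock <- mean(c(mean(stock1), mean(stock2)))

# Average per holding period: pool all the periods, then average
per_period <- mean(c(stock1, stock2))

print(c(per_stock = per_stock, per_period = per_period))
```

The first average is 26.5 weeks, the second is 104/53, i.e. a bit less than 2 weeks, although both describe exactly the same market.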

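Obviously, there is a selection bias in that study (see here for an illustration of that concept, in French). In order to get a better understanding, consider the following simple model, with a large number of simulated stocks. At each transaction, a stock can be held by one of three types of investors: type A (70% of the transactions in the code below) holds it for 20 seconds on average, type B (20%) for 15 days, and type C (10%) for 10 years.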

As claimed by Warren Buffett, “my favorite time frame for holding a stock is forever”, so it might not be absurd to consider investors who keep a stock for a long period of time. Assume further that the time frame for holding a stock is exponentially distributed (the rate depending on the kind of investor). Assume that those stocks are observed over a period of 20 years (which might sound reasonable). Several techniques can be used to estimate the “average time in which you hold a stock”.

The code to generate that process is the following

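> # ns = number of holding periods simulated per stock; it must be defined
> # beforehand, large enough for the transaction times to go beyond 20 years
> set.seed(1)
> # type of investor holding the stock after each transaction
> invest=sample(size=ns,c("A","B","C"),
+ prob=c(.7,.2,.1),replace=TRUE)
> # lambda = mean holding time, in years (20 seconds, 15 days, 10 years)
> lambda=(invest=="A")*20/(365*24*60*60)+
+        (invest=="B")*15/365+
+        (invest=="C")*10
> E=rexp(ns,rate=1/lambda)
> # transaction times, kept only within the 20-year window
> T=cumsum(E)
> T=T[T<20]
> plot(c(T,50),0:length(T),type="s",xlim=c(0,20),col="blue")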

with the following trajectory for the number of investors that held that specific stock between time 0 and time 20.
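The generating code can also be wrapped in a function, so that it runs on its own. The sketch below is one possible way to do it: the helper name `simulate_stock`, its arguments, and the default `ns = 1e4` are assumptions made for illustration, since the post does not state the value of `ns`.

```r
# Hypothetical helper (not in the original post): simulate the transaction
# times of one stock over the observation window.
simulate_stock <- function(ns = 1e4, prob = c(.7, .2, .1), horizon = 20) {
  # Type of the investor holding the stock after each transaction
  invest <- sample(c("A", "B", "C"), size = ns, prob = prob, replace = TRUE)
  # Mean holding times in years: 20 seconds, 15 days, 10 years
  means <- c(A = 20 / (365 * 24 * 60 * 60), B = 15 / 365, C = 10)
  # Exponential holding times, with rate = 1 / mean for each investor
  E <- rexp(ns, rate = 1 / means[invest])
  # Transaction times, kept only inside the observation window
  times <- cumsum(E)
  times[times < horizon]
}

set.seed(1)
tt <- simulate_stock()
length(tt)   # number of transactions observed before year 20
```

The name `tt` is used instead of `T` to avoid masking R's shorthand for `TRUE`.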

Then, the different techniques are the following,

> # method 1
> E1=diff(T)
> m1=mean(E1)
> M1[s]=m1

for the first one (means of time length, per stock),

> # method 2
> if(length(T)>1){
+ n2=length(T)-1
+ d2=T[length(T)]-T[1]
+ N2[s]=n2; D2[s]=d2
+ }

for the second one (time length and number of transactions),

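+ # method 3 (Surv and survfit come from library(survival), loaded beforehand)
+ T3=c(T,20)
+ # the last period is still running at time 20, hence censored
+ C3=c(rep(0,length(T)-1),1)
+ km=survfit(Surv(diff(T3), 1-C3)~1)
+ m3=summary(km,rmean='individual')$table[5]
+ M3[s]=m3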

for the third one (based on a prediction of the expected mean, from the Kaplan-Meier estimate) and

# method 4
> T0=c(0,T,20)
> m4=min(T0[T0>10])-max(T0[T0<10])
> M4[s]=m4

for the fourth one (based on what happened at time 10). Using Monte Carlo simulations, we get very different quantities, that can all be interpreted as the “average time in which you hold a stock”.

> sum(D2,na.rm=TRUE)/sum(N2,na.rm=TRUE)
[1] 0.3692335
> mean(M1,na.rm=TRUE)
[1] 0.5469591
> mean(M3,na.rm=TRUE)
[1] 1.702908
> mean(M4,na.rm=TRUE)
[1] 12.40229
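The fragments above store their results in `M1[s]`, `N2[s]`, `D2[s]` and `M4[s]`, which suggests a loop over simulated stocks that is not shown. A minimal sketch of such a loop could look as follows. The number of stocks `nsim` and the value of `ns` are assumptions, so the figures it prints will not match the ones above exactly. Method 3 is left out to keep the sketch free of the survival package.

```r
# Assumed values (not given in the original post)
nsim <- 200   # number of simulated stocks
ns   <- 1e4   # holding periods drawn per stock
# Mean holding times in years: 20 seconds, 15 days, 10 years
means <- c(A = 20 / (365 * 24 * 60 * 60), B = 15 / 365, C = 10)

M1 <- N2 <- D2 <- M4 <- rep(NA_real_, nsim)
set.seed(1)
for (s in 1:nsim) {
  invest <- sample(c("A", "B", "C"), size = ns,
                   prob = c(.7, .2, .1), replace = TRUE)
  tt <- cumsum(rexp(ns, rate = 1 / means[invest]))
  tt <- tt[tt < 20]                    # transactions within 20 years
  if (length(tt) > 1) {
    M1[s] <- mean(diff(tt))            # method 1: mean duration, per stock
    N2[s] <- length(tt) - 1            # method 2: number of complete periods
    D2[s] <- tt[length(tt)] - tt[1]    # method 2: time spanned by them
  }
  t0 <- c(0, tt, 20)                   # method 4: period covering time 10
  M4[s] <- min(t0[t0 > 10]) - max(t0[t0 < 10])
}

est <- c(method1 = mean(M1, na.rm = TRUE),
         method2 = sum(D2, na.rm = TRUE) / sum(N2, na.rm = TRUE),
         method4 = mean(M4, na.rm = TRUE))
print(est)
```

Changing the `prob` argument of `sample` to `c(.9, .09, .01)` gives the second scenario discussed in the post.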

If we change the probabilities (and assume that high-frequency investors are much more numerous than long-term ones), e.g.

> invest=sample(size=ns,c("A","B","C"),
+ prob=c(.9,.09,.01),replace=TRUE)

then the first three estimates change substantially (they drop by a factor of roughly 7 to 9), but the last one barely moves.

> sum(D2,na.rm=TRUE)/sum(N2,na.rm=TRUE)
[1] 0.04072227
> mean(M1,na.rm=TRUE)
[1] 0.06393767
> mean(M3,na.rm=TRUE)
[1] 0.2504322
> mean(M4,na.rm=TRUE)
[1] 12.05508
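One way to read the stability of the fourth estimate is to look at the share of calendar time, rather than the share of transactions, attributable to each type of investor. The sketch below is a back-of-the-envelope computation under the model's assumptions. The helper `time_share` is hypothetical, and the truncation of the observation window at 0 and 20 years is ignored.

```r
# Mean holding times in years: 20 seconds, 15 days, 10 years
means <- c(A = 20 / (365 * 24 * 60 * 60), B = 15 / 365, C = 10)

# Hypothetical helper: long-run fraction of time a stock spends with each
# type of investor, given the probability of each type at a transaction
time_share <- function(prob) prob * means / sum(prob * means)

shares <- rbind(first  = time_share(c(.7, .2, .1)),
                second = time_share(c(.9, .09, .01)))
print(round(shares, 4))
```

In both scenarios the long-term investors account for more than 95% of the holding time, so the investor holding the stock at time 10 is almost always of type C, whatever the share of high-frequency transactions.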

So I have to confess that the “average time in which you hold a stock” can be almost anything from 10 seconds to 10 years: it clearly depends on the way the average is calculated. The second point is that if the proportion of high-frequency trading is extremely high, it should not affect the last estimate (which is, from my point of view, the most interesting one, and might also be improved by accounting for censoring there as well). So I guess people should be careful when discussing such quantities… And if anyone is willing to share data on that topic, I’d be glad to look at them…

