Ordered Probit Model and Price Movements of High-Frequency Trades

August 19, 2018
By

(This article was first published on S+/R – Yet Another Blog in Statistical Computing, and kindly contributed to R-bloggers)

The analysis of high frequency stock transactions has played an important role in the algorithmic trading and the result can be used to monitor stock movements and to develop trading strategies. In the paper “An Ordered Probit Analysis of Transaction Stock Prices” (1992), Hausman, Lo, and MacKinlay discussed estimating trade-by-trade stock price changes with the ordered probit model by incorporating potential model drivers, including previous price changes, trading volumes, and the time between consecutive trades. Following the same logic, Tsay demonstrated how to employ the ordered probit model to project price movements of high frequency stock trades in his book “An Introduction to Analysis of Financial Data with R” (2013).

The exercise below is simply to mimic the analysis shown in the chapter 6 of Tsay’s book. Please note that the output of rms::orm() function slightly differs from the one of MASS::polr() used in the book due to the different parameterization. Otherwise, results are largely consistent.

cat = read.table("Downloads/chap6/taq-cat-t-jan042010.txt", header = T)

### CALCULATE PRICE DIFFERENCE ###
pchg = cat$price[2:nrow(cat)] - cat$price[1:nrow(cat) - 1]

### CATEGORIES PRICE CHANGE ###
cchg = as.factor(memisc::cases((pchg  1, 
                               (pchg >= -0.01 & pchg  2, 
                               (pchg == 0) -> 3, 
                               (pchg > 0 & pchg  4, 
                               (pchg > 0.01) -> 5))

### PLOT HISTOGRAM OF PRICE CHANGES ###
barplot(table(cchg) / length(cchg), space = 0, col = "gray", border = NA, main = "Distribution of Price Changes", xlab = "Price Movements")

hist

From the histogram above, it is interesting to see that the distribution of price movements looks very symmetrical and centering around the zero and that price changes for consecutive trades are mostly within the range of 1 – 2 cents.

y_raw = pchg[4:length(cchg)]
y = cchg[4:length(cchg)]

### CREATE LAGGED Y AS MODEL PREDICTORS ###
y1 = cchg[3:(length(y) + 2)]
y2 = cchg[2:(length(y) + 1)]

### CREATE LAGGED PRICE CHANGES AS MODEL PREDICTORS ###
pch1 = pchg[3:(length(y) + 2)]
pch2 = pchg[2:(length(y) + 1)]
pch3 = pchg[1:length(y)]

### CREATE LAGGED TRADING VOLUME AS MODEL PREDICTORS ###
vol1 = cat$size[4:(3 + length(y))] / 100
vol2 = cat$size[3:(2 + length(y))] / 100

### CREATE LAGGED SECONDS BETWEEN TRADES AS MODEL PREDICTORS ###
cat$time = strptime(paste(sprintf("%02d", cat$hour), sprintf("%02d", cat$minute), sprintf("%02d", cat$second), sep = ':'), "%H:%M:%S")
tdif = as.numeric(difftime(cat$time[-1], cat$time[-length(cat$time)]))
tdif1 = tdif[3:(length(y) + 2)]
tdif2 = tdif[2:(length(y) + 1)]

df = data.frame(y, y1, y2, vol1, vol2, tdif1, tdif2, pch1, pch2, pch3)

### VOL1 / TDIF1 / TDIF2 ARE NOT SIGNIFICANT ###
m1 = rms::orm(y ~ y1 + y2 + pch1 + pch2 + pch3 + vol1 + vol2 + tdif1 + tdif2, data = df, family = probit)
#       Coef     S.E.   Wald Z Pr(>|Z|)
# vol1    0.0011 0.0012   0.88 0.3775  
# tdif1  -0.0030 0.0034  -0.88 0.3783  
# tdif2  -0.0018 0.0035  -0.52 0.6058

### REFIT THE MODEL WITH SIGNIFICANT DRIVERS ###
m2 = update(m1, y ~ y1 + y2 + pch1 + pch2 + pch3 + vol2)

### PREDICT PROBABILITY OF EACH CATEGORY ###
head(predict(m1, type = "fitted.ind"), 3)
#          y=1        y=2       y=3        y=4         y=5
#1 0.017586540 0.08172596 0.6655605 0.17209486 0.063032101
#2 0.098890397 0.22135286 0.6180407 0.05228561 0.009430461
#3 0.001268321 0.01270428 0.4104822 0.30700447 0.268540702

### PREDICT CUMULATIVE PROBABILITY OF EACH CATEGORY ###
head(predict(m2, type = "fitted"), 3)
#       y>=2      y>=3       y>=4        y>=5
#1 0.9824135 0.9006875 0.23512696 0.063032101
#2 0.9011096 0.6797567 0.06171607 0.009430461
#3 0.9987317 0.9860274 0.57554517 0.268540702

### MODEL ACCURACY ASSESSMENT FOR PREDICTING PRICE INCREASES ###
pROC::roc(ifelse(y_raw > 0, 1, 0), predict(m2, type = "fitted")[, 3])
# Area under the curve: 0.6994

par(mfrow = c(2, 1))
ts.plot(y_raw, main = "Price Changes", ylab = "Price Changes")
ts.plot(predict(m2, type = "fitted")[, 3], main = "Probability of Price Increase", ylab = "Probability")

cat

To leave a comment for the author, please follow the link and comment on their blog: S+/R – Yet Another Blog in Statistical Computing.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)