Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The analysis of high frequency stock transactions has played an important role in the algorithmic trading and the result can be used to monitor stock movements and to develop trading strategies. In the paper “An Ordered Probit Analysis of Transaction Stock Prices” (1992), Hausman, Lo, and MacKinlay discussed estimating trade-by-trade stock price changes with the ordered probit model by incorporating potential model drivers, including previous price changes, trading volumes, and the time between consecutive trades. Following the same logic, Tsay demonstrated how to employ the ordered probit model to project price movements of high frequency stock trades in his book “An Introduction to Analysis of Financial Data with R” (2013).

The exercise below is simply to mimic the analysis shown in the chapter 6 of Tsay’s book. Please note that the output of rms::orm() function slightly differs from the one of MASS::polr() used in the book due to the different parameterization. Otherwise, results are largely consistent.

cat = read.table("Downloads/chap6/taq-cat-t-jan042010.txt", header = T)

### CALCULATE PRICE DIFFERENCE ###
pchg = cat$price[2:nrow(cat)] - cat$price[1:nrow(cat) - 1]

### CATEGORIES PRICE CHANGE ###
cchg = as.factor(memisc::cases((pchg  1,
(pchg >= -0.01 & pchg  2,
(pchg == 0) -> 3,
(pchg > 0 & pchg  4,
(pchg > 0.01) -> 5))

### PLOT HISTOGRAM OF PRICE CHANGES ###
barplot(table(cchg) / length(cchg), space = 0, col = "gray", border = NA, main = "Distribution of Price Changes", xlab = "Price Movements")


From the histogram above, it is interesting to see that the distribution of price movements looks very symmetrical and centering around the zero and that price changes for consecutive trades are mostly within the range of 1 – 2 cents.

y_raw = pchg[4:length(cchg)]
y = cchg[4:length(cchg)]

### CREATE LAGGED Y AS MODEL PREDICTORS ###
y1 = cchg[3:(length(y) + 2)]
y2 = cchg[2:(length(y) + 1)]

### CREATE LAGGED PRICE CHANGES AS MODEL PREDICTORS ###
pch1 = pchg[3:(length(y) + 2)]
pch2 = pchg[2:(length(y) + 1)]
pch3 = pchg[1:length(y)]

### CREATE LAGGED TRADING VOLUME AS MODEL PREDICTORS ###
vol1 = cat$size[4:(3 + length(y))] / 100 vol2 = cat$size[3:(2 + length(y))] / 100

### CREATE LAGGED SECONDS BETWEEN TRADES AS MODEL PREDICTORS ###
cat$time = strptime(paste(sprintf("%02d", cat$hour), sprintf("%02d", cat$minute), sprintf("%02d", cat$second), sep = ':'), "%H:%M:%S")
tdif = as.numeric(difftime(cat$time[-1], cat$time[-length(cat\$time)]))
tdif1 = tdif[3:(length(y) + 2)]
tdif2 = tdif[2:(length(y) + 1)]

df = data.frame(y, y1, y2, vol1, vol2, tdif1, tdif2, pch1, pch2, pch3)

### VOL1 / TDIF1 / TDIF2 ARE NOT SIGNIFICANT ###
m1 = rms::orm(y ~ y1 + y2 + pch1 + pch2 + pch3 + vol1 + vol2 + tdif1 + tdif2, data = df, family = probit)
#       Coef     S.E.   Wald Z Pr(>|Z|)
# vol1    0.0011 0.0012   0.88 0.3775
# tdif1  -0.0030 0.0034  -0.88 0.3783
# tdif2  -0.0018 0.0035  -0.52 0.6058

### REFIT THE MODEL WITH SIGNIFICANT DRIVERS ###
m2 = update(m1, y ~ y1 + y2 + pch1 + pch2 + pch3 + vol2)

### PREDICT PROBABILITY OF EACH CATEGORY ###
#          y=1        y=2       y=3        y=4         y=5
#1 0.017586540 0.08172596 0.6655605 0.17209486 0.063032101
#2 0.098890397 0.22135286 0.6180407 0.05228561 0.009430461
#3 0.001268321 0.01270428 0.4104822 0.30700447 0.268540702

### PREDICT CUMULATIVE PROBABILITY OF EACH CATEGORY ###
#       y>=2      y>=3       y>=4        y>=5
#1 0.9824135 0.9006875 0.23512696 0.063032101
#2 0.9011096 0.6797567 0.06171607 0.009430461
#3 0.9987317 0.9860274 0.57554517 0.268540702

### MODEL ACCURACY ASSESSMENT FOR PREDICTING PRICE INCREASES ###
pROC::roc(ifelse(y_raw > 0, 1, 0), predict(m2, type = "fitted")[, 3])
# Area under the curve: 0.6994

par(mfrow = c(2, 1))
ts.plot(y_raw, main = "Price Changes", ylab = "Price Changes")
ts.plot(predict(m2, type = "fitted")[, 3], main = "Probability of Price Increase", ylab = "Probability")