I see high frequency data

[This article was first published on Quantitative thoughts » EN, and kindly contributed to R-bloggers]

In the previous post I shared an example of how to get high-frequency data (HFD) from the IB (Interactive Brokers) broker. Strictly speaking, it is a retail version of HFD: it contains only the best bid/ask and the trades. Once you have saved some data, what should you do next?

The next logical step is a data sanity check and visualization. For example, while preparing the R script for this post, I found that the IB data contains numerous duplicates in the quotes. Every time a trade happens, the IB trading platform sends the price and the size of the trade bundled together. In addition, it sends the size of the trade as a separate quote, which completely messes up the data. So the sanity check and the visualization gave me a hint that something was wrong with the data.
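One way to catch the duplicated size messages is to look for size ticks that repeat the same size at the same timestamp. The base-R sketch below is only an illustration under stated assumptions: the one-row-per-message data frame, its columns `tstamp` (epoch milliseconds), `field` (the IB tick-type code, with 5 meaning trade size as in the script further down) and `size`, and the helper `flag_duplicate_sizes` are all hypothetical, and it assumes the duplicate carries exactly the same millisecond timestamp as the original message.

```r
# Hypothetical message log: one row per message from the IB feed.
# tstamp: epoch milliseconds, field: IB tick-type code, size: reported size
ticks <- data.frame(
  tstamp = c(1000, 1000, 1000, 1500, 2000, 2000),
  field  = c(4,    5,    5,    1,    4,    5),
  size   = c(NA,   200,  200,  NA,   NA,   100)
)

# Flag size ticks (field 5) that repeat an earlier size tick with the same
# timestamp and size. Assumption: the duplicate has the same millisecond
# timestamp. A genuine second trade of equal size in the same millisecond
# would also be flagged.
flag_duplicate_sizes <- function(ticks) {
  is_size <- ticks$field == 5
  dup <- rep(FALSE, nrow(ticks))
  dup[is_size] <- duplicated(ticks[is_size, c("tstamp", "size")])
  dup
}

dups <- flag_duplicate_sizes(ticks)
print(dups)
clean <- ticks[!dups, ]  # message log without the repeated size ticks
```

Here only the third row is flagged, because it repeats the size 200 already reported at timestamp 1000. On real data it is worth plotting the flagged rows before deleting anything, since the exact shape of the duplicates may differ from this assumption.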

Today I want to show an example in R that loads the data from MongoDB and plots part of it. This should give you a better intuition for the collected data.


The plot shows bid prices (light blue), ask prices (green) and trades (red). The size of each red dot indicates the volume of the trade.
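The plot depends on lining up three streams that update at different moments. The script does this with `na.locf()` from zoo (attached by xts): the last known bid, ask and trade price are carried forward, and the trade price is then blanked wherever no trade size was reported, so a red dot appears only where a trade actually happened. The base-R sketch below reproduces that logic on a tiny, made-up data frame; the helper `locf` and the `merged` data are hypothetical stand-ins for `na.locf()` and the xts objects in the script.

```r
# Last observation carried forward: replace each NA with the most recent
# non-NA value before it; leading NAs stay NA.
locf <- function(x) {
  # index of the most recent non-NA observation at or before each position
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  out <- rep(NA_real_, length(x))
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

# Hypothetical ticks already merged on one timeline (NA = no update)
merged <- data.frame(
  bid  = c(10.0, NA,   NA,   10.1, NA),
  ask  = c(10.2, NA,   10.3, NA,   NA),
  trd  = c(NA,   10.2, NA,   NA,   10.1),
  size = c(NA,   300,  NA,   NA,   150)
)

aligned <- merged
aligned$bid <- locf(merged$bid)
aligned$ask <- locf(merged$ask)
aligned$trd <- locf(merged$trd)
# keep a trade price only where a trade size was reported
aligned$trd[is.na(aligned$size)] <- NA
print(aligned)
```

Carrying the trade price forward before masking matters when a size message arrives at a timestamp where the price itself did not change: the dot still gets the last traded price.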

The source code is available on GitHub and below:

#Author Dzidorius Martinaitis
#Date 2012-03-01
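#Description: load IB quotes and trade sizes from MongoDB and plot them

require(rmongodb)
require(xts)      # also attaches zoo, which provides na.locf()
require(ggplot2)
require(reshape2) # provides melt()
mongo=mongo.create()  # connect to a local MongoDB instance with default settings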
 
# query: documents for tickerId 20 that have a "size" field
buf = mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "tickerId", 20L)
mongo.bson.buffer.start.object(buf, "size")
mongo.bson.buffer.append(buf, "$exists", TRUE)
mongo.bson.buffer.finish.object(buf)
 
query = mongo.bson.from.buffer(buf)
 
count = mongo.count(mongo,'quotes.trinti',query)
cursor=mongo.find(mongo,'quotes.trinti',query)
 
#############  very slow code #############
# growing an xts object with rbind() inside the loop copies it on every iteration
#size=''
#system.time(
#while(mongo.cursor.next(cursor)){
#  temp=(mongo.cursor.value(cursor));
#  if(is.xts(size))
#    size=rbind(size,xts(cbind(mongo.bson.value(temp,"field"),mongo.bson.value(temp,"size")),order.by=as.POSIXct(mongo.bson.value(temp,"tstamp")/1000,origin='1970-01-01',tz='Europe/Paris')))
#  else
#    size=xts(cbind(mongo.bson.value(temp,"field"),mongo.bson.value(temp,"size")),order.by=as.POSIXct(mongo.bson.value(temp,"tstamp")/1000,origin='1970-01-01',tz='Europe/Paris'))
#})
#############  end very slow  #############
 
# much faster: preallocate a count x 3 matrix (field, size, tstamp) and fill it row by row
size=matrix(nrow=count,ncol=3)
counter=1
system.time(
  while(mongo.cursor.next(cursor))
  {
    temp=(mongo.cursor.value(cursor));
    size[counter,1]=mongo.bson.value(temp,"field");
    size[counter,2]=mongo.bson.value(temp,"size");
    size[counter,3]=mongo.bson.value(temp,"tstamp");
    counter=counter+1;
    if(counter>count)break;
  })
# tstamp is stored in milliseconds since the Unix epoch, hence the division by 1000
size=xts(size[,1:2],order.by=as.POSIXct(size[,3]/1000,origin='1970-01-01',tz='Europe/Paris'))
colnames(size)=c('field','size')
 
 
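# query: documents for tickerId 26 that have a "price" field
# NOTE: the size query above uses tickerId 20; check that both IDs refer to the same instrument
buf = mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "tickerId", 26L)
mongo.bson.buffer.start.object(buf, "price")
mongo.bson.buffer.append(buf, "$exists", TRUE)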
mongo.bson.buffer.finish.object(buf)
 
query = mongo.bson.from.buffer(buf)
count = mongo.count(mongo,'quotes.trinti',query)
 
cursor=mongo.find(mongo,'quotes.trinti',query)
price=matrix(nrow=count,ncol=3)
counter=1
system.time(
  while(mongo.cursor.next(cursor))
  {
    temp=(mongo.cursor.value(cursor));
    price[counter,1]=mongo.bson.value(temp,"field");
    price[counter,2]=mongo.bson.value(temp,"price");
    price[counter,3]=mongo.bson.value(temp,"tstamp");
    counter=counter+1;
    if(counter>count)break;
  })
price=xts(price[,1:2],order.by=as.POSIXct(price[,3]/1000,origin='1970-01-01',tz='Europe/Paris'))
# drop non-positive prices
price=price[which(price[,2]>0)]
 
colnames(price)=c('field','price')
 
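# merge by timestamp into one xts: bid (field 1), ask (field 2),
# last trade price (field 4) and last trade size (field 5)
quotes=cbind(price[,2][price[,1]==1],
             #cac40.volume[,2][cac40.volume[,1]==0],
             price[,2][price[,1]==2],
             #cac40.volume[,2][cac40.volume[,1]==3],
             price[,2][price[,1]==4],
             size[,2][size[,1]==5]
             )
colnames(quotes)=c('bid','ask','trd','size')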
 
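# carry the last known bid, ask and trade price forward to every timestamp
quotes[,1]=na.locf(quotes[,1])
quotes[,2]=na.locf(quotes[,2])
quotes[,3]=na.locf(quotes[,3])
# keep a trade price only at timestamps where a trade size was reported
quotes[which(is.na(quotes[,4])),3]=NA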
 
# take 1000 ticks (rows 2001-3000) and convert them to a plain data frame
temp=tail(head(quotes,3000),1000)
temp=data.frame(ind=1:NROW(temp),trd=as.numeric(temp[,3]),
                bid=as.numeric(temp[,1]),ask=as.numeric(temp[,2]),
                size=as.numeric(temp[,4]))
# bid and ask in long format for the lines
rez=melt(temp[,c('ind','bid','ask')],id.vars='ind',na.rm=TRUE)
# trades as red points, sized by the trade volume
trades=temp[!is.na(temp$trd) & !is.na(temp$size),]
ggplot(rez,aes(x=ind,y=value,color=variable))+
  geom_line()+
  geom_point(data=trades,aes(x=ind,y=trd,size=size),color='red',inherit.aes=FALSE)+
  scale_color_manual(values=c(bid='deepskyblue',ask='green3'))
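The script keeps the slow loop only as a comment. It is slow because `rbind()` inside a loop copies the whole accumulated object on every iteration, so the total work grows roughly quadratically with the number of documents, while filling a preallocated matrix typically writes each row in place. The base-R sketch below shows the two patterns side by side on made-up numbers rather than MongoDB documents; `grow` and `prealloc` are hypothetical helper names, and the printed timings will vary from machine to machine.

```r
# Pattern 1: grow the result with rbind() on every iteration
grow <- function(n) {
  out <- NULL
  for (i in seq_len(n)) {
    out <- rbind(out, c(i, i * 2))  # copies 'out' each time
  }
  out
}

# Pattern 2: allocate the full matrix once, then fill row i
prealloc <- function(n) {
  out <- matrix(NA_real_, nrow = n, ncol = 2)
  for (i in seq_len(n)) {
    out[i, ] <- c(i, i * 2)
  }
  out
}

n <- 5000
t_grow <- system.time(a <- grow(n))[["elapsed"]]
t_pre  <- system.time(b <- prealloc(n))[["elapsed"]]
cat("rbind growth:", t_grow, "s; preallocated:", t_pre, "s\n")
```

Both functions return the same n x 2 matrix; only the allocation strategy differs, which is the same change the script makes when it switches from `rbind()` on xts objects to the preallocated `size` and `price` matrices.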

 
