Levenshtein distance in C++ and code profiling in R

March 25, 2012

At work, a client asked whether the existing search engine could treat singular and plural forms equally, e.g. so that "partner" and "partners" would lead to the same result. The first option is stemming. In that case, the search engine would use the root of a word, e.g. "partn". However, stemming has many weaknesses:

I see high frequency data

March 1, 2012

In the previous post I shared an example of how to get high frequency data from IB (well, it is the retail version of HFD: it has only the best bid/ask and the trades). Now, once you have saved some data, what should you do next? The next logical step would be data

How to save high frequency data in MongoDB

February 24, 2012

Are you looking for a way to save real-time, high frequency data taken from the Interactivebrokers.com API? I built an example in C++ which saves all incoming data in MongoDB. Check this link if you are interested: https://github.com/kafka399/TwsMongo

Vectorized R vs Rcpp

February 1, 2012

In my previous post, I tried to show that Rcpp is 1000 times faster than pure R, and that generated a fuss in the comments. Being lazy, I didn't vectorize the R code, and in the end I was comparing apples with oranges. To fix that problem, I built a new script,

The power of Rcpp

January 30, 2012

A while ago I built two R scripts to track the OMX Baltic Benchmark Fund against the index. One script returns the deviation of the fund from the index, and it works fast enough. The second calculates the value of the fund every minute, and it used to take a while. For example,

C++ is dead. Long live C++

December 1, 2011

During the summer I was contacted by a hedge fund from the Bahamas. The fund was looking for someone with R language skills on-site and insisted on a phone interview. Besides obvious questions about finance, statistics, coding and how many tennis balls can fit in a Boeing 747 (ok, this question was omitted), they

Trading volume forecast for an illiquid stock

August 8, 2011

When dealing with transaction cost analysis, a stock's volume is assumed to be stable or foreseeable. However, the picture is different when we are dealing with an illiquid stock. It is relatively easy to forecast the volume of a liquid stock, because trading volume has high autocorrelation – the volumes

How do big block trades affect stock market prices?

July 27, 2011

I will be giving a presentation on "Optimal transaction cost" in Vilnius on 16 August. While preparing the presentation and looking for an optimal execution solution, a natural question arises: does the size of the trade affect the stock market price? I'm sure you would say 100% yes. Well, you would be

Timezone issue in R

May 14, 2011

While investigating the paper "Intraday patterns in FX returns and order flow", I faced a problem with timezones. I had 3 data sources with different timezones (GMT, CET, CEST). The most confusing thing was that I didn't know how to deal with summer time. But why did I have the data

Correlation network

March 22, 2011

I came up with an idea to draw a correlation network to get a grasp of the relationships between a list of stocks. An alternative way to show a correlation matrix would be a heat map, which can have limitations with big matrices (>100). Unfortunately, the ggplot2 package doesn't have an easy way to draw

Interesting volatility measurement, part 2

January 21, 2011

A few weeks ago I mentioned an interesting volatility prediction. It is based on two periods of historical volatility (standard deviation). The remaining question was: does it really work? I could not give the answer, because I didn't have VIX futures data at that time. Later on,

