Introduction to Kaggle Algorithmic Trading Challenge

January 10, 2012

(This article was first published on R, Ruby, and Finance, and kindly contributed to R-bloggers)

I recently participated in the Kaggle Algorithmic Trading Competition under the username VikP. For those who do not know Kaggle, it is a website where individuals and corporations can host data analysis competitions. This particular competition involved predicting, at the tick level, how prices recovered after both buyer-initiated and seller-initiated liquidity shocks, across 50,000 observations of 102 different securities.

Each competitor was provided with approximately 750,000 rows of training data, each of which corresponded to a separate liquidity shock event. Each row contained the bid and ask prices and an event indicator (trade event or quote event) for the 50 time points immediately preceding the liquidity shock, and the bid and ask prices alone for the 50 events immediately following it. Metadata was also provided, such as the number of shares traded to create the liquidity shock and whether the shock was buyer or seller initiated.
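
To make the row layout described above concrete, here is a minimal sketch in Python. The class name, the field names, and the string labels are all hypothetical, chosen only for illustration; they are not the column names used in the competition files. Only the window length of 50 and the kinds of fields come from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple

WINDOW = 50  # points before and events after the shock, per the description


@dataclass
class LiquidityShockRow:
    # Hypothetical field names; the real files may be organized differently.
    initiator: str                          # "buyer" or "seller"
    shares_traded: int                      # size of the trade causing the shock
    pre_quotes: List[Tuple[float, float]]   # (bid, ask) before the shock
    pre_event_types: List[str]              # "trade" or "quote" for each pre point
    post_quotes: List[Tuple[float, float]]  # (bid, ask) after the shock

    def is_well_formed(self) -> bool:
        """Check that the row matches the layout described in the text."""
        return (
            self.initiator in ("buyer", "seller")
            and len(self.pre_quotes) == WINDOW
            and len(self.pre_event_types) == WINDOW
            and len(self.post_quotes) == WINDOW
            and all(t in ("trade", "quote") for t in self.pre_event_types)
        )


# Made-up values, used only to show the shape of one row.
example_row = LiquidityShockRow(
    initiator="buyer",
    shares_traded=1000,
    pre_quotes=[(99.9, 100.1)] * WINDOW,
    pre_event_types=["quote"] * WINDOW,
    post_quotes=[(100.0, 100.2)] * WINDOW,
)
```

Note that the post-shock window carries no event indicator, which mirrors the statement that only bid and ask prices were given for the events following the shock.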

There were some limitations to what predictions could be made from the data, notably because volume information was missing for each trade. Because the evaluation metric was RMSE, which is measured in absolute price units, errors on higher-priced stocks counted far more heavily than errors on lower-priced ones, and the competition became heavily dependent on filtering outliers.
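
The effect of price level on RMSE can be seen with a small calculation. The sketch below is in Python and uses made-up prices, not competition data: it applies the same 1% prediction error to a hypothetical 10-unit stock and a hypothetical 1,000-unit stock and compares their squared errors.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired price observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical prices: an identical 1% miss on a cheap and an expensive stock.
cheap_actual, cheap_pred = [10.0], [10.1]
dear_actual, dear_pred = [1000.0], [1010.0]

# Squared error grows with the square of the price level.
cheap_sq_err = (cheap_actual[0] - cheap_pred[0]) ** 2
dear_sq_err = (dear_actual[0] - dear_pred[0]) ** 2

# RMSE over both stocks together is dominated by the expensive one.
combined = rmse(cheap_actual + dear_actual, cheap_pred + dear_pred)

print(f"cheap squared error: {cheap_sq_err:.4f}")
print(f"expensive squared error: {dear_sq_err:.4f}")
print(f"combined RMSE: {combined:.4f}")
```

With a 100-fold difference in price, the same percentage error produces a squared error roughly 10,000 times larger, so a handful of high-priced rows, or a few extreme outliers, can dominate the score.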

I had a great time participating, enjoyed the high level of competition, and highly recommend Kaggle competitions to aspiring data miners. During the competition, I gained many insights into tick data and into analysis of financial data. I will share some of these insights in the next week or so, but I just wanted to introduce the competition first!
