After my last post, I came across a few articles supporting the view that, if you have a good reason to take random samples from a “big” dataset, you’re not committing some kind of sin: Big Data Blasphemy: Why Sample? …
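To make the idea concrete, here is a minimal sketch of drawing a simple random sample of rows from a large data frame in base R. The dataset is hypothetical: one million simulated values with an assumed mean of 10 and standard deviation of 3. It is not taken from the post.

```r
# Hypothetical "big" dataset: one million simulated values (assumed mean 10, sd 3)
set.seed(1)
big <- data.frame(x = rnorm(1e6, mean = 10, sd = 3))

# Simple random sample of 10,000 row indices, drawn without replacement
rows <- sample(nrow(big), size = 1e4)
small <- big[rows, , drop = FALSE]

# Compare the full-data mean with the sample mean and the sample's standard error
c(full = mean(big$x),
  sample = mean(small$x),
  se = sd(small$x) / sqrt(nrow(small)))
```

With 10,000 rows the standard error of the mean is roughly 3 / 100 = 0.03, so the sample estimate should sit very close to the full-data value at a fraction of the computational cost.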

What's with those estimates?
By Ben Ogorek

In R, categorical variables can be added to a regression using the lm() function without a hint of extra work. But have you ever looked at the resulting estimates and wondered exactly what they were?

First, let's define a data set.

set.seed(12255)
n = 30
sigma = 2.0
AOV.df <- data.frame(category = c(rep("category1", n) ...
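One way to see what those estimates are: under R's default treatment contrasts (contr.treatment), the intercept is the mean of the first factor level and each remaining coefficient is that level's mean minus the baseline mean. The sketch below checks this on hypothetical data; the group means 1, 3 and 6 and the name sim.df are assumptions for illustration, not the post's truncated AOV.df.

```r
# Hypothetical data: three categories with assumed true means 1, 3 and 6
set.seed(42)
n <- 30
sigma <- 2.0
sim.df <- data.frame(
  category = factor(rep(c("category1", "category2", "category3"), each = n)),
  y = c(rnorm(n, mean = 1, sd = sigma),
        rnorm(n, mean = 3, sd = sigma),
        rnorm(n, mean = 6, sd = sigma))
)

# Default treatment contrasts: "category1" (the first level) is the baseline
fit <- lm(y ~ category, data = sim.df)
coef(fit)

# Rebuild the coefficients by hand from the group means
group.means <- tapply(sim.df$y, sim.df$category, mean)
manual <- c(group.means[1], group.means[-1] - group.means[1])
all.equal(unname(coef(fit)), unname(manual))
```

The coefficient names (categorycategory2, categorycategory3) are R's concatenation of the variable name and the level; changing the baseline with relevel() or switching to another contrast scheme changes what each estimate means.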

The process of working on metadata and temperature series gives rise to several situations where I need to calculate the distance from every station to every other station. With a small number of stations this can be done easily on the fly with the result stored in a matrix. The matrix has rows and columns
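A minimal sketch of building such a station-to-station matrix in base R, assuming great-circle (haversine) distances on a sphere with a mean Earth radius of 6371 km. The station IDs and coordinates are hypothetical, and the post's own distance method may differ. Base R's dist() only offers planar metrics such as Euclidean, so the pairwise computation is done with outer() instead.

```r
# Hypothetical stations: latitude/longitude in decimal degrees
stations <- data.frame(
  id  = c("A", "B", "C"),
  lat = c(51.5, 48.9, 40.7),
  lon = c(-0.1, 2.4, -74.0)
)

# Haversine great-circle distance in km (R = assumed mean Earth radius)
haversine <- function(lat1, lon1, lat2, lon2, R = 6371) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))  # pmin guards against rounding just above 1
}

# outer() expands the index vectors into all (i, j) pairs and calls the
# vectorised function once, returning an n x n matrix
idx <- seq_len(nrow(stations))
dist.km <- outer(idx, idx, function(i, j)
  haversine(stations$lat[i], stations$lon[i], stations$lat[j], stations$lon[j]))
dimnames(dist.km) <- list(stations$id, stations$id)
round(dist.km)
```

The full matrix grows as n², which is fine for a modest station list; for very large inventories it may be cheaper to compute distances on demand or keep only the upper triangle.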

Paulo (from the Instituto de Matemática e Estatística, Universidade de São Paulo, Brazil) has posted an answer to my earlier question both as a comment on the ‘Og and as a solution on StackOverflow (with a much more readable LaTeX output). His solution is based on the observation that the multidimensional log-normal distribution still allows

Wondering about the question I posted on Friday (on StackExchange, no satisfactory answer so far!), I looked further at the special case of the gamma distribution I suggested at the end. Starting from the moment conditions, the solution is (hopefully) given by the system. The resolution of this system obviously imposes conditions on those

I want to introduce the Transaction Cost and Execution Price functionality of the Backtesting library in the Systematic Investor Toolbox. Transaction costs are implemented through the commission parameter of the bt.run() function. You may specify commissions in $ per share for a “share” type backtest and as a percentage of the total trade for a “weight”
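The arithmetic behind the two commission conventions can be sketched as follows. trade.cost() is a hypothetical helper, not part of the Systematic Investor Toolbox, and it does not reproduce bt.run() internals; it assumes "percentage of total trade" means a fraction (for example 0.001 = 0.1%) of the dollar value traded.

```r
# Hypothetical helper (NOT a SIT function): total commission for one rebalance
trade.cost <- function(shares.traded, price, commission,
                       type = c("share", "weight")) {
  type <- match.arg(type)
  if (type == "share") {
    # $ per share: cost depends only on how many shares change hands
    sum(abs(shares.traded) * commission)
  } else {
    # assumed fraction of traded value: cost scales with dollar turnover
    sum(abs(shares.traded * price) * commission)
  }
}

trades <- c(100, -50)  # buy 100 units of asset 1, sell 50 units of asset 2
prices <- c(20, 40)
trade.cost(trades, prices, commission = 0.01, type = "share")   # 150 shares * $0.01 = 1.5
trade.cost(trades, prices, commission = 0.001, type = "weight") # $4000 traded * 0.1% = 4
```

Taking absolute values means buys and sells are both charged, which is the usual convention for commissions.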

Web-scraping, or web-crawling, sounds like a seedy activity worthy of an Interpol investigative department. The reality, however, is far less nefarious. Web-scraping is any procedure by which someone extracts data from the internet. Given that it’s possible to get the internet on computers these days, web-scraping opens up an array of interesting possibilities for social-science researchers
