"Russia is a riddle wrapped in a mystery inside an enigma." -- Winston Churchill, radio address in 1939 A couple of weeks ago, Graph of the Week published an article describing the significant improvement in medals won by the host...

I’ve recently been going through the medal statistics from the London 2012 Olympics. I was planning to present some extra charts, such as medal-per-milli-population or medal-vs-GDP. However, it’s a little boring to present the same kind of charts again. Thus, I’d like to look into some particular … Continue reading →

As some of you may know already, I’m co-organizing an upcoming conference called DataGotham that’s taking place in September. To help spread the word about DataGotham, I’m cross-posting the most recent announcement below: We’d like to let you know about DataGotham: a celebration of New York City’s data community! http://datagotham.com This is an event run

In this tutorial I am going to share my R&D and trading experience using the Autoregressive Moving Average (ARMA) model, which is well known from statistics. A lot has been written about these models; however, I strongly recommend Introductory Time Series with R, which I find to be a perfect combination of light theoretical background and practical implementations in
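
As a rough illustration of what fitting an ARMA model looks like in R, here is a minimal sketch using only base R. The simulated series, the coefficients 0.6 and 0.3, and the order (1, 0, 1) are assumptions chosen for the example; they are not taken from the tutorial, which may well use different data and packages.

```r
# Simulate an ARMA(1,1) series (coefficients are illustrative assumptions).
set.seed(42)
x <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500)

# Fit an ARMA(1,1): in arima(), order = c(p, d, q) with d = 0 means no differencing.
fit <- arima(x, order = c(1, 0, 1))
print(coef(fit))  # estimated ar1, ma1 and intercept (the series mean)

# Point forecasts for the next five observations.
pred <- predict(fit, n.ahead = 5)
print(pred$pred)
```

In a trading setting the series would typically be returns rather than simulated data, and the order would be chosen by comparing candidate fits (for example by AIC) instead of being fixed in advance.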

People use the R language every day to create the elements of reports: tables, charts, analyses, and forecasts. But assembling all of that information into a print-ready document laid out with text can be a hassle. You can cut-and-paste all of the elements into Word, but then what do you do when the data file gets updated at the last...

Data science is a sophisticated and complex discipline, but since it's still an emerging field, its practitioners come from a wide variety of backgrounds. Typically, though, a background in working with large data sets in a research setting is advantageous. This is why you may be mingling with a former physicist or immunologist at the next data hackathon...

(This article was first published on chem-bla-ics, and kindly contributed to R-bloggers)

Today I want to continue with the Adaptive Asset Allocation theme and examine how sensitive the strategy results are to the look-back parameters used for the momentum and volatility computations. I will follow the sample steps that were outlined by David Varadi in the robustness of parameters of the Adaptive Asset Allocation algorithm post. Please see my prior

Last week's meeting of the Chicago area Hadoop User Group (a joint meeting with the Chicago R User Group, and sponsored by Revolution Analytics) focused on crunching Big Data with R and Hadoop. Jeffrey Breen, president of Atmosphere Research Group, frequently deals with large data sets in his airline consulting work, and R is his "go-to tool for anything data-related"....

RStudio’s mission from the beginning has been to create powerful tools that support the practices and techniques required for creating trustworthy, high quality analysis. For many years Hadley Wickham has been teaching and working on his own set of tools for R with many of the same core goals. We’ve been collaborating quite a bit

In the last post, The Relative Importance of Predictors, I showed how difficult it can be to assess the independent contribution that each predictor makes to the overall R-squared when the predictors are highly correlated. We spent some time looking at one example where the predictors were ratings from an airline satisfaction study. As is common in such studies,...
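
To make the difficulty concrete, here is a small base R sketch on simulated data. The variables x1 and x2, their correlation of 0.9, and the helper r2() are hypothetical and are not drawn from the airline satisfaction study. It shows that the share of R-squared credited to a predictor depends on whether it enters the model first or last.

```r
# Two highly correlated predictors (correlation of about 0.9 is an assumption).
set.seed(7)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)
y  <- x1 + x2 + rnorm(n)

# Hypothetical helper: R-squared of a linear model given its formula.
r2 <- function(f) summary(lm(f))$r.squared

r2_x1_first <- r2(y ~ x1)                    # x1 entered on its own
r2_x1_last  <- r2(y ~ x1 + x2) - r2(y ~ x2)  # x1 added after x2

print(c(first = r2_x1_first, last = r2_x1_last))
```

Both numbers are legitimate answers to "how much does x1 contribute", which is why a single figure for an individual predictor is hard to defend when predictors overlap this much.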

A herd of heuristic algorithms is compared using a portfolio optimization. Previously “A comparison of some heuristic optimization methods” used two simple and tiny portfolio optimization problems to compare a number of optimization functions in the R language. This post expands upon that by using a portfolio optimization problem that is of a realistic size … Continue reading...

I wrote before about heatmap tables as a better way of producing frequency or other tables, with a solution which works nicely in LaTeX. It is possible to do them much more easily in ggplot2, like this:
    library(Hmisc)
    library(ggplot2)
    library(reshape)
    data(HairEyeColor)
    P=t(HairEyeColor)
    Pm=melt(P)
    ggfluctuation(Pm,type="heatmap")+geom_text(aes(label=Pm$value),colour="white")+
      opts(axis.text.x=theme_text(size = 15),axis.text.y=theme_text(size = 15))
Note that ggfluctuation will also take … Continue reading...

The book R for Dummies was released recently, and was just reviewed by Dirk Eddelbuettel in the Journal of Statistical Software. Dirk is an R luminary, creating such fantastic works as Rcpp. R for Dummies seems to have beaten Dirk's natural disinclination to like anything with "for Dummies" appended to it, receiving a pretty positive review. Here is the last bit: "R

About once a week someone will tell me there is a bug in my forecast package for R because it gives forecasts that are the same for all future horizons. To save answering the same question repeatedly, here is my response. A point forecast is (usually) the mean of the distribution of a future observation in the time series,...
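
The simplest case of this can be reproduced without the forecast package. The sketch below uses base R's arima() and predict() as a stand-in, and the white-noise series with mean 10 is an assumption made for illustration. With no autoregressive or moving average terms, the model's point forecast at every horizon is the estimated mean, so the forecasts are flat.

```r
# White noise around a constant level (mean 10 and sd 2 are illustrative assumptions).
set.seed(2012)
x <- ts(rnorm(200, mean = 10, sd = 2))

# A mean-only model: no AR terms, no differencing, no MA terms.
fit <- arima(x, order = c(0, 0, 0))

# Point forecasts for horizons 1 to 10.
fc <- predict(fit, n.ahead = 10)
print(fc$pred)  # the same value at every horizon
print(mean(x))  # close to the forecast value
```

A flat line here is the correct answer rather than a bug: nothing in the history helps predict whether the next value will be above or below the mean.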

This post is a quick tip on how to use the paste( ) function to read and write multiple files. First, let’s create some data. The next step is not necessary, but makes the subsequent code more readable. The following example is silly because you would rarely want to split your data as
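
Since the excerpt stops before the code, here is a minimal sketch of the general idea rather than the post's own example. The data frame dat, the group labels, and the group_*.csv file names are all hypothetical. paste() builds one file name per group, and the same vector of names is then used to write the files and to read them back.

```r
# Hypothetical data: three groups of four observations each.
set.seed(1)
dat <- data.frame(group = rep(c("a", "b", "c"), each = 4),
                  value = rnorm(12))

# Build one file name per group with paste(); files go to a temporary directory.
groups <- unique(dat$group)
files  <- file.path(tempdir(), paste("group_", groups, ".csv", sep = ""))

# Write one CSV file per group.
for (i in seq_along(groups)) {
  write.csv(dat[dat$group == groups[i], ], files[i], row.names = FALSE)
}

# Read every file back and stack the pieces into one data frame.
pieces   <- lapply(files, read.csv)
combined <- do.call(rbind, pieces)
print(dim(combined))
```

paste0("group_", groups, ".csv") is an equivalent shorthand for paste() with sep = "".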

The Timely Portfolio blog via R-bloggers has recently published some interesting entries about the value of horizon plots for visual comparison of a number of time series. Very nice it looks too. You can read more about them here. The trick to understanding them is to imagine that each row was originally a line chart … Continue reading...

Piecewise regression comes about when you have ‘breakpoints’, where there are clearly two different linear relationships in the data with a sudden, sharp change in directionality. This crops up occasionally in ecology when dealing with, for example, species richness of understory plants … Continue reading →
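
One common way to fit such a model can be sketched in base R as follows. This is an assumption-laden illustration, not necessarily the approach the post takes: the data are simulated, the true breakpoint of 10 and the candidate grid are made up, and the two segments are forced to join at the breakpoint through a hinge term pmax(x - bp, 0). The breakpoint is chosen by a grid search that minimizes the residual sum of squares.

```r
# Simulated data with a slope change at x = 10 (all values are illustrative).
set.seed(123)
x <- seq(1, 20, length.out = 100)
true_bp <- 10
y <- 2 + 1.5 * x - 3 * pmax(x - true_bp, 0) + rnorm(100, sd = 0.5)

# Try each candidate breakpoint and record the residual sum of squares.
candidates <- seq(5, 15, by = 0.1)
rss <- sapply(candidates, function(bp) {
  fit <- lm(y ~ x + pmax(x - bp, 0))  # hinge term changes the slope after bp
  sum(resid(fit)^2)
})

# Keep the breakpoint with the smallest residual sum of squares and refit.
best_bp   <- candidates[which.min(rss)]
final_fit <- lm(y ~ x + pmax(x - best_bp, 0))
print(best_bp)
print(coef(final_fit))  # intercept, slope before bp, change in slope after bp
```

If the two segments should be allowed to jump at the breakpoint rather than meet, an indicator such as I(x > bp) and its interaction with x can replace the hinge term.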

I always wondered why it is so difficult to find an OpenBUGS example of simple linear regression on the Web. Curiously, such an example is missing even from the OpenBUGS help. The only nice example so far is in the book … Continue reading →