This is the least complicated trend strategy in existence. You buy and hold the security as long as its price is above an XXX-day Simple Moving Average (SMA), and you can short it if it is below the SMA … Continue reading →
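
A minimal base-R sketch of the rule described above, for illustration only. The price series is synthetic, the 50-day window stands in for the unspecified XXX, and the one-day lag on the position is an assumption added here to avoid look-ahead; none of these choices come from the original post.

```r
# Sketch: long above the n-day SMA, short below it (synthetic data, assumed n = 50).
set.seed(1)
price <- 100 * cumprod(1 + rnorm(500, mean = 0.0003, sd = 0.01))

n <- 50
# Trailing simple moving average; the first n - 1 values are NA.
sma <- as.numeric(stats::filter(price, rep(1 / n, n), sides = 1))

# +1 when the price is above the SMA, -1 otherwise, 0 while the SMA is undefined.
signal <- ifelse(is.na(sma), 0, ifelse(price > sma, 1, -1))

# Daily simple returns; the first day has no prior price, so its return is 0.
ret <- c(0, diff(price) / head(price, -1))

# Hold yesterday's signal today (assumption: act on the next bar).
position <- c(0, head(signal, -1))
strategy_ret <- position * ret

# Growth of one unit of capital under the rule.
equity <- cumprod(1 + strategy_ret)
```

Plotting `equity` against `price / price[1]` gives a quick visual comparison with buy-and-hold on the same synthetic series.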

Let’s do an easy experiment. Let’s calculate the 25-day rolling volatility of the S&P 500 from 2007 onwards. 1. Get the data: getSymbols('SPY', from='2007-01-01') 2. Run the volatility function from the package TTR (comes along with quantmod): vol = volatility(SPY, n=25, N=252, calc='close') # n=25 means we want 25 … Continue reading →
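
To make the idea concrete without downloading data, here is a rough base-R sketch of a close-to-close rolling volatility. It uses a synthetic closing-price series, log returns, and a plain `sd()` over a trailing window; these are assumptions for illustration, and the numbers are not guaranteed to match what TTR's `volatility()` returns for the same inputs.

```r
# Sketch: annualized rolling close-to-close volatility on synthetic closes.
set.seed(42)
close <- 100 * exp(cumsum(rnorm(300, sd = 0.01)))

n <- 25    # window length in trading days
N <- 252   # trading days per year, used to annualize

# Daily log returns (one fewer observation than prices).
r <- diff(log(close))

# Standard deviation over each trailing window of n returns; NA until the window is full.
roll_sd <- sapply(seq_along(r), function(i) {
  if (i < n) NA_real_ else sd(r[(i - n + 1):i])
})

# Scale the daily figure to an annual one.
vol <- sqrt(N) * roll_sd
```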

PCA is a very common method for exploration and reduction of high-dimensional data. It works by making linear combinations of the variables that are orthogonal, and is thus a way to change basis to better see patterns in data. You either do spectral decomposition of the correlation matrix or singular value decomposition of the data
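
The two routes mentioned above can be compared in a few lines of base R. This is a sketch on simulated data (the matrix `X` and the induced correlation are made up for illustration): the eigenvalues of the correlation matrix should agree with the squared singular values of the standardized data divided by n - 1.

```r
# Sketch: PCA via spectral decomposition vs. SVD, on simulated data.
set.seed(7)
X <- matrix(rnorm(200 * 4), ncol = 4)
X[, 2] <- X[, 1] + 0.5 * X[, 2]   # make two columns correlated

# Center each column and scale it to unit variance.
Z <- scale(X)

# Route 1: spectral decomposition of the correlation matrix.
eig <- eigen(cor(X))

# Route 2: singular value decomposition of the standardized data.
sv <- svd(Z)

# Component variances from both routes.
var_eig <- eig$values
var_svd <- sv$d^2 / (nrow(Z) - 1)

# Principal component scores: the data expressed in the new orthogonal basis.
scores <- Z %*% sv$v
```

The loadings in `eig$vectors` and `sv$v` may differ in sign column by column, which is expected since the direction of each component is arbitrary.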

I got "hooked" on R's OOP approach, in particular reference classes. After my last little project on option scenario analysis, I reconstructed my messy technical strategy testing code. To begin, I would like to explain why I have done this when the nice "blotter" and "quantstrat" packages already exist. First of all "quantstrat" is faster than blotter, which...

Introduction: This will serve as an introduction to natural language processing. I adapted it from slides for a recent talk at Boston Python. We will go from tokenization to feature extraction to creating a model using a machine learning algorithm. The goal is to provide a reasonable baseline on top of which more complex natural language processing can be...

I just gave a talk at Boston Python about natural language processing in general, and edX ease and discern in particular. You can find the presentation source here, and the web version of it here. There is a video of it here. Nelle Varoquaux and Micha...

(This article was first published on Ecology in silico, and kindly contributed to R-bloggers) Violin plots are useful for comparing distributions. When data are grouped by a factor with two levels (e.g. males and females), you can split the violins in half to see the difference between groups. Consider a 2 x 2 factorial experiment: treatments A and B...

As R has evolved over the past 20 years, its capabilities have improved in every area. The visual display of time series is no exception. As the folks from Timely Portfolio note: Through both quiet iteration and significant revolutions, the volunteers of R have made analyzing and charting time series pleasant. R began with the basics, a simple...

The 8th iteration of the DREAM Challenges is underway. DREAM is something like the Kaggle of computational biology with an open science bent. Participating teams apply machine learning and statistical modeling methods to biological problems, competing to achieve the best predictive accuracy. This year's three challenges focus on reverse engineering cancer, toxicology and the kinetics of...

There are different ways of specifying and running Bayesian models from within R. Here I will compare three different methods: two that rely on an external program and one that relies only on R. I won’t go into much detail about the differences in syntax; the idea is more to give the gist of how the different modeling languages...

Introduction: Continuing my recent series on exploratory data analysis (EDA), and following up on the last post on the conceptual foundations of empirical cumulative distribution functions (CDFs), this post shows how to plot them in R. (Previous posts in this series on EDA include descriptive statistics, box plots, kernel density estimation, and violin plots.) I

Introduction: Continuing my recent series on exploratory data analysis (EDA), this post focuses on the conceptual foundations of empirical cumulative distribution functions (CDFs); in a separate post, I will show how to plot them in R. (Previous posts in this series include descriptive statistics, box plots, kernel density estimation, and violin plots.) To give you
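
As a small illustration of the concept, the empirical CDF at a point is simply the fraction of observations less than or equal to that point. The sketch below uses a made-up sample and a hypothetical helper, `ecdf_manual()`, and checks it against base R's `ecdf()`; it is not taken from the original post.

```r
# Sketch: the empirical CDF computed by hand and with base R's ecdf().
x <- c(2.1, 3.5, 3.5, 4.0, 7.2)   # made-up sample

# Hypothetical helper: proportion of the sample at or below each query point.
ecdf_manual <- function(sample, q) {
  sapply(q, function(v) mean(sample <= v))
}

# Base R returns a step function that can be evaluated at any point.
Fn <- ecdf(x)

q <- c(1, 3.5, 5, 8)
by_hand <- ecdf_manual(x, q)   # 0.0 0.6 0.8 1.0
built_in <- Fn(q)              # same values
```

Note how the tied value 3.5 produces a jump of 2/5 rather than 1/5, since two observations sit at that point.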

A new version of Rcpp is now on the CRAN network for GNU R; binaries for Debian have been uploaded as well. This release brings a fairly large number of fixes and improvements across a number of Rcpp features; see below for the detailed list. We a...

If you know a beer you like and want some recommendations for a style of beer to try, check out the yhat Beer Recommender: This neat little app is the product of a recommendation system built using the R language by the folks behind the yhat blog. It's based on about 1.5 million beer reviews from the Beer Advocate....

On Thursday of last week I gave a short informal talk to Stat Bytes, the CMU Statistics department’s twice-a-month computing seminar. Quick tricks for faster R code: Profiling to Parallelism. Abstract: I will present a grab bag of … Continue reading →

I’ve been using Thomas Leeper’s MTurkR package to administer my most recent Mechanical Turk study, an extension of work with Justin Grimmer and Sean Westwood on representative-constituent communication claiming credit for pork benefits. MTurkR is excellent, making it quick and easy to: test … Continue reading →

Introduction: The first few posts of this blog will demonstrate how to build each report hosted by the business intelligence (BI) application dashboard shown below (see Fig. 1). This application uses the following tools and technologies: R – a free software environment for statistical computing and graphics, ASP.NET MVC4 – a free framework for building