Controlling multiple risk measures during construction of efficient frontier

October 26, 2011

In the last few posts I introduced the Maximum Loss, Mean-Absolute Deviation, Expected Shortfall (CVaR), and Conditional Drawdown at Risk (CDaR) risk measures. These risk measures can be formulated as linear constraints and thus can be combined to control multiple risk measures during construction of an efficient frontier. Let’s examine efficient frontiers computed

Read more »

PAWL package on CRAN

October 26, 2011

The PAWL package (which I talked about there, and which implements the parallel adaptive Wang-Landau algorithm and adaptive Metropolis-Hastings for comparison) is now on CRAN: http://cran.r-project.org/web/packages/PAWL/index.html. This means that within R you can easily install it by typing install.packages("PAWL"). Isn’t that amazing? It’s just amazing. Kudos to the CRAN team for their quickness and their

Read more »

New features in R-bloggers.com

October 26, 2011

Hello dear R community. In the past few months I have rolled out a bunch of new features to R-bloggers, and I wanted to raise awareness of them. Please consider giving some of these a try and leaving me any feedback that you have (by leaving a comment on this post): Comments – it is now possible to leave comments in...

Read more »

Batch Processing vs. Interactive Sessions

October 26, 2011

We introduced batch processing 3 weeks ago. Many people asked about the differences and benefits of batch processing versus interactive sessions. Let's start with the definitions. Batch Processing / Batch Jobs: Batch processing is the execution of a series of programs, or a single task, in a computer environment without manual intervention. All data and commands

Read more »

Machine Learning Ex 5.2 – Regularized Logistic Regression

October 25, 2011

Now we move on to the second part of Exercise 5.2, which requires implementing regularized logistic regression using Newton's Method. Plot the data:

Read more »
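
For the regularized logistic regression entry above, here is a minimal sketch of how Newton's Method might be applied in R. It is not the post's own code: the function name `newton_logistic`, the simulated data, the fixed iteration count, and the choice to leave the intercept unpenalised are all assumptions made for illustration.

```r
# Sketch only: L2-regularized logistic regression fitted with Newton's method.
# All names and the simulated data below are illustrative assumptions.
sigmoid <- function(z) 1 / (1 + exp(-z))

newton_logistic <- function(X, y, lambda = 1, iterations = 15) {
  m <- nrow(X)
  theta <- rep(0, ncol(X))
  # Penalise every coefficient except the intercept (first column of X is 1s).
  penalty <- diag(c(0, rep(1, ncol(X) - 1)), nrow = ncol(X))
  for (i in seq_len(iterations)) {
    h <- as.vector(sigmoid(X %*% theta))
    # Gradient and Hessian of the penalised average negative log-likelihood.
    gradient <- as.vector(crossprod(X, h - y)) / m +
      (lambda / m) * as.vector(penalty %*% theta)
    hessian <- crossprod(X, X * (h * (1 - h))) / m + (lambda / m) * penalty
    # Newton update: theta <- theta - H^{-1} g
    theta <- theta - as.vector(solve(hessian, gradient))
  }
  theta
}

# Simulated two-feature data (an assumption; the exercise uses its own data set).
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, sigmoid(0.5 + x1 - x2))
X <- cbind(1, x1, x2)

theta_plain <- newton_logistic(X, y, lambda = 0)   # no regularization
theta_ridge <- newton_logistic(X, y, lambda = 10)  # coefficients shrink toward 0
```

With `lambda = 0` the update reduces to ordinary logistic regression, so the result can be checked against `glm(y ~ x1 + x2, family = binomial)`.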

treebase package on cran

October 25, 2011

My treebase package is now up on the CRAN repository. (Source code is up, the binaries should appear soon). Here are a few introductory examples to illustrate some of the functionality of the package. Thanks in part to new data deposition requirements at journals such as Evolution, Am Nat, and Sys Bio, and data management plan

Read more »

The Psychology of Music and the ‘tuneR’ Package

October 25, 2011

Introduction: This semester I’m TA’ing a course on the Psychology of Music taught by Phil Johnson-Laird. It’s been a great course to teach because (i) so much of the material is new to me and (ii) the study of the psychology of music brings together so many of the intellectual tools I enjoy, including

Read more »

"Anyone planning to work with Big Data ought to learn Hadoop and R"

October 25, 2011

Dan Woods at Forbes interviewed LinkedIn's Daniel Tunkelang about the rise of data science and on building data science teams. When asked how students today should prepare themselves to be data scientists, Tunkelang gives some good advice: When we built the data science team at LinkedIn a few years ago, we looked for raw talent, assuming that smart people...

Read more »

Catching up faster by switching sooner

October 25, 2011

Here is our discussion (with Nicolas Chopin) of the Read Paper of last Wednesday by T. van Erven, P. Grünwald and S. de Rooij (Centrum voor Wiskunde en Informatica, Amsterdam), entitled Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the Akaike information criterion–Bayesian information criterion dilemma. It

Read more »

Mapping Hotspots with R: The GAM

October 25, 2011

I've been getting a lot of questions about the method used to map the hotspots in the seasonal drunk-driving risk maps. It uses the GAM (Geographical Analysis Machine), a way of detecting spatial clusters from two data inputs: the data of interes...

Read more »

Installing the RMySQL package on Windows 7

October 25, 2011

So you want to get statistical? Nowadays one of the ways to go is to use R, mostly in combination with ggplot2 for generating the plots. These plots and graphs, however, need some data; for that we use data sources. There are a lot of data sources availa...

Read more »

Example 9.11: Employment plot

October 25, 2011

A Facebook friend posted the picture reproduced above; it makes the case that President Obama has been a successful creator of jobs, and also paints GW Bush as a president who lost jobs. Another friend pointed out that to be fair, all of Bush's presi...

Read more »

Consecutive number and lottery

October 25, 2011

Recently, I have been reading odd things about strategies to win at the lottery. E.g. or I wrote something a long time ago, but maybe it would be better to write another post. First, it is easy to get data on the French lotteries, including dra...

Read more »

Longitudinal analysis: autocorrelation makes a difference

October 25, 2011

Back to posting after a long weekend and more than enough rugby coverage to last a few years. Anyway, back to linear models, where we usually assume normality, independence and homogeneous variances. In most statistics courses we live in a …

Read more »

Expected shortfall (CVaR) and Conditional Drawdown at Risk (CDaR) risk measures

October 25, 2011

In the Maximum Loss and Mean-Absolute Deviation risk measures post I started the discussion about alternative risk measures we can use to construct an efficient frontier. Other alternative risk measures I want to discuss are Expected Shortfall (CVaR) and Conditional Drawdown at Risk (CDaR). I will use methods presented in Comparative Analysis of Linear Portfolio Rebalancing

Read more »
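
To make the two risk measures in the entry above concrete, the following sketch computes simple sample estimates of CVaR and CDaR from a return series in base R. It is an illustration, not the post's implementation: the function names, the 95% confidence level, the use of uncompounded cumulative returns for the equity curve, and the simulated returns are all assumptions.

```r
# Sketch only: sample estimates of CVaR and CDaR for a vector of period returns.
# Function names, alpha, and the simulated data are illustrative assumptions.

# CVaR (Expected Shortfall): average loss at or beyond the alpha-quantile loss (VaR).
cvar <- function(returns, alpha = 0.95) {
  losses <- -returns
  value_at_risk <- quantile(losses, alpha, names = FALSE)
  mean(losses[losses >= value_at_risk])
}

# Drawdown at each point in time: distance below the running peak of the
# uncompounded cumulative return, starting from an initial value of 0.
drawdowns <- function(returns) {
  equity <- c(0, cumsum(returns))
  cummax(equity) - equity
}

# CDaR: average drawdown at or beyond the alpha-quantile drawdown.
cdar <- function(returns, alpha = 0.95) {
  dd <- drawdowns(returns)
  threshold <- quantile(dd, alpha, names = FALSE)
  mean(dd[dd >= threshold])
}

# Simulated daily returns (an assumption; real price data would be used in practice).
set.seed(123)
r <- rnorm(250, mean = 0.0005, sd = 0.01)

cvar(r)   # average of the worst 5% of daily losses
cdar(r)   # average of the worst 5% of drawdowns
```

By construction the CVaR is never smaller than the VaR at the same level, and the CDaR never exceeds the maximum drawdown.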

Email Netiquette

October 25, 2011

A short piece of web scraping I sent as a reminder to my colleague. If you run it the result should be something like... Datatata!

Read more »

Sabermetrics Meets R Meetup

October 25, 2011

I just ran across this post at Big Computing. On November 14th, there will be an R User meet-up in Washington, DC (Tyson's Corner) led by Mike Driscoll about using R for sabermetric analysis (linked here). I will actually be home in Maryland for a co...

Read more »

Pair trading strategy : how to use "PairTrading" package

October 25, 2011

Mr. Ishikawa (my old friend) and I developed the "PairTrading" package and uploaded it to CRAN. This article shows you how you can use it. Pair trading is a market-neutral trading strategy that gives traders a chance to profit regardless of market conditions. The idea of this strategy is quite simple. 1: Select two stocks (or any assets) moving similarly. 2: Short...

Read more »

Approximate Bayesian computational methods on-line

October 25, 2011

Fig. 4 – Boxplots of the evolution of ABC approximations to the Bayes factor. The representation is made in terms of frequencies of visits to models MA(1) and MA(2) during an ABC simulation when ε corresponds to the 10%, 1%, 0.1%, and 0.01% quantiles on the simulated autocovariance distances. The data is a time

Read more »

Machine Learning Ex 5.1 – Regularized Linear Regression

October 25, 2011

The first part of Exercise 5.1 requires implementing a regularized version of linear regression. Adding a regularization parameter can prevent over-fitting when fitting a high-order polynomial.

Read more »
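
As a rough illustration of the regularized linear regression entry above, the sketch below fits a fifth-order polynomial with and without an L2 penalty using the regularized normal equations. The function name `ridge_normal_equation`, the simulated data, the polynomial degree, and the values of lambda are assumptions, not taken from the exercise.

```r
# Sketch only: regularized linear regression via the normal equations,
#   theta = (X'X + lambda * L)^{-1} X'y,
# where L is the identity with a 0 in the intercept position.
# Names and simulated data are illustrative assumptions.
ridge_normal_equation <- function(X, y, lambda) {
  penalty <- diag(c(0, rep(1, ncol(X) - 1)), nrow = ncol(X))
  as.vector(solve(crossprod(X) + lambda * penalty, crossprod(X, y)))
}

# A handful of noisy points from a quadratic, fitted with a degree-5 polynomial.
set.seed(42)
x <- seq(-1, 1, length.out = 7)
y <- x^2 + rnorm(7, sd = 0.1)
X <- outer(x, 0:5, "^")  # columns: 1, x, x^2, ..., x^5

theta_unreg <- ridge_normal_equation(X, y, lambda = 0)  # ordinary least squares
theta_reg   <- ridge_normal_equation(X, y, lambda = 1)  # shrunken coefficients
```

Comparing `theta_unreg` with `theta_reg` shows the higher-order coefficients being pulled toward zero, which is how the penalty limits over-fitting.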

Vanilla C code for the Stochastic Simulation Algorithm

October 24, 2011

The Gillespie stochastic simulation algorithm (SSA) is the gold standard for simulating state-based stochastic models. If you are an R buff, an SSA novice and want to quickly get up and running with stochastic models (in particular ecological models) that are not …

Read more »
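
For readers new to the SSA mentioned in the entry above, here is a minimal sketch of the direct method for a linear birth-death process, written in R rather than the post's C. The model, the function name `gillespie_birth_death`, and all parameter values are assumptions chosen for illustration.

```r
# Sketch only: Gillespie's direct method for a linear birth-death process.
# The model and all names/parameters are illustrative assumptions.
gillespie_birth_death <- function(n0, birth, death, t_max) {
  time <- 0
  n <- n0
  times <- time
  states <- n
  while (time < t_max && n > 0) {
    # Propensities of the two possible events given the current state.
    rates <- c(birth * n, death * n)
    total <- sum(rates)
    # Waiting time to the next event is exponential with the total rate.
    time <- time + rexp(1, rate = total)
    if (time > t_max) break
    # Pick which event fires, with probability proportional to its rate.
    if (runif(1) < rates[1] / total) {
      n <- n + 1
    } else {
      n <- n - 1
    }
    times <- c(times, time)
    states <- c(states, n)
  }
  data.frame(time = times, n = states)
}

set.seed(2011)
traj <- gillespie_birth_death(n0 = 10, birth = 0.5, death = 0.4, t_max = 10)
head(traj)
```

Each row of `traj` records one event, so the population changes by exactly one individual between consecutive rows.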

Simple Heatmap in R with Formula One Dataset

October 24, 2011

Now that the 2011 F1 season is over, I decided to quickly scrape the Formula 1 data from the F1.com website, such as the list of drivers, ordered by each driver's approximate salary (the driver at the top of the list makes the most, approx. 30MM), and the position at the end of each race. There

Read more »

One week left to enter the $20,000 "Applications of R" contest

October 24, 2011

The deadline to enter the "Applications of R in Business" contest is just a week away. To qualify for $20,000 in prizes from Revolution Analytics, your entry must be submitted to inside-r.org by midnight PST on October 31. Note that this doesn't have to be your final submission: as long as you've entered a draft version, you can still...

Read more »

Two seasonal investors – R snippet

October 24, 2011

In “A tale of 2 Seasonal Investors”, the Big Picture discusses the simple idea of comparing two simple investment approaches: being exposed to the market 6 months every year (from November to April), as opposed to investing in the other 6 months of every year (from May to October). Going back 50 years in the

Read more »

NYT on Big Data and R

October 24, 2011

In the New York Times' "Bits" blog today, Quentin Hardy offers recollections on Big Data talks at the recent Web 2.0 Summit. He begins with a definition of Big Data: Big Data is really about ... the benefits we will gain by cleverly sifting through it to find and exploit new patterns and relationships. You see it now in...

Read more »

Show me your WAR face!

October 24, 2011

Below is a chart of the top 20 offensive players based on FanGraphs WAR for the 2011 season. The various features and their corresponding metrics are clear in the image. I’ve also included the leader and last place for each …

Read more »

XLConnect 0.1-7

October 24, 2011

Mirai Solutions GmbH (http://www.mirai-solutions.com) is pleased to announce the availability of XLConnect 0.1-7. This release includes a number of improvements and new features: performance improvements when writing large xlsx files; new workbook data extraction & replacement operators [, [<-, [[, …

Read more »

Parameter vs. Observation Dimension?

October 24, 2011

Bill Bolstad's response to Xi'an's review of his book Understanding Computational Bayesian Statistics included the following comment, which I found interesting: Frequentist p-values are constructed in the parameter dimension using a probability distribution defined only in the observation dimension. Bayesian credible intervals are constructed in the parameter dimension using a probability distribution in the parameter

Read more »

R Tutorial Series: Exploratory Factor Analysis

October 24, 2011

Exploratory factor analysis (EFA) is a common technique in the social sciences for explaining the variance between several measured variables as a smaller set of latent variables. EFA is often used to consolidate survey data by revealing the groupings ...

Read more »
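
As a quick taste of what the exploratory factor analysis entry above describes, the sketch below runs a maximum-likelihood EFA with `factanal()` from base R's stats package. The choice of the built-in `ability.cov` data set and of two factors is an assumption for illustration; the tutorial itself may use different data and settings.

```r
# Sketch only: exploratory factor analysis with base R's factanal().
# ability.cov (datasets package) holds the covariance matrix of six ability
# tests; using it, and extracting two factors, are illustrative assumptions.
fit <- factanal(factors = 2, covmat = ability.cov)

# Loadings show how strongly each measured variable relates to each latent
# factor (varimax rotation is the default); small loadings are hidden.
print(fit$loadings, cutoff = 0.3)

# Uniquenesses: the share of each variable's variance not explained by the factors.
fit$uniquenesses
```

Variables that load heavily on the same factor form the kind of grouping that EFA is used to reveal.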