Multiple Factor Model – Building CSFB Factors

February 12, 2012

This is the third post in the series about Multiple Factor Models. I will build on the code presented in the prior post, Multiple Factor Model – Building Fundamental Factors, and I will show how to build the majority of the factors described in the CSFB Alpha Factor Framework. For details of the CSFB Alpha Factor Framework

Read more »

Data Exploration – Gold vs Gold Mining Stocks

February 12, 2012

I have been looking into time series analysis with R. I'm still climbing the learning curve, as I am very accustomed to SAS/ETS. With ETS, everything is in a couple of procedures, and I know where and how to get things done. In R, things...

Read more »

R and presentations: a basic example of knitr and beamer

February 12, 2012

Manually combining R code and a presentation can be quite a pain. Luckily, using tools like odfWeave, Sweave and knitr, integrating documents and R code is quite painless. In this post I want to take a look at combining the…

Read more »

Write data (frame) to Excel file using R package xlsx

February 12, 2012

Writing to Excel files comes up rather often, especially if you’re collaborating with non-OSS users. There are several options, but I like the xlsx package way of doing things. The package authors use Java to write to Excel files, which are basically compressed XML files. Alright, let’s get cracking. First, let’s create some data. If you don’t

Read more »

The R-Podcast Episode 1: Introduction

February 12, 2012

Here is the inaugural episode of the R-Podcast! In this episode, I take a few minutes to introduce myself and to explain the main goals of this podcast. I also define what R is and give an overview of R’s history of development and features that distinguish it from other statistical software. Please feel free

Read more »

Elegant & fast data manipulation with data.table

February 12, 2012

Just learned about the R data.table package (ht @recology_), which makes R data frames into ultra-fast, SQL-like objects. One thing we get is some very nice and powerful syntax. Consider some simple data of replicate time series: To apply a function to each set of replicates, instead of We can use: Note that we could have

Read more »

Unsupervised Image Segmentation with Spectral Clustering with R

February 12, 2012

That title is quite a mouthful. This quarter, I have been reading papers on Spectral Clustering for a reading group. The basic goal of clustering is to find groups of data points that are similar to each other. Also, data points in one group should be ...

Read more »

R for Quants, Part I.A

February 12, 2012

I’m teaching an R workshop for the Baruch MFE program. This is the first installment of the workshop and focuses on …

Read more »

"R" PLS Package: Multiple Scatter Correction (MSC)

February 12, 2012

MSC (Multiple Scatter Correction) is a math treatment that corrects the scatter in spectra. Scatter is produced by different physical circumstances, such as particle size and packaging. Scatter normally worsens the correlation of the spectra with the constituent of interest. Almost all available chemometric software includes this math treatment, and of course “R” has it as well in the...

Read more »

Machine Learning Examples in R

February 12, 2012

This is a post that has been a long time in the making. Following on from the excellent Stanford Machine Learning Course, I have made examples in R of the main algorithms covered. We have Linear Regression, followed by Neural Networks, and Support ...

Read more »

Classifying Breast Cancer as Benign or Malignant Using RTextTools

RTextTools has largely been used for topic classification in the social sciences. However, recent discussions with researchers at various universities have demonstrated that the package can be applied to a host of problems in the natural sciences as well. One such application is using text classification to identify breast cancer masses as benign or malignant. Using the Wisconsin Diagnostic Breast Cancer...

Read more »

piecewise regression

February 11, 2012

A stock's beta generally describes its relation with the market: the percentage move we should expect from the stock when the market moves one percent. The market, being a somewhat vague notion, is approximated here, as usual, using …

Read more »

Generating directed Watts-Strogatz network

February 11, 2012

There are two limitations of the Watts-Strogatz network generator in the igraph package: (1) it works only for undirected graphs and (2) the rewiring algorithm can produce loops or multiple edges. You can use the simplify function on such a graph, but then number of ed...

Read more »

R jags rjags on an ec2 instance

February 11, 2012

WinBUGS and JAGS free Item Response Theory from the dot matrix plots of proprietary software and open up a multicoloured world of posterior predictive model checking. Fitting IRT models using brute force is not for the impatient, however. That’s why, just as early psychometricians shipped off their calculations to teams of monks, I’ve shipped off my model fitting to...

Read more »

Stupid R tricks: using outer to create many data.frame subsets

February 11, 2012

Selecting subsets of a data.frame is easy in R if you define the predicates manually. But if you need to define many conditions, the standard slicing and subsetting methods are cumbersome. For this illustration I want to pick some large number of numerical ranges and label all of the rows that match any of the

Read more »

Revolution R and Fedora: Revisited

February 10, 2012

A previous post of mine had suggested that, despite them being extremely similar operating systems, and really there being no clear reason why, Revolution R 5.0, which does support Red Hat Enterprise Linux, refused to work on Fedora 16. The installation failed, dependencies could not be installed, tech support was singularly unhelpful because I wasn’t

Read more »

RTextTools Short Course Materials

Attached are some of the materials from the recent short course at UNC. For confidentiality reasons, we are unable to present all of the materials, but this is enough to get someone started. 1. Lecture; 2. Intro to R; 3. NY Times; 4.

Read more »

More Thoughts on Potential Audience Metrics for Hashtag Communities

February 10, 2012

Following on from the sketched ideas relating to estimating the Potential Audience Size for a Hashtag Community?, here are a few quick doodles around the graph representation of the tag users – followers graph that explore the extent to which we can use quite simple counts and analyses to get a feel for how the

Read more »

Simplified Example of Systematic Investor’s Fine Work

February 10, 2012

THIS IS ONLY AN EXAMPLE AND IS NOT INVESTMENT ADVICE. ACTING ON THIS WILL LOSE LOTS OF MONEY. Systematic Investor Blog (be sure to check out the site) offers extremely good examples of how to use R in finance.  Since I firmly believe more examples...

Read more »

Revisiting homicide rates

February 10, 2012

A pint of R plotted an interesting dataset: intentional homicides in South America. I thought the graphs were pretty but I was unhappy about the way information was conveyed in the plots; relative risk should be very important but number …

Read more »

Reading Code

February 10, 2012

Code Readability is maybe the most important part of producing reproducible research. If it's impossible (i.e. very costly) for somebody else to read/understand the computer code that underlies your results, then the odds are that they will never be...

Read more »

Visualising the Metropolis-Hastings algorithm

February 10, 2012

In a previous post, I demonstrated how to use my R package MHadaptive to do general MCMC to estimate Bayesian models. The functions in this package are an implementation of the Metropolis-Hastings algorithm. In this post, I want to provide an intuitive way to picture what is going on ‘under the hood’ in this algorithm. The

Read more »

A new local R user group in Cambridge, UK

February 10, 2012

It turns out there's another local R user group in Cambridge, UK. It's called CambR, and organizing committee member Laurent Gatto described its history to me in an email: After meeting repeatedly at several R related conferences (Bioconductor meetings, useR 2011), some R enthusiasts thought Cambridge deserved a local R user group and founded CambR in September 2011. Since...

Read more »

R charts used for analysis at Politico

February 10, 2012

Zack Abrahamson, the "data whiz" at political analysis site Politico, is apparently an R user. Politico's Feb 10 2012 chart of the day clearly uses the ggplot2 graphics package and (quoting Politico) looks into the disenchanted slice of the GOP that’s not engaged with its party’s primary. And that slice doesn’t like Mitt Romney. People say turnout's down. When...

Read more »

managing projects using RStudio

February 10, 2012

We're continually amazed by new developments within RStudio, the integrated development environment for R that we highlighted previously (among others, Andrew Gelman agrees with us about its value). The most recent addition addresses one of our earlie...

Read more »

MAT8886 exchangeability, credit risk and risk measures

February 10, 2012

Exchangeability is an extremely concept, since (most of the time) analytical expressions can be derived. But it can also be used to observe some unexpected behaviors that we will discuss later on in a more general setting. For instance, in an old...

Read more »

"R": Predicting a Test Set (Gasoline)

February 9, 2012

> data(gasoline)
> #60 spectra of gasoline (octane is the constituent)
> #We divide the whole Set into a Train Set and a Test Set.
> gasTrain<-gasoline
> gasTest<-gasoline
> #Let's develop the PLSR with the Train Set ...

Read more »

On Unpublished Software

February 9, 2012

I ran across this post at The Tree of Life entitled “Interesting new metagenomics paper w/ one big big big caveat – critical software not available”. The long and short of it? Paper appears in Science, has fancy new methodology, lacks the software for someone else to use their methodology. Blog author understandably annoyed. But I

Read more »

Daily casualties in Syria

February 9, 2012

Every new day brings its statistics of new deaths in Syria… Here is an attempt to learn about the Syrian uprising by the figures. Data vary among sources: the Syrian opposition provides the number of casualties by day (here on Dropbox), updated on 8 February 2012, with a total exceeding 8,000. We note first

Read more »