## #2 Data Classes (CloudStat)

November 5, 2011
As stated in CloudStat Intro, CloudStat is based on the R language, an object-oriented language in which everything is an object. Each object has a class. The simplest data objects are one-dimensional arrays called vectors, consisting of any nu...
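
As a minimal sketch of the point that every R object has a class, the snippet below builds three simple vectors and inspects them with `class()`. The variable names are illustrative and are not taken from the CloudStat post.

```r
# Vectors are the simplest data objects; each one carries a class.
num_vec <- c(1.5, 2, 3.25)        # numeric vector
chr_vec <- c("a", "b", "c")       # character vector
lgl_vec <- c(TRUE, FALSE, TRUE)   # logical vector

class(num_vec)  # "numeric"
class(chr_vec)  # "character"
class(lgl_vec)  # "logical"
```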

## The Joy of R: A Feline Guide

November 5, 2011
Just because it’s Caturday. Images by Mario Pineda-Krch (CC BY-NC-SA 3.0). This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta. Filed under: cats, computing, humour, R, Sweave

## Colour wheels in R

November 5, 2011
Regular readers will know I use the R package to produce most of the charts that appear here on the blog. Being more quantitative than artistic, I find choosing colours for the charts to be one of the trickiest tasks when designing a chart, particularly as R has so many colours to choose from. In

## Data Referenced Journalism and the Media – Still a Long Way to Go Yet?

November 4, 2011
Reading our local weekly press this evening (the Isle of Wight County Press), I noticed a page 5 headline declaring “Alarm over death rates at St Mary’s”, St Mary’s being the local general hospital. It seems a Department of Health report on hospital mortality rates came out earlier this week, and the Island’s hospital, it

## Unit root versus breaking trend: Perron’s criticism

November 4, 2011
I came across an ingenious simulation by Perron during my Time-series lecture which I thought was worth sharing. The idea was to put your model to a further test of breaking trend before accepting the null of unit root. Let me try and illustrate this in simple language. A non-stationary time series is one that has its mean changing...

## Generating PPC Keywords in R – Part 2

November 4, 2011
In a previous post, I discussed how to generate PPC keywords in R. In this post I will provide another example of how to perform this task. Let’s say that I am an auto insurance company that only operates in the state of Illinois. I’m planning on bidding on keywords in Bing and Google which

## Rdatamarket Tutorial

November 4, 2011
The good folks at DataMarket have posted a new tutorial on using the rdatamarket package (covered here in August) to easily download public data sets into R for analysis. The tutorial describes how to install the rdatamarket package, how to extract metadata for data sets, and how to download the data themselves into R. The tutorial also illustrates a...

## match vs. %in%

November 4, 2011
match and %in% are two very commonly used functions in R. So, what's the difference between them? First, how to use them (copied from the R manual): match returns a vector of the positions of (first) matches of its first argument in its second. %in% is a ...
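
A short sketch of the difference described above, using base R only. The vectors `x` and `lookup` are made-up illustration data, not taken from the original post.

```r
x      <- c("b", "z", "a")
lookup <- c("a", "b", "c", "b")

# match() returns the position of the FIRST match in `lookup`, or NA if none.
pos <- match(x, lookup)

# %in% only reports whether a match exists.
found <- x %in% lookup

# In base R, %in% is documented as a thin wrapper around match():
same <- identical(found, match(x, lookup, nomatch = 0L) > 0L)

pos
found
same
```

Use `match()` when the position is needed (for example, to index into another vector) and `%in%` when a logical mask is enough.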

## Confidence interval for predictions with GLMs

November 4, 2011
Consider a (simple) Poisson regression . Given a sample where , the goal is to derive a 95% confidence interval for given , where is the prediction. Hence, we want to derive a confidence interval for the prediction, not the potential observation, i.e. the dot on the graph below > r=glm(dist~speed,data=cars,family=poisson) > P=predict(r,type="response", + newdata=data.frame(speed=seq(-1,35,by=.2))) > plot(cars,xlim=c(0,31),ylim=c(0,170)) > abline(v=30,lty=2)...
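
The excerpt is truncated before the derivation, so the following is only a sketch of one common approach (a Wald interval built on the link scale and mapped back with the inverse link), not necessarily the method derived in the original post. It reuses the `cars` model from the excerpt; the names `new_x`, `fit`, `se` and `ci` are assumptions introduced here.

```r
# Fit the Poisson regression from the excerpt on the built-in cars data.
r <- glm(dist ~ speed, data = cars, family = poisson)

# Predict on the link (log) scale, where the normal approximation is applied.
new_x <- data.frame(speed = 30)
p <- predict(r, newdata = new_x, type = "link", se.fit = TRUE)
fit <- as.numeric(p$fit)
se  <- as.numeric(p$se.fit)

# 95% interval on the link scale, mapped back with the inverse link (exp).
z  <- qnorm(0.975)
ci <- exp(c(lower = fit - z * se, fit = fit, upper = fit + z * se))
ci
```

This interval is for the expected value of the response at speed 30, not for a new observation, which would also have to account for Poisson variability.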

## Factor to class-membership matrix

November 4, 2011
Recently on R-bloggers I found a post from chem-bla-ics blog concerning conversion of factors to integer vectors. At the end it stated a problem of conversion of factor variable to class-membership matrix. In comments several nice solutions were p...
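
One common base-R way to build a class-membership (indicator) matrix from a factor is `model.matrix()` with the intercept removed. This is a sketch under that assumption and is not necessarily one of the solutions posted in the comments; the factor `f` is made-up data.

```r
f <- factor(c("a", "b", "a", "c"))

# One indicator column per level; "- 1" drops the intercept so that
# no level is absorbed into a baseline.
m <- model.matrix(~ f - 1)
colnames(m) <- levels(f)
m
```

Each row contains a single 1 in the column of the level that observation belongs to.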

## Help: stemming and stem completion with package tm in R

November 3, 2011
I came across the problem below when doing stemming and stem completion with package tm in R. The word “mining” was stemmed to “mine” with stemDocument(), and then completed to “miners” with stemCompletion(). However, I prefer to keep “mining” intact. For stemCompletion(), …

## Webinar on Portfolio Rebalancing with R and Sybase

November 3, 2011
R users in the financial industry may be interested in the following webinar hosted by Revolution Analytics' partner Sybase on November 10: "Portfolio Rebalancing Using R and Sybase RAP for Intraday Risk Management". With volatility and violent intraday swings becoming the new normal, intraday risk controls are now needed to not only reduce your exposures across multiple asset classes,...

## By: Super Nerdy Cool » Build multiarch R (32 bit and 64 bit) on Debian/Ubuntu

have the 64 bit version of R compiled from source on my Ubuntu laptop. I recently had a need for R based on 32 bit since a package I

## Modern Portfolio Optimization Theory: The idea

November 3, 2011
We were recently given a lecture (by Dr. Susan Thomas) on Harry Markowitz's portfolio optimization theory, and I was really fascinated by the Nobel laureate's story of how he found it difficult to convince his guide about the importance of h...

## Variability of volatility estimates from daily returns

November 3, 2011
Investment Performance Guy has a post “Periodicity of risk statistics (and other measures)” that wonders how valid volatility estimates are from a month of daily returns. Here is a quick look. Figure 1 shows the variability (and a 95% confidence interval) of volatility estimates for the S&P 500 index in January 2011. …

## Maximizing Omega Ratio

November 3, 2011
The Omega Ratio was introduced by Keating and Shadwick in 2002. It measures the ratio of average portfolio wins over average portfolio losses for a given target return L. Let x.i, i= 1,…,n be weights of instruments in the portfolio. We suppose that j= 1,…,T scenarios of returns with equal probabilities are available. I will
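
The excerpt stops before the optimization itself, so the sketch below only evaluates the Omega ratio for fixed weights, following the definition above: average portfolio gain over the target L divided by average shortfall below L, across equally likely scenarios. The function name `omega_ratio` and the scenario matrix are hypothetical illustration values, not data from the post.

```r
# Omega ratio for target L: mean gain above L divided by mean shortfall below L.
omega_ratio <- function(weights, scenarios, L = 0) {
  port   <- as.numeric(scenarios %*% weights)  # portfolio return per scenario
  gains  <- mean(pmax(port - L, 0))            # average win over the target
  losses <- mean(pmax(L - port, 0))            # average loss below the target
  gains / losses
}

# Hypothetical data: T = 4 equally likely scenarios (rows), n = 2 instruments.
scenarios <- matrix(c( 0.02, -0.01,
                       0.01,  0.03,
                      -0.03,  0.02,
                       0.04, -0.02), ncol = 2, byrow = TRUE)
x <- c(0.5, 0.5)  # portfolio weights

omega_ratio(x, scenarios, L = 0)
```

Maximizing this quantity over the weights would need an optimizer on top of the function; that step is not shown here.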

## Some Simple but Probably Useful Regex Examples with R-Package stringr…

November 3, 2011
I found that examples for the use of regex in R are rather rare. Thus, I will provide some examples from my own learning materials, mostly stolen from the help pages, with small but maybe illustrative adaptations. PS: I will extend this list of examples...

## First thoughts on R

November 2, 2011
Having worked just a little with R, I have some first impressions to share. I'll give you some links to resources I found helpful with writing the previous project. First, the documentation is not very good. I struggled on previous attempts to figure things out. I still find it a crapshoot when I Google, looking for an answer....

## Code Optimization: One R Problem, Ten Solutions – Now Eleven!

November 2, 2011
Earlier this year I came across a rather interesting page about optimisation in R from rwiki. The goal was to find the most efficient code to produce strings which follow the pattern below given a single integer input n: From this we can see that the general pattern for n is: It is rather heart

November 2, 2011
If you are creating maps then, for goodness' sake, use sensible colours! I was helping some undergraduates with some work the other day, and they decided to use the following colour scheme for representing river depth: deep water: red; medium-depth water: bright green; shallow water: pink. Why did they do this? Well, either they were

## The next generation of parallel R

November 2, 2011
For open-source parallel computing with R, this week marks a big step into the future. R 2.14.0 was released on October 31st, 2011. R base now ships with a parallel computing package called “parallel”, loaded with library(parallel). It combines the advantages of the packages multicore and snow, and it contains support for multiple RNG streams. The
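
A minimal sketch of the snow-style interface in the parallel package, including the multiple RNG streams mentioned above. The worker count of 2 and the seed 123 are arbitrary choices made for illustration.

```r
library(parallel)

# Start a small socket cluster (snow-style); 2 workers is an arbitrary choice.
cl <- makeCluster(2)

# Give each worker its own reproducible random-number stream.
clusterSetRNGStream(cl, iseed = 123)

# Apply a function to each element in parallel.
squares <- parLapply(cl, 1:4, function(i) i^2)

# Always release the workers when done.
stopCluster(cl)

unlist(squares)
```

The multicore-style counterpart is `mclapply()`, which forks the current R process instead of starting separate workers and therefore does not run in parallel on Windows.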

## Strange behavior of correlation estimation

November 2, 2011
The Gaussian vector is extremely interesting since it remains Gaussian when conditioning. More precisely, if is a Gaussian random vector, then the conditional distribution of is also Gaussian. Further, it is possible to derive explicitly the cova...

## "Applications of R" contest submissions online

November 2, 2011
Thanks to everyone for participating in the "Applications of R in Business" contest. R users submitted more than 25 entries, describing how R is used in industries including life sciences, finance, manufacturing, sentiment analysis, and even sports. Some entries are just outlines for now (competitors have until November 30 to finalize their entries), but already there are some quite...

## Using Sweave with Beamer: A note on fonts

November 2, 2011
Recently, I've been preparing a poster using the LaTeX packages Beamer and beamerposter. The poster discusses a bunch of R stuff that I've been doing lately, so I successfully used Sweave to incorporate R code into the poster. However, I had some troub...

## Cycles in finite populations: A reproducible seminar in three acts

November 1, 2011
For this year's Halloween I presented the mathematical biology seminar at the Centre for Mathematical Biology. Here is the title and the abstract: “Cycles in finite populations: a reproducible seminar in three acts”. Many natural populations exhibit cyclic fluctuations. Explaining the underlying …

## Generating PPC Keywords in R

November 1, 2011
Paid search marketing refers to the process of driving traffic to a website by purchasing ads on search engines. Advertisers bid on certain keywords that users might search for, and that determines when and where their ads appear. For example, an individual who owns an auto dealership would want to bid on keywords relating to automobiles

## Selecting statistics for ABC model choice [R code]

November 1, 2011
As supplementary material to the ABC paper we just arXived, here is the R code I used to produce the Bayes factor comparisons between summary statistics in the normal versus Laplace example. (Warning: running the R code takes a while!) Filed under: R, Statistics, University life Tagged: ABC, Bayesian model choice, Laplace distribution, R, summary