## New Revolution Analytics office in Singapore

August 15, 2012

We're excited to announce the latest outpost of the Revolution Analytics team, with the opening of a new office in Singapore! This office will serve as the local HQ for Revolution Analytics, serving our customers in the Asia-Pacific region. It was opened with the support of the Infocomm Development Authority of Singapore, which is responsible for...

## Why trust some supposed laws of statistical sampling and…

August 15, 2012

Why trust some supposed laws of statistical sampling and convergence when you can just test them yourself? If you have a computer with R installed (also recommended: RStudio) then you can stop dithering about whether these n=1000 studies cited in the n...
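The kind of self-test the excerpt describes can be sketched in a few lines of base R. This is only an illustration, not the original post's code: the population proportion `true_p`, the number of repetitions `reps`, and the seed are all assumptions chosen for the example.

```r
# Simulate many hypothetical "n = 1000" studies and see how much the
# estimated proportion varies from study to study.
set.seed(42)        # arbitrary seed, only for reproducibility
n      <- 1000      # sample size per simulated study
true_p <- 0.3       # hypothetical population proportion (assumption)
reps   <- 5000      # number of simulated studies (assumption)

# Each simulated study reports the share of successes among n draws
estimates <- rbinom(reps, size = n, prob = true_p) / n

# Compare the observed spread with the textbook standard error sqrt(p(1-p)/n)
empirical_se   <- sd(estimates)
theoretical_se <- sqrt(true_p * (1 - true_p) / n)
c(empirical = empirical_se, theoretical = theoretical_se)
```

If the sampling theory holds, the two numbers should agree closely, and rerunning with a different `n` shows how the spread shrinks as the sample grows.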

## What does a generalized linear model do?

August 15, 2012

What does a generalized linear model do? R supplies a modeling function called glm() that fits generalized linear models (abbreviated as GLMs). A natural question is what does it do and what problem is it solving for you? We work some examples and place generalized linear models in context with other techniques. For predicting a categorical...
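As a minimal sketch of what a call to glm() looks like, here is a logistic regression on the `mtcars` data set that ships with R. The choice of data set and of the predictor `wt` is an assumption made for illustration; it is not taken from the original post.

```r
# Logistic regression: a GLM with a binomial family and the default logit link.
# In mtcars, `am` is 0 for automatic and 1 for manual transmission.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Coefficients are reported on the log-odds scale
coef(fit)

# type = "response" maps the linear predictor back to probabilities in (0, 1)
new_cars <- data.frame(wt = c(2.0, 3.0, 4.0))   # weight in 1000 lbs
p_manual <- predict(fit, newdata = new_cars, type = "response")
p_manual
```

The point of the `family` argument is that the same fitting interface covers binary, count, and continuous outcomes; only the assumed distribution and link change.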

## A New plot.xts

August 15, 2012

The Google Summer of Code (2012) project to extend xts has produced a very promising new plot.xts function. Michael Weylandt, the project's student, wrote to R-SIG-Finance to request impressions, feedback, and bug reports. The function is hous...

## Probit Models with Endogeneity

August 15, 2012

Dealing with endogeneity in a binary dependent variable model requires more consideration than the simpler continuous dependent variable case. For some, the best approach to this problem is to use the same methodology used in the continuous case, i.e. 2 stage least squares. Thus, the equation of interest becomes a linear probability model (LPM). The

## Project Euler — problem 18

August 15, 2012

The 18th Euler problem is sorta a route-finding problem. It occupied my mind for two days. Finally I came up with a clever solution. Find the maximum total from top to bottom of the triangle below: 75 95 64 17 … Continue reading →
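One common way to attack this kind of triangle is a bottom-up pass that folds each row into the one above it. The sketch below is not necessarily the author's "clever solution", and since the full triangle is cut off in the excerpt, it uses the small four-row example from the Project Euler problem statement instead. The function name `max_path_total` is made up for this illustration.

```r
# Bottom-up dynamic programming: starting from the last row, replace each
# entry of the row above with itself plus the larger of its two children.
max_path_total <- function(rows) {
  best <- rows[[length(rows)]]
  if (length(rows) > 1) {
    for (i in (length(rows) - 1):1) {
      # best[-length(best)] are the left children, best[-1] the right children
      best <- rows[[i]] + pmax(best[-length(best)], best[-1])
    }
  }
  best
}

# Four-row example triangle from the problem statement
small <- list(3, c(7, 4), c(2, 4, 6), c(8, 5, 9, 3))
max_path_total(small)   # 23, via 3 + 7 + 4 + 9
```

Working upward means every entry is visited once, so the cost grows with the number of entries rather than with the number of possible routes.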

## Predicting the memory usage of an R object containing numbers

August 15, 2012

To estimate if a certain vector of numbers will fit into memory, you can quite easily predict the memory usage based on the size of the vector. An integer vector will use 4 bytes per number, and a numeric vector… See more ›
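A quick way to check the rule of thumb is to compare it with what `object.size()` reports. The excerpt is cut off before it gives the figure for numeric vectors; the 8 bytes per element used below is the standard size of a double and is stated here as an assumption about where the post was heading.

```r
n <- 1e6

# Measured sizes in bytes for freshly allocated vectors
int_bytes <- as.numeric(object.size(integer(n)))
num_bytes <- as.numeric(object.size(numeric(n)))

# Predicted payload: 4 bytes per integer, 8 bytes per double
predicted_int <- 4 * n
predicted_num <- 8 * n

# Whatever is left over is the small fixed header R adds to each vector
c(integer_overhead = int_bytes - predicted_int,
  numeric_overhead = num_bytes - predicted_num)
```

For large vectors the header is negligible, so multiplying the length by the element size gives a usable estimate before allocating anything.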

## Processing sample labels using regular expressions in R

August 15, 2012

I am often found in possession of palaeo core data where the sample identifiers contain a core code or label plus the sample depth. Often these are things generated by colleagues who have used other software where for one reason or another they don’t want to store the depth information as a separate numeric variable. I also generate such...
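To make the problem concrete, here is a small sketch of pulling the core code and the depth apart with `sub()`. The labels and their `CODE_depth` layout are invented for illustration; the real identifiers in the post may follow a different pattern and would need a different regular expression.

```r
# Hypothetical sample identifiers: core code, underscore, depth
labels <- c("LOCH1_0.5", "LOCH1_10", "LOCH1_25.25")

# Two capture groups: everything before the last underscore, then the
# trailing run of digits and dots that encodes the depth
pattern <- "^(.*)_([0-9.]+)$"

core  <- sub(pattern, "\\1", labels)
depth <- as.numeric(sub(pattern, "\\2", labels))

data.frame(label = labels, core = core, depth = depth)
```

Storing the depth as a numeric column afterwards makes sorting and plotting by depth straightforward.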

## Chapter 2 Solutions – Statistical Methods in Bioinformatics

August 14, 2012

As I have mentioned previously, I have begun reading Statistical Methods in Bioinformatics by Ewens and Grant and working selected problems for each chapter. In this post, I will give my solution to two problems. The first problem is pretty straightforward. Problem 2.20 Suppose that a parent of genetic type Mm has three children. Then the parent transmits...

## Some Quirks of the R Language

August 14, 2012

R is my favorite programming language.  It's just so useful for getting work done.  Sometimes people will complain that R is a difficult language.  To me, this begs the questions:  difficult for what?  And for whom?  I personally think R is just about the easiest thing in the world for prototyping.  Meaning if you want to quickly crank out...

## Textbook – Statistical Methods in Bioinformatics

August 14, 2012

As part of my effort to acquaint myself more with biology, bioinformatics, and statistical genetics, I am trying to find as many resources as I can that provide a solid foundation. For instance, I am wading through Molecular Biology of the Cell at a pa...

## Minimum Expected Shortfall, Part 2

August 14, 2012

Previously, we set up the problem of constructing a minimum expected shortfall portfolio. We exported the portfolio weights from each quarterly rebalancing into R objects. This post will process those weights and compare the portfolio s...

## The Statistical Sleuth (second edition) in R

August 14, 2012

For those of you who teach, or are interested in seeing an illustrated series of analyses, there is a new compendium of files to help describe how to fit models for the extended case studies in the Second Edition of the Statistical Sleuth: A Course in...

## Is gas cheaper than it used to be?

August 14, 2012

Biostatistician and R user Matt Cooper noticed recently that the price he pays for petrol (gasoline) at the pump in Perth, Australia was about the same as he was paying four years ago. Nonetheless, inflation has marched on over the years, so does that mean petrol is effectively cheaper now than it used to be? And how does the...

## Math Constants in C++

August 14, 2012

Some of my colleagues didn't know that you can use mathematical constants that are part of "cmath". Here is the small snippet that shows how to use PI from cmath library. Be aware that you need to write "#define _USE_MATH_DEFINES" before you include cm...

## Bank of America 1% Cash Rewards Aren’t Really 1%

August 14, 2012

Bank of America (BoA) has a "Cash Rewards" credit card that pays "1% cash back everywhere, every time". But if you read the fine print, it's clear that the reward is almost always less than 1%. Here's the relevant sentence from the terms and conditions: Fractions are truncated at the 100th decimal place, and are
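The effect of truncation can be sketched with a little integer arithmetic. This assumes that the reward is computed per purchase and that anything below a whole cent is dropped; the excerpt is cut off before the full terms are quoted, so treat both points as assumptions. The purchase amounts are made up.

```r
# Work in whole cents to avoid floating-point surprises.
# 1% of a purchase, truncated to a whole cent, is integer division by 100.
reward_cents <- function(purchase_cents) purchase_cents %/% 100

purchases <- c(1999, 550, 10000, 99)   # hypothetical purchases, in cents
rewards   <- reward_cents(purchases)   # 19, 5, 100 and 0 cents

# Effective reward rate in percent; it never exceeds 1
effective_pct <- 100 * rewards / purchases
data.frame(purchases, rewards, effective_pct)
```

Under these assumptions the rate reaches a full 1% only when the purchase is a whole number of dollars, and small purchases lose proportionally the most.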

## Custom axis transformations in ggplot2

August 14, 2012

To apply a data transformation on an axis in a ggplot, you can use coordinate transformations. For more detail see the ggplot2 documentation. A number of coordinate transformations are available, including log10 and sqrt. However, if you want to perform… See more ›

## How to branch/fork a (StatET) project with SVN

August 14, 2012

I was introduced to version control at the 2011 Belgrade R+OSGeo in higher education summer school. I’ve been using it in my daily work ever since. Recently the need to branch my project came up, and this post describes how, after a few hours of reading teh internets, I satisfied my need. In a nutshell, you

## Random and fixed effects in sensory profiling

August 14, 2012

I am reading Introduction to Mixed Modelling by N.W. Galwey. It is partly a repeat of things I know, but I expect to use mixed models quite a lot in the coming time, so it is good to repeat these things. My problem with this book is a sensory exampl...

## London 2012 Olympics — medal statistics

August 14, 2012

The 2012 Olympic Games officially ended this Sunday in London. Although I missed most of the games, I was still entertaining myself with some hilarious news, such as Thomas’s re-diving. So much fun. I would remember this for years :) Games ended. … Continue reading →

## The essence of a handwritten digit

August 13, 2012

If you haven’t yet discovered the competitive machine learning site kaggle.com, please do so now. I’ll wait. Great – so, you checked it out, fell in love and have made it back. I recently downloaded the data for the getting started competition. It consists of 42000 labelled images (28×28) of handwritten digits 0-9. The

## Adaptive Asset Allocation

August 13, 2012

Today I want to highlight a whitepaper about Adaptive Asset Allocation by Butler, Philbrick and Gordillo and the discussion by David Varadi on the robustness of parameters of the Adaptive Asset Allocation algorithm. In this post I will follow the steps of the Adaptive Asset Allocation paper, and in the next post I will show

## RInside 0.2.7

August 13, 2012

A new version 0.2.7 of RInside is now available via CRAN. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and functions provided by the Rcpp R and C++ integrati...

## Missouri: Comparison of Registered Voter Counts to Census Voting Age Population

August 13, 2012

By Earl F Glynn | Franklin Center

A comparison of US Census voting age population data in Missouri to voter registration data shows a number of Missouri counties have bloated voter registration lists. Charts by county for the years 2000 to 2012 show how counties are maintaining their voter lists. Voter fraud potential is higher

## Cleaning sentences by recursively merging words using R

August 13, 2012

A question on StackOverflow really sparked my attention. The aim was to clean up a dataset of inappropriately spaced words. For example: My approach was to create what I call a wordpair object. The word pair object for the… See more ›

## Videos on Using R

August 13, 2012

In this post on his blog some months ago, Ethan Fosse drew attention to Anthony Damico's collection of over 90 videos on using the R software environment. Definitely worth looking at! © 2012, David E. Giles

## User Input using tcl/tk

August 13, 2012

I was inspired by Kay Cichini's recent post on creating a tcl/tk dialog box for users to enter variable values. I am going to have a use for this very soon, so I took some time to make it a bit more generic. What I wanted is a function that takes a vector (of variable names)

## Quick SAP HANA and R usecase

DISCLAIMER: I'm not an SAP HANA expert or an R expert, not even a Python expert. I'm just a guy with a lot of ideas who loves to write blogs. The other day I was thinking about making something nice with SAP HANA and R, because people don't seem to be enou...