The Google Summer of Code (2012) project to extend xts has produced a very promising new plot.xts function. Michael Weylandt, the project's student, wrote to the R-SIG-Finance mailing list to request impressions, feedback, and bug reports. The function is hous...

Dealing with endogeneity in a binary dependent variable model requires more consideration than the simpler continuous dependent variable case. For some, the best approach to this problem is to use the same methodology used in the continuous case, i.e., two-stage least squares (2SLS). Thus, the equation of interest becomes a linear probability model (LPM). The
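
Here is a minimal sketch of the 2SLS approach to an LPM, written in Python with NumPy. It is not the post's own code, which this excerpt does not show. The simulated data, the confounder `u`, the instrument `z`, and the function names `two_stage_lpm` and `simulate` are all hypothetical. The sketch computes only point estimates, because the naive second-stage OLS standard errors are not valid.

```python
import numpy as np

def two_stage_lpm(y, x, z):
    """Two-stage least squares for a linear probability model.

    y: binary outcome (0/1), x: endogenous regressor, z: excluded instrument.
    Returns point estimates [intercept, slope]; the naive second-stage OLS
    standard errors are not valid, so none are returned.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # First stage: project the endogenous regressor onto the instrument.
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma
    # Second stage: regress the binary outcome on the first-stage fitted values.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta

def simulate(n=200_000, seed=0):
    """Hypothetical data: the unobserved u drives both x and P(y = 1)."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1, 1, n)          # instrument, independent of u
    u = rng.uniform(-1, 1, n)          # unobserved confounder
    x = z + u + rng.uniform(-1, 1, n)  # endogenous regressor
    p = 0.5 + 0.1 * x + 0.1 * u        # true P(y = 1), always within [0.1, 0.9]
    y = (rng.uniform(size=n) < p).astype(float)
    return y, x, z

if __name__ == "__main__":
    y, x, z = simulate()
    ols_slope = np.polyfit(x, y, 1)[0]    # biased upward by the confounder
    iv_slope = two_stage_lpm(y, x, z)[1]  # should be close to the true 0.1
    print(f"OLS slope: {ols_slope:.3f}, 2SLS slope: {iv_slope:.3f}")
```

On this simulated data, the plain OLS slope should come out noticeably above the true value of 0.1, because `u` raises both `x` and the probability that `y = 1`. The 2SLS slope should land close to 0.1.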

Project Euler Problem 18 is sort of a route-finding problem. It occupied my mind for two days, but I finally came up with a clever solution. Find the maximum total from top to bottom of the triangle below: 75 95 64 17 …
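
The author's own solution is behind the link. A common way to solve this kind of problem is bottom-up dynamic programming, sketched below as one possible approach rather than as the post's method. Starting from the last row, you replace each number with itself plus the larger of its two children, so the apex ends up holding the maximum total. The small triangle is the worked example from the Project Euler problem statement, not the full triangle that the excerpt begins to quote.

```python
def max_path_sum(triangle):
    """Maximum top-to-bottom total, stepping to an adjacent number on the row below."""
    # Start from the bottom row and fold each row into the one above it.
    best = list(triangle[-1])
    for row in reversed(triangle[:-1]):
        best = [value + max(best[i], best[i + 1]) for i, value in enumerate(row)]
    return best[0]

example = [
    [3],
    [7, 4],
    [2, 4, 6],
    [8, 5, 9, 3],
]
print(max_path_sum(example))  # 23, via 3 + 7 + 4 + 9
```

This approach visits each row once, so the work grows with the number of entries. A brute-force search would instead enumerate all 2^(n-1) paths through an n-row triangle.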

To estimate whether a certain vector of numbers will fit into memory, you can quite easily predict its memory usage from its length. An integer vector uses 4 bytes per number, and a numeric vector…
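
The estimate is simple arithmetic: multiply the length by the bytes per element. The rough sketch below makes three assumptions:

- an integer takes 4 bytes, as the post states;
- a numeric value takes 8 bytes, since R stores numeric vectors as double-precision floats;
- R's small fixed per-object header can be ignored.

`estimate_vector_mb` is a hypothetical helper name.

```python
# Bytes per element: 4 for R integers (per the post), 8 for R numeric (double).
BYTES_PER_ELEMENT = {"integer": 4, "numeric": 8}

def estimate_vector_mb(n_elements, kind):
    """Approximate size of an R vector in MiB, ignoring the fixed object header."""
    return n_elements * BYTES_PER_ELEMENT[kind] / 1024 ** 2

for kind in ("integer", "numeric"):
    print(kind, round(estimate_vector_mb(10 ** 8, kind), 1), "MiB")
```

By this estimate, 10^8 integers need roughly 381.5 MiB, and the same number of doubles needs about twice that.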

I am often found in possession of palaeo core data where the sample identifiers contain a core code or label plus the sample depth. Often these are things generated by colleagues who have used other software where for one reason or another they don’t want to store the depth information as a separate numeric variable. I also generate such...
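
This excerpt does not show the post's own approach. As a hypothetical illustration, suppose identifiers follow a pattern such as `MOR-10.5`: an alphabetic core code, an optional separator, then the depth. A regular expression can then split each ID back into a core label and a numeric depth. The ID format and the function name `split_sample_id` are assumptions, and real labels will need a pattern tailored to them.

```python
import re

# Hypothetical ID format: alphabetic core code, optional separator, then the depth.
SAMPLE_ID = re.compile(r"^(?P<core>[A-Za-z]+)[\s_-]*(?P<depth>\d+(?:\.\d+)?)$")

def split_sample_id(sample_id):
    """Return (core, depth) from an ID such as 'MOR-10.5'; raise ValueError otherwise."""
    match = SAMPLE_ID.match(sample_id.strip())
    if match is None:
        raise ValueError(f"unrecognised sample id: {sample_id!r}")
    return match.group("core"), float(match.group("depth"))

ids = ["MOR-10.5", "MOR_11", "KR 2.25"]
print([split_sample_id(s) for s in ids])
```

Raising an error on unrecognised IDs, rather than silently returning a missing value, makes badly formatted labels visible straight away.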

As I have mentioned previously, I have begun reading Statistical Methods in Bioinformatics by Ewens and Grant and working selected problems for each chapter. In this post, I will give my solution to two problems. The first problem is pretty straightforward. Problem 2.20: Suppose that a parent of genetic type Mm has three children. Then the parent transmits...

R is my favorite programming language. It's just so useful for getting work done. Sometimes people will complain that R is a difficult language. To me, this raises the questions: difficult for what? And for whom? I personally think R is just about the easiest thing in the world for prototyping. Meaning if you want to quickly crank out...

Biostatistician and R user Matt Cooper noticed recently that the price he pays for petrol (gasoline) at the pump in Perth, Australia was about the same as he was paying four years ago. Nonetheless, inflation has marched on over the years, so does that mean petrol is effectively cheaper now than it used to be? And how does the...
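
The underlying comparison is a deflation: you scale the old nominal price by the ratio of the current price index to the index at the time. Below is a minimal sketch. The prices and CPI values are made-up placeholders, not Matt Cooper's data or official Australian CPI figures, and `in_todays_money` is a hypothetical helper name.

```python
def in_todays_money(nominal_then, cpi_then, cpi_now):
    """Convert an old nominal price into today's money using a price index such as the CPI."""
    return nominal_then * cpi_now / cpi_then

# Hypothetical inputs: same pump price then and now, with 10% cumulative inflation.
price_then, price_now = 1.40, 1.40
cpi_then, cpi_now = 100.0, 110.0

adjusted = in_todays_money(price_then, cpi_then, cpi_now)
print(f"Old price in today's money: {adjusted:.2f}")
print(f"Real change: {price_now / adjusted - 1:+.1%}")
```

With an unchanged pump price and 10% inflation in these placeholder numbers, petrol would be about 9% cheaper in real terms.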

Some of my colleagues didn't know that you can use the mathematical constants that are part of "cmath". Here is a small snippet that shows how to use PI from the cmath library. Be aware that you need to write "#define _USE_MATH_DEFINES" before you include cm...

Bank of America (BoA) has a "Cash Rewards" credit card that pays "1% cash back everywhere, every time". But if you read the fine print, it's clear that the reward is almost always less than 1%. Here's the relevant sentence from the terms and conditions: Fractions are truncated at the 100th decimal place, and are
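
The sketch below shows the per-purchase calculation and why truncation pushes the effective rate below 1%. It rests on two readings of the excerpt, neither confirmed by the card terms: the reward is computed separately for each transaction, and "truncated at the 100th decimal place" means cut down to whole cents. The code uses `Decimal` to avoid binary floating-point surprises with money. `reward` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")

def reward(amount, rate=Decimal("0.01")):
    """Cash reward for one purchase, truncated (never rounded up) to whole cents."""
    return (Decimal(amount) * rate).quantize(CENT, rounding=ROUND_DOWN)

purchases = ["19.99", "4.50", "0.99", "100.00"]
for amount in purchases:
    r = reward(amount)
    print(f"${amount}: reward ${r} ({r / Decimal(amount):.2%} effective)")
```

Under these assumptions, only purchases that are exact multiples of $1 earn the full 1%. Every other amount loses a fraction of a cent.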

To apply a data transformation to an axis in a ggplot, you can use coordinate transformations. For more detail, see the ggplot2 documentation. A number of coordinate transformations are available, including log10 and sqrt. However, if you want to perform…

I was introduced to version control at the 2011 Belgrade R+OSGeo in higher education summer school, and I've been using it in my daily work ever since. Recently the need to branch my project came up, and this post describes how a few hours of reading teh internets satisfied that need. In a nutshell, you

If you haven’t yet discovered the competitive machine learning site kaggle.com, please do so now. I’ll wait. Great – so, you checked it out, fell in love and have made it back. I recently downloaded the data for the getting started competition. It consists of 42000 labelled images (28×28) of handwritten digits 0-9. The
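
Here is a sketch of getting the data into image form. It assumes the competition's train.csv layout: a header row, then one row per image, with a `label` column followed by 784 pixel columns (`pixel0` to `pixel783`) in row-major order. A tiny synthetic CSV stands in for the real file so the snippet runs on its own. `load_digits` is a hypothetical helper name.

```python
import io
import numpy as np

def load_digits(csv_file):
    """Read a digits CSV: header row, then a label column plus 784 pixel columns."""
    data = np.loadtxt(csv_file, delimiter=",", skiprows=1, dtype=np.int64, ndmin=2)
    labels = data[:, 0]
    # Row-major order: pixel k sits at row k // 28, column k % 28.
    images = data[:, 1:].reshape(-1, 28, 28)
    return labels, images

# Tiny stand-in for train.csv (the real file has 42000 rows).
header = "label," + ",".join(f"pixel{k}" for k in range(784))
row = "7," + ",".join(str(k % 256) for k in range(784))
labels, images = load_digits(io.StringIO(header + "\n" + row + "\n"))
print(labels, images.shape)
```

With the real file, pass its path to `load_digits` instead of the `StringIO` stand-in. Note that `np.loadtxt` on 42000 × 785 values is slow but workable.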

Today I want to highlight a whitepaper about Adaptive Asset Allocation by Butler, Philbrick and Gordillo and the discussion by David Varadi on the robustness of parameters of the Adaptive Asset Allocation algorithm. In this post I will follow the steps of the Adaptive Asset Allocation paper, and in the next post I will show

A new version 0.2.7 of RInside is now available via CRAN. RInside provides a set of convenience classes which facilitate embedding of R inside of C++ applications and programs, using the classes and functions provided by the Rcpp R and C++ integrati...

By Earl F Glynn | Franklin Center
A comparison of US Census voting age population data in Missouri to voter registration data shows a number of Missouri counties have bloated voter registration lists. Charts by county for the years 2000 to 2012 show how counties are maintaining their voter lists. Voter fraud potential is higher

A question on StackOverflow really caught my attention. The aim was to clean up a dataset of inappropriately spaced words. For example: My approach was to create what I call a wordpair object. The word pair object for the…
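
This excerpt does not show the post's word pair object. The sketch below is a loosely related, hypothetical alternative, not the author's method. It scans the tokens greedily from left to right and rejoins two adjacent tokens whenever their concatenation appears in a reference vocabulary. The function name and the toy vocabulary are assumptions.

```python
def merge_split_words(text, vocabulary):
    """Greedily rejoin adjacent tokens whose concatenation is a known word."""
    tokens = text.split()
    merged = []
    i = 0
    while i < len(tokens):
        # Try to join the current token with the next one, if there is a next one.
        if i + 1 < len(tokens) and (tokens[i] + tokens[i + 1]).lower() in vocabulary:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return " ".join(merged)

vocab = {"the", "weather", "is", "beautiful", "today"}
print(merge_split_words("the wea ther is beauti ful to day", vocab))
```

A greedy rule like this can also glue together two legitimate words whose concatenation happens to be a word, such as "a" and "part". A real cleanup would therefore need word-frequency information or some other check before merging.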

In this post on his blog some months ago, Ethan Fosse drew attention to Anthony Damico's collection of over 90 videos on using the R software environment. Definitely worth looking at!

© 2012, David E. Giles

We have three new local R user groups to announce this month. The Alamo City R Users Group in San Antonio becomes the fifth R user group in Texas. The group's just getting started, and volunteers are always welcome. Although not a dedicated R group, the Milwaukee Chapter of the ASA hosts occasional R workshops. In May next year,...

I unfortunately was not there, but we can vicariously enjoy it via the presentations that are posted on the conference website. Below is my take on the highlights (in chronological order). Peter Carl and Brian Peterson “Constructing Strategic Hedge Fund Portfolios” is wonderful from my perspective. Promoting random portfolios is sure to win my heart.