## Data types, part 3: Factors!

November 21, 2012
In this third part of the data types series, I'll go over an important class that I've skipped so far: factors. Factors are categorical variables that are super useful in summary statistics, plots, and regressions. They basically act like dummy variables t...
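
As a rough illustration of the idea (sketched in Python rather than R, and not a description of how R implements factors internally), a categorical vector can be stored as a sorted set of levels plus one integer code per value, and then expanded into 0/1 dummy columns. The helper names `as_factor` and `dummy_columns` are hypothetical, and the codes here are 0-based whereas R's are 1-based.

```python
def as_factor(values):
    """Return (levels, codes): the sorted distinct values and one integer code per value."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    codes = [index[v] for v in values]  # 0-based codes (R uses 1-based)
    return levels, codes


def dummy_columns(values):
    """Expand a categorical vector into one 0/1 indicator column per level."""
    levels, codes = as_factor(values)
    return {
        level: [1 if code == i else 0 for code in codes]
        for i, level in enumerate(levels)
    }


colors = ["red", "blue", "red", "green"]
levels, codes = as_factor(colors)
print(levels)                 # distinct categories in sorted order
print(codes)                  # position of each value's level
print(dummy_columns(colors))  # one indicator column per level
```

In a regression, one of these indicator columns is normally dropped as the reference level so that the columns are not collinear with the intercept.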

## Upcoming Webinar: Real-time, big-data analytics

November 21, 2012
A quick heads-up that I'll be presenting another brand-new webinar on Thursday next week (November 29). In Real-time Big Data Analytics: From Deployment to Production, I'll review the process of making predictive models work in real-life operational environments. I'll also tackle those ubiquitous buzzwords "real-time" and "big data", and the fact that they can mean very different things in...

## IPython vs RStudio+knitr

November 21, 2012
At a meeting last night with some collaborators at the Vélobstacles project, I was excitedly told about the magic of IPython and its notebook functionality for reproducible research. This sounds familiar, I thought to myself. Using a literate programming approach to integrate computation with the communication of methodology and results has been at the core

## Representing density in two dimensions

November 21, 2012
I’ll be subbing today for Chris, as we continue to explore some ggplot2 basics. Today, imagine that you have data distributed in two dimensions, and that you would like to convey differences in point density over space. As with many things, this...

## Creating an R package in Windows

November 21, 2012
A nice package can be both beautiful and functional. The image is CC by MIAD Communication Design. Inspired by Read more »

## Fun with coin flips

November 21, 2012
We all know that the odds of flipping an unbiased coin are 50% heads, 50% tails. But what happens if you do this a lot of times? Do you expect the same number of heads and tails? What if we took a cumulative sum where heads = +1 and tails = -1? What wou...
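
The experiment in that teaser can be sketched in a few lines. This is a minimal Python sketch, not the post's own (R) code; the function name `coin_flip_walk` and the seed are assumptions made for illustration. It simulates fair flips, scores heads as +1 and tails as -1, and keeps the running total.

```python
import random
from itertools import accumulate


def coin_flip_walk(n_flips, seed=None):
    """Return the running sum of n_flips fair coin flips (heads = +1, tails = -1)."""
    rng = random.Random(seed)  # local generator so results are reproducible
    steps = [rng.choice((1, -1)) for _ in range(n_flips)]
    return list(accumulate(steps))


walk = coin_flip_walk(10_000, seed=42)

# The final sum equals heads - tails, and heads + tails equals the number of flips,
# so the head count can be recovered from the last value of the walk.
heads = (len(walk) + walk[-1]) // 2
tails = len(walk) - heads
print("heads:", heads, "tails:", tails, "final sum:", walk[-1])
```

Plotting `walk` against the flip index shows how far the running total drifts from zero, even though each individual flip is fair.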

## Video: SimpleR tricks and tools: Help, debugging, git, LaTeX, and workflow with R by Prof Rob Hyndman

November 21, 2012
This post shares the video from a talk presented on 20th November 2012 by Professor Rob Hyndman at Melbourne R Users. The talk provides an introduction to: getting R help; debugging R functions; R style guides; making good use of … Continue reading →

## Rcpp attributes: A simple example ‘making pi’

November 20, 2012
We introduced Rcpp 0.10.0 with a number of very nice new features a few days ago, and the activity on the rcpp-devel mailing list has been pretty responsive, which is awesome. But because few things beat a nice example, this post tries to build some more excitement. We will illustrate how Rcpp attributes make it really easy to add C++ code...

## R User Conference in Spain: Call for Tutorials

November 20, 2012
I'm really looking forward to useR! 2013 (the international conference for R users), and not just because it's being held in Spain next year (July 10-12). The program is already coming together, with a great lineup of invited speakers, including R-core member Duncan Murdoch and prolific package authoR Hadley Wickham. You too can be part of the program, by...

## optimising accept-reject

November 20, 2012
I spotted on R-bloggers a post discussing optimising the efficiency of programming accept-reject algorithms. While it is about SAS programming, and apparently supported by the SAS company, there are two interesting features with this discussion. The first one is about avoiding the dreaded loop in accept-reject algorithms. For instance, taking the case of the truncated-at-one

## Functional programming with lambda.r

November 20, 2012
After a four month simmer on various back burners and package conflicts, I’m pleased to announce that the successor to …Continue reading »

## SimpleR tips, tricks and tools

November 20, 2012
I gave this talk last night to the Melbourne Users of R Network. Examples

## Claims reserving in R: ChainLadder 0.1.5-4 released

November 20, 2012
Last week we released version 0.1.5-4 of the ChainLadder package on CRAN. The R package provides methods which are typically used in insurance claims reserving. If you are new to R or insurance, check out my recent talk on Using R in Insurance. The chain-ladder method which is a popular method in the insurance industry to forecast future...

## Heteroskedastic GLM in R

November 20, 2012
A commenter on my previous blog entry has drawn my attention to an R function called hetglm() that estimates heteroskedastic probit models. This function is contained in the glmx package. The glmx package is not available on CRAN yet, but thankfully can be downloaded here. The hetglm() function has a number of computational advantages compared with

## Prime Factorization Visualization with R and Shiny

November 20, 2012
Quite a lot of people have had fun recently with prime factorization. It all started on The Math Less Traveled, then various versions of the prime factorization diagrams appeared (here, here, this animated one, etc., they are actually more or less listed here). So I wanted to have fun too and give a try...

## Project Euler — problem 24

November 20, 2012
It’s a lovely day. I took a walk around the campus after lunch. The scenery was enjoyable on this deep-autumn day. Before the afternoon work, I’d like to spend a few moments on the 24th Euler Problem. A permutation is an ordered arrangement of … Continue reading →

## Drawdown Determined Position Size

November 19, 2012
This caught my eye as I searched for some more academic research on my favorite risk measure, drawdown. Yang, Z. George and Zhong, Liang, Optimal Portfolio Strategy to Control Maximum Drawdown - The Case of Risk Based Dynamic Asset Allocation (February ...

## Dallas R Users Group Baseball Data Dive

November 19, 2012
This past Saturday I led a data dive workshop for the Dallas R Users Group using Lahman’s baseball statistics. After providing a brief introduction to the Lahman R package and showing how to load the data and make some basic plots,...

## Make a Graphical Figure of your SEM model in OpenMx

November 19, 2012
In this post, I made an SEM model and showed the results in a table. It’s a great feature of SEM that you can sketch your ideas about how the world works, and being able to get such a sketch back out of OpenMx is very helpful. Importantly, a figure can help readers understand what you’ve done, and it is a...

## A quick function for editing CSV files in R

November 19, 2012
I’ve been hunting for a lightweight CSV editor for OSX so I could make fixes to data files without needing to fire up Excel. While you can edit a CSV file in any text editor, it’s a pain to navigate the files without a spreadsheet-like interface. Unfortunately there doesn’t seem to be a good,...

## Matching clustering solutions using the ‘Hungarian method’

November 19, 2012
Some time ago I stumbled upon a problem connected with the labels of a clustering. The partition an instance belongs to is usually labeled with an integer ranging from 1 to K, where K is the number of clusters. The task at that time was to plot a map of the results from the clustering of spatial polygons...

November 19, 2012
Registration is now open for BIWA Summit 2013. This event, focused on Business Intelligence, Data Warehousing and Analytics, is hosted by the BIWA SIG of the IOUG on January 9 and 10 at the Hotel...

## A Video Tour of R, for Beginners

November 19, 2012
Coursera's introductory "Statistics One" course uses R for the practical data analysis exercises. To support course participants, Princeton University grad student Laura Suttle created a series of web videos introducing the R interface. These videos are available to the public, and are a great place for anyone new to R to start. The video series isn't designed to teach...

## The Hour of Hell of Every Morning – Commute Analysis, April to October 2012

November 19, 2012
Introduction. So a little while ago I quit my job. Well, actually, that sounds really negative. I'm told that when you are discussing large changes in your life, like finding a new career, relationship, or brand of diet soda, it's important to frame things positively. So let me rephrase that - I've left the job I previously held to pursue other directions. Why?...

## Function apply() – Tip 1

November 19, 2012
The function apply() is certainly one of the most useful functions. I was scared of it for a while and refused to use it. But it makes the code so much faster to write and so efficient that we can't afford not to use it. If you are like me, that yo...

## RMySQL Looking For A New Maintainer

November 19, 2012

## A Shiny new way of communicating Bayesian statistics

November 19, 2012
Bayesian data analysis follows a very simple and general recipe: Specify a model and likelihood, i.e. what process do you think is generating your data? Specify a prior distribution, i.e. quantify what you know about a problem before having seen … Continue reading →

## Podcast #5: Coursera Debrief

November 19, 2012
Jeff and I talk with Brian Caffo about teaching MOOCs on Coursera.

## Gathering RealClearPolitics Polling Trends with XML

November 19, 2012
Now that the election is over, you may want to use polling data in a model of the campaign. Simon Jackman has thoughtfully made his daily state-by-state predictions available for download, but a commonly-used dataset is the RealClearPolitics polling a...