## R Graphics with ggplot2

September 18, 2012

ggplot2 is one of the most elegant R packages for data analysis and visualization. Recently I gave a tutorial on the ggplot2 package. You can find my ggplot2 notes here (click the image below), and my presentation slides are below. The …

## Side note…

September 18, 2012

MathJax allows you to customize how $$\LaTeX$$ is displayed. Simply right-click the math you’d like to see to access the display menu. Under “math settings” you can see zoom trigger and factor options. Given how small the text ...

## Embedding $$\LaTeX$$ in Tumblr

September 18, 2012

The classic Pythagorean identity is: $$\sin^2(\theta) + \cos^2(\theta) = 1$$ The binomial formula, which calculates the probability of obtaining k tails when flipping a coin n times with an assumed probability p for each trial, is: \( P(E)   = {n \choos...

## Getting data from figures in published papers

September 18, 2012

The problem: There are a lot of figures in published papers in the scholarly literature, like the one below, from (Attwood et al. 2012). At some point, a scientist wants to ask a question for which they can synthesize the knowledge on that question b...

## Using R in Insurance at GIRO 2012

September 17, 2012

Every year the UK’s general insurance actuarial community organises a big conference, which they call GIRO, short for General Insurance Research Organising committee. This year's conference is in Brussels from 18 - 21 September 2012. Despite the fac...

## Copulas and tail dependence, part 1

September 17, 2012

As mentioned in the course last week, Venter (2003) suggested nice functions to illustrate tail dependence (see also some slides used in Berlin a few years ago).

Joe (1990)'s lambda: Joe (1990) suggested a (strong) tail dependence index. For lower t...

## Why are some things easier to forecast than others?

September 17, 2012

Forecasters are often met with skepticism. Almost every time I tell someone that I work in forecasting, they say something about forecasting the stock market, or forecasting the weather, usually suggesting that such forecasts are hopelessly inaccurate. In fact, forecasts of the weather are amazingly accurate given the complexity of the system, while anyone claiming to forecast the stock...

## Permanent Portfolio

September 17, 2012

First, just a quick update: I’m moving the release date of the SIT package a few months down the road, probably in November. Now back to the post. Recently I came across a series of interesting posts about the Permanent Portfolio at the GestaltU blog. Today I want to show you how to back-test the

## In search of large ice floes

September 17, 2012

In search of large ice floes.

## INLA functions (yet again)

September 17, 2012

This links back to previous posts here and here. Earlier today, I had a quick chat with Michela (by email, actually) on this topic. In particular, she was trying to use the function I've written to compute summaries from the posterior distrib...

## Start your new relationship with data together with Roger Peng and 30000 other students

September 17, 2012

A week from today (on September 24) Coursera, an education technology company committed to making education freely available to any person who seeks it, is launching their online course “Computing

## Podcast interview with Michael Kane

September 17, 2012

In this podcast interview with Michael Kane, Data Scientist and Associate Researcher at Yale University, Michael discusses the R statistical programming language, computational challenges associated with big data, and two projects involving data analysis he conducted: one on the stock market "flash crash" of May 6, 2010, and one on the tracking of transportation routes of bird flu H5N1. Michael also...

## Simple Parallel randomForest with doMC package

I have been exploring how to speed up some of my R scripts and have started reading about some amazing corners of R. My first weapons were the Rcpp and RcppArmadillo packages. These are wonderful tools, and even for someone who has never written C++ before, there are enough examples and documentation to get started. I...

## Example 10.2: Custom graphic layouts

September 17, 2012

In example 10.1 we introduced data from a CPAP machine. In brief, it's hard to tell exactly what's being recorded in the data set, but it seems to be related to the pattern of breathing. Measurements are taken five times a second, leading to on the o...

## Tips for Making R User Group Videos

September 17, 2012

Today's guest post is from Ron Fredericks, videographer and co-founder of LectureMaker, LLC — ed. I was initially surprised to find R user groups (RUGs) so popular. I filmed my first R session during the 2009 Predictive Analytics World in San Francisco. I filmed several more R user sessions over the past three years along with business/science clients and...

## What is Tony talking about?

September 17, 2012

I first experimented with word clouds several years ago and used them to visualise the speeches of Kevin Rudd and Malcolm Turnbull. I have now learned from the Fell Stats blog (via R-Bloggers) that there is an R package for generating word clouds.  The package makes use of tm, a text mining package for R, which I have been

## Olympic predictions – from an R web service provider’s point of view

September 17, 2012

Hello, world! Back in July we read Markus Gesmann’s great blog post about a prediction for the 100m final in London. Soon we decided to create similar estimates for the forthcoming events and started to post our results on Facebook. We would like to emphasise again that these kinds of extrapolated estimates are just for fun and we also think...

## Variability of garch estimates

September 17, 2012

Not exactly pin-point accuracy.

Previously: two related posts are "A practical introduction to garch modeling" and "garch and long tails".

Experiment: 1000 simulated return series were generated. The garch(1,1) parameters were alpha=.07, beta=.925, omega=.01. The asymptotic variance for this model is 2. The half-life is about 138 days. The simulated series used a Student’s t distribution …
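The asymptotic variance and half-life quoted in this excerpt can be checked from the three parameters alone. The sketch below is a minimal Python illustration, not the post's own code. It assumes the standard garch(1,1) formulas: asymptotic variance omega / (1 - alpha - beta), and half-life log(0.5) / log(alpha + beta). The variable names are chosen here for illustration.

```python
import math

# Parameters quoted in the excerpt for the garch(1,1) model
alpha = 0.07
beta = 0.925
omega = 0.01

# Persistence of the variance process (assumed standard definition)
persistence = alpha + beta

# Asymptotic (unconditional) variance: omega / (1 - alpha - beta)
asymptotic_variance = omega / (1.0 - persistence)

# Half-life: periods needed for a variance shock to decay by half
half_life = math.log(0.5) / math.log(persistence)

print(f"asymptotic variance: {asymptotic_variance:.2f}")
print(f"half-life: {half_life:.1f} periods")
```

With persistence this close to 1, small changes in alpha + beta move the half-life a lot, which is one reason estimates of these parameters can vary so much between series.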

## Create Beamer/knitr Lecture Slideshow with Bash, Explain the Script with knitr

September 17, 2012

Setting up a beamer slideshow is tedious. Creating new slideshows with the same header/footer/style files every week for your course lectures is very very tedious. To solve this problem I created a simple bash shell script. When you run the script in...

September 17, 2012

Metadata! Metadata is very cool. It's super hot right now - everybody is talking about it. Okay, maybe not everyone, but it's an important part of archiving scholarly work. We are working on a repo on GitHub rmetadata to be a one stop shop for quer...

## Online Questionnaire & Report Generation with Google Drive & R

September 17, 2012

Here's how I did it in 3 easy steps: (1) Set up a form in Google Docs/Drive. (2) Choose "Actions" and "Embed in Website" to get the URL for the iframe and put it in a post, like below. Then, go to the spreadsheet view of the form on Google Docs/Drive a...

## Etymology

September 16, 2012

Chris and I started this blog as an outlet for the work we were already doing every day: writing code and trying to avoid forgetting how we wrote it. To that end, gist.github.com is an extremely useful resource, and this blog allows us to add a little ...

## Changes in optimization performance of gcc over time

September 16, 2012

The SPEC benchmarks came out a year after the first release of gcc (in fact gcc was and still is one of the programs included in the benchmark). Compiling the SPEC programs using the gcc option -O2 (sometimes -O3) has always been the way to measure gcc performance, but after 25 years does this way

## The R-Podcast Episode 10: Adventures in Data Munging Part 2

September 16, 2012

I’m happy to present episode 10 of the R-Podcast! Season 1 of the R-Podcast concludes with part 2 of my series on data munging, in which I discuss issues surrounding importing data sets contained in HTML tables. I share how I used the XML and RCurl packages to validate and import data from hockey-reference.com for

## What’s the smallest amount you can’t make with 5 coins?

September 16, 2012

My amazing, awesome wife often comes up with little puzzles for our amazing children, and this one seemed destined to be solved in R. So, using up to 5 coins (1p, 2p, 5p, 10p, 20p and 50p) first she asked our kids whether they could make every val...
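The puzzle in the title can be brute-forced by enumerating every combination of up to 5 coins. The sketch below is a Python illustration rather than the post's R solution. It assumes that each denomination may be used more than once, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement

COINS = (1, 2, 5, 10, 20, 50)  # pence, as listed in the excerpt
MAX_COINS = 5


def reachable_amounts(coins, max_coins):
    """Return every total that can be formed with 1..max_coins coins.

    Assumption: a denomination may appear more than once.
    """
    totals = set()
    for k in range(1, max_coins + 1):
        for combo in combinations_with_replacement(coins, k):
            totals.add(sum(combo))
    return totals


def smallest_unreachable(coins, max_coins):
    """Return the smallest positive amount that cannot be formed."""
    totals = reachable_amounts(coins, max_coins)
    amount = 1
    while amount in totals:
        amount += 1
    return amount


print(smallest_unreachable(COINS, MAX_COINS))
```

Enumerating combinations is enough here because 6 denominations and at most 5 coins give only a few hundred cases. A larger puzzle would call for dynamic programming over the minimum coin count per amount.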

## New version of devtools: 0.8

September 16, 2012

We’re pleased to announce a new version of devtools, the package that makes R package development easy. The main features in this version are: A complete rewrite of the code loading system which simulates namespace loading much more accurately – this means using load_all is much closer to installing and loading the package. It also

## Confidence Regions for Regression Coefficients

September 16, 2012

Let’s consider the usual linear regression model, with the full set of assumptions:

y = Xβ + ε ;    ε ~ N[0, σ²Iₙ]    (1)

where X is a non-random (n × k) matrix with full column rank. Recall that, under our usual set of assumptions...

## Football model

September 16, 2012

After reading Dutch football data (Eredivisie 2011-2012) and making a predictions display, it is time to look at a few simple models to predict goals. To reiterate the data setup, each game played consists of two rows in the data frame. ...