I am starting this blog with thanks to Josh Suereth, from whom I cloned the template for this blog; Tom Preston-Werner, who made Jekyll (the static site generator behind this blog); and Scott Chamberlain, who made me aware of it.

Following on from A Tool Chain for Plotting Twitter Archive Retweet Graphs – Py, R, Gephi, here’s a quick summary view of #UKGC12 tweets saved in a Google Spreadsheet archive, as developed by Martin Hawksey, generated from an R script (R code available here; #ukgc12 tweet archive here)… (I did mean to tidy these up,

Linear regression can be a fast and powerful tool to model complex phenomena. However, it makes several assumptions about your data, and it quickly breaks down when these assumptions, such as the existence of a linear relationship between the predictors and the dependent variable, are violated. In this post, I will introduce some diagnostics that you can...
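As a rough illustration of the linearity assumption mentioned above, here is a minimal sketch in base R. The simulated data, the variable names, and the choice of checking residuals against `x^2` are all assumptions made for this example; they are not taken from the original post.

```r
# Simulated data (an assumption for illustration): the true relationship is
# quadratic, so a straight-line fit violates the linearity assumption.
set.seed(1)
x <- seq(-3, 3, length.out = 100)
y <- x^2 + rnorm(100, sd = 0.5)

# Deliberately misspecified linear model.
fit <- lm(y ~ x)
res <- residuals(fit)

# If the linear model were adequate, the residuals would show no systematic
# relationship with functions of the predictor. Here they track x^2 closely.
curvature <- cor(res, x^2)
print(curvature)
```

Calling `plot(fit, which = 1)` on the same model draws the standard residuals-versus-fitted plot, where this kind of misspecification typically shows up as a curved pattern.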

THIS IS NOT INVESTMENT ADVICE. The information is provided for informational purposes only. In the Time Series Matching post, I used one-to-one mapping to compute the distance between the query (current pattern) and the reference (historical time series). The following chart visualizes this concept. The distance is the sum of the vertical lines. An alternative way to map
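The one-to-one mapping described above can be sketched as follows. This assumes that the "sum of vertical lines" means the sum of absolute differences between points at the same position; the function name `one_to_one_distance` and the sample vectors are hypothetical and not from the original post.

```r
# One-to-one mapping: the i-th point of the query is compared only with the
# i-th point of the reference, so both series must have the same length.
one_to_one_distance <- function(query, reference) {
  stopifnot(length(query) == length(reference))
  # Each "vertical line" is the absolute gap at one position (an assumption).
  sum(abs(query - reference))
}

query <- c(1, 2, 3)        # hypothetical current pattern
reference <- c(2, 2, 5)    # hypothetical historical window
d <- one_to_one_distance(query, reference)
print(d)
```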

As someone who was a Java programmer for many years, learning R’s object-oriented programming framework has been frustrating, to say the least. I like the simplicity of S3 but find it limiting when you wish to write methods that change the underlying data elements. That is, printing, summarizing, and plotting work great because they
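The limitation described above comes from R's copy-on-modify semantics: an S3 method receives a copy of the object, so changes made inside the method do not reach the caller unless the result is returned and reassigned. A minimal sketch, using a hypothetical `account` class invented for this example:

```r
# Constructor for a hypothetical S3 class "account".
account <- function(balance) {
  structure(list(balance = balance), class = "account")
}

# Generic and method that "change" the underlying data.
deposit <- function(x, amount) UseMethod("deposit")
deposit.account <- function(x, amount) {
  x$balance <- x$balance + amount  # modifies the local copy only
  x                                # the updated copy must be returned
}

a <- account(100)
b <- deposit(a, 50)

print(a$balance)  # the original object is unchanged
print(b$balance)  # only the returned copy carries the new balance
```

A Java programmer would expect `deposit(a, 50)` to update `a` in place; in S3 the idiom is to write `a <- deposit(a, 50)` instead.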

Hadley Wickham, creator of the ggplot2 package for R (as well as several others), will present a webinar on February 8 going behind the scenes of the popular graphics package. If you've never used ggplot2 before, this will be a great way to learn about the kinds of charts you can create with it; and if you're a regular...

Background: One of my colleagues is an academic physical therapist (PT), and he's working on a paper for his colleagues related to power, sample size, and navigating the thicket of trouble that surrounds those two things. We recently got together to walk through some of the issues, and I thought I would share some of the wildlife we observed...

I had previously posted solutions in R to Project Euler problem 23 and problem 22. This is the next problem from Project Euler. The statement of problem 24 is as follows. A permutation is an ordered arrangement of objects....
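To make the definition of a permutation concrete, here is a small sketch that lists every ordered arrangement of a short vector. The helper `permutations` and the input `c(0, 1, 2)` are assumptions chosen for illustration; this is not the solution from the original post.

```r
# Recursively build all ordered arrangements of `items`.
permutations <- function(items) {
  if (length(items) <= 1) {
    return(list(items))
  }
  out <- list()
  for (i in seq_along(items)) {
    # Fix items[i] in front, then permute the remaining elements.
    for (rest in permutations(items[-i])) {
      out[[length(out) + 1]] <- c(items[i], rest)
    }
  }
  out
}

perms <- permutations(c(0, 1, 2))
for (p in perms) {
  print(p)
}
```

Because the input is sorted and each element is tried in order, the arrangements come out in lexicographic order for this input.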

Recently there have been some great posts that highlight how easy it is to hook into the Facebook Graph API using R. Crawling Facebook with R started the discussion and Apply R highlighted how easy it was to plot our network. In order to replicate the examples on Windows, most likely you will need to

I love TikZ in LaTeX and I use it exclusively when writing figures for my papers. I also use the tikzDevice package to convert all figures I create in R to TikZ code, so the font used in the text and figures is the same (and having your R figures in Ti...

I want to test embedding source code in the blog by using the handy Gist tool provided by GitHub. These two R functions are a good opportunity to test out embedding a Gist on the website. These functions allow for threshold testing within a vector in R...

I was searching for open data recently, and stumbled on Socrata. Socrata has a lot of interesting data sets, and while I was browsing around, I found a data set on federal bailout recipients. Here is the data set. However, data sets on Socrata are not always the most recent versions, so I followed a...

Introduction: This post incorporates parts of yesterday's post about bagging. If you are unfamiliar with bagging, I suggest that you read it before continuing with this article. I would like to give a basic overview of ensemble learning. Ensemble learning involves combining multiple predictions derived by different techniques in order to create a stronger overall prediction....
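The idea of combining predictions can be sketched with a simple average of two models. Everything here is an assumption for illustration: the built-in `mtcars` data, the two model formulas, equal-weight averaging as the combination rule, and in-sample RMSE as the score. The original post may use different techniques.

```r
# Two different models for the same target (fuel economy in mtcars).
fit_a <- lm(mpg ~ wt, data = mtcars)
fit_b <- lm(mpg ~ poly(hp, 2), data = mtcars)

pred_a <- predict(fit_a, mtcars)
pred_b <- predict(fit_b, mtcars)

# Simplest possible ensemble: the equal-weight average of both predictions.
pred_ens <- (pred_a + pred_b) / 2

# Root mean squared error, computed in-sample for brevity.
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))

scores <- c(
  a = rmse(pred_a, mtcars$mpg),
  b = rmse(pred_b, mtcars$mpg),
  ensemble = rmse(pred_ens, mtcars$mpg)
)
print(scores)
```

An averaged prediction can never have a higher RMSE than the worse of its two members, but it is not guaranteed to beat the better one; a fair comparison would also use held-out data rather than the training set.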

A few days ago, Romain François explained how to interface with the Facebook Graph API explorer from R. This was a low-level interface, giving the R programmer the ability to access the raw data that Facebook can provide about your connections. Now, just four days later, the first application in R (that I know of) based on the Facebook Graph...

Psychometrics, Qu’est-ce que c’est? Say psychometrics to people and they think IQ tests. Fair enough. I think eRm:

    # Load the eRm package, which provides RM() and the raschdat1 data set
    library(eRm)
    # Rasch model with beta.1 restricted to 0
    data(raschdat1)
    res <- RM(raschdat1, sum0 = FALSE)
    print(res)
    summary(res)
    res$W

The joy of fitting your first Rasch model in R is unparalleled. Go on, try it. Hmmm, a list of numbers. No idea what they mean? OK. So you take an IQ...

Ever since R was born (evoked?) geeks have been trying to get it to talk HTML. A list of web interfaces for R is updated on CRAN here. Aims are various. Some seek to replace R with a traditional GUI. Others are more ambitious and open up a glimpse of an architecture that provides live analysis of ever...

So you want to run R in the cloud so you can set your Gibbs sampling off, forget about it, and not be paranoid about power cuts and reboots. Andrew Gelman hosted a good debate on the pros and cons of R in the cloud on his blog. The consensus seems to be RStudio and EC2. P.S. If...

A not unusual part of a response on the R-sig-finance mailing list is: “Search the list archives.” In principle that makes sense. In practice it might not be clear what to do. Now it should be. The list: The R-sig-finance mailing list deals with the intersection of questions about the R language and finance. It …

With bonus R code. It came as a shock to learn from PubMed that almost 900 papers were published with the word "microarray" in their titles last year alone, just 12 shy of the 2010 count. More alarming, many of these papers were not of the innocuous "Microarray study of gene expression in dog scrotal tissue" variety, but dry...