August 2012

Getting started with Sweave & knitr

August 6, 2012 | Max Gordon

(Image: Cool woven artwork on the campus of Kansas University. The image is CC by Patrick Emerson.) I recently started working with Sweave (by Friedrich Leisch) and found it a truly awesome package. The ease of use is amazing. In this post I'll ... [Read more...]
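The following is a minimal sketch of what Sweave does. It is not taken from the post. A `.Rnw` file mixes LaTeX with R code chunks delimited by `<<>>=` and `@`, plus inline `\Sexpr{}` expressions. `Sweave()`, from R's built-in utils package, replaces these with the code and its results in a `.tex` file. The file name `minimal.Rnw` and the helper `weave_in_tempdir()` are illustrative only. knitr can process the same `.Rnw` syntax with `knitr::knit()`, although some chunk options differ.

```r
# A minimal Sweave document: LaTeX text, one inline \Sexpr{} and one R chunk
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of 1 to 10 is \\Sexpr{mean(1:10)}.",
  "<<summary, echo=TRUE>>=",
  "x <- 1:10",
  "summary(x)",
  "@",
  "\\end{document}"
)

# Hypothetical helper: write the .Rnw file to a temporary directory,
# run Sweave() there and return the path of the generated .tex file
weave_in_tempdir <- function(lines) {
  old_wd <- setwd(tempdir())
  on.exit(setwd(old_wd))
  writeLines(lines, "minimal.Rnw")
  Sweave("minimal.Rnw")
  normalizePath("minimal.tex")
}

tex_file <- weave_in_tempdir(rnw_lines)
# The .tex output now contains the chunk code, its printed output and 5.5
```

Running `pdflatex` on the resulting `.tex` file produces the final document.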

Crawford-Howell (1998) t-test for case-control comparisons

August 6, 2012 | Dan Mirman

Cognitive neuropsychologists (like me) often need to compare a single case to a small control group, but the standard two-sample t-test does not work for this because the case is only one observation. Several different approaches have been proposed and in a new paper just published in Cortex, Crawford and ... [Read more...]
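For reference, the Crawford and Howell (1998) modified t-test treats the control group as a sample rather than a population. The statistic is t = (x* − x̄) / (s · √((n + 1)/n)) with n − 1 degrees of freedom, where x* is the case's score and x̄, s and n are the control mean, SD and size. Below is a minimal R sketch of that formula. The function name `crawford_howell()` and the example scores are hypothetical.

```r
# Crawford & Howell (1998) modified t-test: one case vs. a small control sample
crawford_howell <- function(case, controls) {
  n <- length(controls)
  # The sqrt((n + 1) / n) term accounts for the control mean being an estimate
  t_stat <- (case - mean(controls)) / (sd(controls) * sqrt((n + 1) / n))
  df <- n - 1
  p_one <- pt(-abs(t_stat), df)  # one-tailed p-value
  list(t = t_stat, df = df, p_one_tailed = p_one, p_two_tailed = 2 * p_one)
}

# Hypothetical scores: five controls and one patient
res <- crawford_howell(case = 8, controls = c(10, 12, 11, 13, 14))
res$t   # -4 / sqrt(3), about -2.31
res$df  # 4
```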

It is good to be explicit

August 6, 2012 | Frans Slothouber

Being careful not to repeat the year 1901 mistake, I set the TZ variable before I run R. I have the same set of data that I convert as follows: dates values date1 date2 and then plot with plot(date1, values) and plot(date2, values). To my surprise I end up with the ... [Read more...]
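The excerpt does not show the author's conversion code. The following is therefore only a generic sketch of one way to be explicit: pass `tz` to `as.POSIXct()` instead of relying on whatever the TZ environment variable happens to be. The variable names are illustrative, and it assumes the system has the standard time-zone database.

```r
# The same clock time parsed in two explicitly named time zones
stamp <- "2012-08-06 12:00:00"
t_utc <- as.POSIXct(stamp, tz = "UTC")
t_ny  <- as.POSIXct(stamp, tz = "America/New_York")

# Noon in New York (EDT, UTC-4 in August) is 4 hours after noon UTC
offset_hours <- as.numeric(difftime(t_ny, t_utc, units = "hours"))

# Display the New York instant in UTC, again naming the zone explicitly
format(t_ny, "%H:%M", tz = "UTC")

# What an as.POSIXct() call without tz would fall back on
Sys.getenv("TZ")
```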

Visualize a random forest that classifies digits

August 5, 2012 | David

My last post uses random forest proximity to visualize a set of diamond shapes (the random forest is trained to distinguish diamonds from non-diamonds). This time I looked at the digits data set that Kaggle is using as the basis of a competition for "ge... [Read more...]

Early August flotsam

August 5, 2012 | Luis

Back teaching a couple of subjects, and it’s the constant challenge to find enough common ground with students so one can push/pull them to the other side of new concepts. We are not talking about complex hierarchical models using mixed … [Read more...]

Provincial Map using GADM

August 5, 2012 | arsalvacion

This post demonstrates how to produce a political/provincial boundary map using the R maptools and raster packages.
## Load required packages
library(maptools)
library(raster)
## Download data from gadm.org
adm [Read more...]

The R-Podcast Episode 9: Adventures in Data Munging Part 1

August 5, 2012 | Eric

It’s great to be back with a new episode after an eventful break! This episode begins a series on my adventures in data munging, a.k.a. data processing. I discuss three issues that demonstrate the flexibility and versatility R brings for recoding messy values, importing inconsistent data files, ... [Read more...]

Animation basics for a vacation

August 5, 2012 | Bogumił Kamiński

Since I am on vacation this time, I decided to implement some entertaining graphics. I chose to animate a Cassini oval. The task can be accomplished using the curve's polar equation. The implementation of the animation is given by the following code: library... [Read more...]
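The equation itself did not survive in this excerpt. For reference, a Cassini oval with foci at (−a, 0) and (a, 0) is the set of points whose distances to the two foci multiply to b². For b ≥ a, its standard polar form is r² = a² cos 2θ + √(b⁴ − a⁴ sin² 2θ). The sketch below traces the curve from that equation and animates it by redrawing frames as b grows. It uses hypothetical helpers `cassini_oval()` and `animate_cassini()` and is not the author's code.

```r
# Points on a Cassini oval from its polar equation (single loop, b >= a)
cassini_oval <- function(a, b, n = 400) {
  stopifnot(b >= a)
  theta <- seq(0, 2 * pi, length.out = n)
  r2 <- a^2 * cos(2 * theta) + sqrt(b^4 - a^4 * sin(2 * theta)^2)
  r <- sqrt(r2)
  data.frame(x = r * cos(theta), y = r * sin(theta))
}

# Simple frame-by-frame animation: redraw the oval for increasing b
animate_cassini <- function(a = 1, b_values = seq(1, 2, by = 0.05), pause = 0.1) {
  for (b in b_values) {
    pts <- cassini_oval(a, b)
    plot(pts$x, pts$y, type = "l", asp = 1,
         xlim = c(-2.5, 2.5), ylim = c(-2, 2), xlab = "x", ylab = "y",
         main = sprintf("a = %.2f, b = %.2f", a, b))
    points(c(-a, a), c(0, 0), pch = 19)  # mark the two foci
    Sys.sleep(pause)
  }
}
```

At b = a the curve is a lemniscate of Bernoulli. As b grows, the waist fills in and the oval becomes convex.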

And Now I Blog Again

August 4, 2012 | John Ramey

One of my goals for 2012 has been to blog more. Much more. When I first set this goal, I had great aspirations of posting frequently. However, I had a Ph.D. to complete, and quite frankly, it demanded much higher priority. Now that I have submitted my ... [Read more...]

Getting Started Using R, Part 1: RStudio

August 4, 2012 | Randy Zwitch

Despite my preference for SAS over R, there are some add-ons to “basic” R that I’ve found have made my learning process way easier. While I’m still in my infancy in learning R, I feel like once I found … [Read more...]

Discriminating Between Iris Species

August 4, 2012 | dgrapov

The Iris data set is famous for its use in comparing unsupervised classifiers. The goal is to use information about flower characteristics to accurately classify the three species of Iris. We can look at scatter plots of the four variables in the data set and see that no single variable nor ... [Read more...]
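As a point of comparison, here is one common supervised baseline. It is not necessarily the method used in the post. Linear discriminant analysis via `MASS::lda()` combines all four measurements into discriminant axes that separate the species better than any single variable. MASS ships with R as a recommended package. The accuracy computed here is on the training data, so it is optimistic.

```r
library(MASS)  # provides lda()

# Fit LDA on all four flower measurements
fit <- lda(Species ~ ., data = iris)
pred <- predict(fit, iris)

# Confusion matrix and resubstitution (training-set) accuracy
table(observed = iris$Species, predicted = pred$class)
accuracy <- mean(pred$class == iris$Species)

# Scatter plot on the first two discriminant axes, coloured by species
plot(pred$x[, 1], pred$x[, 2], col = as.integer(iris$Species), pch = 19,
     xlab = "LD1", ylab = "LD2")
```

For a less optimistic estimate, `lda(..., CV = TRUE)` returns leave-one-out predictions instead.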

Transformation of axes in R

August 4, 2012 | Travis Hinkelman

As a general rule, you should not transform your data to try to fit a linear model. But proportions can be tricky. If the proportion data do not arise from a binomial process (e.g., proportion of a leaf consumed by a caterpillar), then transformation is still the best option. ... [Read more...]
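A minimal sketch of the idea in the title follows. It assumes a logit transform, using base R's `qlogis()` and its inverse `plogis()`, and made-up proportions. The model is fitted on the transformed scale, but the axis is labelled in the original proportion units. Proportions of exactly 0 or 1 map to −Inf/Inf and would need adjusting first. The arcsine square-root transform is a common alternative.

```r
# Hypothetical non-binomial proportions (e.g. leaf area consumed) over time
prop <- c(0.05, 0.12, 0.30, 0.55, 0.80, 0.93)
x <- 1:6

# Fit a linear model on the logit scale
fit <- lm(qlogis(prop) ~ x)

# Plot on the logit scale, but put proportion values on the y axis
ticks <- c(0.05, 0.25, 0.5, 0.75, 0.95)
plot(x, qlogis(prop), yaxt = "n", xlab = "Time", ylab = "Proportion consumed")
axis(2, at = qlogis(ticks), labels = ticks)
abline(fit)
```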

Surveys continue to rank R #1 for Data Mining

August 3, 2012 | David Smith

KDnuggets recently posted its annual poll on data mining software, and the R language retains its #1 ranking as the most commonly-used software for data mining: R is now used by 52.5% of poll respondents, compared with 45% last year. Donnie Berkholz provides an analysis of the year-on-year trends for Redmonk. He provides ... [Read more...]

Horizon Plots in Base Graphics

August 3, 2012 | klr

For background, please see the prior posts More on Horizon Charts, Application of Horizon Plots, Horizon Plot Already Available, and Cubism Horizon Charts in R. There are three primary graphics routes in R (base graphics, lattice, and ggplot2), and each has... [Read more...]
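The post's own code is not in this excerpt. The following is a minimal base-graphics sketch of the horizon-plot idea: negative values are folded up and drawn in a second colour, the magnitude is sliced into bands of equal height, and the bands are overlaid with darker colours for higher bands. The function names `horizon_bands()` and `horizon_plot()` and the colour choices are hypothetical.

```r
# Split a series into nbands stacked slices of equal height, separately
# for positive values and for (folded) negative values
horizon_bands <- function(y, nbands = 3) {
  w <- max(abs(y)) / nbands  # height of one band
  slice <- function(v) vapply(seq_len(nbands),
                              function(i) pmin(pmax(v - (i - 1) * w, 0), w),
                              numeric(length(v)))
  list(width = w, pos = slice(y), neg = slice(-y))
}

# Draw all bands in one strip of height w using polygon()
horizon_plot <- function(x, y, nbands = 3,
                         pos_col = colorRampPalette(c("lightblue", "navy"))(nbands),
                         neg_col = colorRampPalette(c("mistyrose", "darkred"))(nbands)) {
  b <- horizon_bands(y, nbands)
  plot(range(x), c(0, b$width), type = "n", xlab = "", ylab = "", yaxt = "n")
  xx <- c(x[1], x, x[length(x)])  # close each polygon at the baseline
  for (i in seq_len(nbands)) {
    polygon(xx, c(0, b$pos[, i], 0), col = pos_col[i], border = NA)
    polygon(xx, c(0, b$neg[, i], 0), col = neg_col[i], border = NA)
  }
}

# Usage: horizon_plot(1:100, sin(seq(0, 6 * pi, length.out = 100)) * 1:100)
```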

2012 Olympics Swimming – 100m Butterfly Men Finals prediction

August 3, 2012 | Actuarially (Matt Malin)

Inspired by mages’ blog with predictions for 100m running times, I’ve decided to perform some basic modelling (loess and linear modelling) on previous Olympic results for the men’s 100m butterfly medal-winning times.
Code setup
library(XML)
library(ggplot2)

# Olympic results for the men's 100m butterfly (this URL may no longer resolve)
swimming_path <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=SWI&enum=200"

swimming_data <- readHTMLTable(
  readLines(swimming_path),
  which = 3,
  stringsAsFactors = FALSE)

# due to some potential errors in passing header = TRUE,
# promote the first row to column names instead:
names(swimming_data) <- unlist(swimming_data[1, ])
swimming_data <- swimming_data[-1, ]

swimming_data[["Result"]] <- as.numeric(swimming_data[["Result"]])
swimming_data[["Year"]]   <- as.numeric(swimming_data[["Year"]])
swimming_data             <- na.omit(swimming_data)

# Loess fit of time on year for one medal; surface = "direct"
# allows prediction outside the range of the observed years
loess_prediction <- function(
  medal_type = "GOLD",
  prediction_year = 2012)
{
  medal_type <- toupper(medal_type)

  swimming_loess <- loess(
    Result ~ Year,
    subset(swimming_data, Medal == medal_type),
    control = loess.control(surface = "direct"))

  swimming_prediction <- predict(
    swimming_loess,
    data.frame(Year = prediction_year),
    se = FALSE)

  return(swimming_prediction)
}

# Linear model on log(time), back-transformed with exp()
log_lm_prediction <- function(
  medal_type = "GOLD",
  prediction_year = 2012)
{
  medal_type <- toupper(medal_type)
  swimming_log_lm <- lm(
    log(Result) ~ Year,
    subset(swimming_data, Medal == medal_type))

  swimming_prediction <- exp(predict(
    swimming_log_lm,
    data.frame(Year = prediction_year),
    se.fit = FALSE))

  return(swimming_prediction)
}

# Append the 2012 loess predictions, flagged by the type column
swimming_data <- rbind(
  data.frame(
    swimming_data[c("Year", "Medal", "Result")],
    type = "actual"),
  data.frame(
    Year = rep(2012, 3),
    Medal = c("GOLD", "SILVER", "BRONZE"),
    Result = c(
      loess_prediction("gold"),
      loess_prediction("silver"),
      loess_prediction("bronze")),
    type = rep("loess_prediction", 3)))

medal_colours <- c(
  GOLD   = rgb(201, 137, 16, maxColorValue = 255),
  SILVER = rgb(168, 168, 168, maxColorValue = 255),
  BRONZE = rgb(150, 90, 56, maxColorValue = 255))

swimming_plot <- ggplot(
  swimming_data,
  aes(
    x = Year,
    y = Result,
    colour = Medal,
    group = Medal)) +
  scale_x_continuous(limits = c(1968, 2012)) +
  geom_point() +
  # Smooth only the actual results (== inside subset(), not =)
  stat_smooth(
    aes(fill = Medal),
    alpha = 0.25,
    data = subset(swimming_data, type == "actual"),
    fullrange = FALSE,
    method = "loess")

swimming_plot <- swimming_plot +
  scale_fill_manual(values = medal_colours) +
  scale_colour_manual(values = medal_colours) +
  theme_bw()
Predictions ... [Read more...]
