Row-wise summary curves in faceted ggplot2 figures

December 29, 2012

I really enjoy reading the Junk Charts blog. A recent post made me wonder how easy it would be to add summary curves for small-multiple type plots, assuming the “small multiples” to summarize were the X component of a ggplot2::facet_grid(Y ~ X) …

RcppExamples 0.1.5 and RcppClassicExamples 0.1.0

December 29, 2012

The recent releases of Rcpp 0.10.2 and RcppClassic 0.9.3 had one more repercussion. On that dreaded OS, the linker no longer wanted to instantiate a symbol present in both packages; it seems to me that the linkers on the other two OSs are a little smarter...

High-Dimensional Microarray Data Sets in R for Machine Learning

December 29, 2012

Much of my research in machine learning is aimed at small-sample, high-dimensional bioinformatics data sets. For instance, here is a paper of mine on the topic. A large number of papers proposing new machine-learning methods that target high-dimensional data use the same two data sets and consider few others. These data sets are the 1) Alon colon cancer...

Speed skating 10 km

December 29, 2012

It is winter, which makes it time for one of the Netherlands' beloved sports: speed skating. Speed skating is done over various distances, but for me, the most beautiful is the 10 km. The top men do this in about 13 minutes. In this post I try to u...

Men who stare at needles

December 29, 2012

Buffon's needle problem is a question first posed in the 18th century by Georges-Louis Leclerc, Comte de Buffon: What is the probability that a needle thrown at a lined sheet of paper will cross a line? This problem can be used to estimate π. If we set the needle length and the line distance to 1, the estimator can be calculated...
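
As a rough illustration of the kind of estimator the teaser alludes to (a sketch under stated assumptions, not the post's own code): when the needle length and the line spacing are both 1, the classical result is that a needle crosses a line with probability 2/π, so π can be approximated by 2 × throws / crossings. The function name `estimate_pi_buffon` and its seed parameter are hypothetical, and the sketch uses plain C++ rather than R.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>

// Hypothetical helper: Monte Carlo estimate of pi via Buffon's needle,
// assuming needle length = line spacing = 1.
double estimate_pi_buffon(std::size_t n_throws, unsigned int seed) {
    // Note: sampling the angle uses pi itself (via acos). This is the usual
    // circularity of the simulation, so treat it as a demonstration only.
    const double half_pi = std::acos(0.0);

    std::mt19937 gen(seed);
    // Distance from the needle's centre to the nearest line: uniform on [0, 0.5).
    std::uniform_real_distribution<double> centre_dist(0.0, 0.5);
    // Acute angle between the needle and the lines: uniform on [0, pi/2).
    std::uniform_real_distribution<double> angle_dist(0.0, half_pi);

    std::size_t crossings = 0;
    for (std::size_t i = 0; i < n_throws; ++i) {
        const double d = centre_dist(gen);
        const double theta = angle_dist(gen);
        // The needle crosses a line when its projected half-length reaches it.
        if (d <= 0.5 * std::sin(theta)) {
            ++crossings;
        }
    }
    if (crossings == 0) {
        // No crossings observed: the estimator is undefined.
        return std::numeric_limits<double>::quiet_NaN();
    }
    // P(cross) = 2 / pi, hence pi is approximately 2 * throws / crossings.
    return 2.0 * static_cast<double>(n_throws) / static_cast<double>(crossings);
}
```

With a million throws the estimate typically lands within a few thousandths of π; the error shrinks only with the square root of the number of throws, which is why this is a slow way to compute π.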

STL transform + remove_copy for subsetting

December 29, 2012

We have seen the use of the STL transform functions in the posts STL transform and Transforming a matrix. We use the same logic in conjunction with a logical (i.e. boolean) vector in order to subset an initial vector. #include <Rcpp.h> using namespace...
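
The teaser cuts off before the code, so here is one way the idea could be sketched in plain standard C++ (no Rcpp types, and not necessarily how the post does it): a binary `std::transform` replaces every element whose mask entry is false with a sentinel, and `std::remove_copy` then copies everything except the sentinel. The names `kFlag`, `flag_value` and `subset_by_mask` are hypothetical, and the sketch assumes the sentinel never occurs in the real data.

```cpp
#include <algorithm>
#include <iterator>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical sentinel marking elements to drop; assumes no real value equals it.
const double kFlag = std::numeric_limits<double>::lowest();

// Keep x when keep is true, otherwise return the sentinel.
inline double flag_value(double x, bool keep) { return keep ? x : kFlag; }

// Return the elements of values whose matching mask entry is true.
std::vector<double> subset_by_mask(const std::vector<double>& values,
                                   const std::vector<bool>& mask) {
    if (values.size() != mask.size()) {
        throw std::invalid_argument("values and mask must have the same length");
    }
    std::vector<double> flagged(values.size());
    // Binary transform: pair each value with its mask entry.
    std::transform(values.begin(), values.end(), mask.begin(),
                   flagged.begin(), flag_value);

    std::vector<double> result;
    // Copy every element that is not equal to the sentinel.
    std::remove_copy(flagged.begin(), flagged.end(),
                     std::back_inserter(result), kFlag);
    return result;
}
```

For example, `subset_by_mask({1.0, 2.0, 3.0, 4.0}, {true, false, true, false})` keeps 1 and 3. A NaN sentinel would not work here, because `std::remove_copy` compares with `==` and NaN never compares equal to itself.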

Surprising Performance of data.table in Data Aggregation

December 28, 2012

data.table (http://datatable.r-forge.r-project.org/) inherits from data.frame and provides fast subsetting, fast grouping, and fast joins. Previous posts showed that the shortest CPU time to aggregate a data.frame with 13,444 rows and 14 columns 10 times is 0.236 seconds, using summarize() in the Hmisc package. However, after the conversion from data.frame to

Clustering with selected Principal Components

December 28, 2012

In the Visualizing Principal Components post, I looked at the Principal Components of the companies in the Dow Jones Industrial Average index over 2012. Today, I want to show how we can use Principal Components to create Clusters (i.e. form groups of similar companies based on their distance from each other). Let’s start by loading

Find Duplicate Files Using R

December 28, 2012

This is a simple script to search a directory tree for all files with duplicate content. It …

Label placement with spplot and lattice

December 28, 2012

The package maptools includes new functions to label points and lines. Line labelling: the lineLabel function produces and draws text …

Visualising diurnal wind climatologies

December 28, 2012

In this post I want to highlight the second core function of the metvurst repository: the windContours function. It is intended to provide a compact overview of the wind field climatology at a location and plots wind direction and speed as a functio...

How to Add an Extra Vertical Axis to R Plots

December 28, 2012

Especially when analyzing time series, we often need plots with two vertical axes. Researchers often expect the two series to "move together," but with different locations and scales. To show that the series move together, you should give each series its own scale. One vertical scale should appear on the left side of the plot I encourage you...

STL Transform

December 28, 2012

The STL transform function can be used to apply a single function to every element of a vector. Here we use a simple function, square(). #include <Rcpp.h> using namespace Rcpp; inline double square(double x) { return x*x ; } // [[Rcpp::export]] std::vector<...
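
Since the snippet above is cut off, here is a minimal self-contained sketch of the same idea in plain standard C++ (no Rcpp, so it compiles outside R). The wrapper name `transform_square` is hypothetical; only `square()` comes from the teaser.

```cpp
#include <algorithm>
#include <vector>

// The element-wise function from the teaser.
inline double square(double x) { return x * x; }

// Hypothetical wrapper: apply square() to every element of x with std::transform.
std::vector<double> transform_square(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    // Unary transform: reads from x, writes the squared values into out.
    std::transform(x.begin(), x.end(), out.begin(), square);
    return out;
}
```

Calling `transform_square({1.0, 2.0, 3.0})` gives a vector holding 1, 4 and 9. Pre-sizing `out` lets `std::transform` write straight into it without a `std::back_inserter`.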

UEFA, what were the odds ?

December 27, 2012

Ok, I was supposed to take a break, but Frédéric, professor in Tours, came back to me this morning with a tickling question. He asked me what the odds were that the Champions League draw would produce exactly the same pairings in the practice draw and the official one (see e.g. dailymail.co.uk/…). To be honest, I don’t know much about soccer, so...

ARMA+GARCH Experiences

December 27, 2012

A reader’s comment on my ARMA Models for Trading post asked about different aspects of my experience with ARMA+GARCH for trading forecasting. The more I thought about it, the more it looked like a full post. So here we go. Starting with the high level – what packages did I try? I have tried a

My Intro to Multiple Classification with Random Forests, Conditional Inference Trees, and Linear Discriminant Analysis

December 27, 2012

After the work I did for my last post, I wanted to practice doing multiple classification. I first thought of using the famous iris dataset, but felt that was a little boring. Ideally, I wanted to look for a practice …

Why Do the New Orleans Saints Lose? Data Visualization II

December 26, 2012

I’m going to continue with my ‘making data visually appealing to the masses’ kick. I happen to like graphics and graphing data. I also happen to like American football (For the record, however, I’m a soccer player first, a rugby …

Opening Large CSV Files in R

December 26, 2012

Before heading home for the holidays, I had a large data set (1.6 GB with over 1.25 million rows) with columns of text and integers ripped out of the company (Kwelia) Database and put into a .csv file since I was going to be offline a lot over the break. I tried opening the csv file

Science-y New Year’s Resolution: Learn to Code

December 26, 2012

In a 1995 interview Steve Jobs said he thought that computer programming should be a liberal art. In other words, he thought everyone’s education should include a year of learning a computer language, because it teaches you how to think in a certain way. If that was true in 1995, just think how much more

Oracle R Enterprise 1.3 released

December 26, 2012

We're pleased to announce the latest release of Oracle R Enterprise, now available for download. Oracle R Enterprise 1.3 features new predictive analytics interfaces for in-database model building and scoring, support for in-database sampling and partitioning techniques, and transparent support for Oracle DATE and TIMESTAMP data types to facilitate data preparation for time series analysis and forecasting. Oracle...

Recoding Polytomous Items with Missing Categories

December 26, 2012

This function helps prepare data for analysis with models that require polytomous items to be coded from 0 to N without missing categories, such as the Partial Credit Model (Masters, 1982). When there are no missing categories, an item that was suppos...

Shiny Pubmed Word Clouds

December 26, 2012

Recently, I’ve started working on my website redesign, including the redesign of my research page. As somebody who works with different types of networks on an (almost) daily basis, it would be easy to just create pretty network pictures and use …

Wrapper functions in GNU R

December 26, 2012

Recently I have been working with GNU R optimization routines a lot. The function optim has a nice trace option that lets you monitor optimization progress. Another standard function, optimize, has no such feature, but it is pos...

R Markdown to other document formats

December 26, 2012

Perhaps you have a file written in Markdown with embedded R, of the kind that RStudio makes so nice and easy, but you’d like a range of output formats to keep your collaborators happy: say LaTeX, PDF, HTML and MS Word. Here’s what you might do. I shall imagine your file is called doc.Rmd. Install pandoc

MeRRy ChRistmas!

December 25, 2012

Merry Christmas is.R() readers! Thanks for accompanying us through an excellent first semester of R blogging, and for your feedback and enthusiasm. To celebrate, we’ve built an image mosaic from the shiny, happy avatars of our over 600 (!) Twitter followers. Click for a beautiful mosaic! We’ll be back in 2013 with...

Common words in the Gathering Storm

December 25, 2012

The Wheel of Time is a series of books started by Robert Jordan. Unfortunately he died too early. Like all fans of the series I feel very lucky that Brandon Sanderson was able to continue these books. The first book Sanderson wrote was the Gathering St...

Who Survived on the Titanic? Predictive Classification with Parametric and Non-parametric Models

December 24, 2012

I recently read a really interesting blog post about trying to predict who survived on the Titanic with standard GLM models and two forms of non-parametric classification tree (CART) methodology. The post was featured on R-bloggers, and I think it's worth a closer look. The basic idea was to figure out which of these three

More about Aggregation by Group in R

December 24, 2012

Motivated by my young friend, HongMing Song, I managed to find more handy ways to calculate aggregated statistics by group in R. They require loading additional packages, plyr, doBy, Hmisc, and gdata, and are extremely user-friendly. In terms of CPU time, while the method with summarize() is as efficient as the 2nd method with by()
