Download and Parse DJ/UBS Commodities Indexes

March 16, 2012

Here is another data downloading and parsing script, this one for the Dow Jones/UBS Commodities Indexes. Compared to the last post, this parser deals with multiple sheets and multiple columns in each sheet. It also constructs monthly series from the daily data and stores them under a different symbol. Finally, it’s a good example of...
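As a rough illustration of the daily-to-monthly step mentioned above, here is a minimal base R sketch. It is not the script from the post: the data frame `daily`, its columns and its values are made up, and "monthly" is assumed to mean the last daily observation in each calendar month.

```r
# Hypothetical daily series (made-up values, not DJ/UBS data).
dates <- seq(as.Date("2012-01-01"), as.Date("2012-03-16"), by = "day")
daily <- data.frame(date = dates, value = seq_along(dates))

# Label every row with its year-month, then keep the last row of each month.
month_key <- format(daily$date, "%Y-%m")
monthly <- daily[!duplicated(month_key, fromLast = TRUE), ]

monthly
```

Taking the last observation is only one convention; a monthly mean could be computed instead with `aggregate(value ~ month_key, data = daily, FUN = mean)`.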


Do more with dates and times in R with lubridate 1.1.0

March 16, 2012

This is a guest post by Garrett Grolemund (mentored by Hadley Wickham). Lubridate is an R package that makes it easier to work with dates and times. The newest release of lubridate (v 1.1.0) comes with even more tools and …


digest 0.5.2

March 15, 2012

A new version of the digest package (which generates hash function summaries for arbitrary (and possibly nested) R objects using any of the standard md5, sha-1, sha-256 or crc32 algorithms) is now on CRAN. Murray Stokely noticed a corner case where...


A No BS Guide to the Basics of Parallelization in R

March 15, 2012

What is parallelization? Parallelization is using multiple processing cores to, hopefully, make your programs run faster than serial code, which uses just one processing core. Parallel code is not always faster than its serial counterpart (but if you're doing it right and you're careful about what you parallelize, it will be --- remember, that's your goal here). ...
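To make the serial-versus-parallel distinction concrete, here is a small sketch using the `parallel` package that ships with R. It is not taken from the post: the function `slow_square`, the input vector and the choice of two workers are all assumptions made for illustration.

```r
library(parallel)

# Hypothetical stand-in for an expensive computation.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

inputs <- 1:8

# Serial version: one core works through the inputs in order.
serial_result <- sapply(inputs, slow_square)

# Parallel version: the same work split across two worker processes.
cl <- makeCluster(2)  # 2 workers is an arbitrary choice
parallel_result <- parSapply(cl, inputs, slow_square)
stopCluster(cl)

# Both approaches should give the same answer.
identical(serial_result, parallel_result)
```

For a task this small, the cost of starting the workers will likely outweigh any gain, which is one reason parallel code is not always faster.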


p curves revisited

March 15, 2012

I finally found some time to take a closer look at p curves. I haven't had a chance to follow up on my simulations (and probably won't for a few weeks if not months), but I have had time to think through the ideas the p curve approach raises based on some of the comments I've received and a brief exchange...


R gotcha for the week

March 15, 2012

I use the biomaRt package from Bioconductor in almost every R session. So I thought I’d load the library and set up a mart instance in my ~/.Rprofile: On starting R, I was somewhat perplexed to see this error message: Twitter to the rescue. @hadleywickham told me to load utils first and @vsbuffalo explained that


Europe most dangerous cities

March 15, 2012

When I was searching for data about the U.S. prison population for another post, I ran across Eurostat, a nice source of data to play around with. I pooled some numbers, specifically homicides recorded by the police. A panel data for …


SAP integrates R with HANA

March 15, 2012

We can add SAP to the list of vendors offering R integration with their products. InformationWeek reports that the new SAP BusinessObjects Predictive Analysis module provides a graphical user interface to R. Created in reaction to "competitive and market forces, including the momentum of open source R", the new module provides in-database processing (presumably by embedding R within HANA...


Liking of apples – some data to link

March 15, 2012

I browsed through a paper by Peneau et al. (J. Sensory Studies, 2007) where they have nice data on apples; consumer evaluation, sensory evaluation and instrumental measurements. I think these are interesting data to examine if these variable blocks can...


The Anachronism Machine: The Language of Downton Abbey

March 15, 2012

I've recently become hooked on the TV series Downton Abbey. I'm not usually one for costume dramas, but the mix of fine acting, the intriguing social relationships, and the larger WW1-era story make for compelling viewing. (Also: Maggie Smith is a treasure.) Despite the widespread critical acclaim, Downton has met with criticism for some period-inappropriate uses of language. For...


Opinions Not Backed by Money Updated Again

March 15, 2012

Strange that I am updating this post for a third time and nothing really has changed, but the fact that nothing has changed is incredibly interesting to me.  Since it is an update, I will not duplicate the explanation, so please read the last vers...


A Graphical Explanation of how to Interpret a Dendrogram

March 15, 2012

Dendrograms are a convenient way of depicting pair-wise dissimilarity between objects, commonly associated with the topic of cluster analysis. This is a complex subject that is best left to experts and textbooks, so I won't even attempt to cover it her...
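As a hedged illustration of how a dendrogram encodes pairwise dissimilarity, the base R sketch below clusters four made-up objects. The object names, values and the average-linkage method are assumptions, not details from the post.

```r
# Four hypothetical objects measured on a single variable (made-up values).
x <- matrix(c(0, 0.1, 5, 5.2), ncol = 1,
            dimnames = list(c("a", "b", "c", "d"), NULL))

d  <- dist(x)                        # pairwise Euclidean dissimilarities
hc <- hclust(d, method = "average")  # agglomerative hierarchical clustering

# Height at which each pair of objects is first joined in the tree:
# similar objects join low down, dissimilar ones near the top.
coph <- as.matrix(cophenetic(hc))

# Cutting the tree into two branches recovers the two obvious groups.
groups <- cutree(hc, k = 2)
groups
```

Calling `plot(hc)` draws the tree; the vertical position of each join corresponds to the heights stored in `coph`.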


New R User Group in Montreal

March 15, 2012

The Montreal R User Group is now official. You can join the group by visiting the meetup site. The group has existed since 2010 in a narrower incarnation as the BGSA R/Stats Workshop Series. Previous workshops have featured invited facilitators on topics such as Causal Analysis, GLMs, GAMs, Multi-model inference, Phylogenetic analysis, Bayesian modeling, Meta-analysis,


Project Euler: Problem 16

March 15, 2012

2^15 = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26. What is the sum of the digits of the number 2^1000? Handling large numbers or rather, very large numbers, can be a pain at times. But have no fear, for GMP is here. GMP makes the s...
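The post relies on GMP for the big-number arithmetic. As an alternative that needs no extra package, the sketch below keeps the decimal digits in a vector and doubles the number by hand; the function name `digit_sum_pow2` is hypothetical and this is not the post's solution.

```r
# Sum of the decimal digits of 2^n, using base R only.
digit_sum_pow2 <- function(n) {
  digits <- 1L  # digits of 2^0, least significant digit first
  for (i in seq_len(n)) {
    doubled <- digits * 2L
    carry   <- doubled %/% 10L   # 0 or 1 for each position
    digits  <- doubled %% 10L
    # Move each carry one position up; no second carry can occur because
    # a doubled digit modulo 10 is at most 8.
    digits <- c(digits, 0L) + c(0L, carry)
    # Drop the extra leading position if it was not needed.
    if (digits[length(digits)] == 0L) digits <- digits[-length(digits)]
  }
  sum(digits)
}

digit_sum_pow2(15)    # matches the worked example: 26
digit_sum_pow2(1000)
```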


Call for chapters: Data Mining Applications with R

March 15, 2012

Data Mining Applications with R: a book to be published by Elsevier (http://www.RDataMining.com/books/book2). Proposal submission deadline: April 30, 2012. Introduction: R is one of the most widely used data mining tools in scientific and business applications, among dozens of commercial …


Ideas on A Really Fast Statistics Journal

March 15, 2012

I was writing comments on the blog post A proposal for a really fast statistics journal, and I realized the comment box was too small to write down my ideas. I like the proposal a lot, and I feel really bad about the current model of submitting and rev...


R and Hadoop: Step-by-step tutorials

March 14, 2012

At the recent Big Data Workshop held by the Boston Predictive Analytics group, airline analyst and R user Jeffrey Breen gave a step-by-step guide to setting up an R and Hadoop infrastructure. Firstly, as a local virtual instance of Hadoop with R, using VMWare and Cloudera's Hadoop Demo VM. (This is a great way to get familiar with Hadoop.)...


AQP / soilDB Demo: Dueling Dendrograms

March 14, 2012

Previously, soil profile comparison methods from the aqp package only took into account horizon-level attributes. As of last week the profile_compare() function can now accommodate horizon and site-level attributes. In other words, it is now possible t...


Plotting individual growth charts

March 14, 2012

This R code draws individual growth plots as shown in “Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence” by Judith D. Singer and John B. Willett, an excellent book on multilevel modeling and survival analysis. This code recreates figure …


R code for p curves

March 14, 2012

I have finally got around to posting the R code for my p curve simulation. Those familiar with R will realize how crude it is (I've been caught up with other urgent stuff and had no time to explore further). You are welcome to play with (and improve!) t...


Portfolio Optimization: Specify constraints with GNU MathProg language

March 14, 2012

I have previously described a few examples of portfolio construction: Introduction to Asset Allocation; Maximum Loss and Mean-Absolute Deviation risk measures; 130/30 Portfolio Construction; Minimum Investment and Number of Assets Portfolio Cardinality Constraints; Multiple Factor Model – Building 130/30 Index (Update). I created a number of helper functions to simplify the process of making the constraints (...


Scales and transformations in ggplot2 0.9.0

March 14, 2012

Some R code designed for ggplot2 0.8.9 is not compatible with ggplot2 0.9.0, and today the ggplot2 web site has outdated documentation which gives this broken example: Dennis Murphy points to the ggplot2 0.9.0 transition guide from where I derived …


NIT: Fatty acids study in R – Part 007

March 14, 2012

Once we have chosen the model, we can continue acquiring spectra of new samples. The spectra are exported to a txt or csv file and imported into R to be reprocessed. We use the function “predict” from the PLS package. I have done this with 20 new sample...


Creating a Stratified Random Sample of a Dataframe

March 14, 2012

Expanding on a question on Stack Overflow I'll show how to make a stratified random sample of a certain size: d <- expand.grid(id = 1:35000, stratum = letters); p = 0.1; dsample <- data.frame(); system.time(for(i in levels(d$stratum)) { dsub <...
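Since the excerpt cuts off before the loop body, here is a separate, minimal base R sketch of stratified sampling. It does not reconstruct the post's loop: it uses `split()` and `lapply()` instead, and a much smaller made-up data frame, with the sampling fraction `p` applied within each stratum.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Smaller hypothetical data set: 100 ids in each of 3 strata.
d <- expand.grid(id = 1:100, stratum = letters[1:3])
p <- 0.1     # fraction to sample within each stratum

# Group the row numbers by stratum, then sample a fraction p from each group.
rows_by_stratum <- split(seq_len(nrow(d)), d$stratum)
picked <- unlist(lapply(rows_by_stratum, function(rows) {
  rows[sample.int(length(rows), size = round(p * length(rows)))]
}))

dsample <- d[picked, ]
table(dsample$stratum)  # number of sampled rows per stratum
```

Sampling positions with `sample.int()` rather than calling `sample()` on the row numbers avoids the surprise that `sample(x)` treats a single number `x` as `1:x`.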


Video Tip: Convert Gene IDs with Biomart

March 14, 2012

I get asked frequently how to convert from one gene identifier to another. This can be tricky, especially when relying on gene symbols, as Will pointed out in a previous post a few years ago. There are several tools that can do this, including DAVID an...


March Madness! Wanna Win?

March 14, 2012

Description: Winning percentage of all NCAA Men's Basketball Tournament Champions. Analysis: Down by one, the ball spins in his hand as he dribbles up the floor. With tennis shoes squeaking, he feints left, then right. Glancing up at the clock,...


π Day Special! Estimating π using Monte Carlo

March 14, 2012

In honour of π day (03.14 – can’t wait until 2015~), I thought I’d share this little script I wrote a while back for an introductory lesson I gave on using Monte Carlo methods for integration. The concept is simple – we can estimate the area of an object which is inside another object...
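The idea of estimating one area inside another can be sketched in a few lines of base R. This is not the author's script: the sample size, the seed and the choice of a quarter circle inside the unit square are assumptions made for illustration.

```r
set.seed(42)   # fixed seed so the run is reproducible
n <- 100000    # number of random points (arbitrary choice)

# Scatter points uniformly over the unit square.
x <- runif(n)
y <- runif(n)

# A point lies inside the quarter circle of radius 1 if x^2 + y^2 <= 1.
inside <- x^2 + y^2 <= 1

# The fraction of points inside approximates the quarter circle's area,
# pi / 4, so multiplying by 4 gives an estimate of pi.
pi_hat <- 4 * mean(inside)
pi_hat
```

The estimate is noisy: its standard error shrinks only with the square root of `n`, so each extra digit of accuracy costs roughly a hundred times more points.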


Visualising F1 Telemetry Data and Plotting Latitude and Longitude with ggplot Map Projections in R

March 14, 2012

Why don’t X-Y plots of latitude and longitude data look “right” compared to traditional map views? For example, here’s an X-Y scatterplot of some of Jenson Button’s McLaren telemetry data from the 2010 Australian Formula One Grand Prix: The image was generated, from a data file hosted on Google Spreadsheets, using the following R script,


A ridiculous proof of concept: xyz interpolation

March 14, 2012

This is really the last one on this theme for a while... I had alluded to a combination of methods regarding xyz interpolation at the end of my last post and wanted to demonstrate this in a final example. The ridiculousness that you see above involved two interpolation steps. First,...
