R and Hadoop: Step-by-step tutorials

March 14, 2012

At the recent Big Data Workshop held by the Boston Predictive Analytics group, airline analyst and R user Jeffrey Breen gave a step-by-step guide to setting up an R and Hadoop infrastructure. The first step sets it up as a local virtual instance of Hadoop with R, using VMware and Cloudera's Hadoop Demo VM. (This is a great way to get familiar with Hadoop.)...

Read more »

AQP / soilDB Demo: Dueling Dendrograms

March 14, 2012

Previously, soil profile comparison methods from the aqp package took into account only horizon-level attributes. As of last week, the profile_compare() function can accommodate both horizon- and site-level attributes. In other words, it is now possible t...

Read more »

Plotting individual growth charts

March 14, 2012

This R code draws individual growth plots as shown in “Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence” by Judith D. Singer and John B. Willett, an excellent book on multilevel modeling and survival analysis. This code recreates figure …

Read more »

R code for p curves

March 14, 2012

I have finally got around to posting the R code for my p curve simulation. Those familiar with R will realize how crude it is (I've been caught up with other urgent stuff and have had no time to explore further). You are welcome to play with (and improve!) t...

Read more »

Portfolio Optimization: Specify constraints with GNU MathProg language

March 14, 2012

I have previously described a few examples of portfolio construction: Introduction to Asset Allocation; Maximum Loss and Mean-Absolute Deviation risk measures; 130/30 Portfolio Construction; Minimum Investment and Number of Assets Portfolio Cardinality Constraints; Multiple Factor Model – Building 130/30 Index (Update). I created a number of helper functions to simplify the process of making the constraints (

Read more »

Scales and transformations in ggplot2 0.9.0

March 14, 2012

Some R code designed for ggplot2 0.8.9 is not compatible with ggplot2 0.9.0, and today the ggplot2 web site has outdated documentation which gives this broken example: Dennis Murphy points to the ggplot2 0.9.0 transition guide from where I derived …

Read more »

NIT: Fatty acids study in R – Part 007

March 14, 2012

Once we have chosen the model, we can continue acquiring spectra of new samples. The spectra are exported to a txt or csv file, which we import into R for reprocessing. We use the function “predict” from the pls package. I have done this with 20 new sample...

Read more »

Creating a Stratified Random Sample of a Dataframe

March 14, 2012

Expanding on a question on Stack Overflow, I'll show how to make a stratified random sample of a certain size:
d <- expand.grid(id = 1:35000, stratum = letters)
p = 0.1
dsample <- data.frame()
system.time(for(i in levels(d$stratum)) { dsub <...
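The loop in the excerpt is cut off, so here is a minimal sketch of one way to draw the same kind of stratified sample without growing a data frame inside a loop. It reuses the excerpt's `d`, `p` and `dsample` names and assumes the goal is a fixed fraction `p` of rows from every stratum. `split()` and `sample.int()` are base R; this is not necessarily the approach the original post settles on.

```r
# Same toy data as the excerpt: 35,000 ids crossed with 26 strata (a to z)
d <- expand.grid(id = 1:35000, stratum = letters)
p <- 0.1  # sampling fraction within each stratum

set.seed(42)  # make the draw reproducible

# Row indices grouped by stratum, then p * n of them drawn without replacement
rows_by_stratum <- split(seq_len(nrow(d)), d$stratum)
idx <- unlist(lapply(rows_by_stratum, function(rows) {
  rows[sample.int(length(rows), size = round(p * length(rows)))]
}), use.names = FALSE)

dsample <- d[idx, ]
table(dsample$stratum)  # rows drawn from each stratum
```

Because every stratum here has 35,000 rows, each contributes 3,500 rows to `dsample`. Using `sample.int()` on positions rather than `sample()` on the row numbers avoids `sample()`'s surprising behaviour when a stratum happens to contain a single row.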

Read more »

Video Tip: Convert Gene IDs with Biomart

March 14, 2012

I get asked frequently how to convert from one gene identifier to another. This can be tricky, especially when relying on gene symbols, as Will pointed out in a previous post a few years ago. There are several tools that can do this, including DAVID an...

Read more »

March Madness! Wanna Win?

March 14, 2012

Description: Winning percentage of all NCAA Men's Basketball Tournament Champions. Analysis: Down by one, the ball spins in his hand as he dribbles up the floor. With tennis shoes squeaking, he feints left, then right. Glancing up at the clock,...

Read more »

π Day Special! Estimating π using Monte Carlo

March 14, 2012

In honour of π day (03.14 – can’t wait until 2015~), I thought I’d share this little script I wrote a while back for an introductory lesson I gave on using Monte Carlo methods for integration. The concept is simple – we can estimate the area of an object which is inside another object
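A minimal sketch of the idea in R (not the author's script): sample points uniformly in the unit square and count how many land inside the quarter circle x² + y² ≤ 1. The quarter circle has area π/4 and the square has area 1, so four times the fraction of hits estimates π. `estimate_pi()` is a hypothetical helper name.

```r
# Monte Carlo estimate of pi from n uniform points in [0, 1] x [0, 1]
estimate_pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  inside <- x^2 + y^2 <= 1  # TRUE when the point falls in the quarter circle
  # area(quarter circle) / area(square) = pi / 4
  4 * mean(inside)
}

set.seed(314)
pi_hat <- estimate_pi(1e6)
pi_hat
```

The standard error shrinks like 1/sqrt(n), so each extra correct decimal digit costs roughly 100 times more points.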

Read more »

Visualising F1 Telemetry Data and Plotting Latitude and Longitude with ggplot Map Projections in R

March 14, 2012

Why don’t X-Y plots of latitude and longitude data look “right” compared to traditional map views? For example, here’s an X-Y scatterplot of some of Jenson Button’s McLaren telemetry data from the 2010 Australian Formula One Grand Prix: The image was generated, from a data file hosted on Google Spreadsheets, using the following R script,
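Away from the equator, a degree of longitude covers less ground than a degree of latitude: roughly cos(latitude) times as much. A raw X-Y plot treats both degrees as equal, which is the usual reason such plots look stretched. The minimal base-R sketch below uses a synthetic 500 m circle at illustrative coordinates, not Button's telemetry. It shows the distortion and a simple equirectangular fix through `plot()`'s `asp` argument. This is a stand-in for the ggplot map projections the post itself uses.

```r
# Illustrative centre point (assumed values, not from the telemetry file)
lat0 <- -37.85
lon0 <- 144.97

# A circle of radius 500 m, sampled every degree, converted to lat/long.
# One degree of latitude is roughly 111 km (approximation); one degree of
# longitude is shorter by a factor of cos(latitude).
theta <- seq(0, 2 * pi, length.out = 361)
r_m <- 500
m_per_deg_lat <- 111320
m_per_deg_lon <- m_per_deg_lat * cos(lat0 * pi / 180)
lat <- lat0 + r_m * sin(theta) / m_per_deg_lat
lon <- lon0 + r_m * cos(theta) / m_per_deg_lon

# In raw degrees the circle spans more longitude than latitude ...
stretch <- diff(range(lon)) / diff(range(lat))  # equals 1 / cos(lat0)

# ... so an unadjusted X-Y plot distorts it. An aspect ratio of
# 1 / cos(lat0) (equirectangular correction) draws it as a circle again.
plot(lon, lat, type = "l", asp = 1 / cos(lat0 * pi / 180),
     xlab = "Longitude", ylab = "Latitude")
```

For a small area such as a race circuit, this single-latitude correction is usually close enough. Larger areas call for a proper map projection.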

Read more »

A ridiculous proof of concept: xyz interpolation

March 14, 2012

[Image: Ridiculous Orb]
This is really the last one on this theme for a while... I had alluded to a combination of methods regarding xyz interpolation at the end of my last post and wanted to demonstrate this in a final example. The ridiculousness that you see above involved two interpolation steps. First,...

Read more »

How to convert contingency tables to data frames with R

March 14, 2012

I wanted to write contingency tables in HTML with hwrite(). I realized that hwrite() has no method for table objects. I could use as.data.frame(), but it returns the table in a non-intuitive long format (one row per cell). I did a search on R-bloggers and quickly found the solution to my problem: the as.data.frame.matrix() function. The contingency table A
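A small illustration of the difference, using the built-in `mtcars` data rather than the post's table. `as.data.frame()` on a table gives one row per cell plus a `Freq` column. `as.data.frame.matrix()` keeps the familiar rows-by-columns layout, which is presumably what the post then hands to `hwrite()` from the hwriter package.

```r
# Two-way contingency table: number of cylinders by number of gears
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

long <- as.data.frame(tab)         # long format: columns cyl, gear, Freq
wide <- as.data.frame.matrix(tab)  # wide format: one row per cyl, one column per gear

long
wide
```

Note that `long` includes empty cells (for example 8 cylinders with 4 gears) as rows with `Freq` equal to 0, while `wide` shows them as zeros in place.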

Read more »

ThinkStats … in R :: Example/Chapter 2 :: Example 2.1-2.3

March 14, 2012

As promised, this post is a bit more graphical, but I feel the need to stress the importance of the first few points in chapter 2 of the book (i.e. the difference between mean and average and why variance is meaningful). These are fundamental concepts for future work. The “pumpkin” example (2.1) gives us an
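A small worked example of the quantities the excerpt stresses, using made-up weights rather than the book's pumpkin data. One detail is worth flagging when following a Python book in R: R's `var()` divides by n − 1, whereas the population variance divides by n. If the book's definition is the population form, the two will differ noticeably for small samples.

```r
weights <- c(2, 2, 3, 5, 8)  # made-up weights in pounds (illustrative only)

n <- length(weights)
mu <- mean(weights)                   # the mean: one particular kind of "average"
med <- median(weights)                # another "average", less sensitive to the 8
pop_var <- sum((weights - mu)^2) / n  # population variance: divide by n
samp_var <- var(weights)              # R's var(): divide by n - 1

# The two variances differ only by the factor (n - 1) / n
stopifnot(isTRUE(all.equal(samp_var * (n - 1) / n, pop_var)))
```

Here the mean is 4 and the median is 3. The population variance is 5.2, while `var()` returns 6.5.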

Read more »

Simple plots reveal interesting artifacts

March 14, 2012

I’ve recently been working with methylation data; specifically, from the Illumina Infinium HumanMethylation450 bead chip. It’s a rather complex array which uses two types of probes to determine the methylation state of DNA at ~485,000 sites in the genome. The Bioconductor project has risen to the challenge with a (somewhat bewildering) variety of

Read more »

More Anthromes !

March 13, 2012

First off, let me thank folks for all the comments and suggestions. I’m just starting to explore this data, so perhaps I should explain how I go about doing this. I am looking for a global bias in the record from UHI (the urban heat island effect). It is well known that you can look through the data

Read more »

Japan Trade More Specifically with Korea

March 13, 2012

The macro analysis of Japanese trade in the posts “Japanese Trade and the Yen” and “Japan Trade by Geographic Region” revealed some very interesting changes. Since the Korean Won is so undervalued versus the Japanese Yen on a Purchasing Power Parity (PPP) ba...

Read more »

Plotting forecast() objects in ggplot part 1: Extracting the Data

March 13, 2012

Lately I've been using Rob J Hyndman's excellent forecast package. The package comes with some built-in plotting functions, but I found I wanted to customize and make my own plots in ggplot. In order to do that, I need a generalizable function that will...

Read more »

SNA with R workshop at Sunbelt XXXII in Redondo Beach

March 13, 2012

I am currently in Redondo Beach, CA at the Sunbelt XXXII social networks conference. The program is packed with interesting talks, so the event promises to be well worth attending. This morning I gave the workshop “Introduction to Social Network Analysis with R”. Over 50 people registered. I am grateful to all the

Read more »

Scatter Plot Matrix in R

March 13, 2012

Stata has extensive graphics capabilities (and I highly recommend Stata over other statistical packages for a variety of reasons), but in a few instances R is more useful. In particular, I find R useful for creating beautiful scatter plot ...

Read more »

Shapley-Shubik Power Index in R

March 13, 2012

This spring we have rector elections at the Warsaw School of Economics. One of my colleagues, Tomasz Szapiro, agreed to run in the elections. This prompted me to write a snippet in R for calculating the Shapley-Shubik Power Index. Rector elections in Warsaw School...
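A brute-force sketch of the Shapley-Shubik index for a weighted voting game. A voter's index is the share of all n! orderings in which that voter is pivotal, meaning the voter whose weight first brings the running total up to the quota. The weights and quota below are hypothetical. `perms()` and `shapley_shubik()` are illustrative helpers, not the post's snippet. Enumerating every permutation is only practical for a handful of voters.

```r
# All permutations of 1:n, one per row (recursive; fine for small n)
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  # Put i first, then the (n - 1)-permutations relabelled to skip i
  do.call(rbind, lapply(seq_len(n), function(i) cbind(i, p + (p >= i))))
}

shapley_shubik <- function(weights, quota) {
  stopifnot(quota <= sum(weights))  # otherwise no coalition ever wins
  P <- perms(length(weights))
  pivots <- integer(length(weights))
  for (r in seq_len(nrow(P))) {
    ord <- P[r, ]
    cum_w <- cumsum(weights[ord])
    pivot <- ord[which(cum_w >= quota)[1]]  # first voter to reach the quota
    pivots[pivot] <- pivots[pivot] + 1L
  }
  setNames(pivots / nrow(P), names(weights))
}

ssi <- shapley_shubik(c(A = 3, B = 2, C = 1), quota = 4)
ssi
```

For weights 3, 2 and 1 with a quota of 4, voter A is pivotal in four of the six orderings, so the indices are 4/6, 1/6 and 1/6. B and C have very different weights yet hold the same power.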

Read more »

Video: Using R in Academic Finance

March 13, 2012

The slides and replay for Dr Sanjiv Das's webinar, Using R for Analyzing Loans, Portfolios and Risk: From Academic Theory to Financial Practice are now available. I've embedded the slides below: they tell a great story of how Das, after being mistaken for the then-CEO of Citibank (with whom he shares a name) was then led to research (using...

Read more »

R-Function to Read Data from Google Docs Spreadsheets

March 13, 2012

I used this idea posted on Stack Overflow to plug together a function for reading data from Google Docs spreadsheets into R. google_ss <- function(gid = NA, key = NA) { if (is.na(gid)) {stop("\nWorksheetnumber (gid) is missing\n")} if (is....

Read more »

Oracle R Distribution and Open Source R

March 13, 2012

Oracle provides the Oracle R Distribution, an Oracle-supported distribution of open source R. Support for Oracle R Distribution is provided to customers of the Oracle Advanced Analytics option and the Oracle Big Data Appliance. The Oracle R Distribu...

Read more »

R code for Chapter 2 of Non-Life Insurance Pricing with GLM

March 13, 2012

We continue working our way through the examples, case studies, and exercises of what is affectionately known here as “the two bears book” (Swedish björn = bear) and more formally as Non-Life Insurance Pricing with Generalized Linear Models by Esbjörn Ohlsson and Björn Johansson (Amazon UK | US).

Read more »

In case you missed it: February Roundup

March 13, 2012

In case you missed them, here are some articles from February of particular interest to R users. February 29 marked the 12th anniversary of the release of R 1.0.0, and the release of R 2.14.2. A list of commercial vendors who have integrated R with their products for data, analysis, and presentation. The rmr package (part of the RHadoop...

Read more »