Large correlation in parallel

February 24, 2013

A small improvement to the bigcor function proposed on Rmazing for computing huge correlation matrices in R: I made the function work in parallel, using all the CPU cores available on the machine. The code is here. Here is a benchmark of the 2 func...

The Wisdom of Crowds – Clustering Using Evidence Accumulation Clustering (EAC)

February 24, 2013

Today’s blog post is about a problem familiar to most people who run clustering algorithms on datasets without true labels (unsupervised learning). The challenge is the “freedom of choice” among a broad range of clustering algorithms, together with the question of how to determine the right parameter values. The difficulty is the following: Every clustering algorithm and even...

Earthquakes in Netherlands

February 24, 2013

In the Netherlands we have natural gas. Unfortunately, extracting this gas seems to cause some quakes. As quakes go, they are not strong. However, our buildings are not made to resist quakes, which were unheard of before 1986, so there is some damage. It is now predicted that they could get stronger and more frequent. This caused a bit of a...

Simplify your R workflow with functions #rstats

February 24, 2013

Update: Thanks to Bernd, I could improve how the function imports the data, so here’s the updated script! In R, you may often have scripts or code snippets that will be reused. In such cases, you can …

Multi-species dynamic occupancy model with R and JAGS

February 24, 2013

This post is intended to provide a simple example of how to construct and make inferences on a multi-species multi-year occupancy model using R, JAGS, and the ‘rjags’ package. This is not intended to be a standalone tutorial on dynamic community occupancy modeling. Useful primary literature references include MacKenzie et al. (2002), Kery and Royle (2007), Royle and Kery...

Copying Data from Excel to R and Back

February 23, 2013

Often we are given a data set in Excel format and want to run a quick analysis using R's functionality to look at advanced statistics or make better visualizations. There are packages for importing/exporting data from/to Excel, but I have found that they are hard to work with or only work with old versions of...

Pareto plot with ggplot2

February 23, 2013

A Pareto chart, named after Vilfredo Pareto, is a type of chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line (quoted from Wikipedia). ...
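As a rough illustration of the definition above, here is a minimal base R sketch of the data behind such a chart: the values sorted in descending order (the bars) and their running total as a percentage (the line). The defect counts are made up for the example, and the plotting step with ggplot2 that the post covers is not shown.

```r
# Hypothetical defect counts; the category names and numbers are invented.
counts <- c(scratch = 12, dent = 45, crack = 7, stain = 30, other = 6)

# Bars: individual values in descending order.
sorted <- sort(counts, decreasing = TRUE)

# Line: cumulative total, expressed as a percentage of the grand total.
pareto <- data.frame(
  category = names(sorted),
  value    = as.vector(sorted),
  cum_pct  = 100 * cumsum(as.vector(sorted)) / sum(sorted)
)

print(pareto)
```

The resulting data frame has one row per bar, so it can be handed to whichever plotting system you prefer.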

Two papers about RcppEigen and RcppArmadillo published

February 23, 2013

Two papers got published recently. The first one is Bates and Eddelbuettel (2013). It is titled Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package, and provides a pretty thorough introduction to our RcppEigen package which uses Rcpp to provide access to the Eigen C++ template library from GNU R. The paper is out as Volume 50, Issue 5 at the (all...

The Financial Crisis on Tape Part I

February 23, 2013

Hello and welcome to Joe's Data Diner's first ever post! Today, I will touch on both R and finance, but I'll try to make it accessible for those with an interest in either, and not just quants like myself! Almost everyone is now aware that asset correlati...

Getting Help with R Programming: Useful Survival Skills

Useful Resources to Learn about R on the Internet: When I program in R and struggle with something, the first thing that I usually turn to is Google. I search for the relevant function or the desired outcome, and I often find the solutions within the first few hits. They likely show up in the documentation,

Simulating Population Growth in Cities Using R

February 23, 2013

R is great for anyone who wants to get started on learning simulation (both discrete-event and agent-based, with stochastic elements in the process). This post is inspired by Matt Asher's "quick-and-dirty" R simulation work on population growth. Matt u...

from OTU table to HEATMAP!

February 23, 2013

In this tutorial you will learn: what a heatmap is; how to create a clean, highly customizable heatmap using heatmap.2 in the gplots package in R; how to remove samples with poor output (not very many sequences); how to rearrange your samples by a metadata category; and how to make a color-coded bar above the heatmap.

Free e-book on Data Science with R

February 22, 2013

A new book by Jeffrey Stanton from the Syracuse University School of Information Studies, An Introduction to Data Science, is now available for free download. The book, developed for Syracuse's Certificate for Data Science, is available under a Creative Commons License as a PDF (20 MB) or as an interactive eBook from iTunes. The book begins with the following clear definition...

Shiny 0.4.0 now available

February 22, 2013

Shiny version 0.4.0 is now available on CRAN. The most visible change is that the API has been slightly simplified. Your existing code will continue to work, although Shiny will print messages about how to migrate your code. Migration should be straightforward, as described below. It will take a bit of work to switch to

Video: IBM Opinionated Infrastructure Hangout

February 22, 2013

I had a great time earlier this week on a Google Hangout as part of the IBM Opinionated Infrastructure series. Moderator James Governor (analyst from RedMonk) kept the conversation lively, with topics ranging from the value of information to the benefits of predictive analytics and the evolution of Hadoop. R gets a mention at several points in the conversation, which...

Migrating from SPSS to R #rstats

February 22, 2013

Preface: Every now and then I will post about my experience with R, a software package for statistical analyses. I try to show some solutions for common types of analyses or problems you face when you start working with R. These …

Don’t use correlation to track prediction performance

February 22, 2013

Using correlation to track model performance is “a mistake that nobody would ever make” combined with a vague “what would be wrong if I did do that” feeling. I hope that after reading this you feel at least a small urge to double-check your work and presentations to make sure you have not reported correlation where...

What’s my daughter listening to? HTML chart gen in R

February 22, 2013

My daughter, who turns 10 in April, has discovered pop music. She’s been listening to Virgin Radio 99.9, one of our local stations. Virgin provides an online playlist that goes back four days, so I scraped the data and brought it into R. The chart shown at top shows all of the songs played

bigcor: Large correlation matrices in R

February 22, 2013

As I am working with large gene expression matrices (microarray data) in my job, it is sometimes important to look at the correlation in gene expression of different genes. It has been shown that by calculating the Pearson correlation between genes, one can identify (by high values, i.e. > 0.9) genes that share a common
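To make the idea concrete, here is a small sketch of flagging highly correlated gene pairs with base R's cor(). The expression matrix is simulated, the gene names are hypothetical, and the matrix is tiny; this is the plain in-memory approach, not the bigcor function from the post, which targets matrices too large for it.

```r
set.seed(42)  # arbitrary seed, only so the simulated data is reproducible

# Simulated expression values: rows are samples, columns are (made-up) genes.
n_samples <- 50
base <- rnorm(n_samples)
expr <- cbind(
  geneA = base,
  geneB = base + rnorm(n_samples, sd = 0.1),  # built to track geneA closely
  geneC = rnorm(n_samples),
  geneD = rnorm(n_samples)
)

# Pearson correlation between every pair of columns (genes).
cor_mat <- cor(expr, method = "pearson")

# Keep each pair once (upper triangle) and apply the > 0.9 threshold.
high <- which(cor_mat > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
pairs <- data.frame(
  gene1 = rownames(cor_mat)[high[, "row"]],
  gene2 = colnames(cor_mat)[high[, "col"]],
  r     = cor_mat[high]
)

print(pairs)
```

With thousands of genes the full correlation matrix quickly outgrows memory, which is presumably the motivation for a block-wise function such as bigcor.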

Why does IFELSE logic work differently on what appear to be the same values?

February 22, 2013

Embarrassingly, I'm stumped on this... I have a program in R for looking at grade distributions in my class. I found something weird recently with my 'ifelse' processing. I noticed that my program seemed to be overcounting Cs and undercounting...

Does native R usage exist?

February 22, 2013

Note to R users: Users of other languages enjoy spending lots of time discussing the minutiae of the language they use, something R users don’t appear to do (perhaps you spend your minutiae time on statistics, which I don’t yet know well enough to spot when it occurs). There follows a minutiae post that may

knitr: Changing chunk options like fig.height programmatically, mid-chunk

February 22, 2013

Knitr is a great tool for doing reproducible research. You can produce all kinds of output inside a single knitr chunk, e.g. you can write a loop to produce lots of figures or tables. The only catch comes when you want your figures to have differing captions, heights, etc. (and usually you do). The standard

R in the news: Interviews with Revolution Analytics execs

February 22, 2013

Here are three recent news articles that feature interviews with members of the Revolution Analytics team talking about the importance of the R language: In Forbes, CEO Dave Rich talks to Gil Press about the business landscape for Big Data. In the article, Dave says: SAS and SPSS remind me of Cobol and Fortran circa 1995. The scientific and...

Simulated Power/Precision Analysis

February 21, 2013

I cringe when I see research proposals that describe a sophisticated statistical approach, yet do not evaluate this approach in their power/precision/sample size planning. It's often the case that a simplified version of the proposed statistical approach is used instead. Presumably, this is due to the limited availability of power/precision/sample size planning software for sophisticated

±∞

February 21, 2013

The Cauchy distribution (?dcauchy in R) nails a flashlight over the number line and swings it at a constant speed from 9 o’clock down to 6 o’clock over to 3 o’clock. (Or the other direction, from 3→6→9.) Then counts...
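The flashlight picture can be checked with a short simulation. This is a sketch under stated assumptions, since the excerpt is cut off: the flashlight is taken to hang one unit above the number line, the beam angle (measured from straight down) is drawn uniformly between the two horizontal positions, and what gets recorded is the point where the beam hits the line. Under those assumptions the hit points follow a standard Cauchy distribution.

```r
set.seed(1)  # arbitrary seed for reproducibility

n <- 1e5

# Beam angle from straight down (6 o'clock), uniform because of the constant
# swing speed: -pi/2 is 9 o'clock, +pi/2 is 3 o'clock.
theta <- runif(n, min = -pi / 2, max = pi / 2)

# With the flashlight one unit above the line, the beam lands at tan(theta).
hits <- tan(theta)

# Compare sample quartiles with those of the standard Cauchy distribution.
probs <- c(0.25, 0.5, 0.75)
empirical   <- quantile(hits, probs, names = FALSE)
theoretical <- qcauchy(probs)

print(rbind(empirical, theoretical))
```

Quantiles are compared rather than means because the Cauchy distribution has no finite mean, so a sample average would not settle down however large n is.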

Removing white space around R figures

February 21, 2013

When I want to insert figures generated in R into a LaTeX document, it looks better if I first remove the white space around the figure. Unfortunately, R does not make this easy as the graphs are generated to look good on a screen, not in a document. There are two things that can be done to fix this...

Le Monde puzzle [#809]

February 21, 2013

Another number theory puzzle, completed on the plane to Hamburg: Integers n are called noble if they can be decomposed as a sum n=a+b+… of distinct integers such that 1/a+1/b+…=1. They are called bourgeois if they are not noble but can be decomposed as a sum n=a+b+… of integers, some of them identical, such that

Additional Plots on French Breakpoints as Valuation

February 21, 2013

I feel like there might be some merit in “Slightly Different Measure of Valuation”, which uses Ken French’s Market (ME) to Book (BE) breakpoints by percentile to offer an additional valuation metric for US stocks. I thought some additional plots might he...

Elevation Profiles in R

February 21, 2013

First, let's load up our data. The data are available in a gist. You can convert your own GPS data to .csv by following the instructions here, using gpsbabel.

gps <- read.csv("callan.csv", header = TRUE)

Next, we can use the function SMA fr...
