A Kernel Density Approach to Outlier Detection

March 13, 2011

I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction Suppose you’re searching online for the cheapest place to …

Eigensheep

March 13, 2011

Aaron Koblin’s Sheep Market visualization is an awesome use of Mechanical Turk. But it’d be even more awesome if the grid were ordered, so inspired by the use of eigenfaces in facial recognition, I decided to try projecting the sheep …

Counting Clusters

March 13, 2011

Given a set of numerical datapoints, we often want to know how many clusters the datapoints form. Two practical algorithms for determining the number of clusters are the gap statistic and the prediction strength. Gap Statistic The gap statistic algorithm …

RStudio 0.92.44 Release: Try It! You’ll Be Surprised!

March 13, 2011

I recently downloaded RStudio’s v0.92.44 release, and, I must say, it’s light! I think I could even run it on a netbook, which is great for analysis on-the-go. I’ll likely uninstall Eclipse-StatET at this point and go with RStudio. Not only is it...

Code: LaTeX tables for lme4 models

March 13, 2011

I have recently discovered memisc, an extremely useful R package by Martin Elff (see his memisc page here). The package contains any number of useful functions, and is particularly good at helping one manage and recode survey data. However, by far my …

Using R for Introductory Statistics, The Geometric distribution

March 13, 2011

We've already seen two discrete probability distributions, the binomial and the hypergeometric. The binomial distribution describes the number of successes in a series of independent trials with replacement. The hypergeometric distribution describes the number of successes in a series of trials without replacement. Chapter 6 of Using R introduces the geometric distribution - the time to...

Legendary Plots

March 12, 2011

I was recently pointed in the direction of a thermal comfort model by the engineering company Arup (p27–28 of this pdf). Figure 3 at the top of p28 caught my attention. It’s mostly a nice graph; there’s not too much junk in it. One thing that struck me was that there is an awful lot

A new series of mishaps

March 12, 2011

Following the slight difficulties of last week, I had a hard week on the computer front: indeed, on Monday, I received my 2007 macbook from the repair shop, with a new video card, courtesy of Apple. Unfortunately, this started a series of problems. First, the old macbook stopped recognizing the NVIDIA video and, while it

A quick look at #march11 / #saudi tweets

March 12, 2011

Well, so much for that #march11 #Saudi day of rage. Whether it was really the "tempest in a teacup" that Prince Al-Waleed suggested on CNBC (video below, transcript here) or not, the oil complex and Saudi markets seem to have shrugged …

Ask R not to create a local directory tree

March 12, 2011

I don't like R to create a local directory tree in my home directory because new packages will automatically be installed into that directory. The way to do this is to modify the "/usr/local/lib64/R/etc/Renviron" and mark the line "R_LIBS_USER=${R_LIBS...

sab-R-metrics: Multiple Regression and Interactions

March 12, 2011

Last time, I covered ordinary least squares with a single variable. This time, I'll extend this to using multiple predictor variables in a regression, interacting terms in R, and start thinking about using polynomials of certain terms in the regression (like Age and Age Squared). This should be a pretty straightforward tutorial, especially if you've got...

How to Vectorize Nested Loop in R?

Could any R expert here help me to vectorize my for loop? Thanks in advance for your help. The reason I am in trouble is that the variables inside my "for" loop are updated after each iteration, which makes it difficult for me to use lapply, sapply or whatever. Simplified code is listed below: for (i in 1:N) { #N could be...

Things I wish I’d known before I started using R

March 12, 2011

I’ve been using R for a couple of years now. This post is aimed at me a couple of years ago, or you if you’re just starting to use R and are pressed for time. Here are some things I wish I’d known in early 2009. Use a naming convention read.csv is a great function, but

RcppArmadillo 0.2.16

March 11, 2011

Conrad Sanderson continues an active release schedule for his wonderful Armadillo templated C++ library for linear algebra; release 1.1.8 just came out yesterday. So I made a new release 0.2.16 of RcppArmadillo, our Rcpp-based integration into R. No ...

Survey: R used by more data miners than any other tool

March 11, 2011

According to respondents of the 2010 Rexer Analytics Data Miner Survey, open source R is the most commonly-used analysis tool amongst data miners: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also...

Plotting Indifference Curves with R Contour Function

March 11, 2011

The following post at Constructing Difference Curves - Part 3 from economics.about.com provides a discussion on indifference curves (but actually I think they are isoquants) and how to construct them. I think I have a grasp on how to do this in R if yo...

Script for Geostatistics with R

March 11, 2011

I received requests for the script used during the tutorial. All the material is available in the main page of the course. However, in order to facilitate the availability of the scripts to all the viewers of this blog I've put the link to download them...

ggheat : a ggplot2 style heatmap function

March 11, 2011

I hope the code here is fairly self-explanatory with the inset annotations. I feel this is just a bit 'prettier' than heatmap.2 and has for me the right balance of options and extensibility. I have also found it difficult to produce high quality plots...

Programming Outside the Box: A Recursive Function in R

March 11, 2011

I heard a talk given by Ruby creator Yukihiro Matsumoto where he was rambling about how he came about becoming a programming language developer. He mentioned an important milestone as being his grasp of the idea behind recursion. I've kept that in mind...

Review: R Graphs Cookbook by Hrishi Mittal

March 11, 2011

Summary: Very useful for reference while producing graphs, and very comprehensive (including heat-maps, 3D graphs and maps). Reference: Mittal, H. V., 2011, R Graphs Cookbook, Packt Publishing, Birmingham, UK, 272 pages, Publisher’s Website. As a scientist I often need to plot graphs of my data, so I am keen to learn more about how to

Copula Functions, R, and the Financial Crisis

March 10, 2011

From: In defense of the Gaussian copula, The Economist: "The Gaussian copula provided a convenient way to describe a relationship that held under particular conditions. But it was fed data that reflected a period when housing prices were not correlated to the extent that they turned out to be when the housing bubble popped." Decisions about...

Analyzing big data with Revolution R Enterprise

March 10, 2011

This post from Sherry LaMonica is the first in a series from members of the Revolution Analytics Engineering team — ed. Do you know about the big data capabilities in the RevoScaleR package, included with every Revolution R Enterprise installation? RevoScaleR provides a framework for fast and efficient multi-core processing of large data sets. You can visualize and model...

Comparison of GISS LOTAs During 5 El Nino – La Nina Cycles

March 10, 2011

In this post I compare GISS LOTAs during 5 El Nino – La Nina cycles (2010, 1998, 1983, 1973 and 1970). El Nino – La Nina Cycles In a previous post I showed the Nino 34 SSTA cycles for 2010, …

An easier to use IV regression command in R

March 10, 2011

Update: I have added some functionality to my ivregress() command. Check out my newer post here. After I posted my last video tutorial on how to use my IV regression function, I received a comment asking why I didn't write the command a different way t...
