sab-R-metrics: Kernel Density Smoothing

May 25, 2011
By

When I last left you, I had gone over some basics of doing loess regression in R. If you remember, loess is a sort of regression that allows wiggliness in your regression of some dependent variable Y on some independent variable X (I will generalize t...

Read more »

Sweave and pgfSweave in LyX 2.0.x (experimental)

May 25, 2011
By

Please ignore this post completely, because Sweave support has become mature in LyX since 2.0.2, and I no longer plan to add the pgfSweave module in LyX. For pgfSweave users, you may consider the new knitr module (available since 2.0.3) which uses the ...

Read more »

Getting Started with Some Baseball Data

May 24, 2011
By

With all of the discussions (hype?) regarding applied statistics, machine learning, and data science, I have been looking for a go-to source of data unrelated to my day-to-day work. I loved baseball as a kid. I love baseball now. I love baseball stats....

Read more »

Utility Spread and Financial Turbulence Part 2 with Utility Slope

May 24, 2011
By

THIS IS NOT INVESTMENT ADVICE.  YOU ARE RESPONSIBLE FOR YOUR OWN GAINS AND LOSSES. I did not intend for this to be a two-part series but I just could not be complacent with Utility Spread and Financial Turbulence (for avid readers, there was a sm...

Read more »

A simple Big Data analysis using the RevoScaleR package in Revolution R

May 24, 2011
By

This post from Stephen Weller is part of a series from members of the Revolution Analytics Engineering team. Learn more about the RevoScaleR package, available free to academics as part of Revolution R Enterprise — ed. The RevoScaleR package, installed with Revolution R Enterprise, offers parallel external memory algorithms that help R break through memory and performance limitations. RevoScaleR...

Read more »

R, JAGS and ggplot2

May 24, 2011
By

Last week a question was asked on the ggplot2 list about using ggplot2 and JAGS in R. Here is my answer (slightly updated), using as an example the school dataset from the R2WinBUGS package. Then you can use the mcmcplots package, which gives a “feel” of ggplot2. If you really want to use

Read more »

Terrain Classification Experiment 2: GRASS, R, and the raster package

May 24, 2011
By

Quick post on terrain classification, based on some trouble folks were having with a previous example on Windows. With the spgrass6 package, raster stacks are created by loading several GRASS files at once: x <- readRAST6(vname=c('beam_sum_m...

Read more »

KDnuggets: R used in 1 in 4 analytics projects

May 24, 2011
By

The most recent KDnuggets poll asked, "Which data mining/analytic tools you used in the past 12 months for a real project". Amongst all commercial and open-source tools, open-source R was the second most-frequently cited, by 23.3% or about 1 in 4 respondents. (The most frequent was the open-source data mining tool, RapidMiner.) See the full poll results at the...

Read more »

Participate in the 2011 Rexer Data Mining Survey

May 23, 2011
By

Last year's Rexer Data Mining Survey reported that R is used by more data miners than any other tool. If you're using R for data mining or data analysis generally, be counted at the 2011 Data Miner Survey (use access code: RL3X2), which closes in early June. Here's some background on the survey: The survey is conducted annually by...

Read more »

News about speeding R up

May 23, 2011
By

The most visited post ever on the ‘Og was In{s}a(ne), my report on Radford Neal’s experiments with speeding up R by using different brackets (the second most popular was Ross Ihaka’s comments, “simply start over and build something better”). I just spotted two new entries by Radford on his blog that are bound to rekindle the

Read more »

Quantifying gravitational lensing by dark matter

May 23, 2011
By

The latest prediction competition at Kaggle is literally "out of this world": the goal is to quantify the shape of 2-D images of galaxies from a simulated telescope, to test models for how invisible dark matter in the Universe distorts the images through gravitational lensing (as shown in the image below; see the FAQ for more details). If you're...

Read more »

Utility Spread and Financial Turbulence

May 23, 2011
By

THIS IS NOT INVESTMENT ADVICE.  YOU ARE RESPONSIBLE FOR YOUR OWN GAINS AND LOSSES. In Long XLU Short SPY Part 2 (More History), I explored the defensive nature of the spread and its potential as a bond substitute in troublesome periods for stocks...

Read more »

May 2011 Guerrilla Classes: Light Bulb Moments

May 23, 2011
By

It's impossible to know what will constitute a light bulb moment for someone else. In the recent Guerrilla classes (GBoot and GCaP), we seemed to be having many more than our usual quota of such moments. So much so, that I decided to keep a list. The first was mine. Asa H. was having trouble...

Read more »

New York, September 8/9, 2011 – Portfolio Selection and Optimization in Practice

May 23, 2011
By

(This article was first published on Rmetrics blogs, and kindly contributed to R-bloggers)

Read more »

Summarize Data by Several Variables

May 23, 2011
By

Here's an example of how to conveniently summarize data with the cast function (package reshape). Along the way, you see how this could be done "inconveniently" by hand. You also see how a for-loop works and how a matrix is constructed and fill...

Read more »

Preparing RTextTools Beta Release for Catania 2011

Right now our development team is busy preparing a conference release of RTextTools for The 4th Annual Conference of the Comparative Policy Agendas Project at the University of Catania in Sicily. One of the key issues we've had thus far is memory consumption with very large datasets. In the past week we've pushed out a slew of

Read more »

Specific differences between Ledoit-Wolf and factor models

May 22, 2011
By

What can we learn about the difference in structure between a Ledoit-Wolf variance matrix and a corresponding factor model variance? Previously: we’ve generated a set of random portfolios with constraints on the risk fractions of a Ledoit-Wolf variance matrix, and a corresponding set of random portfolios with risk fraction constraints from a statistical factor model. …

Read more »

Legends in ggplot2

May 22, 2011
By

A simple plot takes a few lines of coding:
g1 <- ggplot(d, aes(birth.year))
g2 <- g1 + geom_line(aes(y=alive0, linetype="Female")) + geom_line(aes(y=alive1, linetype="Male")) + scale_linetype_discrete(name = "")
g3 <- g2 + geom_point(aes(y=...

Read more »

Music file graphs with R

May 22, 2011
By

Today we will use R to extract some interesting summary statistics regarding the music files stored in the computer. For all mp3 files I keep certain metadata in their ID3 tag. We will use this information to explore the distribution of music files with respect to the year of release. All the following are done

Read more »

Comparing Student outcomes with Research Output (using R and ggplot2's text labels)

May 22, 2011
By

In this post, I take a look at some league table data recently published by the Guardian. I also provide …

Read more »

Terry’s spiel

May 22, 2011
By

“We don’t need likelihood functions; we just need to know how to simulate from (…) We don’t need models with sufficient statistics; we just need summary statistics (…) We don’t need to be Bayesian; we just need to be approximately so. We don’t need theory to tell us our method works; we just need

Read more »

Slowing down matrix multiplication in R

May 21, 2011
By

After I realized that some aspects of R’s implementation are rather inefficient, one of the first things I looked at was matrix multiplication.  There I found a huge performance penalty for many matrix multiplies, a penalty which remains in the current version, 2.13.0.  As discussed below, eliminating this penalty speeds up long vector dot products

Read more »

The distribution of interestingness

On April 22, David Landy posed a question about the distribution of interestingness values in response to my April 3rd post on “Interestingness Measures.” He noted that the survey paper by Hilderman and Hamilton that I cited there makes the following comment: “Our belief is that a useful measure of interestingness should generate index values that are reasonably distributed throughout...

Read more »
