Criticism 5 of NHST: p-Values Measure Effort, Not Truth

July 17, 2012

Introduction: In the third installment of my series of criticisms of NHST, I focused on the notion that a p-value is nothing more than a one-dimensional representation of a two-dimensional space in which (1) the measured size of an effect and (2) the precision of this measurement have been combined in such a way that
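A minimal sketch of the idea in the excerpt, assuming a simple two-sided z-test (the post itself may use a different test). The helper `p_from_effect` and the numbers below are illustrative assumptions, not taken from the post. It shows how the p-value collapses effect size and precision into one number, so the same measured effect yields a smaller p-value as the sample size grows:

```r
# Hypothetical helper: two-sided p-value for a z-test, given an
# estimated effect and the standard error of that estimate.
p_from_effect <- function(estimate, se) {
  2 * pnorm(-abs(estimate / se))
}

# Hold the measured effect fixed and vary only the precision.
effect <- 0.1                      # assumed effect, in standard deviation units
n <- c(25, 100, 400, 1600)         # assumed sample sizes
se <- 1 / sqrt(n)                  # standard error of a mean with unit variance
p_values <- p_from_effect(effect, se)

print(data.frame(n = n, se = se, p = p_values))
```

Only the standard error changes between rows, yet the p-value moves from clearly non-significant to far below 0.05.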

Read more »

Optical Art with R

July 16, 2012

Last week, in a post entitled Bridget Riley exhibition in London, the author Markus Gesmann wrote an R script reproducing one of Riley's famous art pieces: Movement in Squares. This reminded me of my own first "brush" with Op art. It was in art class ye...

Read more »

Factor Attribution to improve performance of the 1-Month Reversal Strategy

July 16, 2012

Today I want to show how to use Factor Attribution to boost the performance of the 1-Month Reversal Strategy. The paper Short-Term Residual Reversal by D. Blitz, J. Huij, S. Lansdorp, and M. Verbeek (2011) presents the idea and discusses the results as applied to the US stock market since 1929. To improve 1-Month Reversal Strategy performance, the authors

Read more »

Data mining for network security and intrusion detection

July 16, 2012

In preparation for the “Haxogreen” hackers' summer camp, which takes place in Luxembourg, I was exploring the world of network security. My motivation was to find out how data mining is applicable to network security and intrusion detection. The Flame virus, Stuxnet, and Duqu proved that static, signature-based security systems are not able to detect very advanced, government-sponsored

Read more »

Convenient access to Gapminder’s datasets from R

July 16, 2012

In April, Hans Rosling examined the influence of religion on fertility. I used R to replicate a graphic of his talk:

> library(datamart)
> gm <- gapminder()
> #queries(gm)
> #
> # babies per woman
> tmp <- query(gm, "TotalFertilityRate")
> babies <- as.vector(tmp)
> names(babies) <- names(tmp)
> babies <- babies
> countries <- names(babies)
> #
> # income per capita, PPP adjusted
> tmp <- query(gm, "IncomePerCapita")
> ...

Read more »

Using integer programming in R to optimize cargo loads

July 16, 2012

Linear programming is a mathematical technique for finding the values of a set of variables, within the bounds of some defined constraints, that maximize a quantity. For example, consider this problem from the FishyOperations blog: A trading company is looking for a way to maximize profit per transportation of their goods. The company has a train...

Read more »

Example 9.38: dynamite plots, revisited

July 16, 2012

“Dynamite plot” is a somewhat pejorative term for a graphical display where the height of a bar indicates the mean, and the vertical line on top of it represents the standard deviation (or standard error). These displays are commonly found in many scientific disciplines, as a way of communicating group differences in means. Many...
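For readers who have not seen one, the display described above can be sketched in base R. This is an illustration only, not the code from the original example: it assumes the built-in PlantGrowth dataset and uses one standard deviation for the vertical line.

```r
# Group means and standard deviations from the built-in PlantGrowth data
# (an assumed stand-in dataset, not the one used in the post).
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
sds   <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(means, ylim = c(0, max(means + sds) * 1.1),
                xlab = "Group", ylab = "Mean weight")

# Draw the "fuse": a capped vertical line from the mean up to mean + 1 SD.
arrows(mids, means, mids, means + sds, angle = 90, length = 0.1)
```

The plot shows only two numbers per group, which is the usual complaint about this kind of display: the individual observations are hidden.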

Read more »

Holt-Winters forecast using ggplot2

July 16, 2012

R has great support for Holt-Winters filtering and forecasting. I sometimes use this functionality, HoltWinters & predict.HoltWinters, to forecast demand figures based on historical data. Using the HoltWinters functions in R is pretty straightforward. Let's say our dataset looks as follows:

demand <- ts(BJsales, start = c(2000, 1), frequency = 12)
plot(demand)

Now I pass the time series object to HoltWinters and...
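The excerpt cuts off before the fit itself. As a hedged sketch of how the truncated step might continue, using only base R's `HoltWinters()` and `predict()` from the stats package: the 12-month horizon and the 95% level are assumptions, and the original post plots with ggplot2 rather than the base `plot()` used here.

```r
# Same monthly series as in the excerpt, built from the BJsales dataset.
demand <- ts(BJsales, start = c(2000, 1), frequency = 12)

# Fit a Holt-Winters model with the default (additive seasonal) settings.
hw <- HoltWinters(demand)

# Forecast 12 months ahead with a 95% prediction interval (assumed values).
forecast <- predict(hw, n.ahead = 12, prediction.interval = TRUE, level = 0.95)

# Base-graphics view of the fitted values together with the forecast.
plot(hw, predicted.values = forecast)
```

With `prediction.interval = TRUE`, `predict()` returns a time series matrix with the columns `fit`, `upr` and `lwr`, which is the shape a ggplot2 version would need to reshape into a data frame.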

Read more »

Best Books for Social Scientists on Bayesian Analysis

July 16, 2012

I list and discuss the three books on Bayesian analysis that I recommend to social scientists.

Read more »

2 dimensions of portfolio diversity

July 16, 2012

Portfolio diversity is a balancing act. Previously: the post “Portfolio diversity” talked about the role of the correlation between assets and the portfolio. The current post fills a hole in that post. The 2 dimensions: asset-portfolio correlation. Each asset in the universe has a correlation with the portfolio. If there are any assets that have …

Read more »

Project Euler — problem 14

July 16, 2012

It’s Monday today! It’s a work day! And I’ve already worked on the computer for two hours. Time for a break, which is the 14th problem of Project Euler. The following iterative sequence is defined for the set of positive integers: n → n/2 (n …
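The excerpt is truncated, but the sequence in question is the well-known Collatz rule: halve n when it is even, otherwise replace it with 3n + 1. A brute-force sketch in R follows. The function names and the `limit` parameter are hypothetical and this is not necessarily the approach taken in the post.

```r
# Number of terms in the Collatz chain that starts at n and ends at 1.
collatz_length <- function(n) {
  steps <- 1
  while (n != 1) {
    n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
    steps <- steps + 1
  }
  steps
}

# Starting number below `limit` that produces the longest chain.
# Plain brute force: fine for small limits, slow for very large ones.
longest_start <- function(limit) {
  lengths <- vapply(seq_len(limit - 1), collatz_length, numeric(1))
  which.max(lengths)
}

collatz_length(13)  # 13, 40, 20, 10, 5, 16, 8, 4, 2, 1 -> 10 terms
```

Caching chain lengths that have already been computed is the usual way to speed this up for large limits.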

Read more »

Map biodiversity records with rgbif and dismo packages in R

July 15, 2012

In the earlier post we generated maps from GBIF biodiversity records using the maps and ggplot2 packages. We used a world map with country borders for that. Now we will generate maps with Google Maps as the base layer using the dismo package. As before, we download data for Danaus chrysippus from GBIF using the occurrencelist function into a data

Read more »

A simple Approximate Bayesian Computation MCMC (ABC-MCMC) in R

July 15, 2012

Approximate Bayesian Computation and similar techniques, which are based on calculating approximate likelihood values based on samples from a stochastic simulation model, have attracted a lot of attention in recent years, owing to their promise to provide a general statistical technique for stochastic processes of any complexity, without the limitations that apply to “traditional”…

Read more »

Sourcing an R Script from Dropbox

July 14, 2012

While working on my R bootcamp materials, I thought it would be handy to get the bootcamp computers set up by sourcing an R script that installs all necessary non-core packages. The problem? How to deploy this script efficiently. A quick method w...
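The setup script itself is not shown in the excerpt. A minimal sketch of what such a script could contain, assuming the goal is simply to install whichever packages are missing: the helper names and the example package list are hypothetical, and the Dropbox deployment step is left out because the excerpt does not give its details.

```r
# Hypothetical helper: which of the requested packages are not installed yet?
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only what is missing.
# Set install = FALSE for a dry run that just reports the missing packages.
install_if_missing <- function(pkgs, install = TRUE) {
  todo <- missing_packages(pkgs)
  if (install && length(todo) > 0) {
    install.packages(todo)
  }
  invisible(todo)
}

# Dry run with an assumed package list; nothing is downloaded here.
print(install_if_missing(c("stats", "utils"), install = FALSE))
```

Checking first means the script can be sourced repeatedly on the same machine without reinstalling anything.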

Read more »

Trends in AL run scoring (using R)

July 14, 2012

I have started to explore the functionality of R, the statistical and graphics programming language. And what better data to play with than that of Major League Baseball? There have already been some good examples of using R to analyze baseball data. The most comprehensive is the on-going series at The Prince of Slides (Brian Mills, aka...

Read more »

Visualization of a Twitter retweet network: art or useful data visualization?

July 14, 2012

This is a Twitter retweet network. When people tweet, they may get retweeted by other people, repeating the message for their followers to view. Each retweet is a one-way flow of information that links the first person to each person who retweeted them (forwarded the original tweet into their own network). So, in this visualization

Read more »

Expected Shortfall Portfolio Optimization in R using nloptr

July 14, 2012

I have previously done examples of QP optimization for financial portfolios. I am not a huge fan of variance optimization in finance. Return distributions are not normal, are often skewed, and are usually leptokurtic. In plain spea...

Read more »

Using R for classification in small-N studies

July 14, 2012

Rick Davies just wrote an interesting post which combined thoughts on QCA (and multi-valued QCA or mvQCA) and classification trees with thoughts on INUS causation and classification trees. The question was something like: how can we look at a small-to-medium set of cases (like a dozen or a hundred countries or development programs) and tease

Read more »

Linear programming in R: an lpSolveAPI example

July 14, 2012

First of all, a shout out to R-bloggers for adding my feed to their website! Linear programming is a valuable instrument when it comes to decision making. This post shows how R, in conjunction with the lpSolveAPI package, can be used to build a linear programming model and to analyse its results. The lpSolveAPI package provides a complete implementation of the lp_solve...

Read more »

Smartphone operating system share mosaic plot

July 13, 2012

(This article was first published on Actuarially (Matt Malin), and kindly contributed to R-bloggers.) Author: Matt Malin. The increasing dominance of smartphones across the market is a very common topic in technology and news sites, with analysis of operating system share and phone types often shown in the media. Stumbling across this article...

Read more »

Processing Public Data with R

July 13, 2012

I use R aplenty in analysis and thought it might be worthwhile for some to see the typical process a relative newcomer goes through in extracting and analyzing public datasets. In this instance I happen to be looking at Canadian air pollution statistics. The data I am interested in is available on the Ontario Ministry

Read more »

Applications of R at Google

July 13, 2012

At a talk I saw at the useR!2012 conference last month, Googler Karl Millar estimated that there are at least 200 active R users at Google, plus another 300+ occasional users participating in Google's internal R support list. But what are all these Google employees doing with R? A post from the Google Research team published on Google+ yesterday...

Read more »

influence.ME updated to version 0.9

July 13, 2012

influence.ME is an extension package for R that provides tools for detecting influential data in multilevel regression models. It is developed by Rense Nieuwenhuis (that’s me), Manfred te Grotenhuis, and Ben Pelzer. Recently, a new version (0.9) was uploaded ...

Read more »

Analysing time course microarray data using Bioconductor: a case study using yeast2 Affymetrix arrays

July 13, 2012

A few years ago I was involved in analysing some time-course microarray data. Our biological collaborators were interested in how we analysed their data, so this led to the creation of a tutorial, which in turn led to a paper. When we submitted the paper, one of the referees “suggested” that we write the paper using Sweave;

Read more »