Twitter analysis of air pollution in Beijing

July 31, 2012
By

One of the air pollution monitors in Beijing (at the American Embassy) is connected to Twitter and tweets about the air quality in real time. By default, the monitor outputs a 24-hour summary of PM2.5 air pollution. PM2.5 is defined here. Next will be to compare the...

Fun with geocoding and mapping in JGR

July 31, 2012
By

For a recent project I had to map some addresses with the Deducer and DeducerSpatial packages in the R GUI JGR, but I didn’t have their lat/lons. After frustrating myself trying to adapt this code from stackoverflow.com, I found a much easier way of geocoding using the dismo and XML packages in R. First…

Text and symbol size in multi-panel figures in R

July 31, 2012
By

In R, there are a couple of packages that allow you to create multi-panel figures (see examples here and here), but, of course, you can also make multi-panel figures in the base package*. Below I provide a simple example for creating a multi-panel figure in the R base package with the focus on making the
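
A minimal base-R sketch of the effect this post is about, assuming only the built-in graphics package: setting `par(mfrow = ...)` silently rescales the base `cex`, to 0.83 for exactly two rows and two columns and to 0.66 once either dimension reaches three, so text and symbols shrink unless you reset `cex` afterwards. The random points and panel titles are placeholders, not the post's own example.

```r
# Draw on a null device so the sketch also runs non-interactively
pdf(NULL)

# Exactly 2 rows x 2 columns: R reduces the base cex to 0.83
op <- par(mfrow = c(2, 2))
cex_2x2 <- par("cex")

# Three or more rows (or columns): the reduction factor becomes 0.66
par(mfrow = c(3, 3))
cex_3x3 <- par("cex")

# Back to a 2x2 layout, then restore full-size text and symbols.
# cex is set in a separate call after mfrow, because setting mfrow
# resets the base cex again.
par(mfrow = c(2, 2))
par(cex = 1)

set.seed(1)
for (i in 1:4) {
  plot(rnorm(20), rnorm(20), pch = 19,
       main = paste("Panel", i), xlab = "x", ylab = "y")
}

par(op)               # restore the original layout
invisible(dev.off())  # close the null device
```

Printing `cex_2x2` and `cex_3x3` shows the two reduction factors documented under `mfrow` in `?par`.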

Edge Prediction in a Social Graph: My Solution to Facebook’s User Recommendation Contest on Kaggle

July 31, 2012
By

A couple weeks ago, Facebook launched a link prediction contest on Kaggle, with the goal of recommending missing edges in a social graph. I love investigating social networks, so I dug around a little, and since I did well enough to score one of the coveted prizes, I’ll share my approach here. (For some background, the contest provided...
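
The excerpt doesn't describe the winning approach itself, so the following is only a generic illustration of the task and not the author's solution: a common-neighbours baseline in base R, which scores every unlinked pair (u, v) by how many nodes both u and v are connected to and recommends the highest-scoring pairs first. It treats the graph as undirected for simplicity, and the edge list and node names are hypothetical.

```r
# Hypothetical undirected edge list (node pairs)
edges <- data.frame(
  from = c("a", "a", "b", "b", "c", "d"),
  to   = c("b", "c", "c", "d", "e", "e"),
  stringsAsFactors = FALSE
)

# Build a symmetric 0/1 adjacency matrix indexed by node name
nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0L, length(nodes), length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$from, edges$to)] <- 1L
adj[cbind(edges$to, edges$from)] <- 1L

# (A %*% A)[u, v] counts the neighbours that u and v have in common
common <- adj %*% adj

# Keep only unlinked pairs, each pair once (upper triangle, no self-loops)
candidates <- which(upper.tri(common) & adj == 0L, arr.ind = TRUE)
scores <- data.frame(
  u = nodes[candidates[, "row"]],
  v = nodes[candidates[, "col"]],
  common_neighbours = common[candidates],
  stringsAsFactors = FALSE
)

# Recommend the highest-scoring missing edges first
scores <- scores[order(-scores$common_neighbours, scores$u, scores$v), ]
rownames(scores) <- NULL
print(scores)
```

Common neighbours is a deliberately simple baseline; for a large sparse graph you would compute the same counts with a sparse matrix rather than a dense one.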

Application of Horizon Plots

July 31, 2012
By

For background, please see the prior posts “Horizon Plot Already Available” and “Cubism Horizon Charts in R”. Good visualization simplifies, and stories are better told with effective and pretty visualizations. Although horizon plots are not immediately intuitive...

Multidimensional Scaling and Company Similarity

July 30, 2012
By

Background and idea: Often we are looking at a particular sector and want to get a quick overview of a group of companies relative to one another. I thought I might apply Multidimensional Scaling (MDS) to various financial ratios and see if it...
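
The excerpt doesn't say which ratios or which MDS variant the post used, so this is a sketch under assumptions: classical (metric) MDS via base R's `cmdscale()` on Euclidean distances between standardised ratios. The company names and ratio values are invented placeholders, not real financial data.

```r
# Hypothetical ratios for five made-up companies (illustrative values only)
ratios <- data.frame(
  pe_ratio   = c(12, 15, 30, 28, 9),
  net_margin = c(0.08, 0.10, 0.25, 0.22, 0.05),
  debt_to_eq = c(1.2, 0.9, 0.3, 0.4, 1.8),
  row.names  = c("CoA", "CoB", "CoC", "CoD", "CoE")
)

# Standardise each ratio so no single unit of measurement dominates
z <- scale(ratios)

# Euclidean distances between companies in standardised-ratio space
d <- dist(z)

# Classical MDS: embed the distance matrix in two dimensions
coords <- cmdscale(d, k = 2)
colnames(coords) <- c("dim1", "dim2")

# Companies that end up close together have similar ratio profiles
print(round(coords, 2))
```

Plotting `coords` and labelling the points with `text(coords, rownames(coords))` gives the usual two-dimensional "similarity map".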

Making R graphics legible in presentation slides

July 30, 2012
By

I only visited a few JSM sessions today, as I’ve been focused on preparing for my own talk tomorrow morning. However, I went to several talks in a row which all had a common problem that made me cringe: graphics …

Yet Another Forecast Dashboard

July 30, 2012
By

Recently, I came across quite a few examples of time series forecasting using R. Here are some examples: “Time series cross-validation 4: forecasting the S&P 500”; “Holt-Winters forecast using ggplot2”; “Autoplot: Graphical Methods with ggplot2”; “Large-Scale Parallel Statistical Forecasting Computations in R” (2011) by M. Stokely, F. Rohani, E. Tassone; Forecasting time series data ARIMA…

Split-plot 2: let’s throw in some spatial effects

July 30, 2012
By

Disappeared for a while collecting frequent flyer points. In the process I ‘discovered’ that I live in the middle of nowhere, as it took me 36 hours to reach my conference destination (Estoril, Portugal) through Christchurch, Sydney, Bangkok, Dubai, Madrid …

Big data, big analytics, big opportunity

July 30, 2012
By

“Data, data, every where, / Nor any byte to think.” The world today is awash with data. Corporations, governments, and individuals are busy generating petabytes of data on culture, economy, environment, religion, and society. While data has become abundant and ubiquitous, data analysts needed to turn raw data into knowledge are in fact in short...

Forecasting the Olympics

July 30, 2012
By

Forecasting sporting events is a growing research area. The International Journal of Forecasting even had a special issue on sports forecasting a couple of years ago. The London 2012 Olympics has attracted a few forecasters trying to predict medal counts, world records, etc. Here are some of the articles I’ve seen. Which Olympic records get shattered?, Nate Silver, New...

A prediction for the Olympic men’s 100m sprint

July 30, 2012
By

R user Markus Gesmann used the gold-winning times from the Olympic Men's 100m sprint since 1990 as the basis of the following prediction for the London Games: My simple log-linear model forecasts a winning time of 9.68 seconds, which is 1/100 of a second faster than Usain Bolt's winning time in Beijing in 2008, but still 1/10 of a...
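
A minimal sketch of what a log-linear trend forecast looks like in R, assuming the model is log(time) regressed linearly on year and back-transformed with `exp()`; the exact specification used in the post isn't given in this excerpt. The function name `forecast_log_linear` and the toy data are hypothetical, constructed to lie exactly on a log-linear trend so the fit is easy to check; they are not the real Olympic results.

```r
# Fit log(time) = a + b * year by least squares, then back-transform with exp()
forecast_log_linear <- function(year, time, new_year) {
  fit <- lm(log(time) ~ year)
  exp(unname(predict(fit, newdata = data.frame(year = new_year))))
}

# Hypothetical winning times lying exactly on a log-linear trend (NOT real results)
years <- seq(1960, 2008, by = 4)
times <- exp(2.31 - 0.0004 * (years - 1960))

pred_2012 <- forecast_log_linear(years, times, 2012)
print(pred_2012)
```

Fitting on the log scale keeps predicted times positive and makes the trend a constant percentage improvement per year rather than a constant number of seconds.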

Archetypal Analysis

July 30, 2012
By

Thinking Strategically about Customer Heterogeneity: Ironically, market segmentation, whose motto is "one size does not fit all," seems to rely almost exclusively on one definition of what constitutes a segment. Borrowing its definition f...

Machine learning for better homicide counts in Ciudad Juarez

July 30, 2012
By

(Photo credit: Jesús Villaseca Pérez) Ever since March 2008, Ciudad Juárez has registered an alarming number of homicides, becoming Mexico's most violent city. According to the Mexican vital statistics system, Ciudad Juárez (coterminous with the Juárez municipality) went from just 202 murders in 2007 to 1,616 in 2008, 2,397 in...

Blue Jay and Scrub Jay: Using rvertnet to check the distributions in R

July 30, 2012
By

As part of my Google Summer of Code, I am also working on another R package called rvertnet. This package is an R wrapper for the VertNet websites. VertNet is a distributed database network for vertebrates, consisting of FishNet2, MaNIS, HerpNET, and ORNIS. Of these, FishNet, HerpNET, and ORNIS currently have their v2 portals serving data. rvertnet now has functions to access…

Returns with negative net asset values

July 30, 2012
By

How are returns calculated when net asset value goes negative? Previously: In “A tale of two returns” we highlighted the similarities and differences between log returns and simple returns. Positive valuation: We create, in R, an example of net asset value at four times:
> nav1 <- c(1000, 900, 950, 1010)
> nav1
…
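
For the positive-valuation case set up above, the standard definitions can be sketched in base R: the simple return is NAV_t / NAV_{t-1} - 1 and the log return is log(NAV_t / NAV_{t-1}). This is only the usual textbook calculation applied to the `nav1` vector; how the post itself handles a negative NAV is not shown in this excerpt.

```r
nav1 <- c(1000, 900, 950, 1010)

# Simple returns: NAV_t / NAV_{t-1} - 1
simple_ret <- nav1[-1] / nav1[-length(nav1)] - 1

# Log returns: log(NAV_t / NAV_{t-1}), computed as diff(log(nav1))
log_ret <- diff(log(nav1))

print(round(simple_ret, 4))
print(round(log_ret, 4))

# Simple returns compound multiplicatively; log returns simply add up
total_simple <- prod(1 + simple_ret) - 1  # equals 1010 / 1000 - 1
total_log    <- sum(log_ret)              # equals log(1010 / 1000)
```

Neither formula is well-behaved once the NAV reaches zero or goes negative: the log of a non-positive ratio is undefined, and a negative denominator flips the sign of the simple return, which is exactly the situation the post's title asks about.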

unsupervised classification of a raster in R: the layer-stack or part one.

July 29, 2012
By

In my last post I explained how to use QGIS to build a layer stack of a Landsat scene. Because further research and experimentation only led to frustration, I decided to stick with software I know well: R. So download the needed layers here and open up your preferred flavour of…

Extracting upstream regions of a RefSeq human gene list in R using Bioconductor

July 29, 2012
By

Suppose that you want to do local mapping of the upstream regions of a given set of RefSeq IDs in a particular genome, in R using Bioconductor. Download the script here. In this case, you may want to look at the Bioconductor AnnotationData packages here: http:/...

Community Detection in Networks with R

I mainly post this visualization because I think it’s pretty. It reminds me a little of the work of the famous Dutch painter Mondrian. The complete matrix can be found here. The plot is a heatmap of an adjacency matrix generated by a weighted dir...

ScraperWiki in R

July 29, 2012
By

ScraperWiki describes itself as an online tool for gathering, cleaning and analysing data from the web. It takes a programming-oriented approach: users can implement ETL processes in Python, PHP or Ruby, share these processes with the community (or pay for privacy), and schedule automated runs. The software behind the service is open source, and there is...

Hangman in R: A learning experience

July 28, 2012
By

I love it when people take a sophisticated tool and use it to play video games. Take R, for example. I first saw someone create a game in R at talkstats.com. My friend Dason inspired me to more efficiently waste time …

My New Book: Developing, Deploying and Debugging Multi-Armed Bandit Algorithms

July 28, 2012
By

I’m happy to announce that I’ve started writing a new book for O’Reilly, which will focus on teaching readers how to use Multi-Armed Bandit Algorithms to build better websites. My hope is that the book can help web developers build up an intuition for the core conundrum facing anyone who wants to build a successful

Petrol prices adjusted for inflation

July 28, 2012
By

Petrol prices adjusted for inflation (Perth, Western Australia): The thought for this sprang to mind when I saw petrol drop below $1.20 per litre the other day, and it made me think: I remember paying that when I got to …

Hi R and Axys, I’m d3.js “Nice to Meet You” (On the iPhone)

July 27, 2012
By

I am still definitely in the proof-of-concept stage, but as I progress I get more excited about the prospects of combining d3.js with R and Axys through Bryan Lewis’ really nice R websockets package (even nicer now that he has added the daemonize fun...

R is reported as being used by about half of all data miners in the 2011 Data Miners Survey

July 27, 2012
By

by Yanchang Zhao, RDataMining.com: R is reported as now being used by close to half of all data miners (47%) in the 2011 Data Miners Survey by Rexer Analytics. Below are some points picked from the survey highlights regarding data mining …

My no loops in R hair shirt

July 27, 2012
By

Being professionally involved in analyzing source code, I get to work with a much larger number of programming languages than most people. There is a huge difference between knowing the intricate details of a language’s semantics and being able to program in it as fluently as a native developer. There are languages whose

Revolution Analytics at JSM 2012

July 27, 2012
By

Revolution Analytics is proud to once again be a gold sponsor and Wi-Fi sponsor of the JSM 2012 conference in San Diego, the largest gathering of statisticians, biostatisticians, analysts, data miners and data scientists in the world. The conference begins on Sunday, and you'll find the Revolution Analytics team in the exhibit hall. Drop by to take a look...

rApache 1.2.0 Released

July 27, 2012
By

This release brings a minor change in behavior: for requests configured with RFileEval or RFileHandler, or handled by the r-script handler, rApache will set the working directory to the file’s directory. For instance, with a Rook deployment like this: <Location /hmisc> SetHandler r-handler ...
