Monthly Archives: July 2012

Big data, big analytics, big opportunity

July 30, 2012

Data, data, every where
Nor any byte to think

The world today is awash with data. Corporations, governments, and individuals are busy generating petabytes of data on culture, economy, environment, religion, and society. While data has become abundant and ubiquitous, the data analysts needed to turn raw data into knowledge are in fact in short...

Forecasting the Olympics

July 30, 2012

Forecasting sporting events is a growing research area. The International Journal of Forecasting even had a special issue on sports forecasting a couple of years ago. The London 2012 Olympics has attracted a few forecasters trying to predict medal counts, world records, etc. Here are some of the articles I’ve seen: “Which Olympic records get shattered?”, Nate Silver, New...

A prediction for the Olympic men’s 100m sprint

July 30, 2012

R user Markus Gesmann used the gold-winning times from the Olympic Men's 100m sprint since 1900 as the basis of the following prediction for the London Games: My simple log-linear model forecasts a winning time of 9.68 seconds, which is 1/100 of a second faster than Usain Bolt's winning time in Beijing in 2008, but still 1/10 of a...
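
As a rough illustration of what a log-linear fit of this kind could look like, here is a minimal sketch in R. It assumes the model regresses the logarithm of the winning time on the year, which is only one possible reading of "log-linear"; the data frame and variable names are made up for this example, and only six recent gold-medal times are used, so it is not expected to reproduce the 9.68-second forecast.

```r
# Gold-medal times (seconds) for a few recent Games.
# This is a small illustrative subset, not the full series used in the post.
sprint <- data.frame(
  year = c(1988, 1992, 1996, 2000, 2004, 2008),
  time = c(9.92, 9.96, 9.84, 9.87, 9.85, 9.69)
)

# Assumed model form: log(time) is linear in the year
fit <- lm(log(time) ~ year, data = sprint)

# Predict on the log scale for 2012, then transform back to seconds
pred_2012 <- exp(predict(fit, newdata = data.frame(year = 2012)))
pred_2012
```

Fitting on the log scale keeps the predicted time positive however far the trend is extrapolated, which a plain linear fit would not guarantee.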

Archetypal Analysis

July 30, 2012

Thinking Strategically about Customer Heterogeneity

Ironically, market segmentation, whose motto is "one size does not fit all," seems to rely almost exclusively on one definition of what constitutes a segment. Borrowing its definition f...

Machine learning for better homicide counts in Ciudad Juarez

July 30, 2012

Photo Credit: Jesús Villaseca Pérez
In March 2008 Ciudad Juárez began to register an alarming number of homicides, becoming Mexico's most violent city. According to the Mexican vital statistics system, Ciudad Juárez (coterminous with the Juárez municipality) went from just 202 murders in 2007 to 1,616 in 2008, 2,397 in...

Blue Jay and Scrub Jay: Using rvertnet to check the distributions in R

July 30, 2012

As part of my Google Summer of Code project, I am also working on another R package called rvertnet. This package is an R wrapper for the VertNet websites. VertNet is a distributed vertebrate database network consisting of FishNet2, MaNIS, HerpNET, and ORNIS. Of these, FishNet, HerpNET, and ORNIS currently have v2 portals serving data. rvertnet has functions now to access

Returns with negative net asset values

July 30, 2012

How are returns calculated when net asset value goes negative?

Previously

In “A tale of two returns” we highlighted the similarities and differences of log returns versus simple returns.

Positive valuation

We create, in R, an example of net asset value at four times:

> nav1 <- c(1000, 900, 950, 1010)
> nav1
… Continue reading...
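
To make the distinction concrete, the following sketch computes simple and log returns for the nav1 values from the excerpt using base R only. The names simple_ret, log_ret, nav2, and log_ret2 are introduced here for illustration, and nav2 is a hypothetical series with a negative value, not data from the post.

```r
# Net asset values from the excerpt
nav1 <- c(1000, 900, 950, 1010)

# Simple return: value at time t divided by value at time t-1, minus 1
simple_ret <- nav1[-1] / nav1[-length(nav1)] - 1

# Log return: difference of the logged values
log_ret <- diff(log(nav1))

# For positive valuations the two are linked by log_ret = log(1 + simple_ret)
all.equal(log_ret, log1p(simple_ret))

# Hypothetical series that dips below zero (an assumption for illustration):
# log() of a negative number is NaN, so log returns are undefined here
nav2 <- c(1000, -100, 50)
log_ret2 <- suppressWarnings(diff(log(nav2)))
log_ret2
```

The last two lines show why negative net asset values need separate treatment: the log return is simply not defined once a valuation crosses zero.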

unsupervised classification of a raster in R: the layer-stack or part one.

July 29, 2012

In my last post I explained how to use QGIS to build a layer stack of a Landsat scene. Because further research and experimentation ended in frustration, I decided to stick with software I know well: R. So download the needed layers here and open up your flavoured version of

Community Detection in Networks with R

I mainly post this visualization because I think it’s pretty. It reminds me a little of the work of the famous Dutch painter Mondrian. The complete matrix can be found here. The plot is a heatmap of an adjacency matrix generated by a weighted dir...

ScraperWiki in R

July 29, 2012

ScraperWiki describes itself as an online tool for gathering, cleaning and analysing data from the web. It takes a programming-oriented approach: users can implement ETL processes in Python, PHP or Ruby, share these processes with the community (or pay for privacy), and schedule automated runs. The software behind the service is open source, and there is...
