Creating a zoomable map of tweets with R

June 3, 2013
By

Languages tweeted around Germany: red, blue, green, yellow and grey stand for German, French, English, Dutch and other, respectively. See here for a zoomable version. Motivated by the project twitter languages of New York I wanted to...

Understanding the value of Predictive Analytics on Web Data

June 3, 2013
By

In this blogpost, I will be talking briefly about Predictive Analytics and why it holds value from a web analytics perspective. Broadly speaking, Predictive Analytics is a set of methodologies that assist us in anticipating customer behavior. The customer behavior of interest could be anything ranging from spend, buying habits, page views, response to a

Creating Jekyll blog posts from R.

June 3, 2013
By

Adam Duncan. Also available on R-bloggers.com. Setting up a Jekyll/Jekyll Bootstrap blog site is a very worthwhile experience. Should you choose to use Jekyll as your blogging platform, you will find many resources out there describing the setup process. This post is not about getting set up using Jekyll or Jekyll Bootstrap. It’s about establishing a good workflow...

A Few Tips for Writing an R Book

June 3, 2013
By

I just finished fixing (hopefully all) the problems in the knitr book returned from the copy editor. David Smith has kindly announced this book before I do. I do not have much to say about this book: almost everything in the book can be found in the on...

Chicken or the Egg? Granger-Causality for the masses

June 2, 2013
By

When I first learned about Granger-causality this past February, I was bemused and quite skeptical of the whole procedure. I felt it belonged on the scrapheap of impractical academic endeavors, preferring to possibly use an ARIMA transfer function model for the same task. However, several contemporaries threw the red challenge flag and upon further review, my initial impressions have...

Cosmopolitan Public Spaces

June 2, 2013
By

In my PhD and post-doc research projects at the university, I did a lot of research on the new cosmopolitanism together with Ulrich Beck. Our main goal was to test the hypothesis of an “empirical cosmopolitanization”. Maybe the term is confusing and too abstract, but what we were looking for were quite simple examples

Facet wrapping multivariate data: reshape and ggplot

June 2, 2013
By

A common problem when trying to show data is that the attributes that you want to map for comparison are stored in multiple rather than single variables. For example, proportion of employment by type. This practical will achieve this using …
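The reshaping step this teaser describes can be sketched in a few lines. The post itself works in R with reshape and ggplot; the plain-Python sketch below only illustrates the wide-to-long idea. The column names (`area`, `manufacturing`, `services`), the values, and the `melt` helper are all hypothetical, invented for illustration.

```python
# Hypothetical wide table: one column per employment type (made-up values).
wide_rows = [
    {"area": "A", "manufacturing": 0.25, "services": 0.75},
    {"area": "B", "manufacturing": 0.40, "services": 0.60},
]


def melt(rows, id_key, value_keys, var_name="type", value_name="proportion"):
    """Reshape wide rows into long rows: one output row per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append(
                {id_key: row[id_key], var_name: key, value_name: row[key]}
            )
    return long_rows


long_rows = melt(wide_rows, "area", ["manufacturing", "services"])
for long_row in long_rows:
    print(long_row)
```

Once the data are in this long form, the single `type` variable can drive a facet or a colour mapping, instead of each employment type needing its own plot call.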

Using R: drawing several regression lines with ggplot2

June 2, 2013
By

Occasionally I find myself wanting to draw several regression lines on the same plot, and of course ggplot2 has convenient facilities for this. As usual, don’t expect anything profound from this post, just a quick tip! There are several reasons we might end up with a table of regression coefficients connecting two variables in different

Cars in Netherlands

June 2, 2013
By

I am looking for a new car. So when I saw there was an update on vehicles in Statistics Netherlands I just had to go and look at the data. So, I learned that brown is getting more popular, often the number of cars from a certain construction year is lar...

June 1, 2013
By

In my previous post (http://statcompute.wordpress.com/2013/05/25/test-drive-of-parallel-computing-with-r) on 05/25/2013, I demonstrated the power of parallel computing with various R packages. However, in the real world, it is not straightforward to utilize these powerful tools in our day-to-day computing tasks without carefully formulating the problem. In the example below, I am going to show how to use the

Mapping a Revolution

June 1, 2013
By

Twitter has become an important communications tool for political protests. While mass media are often censored during large-scale political protests, Social Media channels remain relatively open and can be used to tell the world what is happening and to mobilize support all over the world. From an analytic perspective tweets with geo information are

June 1, 2013
By

Historical stock data is critical for testing your investment strategies. I illustrated all my back-test examples with the getSymbols function from the quantmod package. For example, the following is a back-test comparison for a few portfolio allocation methods: The getSymbols function, from the quantmod package, downloads historical stock prices from Yahoo Finance. I often get questions about alternative ways

Tweetanalytics – Interactively analyzing tweets from accounts of 5 universities

June 1, 2013
By

This is an attempt at learning and interactively displaying a few results from Twitter data using text mining. Interactivity is implemented using RStudio's shiny server. Their documentation of demo scripts came in very handy. As a non-user of Twitter, I...

A map of the world by tweets

June 1, 2013
By

With geo-tagging enabled, tweets include information on the location of the user when the tweet was sent. Miguel Rios (@miguelrios) has plotted locations of billions of tweets to create maps of the world. This is pretty amazing stuff – a world map rendered just from twitter posts! Maps are created using every tweet from 2009

June 1, 2013
By

A list of interesting R/Stats quickies to keep the mind distracted: A long draft Advanced Data Analysis from an Elementary Point of View by Cosma Shalizi, in which he uses R to drive home the message. Not your average elementary point of view. Good notes by Frank Davenport on starting using R with data from

Fylopic, an R wrapper to Phylopic

June 1, 2013
By

What is PhyloPic? PhyloPic is an awesome new service - I'll let the creator, Mike Keesey, explain what it is (paraphrasing here): PhyloPic stores silhouette images of organisms, and each image is associated with taxonomic names, and stores the taxonomy of all taxa, allowing searching by taxonomic names. Anyone can submit silhouettes to PhyloPic. What is a silhouette? It's like...

Rmagic, A Handy Interface Bridging Python and R

May 31, 2013
By

Rmagic (http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html) is the ipython extension that utilizes rpy2 in the back-end and provides a convenient interface for accessing R from ipython. Compared with the generic use of rpy2, the rmagic extension allows users to exchange objects between ipython and R in a more flexible way and to run a single R function or a block

How does logistic regression work?

May 31, 2013
By

In discussion with a non-statistician colleague, it became clear that logistic regression is not intuitive. Some basic questions came up: - Why not use the linear model? - What is the logistic function? - How can we compute by hand, step by step t...
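Two of those questions can be made concrete with a minimal sketch. It is written in plain Python rather than the R the post presumably uses, and the coefficients are hypothetical values chosen only for illustration, not fitted to any data. It shows that a linear predictor is unbounded, while the logistic function maps it to a value between 0 and 1 that can be read as a probability.

```python
import math


def logistic(z):
    """Logistic function: maps a real number z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration (not fitted to data).
intercept, slope = -3.0, 1.5

for x in [0.0, 2.0, 4.0]:
    linear = intercept + slope * x  # linear predictor: can be any real number
    prob = logistic(linear)         # logistic transform of the linear predictor
    print(f"x={x}: linear predictor={linear:+.1f}, probability={prob:.3f}")
```

A plain linear model would report the linear predictor itself, which here is negative for small x and greater than 1 for large x, so it cannot be interpreted directly as a probability.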

Generating Nice Looking Tree Diagrams in R

May 31, 2013
By

This function generates nice looking tree diagrams (see sample below) from tree objects (generated by package tree). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option)

Using the rasterVis package for raster plotting (in R)

May 31, 2013
By

Here is a post discussing the possibilities of the rasterVis package: http://rpubs.com/Lionel/6374

Snowfall

May 31, 2013
By

Yesterday I had a short post reminding EViews users that their package (versions 7 or 8) will access all of the cores on a multi-core machine. I've been playing around with parallel processing in R on my desktop machine at work over the last few days. It's something I've been meaning to do...

The arteries of the world, in Tweets

May 31, 2013
By

What happens when you plot billions of geotagged Tweets on a map? You can see the arteries of the world. Here's Europe: According to creator Miguel Rios (Engineering Manager, Data Visualization at Twitter), the dots on this chart represent every geotagged Tweet since 2009. The color represents number of tweets in the region, and the intensity shows where people...

Are parallel computations worth it ?

May 31, 2013
By

Yesterday, Daniel Marcelino published an interesting post on his blog, entitled Parallel Processing: When does it worth ? I was asking myself the same question for a chapter I am currently writing. I liked his approach, so I tried to do the same on my computer. I used three packages to run parallel R code, >...

Visualizing a One-Way ANOVA using D3.js

May 31, 2013
By

A while ago I was playing around with the JavaScript package D3.js, and I began with this visualization—that I never really finished—of how a one-way ANOVA is calculated. I wanted to make the visualization interactive, and I did integrate some interactive elements. For instance, if you hover over a data point it will show the residual, and its value will be highlighted in...

Regression regularization example

May 31, 2013
By

Recently I needed a simple example showing when application of regularization in regression is worthwhile. Here is the code I came up with (along with basic application of parallelization of code execution). Assume you have 60 observations and 50 expla...

accurate ABC: comments by Oliver Ratman [guest post]

May 31, 2013
By

Here are comments by Olli following my post: I think we found a general means to obtain accurate ABC in the sense of matching the posterior mean or MAP exactly, and then minimising the KL distance between the true posterior and its ABC approximation subject to this condition. The construction works on an auxiliary probability

Version control, gitbucket and SourceTree style

May 31, 2013
By

Last time I wrote about version control using Subversion (and its implementation in Eclipse). I still haven’t given up on it, but since I’m using a private repository, sharing code has been a bit tedious. I was introduced to git a while ago, but somehow decided to go for Subversion. A few days ago a

”How to draw the line” with ggplot2

May 30, 2013
By

In a recent tutorial in the eLife journal, Huang, Rattner, Liu & Nathans suggested that researchers who draw scatterplots should start providing not one but three regression lines. I quote, Plotting both regression lines gives a fuller picture of the data, and comparing their slopes provides a simple graphical assessment of the correlation coefficient. Plotting
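The quoted link between the slopes and the correlation coefficient rests on a standard identity: the slope of the regression of y on x, multiplied by the slope of the regression of x on y, equals the squared Pearson correlation. The sketch below checks this in plain Python (not the ggplot2 code of the post) on a small made-up data set. The data and variable names are assumptions for illustration only.

```python
# Small made-up data set, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross-products
sxx = sum((x - mx) ** 2 for x in xs)                    # sum of squares of x
syy = sum((y - my) ** 2 for y in ys)                    # sum of squares of y

slope_y_on_x = sxy / sxx        # least-squares slope when regressing y on x
slope_x_on_y = sxy / syy        # least-squares slope when regressing x on y
r = sxy / (sxx * syy) ** 0.5    # Pearson correlation coefficient

# The product of the two slopes equals r squared.
print("slopes:", slope_y_on_x, slope_x_on_y)
print("product of slopes:", slope_y_on_x * slope_x_on_y, "r squared:", r ** 2)
```

Drawn on the same axes, the x-on-y line has slope 1 / slope_x_on_y, so the two lines coincide only when the product of the slopes is 1, that is, when the correlation is perfect. The wider the angle between them, the weaker the correlation.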

my 1st post for the Guardian Australia

May 30, 2013
By

I’ll be contributing a piece about once a week for the Guardian Australia, under a part of the web site we’re calling The Swing. The set of graphs from my 1st effort were rendered in-line and rather low-res. Bigger, full res versions appear below; click on the in-line versions. It would be great to find