## Plotting Oracle RMAN backup durations with R

June 3, 2013
By

How long does your Oracle RMAN backup take to complete? How does this vary over time? Are there patterns by week, week of month, or day of week? The gist below can help you evaluate questions like these.

## random sudokus

June 3, 2013
By

In a paper arXived on Friday, Roberto Fontana relates the generation of Sudoku grids to that of Latin squares (which is unsurprising) and to maximum cliques of a graph (more surprising). The generation of a random Latin square proceeds in three steps: generate a random Latin square L with identity permutation matrix on symbol

## How to set up a reproducible R project

June 3, 2013
By

If you're thinking about starting a project (for example, a report or paper) using the R language for analysis, the Nice R code blog has some great advice. Following the principles of reproducible research, Macquarie University postdocs Rich FitzJohn and Daniel Falster suggest: creating a directory structure to separate R code, data, reports, and output; treating data as read-only...

## Creating a zoomable map of tweets with R

June 3, 2013
By

Languages tweeted around Germany: red, blue, green, yellow, grey are for German, French, English, Dutch and other respectively. See here for a zoomable version. Motivated by the project twitter languages of New York I wanted to...

## Understanding the value of Predictive Analytics on Web Data

June 3, 2013
By

In this blogpost, I will be talking briefly about Predictive Analytics and why it holds value from a web analytics perspective. Broadly speaking, Predictive Analytics is a set of methodologies that assist us in anticipating customer behavior. The customer behavior of interest could be anything ranging from spend, buying habits, page views, response to a

## Creating Jekyll blog posts from R.

June 3, 2013
By

Adam Duncan. Also available on R-bloggers.com. Setting up a Jekyll/Jekyll Bootstrap blog site is a very worthwhile experience. Should you choose to use Jekyll as your blogging platform, you will find many resources out there describing the setup process. This post is not about getting set up using Jekyll or Jekyll Bootstrap. It’s about establishing a good workflow...

## A Few Tips for Writing an R Book

June 3, 2013
By

I just finished fixing (hopefully all) the problems in the knitr book returned from the copy editor. David Smith has kindly announced this book before I do. I do not have much to say about this book: almost everything in the book can be found in the on...

## Chicken or the Egg? Granger-Causality for the masses

June 2, 2013
By

When I first learned about Granger-causality this past February, I was bemused and quite skeptical of the whole procedure.  I felt it belonged on the scrapheap of impractical academic endeavors, preferring to possibly use an ARIMA transfer function model for the same task.  However, several contemporaries threw the red challenge flag and upon further review, my initial impressions have...

## Win Your Fantasy Football Auction Draft: Calculate the Optimal Players to Draft with this Shiny App in R

June 2, 2013
By

In this post, I use a Shiny app in R to determine the best possible players to pick in a fantasy football auction draft.  The app takes projections from FantasyPros, a site that averages across numerous sources of projections.  Based on your ...

## Cosmopolitan Public Spaces

June 2, 2013
By

In my PhD and post-doc research projects at the university, I did a lot of research on the new cosmopolitanism together with Ulrich Beck. Our main goal was to test the hypothesis of an “empirical cosmopolitanization”. Maybe the term is confusing and too abstract, but what we were looking for were quite simple examples

## Facet wrapping multivariate data: reshape and ggplot

June 2, 2013
By

A common problem when trying to show data is that the attributes that you want to map for comparison are stored in multiple rather than single variables. For example, proportion of employment by type. This practical will achieve this using …

## Using R: drawing several regression lines with ggplot2

June 2, 2013
By

Occasionally I find myself wanting to draw several regression lines on the same plot, and of course ggplot2 has convenient facilities for this. As usual, don’t expect anything profound from this post, just a quick tip! There are several reasons we might end up with a table of regression coefficients connecting two variables in different

## Cars in Netherlands

June 2, 2013
By

I am looking for a new car. So when I saw there was an update on vehicles in Statistics Netherlands I just had to go and look at the data. So, I learned that brown is getting more popular, often the number of cars from a certain construction year is lar...

June 1, 2013
By

In my previous post (http://statcompute.wordpress.com/2013/05/25/test-drive-of-parallel-computing-with-r) on 05/25/2013, I demonstrated the power of parallel computing with various R packages. However, in the real world, it is not straightforward to utilize these powerful tools in our day-to-day computing tasks without carefully formulating the problem. In the example below, I am going to show how to use the

## Mapping a Revolution

June 1, 2013
By

Twitter has become an important communications tool for political protests. While mass media are often censored during large-scale political protests, Social Media channels remain relatively open and can be used to tell the world what is happening and to mobilize support all over the world. From an analytic perspective tweets with geo information are

June 1, 2013
By

Historical stock data is critical for testing your investment strategies. I illustrated all my back-test examples with the getSymbols function from the quantmod package. For example, following is a back-test comparison for a few portfolio allocation methods: The getSymbols function, from the quantmod package, downloads historical stock prices from Yahoo Finance. I often get questions about alternative ways

## Tweetanalytics – Interactively analyzing tweets from accounts of 5 universities

June 1, 2013
By

This is an attempt at learning and interactively displaying a few results from Twitter data using text mining. Interactivity is implemented using RStudio's shiny server. Their documentation of demo scripts came in very handy. As a non-user of twitter, I...

## A map of the world by tweets

June 1, 2013
By

With geo-tagging enabled, tweets include information on the location of the user when the tweet was sent. Miguel Rios (@miguelrios) has plotted locations of billions of tweets to create maps of the world. This is pretty amazing stuff – a world map rendered just from twitter posts! Maps are created using every tweet from 2009

## Flotsam 12: early June linkathon

June 1, 2013
By

A list of interesting R/Stats quickies to keep the mind distracted: A long draft Advanced Data Analysis from an Elementary Point of View by Cosma Shalizi, in which he uses R to drive home the message. Not your average elementary point of view. Good notes by Frank Davenport on starting using R with data from

## Fylopic, an R wrapper to Phylopic

June 1, 2013
By

What is PhyloPic? PhyloPic is an awesome new service - I'll let the creator, Mike Keesey, explain what it is (paraphrasing here): PhyloPic stores silhouette images of organisms, and each image is associated with taxonomic names, and stores the taxonomy of all taxa, allowing searching by taxonomic names. Anyone can submit silhouettes to PhyloPic. What is a silhouette? It's like...

## Rmagic, A Handy Interface Bridging Python and R

May 31, 2013
By

Rmagic (http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html) is the ipython extension that utilizes rpy2 in the back-end and provides a convenient interface accessing R from ipython. Compared with the generic use of rpy2, the rmagic extension allows users to exchange objects between ipython and R in a more flexible way and to run a single R function or a block

## How logistic regression work ?

May 31, 2013
By

Discussing with a non-statistician colleague, it seems that logistic regression is not intuitive. Some basic questions came up: - Why not use the linear model? - What is the logistic function? - How can we compute by hand, step by step t...

## Generating Nice Looking Tree Diagrams in R

May 31, 2013
By

This function generates nice looking tree diagrams (see sample below) from tree objects (generated by package tree). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option)

## Using the rasterVis package for raster plotting (in R)

May 31, 2013
By

Here is a post discussing the possibilities of the rasterVis package: http://rpubs.com/Lionel/6374

## Snowfall

May 31, 2013
By

Yesterday I had a short post reminding EViews users that their package (versions 7 or 8) will access all of the cores on a multi-core machine. I've been playing around with parallel processing in R on my desktop machine at work over the last few days. It's something I've been meaning to do...

## The arteries of the world, in Tweets

May 31, 2013
By

What happens when you plot billions of geotagged Tweets on a map? You can see the arteries of the world. Here's Europe: According to creator Miguel Rios (Engineering Manager, Data Visualization at Twitter), the dots on this chart represent every geotagged Tweet since 2009. The color represents number of tweets in the region, and the intensity shows where people...

## Are parallel computations worth it ?

May 31, 2013
By

Yesterday, Daniel Marcelino published an interesting post on his blog, entitled Parallel Processing: When does it worth ? I was asking myself the same question for a chapter I am currently writing. I liked his approach, so I tried to do the same on my computer. I used three packages to run parallel R code, ...