Plotting Oracle RMAN backup durations with R

June 3, 2013

How long does your Oracle RMAN backup take to complete? How does this vary over time? Are there patterns by week, week of month, or day of week? The gist below can help you evaluate questions like these.

random sudokus

June 3, 2013

In a paper arXived on Friday, Roberto Fontana relates the generation of Sudoku grids to that of Latin squares (which is unsurprising) and to maximum cliques of a graph (more surprising). The generation of a random Latin square proceeds in three steps: generate a random Latin square L with identity permutation matrix on symbol

How to set up a reproducible R project

June 3, 2013

If you're thinking about starting a project (for example, a report or paper) using the R language for analysis, the Nice R code blog has some great advice. Following the principles of reproducible research, Macquarie University postdocs Rich FitzJohn and Daniel Falster suggest: Creating a directory structure to separate R code, data, reports, and output Treating data as read-only...

Creating a zoomable map of tweets with R

June 3, 2013

Languages tweeted around Germany: red, blue, green, yellow, and grey are for German, French, English, Dutch, and other, respectively. See here for a zoomable version. Motivated by the project twitter languages of New York I wanted to...

Understanding the value of Predictive Analytics on Web Data

June 3, 2013

In this blogpost, I will be talking briefly about Predictive Analytics and why it holds value from a web analytics perspective. Broadly speaking, Predictive Analytics is a set of methodologies that assist us in anticipating customer behavior. The customer behavior of interest could be anything ranging from spend, buying habits, page views, response to a

Creating Jekyll blog posts from R.

June 3, 2013

Adam Duncan. Also available on R-bloggers.com. Setting up a Jekyll/Jekyll Bootstrap blog site is a very worthwhile experience. Should you choose to use Jekyll as your blogging platform, you will find many resources out there describing the setup process. This post is not about getting set up using Jekyll or Jekyll Bootstrap. It’s about establishing a good workflow...

A Few Tips for Writing an R Book

June 3, 2013

I just finished fixing (hopefully all) the problems in the knitr book returned from the copy editor. David Smith has kindly announced this book before I do. I do not have much to say about this book: almost everything in the book can be found in the on...

Chicken or the Egg? Granger-Causality for the masses

June 2, 2013

When I first learned about Granger-causality this past February, I was bemused and quite skeptical of the whole procedure.  I felt it belonged on the scrapheap of impractical academic endeavors, preferring to possibly use an ARIMA transfer function model for the same task.  However, several contemporaries threw the red challenge flag and upon further review, my initial impressions have...

Win Your Fantasy Football Auction Draft: Calculate the Optimal Players to Draft with this Shiny App in R

June 2, 2013

In this post, I use a Shiny app in R to determine the best possible players to pick in a fantasy football auction draft.  The app takes projections from FantasyPros, a site that averages across numerous sources of projections.  Based on your ...

Cosmopolitan Public Spaces

June 2, 2013

In my PhD and post-doc research projects at the university, I did a lot of research on the new cosmopolitanism together with Ulrich Beck. Our main goal was to test the hypothesis of an “empirical cosmopolitanization”. Maybe the term is confusing and too abstract, but what we were looking for were quite simple examples

Facet wrapping multivariate data: reshape and ggplot

June 2, 2013

A common problem when trying to show data is that the attributes that you want to map for comparison are stored in multiple rather than single variables. For example, proportion of employment by type. This practical will achieve this using …

Using R: drawing several regression lines with ggplot2

June 2, 2013

Occasionally I find myself wanting to draw several regression lines on the same plot, and of course ggplot2 has convenient facilities for this. As usual, don’t expect anything profound from this post, just a quick tip! There are several reasons we might end up with a table of  regression coefficients connecting two variables in different

Cars in Netherlands

June 2, 2013

I am looking for a new car. So when I saw there was an update on vehicles at Statistics Netherlands I just had to go and look at the data. So, I learned that brown is getting more popular, often the number of cars from a certain construction year is lar...

Grid Search for Free Parameters with Parallel Computing

June 1, 2013

In my previous post (http://statcompute.wordpress.com/2013/05/25/test-drive-of-parallel-computing-with-r) on 05/25/2013, I demonstrated the power of parallel computing with various R packages. However, in the real world, it is not straightforward to utilize these powerful tools in our day-to-day computing tasks without carefully formulating the problem. In the example below, I am going to show how to use the

Mapping a Revolution

June 1, 2013

Twitter has become an important communications tool for political protests. While mass media are often censored during large-scale political protests, Social Media channels remain relatively open and can be used to tell the world what is happening and to mobilize support all over the world. From an analytic perspective tweets with geo information are

Loading Historical Stock Data

June 1, 2013

Historical stock data is critical for testing your investment strategies. I illustrated all my back-test examples with the getSymbols function from the quantmod package. For example, following is a back-test comparison for a few portfolio allocation methods: The getSymbols function, from the quantmod package, downloads historical stock prices from Yahoo Finance. I often get questions about alternative ways

Tweetanalytics – Interactively analyzing tweets from accounts of 5 universities

June 1, 2013

This is an attempt at learning and interactively displaying a few results from Twitter data using text mining. Interactivity is implemented using RStudio's shiny server. Their documentation of demo scripts came in very handy. As a non-user of twitter, I...

A map of the world by tweets

June 1, 2013

With geo-tagging enabled, tweets include information on the location of the user when the tweet was sent. Miguel Rios (@miguelrios) has plotted locations of billions of tweets to create maps of the world. This is pretty amazing stuff – a world map rendered just from twitter posts! Maps are created using every tweet from 2009

Flotsam 12: early June linkathon

June 1, 2013

A list of interesting R/Stats quickies to keep the mind distracted: A long draft Advanced Data Analysis from an Elementary Point of View by Cosma Shalizi, in which he uses R to drive home the message. Not your average elementary point of view. Good notes by Frank Davenport on starting using R with data from

Fylopic, an R wrapper to Phylopic

June 1, 2013

What is PhyloPic? PhyloPic is an awesome new service - I'll let the creator, Mike Keesey, explain what it is (paraphrasing here): PhyloPic stores silhouette images of organisms, and each image is associated with taxonomic names, and stores the taxonomy of all taxa, allowing searching by taxonomic names. Anyone can submit silhouettes to PhyloPic. What is a silhouette? It's like...

Rmagic, A Handy Interface Bridging Python and R

May 31, 2013

Rmagic (http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html) is the ipython extension that utilizes rpy2 in the back-end and provides a convenient interface accessing R from ipython. Compared with the generic use of rpy2, the rmagic extension allows users to exchange objects between ipython and R in a more flexible way and to run a single R function or a block

How logistic regression work ?

May 31, 2013

After a discussion with a non-statistician colleague, it seems that logistic regression is not intuitive. Some basic questions come up: - Why not use the linear model? - What is the logistic function? - How can we compute by hand, step by step t...

Generating Nice Looking Tree Diagrams in R

May 31, 2013

This function generates nice looking tree diagrams (see sample below) from tree objects (generated by package tree). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option)

Using the rasterVis package for raster plotting (in R)

May 31, 2013

Here is a post discussing the possibilities of the rasterVis package: http://rpubs.com/Lionel/6374

Snowfall

May 31, 2013

Yesterday I had a short post reminding EViews users that their package (versions 7 or 8) will access all of the cores on a multi-core machine. I've been playing around with parallel processing in R on my desktop machine at work over the last few days. It's something I've been meaning to do...

The arteries of the world, in Tweets

May 31, 2013

What happens when you plot billions of geotagged Tweets on a map? You can see the arteries of the world. Here's Europe: According to creator Miguel Rios (Engineering Manager, Data Visualization at Twitter), the dots on this chart represent every geotagged Tweet since 2009. The color represents number of tweets in the region, and the intensity shows where people...

Are parallel computations worth it ?

May 31, 2013

Yesterday, Daniel Marcelino published an interesting post on his blog, entitled Parallel Processing: When does it worth ? I was asking myself the same question for a chapter I am currently writing. I liked his approach, so I tried to do the same on my computer. I used three packages to run parallel R code...

Visualizing a One-Way ANOVA using D3.js

May 31, 2013

A while ago I was playing around with the JavaScript package D3.js, and I began with this visualization—that I never really finished—of how a one-way ANOVA is calculated. I wanted to make the visualization interactive, and I did integrate some interactive elements. For instance, if you hover over a data point it will show the residual, and its value will be highlighted in...
