Count data and GLMs: choosing among Poisson, negative binomial, and zero-inflated models

August 24, 2012

Ecologists commonly collect data representing counts of organisms. Generalized linear models (GLMs) provide a powerful tool for analyzing count data. The starting point for count data is a GLM with Poisson-distributed errors, but …
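
Below is a minimal R sketch of that workflow on simulated data. The variable names, coefficients, and the negative binomial data-generating process are assumptions chosen for illustration; none of them come from the post. The sketch fits a Poisson GLM with glm(), uses the Pearson dispersion statistic to check for overdispersion, then refits with MASS::glm.nb() and compares the two fits by AIC. Zero-inflated models need an add-on package (for example zeroinfl() in pscl) and are not shown.

```r
# Minimal sketch on simulated (hypothetical) data: Poisson GLM -> overdispersion check -> negative binomial
library(MASS)  # provides glm.nb(); MASS is a recommended package shipped with R

set.seed(1)
n <- 200
x <- runif(n)
# Simulate overdispersed counts: negative binomial with mean exp(0.5 + 1.5 * x)
y <- rnbinom(n, mu = exp(0.5 + 1.5 * x), size = 1.5)

# Poisson GLM (log link by default)
pois_fit <- glm(y ~ x, family = poisson)

# Pearson dispersion statistic: roughly 1 if the Poisson variance assumption holds
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
print(dispersion)

# Negative binomial GLM, which estimates an extra dispersion parameter (theta)
nb_fit <- glm.nb(y ~ x)

# Compare the two fits by AIC (lower is better)
print(AIC(pois_fit, nb_fit))
```

A dispersion statistic well above 1 suggests the Poisson variance assumption is too restrictive, which is the usual motivation for moving to the negative binomial.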

Commandeering a map from PDF or EPS, using Inkscape and R

August 24, 2012

I love Nathan Yau’s tutorial on making choropleths from an SVG file. However, if you don’t already have an SVG handy and instead want to repurpose a map from another vector format such as PDF or EPS, there are …

Toy Example with GScholarScraper_3.1

August 24, 2012

A commenter on my blog suggested a nice way to use the GScholarScraper function for bibliometrics. I altered the code a little so that you can set the year from which results should be returned, and I added a field to the output collectin...

MPK Analytics – putting the R into analytics

August 24, 2012

Welcome to the blog of MPK Analytics – the consulting and training company whose mission it is to help clients in academia, industry and government to transform their data into insight using …

Does playing baseball shorten your lifespan? (Answer: No.)

August 24, 2012

A National Institute for Occupational Safety and Health study, published in March, found that professional American football (NFL) players lived longer, on average, than similar "mere mortals" in the general population. Football is a dangerous sport, so that might seem surprising at first, until you consider the fact that NFL players are elite sportsmen: only the strongest, fastest and...

Finding the Best Subset of a GAM using Tabu Search and Visualizing It in R

August 24, 2012

Finding the best subset of variables for a regression is a very common task in statistics and machine learning. There are statistical methods based on asymptotic normal theory that can help you decide whether to add or remove one variable at a time. The ...

Visualizing the Arctic Sea Ice Extent Decline

August 24, 2012

Understanding what is happening to Arctic sea ice is critical to recognizing the serious consequences of global warming. So I want to help people visualize the 30+ year trend in Arctic sea ice extent. The source data file is here: …

CRAN might get tenure at Yale?

August 24, 2012

From one of the R lists I follow: Today (2012-08-23) on CRAN: “Currently, the CRAN package repository features 4001 available packages.” These packages are maintained by approximately 2350 different folks. Previous milestones: 2011-05-12: 3,000 packages; 2009-10-04: 2,000 packages; 2007-04-12: 1,000 packages; 2004-10-01: 500 packages; 2003-04-01: 250 packages. http://cran.r-project.org/web/packages/

Data analysis using R – course in Essex

August 24, 2012

This course is running 1-5 October at the University of Essex. There doesn’t seem to be a website, but you can register by e-mail. Here’s what they say in their e-mail: Lecturers: Dr Werner Adler (University of Erlangen-Nuremberg; Co-author …

ggplot2 Self-deprecation

August 24, 2012

I've been working in China for a few weeks (where, oddly, this blog is blocked), so I haven't been able to post much over the summer. To kick things off for the new (academic) year, I thought I might just re-post something good I saw on the Book of Sa...

Comparing hist() and cut() R functions

August 24, 2012

The other day a question about faceting data came up in the Dallas R Users group. The hist() function is more efficient and uses less memory than the cut() function. Additionally, hist() returns an object that makes...
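
A small R sketch of the comparison, using made-up data and break points chosen only for illustration. It shows that the two functions can produce the same bin counts; it does not reproduce the post's speed or memory comparison.

```r
# Hypothetical data and break points, for illustration only
x <- c(1, 2, 2, 3, 5, 7, 8, 9, 10)
breaks <- c(0, 2.5, 5, 7.5, 10)

# hist() with plot = FALSE returns the binning as a "histogram" object
h <- hist(x, breaks = breaks, plot = FALSE)
print(h$counts)  # 3 2 1 3

# cut() returns a factor with one interval label per observation; table() counts them.
# include.lowest = TRUE matches hist()'s default, which closes the first interval at 0.
counts_cut <- as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))
print(counts_cut)  # 3 2 1 3

print(identical(as.integer(h$counts), as.integer(counts_cut)))  # TRUE
```

The include.lowest argument matters here: cut() leaves the lowest break open by default, while hist() includes it, so without that argument a value equal to the first break would be dropped by cut() but counted by hist().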

Momentum with R: Part 1

August 23, 2012

Time really flies… it is hard to believe that it has been over a month since my last post. Work and life in general have consumed much of my time lately and left little time for research and blog posts. Anyway, on to the post! This post will be the first in a series of …

Revolution Analytics receives Top Innovator award for Data Science Technology

August 23, 2012

A big thank-you to all the R users out there who voted for Revolution R Enterprise in DataWeek Awards. We're so pleased to be recognized by the voters and the DataWeek judging panel with the Top Innovator Award for Data Science Technology. We're looking forward to the awards ceremony next week at DataWeek SF (in San Francisco, September 24-27)....

difference between NA and NaN in R

August 23, 2012

We often see both NA and NaN in R. What's the difference between them? Here is a good post on that topic: http://stats.stackexchange.com/questions/5686/what-is-the-difference-between-nan-and-na In summary: NaN ("Not a Number") is the result of an undefined numeric operation such as 0/0. NA ("Not Available") is generally interpreted as a missing value and has various forms: NA_integer_, NA_real_, etc. Therefore, NaN ≠ NA, and there is a need for both NaN and NA. is.na() returns TRUE for both NA...
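
A short R sketch illustrating the distinction (the values are chosen for illustration only):

```r
# NaN comes from undefined arithmetic; NA marks a missing value
nan_val <- 0 / 0
print(is.nan(nan_val))  # TRUE
print(is.na(nan_val))   # TRUE: is.na() is also TRUE for NaN
print(is.nan(NA))       # FALSE: a plain NA is not NaN

# NA has typed variants
print(typeof(NA))             # "logical"
print(typeof(NA_integer_))    # "integer"
print(typeof(NA_real_))       # "double"
print(typeof(NA_character_))  # "character"

# Consequence: is.na() flags both kinds, is.nan() flags only the NaN
v <- c(1, NA, NaN, 4)
print(sum(is.na(v)))   # 2
print(sum(is.nan(v)))  # 1
```

In practice this means that filtering with is.na() removes both missing and undefined values, while is.nan() isolates only the results of undefined arithmetic.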

Bonds Much Sharpe -r Than Buffett

August 23, 2012

Mebane Faber’s post Buffett’s Alpha points out Warren Buffett’s 0.76 Sharpe ratio, discussed in the similarly titled paper Buffett’s Alpha. I of course immediately think about the 8th Wonder of the World – the US Bond Market, whose Sharpe ...

R and the web (for beginners), Part III: Scraping MPs’ expenses in detail from the web

August 23, 2012

In this last post of my little series on R and the web (see my previous post), I explain how to extract data from a website (web scraping/screen scraping) with R. If the data you want to analyze are part of a web page, for example an HTML table (or hundreds of...

Ebola in the Congo

August 23, 2012

Ebola has also appeared in the Democratic Republic of the Congo. The WHO report of 21 August lists 15 cases and 10 deaths, mostly in the town of Isiro. This outbreak is in no way related to the Ugandan outbreak …

How robust is logistic regression?

August 23, 2012

Logistic regression is a popular and effective technique for modeling categorical outcomes as a function of both continuous and categorical variables. The question is: how robust is it? Or: how robust are the common implementations? (note: we are using “robust” in the more everyday English sense of “performs well for all inputs”, not in the …

London 2012 Olympics — Men and Women 400-metre medley

August 23, 2012

Alan raised suspicions about Ye’s world record in the women’s 400 metres individual medley. And I quote: “Her last split caused controversy (deep suspicion of doping) as she swam it faster than the fastest male swimmer. I wonder how commonly this …

Open Research Data Processes: KMi Crunch – Hosted RStudio Analytics Environment

August 23, 2012

One of the possible barriers to widespread adoption of open notebook science is knowing where to start. Video reports of lab experiments hosted on YouTube can be easily embedded in a hosted WordPress blog; a MediaWiki wiki can be used to provide one page per experiment, with change tracking/history on each page and a shadow …

Using R for parallelizing OpenBUGS on a single Windows PC

August 22, 2012

It seems that most of the R-parallelizing business takes place on Linux clusters. And it makes sense. Why would you want to parallelize R on just a few processors (2 or 4) of a Windows laptop PC when the whole…

Benchmarking random-number generation from C++

August 22, 2012

If you're writing C++ code and want to generate random numbers, you might not be aware that R provides an API to call the R RNG functionality directly. The Rcpp package's "syntactic sugar" feature makes this easier by automating the translation of a subset of ordinary R code into compiled C++ code. That means you can write...

The Kaggle Bug

August 22, 2012

If you have any interest in data mining and machine learning, you might have already caught the Kaggle bug. I myself fairly recently got caught up in following the various contests and forums after reading a copy of "Practical Time Series Forecasting," ...

Web-Scraper for Google Scholar Updated!

August 22, 2012

I have updated the Google Scholar web-scraper function GScholarScraper_2 to GScholarScraper_3 (and GScholarScraper_3.1), as it was outdated due to changes in the Google Scholar HTML code. The new script is leaner and faster. It returns a dataframe o...

2014 Winter Olympics: Home Court Advantage – Russia

August 22, 2012

"Russia is a riddle wrapped in a mystery inside an enigma." -- Winston Churchill, radio address in 1939. A couple of weeks ago, Graph of the Week published an article describing the significant improvement in medals won by the host...

London 2012 Olympics — world record in women 400-metre medley

August 22, 2012

I’ve been going through the medal statistics in London 2012 Olympics recently. I was planning to present some extra charts, such as medal-per-milli-population or medal-vs-GDP. However, it’s a little boring to present the same kind of charts. Thus, I’d like to look into some particular …

Did the Kigadi Ebola outbreak threaten to become an (inter)national epidemic?

August 22, 2012

We want to evaluate the seriousness of the threat posed by the recent Ebola outbreak in western Uganda. The outbreak appeared in Kigadi, a small village in the Kibaale district. The disease was first confirmed by the government on 28 …

What you get and what you should be getting: checking numerical code

August 22, 2012

Whenever I write numerical code, I spend half my time debugging my algebra, painstakingly uncovering one sign mistake after another in my calculations. Usually I have computed by hand the gradient or the integral of some nasty function, and I have to check it against a …
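
One common way to run such a check is sketched below in R. This is an assumption on our part, because the excerpt does not say which method the post uses. The sketch compares a hand-derived gradient against central finite differences. The test function f and its gradients are hypothetical examples, and grad_wrong deliberately contains a sign mistake so you can see what a failed check looks like.

```r
# Hypothetical test function and its hand-derived gradient
f <- function(x) sum(x^2 * sin(x))
grad_analytic <- function(x) 2 * x * sin(x) + x^2 * cos(x)
# Same derivative with a deliberate sign mistake
grad_wrong <- function(x) 2 * x * sin(x) - x^2 * cos(x)

# Central finite differences: (f(x + h*e_i) - f(x - h*e_i)) / (2h) for each coordinate i
grad_numeric <- function(f, x, h = 1e-6) {
  vapply(seq_along(x), function(i) {
    e <- numeric(length(x))
    e[i] <- h
    (f(x + e) - f(x - e)) / (2 * h)
  }, numeric(1))
}

x0 <- c(0.3, -1.2, 2.5)
num <- grad_numeric(f, x0)
err_ok    <- max(abs(grad_analytic(x0) - num))  # close to zero
err_wrong <- max(abs(grad_wrong(x0) - num))     # clearly nonzero: the sign error shows up
print(c(correct = err_ok, with_sign_error = err_wrong))
```

Central differences are used here because their truncation error shrinks with h^2 rather than h, so a correct gradient agrees closely while an algebra mistake stands out by orders of magnitude.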

ggplot2 maps with insets

August 22, 2012

Here's a quick demo of creating a map with an inset within it using ggplot. The inset is achieved using the gridExtra package. Install libraries, set directory, read file setwd("/Users/ScottMac/Dropbox/CANPOLIN_networks_ms/data") # change ...
