Monthly Archives: June 2013

Exploratory Data Analysis: 2 Ways of Plotting Empirical Cumulative Distribution Functions in R

Introduction: Continuing my recent series on exploratory data analysis (EDA), and following up on the last post on the conceptual foundations of empirical cumulative distribution functions (CDFs), this post shows how to plot them in R. (Previous posts in this series on EDA include descriptive statistics, box plots, kernel density estimation, and violin plots.) I...

Predicting spatial locations using point processes

June 25, 2013

I’ve uploaded a draft tutorial on some aspects of prediction using point processes. I wrote it in R Markdown, so there are bits of R code for readers to play with. It’s hosted on RPubs, which turns out to be far more convenient than WordPress for that sort of thing.

-omics in 2013

June 24, 2013

Just how many (bad) -omics are there anyway? Let’s find out.

1. Get the raw data

It would be nice if we could search PubMed for titles containing all -omics. However, we cannot, since leading wildcards don’t work in PubMed search. So let’s just grab all articles from 2013 and save them in a format...
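
Because PubMed can't match a leading wildcard, the -omics counting has to happen locally once the titles are downloaded. Below is a minimal sketch of that local step in Python (not the post's own code), assuming the titles are already available as a list of strings. The helper `count_omics`, the stoplist, and the sample titles are all hypothetical, made up for illustration.

```python
import re
from collections import Counter

# Matches whole words ending in "omics", e.g. "genomics", "metabolomics".
# A hyphenated form such as "phospho-proteomics" is counted as "proteomics".
OMICS_WORD = re.compile(r"\b[a-z]+omics\b", re.IGNORECASE)

# Hypothetical stoplist: ordinary words that happen to end in "omics".
NOT_OMICS = {"economics", "ergonomics", "comics"}


def count_omics(titles):
    """Count how many titles mention each -omics term (hypothetical helper)."""
    counts = Counter()
    for title in titles:
        # A set, so a term repeated within one title is counted once.
        terms = {m.lower() for m in OMICS_WORD.findall(title)}
        counts.update(terms - NOT_OMICS)
    return counts


if __name__ == "__main__":
    # Made-up example titles, not real PubMed records.
    sample = [
        "Integrating genomics and proteomics in cancer research",
        "Metabolomics of plant stress responses",
        "Proteomics reveals new biomarkers",
        "A note on health economics",
    ]
    for term, n in count_omics(sample).most_common():
        print(term, n)
```

The stoplist matters more than it looks: a bare suffix match also catches everyday words, so any real count would need a longer exclusion list.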

Visualising Crime Hotspots in England and Wales using {ggmap}

June 24, 2013

Two weeks ago, I was looking for ways to make pretty maps for my own research project. A quick search led me to some very informative blog posts by Kim Gilbert, David Smith and Max Marchi. Eventually, I googled the excellent crime weather map exa...

Comparing the speed of pqR with R-2.15.0 and R-3.0.1

June 24, 2013

As part of developing pqR, I wrote a suite of speed tests for R. Some of these tests were used to show how pqR speeds up simple real programs in my post announcing pqR, and to show the speed-up obtained with helper threads in pqR on systems with multiple processor cores. However, most tests in...

Exploratory Data Analysis: Conceptual Foundations of Empirical Cumulative Distribution Functions

Introduction: Continuing my recent series on exploratory data analysis (EDA), this post focuses on the conceptual foundations of empirical cumulative distribution functions (CDFs); in a separate post, I will show how to plot them in R. (Previous posts in this series include descriptive statistics, box plots, kernel density estimation, and violin plots.) To give you...
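
For reference, here is the standard textbook definition of the empirical CDF. The notation is the usual one and not necessarily the post's. Given a sample $x_1, \dots, x_n$, the empirical CDF at a point $x$ is the fraction of observations less than or equal to $x$:

```latex
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}
```

It is a step function. It is 0 to the left of the smallest observation, jumps by $1/n$ at each observation (by $k/n$ at a value repeated $k$ times), and equals 1 from the largest observation onward.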

Merging Data — SAS, R, and Python

June 24, 2013

On analyticbridge, someone asked about moving an inner join out of Excel (where it was taking many minutes via VLOOKUP()) into some other package, and what kind of performance could be expected from other systems. Of the list ...
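
The post's own SAS, R, and Python code isn't reproduced in this excerpt. Below is a rough sketch of why a dedicated join tends to beat row-by-row lookups: a plain-Python hash join over lists of dicts. The function name `inner_join` and the sample tables are hypothetical. In practice, pandas users would more likely call `pandas.merge(left, right, on=..., how="inner")`.

```python
def inner_join(left, right, key):
    """Hash inner join of two lists of dicts on a shared key column.

    Indexes `right` once, then probes the index for each row of `left`,
    so the work is roughly linear in the table sizes (plus the output size),
    rather than scanning all of `right` for every row of `left`.
    """
    # Index right-hand rows by key; one key may map to several rows.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for lrow in left:
        # Left rows whose key never appears in `right` are dropped (inner join).
        for rrow in index.get(lrow[key], []):
            merged = dict(lrow)
            merged.update(rrow)  # on column-name clashes, right-hand values win
            joined.append(merged)
    return joined


if __name__ == "__main__":
    # Hypothetical example tables.
    orders = [
        {"customer_id": 1, "amount": 30},
        {"customer_id": 2, "amount": 15},
        {"customer_id": 4, "amount": 99},
    ]
    customers = [
        {"customer_id": 1, "name": "Ann"},
        {"customer_id": 2, "name": "Bo"},
        {"customer_id": 3, "name": "Cy"},
    ]
    for row in inner_join(orders, customers, "customer_id"):
        print(row)
```

Indexing the smaller of the two tables usually keeps memory use down.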

Rcpp 0.10.4

June 24, 2013

A new version of Rcpp is now on the CRAN network for GNU R; binaries for Debian have been uploaded as well. This release brings a fairly large number of fixes and improvements across a number of Rcpp features; see below for the detailed list. We a...

A beer recommendation system made with R

June 24, 2013

If you know a beer you like and want some recommendations for a style of beer to try, check out the yhat Beer Recommender. This neat little app is the product of a recommendation system built using the R language by the folks behind the yhat blog. It's based on about 1.5 million beer reviews from the Beer Advocate....

My Stat Bytes talk, with slides and code

June 24, 2013

On Thursday of last week I gave a short informal talk to Stat Bytes, the CMU Statistics department’s twice-monthly computing seminar.

Quick tricks for faster R code: Profiling to Parallelism

Abstract: I will present a grab bag of …
