## Data types part 4: Logical class

November 30, 2012
First, an update:  A commentator has asked me to post my code so that it is easier to practice the examples I show here.  It will take me a little bit of time to get all of my code for past posts well-documented and readable, but I have uploa...

## Finding a bright object

November 30, 2012
Finally, to return to the challenge I laid out in the first of this series on image manipulation in R: can we do anything as cool in R as can be done in Mathematica? Like, for example, this illustration of how to search images of the surface of Mars...

## edply: combining plyr and expand.grid

November 30, 2012
Here’s a code snippet I thought I’d share. Very often I find myself checking the output of a function f(a,b) for a lot of different values of a and b, which I then need to plot somehow. An example: here’s a function that computes the value of a sinusoidal function on a grid of points,
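
The excerpt stops before the code, so here is a minimal sketch of the general idea rather than the post's own implementation. It uses only base R (`expand.grid` plus `mapply`); the function `f` and the helper name `eval_grid` are hypothetical and chosen for illustration.

```r
# Hypothetical example function: a sinusoid in two parameters
# (an assumption, not the function from the original post).
f <- function(a, b) sin(a) * cos(b)

# Hypothetical helper: evaluate `fun` on every combination of the
# values listed in `vars`, returning one row per combination.
eval_grid <- function(vars, fun) {
  # Build the full grid of argument combinations.
  grid <- do.call(expand.grid, vars)
  # Call `fun` once per row; grid columns are matched to arguments by name.
  grid$value <- do.call(mapply, c(list(FUN = fun), as.list(grid)))
  grid
}

# 3 values of a times 2 values of b gives 6 rows.
res <- eval_grid(list(a = seq(0, pi, length.out = 3), b = c(0, pi)), f)
print(res)
```

The result is a plain data frame with columns `a`, `b` and `value`, so it can be handed straight to a plotting function. The post's title suggests the original builds on plyr instead; this base-R version only illustrates the grid-then-apply pattern.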

## Another Way to Access R from Python – PypeR

November 29, 2012
Unlike RPy2, PypeR provides another simple way to access R from Python, through pipes (http://www.jstatsoft.org/v35/c02/paper). This handy feature enables data analysts to do the data munging with Python and the statistical analysis with R by passing objects interactively between the two computing systems. Below is a simple demonstration of how to call R within Python

## Earthquakes Over the Past 7 Days

November 29, 2012
This is a brief example using the maps package in R to highlight a source of data. This is real-time data from the U.S. Geological Survey, showing the location of earthquakes with a magnitude of at least 1.0 in the lower 48 states.

    library(maps)
    library(maptools)
    library(rgdal)
    eq = read.table(file="http://earthquake.usgs.gov/earthquakes/catalogs/eqs7day-M1.txt", fill=TRUE, sep=",", header=T)
    plot.new()

## 2012-11 Generating Animation Sequence Descriptions

November 29, 2012
This report describes the animaker package for generating descriptions of animation sequences. An animation sequence is composed by combining atomic animations in series to create sequence animations or in parallel to create track animations. Functions are provided for manipulating animation …

## The tools in an R package developer’s toolbox

November 29, 2012
Yihui Xie is the creator of several popular R packages, including knitr, animation and cranvas. In an interview with The Setup, he shares some of the software and hardware he uses in his day-to-day work, including (of course) R: For programming and data analysis, I primarily use R since I'm a statistician. I have created a bunch of R...

## Shiny is the new Cool

November 29, 2012
Several of you will probably have tried out the new Shiny package brought to the table by the RStudio guys. This is just what I have been looking for and to my mind could provide a quantum leap in the use of R. There have been other packages addressing the need for web user interactivity

## Sorting Within Lattice Graphics in R

November 29, 2012
Default: By default, lattice sorts the observations by the axis values, starting at the bottom left. For example,

    library(lattice)
    colors = c("#1B9E77", "#D95F02", "#7570B3")
    dotplot(rownames(mtcars) ~ mpg, data = mtcars, col = colors, pch = 1)

produc...

## bigglm on your big data set in open source R, it just works – similar as in SAS

In a recent post by Revolution Analytics (link & link) benchmarking their closed-source generalized linear model approach against SAS, Hadoop and open source R, they seemed to point out that there is no 'easy' open source R solution for building a Poisson regression model on large datasets. This post is about...

## RStudio and Rcpp

November 29, 2012
Earlier this month a new version of the Rcpp package by Dirk Eddelbuettel and Romain François  was released to CRAN and today we’re excited to announce a new version of RStudio that integrates tightly with Rcpp. First though more about some exciting new features in Rcpp 0.10.1. This release includes Rcpp attributes, which are simple annotations that you add

## Save R objects, and other stuff

November 29, 2012
Yesterday, Christopher asked me how to store an R object in order to save some time when working on a project. First, download the csv file for searches related to some keyword, via http://www.google.com/trends/, for instance “sunglasses”. Recall that csv files store tabular data (numbers and text) in plain-text form, with comma-separated values (which is where the term csv comes from). Even...

## Confident package releases in R with crant

November 29, 2012
I recently released the new lambda.r package on CRAN for functional programming. This was my first new package in quite some …

## Hadley’s guide to high-performance R with Rcpp

November 28, 2012
Hadley Wickham has written a comprehensive tutorial for the Rcpp package, which makes it easy to create C++ code embedded in R programs. Hadley explains why you might want to do this in the introduction: Sometimes R code just isn't fast enough - you've used profiling to find the bottleneck, but there's simply no way to make the code...

## Hurricane Sandy Land Wind Speed and Kriging

November 28, 2012
NJ Hurricane Sandy Landfall Data: These data come from the National Climatic Data Center (NCDC). Using the above link will download all of the data collected by the NCDC on the day of Hurricane Sandy. The data can also be obtained directly from the source at http://cdo.ncdc.noaa.gov/qclcd/QCLCD. The purpose of this post is not a discussion

## So, What Are You? ..A Plant? ..An Animal? — Nope, I’m a Fungus!

November 28, 2012
Lately I had a list of about 1000 species names and I wanted to filter out only the plants as that is where I come from. I knew that Scott Chamberlain has put together the ritis package which obviously can do such things. However, I knew of ITIS before and was keen...

## Picking Lotto Numbers

November 28, 2012
There's not a lot you can do to increase your odds of winning the lottery tonight. With the PowerBall at \$500 million though, a lot of otherwise rational folks might be tempted into playing. For those of you newly tempted, it is important to remember a...

## Distribution of uptimes for high-performance computing systems

November 28, 2012
Computers break down every now and again, and this is a serious problem when an application needs to run on thousands of individual computers (nodes) plugged together; lots more hardware creates lots more opportunity for a failure that renders any subsequent calculations by working nodes possibly wrong. The solution is checkpointing; saving the state of each

## Images as Voronoi tesselations

November 28, 2012
This is probably the coolest-looking thing I’ve figured out how to do with raster images in R. Similar to (although not quite as impressive as) these images by Jeff Clark, I alter the simple k-means approach described in the previous post to...

## What Time Is It?

November 28, 2012
A common scenario that I run into is time and how to deal with it. I will often do a variety of summaries and analyses that need to be measured at different points in time. Whether I want to graph the data or review the results I need to be able to perform measurements relative

## Quick Shiny Demo – Exploring NHS Winter Sit Rep Data

November 28, 2012
Having spent a chunk of the weekend and a piece of yesterday trying to pull NHS Winter sitrep data into some sort of shape in Scraperwiki (described, in part, here: When Machine Readable Data Still Causes “Issues” – Wrangling Dates…), I couldn’t help myself last night and had a quick go at using RStudio’s

## Why [Not] Simulate?

November 27, 2012
We are in the “big data” era, in which massive amounts of data are made available daily by governments, institutions, and ordinary people like you and me, so the importance of simulated data for “frequentists”, like myself, seems to fade. After all, once we have access to “real data”, why bother with non-real data,

## I give up, I am embracing pie charts

November 27, 2012
Most statisticians know that pie charts are a terrible way to plot percentages. You can find explanations here, here, and here as well as the R help file for the pie function which states: Pie charts are a very bad …

## Using R in the Human Resources department

November 27, 2012
Ajay Ohri, author of R for Business Analytics, was recently interviewed about using R instead of Excel for human resources management. Human Resources is an increasingly analytics-driven field, with predictive modeling now used to prioritize applicants for interview and to organize workspaces, to give just two examples. As Ajay points out, R gets over the limitations of spreadsheets: There...

## Rcpp 0.10.1

November 27, 2012
A new Rcpp release, 0.10.1, arrived this morning on CRAN (and already has Windows binaries) and in Debian. This is a follow-up to the recent 0.10.0 release which extends the exciting new Rcpp-attributes and Rcpp-sugar work further, and as in a n...

## MCMC convergence assessment

November 27, 2012
Richard Everitt tweeted yesterday about a recent publication in JCGS by Rajib Paul, Steve MacEachern and Mark Berliner on convergence assessment via stratification. (The paper is free-access.) Since this is another clear interest of mine, I had a look at the paper on the train to Besançon. (And wrote this post as a result.) The

## Spatial Data visualization with R

November 27, 2012
I have published the first version of the code and main figures of the “Spatial Data” chapter of the forthcoming …

## OpenScoring: Open Source Scoring of PMML Models via REST

November 27, 2012
The other day I stumbled across an amazing PMML model API called jpmml. It's written in Java and supports PMML 4.1 (and older). Neural network, random forest, regression and tree PMML models can be consumed and used for scoring. I decide...

## How to: network animation with R and the iGraph package & Meaning in data viz

November 27, 2012
This article lists the steps I take to create a network animation in R, provides some example source code that you can copy and modify for your own work, and starts a discussion about programming and visualization as an interpretive approach in research. Before I start, take a look at this network animation created with