Keeping Track of an Evolving “Top N” Cutoff Threshold Value

April 26, 2015

In a previous post (Charts are for Reading), I noted how difficult it was to keep track of which times in an F1 qualifying session had made the cutoff as the session evolved. The problem can be stated as follows: in the first session, with 20 drivers competing, the 15 drivers with the…
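To make the idea concrete, here is a minimal Python sketch (the post itself works in R) of recomputing the cutoff after every lap. It assumes that lower times are better, that only each driver's personal best counts, and that the cutoff is the N-th fastest of those personal bests. The function name `evolving_cutoff` and the lap data are hypothetical.

```python
def evolving_cutoff(laps, n):
    """For each lap in order, yield (driver, time, cutoff), where cutoff is the
    n-th fastest personal best seen so far (None until n drivers have a time)."""
    best = {}  # driver -> personal best time in seconds
    for driver, time in laps:
        if driver not in best or time < best[driver]:
            best[driver] = time
        ranked = sorted(best.values())  # fastest first
        cutoff = ranked[n - 1] if len(ranked) >= n else None
        yield driver, time, cutoff


if __name__ == "__main__":
    # Hypothetical lap times (seconds); n=2 keeps the example small.
    laps = [("A", 92.1), ("B", 91.8), ("C", 93.0), ("A", 91.5), ("D", 92.4)]
    for driver, t, cut in evolving_cutoff(laps, n=2):
        print(driver, t, cut)
```

Re-sorting after every lap is cheap for a field of 20 drivers; a much larger field would justify a more incremental data structure.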

Doing quantitative archaeology with open source software

April 25, 2015

(This is a guest post by Ben Marwick, originally published on the ATOR blog.) This short post is written for archaeologists who frequently perform common data analysis and visualization tasks in Excel, SPSS or similar commercial packages. It was motivated by my recent observations at the Society for American Archaeology meeting in San Francisco, the largest annual meeting of archaeologists in the...

Unemployment of Europe in 2014 by NUTS 2 region

April 25, 2015

During the Christmas break I worked on some code to show unemployment by NUTS 2 region. At that point no 2014 data was available. When I noticed that the 2014 data was available, I dug up the code and plotted it again. Data and Code: As written, the code was made beginn...

Random Data Sets Quickly

April 24, 2015

This post will discuss a recent GitHub package I’m working on, wakefield, which generates random data sets. The post is broken into the following sections: Demo: 1.1 Random Variable Functions; 1.2 Random Data Frames; 1.3 Missing Values; 1.4 Default Data; …

Stochastic SIR Epidemiological Compartment Model

April 24, 2015

Introduction: This post is a simple introduction to Rcpp for disease ecologists, epidemiologists, or dynamical systems modelers, the sorts of folks who will benefit from a simple but fully working example. My intent is to provide a complete, self-contained introduction to modeling with Rcpp. My hope is that this model can be easily modified to run any dynamical simulation that has dependence on the...
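The post builds its model with Rcpp. As a language-neutral illustration only, here is a minimal Python sketch of one common stochastic SIR formulation, a discrete-time chain-binomial model; it is not necessarily the formulation the post uses. The function name, the parameters `beta`, `gamma` and `dt`, and the example values are assumptions.

```python
import math
import random


def stochastic_sir(s, i, r, beta, gamma, dt, steps, seed=None):
    """Chain-binomial SIR (hypothetical helper). Each step, every susceptible
    is infected with probability 1 - exp(-beta * I / N * dt) and every infected
    recovers with probability 1 - exp(-gamma * dt). Returns (S, I, R) per step."""
    rng = random.Random(seed)
    n = s + i + r  # closed population: N never changes
    history = [(s, i, r)]
    for _ in range(steps):
        p_inf = 1.0 - math.exp(-beta * i / n * dt)
        p_rec = 1.0 - math.exp(-gamma * dt)
        # Binomial draws written as sums of Bernoulli trials (stdlib only).
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        new_rec = sum(rng.random() < p_rec for _ in range(i))
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    traj = stochastic_sir(990, 10, 0, beta=0.3, gamma=0.1, dt=1.0, steps=100, seed=42)
    print("peak infected:", max(i for _, i, _ in traj))
```

Because infections and recoveries are drawn from the counts at the start of each step, S + I + R stays constant and no compartment can go negative.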

Dynamic analysis on outliers

April 24, 2015

Treating outliers. Introduction: Outliers are the extreme values of a variable. Depending on the model or requirement, it may be necessary to treat them, either by transforming or by deleting them. Variable “Income” distribution: This is the main variable in this example; it represents customers' income in $. We can observe that there are a few cases...
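As one concrete way to flag and then treat outliers, here is a Python sketch using Tukey's fences (values more than 1.5 × IQR beyond the quartiles). The post may use a different rule; the helper names `tukey_fences` and `treat_outliers` and the sample incomes are hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def treat_outliers(values, k=1.5, mode="cap"):
    """'cap' clamps values to the fences; 'drop' removes values outside them."""
    lo, hi = tukey_fences(values, k)
    if mode == "cap":
        return [min(max(v, lo), hi) for v in values]
    if mode == "drop":
        return [v for v in values if lo <= v <= hi]
    raise ValueError("mode must be 'cap' or 'drop'")


if __name__ == "__main__":
    incomes = [30, 32, 35, 36, 38, 40, 41, 45, 300]  # hypothetical, in $1000s
    print(tukey_fences(incomes))
    print(treat_outliers(incomes, mode="cap"))
```

Capping keeps every customer in the data set while limiting the influence of extreme incomes; dropping is simpler but discards rows.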

Back to basics: High quality plots using base R graphics

April 24, 2015

Today at the Davis R Users’ Group, Michael Koontz gave a tour de force lesson in using R’s base graphics capabilities to plot data. Get Michael’s excellent annotated script, which covers much more than we got to during our tutorial, here.

Dashboards in R with Shiny & Plotly

April 24, 2015

Shiny is an R package that allows users to build interactive web applications easily in R! Shiny apps involve two main components: a ui (user interface) script and a server script. The ui script controls the layout of the app, and the server script controls what the app does. In other words, the ui script creates

Blue period: Analyzing the color of paintings with R

April 24, 2015

While movies have been getting more orange with time, paintings have been going in the other direction. Paintings today are generally more blue than they were a few hundred years ago. The image above shows the color spectrum of almost 100,000 paintings created since 1800. Martin Bellander used R to create the image, by scraping images from the BBC YourPaintings...

Introducing shinyjs: perform common JavaScript operations in Shiny apps using plain R code

April 23, 2015

shinyjs is my second R package that managed to find its way past the CRAN review process. It lets you perform common useful JavaScript operations in Shiny applications without having to know any JavaScript. Demos: You can check out a demo Shiny app that lets you play around with some of the functionality that shinyjs makes available, or have a look at a...

scale acceleration

April 23, 2015

Kate Lee pointed me to a rather surprising inefficiency in MATLAB, exploited in Sylvia Frühwirth-Schnatter’s bayesf package: running a gamma simulation by rgamma(n,a,b) takes longer, sometimes much longer, than rgamma(n,a,1)/b, the latter taking advantage of the scale nature of b. I wanted to check on my own whether or not R faced the same
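The trick rests on a distributional identity: if X ~ Gamma(a, rate 1), then X/b ~ Gamma(a, rate b). In R, the third positional argument of rgamma is the rate, so rgamma(n,a,b) and rgamma(n,a,1)/b draw from the same distribution. The Python sketch below checks only that identity, not the timing question the post investigates. Note that Python's `random.gammavariate(alpha, beta)` takes a *scale* beta, so rate b corresponds to scale 1/b; the helper `compare_gamma` is hypothetical.

```python
import random
import statistics


def compare_gamma(a, b, n=100_000, seed=1):
    """Draw Gamma(shape=a, rate=b) two ways and return both sample means
    and both sample variances (theory: mean a/b, variance a/b**2)."""
    rng = random.Random(seed)
    direct = [rng.gammavariate(a, 1.0 / b) for _ in range(n)]   # rate b as scale 1/b
    rescaled = [rng.gammavariate(a, 1.0) / b for _ in range(n)]  # rate 1, then divide by b
    return (statistics.fmean(direct), statistics.fmean(rescaled),
            statistics.variance(direct), statistics.variance(rescaled))


if __name__ == "__main__":
    print(compare_gamma(2.0, 4.0))  # both means near 0.5, both variances near 0.125
```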

Course Profiles in ggplot2

April 23, 2015

Over on my other blog, Droppin’ The Hammer… where I journal my ultra running experience, I featured a novel ggplot2 plot for highlighting elevation gain and loss on running race course profiles. If you’ve never run a particular race you...

The new science journalism and open science

April 23, 2015

by Joseph Rickert. The New York Times is quietly changing the practice of science journalism. The Tuesday April 21, 2015 article: Ebola Lying in Wait, reports on "A growing body of scientific clues - some ambiguous, other substantive" that the Ebola virus may have lain dormant in West African rain forest for years before igniting last year's outbreak. In...

Discount for the Open Data Science conference (@Boston / May 30th)

April 23, 2015

This year’s Open Data Science conference takes place in Boston on May 30th and 31st at the Boston Convention Center. The conference features over 21 workshops and 72 presentations on the open source languages, tools, and topics around data science. R features prominently. The conference gives you the chance to hear from R contributors directly and meet them in person.  Our...

Comrades Marathon Finish Predictions

April 23, 2015

There are various approaches to predicting Comrades Marathon finishing times. Lindsey Parry, for example, suggests that you use two and a half…

Cascading style sheets for R plots (via the Rcssplot package)

April 23, 2015

This post is contributed by Tomasz Konopka. Comments are welcome. One of the great features of R is its capable graphics framework. In principle, the framework allows us to customize all aspects of the visual presentation of data. In practice, however, customization is rather tedious. For example, R’s own boxplot function has 17 custom arguments, not counting ...; stripchart has 20. Tweaking the default...

Parallel Simulation of Heckman Selection Model

April 22, 2015

One of the, if not the, fundamental problems in observational data analysis is the estimation of the value of the unobserved choice. If the \(i^{\text{th}}\) unit chooses the value of \(t\) on the basis of some factors \(\mathbf{x}_i\), which may include...

Supplementing your R package with a Shiny app

April 22, 2015

The R community is generally very fond of open-source-ness and the idea of releasing all code to the public. Writing packages has become such an easy experience now that Hadley's devtools is so powerful, and as a result there are new packages being released by useRs every single day. A good package needs to have two things: useful functionality, and...

A plot of co-authorships in my little corner of science

April 22, 2015

Here’s a mostly useless visualization of the collection of journal articles that sits in my reference database in Endnote. I deal mostly in marine biology, physiology, biomechanics, and climate change papers, with a few molecular/genetics papers thrown in here and there. The database has 3325 entries, 2 of which have ambiguous publication years and

Conjoint Analysis and the Strange World of All Possible Feature Combinations

April 22, 2015

The choice modeler looks over the adjacent display of cheeses and sees the joint marginal effects of the dimensions spanning the feature space: milk source, type, origin, moisture content, added mold or bacteria, aging, salting, packaging, price, and m...
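The “all possible feature combinations” of a conjoint design are the full factorial of the attribute levels, and its size is the product of the number of levels per attribute, which is why it grows so quickly. A small Python sketch; the four attributes and their levels below are hypothetical and only loosely follow the cheese example.

```python
from itertools import product
from math import prod

# Hypothetical attributes and levels, loosely inspired by the cheese display.
attributes = {
    "milk": ["cow", "goat", "sheep"],
    "moisture": ["soft", "semi-soft", "hard"],
    "aged": ["no", "yes"],
    "price": ["low", "medium", "high"],
}

# Every combination of one level per attribute (the full factorial design).
profiles = [dict(zip(attributes, combo)) for combo in product(*attributes.values())]

# The count equals the product of level counts: 3 * 3 * 2 * 3 = 54.
n_expected = prod(len(levels) for levels in attributes.values())

if __name__ == "__main__":
    print(len(profiles), n_expected)
    print(profiles[0])
```

Adding one more three-level attribute would triple the count, which is why practical conjoint studies usually show respondents only a fraction of these profiles.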

Time Series Graphs & Eleven Stunning Ways You Can Use Them

April 22, 2015

Many graphs use a time series, meaning they measure events over time. William Playfair (1759–1823) was a Scottish economist and pioneer of this approach. Playfair invented the line graph. The graph below, one of his most famous, depicts ho...

Introducing the htmlTable-package

April 22, 2015

My htmlTable-function has perhaps been one of my most successful projects. I developed it in order to get tables matching those available in top medical journals. As the function has grown I've decided to separate it from my Gmisc-package into a separate package, and at the time of writing this I've just released the 1.3 version....

A simple explanation of rejection sampling in R

April 22, 2015

The central quantity in Bayesian inference, the posterior, usually cannot be calculated analytically and instead needs to be estimated by numerical integration, which is typically done with a Monte Carlo algorithm. The three main algorithm classes for doing so are rejection sampling, Markov-Chain Monte Carlo (MCMC) sampling, and Sequential Monte Carlo (SMC) sampling. I have previously given…
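Rejection sampling, the first of these classes, can be sketched in a few lines of Python with a uniform proposal on [0, 1]: propose x, then accept it with probability f(x)/M, where M bounds the target f. The target here, f(x) = x(1 - x)^4, is an unnormalized Beta(2, 5) density whose maximum M = f(0.2) serves as the envelope; the target and the names are illustrative assumptions, not taken from the post.

```python
import random
import statistics


def target(x):
    """Unnormalized Beta(2, 5) density on [0, 1]."""
    return x * (1.0 - x) ** 4


M = target(0.2)  # maximum of the target, attained at x = 1/5


def rejection_sample(n, seed=0):
    """Return n accepted draws and the empirical acceptance rate."""
    rng = random.Random(seed)
    samples = []
    tries = 0
    while len(samples) < n:
        x = rng.random()  # propose from Uniform(0, 1)
        u = rng.random()
        tries += 1
        if u * M <= target(x):  # accept with probability target(x) / M
            samples.append(x)
    return samples, n / tries


if __name__ == "__main__":
    draws, rate = rejection_sample(20_000)
    print(statistics.fmean(draws), rate)  # mean near 2/7, rate near 0.41
```

The acceptance rate equals the area under f divided by the area under the envelope (here (1/30)/M, about 0.41), so the tighter the envelope, the fewer proposals are wasted.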

Blowing Away the Competition

April 22, 2015

In February I embarked on a mission to speed up R, and I’m very pleased with the results so far. I redesigned the internal string cache, symbol table, and environments by using a somewhat obscure data structure called an Array Hash. It’s ba...

Microsoft hiring engineers for R projects

April 22, 2015

Are you a talented software engineer who would like to build out the R ecosystem and help more companies access the power of R? Microsoft (Revolution Analytics' parent) is hiring a new team to do just that: Our mission is to empower enterprises to easily and cost-effectively build high-scale analytics solutions leveraging R. Exponential growth has transformed data into...

R’s plot function, the 1970’s retro look is not cool any more

April 22, 2015

Casual users of a system want to learn a few simple rules that enable them to get most things done. Many languages have a design principle of only providing one way of doing things. Members of one language family are known for providing umpteen different ways of doing something and R is no exception. R

RcppArmadillo 0.5.000.0

April 22, 2015

A new major version 5.000 of Armadillo was released by Conrad a couple of days ago. Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. This version brings several new functions for sparse matrices, and automagically...

Hash Table Performance in R: Part IV

April 21, 2015

In the last post I introduced the package envestigate that provides the hash table structure and interesting statistics associated with an R environment. Now I want to show you some performance characteristics of the R environment as a hash table. I’...

R for more powerful clustering

April 21, 2015

by Vidisha Vachharajani, Freelance Statistical Consultant. R showcases several useful clustering tools, but the one that seems particularly powerful is the marriage of hierarchical clustering with a visual display of its results in a heatmap. The term “heatmap” is often confusing, making most wonder – which is it? A "colorful visual representation of data in a matrix" or "a...
