## Extract values from numerous rasters in less time

May 7, 2015
These days I was working with a Shiny app for which computation time is a big problem. Basically, this app takes some coordinates, extracts values from 1036 rasters for these coordinates, and makes some computations. As far as I can tell (and please correct me if I'm wrong!), there are two ways of doing this task: 1) load all the...

## Comparing data frames, data.table and dplyr with random walks

May 6, 2015
Arthur Charpentier was trying to solve an interesting problem with R: given this data set of random walks in the 2-D plane, what is the likely origin of a pathway that ends in the black circle below? It's pretty easy to generate random data like this with a few lines of code in R. And with 2 million trajectories...

## TidyR Challenge: Help Me Do My Job

May 6, 2015
Last week I was handed a drug prescription data set and asked to create some interesting graphics. But before I could even get to the fun part, I was faced with actually transforming the set into something that ggplot2 could read. Obviously I can’t share the data, but Tyler Rinker has created a fantastic package called wakefield that...

## useR! 2015 conference in Aalborg

May 6, 2015
The annual R conference bringing together users and developers from academia and industry is going to be held in Aalborg, Denmark, this summer, 1-3 July. The day prior to the conference, 16 R tutorials are offered free of charge to the participants. The list of topics includes dplyr, Tessera, Bioconductor, Grid graphics and RHadoop. In addition to our six...

## RStudio v0.99 Preview: More Editor Enhancements

May 6, 2015
We’ve blogged previously about various improvements we’ve made to the source editor in RStudio v0.99 including enhanced code completion, snippets, diagnostics, and an improved Vim mode. Besides these larger scale features we’ve made lots of smaller improvements that we also wanted to highlight. You can try out all of these features now in the RStudio

## EU Life Quality Geo Report

May 6, 2015
Living longer, living better? It's as important to measure the quality of life as its length. Analyzing data from Eurostat, which contains the following two variables: 1- Healthy life years: Is a health expectancy indicator which com...

## corrected MCMC samplers for multivariate probit models

May 5, 2015
“Moreover, IvD point out an error in Nobile’s derivation which can alter its stationary distribution. Ironically, as we shall see, the algorithms of IvD also contain an error.”  Xiyun Jiao and David A. van Dyk arXived a paper correcting an MCMC sampler and R package MNP for the multivariate probit model, proposed by Imai and

## choroplethr v3.1.0: Better Summary Demographic Data

May 5, 2015
Today I am happy to announce that choroplethr v3.1.0 is now on CRAN. You can get it by typing the following from an R console: install.packages("choroplethr") This version adds better support for summary demographic data for each state and county in the US. The data is in two data.frames and two functions. The data.frames are:

## stringr 1.0.0

May 5, 2015
I’m very excited to announce the 1.0.0 release of the stringr package. If you haven’t heard of stringr before, it makes string manipulation easier by: Using consistent function and argument names: all functions start with str_, and the first argument is always the input string. This makes stringr easier to learn and easy to use

## Data Science in HR

May 5, 2015
by Joseph Rickert Last year in a post on interesting R topics presented at the JSM I described how data scientists in Google's human resources department were using R and predictive analytics to better understand the characteristics of its workforce. Google may very well have done the pioneering work, but predictive analytics for HR applications is going mainstream. In...

## Updates to R package emdatr: More than 21000 Natural Disasters since 1900

May 5, 2015
The International Disaster Database (EMDAT) from the Center for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses of human life and property resulting from select natural and man-made disasters....

## Predicting events, when they haven’t happened yet

May 5, 2015
Suppose you have to predict the probabilities of events which haven't happened yet. How do you do this? Here is an example from the 1950s when Longley-Cook, an actuary at an insurance company, was asked to price the risk for a mid-air collision of two p...

## Clusters May Be Categorical but Cluster Membership Is Not All-or-None

May 4, 2015
Very early in the study of statistics and R, we learn that random variables can be either categorical or continuous. Regrettably, we are forced to relearn this distinction over and over again as we debug error messages produced by our code (e.g., ...

## RcppAnnoy 0.0.6

A few days ago, Erik released a new version of his Annoy library -- a small, fast, and lightweight C++ template header library for approximate nearest neighbours -- which now no longer requires Boost. While I don't mind Boost (actually, quite the op...

## take those hats off [from R]!

May 4, 2015
This is presumably obvious to most if not all R programmers, but I became aware today of a hugely (?) delaying tactic in my R codes. I was working with Jean-Michel and Natesh and when coding an MCMC run I was telling them that I usually preferred to code

## Working with “large” datasets, with dplyr and data.table

May 4, 2015
A few months ago, I was doing some training on data science for actuaries, and I started to get interesting, puzzling questions. For instance, Fleur was working on telematic data, and she’s been challenging my (rudimentary) knowledge of R. As claimed by Donald Knuth, “we should forget about small efficiencies, say about 97% of the time: premature optimization is...

## Call R and Python from base SAS

May 4, 2015
Since 2009, it has been possible to call R from SAS programs. However, this integration requires IML, an add-on matrix-object language for SAS which isn't available with all SAS installations and is separate from the standard SAS PROC execution model. Now, engineers at SAS have shared a method of calling R, Python and other open-source tools using the Java...

## using GOSemSim to rank proteins obtained by co-IP

May 4, 2015
Co-IP is usually used to identify interactions among specific proteins. It is widely used in detecting protein complexes. Unfortunately, an identified protein may not be an interactor, and can sometimes be a background contaminant. Ranking proteins can help us to focus a study on a few high-quality candidates for subsequent interaction investigation. My R package GOSemSim has been...

## Geomorph beta in development (2.1.5)

May 3, 2015
Dear geomorph users, We've been busy adding some new functions to the forthcoming v.2.1.5, currently in beta stage and available on GitHub (installed using: devtools::install_github("EmSherratt/geomorph",ref = "Develop")). Users be aware that ...

## dplyr Tutorial: verbs + split-apply

May 3, 2015
At a recent Saint Louis R users meeting I had the pleasure of giving a basic introduction to the awesome dplyr R package. For me, data analysis ubiquitously involves splitting the data based on a grouping variable and then applying some function to the subsets, or what is termed split-apply (typically split-lapply-apply). Having personally recently incorporated

## Cohort Analysis with Heatmap

Previously I shared the data visualization approach for descriptive analysis of the progress of cohorts with the “layer-cake” chart (part I and part II). In this post, I want to share another interesting visualization that can be used not only for descriptive analysis but would also be more helpful for analyzing a large number of cohorts....

## Introducing Radiant: A shiny interface for R

May 3, 2015
Radiant is a platform-independent browser-based interface for business analytics in R, based on the Shiny package. Key features Explore: Quickly and easily summarize, visualize, and analyze your data Cross-platform: It runs in a browser on Windows, Mac, and Linux Reproducible: Recreate results at any time and share work with others as a state file or an

## Survival Analysis With Generalized Additive Models : Part IV (the survival function)

May 2, 2015
The ability of PGAMs to estimate the log-baseline hazard rate endows them with the capability to be used as smooth alternatives to the Kaplan-Meier curve. If we assume for the sake of simplicity that there are no proportional covariates in the PGAM regression, then the quantity modeled corresponds to the log-hazard of the survival

## Update to Introduction to programming econometrics with R

May 2, 2015
This semester I taught a course on applied econometrics with the R programming language. For this, I created a document that I gave to my students and shared online. This is the kind of document I would have liked to read when I first started using R. I already had some programming experience in C and Pascal but this...

## Survival Analysis With Generalized Additive Models : Part III (the baseline hazard)

May 2, 2015
In the third part of the series on survival analysis with GAMs we will review the use of the baseline hazard estimates provided by this regression model. In contrast to the Cox model, the log-baseline hazard is estimated along with other quantities (e.g. the log hazard ratios) by the Poisson GAM (PGAM) as: In the

## Survival Analysis With Generalized Models: Part II (time discretization, hazard rate integration and calculation of hazard ratios)

May 2, 2015
In the second part of the series we will consider the time discretization that makes the Poisson GAM approach to survival analysis possible. Consider a set of s individual observations at times , with censoring indicators assuming the value of 0 if the corresponding observation was censored and 1 otherwise. Under the assumption of non-informative

## Rcpp 0.11.6

The new release 0.11.6 of Rcpp arrived on the CRAN network for GNU R yesterday; the corresponding Debian package has also been uploaded. Rcpp has become the most popular way of enhancing GNU R with C++ code. As of today, 373 packages on CRAN depend o...

## RcppArmadillo 0.5.100.1.0

A new minor release 5.100.1 of Armadillo was released by Conrad yesterday. Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. Our corresponding RcppArmadillo release 0.5.100.1.0 also reached CRAN and Debian yesterday. See...

## Should I use premium Diesel? Result: No

May 2, 2015
A while ago I had a post: 'Should I use premium Diesel? Setup'. Since that time the data has been acquired. This post describes the results. Data: Data is registered by me in 2014 and 2015. 2014 has standard Diesel, while 2015 has premium. Both are fr...

