## Updates to R package emdatr: More than 21000 Natural Disasters since 1900

May 5, 2015

The International Disaster Database (EMDAT) from the Center for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses of human life and property resulting from select natural and man-made disasters....

## Predicting events, when they haven’t happened yet

May 5, 2015

Suppose you have to predict the probabilities of events which haven't happened yet. How do you do this? Here is an example from the 1950s when Longley-Cook, an actuary at an insurance company, was asked to price the risk for a mid-air collision of two p...

## Clusters May Be Categorical but Cluster Membership Is Not All-or-None

May 4, 2015

Very early in the study of statistics and R, we learn that random variables can be either categorical or continuous. Regrettably, we are forced to relearn this distinction over and over again as we debug error messages produced by our code (e.g., ...

## RcppAnnoy 0.0.6

A few days ago, Erik released a new version of his Annoy library -- a small, fast, and lightweight C++ template header library for approximate nearest neighbours -- which now no longer requires Boost. While I don't mind Boost (actually, quite the op...

## take those hats off [from R]!

May 4, 2015

This is presumably obvious to most if not all R programmers, but I became aware today of a hugely (?) delaying tactic in my R code. I was working with Jean-Michel and Natesh, and when coding an MCMC run, I was telling them that I usually preferred to code

## Working with “large” datasets, with dplyr and data.table

May 4, 2015

A few months ago, I was doing some training on data science for actuaries, and I started to get interesting, puzzling questions. For instance, Fleur was working on telematics data, and she’s been challenging my (rudimentary) knowledge of R. As claimed by Donald Knuth, “we should forget about small efficiencies, say about 97% of the time: premature optimization is...

## Call R and Python from base SAS

May 4, 2015

Since 2009, it has been possible to call R from SAS programs. However, this integration requires IML, an add-on matrix-object language for SAS which isn't available with all SAS installations and is separate from the standard SAS PROC execution model. Now, engineers at SAS have shared a method of calling R, Python and other open-source tools using the Java...

## using GOSemSim to rank proteins obtained by co-IP

May 4, 2015

Co-IP is usually used to identify interactions among specific proteins, and it is widely used for detecting protein complexes. Unfortunately, an identified protein may not be a true interactor; sometimes it is a background contaminant. Ranking proteins can help us focus a study on a few high-quality candidates for subsequent interaction investigation. My R package GOSemSim has been...

## Geomorph beta in development (2.1.5)

May 3, 2015

Dear geomorph users, we've been busy adding some new functions to the forthcoming v.2.1.5, currently in beta and available on GitHub (install using `devtools::install_github("EmSherratt/geomorph", ref = "Develop")`). Users be aware that ...

## dplyr Tutorial: verbs + split-apply

May 3, 2015

At a recent Saint Louis R users meeting I had the pleasure of giving a basic introduction to the awesome dplyr R package. For me, data analysis ubiquitously involves splitting the data based on a grouping variable and then applying some function to the subsets, a pattern termed split-apply (typically split-lapply-apply). Having personally recently incorporated

## Cohort Analysis with Heatmap

Previously I shared a data visualization approach for the descriptive analysis of cohort progress, the “layer-cake” chart (part I and part II). In this post, I want to share another interesting visualization that can also be used for descriptive analysis but is more helpful when analyzing a large number of cohorts....

## Introducing Radiant: A shiny interface for R

May 3, 2015

Radiant is a platform-independent, browser-based interface for business analytics in R, based on the Shiny package. Key features:

- Explore: Quickly and easily summarize, visualize, and analyze your data
- Cross-platform: It runs in a browser on Windows, Mac, and Linux
- Reproducible: Recreate results at any time and share work with others as a state file or an

## Survival Analysis With Generalized Additive Models : Part IV (the survival function)

May 2, 2015

The ability of PGAMs to estimate the log-baseline hazard rate endows them with the capability to be used as smooth alternatives to the Kaplan-Meier curve. If we assume, for the sake of simplicity, that there are no proportional covariates in the PGAM regression, then the quantity modeled corresponds to the log-hazard of the survival

## Update to Introduction to programming econometrics with R

May 2, 2015
By

This semester I taught a course on applied econometrics with the R programming language. For this, I created a document that I gave to my students and shared online. This is the kind of document I would have liked to read when I first started using R. I already had some programming experience in C and Pascal but this...

## Survival Analysis With Generalized Additive Models : Part III (the baseline hazard)

May 2, 2015

In the third part of the series on survival analysis with GAMs, we will review the use of the baseline hazard estimates provided by this regression model. In contrast to the Cox model, the log-baseline hazard is estimated along with other quantities (e.g. the log hazard ratios) by the Poisson GAM (PGAM) as: In the

## Survival Analysis With Generalized Models: Part II (time discretization, hazard rate integration and calculation of hazard ratios)

May 2, 2015

In the second part of the series we will consider the time discretization that makes the Poisson GAM approach to survival analysis possible. Consider a set of s individual observations at times , with censoring indicators assuming the value of 0 if the corresponding observation was censored and 1 otherwise. Under the assumption of non-informative

## Rcpp 0.11.6

The new release 0.11.6 of Rcpp arrived on the CRAN network for GNU R yesterday; the corresponding Debian package has also been uploaded. Rcpp has become the most popular way of enhancing GNU R with C++ code. As of today, 373 packages on CRAN depend o...

## RcppArmadillo 0.5.100.1.0

A new minor release 5.100.1 of Armadillo was released by Conrad yesterday. Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use with a syntax deliberately close to Matlab. Our corresponding RcppArmadillo release 0.5.100.1.0 also reached CRAN and Debian yesterday. See...

## Should I use premium Diesel? Result: No

May 2, 2015

A while ago I had a post: 'Should I use premium Diesel? Setup'. Since that time the data has been acquired. This post describes the results. Data: the data were recorded by me in 2014 and 2015. The 2014 data are for standard Diesel, while the 2015 data are for premium. Both are fr...

## Revolution R Open 8.0.3 now available

May 1, 2015

Revolution R Open 8.0.3 is now available for download for Windows, OS X, Red Hat, Ubuntu and OpenSUSE. This release includes several updates: it upgrades RRO to the R 3.1.3 engine, which adds several new features to the R language, adds support for Ubuntu 15.04, and updates the checkpoint package for reproducibility. RRO is designed to work with...

## RStudio v0.99 Preview: Graphviz and DiagrammeR

May 1, 2015

Soon after the announcement of htmlwidgets, Rich Iannone released the DiagrammeR package, which makes it easy to generate graph and flowchart diagrams using text in a Markdown-like syntax. The package is very flexible and powerful, and includes:

- Rendering of Graphviz graph visualizations (via viz.js)
- Creating diagrams and flowcharts using mermaid.js
- Facilities for mapping R objects into graphs, diagrams, and flowcharts

## Survival Analysis With Generalized Additive Models : Part I (background and rationale)

May 1, 2015

After a really long break, I will resume my blogging activity. It is actually a full circle for me, since one of the first posts that kick-started this blog matured enough to be published in a peer-reviewed journal last week. In the next few posts I will use the R code included to demonstrate the

## Shiny: Officer Involved Shootings

May 1, 2015

US Officer Involved Shootings Mar-Apr 2015 with Shiny. Now everyone can be a data analyst with RStudio’s Shiny package. Fellow R programmer and Las Vegas import, Steve Wells, has created an R Markdown report that shows off some of the features of this dynamic framework. Using data derived from the Gun Violence Archive and Google Maps, interested users can manipulate this data using...

## rstanmulticore: A cross-platform R package to automatically run RStan MCMC chains in parallel

May 1, 2015

*** This work has been supported by a grant from the Spencer Foundation (#201400002). The views expressed are those of the author and do not necessarily reflect those of the Spencer Foundation. *** It seems that the heir to WinBUGS is Stan. With Stan, reasonably complex Bayesian models can be expressed in a compact way

## How large vectors in R might be stored compactly

April 30, 2015

Vectors in R can currently have elements of two sizes: 8-byte double-precision floating-point elements for `numeric` vectors, or 4-byte elements for `integer` or `logical` vectors. You can also have vectors whose elements are 1-byte `raw` values, but these raw vectors don’t support negative numbers or NA values, so they aren’t suitable for general use. It seems that lots of

## Upcoming talks about jsonlite and mongolite

April 30, 2015

This summer I will be giving an invited talk at the annual French R Meeting in Grenoble as well as a shorter talk at UseR 2015 in Aalborg. The presentations will feature some recent R packages in the json/web space (curl, jsonlite, mongolite, V...

## Dockerizing a Shiny App

April 30, 2015

After a long pause of more than four months, I am finally back to posting here. Unfortunately, many commitments kept me from posting, but now that I'm back, I have changed the deployment (this blog now runs entirely within a Docker container, along with some other cool things I intend to post about going forward) and written this post. 1. The post