I never got around to polishing my Small Area Estimation (SAE) “101” tutorial materials that I promised a while ago. So here they are, though still unedited and not as clean / self-explanatory as I’d like. The slides introduce a …

A new version of DeployR, the server-based framework that provides simple and secure R integration for application developers, is now available. (If you're new to DeployR, take a look at the DeployR Overview or download the white paper, Using DeployR to Solve the R Integration Problem.) This update brings several new features, including: New R Session Process Controls, which...

A friend came to me with a question. The original question was a little complicated, but in essence it could be explained in terms of the familiar urn problem. So, here's the problem: you have an urn with 50 white balls and 9 black balls. The black balls are individually numbered. Balls are drawn from...

Why an R Tutorial on Reading and Importing Excel Files into R is necessary: As most of you know, Excel is a spreadsheet application developed by Microsoft. It is an easily accessible tool for organizing, analyzing and storing data in tables and is widely used in many different application fields all over the world.

Introduction The latest version (4.9.0) of the PROJ.4 library is being incorporated into the development version of the oce R package. The work is not finalized yet, but I thought it might be useful to share an early version of the test suite, so people could get an idea of the upcoming capabilities. Note that some projections work quite poorly in oce at the...

by Joseph Rickert I have written several posts about the Parallel External Memory Algorithms (PEMAs) in Revolution Analytics’ RevoScaleR package, most recently about rxBTrees(), but I haven’t said much about rxExec(). rxExec() is not itself a PEMA, but it can be used to write parallel algorithms. Pre-built PEMAs such as rxBTrees(), rxLinMod(), etc. are inherently parallel algorithms designed...

Yesterday, I delivered a talk on "Interpretation of Results of Clinical Research" at the Annual Alumni Meet of the Hematology Department of the All India Institute of Medical Sciences, New Delhi, India. Here is the link to the talk: https://github.com/sumprain/bl...

Within geomorph are several functions that perform analysis of variance (ANOVA), including procD.lm(), procD.pgls(), advanced.procD.lm(), pairwiseD.test(), pairwise.slope.test(), trajectory.analysis(), bilat.symmetry(), and plotAllometry(). Inherent in all of these functions is a common philosophy for ANOVA (although other philosophies exist). The geomorph ANOVA philosophy is that: (1) resampling (randomization) procedures are used to generate empirical sampling distributions to assess significance of effects, (2) effect sizes are estimated as standard deviates from such...

Motivation I often use Python and matplotlib for exploring measurement data (from e.g. accelerometers), even if I use R for the actual analysis. The reason is that I like to be able to flexibly zoom into different parts of the plot using the mouse and this works well for me with matplotlib. So I decided to try to call matplotlib from R using...

Parallelizing Random Forests in R with BatchJobs and OpenLava By: Gord Sissons and Feng Li In his series of blogs about machine learning, Trevor Stephens focuses on a survival model from the Titanic disaster and provides a tutorial explaining how decision trees tend to over-fit models yielding anomalous predictions. How do we build a better

Revolution R Open 8.0.2 is now available from MRAN. If you're already using Revolution R Open, you won't find any major changes. This release fixes a couple of bugs, includes a new version of the checkpoint package, and splits the installation into two parts on Windows and Linux (with a separate installer for the multithreaded MKL Math libraries, which...

This is the third post about LifeCycle Grids. You can find the first post, about the purpose of LifeCycle Grids and the A-Z process for creating and visualizing them with the R programming language, here. Lastly, here is the second post about adding monetary metrics (customer lifetime value – CLV – and customer acquisition cost – CAC) to...

The cost of fuel in South Africa (and I imagine pretty much everywhere else) is a contentious topic. It varies from month to month and, although it is clearly related to the price of crude oil and the exchange rate, various other forces play an influential role. According to the Department of Energy the majority...

I was asked recently to look at some R code which performs “embarrassingly parallel” computations (the same function, multiple times, different parameters) and see whether I could modify it to run on one of our high-performance computing clusters. The machine has 63 virtual compute nodes and uses the TORQUE batch queue system to allocate nodes
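As a rough illustration of what "embarrassingly parallel" means here (the same function, run many times with different parameters), the sketch below uses R's built-in parallel package on a single machine. It is not the TORQUE cluster setup from the post: the function simulate_once(), the parameter vector rates, and the choice of two workers are all assumptions made for the example.

```r
library(parallel)

# Hypothetical task: one independent run per parameter value.
simulate_once <- function(rate) {
  mean(rexp(1000, rate = rate))
}

# Hypothetical parameter grid; each element is processed independently.
rates <- c(0.5, 1, 2, 4)

# Start two local worker processes (an assumption; adjust to the machine).
cl <- makeCluster(2)
clusterSetRNGStream(cl, 1)   # reproducible random streams across workers

# Apply the same function to every parameter, spread over the workers.
results <- parLapply(cl, rates, simulate_once)

stopCluster(cl)
```

Because the runs do not communicate with one another, the same pattern carries over to a batch queue in principle: each parameter value becomes one job.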

The example below shows how to estimate a simple univariate Poisson time series model with the tscount package. While the model estimation is straightforward and yields very similar parameter estimates to the ones generated with the acp package (https://statcompute.wordpress.com/2015/03/29/autoregressive-conditional-poisson-model-i), the prediction mechanism is a bit tricky. 1) For the in-sample and the 1-step-ahead predictions: yhat_

Yahoo Labs has just released an interesting new data set useful for research on detecting anomalies (or outliers) in time series data. There are many contexts in which anomaly detection is important. For Yahoo, the main use case is in detecting unusual traffic on Yahoo servers. The data set comprises real traffic to Yahoo services, along

FigTree is designed for viewing BEAST output, as demonstrated by their example data. BEAST output is well supported by ggtree, and it's easy to reproduce such a tree view. ggtree supports parsing BEAST output with the read.beast function. We can visualize the tree directly by using the ggtree function. Since this is a time-scaled tree, we can set the parameter time_scale...

I saw a fly-by #rstats mention of more airplane accident data on — of all places — LinkedIn (email) today which took me to a GitHub repo by @philjette. It seems there’s a web site (run by what seems to be a single human) that tracks plane crashes. Here’s a tweet from @philjette announcing it:

The renamer Package: Tired of the disparate naming systems in R? Then this is the package for you. Installing the package: The package is located in my drat. To install: install.packages("renamer", repos="http://csgillespie.github.io/drat", type="source"); or, if you have drat installed: drat::addRepo("csgillespie"); install.packages("renamer", type="source"). The source is available on my github page. Example: The CamelCaseR. If have an

A recursive programming Le Monde mathematical puzzle: Given n tokens with 10≤n≤25, Alice and Bob play the following game: the first player draws an integer 1≤m≤6 at random. This player can then take 1≤r≤min(2m,n) tokens. The next player is then free to take 1≤s≤min(2r,n-r) tokens. The player taking the last tokens is the winner. There is
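The game rules stated above lend themselves to a short recursion. The sketch below is not the solution from the post (the puzzle's actual question is cut off in this excerpt). It only checks, under the rules as quoted, whether the player about to move can force a win. The names can_win(), first_player_win_prob() and memo are hypothetical, and the sketch assumes the "at most twice the previous take" cap keeps applying on every later turn.

```r
# Cache of already-solved positions, keyed by "tokens cap".
memo <- new.env()

# Can the player to move force a win with n tokens left,
# when allowed to take between 1 and cap tokens?
can_win <- function(n, cap) {
  top <- min(cap, n)
  if (top == n) return(TRUE)            # can take every remaining token
  key <- paste(n, top)
  if (!is.null(memo[[key]])) return(memo[[key]])
  res <- FALSE
  for (r in seq_len(top)) {
    # After taking r, the opponent faces n - r tokens with cap 2r.
    if (!can_win(n - r, 2 * r)) {
      res <- TRUE
      break
    }
  }
  memo[[key]] <- res
  res
}

# Share of the six equally likely draws m for which the first player,
# allowed to take up to 2m tokens, has a winning strategy.
first_player_win_prob <- function(n) {
  mean(vapply(1:6, function(m) can_win(n, 2 * m), logical(1)))
}
```

For example, first_player_win_prob(10) gives the fraction of draws m that leave the first player in a winning position when n = 10, assuming both players play optimally.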

Together with General Assembly, DataCamp created a free set of videos on the fundamentals of R. Discover it now! In a series of short videos, the team behind DataCamp teaches you about the fundamentals of R, an open-source statistical programming language. Use this course to understand the advantages and disadvantages of R, and discover at the same...

You can find registration information and agenda details (as they become available) on the conference website. Or you can go directly to the registration page. Note that there's an early-bird registration deadl...

by Sherri Rose, Assistant Professor of Health Care Policy, Harvard Medical School. Targeted learning methods build machine-learning-based estimators of parameters defined as features of the probability distribution of the data, while also providing influence-curve or bootstrap-based confidence intervals. The theory offers a general template for creating targeted maximum likelihood estimators for a data structure, nonparametric or semiparametric statistical model,...

The genomation package is a toolkit for annotation and visualization of various genomic data. The package is currently in the development version of BioC. It supports the analysis of high-throughput data, including bisulfite sequencing data. Here, we will visualize the distribution of CpG methylation around promoters and their locations within gene structures on human chromosome 3. Heatmap and...

The announcement below just went to the R-SIG-Finance list. More information is as usual at the R / Finance page. Registration for R/Finance 2015 is now open! The conference will take place on May 29 and 30, at UIC in Chicago. Building on the success of the previous conferences in 2009-2014, we expect more than 250 attendees from around...

Following my previous post, François (aka @FrancoisKeck) posted a comment mentioning another package I could use to get an interactive map, the rleafmap package. And the heatmap was easy to include here. This time, we do not use openstreetmap. The first part, getting the data, is still the same: > require(rleafmap) > library(sp) > library(rgdal) > library(maptools) >...