DeployR 7.4 released

April 3, 2015

A new version of DeployR, the server-based framework that provides simple and secure R integration for application developers, is now available. (If you're new to DeployR, take a look at the DeployR Overview or download the white paper, Using DeployR to Solve the R Integration Problem.) This update brings several new features, including: New R Session Process Controls, which...

Read more »

Bags, Balls and the Hypergeometric Distribution

April 3, 2015

A friend came to me with a question. The original question was a little complicated, but in essence it could be explained in terms of the familiar urn problem. So, here's the problem: you have an urn with 50 white balls and 9 black balls. The black balls are individually numbered. Balls are drawn from...
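As a rough illustration of the urn setup in this excerpt, the sketch below uses base R's dhyper() to compute the distribution of the number of black balls in a sample drawn without replacement. The sample size of 10 is an assumption made purely for illustration: the excerpt is cut off before it says how the balls are drawn, so this is not necessarily the calculation the original post performs.

```r
# Urn from the excerpt: 50 white balls and 9 black balls.
white <- 50
black <- 9
draws <- 10  # assumption: the excerpt does not state the number of draws

# Possible counts of black balls in the sample.
n_black <- 0:min(black, draws)

# dhyper(x, m, n, k): probability of x "success" balls when k balls are
# drawn without replacement from an urn with m success and n other balls.
p_black <- dhyper(n_black, m = black, n = white, k = draws)

# Probability that the sample contains at least one black ball.
p_at_least_one <- 1 - p_black[1]

print(data.frame(n_black = n_black, probability = round(p_black, 4)))
print(p_at_least_one)
```

Changing `draws` (or swapping in phyper() for cumulative probabilities) adapts the sketch to other versions of the question.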

Read more »

R Tutorial on Reading and Importing Excel Files into R

April 2, 2015

Why an R Tutorial on Reading and Importing Excel Files into R is necessary: As most of you know, Excel is a spreadsheet application developed by Microsoft. It is an easily accessible tool for organizing, analyzing and storing data in tables, and it is widely used in many different application fields all over the world...

Read more »

WW1 Monthly Casualties by Fronts and Belligerents

April 2, 2015

I've been reading a few books on WW1 and wanted to see a time series plot of battle casualties and POWs by country to get a better understanding of how the conflict fits together. I couldn't find any database for military casualties in WW1 but Wikipedia...

Read more »

Recruitment Chapter for IFAR

April 2, 2015

I have added a very rough draft of the Recruitment chapter to the Introduction to Fisheries Analysis with R (IFAR) page. This chapter is a complete re-working of the old Stock-Recruitment vignette and includes a section on fitting non-linear stock-recruitment...

Read more »

map projections in oce

April 2, 2015

Introduction: The latest version (4.9.0) of the PROJ.4 library is being incorporated into the development version of the oce R package. The work is not finalized yet, but I thought it might be useful to share an early version of the test suite, so people could get an idea of the upcoming capabilities. Note that some projections work quite poorly in oce at the...

Read more »

Coarse Grain Parallelism with foreach and rxExec

April 2, 2015

By Joseph Rickert. I have written several posts about the Parallel External Memory Algorithms (PEMAs) in Revolution Analytics’ RevoScaleR package, most recently about rxBTrees(), but I haven’t said much about rxExec(). rxExec() is not itself a PEMA, but it can be used to write parallel algorithms. Pre-built PEMAs such as rxBTrees(), rxLinMod(), etc. are inherently parallel algorithms designed...

Read more »

Presentation: Interpretation of Results of Clinical Research

April 2, 2015

Yesterday, I delivered a talk on "Interpretation of Results of Clinical Research" at the Annual Alumni Meet of the Hematology Department of the All India Institute of Medical Sciences, New Delhi, India. Here is the link to the slides: https://github.com/sumprain/bl...

Read more »

Bags, Balls and the Hypergeometric Distribution: Update

April 2, 2015

So... the Hypergeometric distribution (as used in one of my previous posts). That was a bit of overkill, wasn't it? To recap the problem: we have an urn filled with a selection of white and black balls. We want to calculate the probability that all of the white balls and all but one of the...

Read more »

ANOVAs and Geomorph

April 1, 2015

Within geomorph are several functions that perform analysis of variance (ANOVA), including procD.lm(), procD.pgls(), advanced.procD.lm(), pairwiseD.test(), pairwise.slope.test(), trajectory.analysis(), bilat.symmetry(), and plotAllometry(). Inherent in all of these functions is a common philosophy for ANOVA (although other philosophies exist). The geomorph ANOVA philosophy is that: (1) resampling (randomization) procedures are used to generate empirical sampling distributions to assess significance of effects, (2) effect sizes are estimated as standard deviates from such...

Read more »

Call matplotlib from R

April 1, 2015

Motivation: I often use Python and matplotlib for exploring measurement data (e.g. from accelerometers), even if I use R for the actual analysis. The reason is that I like to be able to flexibly zoom into different parts of the plot using the mouse, and this works well for me with matplotlib. So I decided to try to call matplotlib from R using...

Read more »

Seeing the Forest and the Trees – a parallel machine learning example

April 1, 2015

Parallelizing Random Forests in R with BatchJobs and OpenLava. By Gord Sissons and Feng Li. In his series of blog posts about machine learning, Trevor Stephens focuses on a survival model from the Titanic disaster and provides a tutorial explaining how decision trees tend to over-fit models, yielding anomalous predictions. How do we build a better...

Read more »

A minor update: Revolution R Open 8.0.2

April 1, 2015

Revolution R Open 8.0.2 is now available from MRAN. If you're already using Revolution R Open, you won't find any major changes. This release fixes a couple of bugs, includes a new version of the checkpoint package, and splits the installation into two parts on Windows and Linux (with a separate installer for the multithreaded MKL Math libraries, which...

Read more »

Cohort Analysis and LifeCycle Grids mixed segmentation with R

This is the third post about LifeCycle Grids. You can find the first post, about the idea behind LifeCycle Grids and the A-Z process for creating and visualizing them with the R programming language, here. And here is the second post, about adding monetary metrics (customer lifetime value – CLV – and customer acquisition cost – CAC) to...

Read more »

The Price of Fuel: How Bad Could It Get?

April 1, 2015

The cost of fuel in South Africa (and I imagine pretty much everywhere else) is a contentious topic. It varies from month to month and, although it is clearly related to the price of crude oil and the exchange rate, various other forces play an influential role. According to the Department of Energy the majority...

Read more »

Using R, Python, & Plotly With Tableau

April 1, 2015

Andy Kriebel recently pointed out that Tableau dashboards let you export their underlying data. Using data frames in R or Python we can read data from Tableau. Then we can plot with Plotly’s Python and R APIs. The use case: collaborate and share data across languages and teams. Let’s try it out. The R code for this post is in an R Notebook; the Python code is in this IPython Notebook. One

Read more »

Configuring the R BatchJobs package for Torque batch queues

March 31, 2015

I was asked recently to look at some R code which performs “embarrassingly parallel” computations (the same function, multiple times, different parameters) and see whether I could modify it to run on one of our high-performance computing clusters. The machine has 63 virtual compute nodes and uses the TORQUE batch queue system to allocate nodes

Read more »

Modeling Count Time Series with tscount Package

March 31, 2015

The example below shows how to estimate a simple univariate Poisson time series model with the tscount package. While the model estimation is straightforward and yields parameter estimates very similar to the ones generated with the acp package (https://statcompute.wordpress.com/2015/03/29/autoregressive-conditional-poisson-model-i), the prediction mechanism is a bit tricky. 1) For the in-sample and the 1-step-ahead predictions: yhat_

Read more »

A new open source data set for anomaly detection

March 31, 2015

Yahoo Labs has just released an interesting new data set useful for research on detecting anomalies (or outliers) in time series data. There are many contexts in which anomaly detection is important. For Yahoo, the main use case is in detecting unusual traffic on Yahoo servers. The data set comprises real traffic to Yahoo services, along

Read more »

an example of drawing beast tree using ggtree

March 31, 2015

FigTree is designed for viewing BEAST output, as demonstrated by their example data. BEAST output is well supported by ggtree, and it's easy to reproduce such a tree view. ggtree supports parsing BEAST output with the read.beast function. We can visualize the tree directly by using the ggtree function. Since this is a time-scaled tree, we can set the parameter time_scale...

Read more »

More Airline Crashes via the Hadleyverse

March 31, 2015

I saw a fly-by #rstats mention of more airplane accident data on — of all places — LinkedIn (email) today which took me to a GitHub repo by @philjette. It seems there’s a web site (run by what seems to be a single human) that tracks plane crashes. Here’s a tweet from @philjette announcing it:

Read more »

Standardising Function Names in R

March 31, 2015

The renamer Package. Tired of the disparate naming systems in R? Then this is the package for you. Installing the package: the package is located in my drat. To install: install.packages("renamer", repos="http://csgillespie.github.io/drat", type="source"); or, if you have drat installed: drat::addRepo("csgillespie"); install.packages("renamer", type="source"). The source is available on my GitHub page. Example: The CamelCaseR. If you have an...

Read more »

Le Monde puzzle [#905]

March 31, 2015

A recursive programming Le Monde mathematical puzzle: Given n tokens with 10≤n≤25, Alice and Bob play the following game: the first player draws an integer 1≤m≤6 at random. This player can then take 1≤r≤min(2m,n) tokens. The next player is then free to take 1≤s≤min(2r,n-r) tokens. The player taking the last tokens is the winner. There is...
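The rules quoted above lend themselves to a short recursion, sketched below in R. Two things are assumptions, since the excerpt is truncated before the actual question: that every later move is capped at twice the previous move (as the first two moves are), and that m is drawn uniformly from 1 to 6. The helper names make_solver and wins are made up for this illustration and do not come from the original post.

```r
# Build a memoised solver for games with at most n_max tokens.
# (make_solver and wins are hypothetical names used only in this sketch.)
make_solver <- function(n_max) {
  memo <- matrix(NA, n_max, n_max)  # memo[n, cap]: can the mover force a win?
  wins <- function(n, cap) {
    cap <- min(cap, n)              # never possible to take more than n tokens
    if (!is.na(memo[n, cap])) return(memo[n, cap])
    result <- FALSE
    for (r in seq_len(cap)) {
      # Taking the last tokens wins outright; otherwise the move is good
      # if the opponent, now capped at 2 * r, cannot force a win.
      if (r == n || !wins(n - r, 2 * r)) {
        result <- TRUE
        break
      }
    }
    memo[n, cap] <<- result
    result
  }
  wins
}

wins <- make_solver(25)

# For each n, the share of draws m in 1..6 (assumed uniform) for which
# the first player can force a win.
first_player_share <- sapply(10:25, function(n) {
  mean(sapply(1:6, function(m) wins(n, 2 * m)))
})
names(first_player_share) <- 10:25
print(first_player_share)
```

Memoising on the pair (tokens left, current cap) keeps the recursion small, since both values are bounded by 25.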

Read more »

Fundamentals of R: Free course by General Assembly & DataCamp

March 31, 2015

Together with General Assembly, DataCamp created a free set of videos on the fundamentals of R. Discover it now! In a series of short videos, the team behind DataCamp teaches you about the fundamentals of R, an open-source statistical programming language. Use this course to understand the advantages and disadvantages of R, and discover at the same...

Read more »

Registration Open for R/Finance 2015!

March 31, 2015

You can find registration information and agenda details (as they become available) on the conference website.  Or you can go directly to the registration page.  Note that there's an early-bird registration deadl...

Read more »

Targeted Learning R Packages for Causal Inference and Machine Learning

March 31, 2015

By Sherri Rose, Assistant Professor of Health Care Policy, Harvard Medical School. Targeted learning methods build machine-learning-based estimators of parameters defined as features of the probability distribution of the data, while also providing influence-curve or bootstrap-based confidence intervals. The theory offers a general template for creating targeted maximum likelihood estimators for a data structure, nonparametric or semiparametric statistical model,...

Read more »

Using genomation to analyze methylation profiles from Roadmap epigenomics and ENCODE

March 31, 2015

The genomation package is a toolkit for annotation and visualization of various genomic data. The package is currently in the development version of BioC. It supports the analysis of high-throughput data, including bisulfite sequencing data. Here, we will visualize the distribution of CpG methylation around promoters and their locations within gene structures on human chromosome 3. Heatmap and...

Read more »

R / Finance 2015 Open for Registration

March 31, 2015

The announcement below just went to the R-SIG-Finance list. More information is, as usual, at the R / Finance page. Registration for R/Finance 2015 is now open! The conference will take place on May 29 and 30, at UIC in Chicago. Building on the success of the previous conferences in 2009-2014, we expect more than 250 attendees from around...

Read more »

Another Interactive Map for the Cholera Dataset

March 31, 2015

Following my previous post, François (aka @FrancoisKeck) posted a comment mentioning another package I could use to get an interactive map, the rleafmap package. And the heatmap was easy to include here. This time, we do not use openstreetmap. The first part, to get the data, is still the same: > require(rleafmap) > library(sp) > library(rgdal) > library(maptools) >...

Read more »