## Interactive Heat Maps for R

May 23, 2016

In every statistical analysis, the first thing one should do is try and visualise the data before any modeling. In microarray studies, a common visualisation is a heatmap of gene expression data. In this post I simulate some gene expression data and visualise it using the heatmaply package in R by Tal Galili. This package

## Understanding Bayesian A/B testing (using baseball statistics)

May 23, 2016

Previously in this series: Understanding the beta distribution (using baseball statistics); Understanding empirical Bayes estimation (using baseball statistics); Understanding credible intervals (using baseball statistics); Understanding the Bayesian approach to false discovery rates (using baseball statistics).

Who is a better batter: Mike Piazza or Hank Aaron? Well, Mike Piazza has a slightly higher career...

## Feather: fast, interoperable data import/export for R

May 23, 2016

Unlike most other statistical software packages, R doesn't have a native data file format. You can certainly import and export data in any number of formats, but there's no native "R data file format". The closest equivalent is the saveRDS/readRDS function pair, which allows you to serialize an R object to a file and then load it back into...
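As a minimal sketch of the serialization round trip described above, the following uses base R's `saveRDS()` and `readRDS()`. The data frame and the temporary file path are illustrative assumptions, not taken from the post.

```r
# Round-trip a small data frame through R's serialization format.
# The data and the temporary path are illustrative only.
df <- data.frame(id = 1:3, score = c(0.5, 1.5, 2.5))
path <- tempfile(fileext = ".rds")

saveRDS(df, path)           # serialize one R object to a file
restored <- readRDS(path)   # read it back into a new object

# Check that the restored object matches the original
identical(df, restored)
```

Unlike `save()`/`load()`, this pair stores a single object without its name, so the caller chooses the name on reading.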

## Introduction to R for Data Science :: Session 3

May 23, 2016

Welcome to Introduction to R for Data Science, Session 3! The course is co-organized by Data Science Serbia and Startit. You will find all course material (R scripts, data sets, SlideShare presentations, readings) on these pages. Check out the Course Overview to access the learning...

## Principal Components Regression, Pt. 2: Y-Aware Methods

May 23, 2016

In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components … Continue reading...

## Computational Genomics Course in Berlin

May 23, 2016

Berlin Institute for Medical Systems Biology is organizing a computational genomics course and R programming will be used for most of the practical sessions. The course will cover basic statistics, programming and basic concepts in next-generation...

## Setting Up New R Notebook

May 23, 2016

Today Noam Ross’ tweet about his experience with the new R Notebooks from RStudio got me excited: “Can not find a GIF to properly convey the joy + wreckage of me messing around with the internals of the new #rstats notebook stuff.” (Noam Ross, @noamross, May 23, 2016). Couple that with the release of JJ...

## 5-min How-to on New Google Forms

May 23, 2016

googleformr is an API to Google Forms, allowing users to POST data securely to Google Forms without needing authentication or permissioning. Google Forms is a robust tool for collecting, analyzing, and storing data gathered through a webform on Google Drive. Since googleformr was created, Google Forms has updated the user experience of its form creation process. It used to...

## Survival plots have never been so informative

May 22, 2016

Hadley Wickham’s ggplot2 version 2.0 revolution, at the end of 2015, triggered many crashes in dependent R packages, which finally led to the deletion of a few packages from The Comprehensive R Archive Network. The survMisc package was removed from CRAN on 27 January 2016, and the R world remained helpless in the struggle with the...

## occupancy rules

May 22, 2016

While the last riddle on The Riddler was rather anticlimactic, namely to find the mean of the number Y of empty bins in a uniform multinomial with n bins and m draws, with solution $n(1-1/n)^m$, this led...
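The mean of the number of empty bins follows from linearity of expectation: each of the n bins is missed by all m draws with probability $(1-1/n)^m$, so $E[Y] = n(1-1/n)^m$. A small simulation, sketched here with arbitrary illustrative values of n and m and a hypothetical helper `empty_bins()`, can be used to check this:

```r
# Monte Carlo check of E[Y] = n * (1 - 1/n)^m for the number of empty bins.
# n, m and the number of replications are arbitrary illustrative choices.
set.seed(1)
n <- 10   # number of bins
m <- 15   # number of draws

# Count the bins that receive no draw in one experiment
empty_bins <- function(n, m) {
  n - length(unique(sample.int(n, m, replace = TRUE)))
}

sim_mean <- mean(replicate(20000, empty_bins(n, m)))
exact    <- n * (1 - 1 / n)^m

c(simulated = sim_mean, exact = exact)
```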

## BCEA 2.2-3 is out

May 22, 2016

I think the newest release of BCEA, our R package to standardise and post-process the output of a health economic model, is now available from CRAN; in fact, the source code is also available here. The package is rather stable, so the...

## Source for the marketAgent R package

May 21, 2016

I recently gave a talk at the R in Finance conference in which I introduced the marketAgent package for R. Here is the source for the package if you’d like to play with it: marketAgent_0.000.tar I’ll be giving more details of the talk real soon now.

## Exploring P-values with Simulations in R

May 21, 2016

The recent flare-up in discussions on p-values inspired me to conduct a brief simulation study. In particular, I wanted to illustrate just how p-values vary with different effect and sample sizes. Here are the details of the simulation. I simulated draws of my independent variable : where For each , I define a as where In other words, … Continue reading...

## Visual contrast of two robust regression methods

Robust regression: For training purposes, I was looking for a way to illustrate some of the different properties of two different robust estimation methods for linear regression models. The two methods I’m looking at are least trimmed squares, implemented as the default option in lqs(), and a Huber M-estimator, implemented as the default option in rlm(). Both functions...
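To make the comparison concrete, here is a minimal sketch that fits both estimators with their default options, using `lqs()` and `rlm()` from the MASS package. The simulated data, the contamination scheme, and the true slope of 2 are assumptions made for illustration, not the post's own example.

```r
library(MASS)  # provides lqs() and rlm()

# Simulated data (illustrative): true intercept 1, true slope 2,
# with 10% of the responses shifted upwards to act as outliers.
set.seed(123)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
y[1:10] <- y[1:10] + 30

fit_ols   <- lm(y ~ x)    # ordinary least squares, for reference
fit_lts   <- lqs(y ~ x)   # default method: least trimmed squares
fit_huber <- rlm(y ~ x)   # default psi function: Huber

# Compare the estimated coefficients side by side
round(rbind(ols   = coef(fit_ols),
            lts   = coef(fit_lts),
            huber = coef(fit_huber)), 2)
```

With outliers in the response only, both robust fits should stay close to the true slope, while the least squares intercept is pulled towards the contaminated points.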

## Rperform in Google Summer of Code 2016

May 21, 2016

Rperform started as a GSoC 2015 project with the aim “to provide a package with functions that make it easy for R package developers to track quantitative performance metrics of their code, over time.” Much of the required functionality was implemented over the course of last summer. This included various performance visualization functions and...

## Tutorial: GitHub for Data Scientists without the Terminal

May 21, 2016

Git and GitHub are indispensable tools for anyone analysing data, developing software or disseminating results. Originally designed for software engineers, GitHub is now widely used in many disciplines, especially by researchers in academia. Having source code management software such as GitHub to host your code and keep detailed project documentation is a huge step...

## Installing WVPlots and “knitting R markdown”

May 20, 2016

Some readers have been having a bit of trouble using devtools to install WVPlots. I thought I would write a note with a few instructions to help. These are things you should not have to do often, and things those of us already running R have stumbled through and forgotten about. First you will need … Continue reading...

## Data sanity checks: Data Proofer (and R analogues?)

May 20, 2016

I just heard about Data Proofer (h/t Nathan Yau), a test suite of sanity-checks for your CSV dataset. It checks a few basic things you’d really want to know but might forget to check yourself, like whether any rows are … Continue reading →

## Setting up a DataScience Server

May 20, 2016

After installing multiple software packages, servers, etc. on my laptop, it was overloaded with different tools and running services. When I get a new laptop, or if this one crashes, I can start...

## Absence of evidence is not evidence of absence: Testing for equivalence

May 20, 2016

When you find p > 0.05, you did not observe surprising data, assuming there is no true effect. You can often read in the literature how p > 0.05 is interpreted as ‘no effect’ but due to a lack of power the data might not be surprising if there was...
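One common way to test for equivalence is the two one-sided tests (TOST) procedure: instead of testing against zero, test whether the effect is larger than a lower bound and smaller than an upper bound. The sketch below applies this to a one-sample mean with base R's `t.test()`. The simulated data and the equivalence bound `delta` are illustrative assumptions, and whether the post uses exactly this variant is also an assumption.

```r
# One-sample TOST sketch: is the mean within (-delta, +delta)?
# Data and the equivalence bound are illustrative assumptions.
set.seed(42)
x     <- rnorm(1000, mean = 0, sd = 1)
delta <- 0.3   # smallest effect size of interest (assumed)

# H0: mean <= -delta  versus  H1: mean > -delta
p_lower <- t.test(x, mu = -delta, alternative = "greater")$p.value
# H0: mean >= +delta  versus  H1: mean < +delta
p_upper <- t.test(x, mu =  delta, alternative = "less")$p.value

# Equivalence is claimed only if both one-sided tests reject,
# so the TOST p-value is the larger of the two.
p_tost <- max(p_lower, p_upper)
p_tost
```

A small `p_tost` supports the claim that the true mean lies inside the equivalence bounds, which is a different statement from merely failing to reject a null hypothesis of no effect.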

## Microsoft R Open 3.2.5 now available

May 20, 2016

Microsoft R Open 3.2.5 is now available for download. There are no changes of note in the R language engine with this release (R 3.2.5 was largely just a version number increment). There's lots new on the packages front though: Microsoft R Open 3.2.5 has a default CRAN snapshot date of May 1, 2016 and there was plenty...

## Accessing Dataframe Objects Exercises

May 20, 2016

The attach() function alters the R environment search path so that dataframe variables can be referenced by name, as if they were global variables. If used carelessly in a script, attach() might create semantic errors, for example when an attached variable masks, or is masked by, another object of the same name. To prevent this, detach() is needed to remove the dataframe from the search path. The transform() function allows for transformation of dataframe objects. The within() function creates...
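The functions mentioned above can be sketched on a toy data frame as follows. The column names and values are made up for illustration, and all calls are base R.

```r
# Toy data frame (illustrative values only)
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# attach() puts the columns on the search path; detach() removes them again
attach(df)
mean_height <- mean(height)   # 'height' resolves to df$height here
detach(df)

# transform() returns a copy of the data frame with new or modified columns
df2 <- transform(df, bmi = weight / (height / 100)^2)

# within() evaluates an expression inside the data frame and returns
# a modified copy containing any variables created in the expression
df3 <- within(df, {
  ratio <- weight / height
})

names(df2)
names(df3)
```

Pairing every `attach()` with a `detach()` keeps the search path clean. `with(df, mean(height))` is an alternative that avoids modifying the search path at all.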

## Simulating a Weibull conditional on time-to-event is greater than a given time

May 20, 2016

Recently, I had to simulate a time-to-event of subjects who have been on a study, are still ongoing at the time of a data cut, but who are still at risk of an event (e.g. progressive disease, cardiac event, death). This requires the simulation of a con...
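One way to draw such conditional event times is inverse transform sampling, sketched below under the assumption that R's Weibull parameterisation applies, with survival function $S(t) = \exp\{-(t/\text{scale})^{\text{shape}}\}$. Setting $S(t)/S(t_0) = U$ with $U$ uniform on (0, 1) and solving for $t$ gives the formula in the code. The function name `rweibull_conditional()` and the parameter values are hypothetical, and the post may use a different approach.

```r
# Draw Weibull event times conditional on T > t0 by inverting the
# conditional survival function S(t) / S(t0) = u, where
# S(t) = exp(-(t / scale)^shape) as in R's rweibull().
# Function name and parameter values are illustrative assumptions.
rweibull_conditional <- function(n, shape, scale, t0) {
  u <- runif(n)
  scale * ((t0 / scale)^shape - log(u))^(1 / shape)
}

set.seed(2016)
draws <- rweibull_conditional(10000, shape = 1.5, scale = 12, t0 = 6)

# All simulated times should lie beyond the data cut at t0 = 6
range(draws)
```

Setting `t0 = 0` reduces the formula to the usual inverse transform for an unconditional Weibull draw.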

## A Beginner’s Guide to Travis-CI for R

May 19, 2016

Have you seen all those attractive green badges on other people’s R packages and thought, “I want a lovely green badge!” (“Always a nice feeling when Travis manages to actually build. #runconf16 pic.twitter.com/7qZfH2OEij”, Julia Silge, @juliasilge, April 1, 2016.) OF COURSE YOU DO. Well, let’s give it a shot, because today I am going to attempt...

## OpenCPU release 1.6

May 19, 2016

Following a few weeks of testing, OpenCPU 1.6 has been released. OpenCPU is a production-ready system for embedded statistical computing with R. It provides a neat API for remotely calling R functions over HTTP via e.g. JSON or Protocol Buffers. The OpenCPU server implementation is stable and has been thoroughly...

## ABC random forests for Bayesian parameter inference

May 19, 2016

Before leaving Helsinki, we arXived the paper Jean-Michel presented on Monday at ABCruise in Helsinki. This paper summarises the experiments Louis conducted over the past months to assess the great performances of a random forest regression approach to ABC parameter inference. Thus validating in this experimental sense the use of

## User Groups and R Awareness

May 19, 2016

by Joseph Rickert For quite a few years now we have attempted to maintain the Revolution Analytics' Local R User Group Directory as the complete and authoritative list of R user groups. Meetup groups make this list in one of two ways: we discover the group because they have a web page of some sort proclaiming the group to...

## Modeling data with functional programming – State based systems

May 19, 2016

I’m pleased to announce the availability of my latest chapter on state based systems for my book “Modeling data with … Continue reading →

There is quite some scepticism towards TTIP, the currently negotiated free trade agreement between Europe and the USA. Even in an exporting nation like Germany, there are substantial worries about a reduction in consumer protection standards, or fears of limits to democracy if corporations can sue states in international settlement courts. On the other hand, most economists tend to be...