## Simple R Debugging GUI for Bio7

September 26, 2014

For the next release of Bio7 I implemented a first simple debugging GUI (Graphical User Interface) for R scripts. For the debugging process, a change from Rserve to an available Java R console connection in Bio7 is necessary (with Rserve alone a debugging interface wouldn’t be possible). Both connections run in the same process

## List of R programmers

September 26, 2014

Hello R people. In December of 2013 I posted a cheap-o wiki-editable (thank you GitHub) contact list which recruiters can use to find you, if they’re looking for R programmers. In what I consider a resounding success, within a few weeks it got onto the first page of Google (thank you GitHub), and...

## Overcoming D3 Cartographic Envy With R + ggplot

September 25, 2014

When I used one of the Scotland TopoJSON files for a recent post, it really hit me just how much D3 cartography envy I had/have as an R user. Don’t get me wrong, I can conjure up D3 maps pretty well and the utility of an interactive map visualization goes without saying, but

## R and Docker

September 25, 2014

Earlier this evening I gave a short talk about R and Docker at the September Meetup of the Docker Chicago group. Thanks to Karl Grzeszczak for setting the meeting, and for providing a pretty thorough intro talk regarding CoreOS and Docker. My slides...

## Google location data — Where I’ve been.

September 25, 2014

I was emailed by a friend who was looking into their Google location data and asked if I had ever used a JSON file in R. I said I had not, but I knew there were packages to do such things. The things I sent were things he had already tried,...

## Installing dplyr 0.3 on Mac OS X (Mavericks)

September 25, 2014

UPDATE: Per the author, running `devtools::install_github("hadley/devtools")` should take care of everything you need prior to installing the latest dplyr (though I did not have the postgres libs installed and suspect that might still be needed). The R dplyr package just turned 0.3, and to get it working in my development environment (OS X Mavericks) I had to do the following: `brew install postgresql`...

## How to draw venn pie-agram (multi-layer pie chart) in R?

September 25, 2014

I was wondering how to draw a Venn-diagram-like pie chart in R, to show the distribution of my RNA-seq reads mapped onto different annotation regions (e.g. intergenic, intron, exon, etc.). A Google search returns several options, including the nice one...

## Top open R jobs (for September 25th 2014)

September 25, 2014

This is the bimonthly R Jobs post (for 2014-09-25), based on the R-bloggers’ sister website: R-users.com. If you are an employer who is looking to hire people from the R community, please visit this link to post a new R job (it’s free, and registration takes less than 10 seconds). After almost 8 months, this is the first time that two weeks have passed without a single new job to share. As compensation, I...

## Brazilian Presidential Election

September 25, 2014

Three major polling houses published their polls this week: MDA, Ibope, and Vox Populi. The following numbers incorporate these data. With current data, a runoff between Dilma and Marina seems to be inevitable (.87), though its certainty has decreased from the previous week as the following chart indicates. How to understand the following plots: The …

## Regular expressions for everyone else

September 25, 2014

Regular expressions are an amazing tool for working with character data, but they are also painful to read and write.  Even after years of working with them, I struggle to remember the syntax for negative lookahead, or which way round the start and end anchor symbols go. Consequently, I’ve created the regex package for human

## Estimating Generalization Error with the PRESS statistic

September 25, 2014

As we’ve mentioned on previous occasions, one of the defining characteristics of data science is the emphasis on the availability of “large” data sets, which we define as “enough data that statistical efficiency is not a concern” (note that a “large” data set need not be “big data,” however you choose to define it). In...
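For reference, the PRESS (predicted residual sum of squares) statistic of an ordinary least-squares fit can be computed without refitting, from the residuals and the hat values. The sketch below uses simulated data and hypothetical variable names (`press`, `press_loo`); it is an illustration under those assumptions, not code taken from the post.

```r
# Simulated data (assumption, for illustration only)
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
d <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = d)

# PRESS via the leave-one-out shortcut: sum of (e_i / (1 - h_ii))^2
press <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Brute-force check: refit n times, each time holding out one observation
loo_err <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ])
  d$y[i] - predict(fit_i, newdata = d[i, , drop = FALSE])
})
press_loo <- sum(loo_err^2)
```

The two quantities should agree up to floating-point error, and PRESS is never smaller than the in-sample residual sum of squares, which is why it can serve as a rough estimate of out-of-sample error.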

## DescTools: a new R "misc package"

September 25, 2014

By Joseph Rickert. One of the most difficult things about R, a problem that is particularly vexing to beginners, is finding things. This is an unintended consequence of R's spectacular, but mostly uncoordinated, organic growth. The R core team does a superb job of maintaining the stability and growth of the R language itself, but the innovation engine for...

## Aggregate portfolio contributions through time

September 25, 2014

The last CRAN release didn’t have much new functionality, but Ross Bennett and I have completely rewritten the Return.portfolio function to fix some issues and make the calculations more transparent. The function calculates the returns of a portfolio given asset returns, weights, and rebalancing periods, which, although not rocket science, requires some diligence. Users of this

## How Many Paths are Possible in an 18 Hole Round of Match Play Golf?

September 25, 2014

In honor of the Ryder Cup, here's a fun puzzle for the mathematically inclined golfer to consider: how many different paths are possible in an 18 hole round of match play golf? If you'd rather not wade through the math then you can skip ahead to the "practical exploration" section of this post to see some actual match play...

## Effective Applications of the R Language Conference 2014

September 25, 2014

By Chris Campbell - Senior Consultant, UK. What struck me first was how few sandals I could see, none of which were paired with socks. The energy in the room was electric as introductions were made and business cards were exchanged. The inaugural Effective Applications of the R Language (EARL) had started strongly with two sold-out workshops. As Matt Aldridge...

## RMOA package for running streaming classification & regression models now at CRAN

Last week, we released the RMOA package at CRAN (http://cran.r-project.org/web/packages/RMOA). It is an R package to allow building streaming classification and regression models on top of MOA. MOA is the acronym of 'Massive Online Analysis' and it is the most popular open source framework for data stream mining which is being developed at the University of Waikato: http://moa.cms.waikato.ac.nz....

## Joint Models for Longitudinal and Survival Data

September 25, 2014

What are joint models for longitudinal and survival data? In this post we will introduce, in layman's terms, the framework of joint models for longitudinal and time-to-event data. These models are applied in settings where the sample units are followed up over time; for example, we may be interested in patients suffering...

## “R for Developers” course – Oct 16-17 @ Milano, Italy

September 25, 2014

R for Developers. Milano, October 16 and 17, 2014. Course description: This two-day course provides an overview of several advanced R topics, such as R environments, object-oriented programming, functional programming, and debugging. Who should attend this course: Anyone …

## Become an effective data hacker with the R-Hadoop stack

September 24, 2014

In discussion with several data scientists, Will Stanton (a data scientist with Return Path) learned that a common concern is: what software should I be using? There are many options out there, but what is the best platform to be an effective "data hacker"? Will recommends using a technology stack with R and Hadoop, which allows data scientists "to...

## Nuts and Bolts of Quantstrat, Part IV

September 24, 2014

This post will provide an introduction to the way that rules work in quantstrat. It will detail market orders along …

## Multiple Tests, an Introduction

September 24, 2014

Last week, a student asked me about multiple tests. More precisely, she ran an experiment over, say, 20 weeks, with the same cohort of, say, 100 patients. And we observe some $X_{i,t}$: `size=100; nb=20; set.seed(1); X=matrix(rnorm(size*nb),size,nb)` (here, I just generate some fake data). I can visualize some trajectories over the 20 weeks: `library(RColorBrewer); cl1=brewer.pal(12,"Set3"); cl2=brewer.pal(8,"Set2"); cl=c(cl1,cl2)`...
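A minimal sketch of why running many tests on the same data needs care, reusing the fake data from the excerpt (pure noise, so no week carries a true effect). The one-sample t-test per week, the 5% threshold, and the Bonferroni correction are illustrative assumptions here, not necessarily the approach taken in the post.

```r
# Regenerate the fake data from the excerpt: 100 patients, 20 weeks, pure noise
size <- 100
nb <- 20
set.seed(1)
X <- matrix(rnorm(size * nb), size, nb)

# Assumption for illustration: test H0 "mean = 0" separately for each week
pvals <- apply(X, 2, function(x) t.test(x)$p.value)

# Number of weeks flagged at the 5% level even though every H0 is true
n_rejected <- sum(pvals < 0.05)

# Bonferroni adjustment controls the family-wise error rate over the 20 tests
n_rejected_bonf <- sum(p.adjust(pvals, method = "bonferroni") < 0.05)
```

With 20 independent tests at the 5% level, the chance of at least one false rejection is 1 - 0.95^20, roughly 64%, which is the problem a correction such as Bonferroni is meant to address.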

## Adding Google Drive Times and Distance Coefficients to Regression Models with ggmap and sp

September 24, 2014

Space, a wise man once said, is the final frontier. Not the Buzz Aldrin/Lightyear, Neil deGrasse Tyson kind (but seriously, have you seen Cosmos?). Geographic space. Distances have been finding their way into metrics since the cavemen (probably). GIS seems to make nearly every science way more fun…and accurate! Most of my research deals with

## Data Science Toolbox Survey Results… Surprise! R and Python win

September 24, 2014

This is a re-publication of a blog post from a blog I created not long before...

## DVI Performance

September 24, 2014

This is the next post in the DVI indicator series. After the first two (here and here) analyzed in detail the post-entry returns and the entry power of this indicator, it’s time to take a look at the trading performance. Using the Systematic Investor Toolbox, we get some pretty decent results: CAGR of 16.15% and

## PageRank For SQL Lovers

September 24, 2014

“If you’re changing the world, you’re working on important things. You’re excited to get up in the morning” (Larry Page, CEO and Co-Founder of Google). This is my personal tribute to one of the most important, influential, and life-changing R packages I have discovered lately: the sqldf package. Because of my job, transforming

## Changing the Light Azimuth in Shaded Relief Representation by Clustering Aspect

September 24, 2014

Some time ago I published an article in "The Cartographic Journal" regarding a method to automatically change the light azimuth in shaded relief representations. This method was based on clustering the aspect derivative of the DTM. The method was develo...

## Post 10: Multicore parallelism in MCMC

September 24, 2014

MCMC is by its very nature a serial algorithm -- each iteration depends on the results of the last iteration. It is, therefore, rather difficult to parallelize MCMC code so that a single chain will run more quickly by splitting …
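The serial dependence is easy to see in a minimal random-walk Metropolis sampler. The sketch below is illustrative only: the standard normal target, the proposal scale, and the function name `metropolis` are assumptions, not code from the post.

```r
# Minimal random-walk Metropolis sampler (hypothetical helper, for illustration)
metropolis <- function(n_iter, init = 0, prop_sd = 1) {
  chain <- numeric(n_iter)
  current <- init
  for (i in seq_len(n_iter)) {
    # The proposal is centred on the current state, so iteration i
    # cannot start before iteration i - 1 has finished
    proposal <- rnorm(1, mean = current, sd = prop_sd)
    # Log acceptance ratio for a standard normal target (assumed target)
    log_ratio <- dnorm(proposal, log = TRUE) - dnorm(current, log = TRUE)
    if (log(runif(1)) < log_ratio) {
      current <- proposal
    }
    chain[i] <- current
  }
  chain
}

set.seed(42)
draws <- metropolis(5000)
```

Independent chains, by contrast, share no state and can each run on a separate core; making a single chain faster is the harder problem.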

## PubMed Publication Date: what is it, exactly?

September 23, 2014

File this one under “has troubled me (and others) for some years now, let’s try to resolve it.” Let’s use the excellent R/rentrez package to search PubMed for articles that were retracted in 2013. 117 articles. Now let’s fetch the records in XML format. Next question: which XML element specifies the “Date of publication” (PDAT)?