## “R for Developers” course – Oct 16-17 @ Milano, Italy

September 25, 2014

R for Developers, Milano, October 16 and 17, 2014. Course description: This two-day course provides an overview of several advanced R topics, such as R environments, object-oriented programming, functional programming and debugging. Who should attend this course: Anyone … Continue reading →

## Become an effective data hacker with the R-Hadoop stack

September 24, 2014

In discussion with several data scientists, Will Stanton (a data scientist with Return Path) learned that a common concern is: what software should I be using? There are many options out there, but what is the best platform to be an effective "data hacker"? Will recommends using a technology stack with R and Hadoop, which allows data scientists "to...

## Nuts and Bolts of Quantstrat, Part IV

September 24, 2014

This post will provide an introduction to the way that rules work in quantstrat. It will detail market orders along … Continue reading →

## Multiple Tests, an Introduction

September 24, 2014

Last week, a student asked me about multiple tests. More precisely, she ran an experiment over, say, 20 weeks, with the same cohort of, say, 100 patients. And we observe some $X_{i,t}$: `size=100`, `nb=20`, `set.seed(1)`, `X=matrix(rnorm(size*nb),size,nb)` (here, I just generate some fake data). I can visualize some trajectories over the 20 weeks: `library(RColorBrewer)`, `cl1=brewer.pal(12,"Set3")`, `cl2=brewer.pal(8,"Set2")`, `cl=c(cl1,cl2)`...
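The excerpt stops before any test is run, so the following is only a minimal sketch of the multiple-testing issue on the same fake data. It assumes, as an illustration and not as the author's method, one two-sided t-test per week of the null hypothesis that the weekly mean is zero, followed by a Bonferroni correction with base R's `p.adjust`.

```r
# Same fake data as in the excerpt: 100 patients (rows), 20 weeks (columns)
size <- 100
nb <- 20
set.seed(1)
X <- matrix(rnorm(size * nb), size, nb)

# Assumption for illustration: one two-sided t-test per week, H0: mean = 0
pvalues <- apply(X, 2, function(x) t.test(x, mu = 0)$p.value)

# Weeks flagged at the 5% level with no correction; the data are pure noise,
# so any flagged week is a false positive
sum(pvalues < 0.05)

# Bonferroni-adjusted p-values control the family-wise error rate
padj <- p.adjust(pvalues, method = "bonferroni")
sum(padj < 0.05)
```

With 20 independent tests on noise, roughly one false positive is expected at the 5% level before correction, which is why the adjusted count is the one to trust.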

## Adding Google Drive Times and Distance Coefficients to Regression Models with ggmap and sp

September 24, 2014

Space, a wise man once said, is the final frontier. Not the Buzz Aldrin/Lightyear, Neil deGrasse Tyson kind (but seriously, have you seen Cosmos?). Geographic space. Distances have been finding their way into metrics since the cavemen (probably). GIS seems to make nearly every science way more fun… and accurate! Most of my research deals with

## Data Science Toolbox Survey Results… Surprise! R and Python win

September 24, 2014

This is a re-publication of a blog post from a blog I created not long before...

## DVI Performance

September 24, 2014

This is the next post in the DVI indicator series. After the first two (here and here) analyzed in detail the post-entry returns and the entry power of this indicator, it’s time to take a look at the trading performance. Using the Systematic Investor Toolbox, we get some pretty decent results: CAGR of 16.15% and

## PageRank For SQL Lovers

September 24, 2014

“If you’re changing the world, you’re working on important things. You’re excited to get up in the morning” (Larry Page, CEO and Co-Founder of Google). This is my particular tribute to one of the most important, influential and life-changing R packages I have discovered recently: the sqldf package. Because of my job, transforming

## Changing the Light Azimuth in Shaded Relief Representation by Clustering Aspect

September 24, 2014

Some time ago I published an article in "The Cartographic Journal" regarding a method to automatically change the light azimuth in shaded relief representations. This method was based on clustering the aspect derivative of the DTM. The method was develo...

## Post 10: Multicore parallelism in MCMC

September 24, 2014

MCMC is by its very nature a serial algorithm -- each iteration depends on the results of the last iteration. It is, therefore, rather difficult to parallelize MCMC code so that a single chain will run more quickly by splitting … Continue reading →

## PubMed Publication Date: what is it, exactly?

September 23, 2014

File this one under “has troubled me (and others) for some years now, let’s try to resolve it.” Let’s use the excellent R/rentrez package to search PubMed for articles that were retracted in 2013. 117 articles. Now let’s fetch the records in XML format. Next question: which XML element specifies the “Date of publication” (PDAT)?

## In-depth introduction to machine learning in 15 hours of expert videos

September 23, 2014

In January 2014, Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook, An Introduction to Statistical Learning with Applications in R (ISLR). I found it to be an excellent course in statistical learning (also known as "machine learning"), largely...

## a weird beamer feature…

September 23, 2014

As I was preparing my slides for my third-year undergraduate stat course, I got a weird error that took a search on the Web to unravel, which was related to a fragile environment but not directly to the verbatim part: the reason for the bug was that the `\end{frame}` command did not have a line

## Seeing the (day)light with R

September 23, 2014

The arrival of the autumnal equinox foreshadows the reality of longer nights and shorter days here in the northeast US. We can both see that reality and distract ourselves from it at the same time by firing up RStudio (or your favorite editor) and taking a look at the sunrise & sunset times based on

## Factors are not first-class citizens in R

September 23, 2014

The primary user-facing data types in the R statistical computing environment behave as vectors. That is: one-dimensional arrays of scalar values that have a nice operational algebra. There are additional types (lists, data frames, matrices, environments, and so on) but the most common data types are vectors. In fact, vectors are so common in R

## How to publish R and ggplot2 to the web

September 23, 2014

by Matt Sundquist, Plotly Co-founder It's delightfully smooth to publish R code, plots, and presentations to the web. For example: Shiny makes interactive apps from R. Pretty R highlights R code for HTML. Slidify makes slides from R Markdown. Knitr and RPubs let you publish R Markdown docs. GitHub and devtools let you quickly release packages and collaborate. Now,...

## Hands-on dplyr tutorial for faster data manipulation in R

September 23, 2014

I love dplyr. It's my "go-to" package in R for data exploration, data manipulation, and feature engineering. I use dplyr because it saves me time: its performance is blazing fast on data frames, but even more importantly, I can write dplyr code faster ...

## testthat 0.9

September 23, 2014

testthat 0.9 is now available on CRAN. Testthat makes it easy to turn the informal testing that you’re already doing into formal automated tests. Learn more at http://r-pkgs.had.co.nz/tests.html This version of testthat has four important new features that bring testthat up to speed with unit testing frameworks in other languages: You can skip() tests with

## NCEAS Codefest Follow-up

September 23, 2014

The week after Labor Day, we had the pleasure of attending the NCEAS open science codefest event in Santa Barbara. It was great to meet folks like the new arrivals at the expanding Mozilla Science Lab, Bill Mills and Abby Cabunoc (Bill even already has a great post up about the codefest), and see...

## Managing R package dependencies

September 23, 2014

One of my takeaways from last week's EARL conference was that R is increasingly growing out of its academic roots into the enterprise. And with that come some challenges, e.g. how do I ensure consistent and systematic access to a set of R packages in an organisation, in particular when one team is providing...

September 22, 2014

Version 1.1 of the archivist package reached CRAN a few days ago. This package supports operations on a disk-based repository of R objects. It makes storing, restoring and searching for R objects easy (searching uses meta information). Want to share your object with article reviewers or collaborators? This package should help.

September 22, 2014

Continuing with his standard pace of approximately one new version per month, Conrad released a new minor release of Armadillo a few days ago. As before, I had created a GitHub-only pre-release which was tested against all eighty-seven (!!) CRAN dependents of our RcppArmadillo package and then uploaded RcppArmadillo 0.4.450.0 to CRAN. The CRAN...

## Interesting high contrast plots in R

September 22, 2014

I was inspired by this blog post and thought I could do the same thing in R. Well, I posted the code on Google+. Here are my results. Not bad.

## Newcastle R course, a write-up

September 22, 2014

I recently attended a week-long R course in Newcastle, taught by Colin Gillespie. It went from “An Introduction to R” to “Advanced Graphics” via a day each on modelling, efficiency and programming. Suffice to say it was an intense 5 days! Overall it was the best R course I’ve been on so far. I’d recommend it to others,...

## Twitter’s REST API v1.1 with R (for Linux and Windows)

September 22, 2014

In this tutorial I am going to describe a straightforward way to make use of Twitter’s REST API v1.1. For that purpose I put together a little package (RTwitterAPI), so that requesting data just needs the API URL, the API parameters … Continue reading →

## Around the world in 80k miles

September 22, 2014

You're probably familiar with the classic Travelling Salesman problem: given (say) 20 cities, what is the shortest route you can take that passes through all 20 cities and returns to the starting point? It's a difficult problem to solve, because you need to try all possible routes to find the minimum, and there are a LOT of possibilities. For a...
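To give a feel for how many possibilities there are, here is a small base-R sketch. It is not taken from the post: the helper names (`n_tours`, `perms`, `tour_length`) and the six random points are hypothetical, and it assumes symmetric distances, so that a closed tour through n cities can be counted as (n - 1)!/2 once the start city is fixed and direction is ignored.

```r
# Number of distinct closed tours through n cities, assuming symmetric
# distances: fix the starting city and ignore the direction of travel
n_tours <- function(n) factorial(n - 1) / 2
n_tours(20)  # roughly 6e16, far too many to enumerate

# Brute force is only feasible for tiny inputs: 6 hypothetical random points
set.seed(42)
xy <- matrix(runif(12), ncol = 2)
d <- as.matrix(dist(xy))  # Euclidean distance matrix

# All permutations of a vector, returned as a list of vectors
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Length of the closed tour that starts and ends at city 1
tour_length <- function(ord) {
  path <- c(1, ord, 1)
  sum(d[cbind(path[-length(path)], path[-1])])
}

routes <- perms(2:6)  # every ordering of the remaining 5 cities
tour_lengths <- vapply(routes, tour_length, numeric(1))
best <- c(1, routes[[which.min(tour_lengths)]], 1)
best
min(tour_lengths)
```

The enumeration visits each tour twice (once per direction), which is harmless here but shows why exhaustive search stops being an option after a dozen or so cities.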

## H2O, Domino & Kaggle Quick-Start Guide and RUGSMAPS2

September 22, 2014

Following up on my previous posts about H2O Deep Learning (TTTAR1) and RUGSMAPS (TTTAR2), here is a quick update on two interesting things I have been working on: a Kaggle tutorial and a new RUGSMAPS app. Short Tutorials based on a Kaggle Competition: First of all, I would like to share with...

## Dirk Eddelbuettel, the useR! 2014 Interview

September 22, 2014

First things first, Dirk Eddelbuettel was recently named ordinary. This seems contradictory, since Dirk is a...