4 new R jobs (from R-users.com; 2015-09-21)
[Read more...]
Warsaw Meetings of R Users / Warszawskie Spotkania Entuzjastów R
With the summer holiday season coming to an end, we are back with the Warsaw Meetings of R Users (Warszawskie Spotkania Entuzjastów R). Three meetings are ahead, starting on September 26th (this Saturday) with a data-hack-day (DHD). Using data from the Polish Sejm (votes and transcripts), we are going to prepare some ... [Read more...]
How to Create Infographics in R
Although you will learn in this article how to create infographics in R, I will be honest: I don’t like meaningless, busy, and unactionable infographics. Then why did I write this article? A couple of reasons: I love the message of this chart; I wanted to see what’s ... [Read more...]
Working With SEM Keywords in R
The following post is taken from two previous posts from an older blog of mine that is no longer available. These are from several years ago, and related to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine ... [Read more...]
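The excerpt does not show how the author generated the keywords. One common way to reach hundreds of thousands of search-engine keywords is to take the Cartesian product of a few word lists. The sketch below uses base R's `expand.grid()` with small, hypothetical modifier, product, and location lists; real lists of a few hundred entries each multiply out to very large keyword sets.

```r
# Hypothetical word lists (illustration only); with a few hundred entries
# per list, the number of combinations grows multiplicatively
modifiers <- c("cheap", "best", "buy")
products  <- c("running shoes", "trail shoes")
locations <- c("london", "paris")

# expand.grid() returns one row per combination of the three lists
combos <- expand.grid(modifier = modifiers,
                      product  = products,
                      location = locations,
                      stringsAsFactors = FALSE)

# Join each row into a single space-separated keyword phrase
keywords <- paste(combos$modifier, combos$product, combos$location)

length(keywords)   # 3 * 2 * 2 = 12
head(keywords, 3)
```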
Six lines to install and start SparkR on Mac OS X Yosemite
I know there are many R users who like to test out SparkR without all the configuration hassle. Just these six lines and you can start SparkR from both RStudio and the command line.
One line for Spark and SparkR
Apache Spark is a fast and general-purpose cluster computing system.
SparkR ... [Read more...]
RcppArmadillo 0.5.600.2.0
And yet another upstream Armadillo update: version 5.600.2 was released yesterday by Conrad, so I pushed a new and matching RcppArmadillo release 0.5.600.2.0 to CRAN and to Debian.
Armadillo is a powerful and expressive C++ template library for linear algebra aiming towards a good balance between speed and ease of use ... [Read more...]
RQuantLib 0.4.1
Right before I headed off to last week's excellent EARL 2015 conference in London, a new minor release of RQuantLib was uploaded to CRAN and Debian.
The changes are detailed below.
Changes in RQuantLib version 0.4.1 (2015-09-11)
Changes... [Read more...]
Using the ggplot2 library in R
In this article, I will show you how to use the ggplot2 plotting library in R. It was written by Hadley Wickham. If you don’t already have it, install it and load it up:
install.packages('ggplot2')
library(ggplot2)
qplot
qplot is the quickest way to get ... [Read more...]
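`qplot()` is a shortcut for quick plots, and recent ggplot2 releases mark it as deprecated. Here is a minimal sketch of the same kind of scatterplot written with the full `ggplot()` grammar. It uses the built-in `mtcars` data set as an assumption; the post's own example data is not shown in this excerpt.

```r
# install.packages("ggplot2")   # run once if ggplot2 is not installed
library(ggplot2)

# Scatterplot of car weight vs. fuel economy from the built-in mtcars data:
# ggplot() sets data and aesthetic mappings, geom_point() adds the layer
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)   # draws the plot on the active graphics device
```

Building the plot as an object and printing it separately makes it easy to add further layers (for example `+ geom_smooth()`) before drawing.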
Teaching R to 200 students in a week
I just taught a week-long “R Bootcamp” to 200 R newbies. It went quite well, and I thought it would be valuable to jot down some thoughts on what worked and what I might change if doing it again.
The course design and my approach to teaching scientific computing in general ... [Read more...]
When is a Backtest Too Good to be True? Part Two.
In the previous post, I went through a simple exercise which, to me, clearly demonstrates that a 60% out-of-sample guess rate (on a daily basis) for the S&P 500 will generate ridiculous returns. From the feedback I got, it seemed that my example was somewhat unconvincing. Let’s dig a bit further ... [Read more...]
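To see why a 60% daily hit rate is implausibly strong, here is a minimal simulation sketch. It is not the author's exercise. It assumes i.i.d. normal daily returns with hypothetical parameters (not actual S&P 500 data) and a strategy that goes long when it predicts an up day and short when it predicts a down day, so a correct guess earns |r| and a wrong one loses |r|.

```r
set.seed(42)
n_days   <- 252 * 10   # ten hypothetical trading years
hit_rate <- 0.60       # fraction of days the direction is guessed correctly

# Hypothetical daily index returns: i.i.d. normal (assumed parameters)
r <- rnorm(n_days, mean = 0.0003, sd = 0.01)

# Long/short each day: a correct call earns |r|, a wrong call loses |r|
correct <- runif(n_days) < hit_rate
strat   <- ifelse(correct, abs(r), -abs(r))

# Growth of $1 under buy-and-hold vs. the 60%-accurate strategy
buy_hold_growth <- prod(1 + r)
strategy_growth <- prod(1 + strat)
c(hit_rate_realized = mean(correct),
  buy_and_hold      = buy_hold_growth,
  strategy          = strategy_growth)
```

Under these assumptions the strategy's expected daily return is roughly (0.6 - 0.4) × E|r| ≈ 0.2 × 0.8% ≈ 0.16%. Compounded over thousands of days, that dwarfs buy-and-hold.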
A Package Full o’ Pirates & Makin’ Interactive Pirate Maps in arrrrrRstats
Avast, me hearties! It’s time four t’ annual International Talk Like a Pirate Day #rstats post! (OK, I won’t make you suffer continuous pirate-speak for the entire post.) I tried to be a bit more practical this year and have two treasuRe chests for you to (hopefully) enjoy. ... [Read more...]
Recipe for Computing and Sampling Multivariate Kernel Density Estimates (and Plotting Contours for 2D KDEs).
The code snippet below creates the graphic shown in the original post:
## radially symmetric kernel (Gaussian kernel)
RadSym [Read more...]
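The original snippet is truncated after `RadSym`, so the following is only a minimal base-R sketch of the same recipe, not the post's code. It assumes a radially symmetric Gaussian kernel with one scalar bandwidth `h`, and the function names `rad_sym_gauss`, `kde_eval`, and `kde_sample` are hypothetical. To sample from a KDE, pick a data point uniformly at random and add kernel-shaped noise around it.

```r
# Radially symmetric Gaussian kernel in d dimensions with bandwidth h:
# K(u) = (2*pi*h^2)^(-d/2) * exp(-||u||^2 / (2*h^2))
rad_sym_gauss <- function(u, h) {
  d <- length(u)
  (2 * pi * h^2)^(-d / 2) * exp(-sum(u^2) / (2 * h^2))
}

# KDE at point x: average kernel value over the rows of data matrix X
kde_eval <- function(x, X, h) {
  mean(apply(X, 1, function(xi) rad_sym_gauss(x - xi, h)))
}

# Draw m samples from the KDE: resample rows, then add N(0, h^2 I) noise
kde_sample <- function(m, X, h) {
  idx <- sample(nrow(X), m, replace = TRUE)
  X[idx, , drop = FALSE] + matrix(rnorm(m * ncol(X), sd = h), nrow = m)
}

set.seed(1)
X <- matrix(rnorm(200 * 2), ncol = 2)   # hypothetical 2D data
h <- 0.4                                # assumed bandwidth

# Evaluate the 2D density on a grid and draw its contours
gx <- seq(-3, 3, length.out = 40)
gy <- seq(-3, 3, length.out = 40)
z  <- outer(gx, gy, Vectorize(function(a, b) kde_eval(c(a, b), X, h)))
contour(gx, gy, z)

# Overlay samples drawn from the estimated density
draws <- kde_sample(500, X, h)
points(draws, pch = ".")
```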
#Altmetrics on CiteULike entries in R
I wanted to know when the publications in a set I was aggregating on CiteULike were published: the number of publications per year, for example. I did a quick Google search but could not find an R package that acts as a client for the CiteULike API, and because I wanted to play with JSON ... [Read more...]
Passing arguments to an R script from the command line
This post describes how to pass external arguments to R when calling an R script with Rscript from the command line. The case study presented here is very simple: an R script is called that needs a file name as input (a text file containing data that are loaded into R to be ... [Read more...]
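Here is a minimal sketch of the pattern, using base R's `commandArgs()`. The script name `load_data.R`, the helper `get_input_file()`, and the `header = TRUE` layout of the input file are all hypothetical choices for illustration, not taken from the post.

```r
#!/usr/bin/env Rscript
# Hypothetical usage: Rscript load_data.R data.txt

# Return the input file name from a character vector of arguments,
# stopping with a usage message when none is supplied
get_input_file <- function(args) {
  if (length(args) < 1) {
    stop("usage: Rscript load_data.R <input-file>", call. = FALSE)
  }
  args[1]
}

# trailingOnly = TRUE keeps only the arguments given after the script name
args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
  input_file <- get_input_file(args)
  dat <- read.table(input_file, header = TRUE)   # assumes a header row
  print(summary(dat))
}
```

Keeping the argument handling in a small function makes it testable without actually launching `Rscript`.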
How to compare two blackbox timeseries generators?
Comparing two timeseries-generating blackboxes
In my last post I talked about how this question on Cross-Validated got me interested. Basically the challenge is to compare two data generating models to see if they are essentially the same. Since then I’ve noticed that this problem comes up in a number ... [Read more...]
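The excerpt does not say which method the author settles on. One simple way to start is to compare two generators' output on two fronts: the marginal distribution and the temporal dependence. The sketch below does this with base R. The generators `gen_a` and `gen_b` are hypothetical AR(1) black boxes. The KS p-value assumes independent draws, which autocorrelated series violate, so treat it as a rough guide only.

```r
set.seed(123)
n <- 2000

# Two hypothetical black-box generators: AR(1) processes with different phi
gen_a <- function(n) as.numeric(arima.sim(list(ar = 0.2), n = n))
gen_b <- function(n) as.numeric(arima.sim(list(ar = 0.8), n = n))

a <- gen_a(n)
b <- gen_b(n)

# 1) Marginal distributions: two-sample Kolmogorov-Smirnov test
ks <- ks.test(a, b)

# 2) Temporal structure: lag-1 autocorrelation of each series
acf1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

c(ks_p_value = ks$p.value, acf1_a = acf1(a), acf1_b = acf1(b))
```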
Predicting Titanic deaths on Kaggle VI: Stan
It is a bit of a contradiction. Kaggle provides competitions on data science, while Stan clearly belongs to (Bayesian) statistics. Yet after using random forests, boosting and bagging, I also think this problem has a suitable size for Stan, which I understand can handle larger problems than older Bayesian ... [Read more...]
Can You Say “Heteroscedasticity” 3 Times Fast?
Most books on regression analysis assume homoscedasticity, the situation in which Var(Y | X = t), for a response variable Y and vector of predictor variables X, is the same for all t. Yet, needless to say, almost all data in real life is heteroscedastic. For Y = human weight and X = ... [Read more...]
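To make the definition concrete, here is a minimal simulation sketch with an assumed model in which the error standard deviation grows with the predictor, so Var(Y | X = t) depends on t. The model and its parameters are hypothetical and not taken from the post. An ordinary least-squares fit then leaves residuals whose spread increases with x.

```r
set.seed(7)
n <- 500
x <- runif(n, min = 1, max = 10)

# Hypothetical model: error SD is 0.5 * x, so
# Var(Y | X = t) = (0.5 * t)^2 changes with t (heteroscedastic)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)

fit <- lm(y ~ x)
res <- residuals(fit)

# Under homoscedasticity the two halves would have similar residual variance
upper     <- x >= median(x)
var_ratio <- var(res[upper]) / var(res[!upper])
var_ratio

# Residuals vs. x: the funnel shape is the visual signature
plot(x, res, xlab = "x", ylab = "OLS residual")
abline(h = 0, lty = 2)
```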