Variations on rolling forecasts

July 15, 2014

Rolling forecasts are commonly used to compare time series models. Here are a few of the ways they can be computed using R. I will use ARIMA models as a vehicle for illustration, but the code can easily be adapted to other univariate time series models. One-step forecasts without re-estimation: The simplest approach is to estimate the model on...

The secrets of a linear world exposed

July 15, 2014

Like many concepts in mathematics, linearity has multiple interpretations and meanings. What do we mean when we say something is …

another R new trick [new for me!]

July 15, 2014

While working with Andrew and a student from Dauphine on importance sampling, we wanted to assess the distribution of the resulting sample via the Kolmogorov-Smirnov distance between its empirical distribution function and F, where F is the target. This distance (times √n) has an asymptotic distribution that does not depend on n, called the Kolmogorov distribution. After searching for a little while,

Palette of Colors from Image %>% ggplot2 %>% rCharts + dimple.js

July 15, 2014

R should be mature and dying by now, but it is instead alive and vibrant. I experiment with some of the new developments below. It is probably best to copy/paste the intro below in case you do not see the iframe content. The five purposes of this po...

best note on reshape2

July 15, 2014

I found this elegant note about reshape2 on Sean Anderson's blog: http://seananderson.ca/2013/10/19/reshape.html

Creating Reproducible Software Environments with Packrat

July 15, 2014

Open science has grown tremendously in the past few years. While there’s still a long way to go, the availability of data, software, and other materials is making it possible to re-use these products to expand upon previous work and apply them to new areas. Through responsible conduct of research (RCR) …

Preparing Big Data for Analysis in R

July 15, 2014

by Yaniv Mor, Co-founder & CEO of Xplenty. How do you get Big Data ready for R? Gigabytes or terabytes of raw data may need to be combined, cleaned, and aggregated before they can be analyzed. Processing such large amounts of data used to require installing Hadoop on a cluster of servers, not to mention coding MapReduce jobs in...

Presentations and video of the 5th meeting

July 15, 2014

Great success for the 5th MilanoR meeting. At the links below, you will find the speakers' presentations. Please leave a comment! Welcome Presentation by Nicola Sturaro, consultant at Quantide; Singular Spectrum Analysis With Rssa by Maurizio Sanarico, Chief Data Scientist at SDG consulting …

Finally, a use for rapply

July 15, 2014

Tagged: r, rapply, recursive, stats

Average dissertation and thesis length, take two

July 15, 2014

About a year ago I wrote a post describing the average length of dissertations at the University of Minnesota. I've been meaning to expand that post by adding data from master's theses, since the methods for gathering/parsing the records are transferable. This post provides some graphics and links to R code for evaluating dissertation (doctorate) and

Consistent naming conventions in R

July 15, 2014

Naming conventions in R are famously anarchic, with no clear winner and multiple conventions in use simultaneously in the same package. This has been written about before, in a lucid article in the R Journal, a detailed exploration of names in R source code hosted on CRAN and general discussion on stackoverflow. Basically, there are 5 naming...

Simple user interface in R to get login details

July 15, 2014

Occasionally I have to connect to services from R that ask for login details, such as databases. I don't like to store my login details in the R source code file; instead, I would prefer to enter my login details when I execute the code. Fortunately,...

implementing reproducible research [short book review]

July 14, 2014

As promised, I got back to this book, Implementing reproducible research (after the pigeons had their say). I looked at it this morning while monitoring my students taking their last-chance R exam (definitely last chance as my undergraduate R course is not reconducted next year). The book is in fact an edited collection of papers

Implementing mclapply() on Windows: a primer on embarrassingly parallel computation on multicore systems with R

July 14, 2014

An easy way to run R code in parallel on a multicore system is with the mclapply() function. Unfortunately, mclapply() does not work on Windows machines because the mclapply() implementation relies on forking and Windows does not support forking. For me, this is somewhat of a headache because I am used to using mclapply(), and

“Vignettes” Update

July 14, 2014

As a follow-up to my post about major changes to FSA, which left some of the “old” vignettes out-of-date, here is a brief summary of new material in the draft book chapters (linked to below) that replaced the …

Bayesian Naive Bayes for Classification with the Dirichlet Distribution

July 14, 2014

I have a classification task and was reading up on various approaches. In the specific case where all inputs are categorical, one can use “Bayesian Naïve Bayes” using the Dirichlet distribution. Poking through the freely available text by Barber, I found a rather detailed discussion in chapters 9 and 10, as well as example MATLAB code for the...

Guide to Machine Learning with R from InsideBigData

July 14, 2014

InsideBigData has published a new Guide to Machine Learning, in collaboration with Revolution Analytics. As the name suggests, the Guide provides an overview of machine learning techniques, with a focus on implementation with the R language and (for big-data applications) Revolution R Enterprise. You can download the Guide here (email registration required), or for a quick overview of the...

R Notes: Functions

July 14, 2014

R's semantics is modeled after Scheme. Scheme is a functional language and R is functional too. I am writing about functions in R; many of R's strange usages are just syntactic sugar for special function calls. What is rownames(x) <- c('A','B','C')? y <- c(1, 2, 3, 4, 5, 6); x <- matrix(y, nrow = 3, ncol...

5 new R jobs (for July 14th 2014)

July 14, 2014

This is the bimonthly R Jobs post (for 2014-07-14), based on the R-bloggers’ sister website: R-users.com. If you are an employer who is looking to hire people from the R community, please visit this link to post a new R job (it’s free, and registration takes less than 10 seconds). If you are a job seeker, please follow the links below to learn more and apply for your job of interest (or visit previous...

Win Your Fantasy Football Draft with These Shiny Apps: 2014 Update

July 13, 2014

By popular demand, I updated the fantasy football draft optimizers with 2014 projections. There are two draft optimizers (one for auction drafts and one for snake drafts). The optimizers identify...

Using bootMer to do model comparison in R

July 13, 2014

Setting the right random effect part in mixed effect models can be tricky in many applied situations. I will not talk here about choosing whether a grouping variable (sites, individuals …) should be included as a fixed term or as a random term; please see Gelman and Hill (2006) and Zuur et al (2009) for

Stan goes to the World Cup

July 13, 2014

I thought it would be fun to fit a simple model in Stan to estimate the abilities of the teams in the World Cup, then I could post everything here on the blog, the whole story of the analysis from beginning to end, showing the results of spending a couple hours on a data analysis.

The Zebra Of Riemann

July 13, 2014

Mathematics is the art of giving the same name to different things (Henri Poincaré). Many surveys among experts indicate that proving the Riemann Hypothesis is the most important open problem in mathematics. This hypothesis concerns the Riemann zeta function, which is supposed to be zero only for those complex numbers whose real part is equal to

RcppArmadillo 0.4.320.0

July 12, 2014

While I was out at the (immensely impressive and equally enjoyable) useR! 2014 conference at UCLA, Conrad provided a bug-fix release 4.320 of Armadillo, the nifty templated C++ library for linear algebra. I quickly rolled that into RcppArmadillo rel...

odfweave setup and counting logicals

July 12, 2014

Two short items in this blogpost. Since it was not obvious how to run odfWeave() in my particular setup, here is the call I am using. Then there were several people crosstabulating logical vectors, so I wanted to play along, 80 times faster than table(). odfWeave: My particular setup consists of R, 7-Zip, and LibreOffice. Somehow they don't 100% play along when using odfWeave....

R Notes: vectors

July 12, 2014

R is different from C family languages. It has C syntax but Lisp semantics. Programmers from the C/C++/Java world would find many usages in R ad hoc and need to memorize special cases. This is because they use R from a C perspective. R is a very...

Le Monde puzzle [#875]

July 11, 2014

I learned something in R today thanks to Le Monde mathematical puzzle: A two-player game consists in A picking a number n between 1 and 10, and B and A successively choosing and applying one of three transforms to the current value of n (n=n+1, n=3n, n=4n), starting with B, until n is larger than

Sometimes I feel (some) need for speed

July 11, 2014

I’m the first to acknowledge that most of my code could run faster. The truth of the matter is that, in essence, I write ‘quickies’: code that will run once or twice, so there is no incentive to spend days or hours shaving seconds off a computation. Most analyses of research data fall in

IEEE ranks R #9 amongst all languages

July 11, 2014

IEEE, the world's largest professional association for the advancement of technology, recently published its ranking of the popularity of programming languages. The R language comes in at number 9 in the list. The ranking is based on 12 weighted factors, including Google search rankings and trends, social media chatter, aggregator posts (Reddit and Hacker News), social programming...
