While working with Andrew and a student from Dauphine on importance sampling, we wanted to assess the distribution of the resulting sample via the Kolmogorov-Smirnov distance between its empirical cdf and F (the largest absolute gap between the two), where F is the target cdf. This distance (times √n) has an asymptotic distribution that does not depend on n, called the Kolmogorov distribution. After searching for a little while,
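To make the distance in this excerpt concrete, here is a minimal R sketch of a weighted Kolmogorov-Smirnov distance for an importance sample. The target (standard normal), the proposal (Student t with 3 degrees of freedom) and the helper name `ks_weighted` are assumptions chosen for illustration, not the authors' setup. Note also that the Kolmogorov limit mentioned above is the classical result for an i.i.d. unweighted sample; this sketch does not check whether it carries over to weighted samples.

```r
# Illustrative setup (assumption): target N(0,1), proposal Student t with 3 df.
set.seed(1)
n <- 1000
x <- rt(n, df = 3)                 # draws from the proposal
w <- dnorm(x) / dt(x, df = 3)      # importance weights target / proposal
w <- w / sum(w)                    # self-normalised weights

# Hypothetical helper: sup-distance between the weighted empirical cdf and a target cdf.
ks_weighted <- function(x, w, cdf) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  Fn_right <- cumsum(w)            # weighted ecdf just after each point
  Fn_left  <- Fn_right - w         # weighted ecdf just before each point
  Fx <- cdf(x)
  # the supremum is reached at a jump, on one side or the other
  max(pmax(abs(Fn_right - Fx), abs(Fx - Fn_left)))
}

d <- ks_weighted(x, w, pnorm)
```

With equal weights 1/n the helper reduces to the usual one-sample statistic, so it can be checked against `ks.test(x, "pnorm")$statistic` on a plain sample.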

R should be mature and dying by now, but it is instead alive and vibrant. I experiment with some of the new developments below. It is probably best to copy/paste the intro below in case you do not see the iframe content. The five purposes of this po...

I found this elegant note about reshape2 on Sean Anderson's blog: http://seananderson.ca/2013/10/19/reshape.html

Open science has grown tremendously in the past few years. While there’s still a long way to go, the availability of data, software, and other materials is making it possible to re-use these products to expand upon previous work and apply them to new areas. Through responsible conduct of research (RCR) …

by Yaniv Mor, Co-founder & CEO of Xplenty. How do you get Big Data ready for R? Gigabytes or terabytes of raw data may need to be combined, cleaned, and aggregated before they can be analyzed. Processing such large amounts of data used to require installing Hadoop on a cluster of servers, not to mention coding MapReduce jobs in...

The 5th MilanoR meeting was a great success. At the links below, you will find the presentations. Please leave a comment! Welcome Presentation, by Nicola Sturaro, consultant at Quantide; Singular Spectrum Analysis With Rssa, by Maurizio Sanarico, Chief Data Scientist at SDG consulting …

About a year ago I wrote a post describing average length of dissertations at the University of Minnesota. I've been meaning to expand that post by adding data from master's theses since the methods for gathering/parsing the records are transferable. This post provides some graphics and links to R code for evaluating dissertation (doctorate) and

Naming conventions in R are famously anarchic, with no clear winner and multiple conventions in use simultaneously in the same package. This has been written about before, in a lucid article in the R Journal, a detailed exploration of names in R source code hosted on CRAN and general discussion on Stack Overflow. Basically, there are 5 naming...

As promised, I got back to this book, Implementing reproducible research (after the pigeons had their say). I looked at it this morning while monitoring my students taking their last-chance R exam (definitely last chance as my undergraduate R course is not renewed next year). The book is in fact an edited collection of papers

An easy way to run R code in parallel on a multicore system is with the mclapply() function. Unfortunately, mclapply() does not work on Windows machines because the mclapply() implementation relies on forking and Windows does not support forking. For me, this is somewhat of a headache because I am used to using mclapply(), and
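One common way around the forking limitation described in this excerpt is to fall back to a socket cluster on Windows. The sketch below uses only functions from the base `parallel` package (`mclapply()`, `makeCluster()`, `parLapply()`, `stopCluster()`); the wrapper name `par_lapply` and the default of two cores are assumptions for illustration, and this is not necessarily the approach the post goes on to take.

```r
library(parallel)

# Hypothetical wrapper: fork on Unix-alikes, use a socket cluster on Windows.
par_lapply <- function(X, FUN, ..., cores = 2L) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(cores)          # starts fresh R worker processes
    on.exit(stopCluster(cl))          # always shut the workers down
    parLapply(cl, X, FUN, ...)
  } else {
    mclapply(X, FUN, ..., mc.cores = cores)
  }
}

squares <- par_lapply(1:8, function(i) i^2)
```

One design caveat: forked workers see the parent's workspace, while socket workers start empty, so on Windows any global objects or packages that `FUN` relies on would need to be sent over explicitly (for example with `clusterExport()` or `clusterEvalQ()`).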

I have a classification task and was reading up on various approaches. In the specific case where all inputs are categorical, one can use “Bayesian Naïve Bayes” using the Dirichlet distribution. Poking through the freely available text by Barber, I found a rather detailed discussion in chapters 9 and 10, as well as example MATLAB code for the...
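As a rough illustration of the idea in this excerpt, the following R sketch fits a naive Bayes classifier on categorical inputs with a symmetric Dirichlet prior on every probability table, and classifies a single new case with the resulting posterior predictive (which, for one draw, is the posterior mean of each table). The toy data, the function names `fit_dnb` and `predict_dnb`, and the choice `alpha = 1` are all assumptions; this is not a translation of Barber's code.

```r
# Toy training data (made up for illustration).
train <- data.frame(
  colour = factor(c("red", "red", "blue", "blue", "red", "blue")),
  shape  = factor(c("round", "square", "round", "square", "round", "round")),
  class  = factor(c("a", "a", "b", "b", "a", "b"))
)

# Hypothetical helper: posterior-mean tables under a symmetric Dirichlet(alpha) prior.
fit_dnb <- function(data, target, alpha = 1) {
  y <- data[[target]]
  features <- setdiff(names(data), target)
  prior <- (table(y) + alpha) / (length(y) + alpha * nlevels(y))
  cond <- lapply(data[features], function(x) {
    counts <- table(y, x) + alpha     # observed counts plus Dirichlet pseudo-counts
    counts / rowSums(counts)          # one probability row per class
  })
  list(prior = prior, cond = cond, classes = levels(y))
}

# Hypothetical helper: class probabilities for one new case (named list of levels).
predict_dnb <- function(model, newdata) {
  logp <- log(as.numeric(model$prior))
  for (f in names(model$cond)) {
    logp <- logp + log(model$cond[[f]][, as.character(newdata[[f]])])
  }
  p <- exp(logp - max(logp))          # stabilise before normalising
  setNames(p / sum(p), model$classes)
}

model <- fit_dnb(train, "class")
post  <- predict_dnb(model, list(colour = "red", shape = "round"))
```

Because of the pseudo-counts, a level never seen with a class (here, blue with class a) still gets non-zero probability, which is the practical benefit of the Dirichlet prior over raw frequencies.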

InsideBigData has published a new Guide to Machine Learning, in collaboration with Revolution Analytics. As the name suggests, the Guide provides an overview of machine learning techniques, with a focus on implementation with the R language and (for big-data applications) Revolution R Enterprise. You can download the Guide here (email registration required), or for a quick overview of the...

R's semantics are modeled after Scheme. Scheme is a functional language and R is functional too. I am writing about functions in R, and many of R's strange usages are just syntactic sugar for special function calls. What is rownames(x) <- c('A','B','C')? y <- c(1, 2, 3, 4, 5, 6); x <- matrix(y, nrow = 3, ncol...
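The question in this excerpt can be answered in a few lines of R: an assignment of the form `f(x) <- value` is evaluated as a call to the replacement function `` `f<-` ``, whose result is assigned back to `x`. In the sketch below, `ncol = 2` is an assumption (the original call is cut off), and `second<-` is a made-up replacement function for illustration.

```r
y <- c(1, 2, 3, 4, 5, 6)
x <- matrix(y, nrow = 3, ncol = 2)   # ncol = 2 is assumed; the excerpt is truncated

# The sugared form ...
x1 <- x
rownames(x1) <- c("A", "B", "C")

# ... and the explicit call to the replacement function it stands for.
x2 <- `rownames<-`(x, c("A", "B", "C"))

stopifnot(identical(x1, x2))

# Any function named `name<-` with a `value` argument can be used the same way.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
v <- c(10, 20, 30)
second(v) <- 99                      # same as v <- `second<-`(v, value = 99)
```

The replacement function returns a modified copy rather than changing its argument in place, which keeps the construct consistent with R's functional semantics.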

This is the bimonthly R Jobs post (for 2014-07-14), based on the R-bloggers’ sister website: R-users.com. If you are an employer who is looking to hire people from the R community, please visit this link to post a new R job (it’s free, and registration takes less than 10 seconds). If you are a job seeker, please follow the links below to learn more and apply for your job of interest (or visit previous...

By popular demand, I updated the fantasy football draft optimizers with 2014 projections. There are two draft optimizers (one for auction drafts and one for snake drafts). The optimizers identify ... The post "Win Your Fantasy Football Draft with These Shiny Apps: 2014 Update" appeared first on Fantasy Football Analytics.

Setting the right random effect part in mixed effect models can be tricky in many applied situations. I will not talk here about choosing whether a grouping variable (sites, individuals …) should be included as a fixed term or as a random term; please see Gelman and Hill (2006) and Zuur et al. (2009) for

Mathematics is the art of giving the same name to different things (Henri Poincaré). Many surveys among experts indicate that a proof of the Riemann Hypothesis is the most important open problem in mathematics. This hypothesis is related to the Riemann zeta function, which is supposed to be zero only for those complex numbers whose real part is equal to

While I was out at the (immensely impressive and equally enjoyable) useR! 2014 conference at UCLA, Conrad provided a bug-fix release 4.320 of Armadillo, the nifty templated C++ library for linear algebra. I quickly rolled that into RcppArmadillo rel...

Two short items in this blog post. Since it was not obvious how to run odfWeave() in my particular setup, here is the call I am using. Then there were several people crosstabulating logical vectors, so I wanted to play along with a version 80 times faster than table(). odfWeave: My particular setup consists of R, 7-zip, and LibreOffice. Somehow they don't 100% play along when using odfWeave....
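For the crosstabulation item, one way to avoid the overhead of `table()` on two logical vectors is to encode each pair as an integer from 1 to 4 and count with `tabulate()`. This is a sketch of that general trick, not the author's code; the helper name `crosstab_lgl` and the simulated vectors are assumptions, and no timing is claimed here.

```r
set.seed(42)
a <- sample(c(TRUE, FALSE), 1e5, replace = TRUE)
b <- sample(c(TRUE, FALSE), 1e5, replace = TRUE)

# Hypothetical helper: 2 x 2 counts for two NA-free logical vectors.
crosstab_lgl <- function(a, b) {
  stopifnot(is.logical(a), is.logical(b),
            length(a) == length(b), !anyNA(a), !anyNA(b))
  # code 1: a F, b F; code 2: a T, b F; code 3: a F, b T; code 4: a T, b T
  counts <- tabulate(1L + a + 2L * b, nbins = 4L)
  matrix(counts, nrow = 2,
         dimnames = list(a = c("FALSE", "TRUE"), b = c("FALSE", "TRUE")))
}

ct <- crosstab_lgl(a, b)
```

The result has the same layout as `table(a, b)` (rows for `a`, columns for `b`), so the two can be compared cell by cell, and `system.time()` or a benchmarking package can be used to measure the difference on your own machine.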

R is different from C family languages. It has a C-like syntax, but Lisp-like semantics. Programmers from the C/C++/Java world find many usages in R ad hoc and feel they need to memorize special cases. This is because they look at R from a C perspective. R is a very...
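A few lines of base R illustrate the "C syntax, Lisp semantics" point made in this excerpt: operators, control flow and indexing are ordinary function calls, and unevaluated code is a data structure that can be inspected. The particular expressions are chosen for illustration only.

```r
# Operators are functions that happen to have infix syntax.
three <- `+`(1, 2)                      # same as 1 + 2

# Control flow and indexing are function calls too.
answer <- `if`(three > 2, "yes", "no")  # same as if (three > 2) "yes" else "no"
second <- `[`(c(10, 20, 30), 2)         # same as c(10, 20, 30)[2]

# Code is data: a quoted expression is a call object, much like a Lisp list.
e <- quote(1 + 2 * 3)
parts <- as.list(e)                     # the function `+`, then 1, then the call 2 * 3
value <- eval(e)

# Because `[` is a function, it can be passed around like any other.
seconds <- sapply(list(1:3, 4:6), `[`, 2)
```

Seen this way, many apparent special cases are one rule applied uniformly: evaluate a function call.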

IEEE, the world's largest professional association for the advancement of technology, recently published its ranking of the popularity of programming languages. The R language comes in at number 9 in the list. The ranking is based on 12 weighted factors, including Google search rankings and trends, social media chatter, aggregator posts (Reddit and Hacker News), social programming...

On Monday, I will be giving the closing talk of the R in Insurance Conference, in London, on Bayesian Computations for Actuaries, or, to be more specific, Getting into Bayesian Wizardry… (with the eyes of a muggle actuary). The animated version of the slides (since we will spend some time on MCMC algorithms, I thought that animated graphs could be...