## The secrets of a linear world exposed

July 15, 2014

Like many concepts in mathematics, linearity has multiple interpretations and meanings. What do we mean when we say something is …

## another R new trick [new for me!]

July 15, 2014

While working with Andrew and a student from Dauphine on importance sampling, we wanted to assess the distribution of the resulting sample via the Kolmogorov-Smirnov measure $\sup_x |\hat{F}_n(x) - F(x)|$, where $\hat{F}_n$ is the empirical CDF of the sample and $F$ is the target. This distance (times √n) has an asymptotic distribution that does not depend on n, called the Kolmogorov distribution. After searching for a little while,
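
The excerpt stops before the R trick itself, so the following is not necessarily what the post found. It is a minimal sketch, assuming a standard normal target, that computes $D_n = \sup_x |\hat{F}_n(x) - F(x)|$ by hand, checks it against the statistic returned by stats::ks.test(), and evaluates the classical series for the asymptotic Kolmogorov CDF, $P(K \le t) = 1 - 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2 t^2}$. The helper names ks_distance and pkolmogorov_asym, and the truncation at kmax = 100 terms, are assumptions of this sketch.

```r
# Kolmogorov-Smirnov distance between a sample and a target CDF F (here N(0, 1)),
# computed by hand and checked against stats::ks.test().
set.seed(42)
n <- 500
x <- rnorm(n)  # stand-in for the sample whose distribution we want to assess

# Hypothetical helper: sup_x |F_n(x) - F(x)| for a continuous target cdf.
# The supremum is attained at a jump of the empirical CDF F_n, so it is enough
# to compare F at the sorted points with the step heights (i - 1)/n and i/n.
ks_distance <- function(x, cdf) {
  xs <- sort(x)
  m <- length(xs)
  Fx <- cdf(xs)
  i <- seq_len(m)
  max(i / m - Fx, Fx - (i - 1) / m)
}

d_n <- ks_distance(x, pnorm)
d_ref <- unname(ks.test(x, "pnorm")$statistic)

# Hypothetical helper: asymptotic Kolmogorov CDF of K = lim sqrt(n) * D_n,
# P(K <= t) = 1 - 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * t^2),
# truncated after kmax terms (accurate for t away from 0).
pkolmogorov_asym <- function(t, kmax = 100) {
  k <- seq_len(kmax)
  vapply(t, function(s) 1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * s^2)),
         numeric(1))
}

# Asymptotic p-value for the scaled statistic sqrt(n) * D_n
p_asym <- 1 - pkolmogorov_asym(sqrt(n) * d_n)
```

As a sanity check, pkolmogorov_asym(1.358) is about 0.95, matching the familiar 5% critical value of √n D_n. The alternating series converges fast for moderate t but poorly as t approaches 0, where the CDF is essentially 0 anyway.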

## Palette of Colors from Image %>% ggplot2 %>% rCharts + dimple.js

July 15, 2014

R should be mature and dying by now, but instead it is alive and vibrant. I experiment with some of the new developments below. It is probably best to copy/paste the intro below in case you do not see the iframe content. The five purposes of this po...

## best note on reshape2

July 15, 2014

I found this elegant note about reshape2 on Sean Anderson's blog: http://seananderson.ca/2013/10/19/reshape.html

## Creating Reproducible Software Environments with Packrat

July 15, 2014

Open science has grown tremendously in the past few years. While there’s still a long way to go, the availability of data, software, and other materials is making it possible to re-use these products to expand upon previous work and apply them to new areas. Through responsible conduct of research (RCR) …

## Preparing Big Data for Analysis in R

July 15, 2014

by Yaniv Mor, Co-founder & CEO of Xplenty. How do you get Big Data ready for R? Gigabytes or terabytes of raw data may need to be combined, cleaned, and aggregated before they can be analyzed. Processing such large amounts of data used to require installing Hadoop on a cluster of servers, not to mention coding MapReduce jobs in...

## Presentations and video of the 5th meeting

July 15, 2014

Great success for the 5th MilanoR meeting! At the links below you will find the speakers' presentations. Please leave a comment! Welcome Presentation, by Nicola Sturaro, consultant at Quantide; Singular Spectrum Analysis With Rssa, by Maurizio Sanarico, Chief Data Scientist at SDG consulting; …

## Finally, a use for rapply

July 15, 2014

Tagged: r, rapply, recursive, stats

## Average dissertation and thesis length, take two

July 15, 2014

About a year ago I wrote a post describing the average length of dissertations at the University of Minnesota. I've been meaning to expand that post by adding data from master's theses, since the methods for gathering/parsing the records are transferable. This post provides some graphics and links to R code for evaluating dissertation (doctorate) and

## Consistent naming conventions in R

July 15, 2014

Naming conventions in R are famously anarchic, with no clear winner and multiple conventions in use simultaneously in the same package. This has been written about before, in a lucid article in the R Journal, a detailed exploration of names in R source code hosted on CRAN and general discussion on stackoverflow. Basically, there are 5 naming...

## Simple user interface in R to get login details

July 15, 2014

Occasionally I have to connect to services from R that ask for login details, such as databases. I don't like to store my login details in the R source code file; instead I prefer to enter my login details when I execute the code. Fortunately,...

## implementing reproducible research [short book review]

July 14, 2014

As promised, I got back to this book, Implementing reproducible research (after the pigeons had their say). I looked at it this morning while monitoring my students taking their last-chance R exam (definitely last chance, as my undergraduate R course is not being renewed next year). The book is in fact an edited collection of papers

## Implementing mclapply() on Windows: a primer on embarrassingly parallel computation on multicore systems with R

July 14, 2014

An easy way to run R code in parallel on a multicore system is with the mclapply() function. Unfortunately, mclapply() does not work on Windows machines because the mclapply() implementation relies on forking and Windows does not support forking. For me, this is somewhat of a headache because I am used to using mclapply(), and
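
The excerpt stops before the author's Windows implementation, so the following is only one possible approach, not the post's code: a hypothetical wrapper, portable_lapply, that keeps the fork-based mclapply() on Unix-alikes and falls back to a socket (PSOCK) cluster via makeCluster() and parLapply(), both from the same parallel package, on Windows.

```r
library(parallel)

# Hypothetical helper: fork-based mclapply() where forking is available,
# a PSOCK cluster with parLapply() on Windows (which cannot fork).
portable_lapply <- function(X, FUN, ..., cores = 2L) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(cores)              # starts fresh R worker processes
    on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
    parLapply(cl, X, FUN, ...)
  } else {
    mclapply(X, FUN, ..., mc.cores = cores)  # children are forked copies
  }
}

squares <- portable_lapply(1:4, function(i) i^2)
```

The trade-off is that PSOCK workers start as fresh R sessions and do not inherit the parent's workspace, so any global objects FUN relies on must be shipped with clusterExport() (and packages loaded on the workers, for example with clusterEvalQ()), whereas forked children see everything automatically.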

## “Vignettes” Update

July 14, 2014

As a follow-up to my post about major changes to FSA, which noted that some of the “old” vignettes are now out-of-date, here is a brief summary of new material in the draft book chapters (linked to below) that replaced the …

## Bayesian Naive Bayes for Classification with the Dirichlet Distribution

July 14, 2014

I have a classification task and was reading up on various approaches. In the specific case where all inputs are categorical, one can use “Bayesian Naïve Bayes” using the Dirichlet distribution. Poking through the freely available text by Barber, I found a rather detailed discussion in chapters 9 and 10, as well as example MATLAB code for the...
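
Barber's treatment is not reproduced in the excerpt, so this is a sketch under stated assumptions: a symmetric Dirichlet(alpha) prior on the class probabilities and on each class-conditional categorical table. Integrating those parameters out turns every probability into a smoothed count ratio, $p(x_j = k \mid c, D) = (n_{cjk} + \alpha)/(n_c + K_j \alpha)$ with $K_j$ the number of levels of feature $j$, and the class posterior is proportional to the product of these terms. The names bnb_fit and bnb_predict are hypothetical, and the weather-style data are made up for illustration.

```r
# Hypothetical helpers: Bayesian naive Bayes for categorical inputs with a
# symmetric Dirichlet(alpha) prior; parameters are integrated out, so each
# probability below is a posterior predictive (a smoothed count ratio).
bnb_fit <- function(X, y, alpha = 1) {
  y <- factor(y)
  classes <- levels(y)
  n_c <- as.numeric(table(y))
  prior <- (n_c + alpha) / (length(y) + alpha * length(classes))
  names(prior) <- classes
  # One table per feature: rows = classes, columns = feature levels
  cond <- lapply(X, function(col) {
    tab <- table(y, col)
    (tab + alpha) / (rowSums(tab) + alpha * ncol(tab))
  })
  list(prior = prior, cond = cond, classes = classes)
}

bnb_predict <- function(model, newdata) {
  n <- nrow(newdata)
  # Log posterior score (up to a constant) of each class for each row
  scores <- vapply(model$classes, function(cl) {
    lp <- rep(log(model$prior[[cl]]), n)
    for (v in names(model$cond)) {
      lp <- lp + log(model$cond[[v]][cl, as.character(newdata[[v]])])
    }
    lp
  }, numeric(n))
  scores <- matrix(scores, nrow = n)
  model$classes[apply(scores, 1, which.max)]
}

# Made-up toy data for illustration
train <- data.frame(
  outlook = factor(c("sunny", "sunny", "rain", "rain", "overcast", "overcast")),
  windy   = factor(c("yes", "no", "yes", "yes", "no", "no"))
)
play <- factor(c("no", "no", "no", "yes", "yes", "yes"))

model <- bnb_fit(train, play, alpha = 1)
pred <- bnb_predict(model, data.frame(outlook = c("sunny", "overcast"),
                                      windy   = c("yes", "no")))
```

With alpha = 1 this reduces to Laplace (add-one) smoothing; larger alpha pulls the conditional tables toward uniform. Feature levels never seen in training would need extra handling, since they have no column in the fitted tables.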

## Guide to Machine Learning with R from InsideBigData

July 14, 2014

InsideBigData has published a new Guide to Machine Learning, in collaboration with Revolution Analytics. As the name suggests, the Guide provides an overview of machine learning techniques, with a focus on implementation with the R language and (for big-data applications) Revolution R Enterprise. You can download the Guide here (email registration required), or for a quick overview of the...

## R Notes: Functions

July 14, 2014

R's semantics are modeled after Scheme. Scheme is a functional language, and R is functional too. I am writing about functions in R; many of R's strange usages are just syntactic sugar for special function calls. What is rownames(x) <- c('A','B','C')? y <- c(1, 2, 3, 4, 5, 6); x <- matrix(y, nrow = 3, ncol...
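
The excerpt is cut off before the explanation, so here is a minimal sketch of the standard mechanism it alludes to: rownames(x) <- value is evaluated as a call to the replacement function `rownames<-`, whose result is assigned back to x, and operators and indexing are ordinary functions too. The value ncol = 2 and the user-defined replacement function `second<-` are hypothetical additions for illustration.

```r
y <- c(1, 2, 3, 4, 5, 6)
x <- matrix(y, nrow = 3, ncol = 2)  # ncol = 2 assumed; the excerpt is truncated

# The familiar replacement syntax ...
x1 <- x
rownames(x1) <- c("A", "B", "C")

# ... is equivalent to calling the replacement function `rownames<-`
# and assigning its result back to the variable
x2 <- `rownames<-`(x, c("A", "B", "C"))

# Infix operators and indexing are function calls as well
sum_sugar <- 1 + 2
sum_call <- `+`(1, 2)
second_elt <- `[`(y, 2)

# Hypothetical user-defined replacement function: any function named
# `name<-` with arguments (x, value) enables the syntax name(x) <- value
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
z <- y
second(z) <- 99
```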

## 5 new R jobs (for July 14th 2014)

July 14, 2014

This is the bimonthly R Jobs post (for 2014-07-14), based on R-bloggers’ sister website, R-users.com. If you are an employer who is looking to hire people from the R community, please visit this link to post a new R job (it’s free, and registration takes less than 10 seconds). If you are a job seeker, please follow the links below to learn more and apply for your job of interest (or visit previous...

## Win Your Fantasy Football Draft with These Shiny Apps: 2014 Update

July 13, 2014

By popular demand, I updated the fantasy football draft optimizers with 2014 projections. There are two draft optimizers (one for auction drafts and one for snake drafts). The optimizers identify … The post Win Your Fantasy Football Draft with These Shiny Apps: 2014 Update appeared first on Fantasy Football Analytics.

## Using bootMer to do model comparison in R

July 13, 2014

Setting the right random-effect part in mixed-effect models can be tricky in many applied situations. I will not talk here about choosing whether a grouping variable (sites, individuals …) should be included as a fixed term or as a random term; please see Gelman and Hill (2006) and Zuur et al. (2009) for

## Stan goes to the World Cup

July 13, 2014

I thought it would be fun to fit a simple model in Stan to estimate the abilities of the teams in the World Cup, then I could post everything here on the blog, the whole story of the analysis from beginning to end, showing the results of spending a couple hours on a data analysis.

## The Zebra Of Riemann

July 13, 2014

Mathematics is the art of giving the same name to different things (Henri Poincaré). Many surveys among experts point out that proving the Riemann Hypothesis is the most important open problem in mathematics. This hypothesis concerns the Riemann zeta function, whose non-trivial zeros are conjectured to occur only at those complex numbers whose real part is equal to

July 12, 2014

While I was out at the (immensely impressive and equally enjoyable) useR! 2014 conference at UCLA, Conrad provided a bug-fix release 4.320 of Armadillo, the nifty templated C++ library for linear algebra. I quickly rolled that into RcppArmadillo rel...

## odfweave setup and counting logicals

July 12, 2014

Two short items in this blog post. First, since it was not obvious how to run odfWeave() in my particular setup, I show the call I am using. Second, several people were cross-tabulating logical vectors, so I wanted to play along, 80 times faster than table(). odfWeave: My particular setup consists of R, 7-zip, and LibreOffice. Somehow they don't 100% play along when using odfWeave....
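
The post's own fast approach is not shown in the excerpt, and no speed figure is claimed here. One common way to cross-tabulate two logical vectors without table() (possibly different from the author's) is to encode each pair as an integer code from 1 to 4 and count the codes with tabulate(), which avoids the factor conversion that table() performs. The name crosstab_logical is hypothetical.

```r
# Hypothetical helper: 2 x 2 cross-tabulation of two logical vectors
crosstab_logical <- function(a, b) {
  stopifnot(is.logical(a), is.logical(b), length(a) == length(b))
  # Map each (a, b) pair to a code: (F,F) = 1, (T,F) = 2, (F,T) = 3, (T,T) = 4;
  # pairs involving NA give an NA code, which tabulate() skips
  codes <- 1L + a + 2L * b
  counts <- tabulate(codes, nbins = 4L)
  # Column-major fill gives the same layout as table(a, b)
  matrix(counts, nrow = 2L,
         dimnames = list(a = c("FALSE", "TRUE"), b = c("FALSE", "TRUE")))
}

set.seed(1)
a <- sample(c(TRUE, FALSE), 1e5, replace = TRUE)
b <- sample(c(TRUE, FALSE), 1e5, replace = TRUE)
fast <- crosstab_logical(a, b)
slow <- table(a, b)
```

Like table() with its default useNA = "no", this silently drops pairs containing NA; to count them, the encoding would need extra bins.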

## R Notes: vectors

July 12, 2014

R is different from C-family languages. It has C-like syntax but Lisp-like semantics. Programmers from the C/C++/Java world would find many usages in R ad hoc and need to memorize special cases. This is because they use R from a C perspective. R is a very...
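
To make the C-versus-R contrast concrete, here is a small sketch of vector behaviour that tends to surprise C programmers (my own illustration, not taken from the post): a lone number is a length-1 vector, arithmetic is element-wise with the shorter operand recycled, and indexing starts at 1 and accepts vectors of positions or logicals.

```r
# There are no scalars: a single number is a numeric vector of length 1
x <- 42
n_x <- length(x)
is_vec <- is.vector(x)

# Arithmetic is element-wise; the shorter operand is recycled
s <- c(1, 2, 3, 4) + c(10, 20)

# Indexing starts at 1 and accepts vectors of positions or logicals
v <- c(5, 6, 7, 8)
picked <- v[c(1, 3)]
big <- v[v > 6]
```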

## Le Monde puzzle [#875]

July 11, 2014

I learned something in R today thanks to a Le Monde mathematical puzzle: a two-player game consists of A picking a number n between 1 and 10, then B and A alternately choosing and applying one of three transforms to the current value of n (n=n+1, n=3n, or n=4n), starting with B, until n is larger than

## Sometimes I feel (some) need for speed

July 11, 2014

I’m the first to acknowledge that most of my code could run faster. The truth of the matter is that, in essence, I write ‘quickies’: code that will run once or twice, so there is no incentive to spend days or hours shaving seconds off a computation. Most analyses of research data fall in

## IEEE ranks R #9 amongst all languages

July 11, 2014

IEEE, the world's largest professional association for the advancement of technology, recently published its ranking of the popularity of programming languages. The R language comes in at number 9 in the list. The ranking is based on 12 weighted factors, including Google search rankings and trends, social media chatter, aggregator posts (Reddit and Hacker News), social programming...