R can be connected with Hadoop through the rmr2 package. The core of this package is the mapreduce() function, which lets you write custom MapReduce algorithms. The aim of this article is to show how it works and to provide …

Recently I’ve been spending a lot of time trying to learn Julia by doing the problems at Project Euler. What’s great about these problems is that they get me out of my normal design patterns, since I don’t generally think about prime numbers, factorials and other number theory problems during my normal workday. These problems Fun With Just-In-Time...

KDnuggets has completed its annual poll of top languages for analytics, data mining and data science, and just as in the prior two years the R language is ranked the most popular. R is used by almost 61% of respondents. R's usage grew year over year as well, up 16% compared to the 2012 poll. By contrast, the rate...

Effectively showing estimates and uncertainty from Cox Proportional Hazard (PH) models, especially for interactive and non-linear effects, can be challenging with currently available software. So researchers often simply display a results table. Such tables are pretty useless for Cox PH models. It is difficult to decipher a simple linear variable’s estimated effect and basically impossible to understand time...

While at the Joint Statistical Meetings a few weeks ago I was talking to a friend about various aspects of clinical trials. He indicated that no current R package was able to perfectly reproduce Passing-Bablok (PB) regression so that it exactly matched SAS. He ultimately wrote a couple of functions and kindly shared them with

Last week I attended a workshop on how to run highly parallel distributed jobs on the Open Science Grid (OSG). There I met Derek Weitzel, who has made an excellent contribution to advancing R as a high-performance computing language by developing BoscoR. BoscoR greatly facilitates the use of the already existing package “GridR” by The post Easy...

Getting Started with Structural Equation Modeling: Part 1. Introduction: For the analyst familiar with linear regression, fitting structural equation models can at first feel strange. In the R environment, fitting structural equation models involves learning new modeling syntax, new plotting...

Continuing my exploration of mixed models, I now understand what is happening in the second SAS(R)/STAT example for proc mixed (page 5007 of the SAS/STAT 12.3 Manual). It is all about correlation between the time-points within subjects. The data as suc...

In a previous post, I showed how to determine the best starting lineup to draft in an auction draft using an optimizer tool. In this post, I use a Shiny app in R to determine... (The post "Win Your Fantasy Football Snake Draft with this Shiny App in R" appeared first on Fantasy Football Analytics.)

In a previous post, I showed how to determine the best starting lineup to draft in an auction draft using an optimizer tool. In this post, I use a Shiny app in R to determine the best possible players to pick in a fantasy...

StarCluster is a utility for creating and managing distributed computing clusters hosted on Amazon's Elastic Compute Cloud (EC2). StarCluster utilizes Amazon's EC2 web service to create and destroy clusters of Linux virtual machines on demand. (Justin Riley, StarCluster documentation, http://star.mit.edu/cluster/docs/latest/index.html) StarCluster provides a convenient way to quickly set up a cluster of machines to run some data parallel jobs using a distributed memory framework. Install...

With this post, I want to introduce the new ‘propagate’ package on CRAN. It has one single purpose: propagation of uncertainties (“error propagation”). There is already one package on CRAN available for this task, named ‘metRology’ (http://cran.r-project.org/web/packages/metRology/index.html). ‘propagate’ has some additional functionality that some may find useful. The most important functions are: * propagate: A

Today, I take my first shots at ranking Major League Baseball (MLB) teams. I see my efforts at prediction and ranking as an ongoing process, so that my models improve, the data I incorporate become more meaningful, and ultimately my predictions are largely accurate. For the first attempt, let’s rank MLB teams using the Bradley-Terry (BT) model. Before we discuss the rankings, we need...

A useful property of mixed effects and Bayesian hierarchical models is that lower-level estimates are shrunk towards the more stable estimates further up the hierarchy. To use a time-honoured example, you might be modelling the effect of a new teaching method on performance at the classroom level. Classes of 30 or so students …

Summary: Put a plaintext file named CITATION in the root directory of your code, and put information in it about how to cite your software. Go on, do it now – it’ll only take two minutes! Software is very important in science – but good software takes time and effort that could be used to do

Some of my fellow scientists have it easy. They use predefined methods like linear regression and ANOVA to test simple hypotheses; they live in the innocent world of bivariate plots and lm(). Sometimes they notice that the data have odd histograms and they use glm(). The more educated ones use …

Exegetic Analytics extols the wonders of the foreach package for iterative operations that go beyond the standard "for" loop in R. For example, here's a neat (if not optimally efficient) construct using filters to calculate the primes less than 100: foreach(n = 1:100, .combine = c) %:% when (isPrime(n)) %do% n The open-source team at Revolution Analytics created the foreach...
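The filter construct quoted above can be tried as a minimal sketch like the following. The excerpt does not show where isPrime() comes from, so the is_prime() helper here is a hypothetical base-R stand-in (simple trial division), not the function used in the original post. It assumes the foreach package is installed.

```r
library(foreach)

# Hypothetical helper (not from the original post): trial-division primality test.
is_prime <- function(n) {
  if (n < 2) return(FALSE)   # 0 and 1 are not prime
  if (n < 4) return(TRUE)    # 2 and 3 are prime
  all(n %% 2:floor(sqrt(n)) != 0)
}

# %:% nests the when() filter inside the loop, so only values of n that pass
# the filter are evaluated by %do% and combined with c().
primes <- foreach(n = 1:100, .combine = c) %:% when(is_prime(n)) %do% n

print(primes)
```

The when() clause plays the role of an if-guard in a list comprehension: iterations that fail the test contribute nothing to the combined result.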

The slides for my ECVP tutorial on classification images are available here. Try this alternative version if the equations look funny. (image from Mineault et al. 2009) The slides are in HTML and contain some interactive elements. They’re the result of experimenting with R Markdown, D3 and pandoc. You write the slides in R Markdown,

Regular expressions (regex) are among the most frequently used string-matching tools, and R implements them. However, users are often frustrated that, despite taking examples verbatim from sources such as Stack Overflow, they do not seem to ...

A new little OpenCPU app allows you to knit and markdown in the browser. It has a fancy pants code editor which automatically updates the output after 3 seconds of inactivity. It uses the Ace web editor with mode-r.js (thanks to RStudio for making the latter available). Like all OpenCPU apps, the source package lives in the opencpu app...

In a previous post, I showed how to determine the best starting lineup to draft using an optimizer tool. The optimizer identifies the players that maximize your projected points within your... (The post "Drafting the Best Starting Lineup in Fantasy Football by Taking into Account Uncertainty in the Projections: An Optimization Simulation" appeared first on Fantasy Football Analytics.)

I was working on a project yesterday where I needed to amortize out a bunch of loans to calculate the total interest a borrower would pay if he or she paid the minimum monthly payment for the full term of the loan. I couldn’t find any package in R that already contained the necessary math,
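The excerpt stops before showing the author's own code, so what follows is only a sketch of the kind of calculation described, using the standard fixed-rate annuity formula for the monthly payment. The function names, the example loan figures, and the assumption of monthly compounding at annual_rate / 12 are all illustrative and not taken from the post.

```r
# Hypothetical helper: fixed monthly payment for a fully amortizing loan,
# assuming monthly compounding at annual_rate / 12.
monthly_payment <- function(principal, annual_rate, n_months) {
  r <- annual_rate / 12
  if (r == 0) return(principal / n_months)
  principal * r / (1 - (1 + r)^(-n_months))
}

# Hypothetical helper: walk the schedule month by month and add up the
# interest portion of each payment over the full term.
total_interest <- function(principal, annual_rate, n_months) {
  r <- annual_rate / 12
  payment <- monthly_payment(principal, annual_rate, n_months)
  balance <- principal
  interest_paid <- 0
  for (m in seq_len(n_months)) {
    interest <- balance * r                  # interest accrued this month
    interest_paid <- interest_paid + interest
    balance <- balance + interest - payment  # remainder of payment reduces principal
  }
  interest_paid
}

# Illustrative (made-up) loans, amortized in one pass with mapply().
loans <- data.frame(
  principal   = c(10000, 25000),
  annual_rate = c(0.06, 0.045),
  n_months    = c(36, 60)
)
loans$total_interest <- mapply(total_interest,
                               loans$principal, loans$annual_rate, loans$n_months)
print(loans)
```

Because the balance reaches zero at the end of the term, the looped total should agree with the closed form payment * n_months - principal, which makes a handy sanity check.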