## An example of MapReduce with rmr2

September 2, 2013

R can be connected with Hadoop through the rmr2 package. The core of this package is the mapreduce() function, which lets you write custom MapReduce algorithms. The aim of this article is to show how it works and to provide …
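
To give a feel for the programming model behind `mapreduce()`, here is a minimal sketch of the map, shuffle and reduce steps in plain base R. It deliberately does not call rmr2 or Hadoop: the helpers `map_fn`, `reduce_fn` and `local_mapreduce` are made up for this illustration, and the word-count task is an assumed example rather than the one used in the article.

```r
# Illustrative only: a local, in-memory imitation of the MapReduce pattern.
# None of these helpers come from rmr2; all names are hypothetical.

# Map step: turn one line of text into (key, value) pairs, one per word.
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  data.frame(key = words, value = rep(1L, length(words)),
             stringsAsFactors = FALSE)
}

# Reduce step: combine all values that share a key.
reduce_fn <- function(key, values) {
  data.frame(key = key, value = sum(values), stringsAsFactors = FALSE)
}

local_mapreduce <- function(input, map, reduce) {
  mapped  <- do.call(rbind, lapply(input, map))     # map every record
  groups  <- split(mapped$value, mapped$key)        # shuffle: group values by key
  reduced <- Map(reduce, names(groups), groups)     # reduce each group
  out <- do.call(rbind, reduced)
  rownames(out) <- NULL
  out
}

lines <- c("R and Hadoop", "Hadoop and MapReduce", "R")
word_counts <- local_mapreduce(lines, map_fn, reduce_fn)
print(word_counts)
```

On a real cluster the grouping step is what the framework does for you between the map and reduce functions; here `split()` stands in for it.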

## Fun With Just-In-Time Compiling: Julia, Python, R and pqR

September 2, 2013

Recently I’ve been spending a lot of time trying to learn Julia by doing the problems at Project Euler. What’s great about these problems is that they get me out of my normal design patterns, since I don’t generally think about prime numbers, factorials and other number theory problems during my normal workday. These problems...

## Sentiment Analysis on Twitter with Viralheat API

September 2, 2013

Hi there! Some time ago I published a post about doing a sentiment analysis on Twitter. I used two wordlists to do so: one with positive and one with negative words. For a first attempt at sentiment analysis this is surely a good way to start, but if you want to receive more accurate …

## Poll: R top language for data science three years running

September 2, 2013

KDnuggets has completed its annual poll of top languages for analytics, data mining and data science, and just as in the prior two years the R language is ranked the most popular. R is used by almost 61% of respondents, and its usage grew year over year as well, up 16% compared to the 2012 poll. By contrast, the rate...

## Showing results from Cox Proportional Hazard Models in R with simPH

September 2, 2013

Effectively showing estimates and uncertainty from Cox Proportional Hazard (PH) models, especially for interactive and non-linear effects, can be challenging with currently available software. So, researchers often simply display a results table. Such tables are pretty useless for Cox PH models. It is difficult to decipher a simple linear variable’s estimated effect and basically impossible to understand time...

## Passing-Bablok Regression: R code for SAS users

September 2, 2013

While at the Joint Statistical Meetings a few weeks ago I was talking to a friend about various aspects of clinical trials. He indicated that no current R package was able to perfectly reproduce Passing-Bablok (PB) regression so that it exactly matched SAS. He ultimately wrote a couple of functions and kindly shared them with

## Easy 3-Minute Guide to Making apply() Parallel over Distributed Grids and Clusters in R

September 1, 2013

Last week I attended a workshop on how to run highly parallel distributed jobs on the Open Science Grid (OSG). There I met Derek Weitzel, who has made an excellent contribution to advancing R as a high performance computing language by developing BoscoR. BoscoR greatly facilitates the use of the already existing package “GridR” by...

## Latent Variable Analysis with R: Getting Setup with lavaan

September 1, 2013

Getting Started with Structural Equation Modeling: Part 1. Introduction: For the analyst familiar with linear regression, fitting structural equation models can at first feel strange. In the R environment, fitting structural equation models involves learning new modeling syntax, new plotting...

## Fair weather fans, redux

September 1, 2013

Or, a little larger small sample. On August 11 the Victoria HarbourCats closed out their 2013 West Coast League season with a 4-3 win over the Bellingham Bells. In an earlier...

## Mixed models exercise 2. Repeated measurements

September 1, 2013

Continuing my exploration of mixed models, I now understand what is happening in the second SAS(R)/STAT example for proc mixed (page 5007 of the SAS/STAT 12.3 Manual). It is all about correlation between the time-points within subjects. The data as suc...

## Win Your Fantasy Football Snake Draft with this Shiny App in R

August 31, 2013

In a previous post, I showed how to determine the best starting lineup to draft in an auction draft using an optimizer tool. In this post, I use a Shiny app in R to determine the best possible players to pick in a fantasy...

## StarCluster and R

August 31, 2013

"StarCluster is a utility for creating and managing distributed computing clusters hosted on Amazon's Elastic Compute Cloud (EC2). StarCluster utilizes Amazon's EC2 web service to create and destroy clusters of Linux virtual machines on demand." (Justin Riley, StarCluster documentation, http://star.mit.edu/cluster/docs/latest/index.html) StarCluster provides a convenient way to quickly set up a cluster of machines to run some data parallel jobs using a distributed memory framework. Install...

## Introducing ‘propagate’

August 31, 2013

With this post, I want to introduce the new ‘propagate’ package on CRAN. It has one single purpose: propagation of uncertainties (“error propagation”). There is already one package on CRAN available for this task, named ‘metRology’ (http://cran.r-project.org/web/packages/metRology/index.html). ‘propagate’ has some additional functionality that some may find useful. The most important functions are: * propagate: A

## GitHub Package Ideas I Stole

August 31, 2013

One of my favorite sources of good ideas is looking at the GitHub repositories of others and modeling my repos after the good ideas I see there. Here's Steve Jobs on stealing ideas: In the past few weeks I've …

## MLB Rankings Using the Bradley-Terry Model

August 31, 2013

Today, I take my first shots at ranking Major League Baseball (MLB) teams. I see my efforts at prediction and ranking as an ongoing process, so that my models improve, the data I incorporate become more meaningful, and ultimately my predictions are largely accurate. For the first attempt, let’s rank MLB teams using the Bradley-Terry (BT) model. Before we discuss the rankings, we need...
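
For readers who have not met the Bradley-Terry model: it gives each team a positive strength p_i and models the probability that team i beats team j as p_i / (p_i + p_j). The sketch below fits those strengths in base R with the standard iterative (minorization-maximization) update. It is not the code from the post; the function name `fit_bt` and the three-team win matrix are invented purely for illustration.

```r
# Minimal Bradley-Terry fit via the classic iterative update.
# wins[i, j] = number of times team i beat team j (toy numbers, not MLB data).
fit_bt <- function(wins, iters = 200) {
  k <- nrow(wins)
  games <- wins + t(wins)          # games played between each pair
  p <- rep(1 / k, k)               # start from equal strengths
  for (it in seq_len(iters)) {
    denom <- sapply(seq_len(k), function(i) {
      sum(games[i, -i] / (p[i] + p[-i]))
    })
    p <- rowSums(wins) / denom     # update: total wins over expected exposure
    p <- p / sum(p)                # strengths are only defined up to scale
  }
  names(p) <- rownames(wins)
  p
}

teams <- c("A", "B", "C")          # hypothetical teams
wins <- matrix(c(0, 3, 4,
                 1, 0, 3,
                 0, 1, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(teams, teams))

strength <- fit_bt(wins)
ranking <- names(sort(strength, decreasing = TRUE))
print(round(strength, 3))
print(ranking)

# Implied probability that A beats B under the fitted model
p_a_beats_b <- unname(strength["A"] / (strength["A"] + strength["B"]))
```

The update assumes every team has at least one win and that the schedule connects all teams; otherwise the maximum likelihood estimate does not exist and a dedicated package would be the safer choice.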

## The Dutch Dataverse Network: a host for the ChEMBL-RDF v13.5 data, and some thoughts in workflow integration

August 31, 2013

Last Thursday, there was a UM library network drink. And as I see a library where knowledge is found, and libraries still rarely think of knowledge as ever being able to be stored outside books and papers, I was happy to see the library promoting the D...

## Visualising Shrinkage

August 31, 2013

A useful property of mixed effects and Bayesian hierarchical models is that lower level estimates are shrunk towards the more stable estimates further up the hierarchy. To use a time-honoured example, you might be modelling the effect of a new teaching method on performance at the classroom level. Classes of 30 or so students …

## Encouraging citation of software – introducing CITATION files

August 30, 2013

Summary: Put a plaintext file named CITATION in the root directory of your code, and put information in it about how to cite your software. Go on, do it now – it’ll only take two minutes! Software is very important in science – but good software takes time and effort that could be used to do

## The joy and martyrdom of trying to be a Bayesian

August 30, 2013

Some of my fellow scientists have it easy. They use predefined methods like linear regression and ANOVA to test simple hypotheses; they live in the innocent world of bivariate plots and lm(). Sometimes they notice that the data have odd histograms and they use glm(). The more educated ones use …

## Tutorial: Parallel programming with foreach

August 30, 2013

Exegetic Analytics extols the wonders of the foreach package for iterative operations that go beyond the standard "for" loop in R. For example, here's a neat (if not optimally efficient) construct using filters to calculate the primes less than 100: `foreach(n = 1:100, .combine = c) %:% when (isPrime(n)) %do% n`. The open-source team at Revolution Analytics created the foreach...
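
The one-liner quoted above relies on an `isPrime()` function that base R does not provide, so here is a self-contained sketch of the same filtering idea. The trial-division helper `is_prime` is my own stand-in, and the code falls back to base R's `Filter()` when the foreach package is not installed, so treat it as an illustration rather than the tutorial's exact code.

```r
# Hypothetical helper: simple trial division, fine for small n.
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)                 # 2 and 3
  all(n %% 2:floor(sqrt(n)) != 0)
}

if (requireNamespace("foreach", quietly = TRUE)) {
  library(foreach)
  # when() acts as a filter: only iterations passing the test reach %do%
  primes <- foreach(n = 1:100, .combine = c) %:% when(is_prime(n)) %do% n
} else {
  # Base R equivalent of the same filter, used if foreach is unavailable
  primes <- Filter(is_prime, 1:100)
}

print(primes)
```

Swapping `%do%` for `%dopar%` is how foreach hands the loop body to a registered parallel backend; that step is left out here because it needs extra setup.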

## ECVP tutorial on classification images

August 30, 2013

The slides for my ECVP tutorial on classification images are available here. Try this alternative version if the equations look funny. (image from Mineault et al. 2009) The slides are in HTML and contain some interactive elements. They’re the result of experimenting with R Markdown, D3 and pandoc. You write the slides in R Markdown,

## Making regex examples work for you!

August 30, 2013

One of the most frequently used string recognition tools out there is regular expressions (regex), and R implements regex. However, users are often frustrated that examples taken verbatim from sources such as Stack Overflow do not seem to ...
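
The excerpt is cut off before it names the cause, so the following is only a guess at one common culprit: in an R string literal the backslash must itself be escaped, which means a pattern written as `\d+` elsewhere has to be typed as `"\\d+"` in R. The sample strings are invented for the demonstration.

```r
x <- c("order 66", "no digits here", "call 555-0100")   # made-up sample input

# "\d+" copied verbatim is not a valid R string; the backslash is doubled instead.
has_digit <- grepl("\\d+", x)

# Extract the first run of digits from each string that has one.
first_number <- regmatches(x, regexpr("\\d+", x))

# A bare "." matches any character; fixed = TRUE treats it as a literal dot.
dot_as_regex   <- grepl(".", "abc")
dot_as_literal <- grepl(".", "abc", fixed = TRUE)

print(has_digit)
print(first_number)
```

Another frequent mismatch is regex flavour: passing `perl = TRUE` to `grepl()` and friends switches to Perl-compatible syntax, which is what many online examples assume.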

## Knitr/Markdown OpenCPU App

August 30, 2013

A new little OpenCPU app allows you to knit and markdown in the browser. It has a fancy pants code editor which automatically updates the output after 3 seconds of inactivity. It uses the Ace web editor with mode-r.js (thanks to RStudio for making the latter available). Like all OpenCPU apps, the source package lives in the opencpu app...

## Drafting the Best Starting Lineup in Fantasy Football by Taking into Account Uncertainty in the Projections: An Optimization Simulation

August 29, 2013

In a previous post, I showed how to determine the best starting lineup to draft using an optimizer tool. The optimizer identifies the players that maximize your projected points within your risk tolerance. The optimizer does not take i...

## Plot Weekly or Monthly Totals in R

August 29, 2013

When plotting time series data, you might want to bin the values so that each data point corresponds to the sum for a given month or week. This post will show an easy way to use cut and ggplot2's stat_summary to plot month totals in R wi...
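
As a rough sketch of the binning idea, the snippet below uses `cut()` on a Date column to assign each observation to its month and then sums within months. The data frame is synthetic (one unit per day for January and February 2013), and the ggplot2 part is guarded so the example still runs without that package. The `fun = sum` argument assumes a recent ggplot2; older releases called it `fun.y`.

```r
# Synthetic daily series: value 1 on every day of Jan and Feb 2013.
dates <- seq(as.Date("2013-01-01"), as.Date("2013-02-28"), by = "day")
df <- data.frame(date = dates, value = 1)

# cut() on Dates labels each row with the first day of its month.
df$month <- as.Date(cut(df$date, breaks = "month"))

# Base R check of the monthly totals.
monthly <- aggregate(value ~ month, data = df, FUN = sum)
print(monthly)

# The same totals as a plot: stat_summary() sums the values within each month.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = month, y = value)) +
    ggplot2::stat_summary(fun = sum, geom = "bar")
  # print(p) would draw the chart in an interactive session
}
```

Changing `breaks = "month"` to `breaks = "week"` gives weekly bins with the same code.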

## A simple amortization function

August 29, 2013

I was working on a project yesterday where I needed to amortize out a bunch of loans to calculate the total interest a borrower would pay if he or she paid the minimum monthly payment for the full term of the loan. I couldn’t find any package in R that already contained the necessary math,
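
The post's own function is not shown in this excerpt, so here is an independent sketch of the textbook calculation it describes. For a fixed-rate loan with monthly rate r and n payments, the level payment is P * r / (1 - (1 + r)^(-n)), and total interest is the sum of the interest portions over the schedule. The function name `amortize`, its arguments and the example loan are all assumptions made for illustration.

```r
# Hypothetical amortization helper: fixed rate, level monthly payment.
amortize <- function(principal, annual_rate, n_months) {
  r <- annual_rate / 12
  payment <- if (r == 0) {
    principal / n_months                            # interest-free edge case
  } else {
    principal * r / (1 - (1 + r)^(-n_months))       # standard annuity formula
  }
  schedule <- data.frame(month = seq_len(n_months), payment = payment,
                         interest = NA_real_, principal_paid = NA_real_,
                         balance = NA_real_)
  balance <- principal
  for (m in seq_len(n_months)) {
    interest <- balance * r                         # interest accrued this month
    principal_paid <- payment - interest            # remainder reduces the balance
    balance <- balance - principal_paid
    schedule[m, c("interest", "principal_paid", "balance")] <-
      c(interest, principal_paid, balance)
  }
  schedule
}

# Example loan (made-up numbers): 10,000 at 6% a year over 36 months.
loan <- amortize(10000, 0.06, 36)
total_interest <- sum(loan$interest)
print(head(loan, 3))
print(total_interest)
```

Because the payment is constant, total interest also equals `n_months * payment - principal`, which is a handy cross-check on the schedule.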