## By-Group Aggregation in Parallel

October 4, 2014
Similar to the row search, by-group aggregation is another perfect use case to demonstrate the power of split-and-conquer with parallelism. The example below shows that a homebrew by-group aggregation with the foreach package, albeit inefficiently coded, is still a lot faster than the summarize() function in the Hmisc package.
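The benchmark from the post itself is not reproduced in this excerpt. As a rough illustration of the split-and-conquer idea only, here is a minimal sketch that assumes the foreach and doParallel packages are installed; the toy data frame and the column names `grp` and `x` are made up for the example.

```r
library(foreach)
library(doParallel)

# Toy data: `grp` and `x` are hypothetical column names, not from the post.
set.seed(1)
df <- data.frame(grp = sample(letters[1:4], 1000, replace = TRUE),
                 x   = rnorm(1000))

# Start two local workers and register them as the %dopar% backend.
cl <- makeCluster(2)
registerDoParallel(cl)

# Split the rows by group, summarise each piece on a worker, bind the results.
pieces <- split(df, df$grp)
agg <- foreach(p = pieces, .combine = rbind) %dopar% {
  data.frame(grp = p$grp[1], mean_x = mean(p$x), n = nrow(p))
}

stopCluster(cl)

# Serial reference computed with base R, for comparison only.
ref <- aggregate(x ~ grp, data = df, FUN = mean)
```

On data this small the cost of starting the workers will likely dominate, so any speed-up should only be expected on much larger inputs.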

## What happens if we forget a trivial assumption?

October 4, 2014
Last week, @dmonniaux published an interesting post entitled l’erreur n’a rien d’original on his blog. He was asking the following question: let $a$, $b$ and $c$ denote three real-valued coefficients; under which assumption on those three coefficients does $ax^2+bx+c=0$ have a real-valued root? Everyone answered $b^2-4ac\geq 0$, but no one mentioned that it is necessary to have a proper quadratic equation,...
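A minimal sketch of the point, assuming the equation in question is the usual a x^2 + b x + c = 0 with discriminant b^2 - 4ac. The function name `has_real_root` and the handling of the degenerate cases are illustrative assumptions, not taken from the post.

```r
# Does a*x^2 + b*x + c = 0 have at least one real solution?
# The discriminant test is only valid when a != 0 (a proper quadratic).
has_real_root <- function(a, b, c) {
  if (a != 0) return(b^2 - 4 * a * c >= 0)  # proper quadratic
  if (b != 0) return(TRUE)                   # linear case: x = -c / b
  c == 0                                     # constant case: solvable only if c is 0
}

# With a = 0 and b = 0, the discriminant is 0 (so ">= 0" holds),
# yet 0*x^2 + 0*x + 1 = 0 has no solution at all.
naive  <- (0^2 - 4 * 0 * 1) >= 0
proper <- has_real_root(0, 0, 1)
```

Here `naive` and `proper` disagree, which is the kind of case the forgotten assumption lets through.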

## Introducing miniCRAN: an R package to create a private CRAN repository

October 3, 2014
by Andrie deVries One of the reasons that R is so popular is the CRAN archive of useful packages. However, with more than 5,900 packages on CRAN, many organisations need to maintain a private mirror of CRAN with only a subset of packages that are relevant to them. The package miniCRAN makes this possible by determining the dependency tree...

## SelectionShare & TimingShare | Masterfully Written by Delightfully Responsive Author

October 3, 2014
Anders Ekholm has written a wonderful paper (Ekholm, Anders G., "Components of Portfolio Variance: Systematic, Selection and Timing", August 8, 2014, http://ssrn.com/abstract=2463649) demonstrating how we might decompose a money manager’s performance with...

## Ebola: Beds, Labs, and Warnings? Can they help? (Shiny App)

October 3, 2014
A month ago, when the WHO was projecting that the current Ebola outbreak could affect as many as 20,000 people, I ran some elementary modelling and found that these estimates are far too small given the current trend. The...

## Consumer Preference Driven by Benefits and Affordances, Yet Management Sees Only Products and Features

October 2, 2014
Return on Investment (ROI) is management's bottom line. Consequently, everything must be separated and assigned a row with associated costs and profits. Will we make more by adding another product to our line? Will we lose sales by limiting the feature...

## Shiny 0.10.2

October 2, 2014
Shiny v0.10.2 has been released to CRAN. To install it: install.packages('shiny') This version of Shiny requires R 3.0.0 or higher (note the current version of R is 3.1.1). R 2.15.x is no longer supported. Here are the most prominent changes: File uploading via fileInput() now works for Internet Explorer 8 and 9. Note, however, that IE 8/9 do not

## Find us at Strata Conference and Hadoop World 2014!

October 2, 2014
SupStat Analytics and Transwarp Technologies will be at the 2014 Strata Conference and Hadoop World showcasing the power of Hadoop and Spark computing with R analytics. We’re excited to be presenting to the data science world the Transwarp Data Hub, an integrated storage, processing, and analytics platform that delivers up to 100 times faster performance

## A Failed Attempt at Backtesting Structural Arbitrage

October 2, 2014
One of the things that I wondered about regarding the previous post was how would this strategy have performed in … Continue reading →

## The Rise of the Samurai Pitcher

October 2, 2014
Masahiro Tanaka stands on the mound, rubbing the ball vigorously between his hands. It's a crisp, cool night in the Bronx. Stepping back, he digs his right foot into the rubber, winds up and, with a seven-foot stretch, steps towards the catcher, unleashing a blistering four-seam, 95 mph fastball. Less than half a second later, it explodes into the catcher's...

## R and Data Science Webinar

October 2, 2014
by Joseph Rickert Recently, I had the opportunity to present a webinar on R and Data Science. The challenge with attempting this sort of thing is to say something interesting that does justice to the subject while being suitable for an audience that may include both experienced R users and curious beginners. The approach I settled on had three...

## Announcing the Publication of Practical Data Science Cookbook

October 2, 2014
Four of DC2's board members have published a new book! Tony Ojeda, Sean Murphy, Benjamin Bengfort, and Abhijit Dasgupta are proud to announce the arrival of Practical Data Science Cookbook (Packt, $10 ebook or $49.99 print+ebook). Practical Data Science Cookbook is perfect for … Continue reading →

## devtools 1.6

October 2, 2014
Devtools 1.6 is now available on CRAN. Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation. Learn more at http://r-pkgs.had.co.nz/. You can get the latest version with: install.packages("devtools") We’ve made a lot of improvements to the install and release process: Installation functions now

## Building a DGA Classifier: Part 2, Feature Engineering

October 2, 2014
This is part two of a three-part blog series on building a DGA classifier, split into the three phases of building a classifier: 1) Data preparation 2) Feature engineering and 3) Model selection. Back in part 1, we prepared the data and we are starting with a nice clean list of domains labeled as either legitimate (“legit”) or generated by...

## Society for Judgment and Decision Making: Who Are We (Part 1)

October 1, 2014
An analysis of the academic departments of SJDM society members. The post Society for Judgment and Decision Making: Who Are We (Part 1) appeared first on Decision Science News.

## Cross Validation for Kernel Density Estimation

October 1, 2014
$\mathbb{E}\left[\int [\widehat{f}_h(x)-f(x)]^2dx\right]$

In a post published in July, I mentioned the so-called Goldilocks principle, in the context of kernel density estimation and bandwidth selection. The bandwidth should not be too small (the variance would be too large) and it should not be too large (the bias would be too large). Another standard method to select the bandwidth, as mentioned...
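The excerpt stops before the method is described, so what follows is only a hedged sketch of one common variant: least-squares (unbiased) cross-validation for a Gaussian kernel, written in base R. The simulated sample, the bandwidth grid, and the function name `lscv` are assumptions made for illustration.

```r
# Simulated sample; the post's own data is not shown in this excerpt.
set.seed(1)
x <- rnorm(200)

# Least-squares cross-validation score for a Gaussian kernel:
#   integral of fhat^2  minus  2 * mean of leave-one-out density at each x_i.
lscv <- function(h, x) {
  n <- length(x)
  d <- outer(x, x, "-")
  # For Gaussian kernels the integral of fhat^2 has a closed form:
  # the convolution of two N(0, h^2) kernels is N(0, 2 h^2).
  int_f2 <- sum(dnorm(d, sd = h * sqrt(2))) / n^2
  k <- dnorm(d, sd = h)
  loo <- (rowSums(k) - diag(k)) / (n - 1)  # drop the i == j term
  int_f2 - 2 * mean(loo)
}

# Evaluate the score on a grid and keep the minimiser.
h_grid <- seq(0.05, 1.5, by = 0.01)
scores <- sapply(h_grid, lscv, x = x)
h_cv   <- h_grid[which.min(scores)]

# Base R ships a binned implementation of the same criterion.
h_ucv <- bw.ucv(x)
```

A small `h` makes the score noisy (high variance) and a large `h` oversmooths (high bias), so the minimiser sits between the two extremes. The grid search and `bw.ucv()` need not agree exactly, since the latter uses binning and a restricted search range.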

## New York Times approachably describes Bayesian Statistics

October 1, 2014
The New York Times published an article of interest to statisticians the other day: "The Odds, Continually Updated". Surprisingly for a general-audience newspaper, this article goes into the distinctions between Bayesian and frequentist statistics, and does so in a very approachable way. Here's an excerpt: The essence of the frequentist technique is to apply probability to data. If...

## Got a ticket for the runoff?

October 1, 2014
This is one of the very last postings before the election next Sunday. So far, the only certainty is the runoff ticket of the incumbent candidate, Dilma Rousseff (PT). The runner-up candidates, the environmentalist Marina Silva (PSB) and the Social Democrat Aecio Neves, are heading into a neck-and-neck dispute down the final stretch. Although … Read More...

## Working with NIfTI images in R

October 1, 2014
The oro.nifti package is awesome for NeuRoimaging (couldn't help myself). It has functions to read/write images, introduces the S4 nifti class, and has useful plotting functions. There are some limitations and some gotchas that are important to discuss if you are working with these objects in R. Dataset Creation We'll read in some data (a

## Transparent hurricane paths in R

October 1, 2014
Arthur Charpentier has written a really nice blog post about obtaining hurricane tracks and plotting them. He then goes on to do other clever Markov process models, but as a dataviz guy who knows almost nothing about meteorology, I want to … Continue reading →

## New fiscal sponsorship agreement with NumFocus foundation

October 1, 2014
I’m very pleased to announce that rOpenSci has signed a comprehensive fiscal sponsorship agreement with the NumFocus foundation, a 501(c)3 nonprofit that supports R&D for open source scientific software projects. We are delighted to be in the company of esteemed projects such as IPython and Julia that share our goal of promoting reproducible research practices...

## Complex Domain Coloring

September 30, 2014
"Why don’t you stop doodling and start writing serious posts in your blog?" (Cecilia, my beautiful wife) Choose a function, apply it to a set of complex numbers, paint the result using the HSV technique and be ready to be impressed, because the images can be absolutely amazing. You only need the ggplot2 package and your imagination. This is what happens
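The recipe above can be sketched roughly as follows. This is not the code from the post: the example function `f`, the grid size, and the particular mapping from argument and modulus to hue and value are all assumptions chosen for illustration.

```r
library(ggplot2)

# Grid of complex numbers z = x + iy on [-2, 2] x [-2, 2].
grid <- expand.grid(x = seq(-2, 2, length.out = 200),
                    y = seq(-2, 2, length.out = 200))
z <- complex(real = grid$x, imaginary = grid$y)

# Hypothetical example function; any complex-valued function would do.
f <- function(z) (z^2 - 1) / (z^2 + 1)
w <- f(z)

# HSV mapping (one choice among many): hue from the argument of f(z),
# brightness from its modulus squashed into [0, 1).
hue <- (Arg(w) + pi) / (2 * pi)
val <- Mod(w) / (1 + Mod(w))
grid$col <- hsv(h = hue, s = 1, v = val)

# Paint each grid cell with its precomputed colour.
p <- ggplot(grid, aes(x, y, fill = col)) +
  geom_raster() +
  scale_fill_identity() +
  coord_fixed() +
  theme_void()
```

Printing `p` draws the picture. Swapping in a different `f` is the quickest way to explore other images.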

## Structured simulation of regression models – simReg package.

September 30, 2014
I'd like to introduce a package that simulates regression models. This includes both single level and multilevel (i.e. hierarchical or linear mixed) models up to two levels of nesting. The package produces a unified framework to simulate all types of c...

## Install R in Android, via GNURoot - no root required!

September 30, 2014
Playing with my tablet some time ago, I wondered if installing R could be possible. You know, a small Android device “to the power of R”… After searching on Google from time to time, I came across some interesting possibilities: … Continue reading →

## Interactive Visualisation of the Profitable Amount of Waste to Dispose Illegally

September 30, 2014
"Wow!" - I said to myself after reading the R Helps With Employee Churn post - "I can create interactive plots in R?!!! I have to try it out!" I quickly came up with the idea of creating an interactive plot for my simple model for assessing the profitable ratio between the volume of waste that could be illegally...

## Generating Hurricanes with a Markov Spatial Process

September 30, 2014
The National Hurricane Center (NHC) collects datasets with all storms in the North Atlantic, the North Atlantic Hurricane Database (HURDAT). For all storms, we have the location of the storm every six hours (at midnight, six a.m., noon and six p.m.). Note that we also have the date, the maximal wind speed – on a 6 hour window – and...