## Visualising the seasonality of Atlantic windstorms

October 7, 2014

Last week Arthur Charpentier sketched out a Markov spatial process to generate hurricane trajectories. Here, I would like to take another look at the data Arthur used, but focus on its time component. According to the Insurance Information Institute, a normal season, based on averages from 1980 to 2010, has 12 named storms, six hurricanes and...

## Popular Mutual Funds Decomposed With Ekholm (2014)

October 6, 2014

While we have a foundation and momentum from the last post “SelectionShare & TimingShare | Masterfully Written by Delightfully Responsive Author”, we can run the Ekholm calculations on some popular funds to see how they have evolved since the early 1980s. Remember these are my opinions and not investment advice. I chose these four funds for ...

## The World We Live In #1: Obesity And Cells

October 6, 2014

Lesson learned, and the wheels keep turning (The Killers – The world we live in) I discovered this site with a huge amount of data waiting to be analyzed. The first thing I’ve done is this simple graph, where you can see the relationship between cellular subscribers and obese people. Bubbles are countries and its size

## The winds of Winter [Bayesian prediction]

October 6, 2014

A surprising entry on arXiv this morning: Richard Vale (from Christchurch, NZ) has posted a paper about the characters appearing in the yet hypothetical next volume of George R.R. Martin’s Song of ice and fire series, The winds of Winter. Using the previous five books in the series

## R as a general-purpose language for creating DSLs

October 6, 2014

As a computer scientist, RStudio's Joe Cheng has some great insights into the R language and how it compares with other programming languages. In the interview with DataScience.LA below, he notes that while R is often thought about as a domain-specific language (or DSL), the combination of a functional language with deferred evaluation of functional arguments actually makes it...

## New version of pqR with faster variable lookup, faster subset replacement, and more

October 6, 2014

I’ve released a new version, pqR-2014-09-30, of my speedier, “pretty quick”, implementation of R, with some major performance improvements, and some features from recent R Core versions. It also has fixes for bugs (some also in R-3.1.1) and installation glitches. Details are in pqR NEWS. Here I’ll highlight some of the more interesting improvements. Faster variable lookup.   In both pqR

## 7 new R jobs (for October 6th 2014)

October 6, 2014

This is the bimonthly R Jobs post (for 2014-10-06), based on the R-bloggers’ sister website: R-users.com. If you are an employer who is looking to hire people from the R community, please visit this link to post a new R job (it’s free, and registration takes less than 10 seconds). If you are a job seeker, please follow the links below to learn more and apply for your job of interest (or visit previous...

## Machine Learning in R: NYC class offering

October 6, 2014

SupStat is offering a 5-day intensive course in machine learning techniques in R starting this Sunday, October 5th at its Times Square office. Classes are held on October 5th, 12th, and 18th, and November 2nd and 9th, from 10 am to 5 pm for a total of 35 hours of classroom time and 3 weeks of projects

## A Conversation with Joe Cheng at useR! 2014

October 6, 2014

Joe Cheng is a software engineer. Unfortunately the term gets thrown around pretty lightly, so...

## A bit more fragmented

October 6, 2014

This year's election renders an even more fragmented legislature. The way political scientists measure this is by applying an algorithm to calculate the Effective Number of Parties, a measure that helps to go beyond the simple number of parties. A widely accepted algorithm was proposed by M. Laakso and R. Taagepera: …
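
The Laakso-Taagepera measure mentioned in this excerpt is commonly written as $N = 1 / \sum_i p_i^2$, where $p_i$ is party $i$'s share of seats (or votes). Here is a minimal Python sketch of that formula. The function name and the seat counts are made up for illustration and are not taken from the post or its election data.

```python
def effective_number_of_parties(seats):
    """Laakso-Taagepera index: 1 / sum of squared shares.

    `seats` is a list of seat (or vote) counts, one per party.
    """
    total = sum(seats)
    if total <= 0:
        raise ValueError("seat counts must sum to a positive number")
    shares = [s / total for s in seats]
    return 1.0 / sum(p * p for p in shares)


# Hypothetical legislatures, for illustration only.
equal_four = effective_number_of_parties([25, 25, 25, 25])  # four equal parties
dominant = effective_number_of_parties([70, 10, 10, 10])    # one dominant party
print(equal_four, dominant)
```

Four equally sized parties give an index of exactly 4, while one dominant party among the same four pulls the index down towards 2. This is why the measure goes beyond a simple count of parties.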

## Building a DGA Classifier: Part 3, Model Selection

October 6, 2014

This is part three of a three-part blog series on building a DGA classifier and it is split into the three phases of building a classifier: 1) Data preparation 2) Feature engineering and 3) Model selection (this post) Back in part 1, we prepared the data and we are starting with a nice clean list of domains labeled as either legitimate (“legit”) or generated by an algorithm (“dga”)....

## TBATS with regressors

October 5, 2014

I’ve received a few emails about including regression variables (i.e., covariates) in TBATS models. As TBATS models are related to ETS models, tbats() is unlikely to ever include covariates as explained here. It won’t actually complain if you include an xreg argument, but it will ignore it. When I want to include covariates in a

## Monte Carlo simulation and resampling methods for social science [book review]

October 5, 2014

Monte Carlo simulation and resampling methods for social science is a short paperback written by Thomas Carsey and Jeffrey Harden on the use of Monte Carlo simulation to evaluate the adequacy of a model and the impact of assumptions behind this model. I picked it up in the library the other day and browsed through the

## Bayes of thrones

October 5, 2014

My friend and colleague Andreas sent me a link to a working paper published by a statistician at the University of Christchurch (New Zealand) and discussed here. The main idea of the paper was to use a Bayesian model to predict the number of futur...

## Bayes models from SAS PROC MIXED in R, post 2

October 5, 2014

This is my second post on converting SAS's PROC MCMC examples to R. The task this week is determining the transformation parameter in a Box-Cox transformation. SAS only determines Lambda, but I am not so sure about that. What I used to do was get an ...

## By-Group Aggregation in Parallel

October 4, 2014

Similar to the row search, by-group aggregation is another perfect use case to demonstrate the power of split-and-conquer with parallelism. In the example below, it is shown that the homebrew by-group aggregation with the foreach package, albeit inefficiently coded, is still a lot faster than the summarize() function in the Hmisc package.
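
The post's own example uses foreach in R and is not reproduced in this excerpt. As a rough illustration of the same split-and-conquer idea, here is a small Python sketch: split the rows by group key, summarise each group independently on a worker pool, then combine the results. All function names and the data are hypothetical. A thread pool keeps the sketch self-contained; for CPU-bound summaries a process pool would be the more likely choice.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_group(rows):
    """Split (key, value) rows into a dict mapping key -> list of values."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def summarise(item):
    """Summarise one group; each call is independent of the others."""
    key, values = item
    return key, {"n": len(values), "mean": sum(values) / len(values)}


def parallel_group_summary(rows, workers=4):
    """Split rows by key, summarise the groups concurrently, then combine."""
    groups = split_by_group(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(summarise, groups.items()))


# Made-up data for illustration only.
rows = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("c", 5.0)]
result = parallel_group_summary(rows)
print(result)
```

Because each group is summarised without reference to any other group, the per-group work can be handed to separate workers without coordination.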

## What happens if we forget a trivial assumption ?

October 4, 2014

Last week, @dmonniaux published an interesting post entitled l’erreur n’a rien d’original on his blog. He was asking the following question: let $a$, $b$ and $c$ denote three real-valued coefficients; under which assumption on those three coefficients does $ax^2+bx+c=0$ have a real-valued root? Everyone answered $b^2-4ac\geq 0$, but no one mentioned that it is necessary to have a proper quadratic equation,...
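
To make the point concrete, here is a small Python sketch of a real-root check for $ax^2+bx+c=0$ that does not take $a \neq 0$ for granted. The handling of the degenerate cases is my reading of the truncated excerpt, not code from the post, and the function name is hypothetical.

```python
def has_real_root(a, b, c):
    """Return True if a*x**2 + b*x + c == 0 has at least one real solution."""
    if a != 0:
        # Proper quadratic: the usual discriminant condition applies.
        return b * b - 4 * a * c >= 0
    if b != 0:
        # Degenerates to a linear equation with the single root x = -c / b.
        return True
    # Degenerates to the constant equation c == 0.
    return c == 0


print(has_real_root(1, 0, -1))  # x**2 - 1 = 0
print(has_real_root(1, 0, 1))   # x**2 + 1 = 0
print(has_real_root(0, 0, 1))   # 1 = 0, although b*b - 4*a*c = 0 here
```

The last call shows why the assumption matters: with $a = b = 0$ and $c = 1$ the discriminant is zero, so the naive test passes, yet the equation has no solution at all.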

## Introducing miniCRAN: an R package to create a private CRAN repository

October 3, 2014

by Andrie deVries

One of the reasons that R is so popular is the CRAN archive of useful packages. However, with more than 5,900 packages on CRAN, many organisations need to maintain a private mirror of CRAN with only a subset of packages that are relevant to them. The package miniCRAN makes this possible by determining the dependency tree...

## SelectionShare & TimingShare | Masterfully Written by Delightfully Responsive Author

October 3, 2014

Anders Ekholm has written a wonderful paper (Ekholm, Anders G., Components of Portfolio Variance: Systematic, Selection and Timing, August 8, 2014, http://ssrn.com/abstract=2463649) demonstrating how we might decompose a money manager’s performance with...

## Ebola: Beds, Labs, and Warnings? Can they help? (Shiny App)

October 3, 2014

A month ago, when the WHO was projecting that the current Ebola outbreak could affect as many as 20,000 people, I ran some elementary modelling and found that these estimates are far too small given the current trend. The...

## Consumer Preference Driven by Benefits and Affordances, Yet Management Sees Only Products and Features

October 2, 2014

Return on Investment (ROI) is management's bottom line. Consequently, everything must be separated and assigned a row with associated costs and profits. Will we make more by adding another product to our line? Will we lose sales by limiting the feature...

## Shiny 0.10.2

October 2, 2014

Shiny v0.10.2 has been released to CRAN. To install it: `install.packages('shiny')`. This version of Shiny requires R 3.0.0 or higher (note the current version of R is 3.1.1). R 2.15.x is no longer supported. Here are the most prominent changes: File uploading via `fileInput()` now works for Internet Explorer 8 and 9. Note, however, that IE 8/9 do not

## Find us at Strata Conference and Hadoop World 2014!

October 2, 2014

SupStat Analytics and Transwarp Technologies will be at the 2014 Strata Conference and Hadoop World showcasing the power of Hadoop and Spark computing with R analytics. We’re excited to be presenting to the data science world the Transwarp Data Hub, an integrated storage, processing, and analytics platform that delivers up to 100 times faster performance

## A Failed Attempt at Backtesting Structural Arbitrage

October 2, 2014

One of the things that I wondered about regarding the previous post was how would this strategy have performed in …

## The Rise of the Samurai Pitcher

October 2, 2014

Masahiro Tanaka stands on the mound, rubbing the ball vigorously between his hands. It's a crisp, cool night in the Bronx. Stepping back, he digs his right foot into the rubber, winds up and, with a seven-foot stretch, steps towards the catcher, unleashing a blistering four-seam, 95 mph fastball. Less than half a second later, it explodes into the catcher's...

## R and Data Science Webinar

October 2, 2014

by Joseph Rickert

Recently, I had the opportunity to present a webinar on R and Data Science. The challenge with attempting this sort of thing is to say something interesting that does justice to the subject while being suitable for an audience that may include both experienced R users and curious beginners. The approach I settled on had three...

## Announcing the Publication of Practical Data Science Cookbook

October 2, 2014

Four of DC2's board members have published a new book! Tony Ojeda, Sean Murphy, Benjamin Bengfort, and Abhijit Dasgupta are proud to announce the arrival of Practical Data Science Cookbook (Packt, $10 ebook or $49.99 print+ebook). Practical Data Science Cookbook is perfect for …

## devtools 1.6

October 2, 2014

Devtools 1.6 is now available on CRAN. Devtools makes it so easy to build a package that it becomes your default way to organise code, data and documentation. Learn more at http://r-pkgs.had.co.nz/. You can get the latest version with: `install.packages("devtools")`. We’ve made a lot of improvements to the install and release process: Installation functions now