Job Trends in the Analytics Market: New, Improved, now Fortified with C, Java, MATLAB, Python, Julia and Many More!

February 24, 2014

I’m expanding the coverage of my article, The Popularity of Data Analysis Software. This is the first installment, which includes a new opening and a greatly expanded analysis of the analytics job market. Here it is, from the abstract onward …

2013-11 Improving the ‘gridGraphviz’ package in R

February 24, 2014

The gridGraphviz package renders node-and-edge graphs in R using the grid graphics package. Graphs are laid out using the Rgraphviz package to interface with the graph layout algorithms in graphviz. This article details the improvements made between gridGraphviz versions 0.2 …

The forecast mean after back-transformation

February 24, 2014

Many functions in the forecast package for R will allow a Box-Cox transformation. The models are fitted to the transformed data and the forecasts and prediction intervals are back-transformed. This preserves the coverage of the prediction intervals, and the back-transformed point forecast can be considered the median of the forecast densities (assuming the forecast densities on the transformed scale...
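A small simulation may make the median-versus-mean point concrete. This is only a sketch: it assumes a normal forecast density on the log scale (the Box-Cox transformation with parameter 0), the values of `mu` and `sigma` are made up for illustration, and it uses base R rather than the forecast package.

```r
# Hypothetical forecast density on the transformed (log) scale.
set.seed(42)
mu    <- 2     # assumed mean on the log scale
sigma <- 0.5   # assumed standard deviation on the log scale

z <- rnorm(200000, mean = mu, sd = sigma)  # draws on the transformed scale
y <- exp(z)                                # back-transformed draws

back_median <- median(y)   # close to exp(mu): the naive back-transformed point forecast
back_mean   <- mean(y)     # close to exp(mu + sigma^2 / 2): larger than the median

theory_median <- exp(mu)
theory_mean   <- exp(mu + sigma^2 / 2)

print(c(median = back_median, mean = back_mean))
```

Under these assumptions, exponentiating the point forecast recovers the median of the back-transformed density, while the mean needs the extra `sigma^2 / 2` term.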

Bayesian First Aid: Two Sample t-test

February 24, 2014

As spring follows winter once more here down in southern Sweden, the two sample t-test follows the one sample t-test. This is a continuation of the Bayesian First Aid alternative to the one sample t-test where I’ll introduce the two sample alternative. It will be a quite short post as the two sample alternative is just more of...

Brief introduction on Sweave and Knitr for reproducible research

February 24, 2014

A few weeks ago I gave a presentation on using Sweave and Knitr under the guise of promoting reproducible research. I humbly offer this presentation to the blog with full knowledge that there are already loads of tutorials available online. This presentation is specific and slightly biased towards Windows OS, so it probably has limited

More on Rebalancing | With Data from Research Affiliates

February 24, 2014

While on the topic of rebalancing (see Unsolved Mysteries of Rebalancing), I thought it would be good to highlight another good research paper with some quick rCharts analysis. Arnott, Robert D., et al. The Surprising Alpha from Malkiel’s Monkey and ...

Quick and dirty notes on General Linear Mix Models

February 24, 2014

My datasets tend to have random factors. I try to stick to general models whenever I can to avoid dealing with both random factors and complex error distributions (not always possible). I am compiling some notes here to avoid visiting …

How to Make a Bad Password with R

February 24, 2014

I have a lot of projects that will take ages to finish (some are in such poor shape that I tuck them away in private repositories, so no one can see my shame).  So sometimes it's nice to just take a weekend and crank out something start to finish, even if it's dumb and no one cares about it...

Face To Face With Marilyn Monroe

February 24, 2014

Symmetry is what we see at a glance (Blaise Pascal). Ladies and gentlemen, the beautiful Marilyn Monroe: There are several image processing packages in R. In this experiment I used biOps, which turns images into 3D matrices. The third dimension is a 3-array corresponding to (r, g, b) color of pixel defined by two other

Performing a non-local return in R

February 23, 2014

In most languages return is a statement, but in R it is a function (in fact R does not really have statements, it only has expressions). This function-like behavior of return is useful for figuring out the order in which operations are performed, e.g., the value returned by return(1)+return(2) tells us that binary operators are
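The `return(1)+return(2)` expression mentioned in the teaser can be tried directly. The sketch below is a minimal illustration; the function name `f` is hypothetical and not taken from the post.

```r
# `+` evaluates its left operand first. That operand is a call to return(),
# which immediately exits f, so return(2) is never evaluated.
f <- function() {
  return(1) + return(2)
  "never reached"
}

result <- f()
print(result)  # 1
```

Because the left operand wins, the value 1 shows that the operands of this binary operator are evaluated left to right.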

Toronto Data Science Group – A Survey of Data Visualization Techniques and Practice

February 23, 2014

Recently I spoke at the Toronto Data Science group. The folks at Mozilla were kind enough to record it and put it on Air, so here it is for your viewing pleasure (and critique): Overall it was quite well received. Aside from the usual omg does my voice ...

Programmatically download political science data with the psData package

February 23, 2014

A lot of progress has been made on improving political scientists’ ability to access data ‘programmatically’, e.g. data can be downloaded with source code. R packages such as WDI for World Bank Development Indicator and dvn for many data sets stored on the Dataverse Network make it much easier for political scientists to use this data...

Pimping your forest plot

February 23, 2014

In order to celebrate my Gmisc-package being on CRAN I decided to pimp up the forestplot2 function. I had a post on this subject and one of the suggestions I got from the comments was the ability to change the default box marker to something else. This idea had been in my mind for a while and I therefore...

qdap 1.1.0 Released on CRAN

February 23, 2014

We’re very pleased to announce the release of qdap 1.1.0. This is the fourth installment of the qdap package available at CRAN. Major development has taken place since the last CRAN update. The qdap package automates many of the tasks …

Unemployment revisited

February 23, 2014

Approximately a year ago I made a post graphing unemployment in Europe and other locations. I have always wanted to do this again, not because the R-code would be so interesting, but just because I wanted to see the plots. As time progressed I attempte...

Convex Hull of Polygon using Boost.Geometry

February 23, 2014

Rcpp can be used to convert basic R data types to and from Boost.Geometry models. In this example we take a matrix of 2d-points and convert it into a Boost.Geometry polygon. We then compute the convex hull of this polygon using a Boost.Geometry function boost::geometry::convex_hull. The convex hull is then converted back to an R matrix. The conversions to and...
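For comparison, the same matrix-in, matrix-out idea can be sketched in base R alone. This is not the Rcpp and Boost.Geometry approach the post describes: it swaps in `grDevices::chull`, and the point matrix `pts` is made up for illustration.

```r
# Five 2d points: the corners of a square plus one interior point.
pts <- rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(1, 1))

# chull() returns the row indices of the points lying on the convex hull.
hull_idx <- grDevices::chull(pts)

# Index back into the matrix to get the hull as an R matrix of coordinates.
hull <- pts[hull_idx, , drop = FALSE]
print(hull)
```

The interior point (1, 1) is dropped, leaving the four corners.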

High Dimensional Biological Data Analysis and Visualization

February 22, 2014

High dimensional biological data shares many qualities with other forms of data. Typically it is wide (samples << variables), complicated by experimental design and made up of complex relationships driven by both biological and analytical sources of variance. Luckily the powerful combination of R, Cytoscape (< v3) and the R package RCytoscape can be used

RcppBDT 0.2.2

February 22, 2014

A new maintenance release of the RcppBDT package is now on CRAN. There is no new code in this; it mainly accommodates a request by CRAN to standardize the \code{Imports:} and \code{Depends:} relationships in the \code{DESCRIPTION} file. We also updated...

RDieHarder 0.1.3

February 22, 2014

A pure maintenance release of RDieHarder is now on CRAN. RDieHarder provides R bindings for the DieHarder battery of tests for random number generators by Brown et al. This release contains no new code, but the vignette needed to be moved from inst...

Because it’s Friday: US Dialects

February 21, 2014

In the video below from The Atlantic, the differences in the way US citizens describe or pronounce various things are illustrated in a series of phone calls (via Sullivan): If you're wondering how your dialect fits in, you can try the New York Times Dialect Quiz. Answer 25 questions, and it will identify the 3 US cities that most...

Call for papeRs in the near future

February 21, 2014

I am really happy to share some great news with all R users about some upcoming conferences. The calls for papers are still open for all three events below, which I plan to attend, and I hope to meet more and more useRs there! The closest event in time will be at Bucharest, Romania on the...

A survival guide to Data Science with R, from Graham Williams

February 21, 2014

Graham Williams is the Lead Data Scientist at the Australian Taxation Office, and the creator of Rattle, an open-source GUI for data mining with R. (Check out some recent reviews/demos of Rattle on this blog here and here.) Dr Williams continues his many contributions to the R community with One Page R, a "Survival Guide to Data Science with...

Unsolved Mysteries of Rebalancing

February 21, 2014

There is a lot not yet fully understood about rebalancing in portfolio management. This 2013 paper from Nardon and Kiskiras is the best I have read yet. Kiskiras, John and Nardon, Andrea. Portfolio Rebalancing: A Stable Source of Alpha. January 18, 2013...

One day discount on Practical Data Science with R

February 21, 2014

Please forward and share this discount offer for our upcoming book. Manning Deal of the Day February 22: Half off Practical Data Science with R. Use code dotd022214au at www.manning.com/zumel/. Related posts: Data Science, Machine Learning, and Statis...

Interactive exploration of a prior’s impact

February 21, 2014

Probably the most frequent criticism of Bayesian statistics sounds something like “It’s all subjective – with the ‘right’ prior, you can get any result you want.” To address this criticism, it has been suggested to do a sensitivity analysis (or robustness analysis) that demonstrates how the choice of priors affects the conclusions drawn

Forecasting within limits

February 21, 2014

It is common to want forecasts to be positive, or to require them to be within some specified range. Both of these situations are relatively easy to handle using transformations. Positive forecasts: To impose a positivity constraint, simply work on the log scale. With the forecast package in R, this can be handled by specifying the Box-Cox parameter...
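The effect of working on the log scale can be sketched without the forecast package. The example below is only an illustration in base R: the series `y`, the linear trend model, and the ten-step horizon are all assumptions, not taken from the post.

```r
# A made-up, strictly positive series that decays towards zero.
y <- c(12, 9, 7, 5, 4, 3, 2.2, 1.6, 1.1, 0.8)
time_idx <- seq_along(y)

# Trend fitted on the original scale: extrapolation can go negative.
fit_raw <- lm(y ~ time_idx)

# Trend fitted on the log scale: forecasts are exponentiated afterwards.
fit_log <- lm(log(y) ~ time_idx)

future <- data.frame(time_idx = 11:20)
fc_raw <- predict(fit_raw, newdata = future)
fc_log <- exp(predict(fit_log, newdata = future))  # positive by construction

print(round(cbind(raw_scale = fc_raw, log_scale = fc_log), 3))
```

Because the back-transformation is `exp()`, the log-scale forecasts cannot fall below zero, whatever the fitted trend.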

Self-written function help

I have noted at least one instance (and there are probably others) about how Python's docStrings are so great, and wouldn't it be nice to have a similar system in R. Especially when you can have your new function tab completion available depending on your development environment. This is a false statement, however. If you set...

The gap between data mining and predictive models

February 20, 2014

The Facebook data science blog shared some fun data explorations this Valentine’s Day in Carlos Greg Diuk’s “The Formation of Love”. They are rightly receiving positive interest in and positive reviews of their work (for example Robinson Meyer’s Atlantic article). The finding is also a great opportunity to discuss the gap between cool data mining