Herds of statistical models

December 16, 2019
by Carlos J. Gil Bellosta. Big datasets found in statistical practice often have a rich structure. Most traditional methods, including their modern counterparts, fail to efficiently use the information contained in them. Here we propose and discuss an alternative modelling strategy based on herds of simple models. Big Data: How big datasets came...

Comparison of indices of significance in the Bayesian framework

December 16, 2019
The bayestestR package has several functions to compute indices of effect existence and significance in a Bayesian framework, like p_direction() or bayesfactor_parameters(). The accuracy of these indices is affected by various sources of uncertainty...
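As a rough illustration of what an index of effect existence measures, the sketch below computes a probability-of-direction style quantity from posterior draws: the share of draws that fall on the more common side of zero. It is plain Python for illustration only, not bayestestR's implementation; the function name `probability_of_direction` and the example draws are made up here.

```python
def probability_of_direction(draws):
    """Share of posterior draws on the more common side of zero.

    Hypothetical helper for illustration; `draws` is any sequence of
    posterior samples for a single parameter.
    """
    if not draws:
        raise ValueError("need at least one posterior draw")
    n = len(draws)
    share_positive = sum(1 for d in draws if d > 0) / n
    share_negative = sum(1 for d in draws if d < 0) / n
    # The index reports whichever direction dominates.
    return max(share_positive, share_negative)


# Made-up draws: three of four are positive.
print(probability_of_direction([0.5, 1.2, -0.3, 0.8]))  # 0.75
```

A value near 0.5 means the draws are split evenly around zero, while a value near 1 means almost all of the posterior mass has the same sign.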

Automating update of an international database for the Euro Area

December 16, 2019
Our purpose is to create an international quarterly database for the Euro area that could be updated automatically. We want to build the following series: foreign demand (without trade between Euro area countries), foreign interest rate, oil prices, real effective exchange rate, and import and export. To construct these series we use data from DBnomics. The DBnomics API is called using the rdbnomics package. All the code is written in R, thanks...

Beautiful paper on HMMs and derivatives

December 16, 2019
I’ve been talking to Michael Betancourt and Charles Margossian about implementing analytic derivatives for HMMs in Stan to reduce memory overhead and increase speed. For now, one has to implement the forward algorithm in the Stan program and let Stan autodiff through it. I worked out the adjoint method (aka reverse-mode autodiff) derivatives of the
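For readers unfamiliar with the forward algorithm mentioned above, a minimal log-space sketch follows. It is written in plain Python rather than Stan, and the function names and argument layout are assumptions made for this illustration, not code from the post.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(log_init, log_trans, log_emit):
    """Forward algorithm for a discrete-state HMM, in log space.

    Assumed layout (hypothetical, for illustration):
      log_init[k]     = log P(z_1 = k)
      log_trans[j][k] = log P(z_t = k | z_{t-1} = j)
      log_emit[t][k]  = log p(y_t | z_t = k)
    Returns log p(y_1, ..., y_T).
    """
    n_states = len(log_init)
    # alpha[k] = log p(y_1..y_t, z_t = k), starting at t = 1.
    alpha = [log_init[k] + log_emit[0][k] for k in range(n_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            logsumexp([alpha[j] + log_trans[j][k] for j in range(n_states)])
            + log_emit[t][k]
            for k in range(n_states)
        ]
    # Marginalise over the final hidden state.
    return logsumexp(alpha)
```

Automatic differentiation through a loop like this has to keep every intermediate `alpha` in memory, which is the overhead that analytic derivatives aim to avoid.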

validate 0.9.3 is on CRAN

December 16, 2019
CRAN just accepted the latest version of our R package validate. The validate package provides an infrastructure to perform any data quality check in a flexible and extensible way. This is a minor update with the following new features: New …

Call for abstracts and tutorials: use of R in official statistics 2020 in Vienna

December 16, 2019
The eighth international conference on the Use of R in Official Statistics (#uRos2020) will take place from 6 to 8 May 2020 at Statistics Austria, the Austrian office of National Statistics. The meeting in a nutshell 4-5 May: unconfUROS …

December 16, 2019
R and Python, the “dynamic duo” of data science, are both free, open-source programming languages. That means that there’s no “vendor” in the sense that, say, Microsoft owns Excel. This can make getting started with these programs a little trickier: there are several ways to install them, often multi-step, confusing, and resource-intensive.  It would be

BH 1.72.0-1 on CRAN

December 16, 2019
The BH package provides a sizeable portion of the Boost C++ libraries as a set of template headers for use by R. It is quite popular, and frequently used together with Rcpp. The BH CRAN page shows e.g. that it is used by rstan, dplyr as well as a fe...

Tip (2) for R to Python and Vice-Versa seamlessly

December 16, 2019
Continuing my earlier R to Python tips for dealing with both Python and R simultaneously for client requests, this time I turn to plots, where the two schools are as of now by and large distinct in their plotting styles. Plotline a new python p...

A Tale of an Edgy Panda and some Python Reviews

December 15, 2019
This post will be a quickie detailing a rather annoying…finding about the pandas package in Python. For those not in …

Reordering bars in GGanimate visualization

December 15, 2019
Last week several gganimate visualizations came to my feed. Some R users were wondering about reordering gganimate and ggplot2 bars while they are evolving (over animation time). Then, we came up with this R viz where several bars are not only evolving and reordering over time but also leaving and joining the chart. We want the top 4 countries...

tidyposterior’s Bayesian Approach to Model Comparison

December 15, 2019
A task common to many machine learning workflows is to compare the performance of several models with respect to some metric such as accuracy or area under the ROC curve. Standard practice is to try out several different algorithms on a training data set and see which works better. Unfortunately, all too often, after this work has been done,...

Bump chart of a parliamentary constituency

December 15, 2019
A bump chart showing the evolution of voting in the Midlothian constituency.

The Renzo Pomodoro dataset

December 15, 2019
Estimating how long it will take to complete a task is hard work, and the most common motivation for this work comes from external factors, e.g., the boss, or a potential client asks for an estimate to do a job. People also make estimates for their own use, e.g., when planning work for the day.

New rquery Vignette: Working with Many Columns

December 15, 2019
We have a new rquery vignette here: Working with Many Columns. This is an attempt to get back to writing about how to use the package to work with data (versus the other-day’s discussion of package design/implementation). Please check it out.

The significance of education on the salary of engineers in Sweden

December 14, 2019
In my last posts, I analysed the significance of experience for different occupational groups. In this post, I will turn the interest towards education. I will again start with engineers and see if I can expand my analysis to all occupational groups. First, define libraries and functions.
library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.0 ...

Git Hosting for the Distraught and the Restless

December 14, 2019
It’s generally impossible to only use services, private or government, that perfectly align with one’s values, so one must opt to choose one’s battles. The controversy over GitHub’s contract with U.S. Immigration and Customs Enforcement is the latest such battle in the open-source software world. GitHub employees and users are trying to pressure GitHub to drop the contract, as a way to place...

Introducing the schrute Package: the Entire Transcripts From The Office

December 14, 2019
What: This is a package that does/has only one thing: the complete transcriptions of all episodes of The Office (US version)! Use this data set to master NLP or text analysis. Let’s scratch the surface of the subject with a few examples from the excellent Text Mining with R book, by Julia Silge and David Robinson. First install the package from CRAN: #...

A large repository of networkdata

December 14, 2019
There are many network repositories out there that offer a large variety of amazing free data. (See the awesome network analysis list on GitHub for an overview.) The problem is that network data can come in many formats: either in plain text as an edgelist or adjacency matrix, or in a dedicated network file format, of which there are many (paj,dl,gexf,graphml,net,gml,…). The...

December 14, 2019
In this post, we will go through the steps you need to follow if you would like to add a Jekyll / Github Pages blog to R-Bloggers. I recently went through this process and had to search through a lot of information in order to figure out how to do it. ...

How H2O propels data scientists ahead of itself: enhancing Driverless AI with advanced options, recipes and visualizations

December 14, 2019
H2O engineers continually innovate and implement the latest techniques by following and adopting the latest research, working on cutting-edge use cases, and participating in and winning machine learning competitions like Kaggle. But thanks to the explosion of AI research and applications, even the most advanced automated machine learning platforms, like H2O.ai Driverless AI, cannot come with all the bells and whistles to...

Meta Machine Learning aggregator packages in R, The 2nd generation

December 14, 2019
TL;DR: mlr was refactored into mlr3. caret was refactored into tidymodels. What are the main differences in terms of software design and of tweaking it for your own needs? R6 vs S3. Which one is less fragile? Motivation: My previous post from mi...

R 3.6.2 is out, and a preview of R 4.0.0

December 13, 2019
R 3.6.2, the latest update to the R language, is now available for download on Windows, Mac and Linux. As a minor release, R 3.6.2 makes only small improvements to R, including some new options for dot charts and better handling of missing values when using running medians as a smoother on charts. It also includes several bug fixes...

December 13, 2019
Following on from the success of our recent graduate intake, we are already looking to find three more graduates... The post Mango graduate assessment day appeared first on Mango Solutions.

Exploratory Data Analysis of Cell Phone Usage with R: Part 2

December 13, 2019
In this post, we will analyze data from my cell phone provider on my phone usage, focusing on the volume of my mobile data use across time. We will use exploratory data analysis to understand how my usage of mobile data varies across...

Confidence and prediction intervals explained… (with a Shiny app!)

December 13, 2019
This semester I started teaching introduction to statistics and data analysis with R at Tel Aviv University. I put a lot of effort into bringing practical challenges, examples from real life, and a lot of demonstrations of statistical theory with R. This post is an example of how I’ve been using R code (and specifically Shiny apps) to demonstrate statistical...