Herds of statistical models

December 16, 2019

By Carlos J. Gil Bellosta. Big datasets found in statistical practice often have a rich structure. Most traditional methods, including their modern counterparts, fail to use the information they contain efficiently. Here we propose and discuss an alternative modelling strategy based on herds of simple models. Big Data: How big datasets came...

Comparison of indices of significance in the Bayesian framework

December 16, 2019

The bayestestR package has several functions to compute indices of effect existence and significance in a Bayesian framework, like p_direction() or bayesfactor_parameters(). The accuracy of these indices is affected by various sources of uncertainty...

Automating update of an international database for the Euro Area

December 16, 2019

Our purpose is to create an international quarterly database for the Euro area that can be updated automatically. We want to build the following series: foreign demand (without trade between Euro area countries), foreign interest rate, oil prices, real effective exchange rate, and import and export. To construct these series we use data from DBnomics. The DBnomics API is called using the rdbnomics package. All the code is written in R, thanks...

Beautiful paper on HMMs and derivatives

December 16, 2019

I’ve been talking to Michael Betancourt and Charles Margossian about implementing analytic derivatives for HMMs in Stan to reduce memory overhead and increase speed. For now, one has to implement the forward algorithm in the Stan program and let Stan autodiff through it. I worked out the adjoint method (aka reverse-mode autodiff) derivatives of the

validate 0.9.3 is on CRAN

December 16, 2019

CRAN just accepted the latest version of our R package validate. The validate package provides an infrastructure to perform any data quality check in a flexible and extensible way. This is a minor update with the following new features: New …

Call for abstracts and tutorials: use of R in official statistics 2020 in Vienna

December 16, 2019

The eighth international conference on the Use of R in Official Statistics (#uRos2020) will take place from 6 to 8 May 2020 at Statistics Austria, the Austrian office of National Statistics. The meeting in a nutshell 4-5 May: unconfUROS …

Practice R and Python on the Cloud for Free

December 16, 2019

R and Python, the “dynamic duo” of data science, are both free, open-source programming languages. That means that there’s no “vendor” in the sense that, say, Microsoft owns Excel. This can make getting started with these programs a little trickier: there are several ways to install them, often multi-step, confusing, and resource-intensive.  It would be

BH 1.72.0-1 on CRAN

December 16, 2019

The BH package provides a sizeable portion of the Boost C++ libraries as a set of template headers for use by R. It is quite popular, and frequently used together with Rcpp. The BH CRAN page shows e.g. that it is used by rstan, dplyr as well as a fe...

Tip (2) for R to Python and Vice-Versa seamlessly

December 16, 2019

Continuing my earlier R to Python tips for dealing with both Python and R simultaneously for client requests, this time the topic is plots, where the two schools are, as of now, by and large distinct in their plotting styles. Plotline a new python p...

A Tale of an Edgy Panda and some Python Reviews

December 15, 2019

This post will be a quickie detailing a rather annoying…finding about the pandas package in Python. For those not in …

Reordering bars in GGanimate visualization

December 15, 2019

Last week several gganimate visualizations came to my feed. Some R users were wondering about reordering gganimate and ggplot2 bars while they are evolving (over animation time). So we came up with this R viz, where several bars are not only evolving and reordering over time but also leaving and joining the chart. We want the top 4 countries...

tidyposterior’s Bayesian Approach to Model Comparison

December 15, 2019

A task common to many machine learning workflows is to compare the performance of several models with respect to some metric such as accuracy or area under the ROC curve. Standard practice is to try out several different algorithms on a training data set and see which works better. Unfortunately, all too often, after this work has been done,...

Bump chart of a parliamentary constituency

December 15, 2019

A bump chart showing the evolution of voting in the Midlothian constituency.

The Renzo Pomodoro dataset

December 15, 2019

Estimating how long it will take to complete a task is hard work, and the most common motivation for this work comes from external factors, e.g., the boss, or a potential client asks for an estimate to do a job. People also make estimates for their own use, e.g., when planning work for the day.

New rquery Vignette: Working with Many Columns

December 15, 2019

We have a new rquery vignette here: Working with Many Columns. This is an attempt to get back to writing about how to use the package to work with data (versus the other-day’s discussion of package design/implementation). Please check it out.

The significance of education on the salary of engineers in Sweden

December 14, 2019

In my last posts, I analysed the significance of experience for different occupational groups. In this post, I will turn my attention to education. I will again start with engineers and see if I can expand my analysis to all occupational groups. First, define libraries and functions. library(tidyverse) ## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 -- ## v ggplot2 3.2.0 ...

Git Hosting for the Distraught and the Restless

December 14, 2019

It’s generally impossible to only use services, private or government, that perfectly align with one’s values, so one must opt to choose one’s battles. The controversy over GitHub’s contract with U.S. Immigration and Customs Enforcement is the latest such battle in the open-source software world. GitHub employees and users are trying to pressure GitHub to drop the contract, as a way to place...

Introducing the schrute Package: the Entire Transcripts From The Office

December 14, 2019

What: This is a package that does/has only one thing: the complete transcriptions of all episodes of The Office! (US version). Use this data set to master NLP or text analysis. Let’s scratch the surface of the subject with a few examples from the excellent Text Mining with R book, by Julia Silge and David Robinson. First install the package from CRAN: #...

A large repository of networkdata

December 14, 2019

There are many network repositories out there that offer a large variety of amazing free data. (See the awesome network analysis list on github for an overview.) The problem is that network data can come in many formats: either in plain text as an edge list or adjacency matrix, or in a dedicated network file format, of which there are many (paj, dl, gexf, graphml, net, gml, …). The...

Creating an RSS Feed to Add Your Jekyll / Github Pages Blog to R-Bloggers

December 14, 2019

In this post, we will go through the steps you need to follow if you would like to add a Jekyll / Github Pages blog to R-Bloggers. I recently went through this process and had to search through a lot of information in order to figure out how to do it. ...

How H2O propels data scientists ahead of itself: enhancing Driverless AI with advanced options, recipes and visualizations

December 14, 2019

H2O engineers continually innovate and implement the latest techniques by following and adopting the latest research, working on cutting-edge use cases, and participating in and winning machine learning competitions like Kaggle. But thanks to the explosion of AI research and applications, even the most advanced automated machine learning platforms like H2O.ai Driverless AI cannot come with all the bells and whistles to...

Meta Machine Learning aggregator packages in R, The 2nd generation

December 14, 2019

TL;DR mlr was refactored into mlr3. caret was refactored into tidymodels. What are the main differences in terms of software design, and in tweaking them for your own needs? R6 vs S3. Which one is less fragile? Motivation My previous post from mi...

R 3.6.2 is out, and a preview of R 4.0.0

December 13, 2019

R 3.6.2, the latest update to the R language, is now available for download on Windows, Mac and Linux. As a minor release, R 3.6.2 makes only small improvements to R, including some new options for dot charts and better handling of missing values when using running medians as a smoother on charts. It also includes several bug fixes...

Mango graduate assessment day

December 13, 2019

Following on from the success of our recent graduate intake, we are already looking to find three more graduates... The post Mango graduate assessment day appeared first on Mango Solutions.

Exploratory Data Analysis of Cell Phone Usage with R: Part 2

December 13, 2019

In this post, we will analyze data from my cell phone provider on my phone usage, focusing on the volume of my mobile data use across time. We will use exploratory data analysis to understand how my usage of mobile data varies across...

Confidence and prediction intervals explained… (with a Shiny app!)

December 13, 2019

This semester I started teaching an introduction to statistics and data analysis with R at Tel-Aviv University. I put a lot of effort into bringing in practical challenges, examples from real life, and many demonstrations of statistical theory with R. This post is an example of how I’ve been using R code (and specifically Shiny apps) to demonstrate statistical...

Null hypothesis

December 12, 2019

In our previous post we ran two investing strategies based on Apple’s last twelve months price-to-earnings multiple (LTM P/E). One strategy bought Apple’s stock when its multiple dropped below 10x and sold when it rose above 20x. The other bought the stock when the 22-day moving average of the multiple crossed above the current multiple and sold when the...
