R or Python? Why not both? Using Anaconda Python within R with {reticulate}

December 29, 2018

This short blog post illustrates how easy it is to use R and Python in the same R Notebook thanks to the {reticulate} package. For this to work, you might need to upgrade RStudio to the current preview version. Let’s start by importing {reticulate} with library(reticulate). {reticulate} is an RStudio package that provides “a comprehensive set of tools for interoperability between Python and R”. With...

Read more »

Leaf Plant Classification: An Exploratory Analysis – Part 1

December 29, 2018

In this post, I am going to run an exploratory analysis of the plant leaf dataset made available by the UCI Machine Learning Repository at this link. The dataset is expected to comprise sixteen samples each of one-hundred plant species. Its analysis was introduced within ref. . That paper describes a method designed to...

Read more »

Part 5: Code corrections to optimism corrected bootstrapping series

December 29, 2018

The truth is out there, R readers, but often it is not what we have been led to believe. The previous post examined the strong positive results bias in optimism corrected bootstrapping (a method of assessing a machine learning model’s predictive power) with increasing p (completely random features). There were 2 implementations of the method

Read more »

Tidymodels

December 28, 2018

RStudio is expanding the tidyverse principles to modelling with R and...

Read more »

Part 4: Why does bias occur in optimism corrected bootstrapping?

December 28, 2018

In the previous parts of the series we demonstrated a positive results bias in optimism corrected bootstrapping by simply adding random features to our labels. This problem is due to an ‘information leak’ in the algorithm, meaning the training and test datasets are not kept separate when estimating the optimism. Due to this, the optimism,

Read more »

Using emojis as scatterplot points

December 27, 2018

Recently I wanted to learn how to use emojis as points in a scatterplot. It seems like the emojifont package is a popular way to do it. However, I couldn’t seem to get it to work on my machine …

Read more »

My R Take on Advent of Code – Day 3

December 27, 2018

Ho, ho, ho, Happy Chris.. New Year? Between eating the sea of fish (as the Polish tradition requires), assembling doll houses and designing a new kitchen, I finally managed to publish the third post on My R take on Advent of Code. To keep things short and sweet, here’s the original challenge: Each Elf has made a claim about which...

Read more »

My #Best9of2018 tweets

December 27, 2018

As 2018 nears its end, it’s time for me to look back on my R/Twitter year with the same simple method as last year: let me identify and webshoot my 9 best tweets of 2018! Downloading and opening my Twitter data. Like in 2017, I tweeted too much and was therefore unable to rely on rtweet::get_timeline() (or rtweet::get_my_timeline()) to download my tweets, so I exported data...

Read more »

French Mortality Poster

December 27, 2018

Based on the heatmaps I drew earlier this month, I made a poster of two centuries of data on mortality rates in France for males and females. It turned out reasonably well, I think. I will probably get it blown up to a nice large size and put it up on the wall. I’ve had very good results with...

Read more »

Part 3: Two more implementations of optimism corrected bootstrapping show shocking bias

December 27, 2018

Welcome to part III of debunking the optimism corrected bootstrap in high dimensions (a fairly high number of features) over the Christmas holidays. Previously we saw, with a reproducible code implementation, that this method is very biased when we have many features (50-100 or more). I suggest avoiding this method until at some point it has

Read more »

Clustering the Bible

December 27, 2018

During this time of year there is obviously a lot of talk about the Bible. As most people know, the New Testament comprises four different Gospels written by anonymous authors 40 to 70 years after Jesus’ supposed crucifixion. Unfortunately we have lost all of the originals and only retained copies of copies of copies (and …

Read more »

The Christmas Eve Selloff was a Classic Capitulation

December 26, 2018

The selloff on Christmas Eve was so bad it looked like a typical bear market capitulation. The following rally merely confirmed it. As mentioned in the last post, when the correction reached 16% at the close of December 21st, the oversold indicator was not lit. What followed was the worst Christmas Eve selloff...

Read more »

Some fun with {gganimate}

December 26, 2018

In this short blog post I show you how you can use the {gganimate} package to create animations from {ggplot2} graphs with data from UNU-WIDER. WIID data. Just before Christmas, UNU-WIDER released a new edition of their World Income Inequality Database: *NEW #DATA* We’ve just released a new version of the World Income Inequality Database. WIID4 includes #data...

Read more »

Should Old Acquaintance be Forgot: Tidying up Mac Mail

As the year is closing down, why not spend some of the free time exploring your email data using R and the tidyverse? When I learned that Mac OS Mail stores its internal data in a SQLite database file I was hooked. A quick dive into your email archive might uncover some of your old acquaintances. Let’s take...

Read more »

Le Monde puzzle [#1076]

December 26, 2018

A cheezy Le Monde mathematical puzzle (which took me much longer to find than to solve, as Warwick U does not get a daily delivery of the newspaper): Take a round pizza (or a wheel of Gruyère) cut into seven identical slices and turn one

Read more »

Very shiny holidays!

December 26, 2018

How could I miss out on programming just a little bit during the holiday season? But I didn’t want to work on something serious, so I decided to check out some groundwork on R-Shiny + JQuery + CSS. The result is some nice holiday greetings inside a shiny app: An app to greet your family with shiny. I just googled CSS + holidays and...

Read more »

Part 2: Optimism corrected bootstrapping is definitely bias, further evidence

December 26, 2018

Some people are very fond of the technique known as ‘optimism corrected bootstrapping’. However, this method is biased, and this becomes apparent as we increase the number of noise features to high numbers (as shown very clearly in my previous blog post). This needs exposing; I don’t have the time to do a publication on

Read more »

Finally, You Can Plot H2O Decision Trees in R

December 25, 2018

Creating and plotting decision trees (like the one below) for the models created in H2O will be the main objective of this post. Figure 1. Decision Tree Visualization in R. Decision Trees with H2O. With release 3.22.0.1, H2O-3 (a.k.a. open source H2O or simply H2O) added to its family of tree-based algorithms (which already included DRF, GBM, and XGBoost) support for one more: Isolation...

Read more »

Survey Raking: An Illustration

December 25, 2018

Analysing survey data can be tricky. There’s often a mismatch between the characteristics of the survey respondents and those of the general population. If the discrepancies are not accounted for then the survey results can (and generally will!) be misleading. A common approach to this problem is to weight the individual survey responses so that the marginal proportions of...

Read more »

Statistical Assessments of AUC

December 25, 2018

In scorecard development, the area under the ROC curve, also known as AUC, has been widely used to measure the performance of a risk scorecard. Everything else being equal, the scorecard with a higher AUC is considered more predictive than the one with a lower AUC. However, little attention has been paid to the statistical

Read more »

Rolling Origins and Fama French

December 25, 2018

Today, we continue our work on sampling so that we can run models on subsets of our data and then test the accuracy of the models on data not included in those subsets. In the machine learning prediction world, these two data sets are often called training data and testing data, but we’re not going to do any machine...

Read more »

R Package modopt.matlab – MatLab-style matrix-based optimization modeling in R

December 25, 2018

Besides Deep Learning (in the realm of Data Science and AI), there is another scientific and applied area where people always seem to prefer Python over R: Optimization (in the realm of Decision Science). Luckily, there are two very good optimization modeling frameworks available for R, namely CVXR and ompr. If you require a quick...

Read more »

Optimism corrected bootstrapping: a problematic method

December 25, 2018

There are lots of ways to assess how predictive a model is while correcting for overfitting. In Caret the main methods I use are leave-one-out cross validation, for when we have relatively few samples, and k-fold cross validation when we have more. There is also another method called ‘optimism corrected bootstrapping’, that

Read more »

Pivot Billions and Deep Learning enhanced trading models achieve 100% net profit

December 24, 2018

Deep Learning has revolutionized the fields of image classification, personal assistance, competitive board game play, and many more. However, the financial currency markets have been surprisingly stagnant. In our efforts to create a profitable and accurate trading model, we came upon the question: what if financial currency data could be represented as an image?

Read more »

Dreaming of a white Christmas – with ggmap in R

December 24, 2018

With the holidays approaching, one of the most discussed questions at STATWORX was whether we’ll have a white Christmas or not. And what better way to get our hopes up, than by taking a look at the DWD Climate Data Center’s historic data on the snow depth on the past ten Christmas Eves? But how to best visualize spatial...

Read more »

Day 24 – big helper helfRlein

December 24, 2018

In the last 23 days I presented one function each day from the helfRlein package we created here at STATWORX. I hope you found some of the functions useful and had some fun discovering new ways of doing things with R! Since today is Christmas, only one thing remains to say: To see all functions you can either check...

Read more »

The Need for Speed Part 2: C++ vs. Fortran vs. C

December 23, 2018

Searching for Speed. In my previous post, I described the method I use for compiling Fortran (or C) into an R package using the .Call interface. This post will compare the speed of various implementations of the layer loss cost function. The Function. Often, insurance or reinsurance is bought in stratified horizontal layers. For example, ...

Read more »

R 101

December 23, 2018

HAPPY HOLIDAYS!!!🎉⛄🎆🍾❄ In the spirit of the coming new year and new beginnings, we created a tutorial for getting started or restarted with R. If you are new to R or have dabbled in R but haven’t used it much recently, then this post is for you. We will focus on data classes and types, as well as data wrangling,...

Read more »

Text classification with tidy data principles

December 23, 2018

I am an enthusiastic proponent of using tidy data principles for dealing with text data. This kind of approach offers a fluent and flexible option not just for exploratory data analysis, but also for machine learning for text, including both unsupervised machine learning and supervised machine learning. I haven’t written much about supervised machine learning for text, i.e. predictive modeling,...

Read more »
