## Brief mention in Map-making on a budget

June 5, 2018

Some of the R code I have previously posted here for working with NOAA’s optimal interpolated sea surface temperature (OISST) datasets made its way into a recent Nature news and commentary piece on open-source map-making tools by Jeffrey Perkel. The article details the expansion of open-source tools for visualizing spatial data that...

## Simple audio classification with Keras

June 5, 2018

In this tutorial we will build a deep learning model to classify words. We will use tfdatasets to handle data IO and pre-processing, and Keras to build and train the model. We will use the Speech Commands dataset, which consists of 65,000 one-second audio files of people saying 30 different words. Each file contains a single spoken English word. The...

## Unconf18 projects 2: middlechild, defender, ropsec, keybase

As part of our series summarizing all projects from this year’s unconf, I’m excited to dive into all the security-related offerings from this year. In the spirit of exploration and experimentation at rOpenSci unconference...

## CHAID and caret – a good combo – June 6, 2018

June 5, 2018

In an earlier post I focused on an in-depth visit with CHAID (Chi-square automatic interaction detection). There are lots of tools that can help you predict an outcome, or classify, but CHAID is especially good at helping you explain to any audience how the model arrives at its prediction or classification. It’s also incredibly robust from a statistical...
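The chi-square criterion that gives CHAID its name can be sketched in a few lines of base R. This is a simplified illustration, not the post's code: it only ranks candidate predictors by the p-value of a chi-square test of independence with the outcome, whereas full CHAID also merges similar categories and uses Bonferroni-adjusted p-values. The `mtcars` columns and the helper `chaid_split_pvalues()` are hypothetical choices for the example.

```r
# Categorical outcome (am) and candidate predictors, built from mtcars for illustration
df <- data.frame(
  am   = factor(mtcars$am),
  cyl  = factor(mtcars$cyl),
  gear = factor(mtcars$gear),
  vs   = factor(mtcars$vs)
)

# Hypothetical helper: chi-square test p-value of each predictor against the outcome
chaid_split_pvalues <- function(data, outcome, predictors) {
  vapply(predictors, function(v) {
    tab <- table(data[[v]], data[[outcome]])  # contingency table: predictor x outcome
    # small cell counts make chisq.test() warn that the approximation may be inaccurate
    unname(suppressWarnings(chisq.test(tab)$p.value))
  }, numeric(1))
}

p <- chaid_split_pvalues(df, "am", c("cyl", "gear", "vs"))
p
names(which.min(p))  # the predictor this simplified rule would split on first
```

The same ranking would then be repeated inside each resulting node, which is how the tree grows.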

## Statistics “Sunday”: More Sentiment Analysis Resources

June 5, 2018

I've just returned from a business trip: lots of long days, sometimes working from 8 am to 9 pm or 10 pm. I didn't get a chance to write this week's Statistics Sunday post, in part because I wasn't entirely certain what to write about. But as I starte...

## Classification from scratch, neural nets 6/8

June 5, 2018

Sixth post of our series on classification from scratch. The previous one was on lasso regression, which was still based on a logistic regression model, assuming that the variable of interest has a Bernoulli distribution. From now on, we will discuss techniques that did not originate from those probabilistic models, even if they might still have a probabilistic...

## Monte Carlo

June 4, 2018

Today, we change gears from our previous work on Fama French and run a Monte Carlo (MC) simulation of future portfolio returns. Monte Carlo relies on repeated, random sampling. We will sample based on two parameters: mean and standard deviation of portfolio returns. Our long-term goal (long-term == over the next two or three blog posts) is to build...
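A minimal sketch of the idea in base R, assuming normally distributed monthly returns. The mean and standard deviation below are made-up placeholders; in the post they come from the portfolio's returns. Each simulated path compounds the drawn returns into the growth of one dollar.

```r
set.seed(123)  # reproducible draws

# Placeholder parameters (assumptions, not values from the post)
mean_ret <- 0.005  # mean monthly portfolio return
sd_ret   <- 0.04   # standard deviation of monthly returns
n_months <- 120    # simulate ten years of monthly returns
n_sims   <- 51     # number of simulated paths

# Draw n returns from a normal distribution and compound them into the growth of $1
simulate_growth <- function(n, mu, sigma) {
  cumprod(1 + rnorm(n, mean = mu, sd = sigma))
}

# Each column of `sims` is one simulated path (n_months rows x n_sims columns)
sims <- replicate(n_sims, simulate_growth(n_months, mean_ret, sd_ret))

final_values <- sims[n_months, ]  # value of $1 at the end of each path
summary(final_values)
```

Plotting the columns of `sims` with `matplot()` gives the familiar fan of simulated growth paths.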

## Unconf18 projects 1: mchtoolbox, pkginspector, dataspice, rOpenSciEd, rOpenInterviews

After Stefanie’s recap of unconf18, this week the blog will feature brief summaries of projects developed at the event: each day 4 to 5 projects will be highlighted. In the following weeks, a handful of groups will share more thorough posts about their work. In the spirit of exploration and experimentation at rOpenSci unconferences, these projects are not necessarily finished...

## Exploring, experimenting, and building software and trust at rOpenSci’s unconf18

We held our 5th annual unconference in Seattle, May 21-22, 2018 at Microsoft’s Reactor space. Researchers, students, postdocs and faculty, R software users and developers, and open data enthusiasts from academia, industry, government, and non-profits came together for two days to hack on projects they dreamed up and for an opportunity to meet and work together in person. We...

## epubr 0.4.0 CRAN release

The epubr package provides functions supporting the reading and parsing of internal e-book content from EPUB files. E-book metadata and text content are parsed separately and joined together in a tidy, nested tibble data frame. E-book formatting is non...

## Animating Changes in Football Kits using R

June 4, 2018

I am enjoying the magick package at the moment. Reading through the vignette, I spotted the image_morph() function. In this post I experiment with the function to build the GIF below that shows the changes in the England football first kit over time, using images from the excellent Historical Football Kits website. The Historical Football Kits website has a detailed...

## Classification from scratch, penalized Lasso logistic 5/8

June 4, 2018

Fifth post of our series on classification from scratch. Following the previous post on penalization using the $\ell_2$ norm (the so-called Ridge regression), this time we will discuss penalization based on the $\ell_1$ norm (the so-called Lasso regression). First of all, one should admit that if the name stands for least absolute shrinkage and selection operator, that’s actually a very cool name…...
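For reference, a standard way to write the Lasso-penalized logistic regression (our notation, which may differ from the post's): with $p_i$ the predicted probability for observation $i$ and the intercept $\beta_0$ left unpenalized, the estimator minimizes the negative log-likelihood plus an $\ell_1$ penalty weighted by $\lambda \ge 0$.

$$
(\hat\beta_0, \hat{\boldsymbol\beta})_\lambda = \underset{\beta_0,\,\boldsymbol\beta}{\operatorname{argmin}} \left\{ -\sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \Big] + \lambda \sum_{j=1}^{k} |\beta_j| \right\},
\qquad
p_i = \frac{\exp(\beta_0 + \mathbf{x}_i^\top \boldsymbol\beta)}{1 + \exp(\beta_0 + \mathbf{x}_i^\top \boldsymbol\beta)}
$$

Because $|\beta_j|$ is not differentiable at zero, a large enough $\lambda$ sets some coefficients exactly to zero, which is the "selection" part of the name; the Ridge version replaces $|\beta_j|$ with $\beta_j^2$.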

## Interactive RTutor Problemsets via RStudio Cloud

I just learned about RStudio Cloud (see https://rstudio.cloud/), which lets you run RStudio instances from your browser. Moreover, you can set up RStudio projects that other users can copy and use themselves. RStudio Cloud is still in alpha, and currently anyone can register to use it for free. I have set up a project that allows you to simply...

## Hello, Dorling! (Creating Dorling Cartograms from R Spatial Objects + Introducing Prism Skeleton)

June 3, 2018

NOTE: There is some iframed content in this post and you can bust out of it if you want to see the document in a full browser window. Also, apologies for some lingering GitHub links. I’m waiting for all the repos to import into other services and haven’t had time to set up my own...

## The First Date with your Data in R

June 3, 2018

So you have your data, now what? With a little R code, you can quickly get to... The post The First Date with your Data in R appeared first on ProgrammingR.

June 3, 2018

Preamble

## New R package xplain: Providing interactive interpretations and explanations of statistical results

June 3, 2018

The package xplain is designed to help users interpret the results of their statistical analyses. It does so not in the abstract way textbooks do: textbooks do not help users of a statistical method understand their findings directly. What does...

## Different demand functions and optimal price estimation in R

June 3, 2018
By Yuri Fonseca

In the previous post about pricing optimization, we discussed linear demand and how to estimate optimal prices in that case. In this post we are going to compare three different…
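As a reminder of the linear case the post builds on, here is a minimal sketch assuming a linear demand curve $q(p) = a - bp$ and a constant unit cost $c$; the values of `a`, `b` and `unit_cost` are made up. Profit $(p - c)(a - bp)$ is maximized at $p^* = (a + bc)/(2b)$, and `optimize()` recovers the same price numerically, an approach that also works for demand curves without a closed-form optimum.

```r
# Hypothetical linear demand q(p) = a - b * p (parameters are made up)
a <- 100
b <- 2
unit_cost <- 10

demand <- function(p) pmax(a - b * p, 0)  # quantity sold at price p, never negative
profit <- function(p) (p - unit_cost) * demand(p)

# Numerical search for the profit-maximising price between 0 and the choke price a / b
opt <- optimize(profit, interval = c(0, a / b), maximum = TRUE)
opt$maximum

# Closed-form optimum for linear demand: (a + b * c) / (2 * b)
(a + b * unit_cost) / (2 * b)  # 30 for these values
```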

June 3, 2018

rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast implementation of the rquery system in-memory using the data.table package. rquery is already one of the fastest and most teachable (due …

## Fancy Plot (with Posterior Samples) for Bayesian Regressions

June 2, 2018

As Bayesian models usually generate a lot of samples (iterations), one may want to plot them as well, instead of (or alongside) the posterior “summary” (with indices like the 90% HDI). This can be done quite easily by extracting all the iterations in `get_predicted()` from the psycho package. `# devtools::install_github("neuropsychology/psycho.R") # Install the last...`

## Even Simpler SQL

June 2, 2018

I’ve had some feedback on the last post, and rather than repeat the same thing multiple times, I’m going all @drob and writing this instead. When I tweeted out the link to my post I gave it the tag line “why I’d rather write dplyr than SQL”. What I couldn’t fit into the tweet was that this was based on...

## Classification from scratch, penalized Ridge logistic 4/8

June 2, 2018

Fourth post of our series on classification from scratch, following the previous post, which was a sort of detour on kernels. Today, we get back to the logistic model. We’ve seen before that the classical technique used to estimate the parameters of a parametric model is the maximum likelihood approach...
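The Ridge-penalized likelihood can be sketched in base R by handing the penalized negative log-likelihood and its gradient to `optim()`. This is an illustration under assumptions (simulated data, the hypothetical helper `ridge_logit()`, an unpenalized intercept), not the post's own code. With `lambda = 0` it reduces to plain maximum likelihood, so the estimates should match those of `glm()`.

```r
set.seed(1)
n <- 200
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))  # simulated covariates (hypothetical data)
eta_true <- 0.5 + 1.5 * X[, 1] - 1 * X[, 2]  # assumed "true" linear predictor
y <- rbinom(n, size = 1, prob = plogis(eta_true))

# Hypothetical helper: Ridge-penalised logistic regression by direct optimisation
ridge_logit <- function(X, y, lambda) {
  X1 <- cbind(1, X)                      # design matrix with an intercept column
  mask <- c(0, rep(1, ncol(X)))          # the intercept is not penalised
  obj <- function(beta) {                # penalised negative log-likelihood
    eta <- drop(X1 %*% beta)
    -sum(y * eta - log1p(exp(eta))) + lambda * sum(mask * beta^2)
  }
  grad <- function(beta) {               # its gradient: -X'(y - p) + 2 * lambda * beta
    p <- plogis(drop(X1 %*% beta))
    -drop(crossprod(X1, y - p)) + 2 * lambda * mask * beta
  }
  optim(rep(0, ncol(X1)), obj, grad, method = "BFGS")$par
}

ridge_logit(X, y, lambda = 0)   # unpenalised: plain maximum likelihood
ridge_logit(X, y, lambda = 10)  # slopes shrunk towards zero
```

Increasing `lambda` shrinks the slope estimates further towards zero, trading some bias for lower variance.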

## How To Plot With Patchwork: Exercises

June 2, 2018

The goal of patchwork is to make it simple to combine separate ggplots into the same graphic. It tries to solve the same problem as gridExtra::grid.arrange() and cowplot::plot_grid(), but using an API that incites exploration and iteration. Before proceeding, please follow our short tutorial. Look at the examples given and try to understand the...

## Ceteris Paribus Plots – a new DALEX companion

June 1, 2018

If you like magical incantations in Data Science, please welcome the Ceteris Paribus Plots. Otherwise, feel free to call them What-If Plots. Ceteris Paribus (Latin for “all else unchanged”) Plots explain complex Machine Learning models around a single observation. They supplement tools like breakDown, Shapley values, LIME or LIVE. In addition to feature importance/feature attribution, …
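The idea behind a Ceteris Paribus (What-If) profile can be sketched in base R: take one observation, vary a single variable over a grid while keeping every other variable at its observed value, and record the model's predictions. The linear model on `mtcars` and the helper `ceteris_paribus()` are illustrative assumptions, not the API of the package introduced in the post.

```r
# Illustrative model on a built-in dataset (not from the post)
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper: vary one variable for a single observation, all else unchanged
ceteris_paribus <- function(model, obs, variable, grid) {
  profile <- obs[rep(1, length(grid)), , drop = FALSE]  # one copy of the observation per grid value
  profile[[variable]] <- grid                            # change only the chosen variable
  data.frame(value = grid, prediction = unname(predict(model, newdata = profile)))
}

obs  <- mtcars["Mazda RX4", ]
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
cp   <- ceteris_paribus(model, obs, "wt", grid)

# The What-If plot: predicted mpg as wt changes, with the observed value marked
plot(cp$value, cp$prediction, type = "l", xlab = "wt", ylab = "predicted mpg")
points(obs$wt, predict(model, newdata = obs), pch = 19)
```

Because this model is linear, the profile is a straight line whose slope is the `wt` coefficient; for a random forest or boosted model the same function would trace a curve, which is where these plots become informative.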

## StatCheck the Game

June 1, 2018

If you don't get enough joy from publishing scientific papers in your day job, or simply want to experience what it's like to be in a publish-or-perish environment where the P-value is the only important part of a paper, you might want to try StatCheck: the board game where the object is to publish two papers before any of...

## My book ‘Practical Machine Learning in R and Python: Second edition’ on Amazon

June 1, 2018

The second edition of my book ‘Practical Machine Learning with R and Python – Machine Learning in stereo’ is now available in both paperback (\$10.99) and Kindle (\$7.99/Rs449) versions. This second edition includes more content, extensive comments and formatting for better readability. In this book I implement some of the most common, but important Machine …

## Praise you like I should: Shiny Appreciation Month

June 1, 2018
By Aimée Gott, Education Practice Lead

Back in the summer of 2012 I was meant to be focusing on one thing: finishing my thesis. But, unfortunately for me, a friend and former colleague came back from a conference (JSM) and told me all about a new package that she had seen demoed. "You should sign up for the beta testing and...

## Coloring Sudokus

June 1, 2018

Someday you will find me caught beneath the landslide (Champagne Supernova, Oasis)

I recently read a book called Snowflake Seashell Star: Colouring Adventures in Numberland by Alex Bellos and Edmund Harris, which is full of mathematical patterns to be coloured. All images are truly appealing and attract anyone who looks at them, independently of …

## simpler SQL with dplyr

May 31, 2018

Comparing dplyr with SQL nested queries. Following on from my last post, where I demonstrated R to some first-time R users, I want to do a wee comparison of dplyr vs. SQL, so that folks, particularly those in the NHS who might be R-curious, can see just what the fuss is about. To do so...