## How to easily generate a perfectly normal distribution

April 23, 2019

Many times, for instance when teaching, I needed to quickly and simply generate a perfectly normally distributed sample to illustrate or show some of its characteristics. This is now very easy to do with the new bayestestR package, which includes the rnorm_perfect function. This function is very similar to the classic rnorm (same arguments), with the difference that the generated...
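The idea of a "perfectly" normal sample can be sketched in base R by evaluating the normal quantile function on an evenly spaced grid of probabilities. This is only an illustration of the concept: `perfect_normal()` is a hypothetical name, and the code makes no claim about how bayestestR implements `rnorm_perfect` internally.

```r
# Sketch only: a deterministic "perfect" normal sample built from evenly
# spaced quantiles. perfect_normal() is a hypothetical helper, not the
# bayestestR function.
perfect_normal <- function(n, mean = 0, sd = 1) {
  # ppoints(n) returns n evenly spaced probabilities strictly inside (0, 1)
  stats::qnorm(stats::ppoints(n), mean = mean, sd = sd)
}

x <- perfect_normal(1000)
```

Because the probabilities are symmetric around 0.5, the resulting values are symmetric around the requested mean, so `hist(x)` shows a smooth bell shape with no sampling noise.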

## conditionz: control how many times conditions are thrown

conditionz is a new (just on CRAN today) R package for controlling how many times conditions are thrown. This package arises from an annoyance in another set of packages I maintain: the brranching package uses the taxize package internally, calling its function taxize::tax_name(). The taxize::tax_name() function throws useful messages to the user if their API key is not found, and gives them...
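To make the problem concrete, here is a minimal base R sketch of limiting how often a message reaches the user. It does not use the conditionz API; `make_message_limiter()` and `noisy()` are hypothetical names invented for this illustration.

```r
# Hypothetical helper (not part of conditionz): returns a function that
# evaluates an expression and lets each distinct message through at most
# `times` times, muffling later repeats.
make_message_limiter <- function(times = 1) {
  seen <- new.env()  # message text -> number of times already seen
  function(expr) {
    withCallingHandlers(
      expr,
      message = function(m) {
        key <- conditionMessage(m)
        n <- if (exists(key, envir = seen, inherits = FALSE)) {
          get(key, envir = seen)
        } else {
          0
        }
        assign(key, n + 1, envir = seen)
        # Silence the message once it has already been shown `times` times
        if (n >= times) invokeRestart("muffleMessage")
      }
    )
  }
}

limiter <- make_message_limiter(times = 1)

# Stand-in for an inner function that complains on every call
noisy <- function() {
  message("API key not found")
  invisible(1)
}
```

Wrapping repeated calls as `limiter(noisy())` should show the message on the first call only, while the return value of `noisy()` still comes through each time.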

## Join, split, and compress PDF files with pdftools

Last month we released a new version of pdftools and a new companion package qpdf for working with pdf files in R. This release introduces the ability to perform pdf transformations, such as splitting and combining pages from multiple files. Moreover, the pdf_data() function which was introduced in pdftools 2.0 is now available on all major systems. Split and Join...

## 77th Tokyo.R Users Meetup Roundup!

April 23, 2019

As the sakura bloomed in Tokyo, another TokyoR User Meetup was held, this time at SONY City! On April 13th useRs from all over Tokyo (and some even from further afield) flocked to Osaki, Tokyo for a special session focused on be...

## Probability of winning a best-of-7 series

April 22, 2019

The NBA playoffs are in full swing! A total of 16 teams are competing in a playoff-format competition, with the winner of each best-of-7 series moving on to the next round. In each matchup, two teams play 7 basketball games …
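As a rough illustration of the calculation, the sketch below computes the chance of winning a best-of-7 series under the simplifying assumptions that games are independent and that one team wins each game with the same probability `p`. Both assumptions, and the name `series_win_prob()`, are mine rather than taken from the post.

```r
# Probability that a team wins a best-of-7 series, assuming (for
# illustration only) independent games and a constant per-game win
# probability p.
series_win_prob <- function(p, wins_needed = 4) {
  # The team takes the series if it loses at most wins_needed - 1 games
  # before collecting its final win: the negative binomial CDF evaluated
  # at wins_needed - 1 failures.
  stats::pnbinom(wins_needed - 1, size = wins_needed, prob = p)
}

probs <- series_win_prob(c(0.5, 0.6))
```

The same number can be obtained as `1 - pbinom(3, 7, p)`, the chance of winning at least 4 of 7 games if all 7 were always played.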

## #GOT Animating the Shifting of Affiliations

April 22, 2019

My first steps using the gganimate package using Game of Thrones data!

## Comparing Point-and-Click Front Ends for R

April 22, 2019

Now that I've completed seven detailed reviews of Graphical User Interfaces (GUIs) for R, let's try to compare them. It's easy enough to count their features and plot them, so let's start there.

## Le Monde puzzle [#1094]

April 22, 2019

A rather blah number Le Monde mathematical puzzle: Find all integer multiples of 11111 with exactly one occurrence of each decimal digit. Which I solved by brute force, by looking at the possible range of multiples (and borrowing stringr::str_count from Robin!): `combien=0; for (i in 90001:900008){ j=i*11111; combien=combien+(min(stringr::str_count(j,paste(0:9)))==1)}; combien` returns 3456. And

## Using R/exams for Written Exams in Finance Classes

April 22, 2019

Experiences with using R/exams for written exams in finance classes with a moderate number of students at Texas A&M International University (TAMIU). Guest post by Nathaniel P. Graham (Texas A&M International University, Division of International Banking and Finance Studies). Background While R/exams...

## Practical Data Science with R Book Update (April 2019)

April 22, 2019

I thought I would give a personal update on our book: Practical Data Science with R 2nd edition; Zumel, Mount; Manning 2019. The second edition should be fully available this fall! Nina and I have finished up through chapter 10 (of 12), and Manning has released previews of up through chapter 7 (with more to …

April 22, 2019

Today I am happy to announce a new free course: Help Your Team Learn R! Over the last few years I’ve helped a number of data teams train their analysts to use R. At each company there was a skilled R user who was leading the team’s effort to adopt R. Each of these internal The post Free Course:...

## India has 100k records on iNaturalist

April 21, 2019

Biodiversity citizen scientists use iNaturalist to post their observations with photographs. The observations are then curated there by crowd-sourcing the identifications and other trait-related aspects too. The data, once converted to “research grade”, is passed on to GBIF as occurrence records. Exciting news from India in the 3rd week of April 2019 is: Another important

## Embedding subplots in ggplot2 graphics

The idea of embedded plots for visualizing a large dataset that has an overplotting problem recently came up in some discussions with students. I first learned about embedded graphics from package ggsubplot. You can still see an old post about that package and about embedded graphics in general, with examples. However, ggsubplot is no longer maintained and doesn’t work...

## Reproducible Environments

April 21, 2019

Great data science work should be reproducible. The ability to repeat experiments is part of the foundation for all science, and reproducible work is also critical for business applications. Team collaboration, project validation, and sustainable products presuppose the ability to reproduce work over time. In my opinion, mastering just a handful of important tools will make reproducible work in R much easier for data...

## survivalists [a Riddler’s riddle]

April 21, 2019

A neat question from The Riddler on a multi-probability survival rate: Nine processes are running in a loop with fixed survival rates .99, …, .91. What is the probability that the first process is the last one to die? Same question with probabilities .91, …, .99 and the probability that the last process is the last one to die.

## Binning with Weights

April 21, 2019

After working on the MOB package, I received requests from multiple users asking whether I could write a binning function that takes the weighting scheme into consideration. It is a legitimate request from a practical standpoint. For instance, in the development of fraud detection models, we often would sample down non-fraud cases given an extremely low

## Familiarisation with the Australian Election Study by @ellis2013nz

April 21, 2019

The Australian Election Study is an impressive long term research project that has collected the attitudes and behaviours of a sample of individual voters after each Australian federal election since 1987. All the datasets and documentation are freely ...

## FizzBuzz in R and Python

April 21, 2019

In this post, we will solve a simple problem (called "FizzBuzz") that is asked by some employers in data scientist job interviews. The question seeks to ascertain the applicant's familiarity with basic programming concepts. We will see 2 different ways to solve the problem in 2 different statistical programming languages: R and Python. The FizzBuzz Question I came across the FizzBuzz...
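For readers who have not met the problem, here is one possible R solution. It assumes the usual statement of FizzBuzz (multiples of 3 become "Fizz", multiples of 5 become "Buzz", multiples of both become "FizzBuzz"), since the excerpt is cut off before the post's own wording; it is a sketch, not necessarily either of the post's two solutions.

```r
# Vectorised FizzBuzz, assuming the usual rules:
# multiples of 3 -> "Fizz", multiples of 5 -> "Buzz", multiples of both -> "FizzBuzz".
fizzbuzz <- function(n) {
  i <- seq_len(n)
  out <- as.character(i)
  out[i %% 3 == 0] <- "Fizz"
  out[i %% 5 == 0] <- "Buzz"
  out[i %% 15 == 0] <- "FizzBuzz"  # applied last so it overrides the two rules above
  out
}

result <- fizzbuzz(15)
```

The order of the three assignments matters: the combined case is written last so that it replaces the "Fizz" or "Buzz" already placed at multiples of 15.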

## Using data.table with magrittr pipes: best of both worlds

April 20, 2019

Should we use magrittr pipes with data.table? Why ask the question? If you are fairly new to R, you might find it puzzling / intriguing that R questions on Stack Overflow tend to attract a range of solutions which all have different syntax “styles”, but almost all seem to be valid answers to some extent (as indicated by the...

## Process Mining (Part 2/3): More on bupaR package

April 20, 2019

Recap In the last post, the discipline of event log and process mining were defined. The bupaR package was introduced as a technique to do process mining in R. Objectives for This Post Visualize workflow Understand the concept of activity reoccurrences We will use a pre-loaded dataset sepsis from the bupaR package. This event log is based on real life management of sepsis from...

## Before you take my DataCamp course please consider this info

April 20, 2019

Today, I am finally getting around to writing this very sad blog post: Before you take my DataCamp course please consider the following information about the sexual harassment scandal surrounding DataCamp! As many of my fellow instructors and community...

## Batch Deployment of WoE Transformations

April 20, 2019

After wrapping up the function batch_woe() today, whose purpose is to let users apply WoE transformations to many independent variables simultaneously, I have completed the development of the major functions in the MOB package that are usable for model development in a production setting. The function batch_woe() is basically a wrapper around cal_woe()
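The general idea of applying a WoE calculation across several variables at once can be sketched in base R as follows. This is not the MOB package's code: `woe_by_bin()` and `batch_woe_sketch()` are hypothetical helpers, the inputs are assumed to be already binned, and the sign convention used here (log of the share of bads over the share of goods) is an assumption that may differ from the package's.

```r
# Hypothetical sketch, not the MOB implementation.
# WoE per bin, under the assumed convention
#   WoE = log( (bads in bin / all bads) / (goods in bin / all goods) ).
# Bins with zero goods or zero bads give infinite values; not handled here.
woe_by_bin <- function(bin, bad) {
  bads  <- tapply(bad, bin, sum)      # bad cases per bin
  goods <- tapply(1 - bad, bin, sum)  # good cases per bin
  woe <- log((bads / sum(bads)) / (goods / sum(goods)))
  setNames(as.numeric(woe), names(woe))
}

# Apply the same calculation to every (already binned) column of a data frame
batch_woe_sketch <- function(df, bad) lapply(df, woe_by_bin, bad = bad)

# Made-up toy data for illustration
binned <- data.frame(
  age_bin    = c("a", "a", "a", "b", "b", "b"),
  income_bin = c("lo", "hi", "hi", "lo", "hi", "lo")
)
bad <- c(1, 0, 0, 1, 1, 0)

woe_tables <- batch_woe_sketch(binned, bad)
```

The result is a named list with one WoE lookup vector per variable, which could then be used to recode each column.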

## modelplotr vignette

April 20, 2019

Why ROC curves are a bad idea to explain your model to business people The modelplotr package makes it easy to create a number of valuable evaluation plots to assess the business value of a predictive model. Using these plots, it can be shown how implementation of the model will impact business targets like response or return on investment of...

## Styling DataTables

April 19, 2019

Most shiny apps have tables as the primary component. Now let's say you want to prettify your app and style the tables. All you need is to understand how tables are built using HTML. This is what the default datatable looks like in the app. In order to build the html table I have used a function table_frame which...

## Quick Example of Latent Profile Analysis in R

April 19, 2019

Latent Profile Analysis (LPA) tries to identify clusters of individuals (i.e., latent profiles) based on responses to a series of continuous variables (i.e., indicators). LPA assumes that there are unobserved latent profiles that generate patterns of responses on indicator items. Here, I will go through a quick example of LPA to identify groups of people based on their interests/hobbies. The...

## Control Charts Another Package

April 19, 2019

I got an email from Alex Zanidean, who runs the xmrr package: “You might enjoy my package xmrr for similar charts – but mine recalculate the bounds automatically”. And if we go to the vignette: “XMRs combine X-Bar control charts and Moving Range control charts. These functions also will recalculate the reference lines when significant change has occurred.” This seems...
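For orientation, the basic XmR reference lines can be sketched in base R using the conventional constants for individuals charts (2.66 for the X limits and 3.268 for the moving range limit). This does not use the xmrr package and does not recalculate the bounds after a shift; `xmr_limits()` and the data are made up for illustration.

```r
# Hypothetical helper, not the xmrr API: fixed XmR reference lines
# using the conventional constants 2.66 and 3.268.
xmr_limits <- function(x) {
  mr <- abs(diff(x))   # moving ranges between consecutive points
  centre <- mean(x)    # centre line of the X chart
  mr_bar <- mean(mr)   # average moving range
  list(
    centre   = centre,
    x_lower  = centre - 2.66 * mr_bar,
    x_upper  = centre + 2.66 * mr_bar,
    mr_upper = 3.268 * mr_bar
  )
}

# Made-up measurements for illustration
lims <- xmr_limits(c(10, 12, 11, 13, 12))
```

Points outside `x_lower` and `x_upper`, or moving ranges above `mr_upper`, would be flagged as signals on the two charts.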

## Happy EasteR! Let’s find some eggs…

April 19, 2019

It's Easter Time! Let's find some eggs... Hi there! Yes, it's the most Easterful time of the year again. For some of us a sacred time, for others mainly an egg-eating period and some just enjoy the extra day of spare time. In case you have some time available for some good egg searching business, but no-one seems willing to...

## ODSC East 2019 Talks to Expand and Apply R Skills

R programmers are not necessarily data scientists, but rather software engineers. We have an entirely new multitrack focus area that helps engineers learn AI skills – AI for Engineers. This focus area is designed specifically to help programmers get familiar with AI-driven software that utilizes deep learning and machine learning models to enable conversational AI, …