How to easily generate a perfectly normal distribution

April 23, 2019

Many times, for instance when teaching, I needed to quickly and simply generate a perfectly normally distributed sample to illustrate or show some of its characteristics. This is now very easy to do with the new bayestestR package, which includes the rnorm_perfect function. This function is very similar to the classic rnorm (same arguments), with the difference that the generated...
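
One way to get such a sample in base R is to evaluate the normal quantile function on an evenly spaced probability grid. The following is a minimal sketch and not necessarily how rnorm_perfect is implemented; the helper name perfect_normal and the use of ppoints() are assumptions made for illustration.

```r
# Hypothetical helper (not part of bayestestR): a deterministic,
# "perfectly" normal sample built from evenly spaced quantiles.
perfect_normal <- function(n, mean = 0, sd = 1) {
  probs <- stats::ppoints(n)  # n probabilities strictly inside (0, 1)
  stats::qnorm(probs, mean = mean, sd = sd)
}

x <- perfect_normal(1000, mean = 10, sd = 2)
c(mean = mean(x), median = median(x))  # both sit at 10, up to rounding error
```

Unlike rnorm(), the result is sorted and identical on every call, and its sample standard deviation is close to, but not exactly, the requested sd.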

conditionz: control how many times conditions are thrown

conditionz is a new (just on CRAN today) R package for controlling how many times conditions are thrown. This package arises from an annoyance in another set of packages I maintain: the brranching package uses the taxize package internally, calling its function taxize::tax_name(). The taxize::tax_name() function throws useful messages to the user if their API key is not found, and gives them...
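
The general idea of limiting repeated messages can be sketched in base R with a closure that remembers what it has already emitted. This is an illustration only and does not use or reproduce the conditionz API; make_once and notify_once are hypothetical names.

```r
# Hypothetical helper: returns a function that emits each distinct
# message at most once, however many times it is called.
make_once <- function() {
  seen <- character()
  function(msg) {
    if (!(msg %in% seen)) {
      seen <<- c(seen, msg)  # remember the message in the enclosing environment
      message(msg)
    }
    invisible(NULL)
  }
}

notify_once <- make_once()
for (i in 1:3) notify_once("API key not found")  # emitted on the first call only
```

Keeping the state inside the closure means each caller can create its own notifier with an independent history.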

Join, split, and compress PDF files with pdftools

Last month we released a new version of pdftools and a new companion package qpdf for working with pdf files in R. This release introduces the ability to perform pdf transformations, such as splitting and combining pages from multiple files. Moreover, the pdf_data() function which was introduced in pdftools 2.0 is now available on all major systems. Split and Join...

77th Tokyo.R Users Meetup Roundup!

April 23, 2019

As the sakura bloomed in Tokyo, another TokyoR User Meetup was held, this time at SONY City! On April 13th useRs from all over Tokyo (and some even from further afield) flocked to Osaki, Tokyo for a special session focused on be...

Probability of winning a best-of-7 series

April 22, 2019

The NBA playoffs are in full swing! A total of 16 teams are competing in a playoff-format competition, with the winner of each best-of-7 series moving on to the next round. In each matchup, two teams play 7 basketball games …
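
As a rough illustration of the calculation the title refers to, the sketch below computes the chance of winning a best-of-7 series under two simplifying assumptions that are not taken from the post: games are independent, and the team wins each game with the same probability p. The function name series_win_prob is hypothetical.

```r
# Assumes independent games and a constant per-game win probability p.
# The team wins the series on its 4th win, having lost 0, 1, 2 or 3 games.
series_win_prob <- function(p, wins_needed = 4) {
  losses <- 0:(wins_needed - 1)
  sum(choose(wins_needed - 1 + losses, losses) * p^wins_needed * (1 - p)^losses)
}

series_win_prob(0.5)  # 0.5, by symmetry
series_win_prob(0.6)  # 0.710208
```

The same value can be obtained as pbinom(3, 7, p, lower.tail = FALSE), since winning the series is equivalent to winning at least 4 of 7 games if all 7 were played.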

#GOT Animating the Shifting of Affiliations

April 22, 2019

My first steps with the gganimate package, using Game of Thrones data!

Comparing Point-and-Click Front Ends for R

April 22, 2019

Now that I've completed seven detailed reviews of Graphical User Interfaces (GUIs) for R, let's try to compare them. It's easy enough to count their features and plot them, so let's start there.

Le Monde puzzle [#1094]

April 22, 2019

A rather blah number Le Monde mathematical puzzle: Find all integer multiples of 11111 with exactly one occurrence of each decimal digit. Which I solved by brute force, by looking at the possible range of multiples (and borrowing stringr::str_count from Robin!):

    combien=0
    for (i in 90001:900008){
      j=i*11111
      combien=combien+(min(stringr::str_count(j,paste(0:9)))==1)}
    combien  # 3456

And
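
The same brute-force count can be sketched without stringr by extracting digits arithmetically. This is an alternative illustration, not the author's code; it relies on the fact that every product in this range has exactly ten digits, so "each digit exactly once" can be checked position by position.

```r
k <- 90001:900008  # range of multipliers used in the post
m <- k * 11111     # doubles; exact for integers of this size

# One column per decimal position (units, tens, ...), one row per multiple
digits <- sapply(0:9, function(pos) (m %/% 10^pos) %% 10)

# A multiple qualifies when every digit 0..9 occurs exactly once in its row
pandigital <- rep(TRUE, length(m))
for (v in 0:9) pandigital <- pandigital & (rowSums(digits == v) == 1)

sum(pandigital)
```

This should reproduce the count of 3456 reported in the post, and it runs as a handful of vectorised operations rather than a loop over all multiples.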

Using R/exams for Written Exams in Finance Classes

April 22, 2019

Experiences with using R/exams for written exams in finance classes with a moderate number of students at Texas A&M International University (TAMIU). Guest post by Nathaniel P. Graham (Texas A&M International University, Division of International Banking and Finance Studies). Background: While R/exams...

Practical Data Science with R Book Update (April 2019)

April 22, 2019

I thought I would give a personal update on our book: Practical Data Science with R 2nd edition; Zumel, Mount; Manning 2019. The second edition should be fully available this fall! Nina and I have finished up through chapter 10 (of 12), and Manning has released previews of up through chapter 7 (with more to …

Free Course: Help Your Team Learn R!

April 22, 2019

Today I am happy to announce a new free course: Help Your Team Learn R! Over the last few years I’ve helped a number of data teams train their analysts to use R. At each company there was a skilled R user who was leading the team’s effort to adopt R. Each of these internal...

India has 100k records on iNaturalist

April 21, 2019

Biodiversity citizen scientists use iNaturalist to post their observations with photographs. The observations are then curated there by crowd-sourcing the identifications and other trait-related aspects. The data, once converted to “research grade”, is passed on to GBIF as occurrence records. Exciting news from India in the 3rd week of April 2019 is: Another important

Embedding subplots in ggplot2 graphics

The idea of embedded plots for visualizing a large dataset that has an overplotting problem recently came up in some discussions with students. I first learned about embedded graphics from package ggsubplot. You can still see an old post about that package and about embedded graphics in general, with examples. However, ggsubplot is no longer maintained and doesn’t work...

Reproducible Environments

April 21, 2019

Great data science work should be reproducible. The ability to repeat experiments is part of the foundation for all science, and reproducible work is also critical for business applications. Team collaboration, project validation, and sustainable products presuppose the ability to reproduce work over time. In my opinion, mastering just a handful of important tools will make reproducible work in R much easier for data...

survivalists [a Riddler’s riddle]

April 21, 2019

A neat question from The Riddler on a multi-probability survival rate: Nine processes are running in a loop with fixed survival rates .99,…,.91. What is the probability that the first process is the last one to die? Same question with probabilities .91,…,.99 and the probability that the last process is the last one to die.

Binning with Weights

April 21, 2019

After working on the MOB package, I received requests from multiple users asking whether I could write a binning function that takes the weighting scheme into consideration. It is a legitimate request from a practical standpoint. For instance, in the development of fraud detection models, we often would sample down non-fraud cases given an extremely low

Familiarisation with the Australian Election Study by @ellis2013nz

April 21, 2019

The Australian Election Study is an impressive long term research project that has collected the attitudes and behaviours of a sample of individual voters after each Australian federal election since 1987. All the datasets and documentation are freely ...

FizzBuzz in R and Python

April 21, 2019

In this post, we will solve a simple problem (called "FizzBuzz") that is asked by some employers in data scientist job interviews. The question seeks to ascertain the applicant's familiarity with basic programming concepts. We will see 2 different ways to solve the problem in 2 different statistical programming languages: R and Python. The FizzBuzz Question I came across the FizzBuzz...
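
For readers who have not met the problem, here is a minimal R sketch using the usual rules (multiples of 3 become "Fizz", multiples of 5 become "Buzz", multiples of both become "FizzBuzz"). It is not taken from the post, and the function name fizzbuzz is chosen here for illustration.

```r
# Vectorised FizzBuzz: start from the numbers as text, then overwrite
# the positions that are multiples of 3, 5, and finally 15.
fizzbuzz <- function(n) {
  i <- seq_len(n)
  out <- as.character(i)
  out[i %% 3 == 0] <- "Fizz"
  out[i %% 5 == 0] <- "Buzz"
  out[i %% 15 == 0] <- "FizzBuzz"  # applied last so it wins over Fizz and Buzz
  out
}

fizzbuzz(15)
```

Overwriting in this order avoids nested if/else logic, which is one reason a vectorised answer tends to read more naturally in R than a loop.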

Using data.table with magrittr pipes: best of both worlds

April 20, 2019

Should we use magrittr pipes with data.table? Why ask the question? If you are fairly new to R, you might find it puzzling / intriguing that R questions on Stack Overflow tend to attract a range of solutions which all have different syntax “styles”, but almost all seem to be valid answers to some extent (as indicated by the...

Process Mining (Part 2/3): More on bupaR package

April 20, 2019

Recap: In the last post, the disciplines of event log and process mining were defined. The bupaR package was introduced as a tool for process mining in R. Objectives for this post: visualize workflow, and understand the concept of activity reoccurrences. We will use a pre-loaded dataset sepsis from the bupaR package. This event log is based on real life management of sepsis from...

Before you take my DataCamp course please consider this info

April 20, 2019

Today, I am finally getting around to writing this very sad blog post: Before you take my DataCamp course please consider the following information about the sexual harassment scandal surrounding DataCamp! As many of my fellow instructors and community...

Batch Deployment of WoE Transformations

April 20, 2019

After wrapping up the function batch_woe() today, which allows users to apply WoE transformations to many independent variables simultaneously, I have completed the development of the major functions in the MOB package that are usable for model development in a production setting. The function batch_woe() is basically a wrapper around cal_woe()
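
To make the "wrapper around a single-variable function" idea concrete, here is a base R sketch of a Weight of Evidence calculation applied to several binned variables at once. It does not reproduce the MOB package: woe_one and woe_batch are hypothetical names, the toy data are invented, and the sign convention (log of the share of bads over the share of goods) is an assumption, since conventions differ between tools.

```r
# Hypothetical helper: WoE per bin for one already-binned variable.
# Convention assumed here: log(share of bads / share of goods).
woe_one <- function(bin, bad) {
  n_bad  <- tapply(bad, bin, sum)
  n_good <- tapply(1 - bad, bin, sum)
  log((n_bad / sum(n_bad)) / (n_good / sum(n_good)))
}

# Hypothetical batch wrapper: apply woe_one() to every binned column.
woe_batch <- function(binned, bad) {
  lapply(binned, woe_one, bad = bad)
}

# Invented toy data, for illustration only
binned <- data.frame(
  age_bin    = c("young", "young", "young", "young", "old", "old", "old", "old"),
  income_bin = c("low", "low", "high", "high", "low", "low", "high", "high")
)
bad <- c(1, 0, 0, 0, 1, 1, 1, 0)

woe_batch(binned, bad)
```

A production version would also need to handle bins with zero goods or zero bads, where the logarithm is undefined.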

modelplotr vignette

April 20, 2019

Why ROC curves are a bad idea to explain your model to business people. The modelplotr package makes it easy to create a number of valuable evaluation plots to assess the business value of a predictive model. Using these plots, it can be shown how implementation of the model will impact business targets like response or return on investment of...

Styling DataTables

April 19, 2019

Most shiny apps have tables as the primary component. Now let's say you want to prettify your app and style the tables. All you need is to understand how tables are built using HTML. This is what the default datatable looks like in the app. In order to build the HTML table I have used a function table_frame which...

Quick Example of Latent Profile Analysis in R

April 19, 2019

Latent Profile Analysis (LPA) tries to identify clusters of individuals (i.e., latent profiles) based on responses to a series of continuous variables (i.e., indicators). LPA assumes that there are unobserved latent profiles that generate patterns of responses on indicator items. Here, I will go through a quick example of LPA to identify groups of people based on their interests/hobbies. The...

Control Charts Another Package

April 19, 2019

I got an email from Alex Zanidean, who runs the xmrr package: “You might enjoy my package xmrr for similar charts – but mine recalculate the bounds automatically”. And if we go to the vignette: “XMRs combine X-Bar control charts and Moving Range control charts. These functions also will recalculate the reference lines when significant change has occurred”. This seems...
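
For orientation, the basic limits of an individuals (X) chart built from moving ranges can be sketched in base R as follows. This shows only the standard fixed-limit calculation, not the automatic recalculation that xmrr performs, and the function name xmr_limits and the sample values are made up for illustration.

```r
# Hypothetical helper: centre line and natural process limits for an
# individuals chart, using the conventional 2.66 scaling of the mean
# moving range (roughly 3 / 1.128).
xmr_limits <- function(x) {
  moving_range <- abs(diff(x))  # absolute change between consecutive points
  centre <- mean(x)
  mr_bar <- mean(moving_range)
  c(lower = centre - 2.66 * mr_bar,
    centre = centre,
    upper = centre + 2.66 * mr_bar)
}

xmr_limits(c(10, 12, 11, 13, 12))
```

Points falling outside the lower and upper values would be flagged as signals; a package that recalculates bounds would then recompute these limits from the data after a detected shift.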

Happy EasteR! Let’s find some eggs…

April 19, 2019

It's Easter Time! Let's find some eggs... Hi there! Yes, it's the most Easterful time of the year again. For some of us a sacred time, for others mainly an egg-eating period, and some just enjoy the extra day of spare time. In case you have some time available for some good egg searching business, but no-one seems willing to...

ODSC East 2019 Talks to Expand and Apply R Skills

R programmers are not necessarily data scientists, but rather software engineers. We have an entirely new multitrack focus area that helps engineers learn AI skills – AI for Engineers. This focus area is designed specifically to help programmers get familiar with AI-driven software that utilizes deep learning and machine learning models to enable conversational AI, …

tint 0.1.2: Some cleanups

April 19, 2019

A new version 0.1.2 of the tint package is arriving at CRAN as I write this. It follows the recent 0.1.1 release which included two fabulous new vignettes featuring new font choices. The package name expands from tint is not tufte as the package offe...
