April 2019

Le Monde puzzle [#1094]

April 22, 2019 | 0 Comments

A rather blah number Le Monde mathematical puzzle: find all integer multiples of 11111 with exactly one occurrence of each decimal digit. I solved it by brute force, by looking at the possible range of multiples (and borrowing stringr::str_count from Robin!):
combien=0
for (i in 90001:900008){ j=i*11111 combien=combien+(...
[Read more...]
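A vectorised base-R sketch of the same brute-force idea, written for this listing rather than taken from the post (the variable names and the tighter search bounds are assumptions): a number that uses every decimal digit exactly once has exactly ten digits, so only multiples of 11111 between 1023456789 and 9876543210 need checking, and a ten-character string that contains all ten digits contains each of them exactly once.

```r
# Brute-force search for multiples of 11111 that use each decimal digit exactly once.
# Assumption: such a number has exactly ten digits, so it lies between the smallest
# (1023456789) and largest (9876543210) ten-digit numbers with distinct digits.
lo <- ceiling(1023456789 / 11111)
hi <- floor(9876543210 / 11111)
multiples <- (lo:hi) * 11111          # doubles hold these integers exactly
digits <- sprintf("%.0f", multiples)  # every string here has exactly ten characters

# A ten-character string containing all ten digits contains each one exactly once.
has_all_digits <- Reduce(`&`, lapply(0:9, function(d)
  grepl(as.character(d), digits, fixed = TRUE)))

pandigital <- multiples[has_all_digits]
length(pandigital)
head(sprintf("%.0f", pandigital))
```

Under these assumptions the count should be 3456. A short argument agrees: since 10^5 ≡ 1 (mod 11111), the two five-digit halves must sum to a multiple of 11111, and the digit-sum constraint forces that sum to be 99999, i.e. digits paired to add up to 9 (5!·2^5 arrangements, minus the 4!·2^4 that start with 0).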

Using R/exams for Written Exams in Finance Classes

April 22, 2019 | 0 Comments

Experiences with using R/exams for written exams in finance classes with a moderate number of students at Texas A&M International University (TAMIU). Guest post by Nathaniel P. Graham (Texas A&M International University, Division of International Banking and Finance Studies). Background: While R/exams was originally written for ... [Read more...]

Practical Data Science with R Book Update (April 2019)

April 22, 2019 | 0 Comments

I thought I would give a personal update on our book: Practical Data Science with R 2nd edition; Zumel, Mount; Manning 2019. The second edition should be fully available this fall! Nina and I have finished up through chapter 10 (of 12), and Manning has released previews up through chapter 7 (with more ... [Read more...]

Free Course: Help Your Team Learn R!

April 22, 2019 | 0 Comments

Today I am happy to announce a new free course: Help Your Team Learn R! Over the last few years I’ve helped a number of data teams train their analysts to use R. At each company there was a skilled R user who was leading the team’s effort ... [Read more...]

India has 100k records on iNaturalist

April 21, 2019 | 0 Comments

Biodiversity citizen scientists use iNaturalist to post their observations with photographs. The observations are then curated there by crowd-sourcing the identifications and other trait-related aspects. Once the data reach “research grade”, they are passed on to GBIF as occurrence records. Exciting news from India in the 3rd week of ...
[Read more...]

Embedding subplots in ggplot2 graphics

April 21, 2019 | 0 Comments

The idea of embedded plots for visualizing a large dataset that has an overplotting problem recently came up in some discussions with students. I first learned about embedded graphics from package ggsubplot. You can still see an old post about that package and about embedded graphics in general, with examples. ...
[Read more...]

Reproducible Environments

April 21, 2019 | 0 Comments

Great data science work should be reproducible. The ability to repeat experiments is part of the foundation for all science, and reproducible work is also critical for business applications. Team collaboration, project validation, and sustainable products presuppose the ability to reproduce work over time. In my opinion, mastering just a ...
[Read more...]

survivalists [a Riddler’s riddle]

April 21, 2019 | 0 Comments

A neat question from The Riddler on a multi-probability survival rate: nine processes are running in a loop with fixed survival rates .99, …, .91. What is the probability that the first process is the last one to die? Same question with probabilities .91, …, .99 and the probability that the last process is the last ...
[Read more...]
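The excerpt stops before any solution, so here is one way to compute the first probability, under assumptions that are not in the excerpt: each process independently survives a pass through the loop with its own fixed probability, a process dies at the first pass it fails, and two processes dying on the same pass count as a tie, so "last to die" means strictly last. The order in which processes run inside a pass is ignored, which the original puzzle may well treat differently. Summing over the pass t at which the first process dies gives P = Σ_t p1^(t-1)(1 - p1) × Π_{j≥2} (1 - pj^(t-1)); the function name prob_strictly_last and the truncation point t_max are illustrative choices.

```r
# Probability that process 1 (survival rate p[1] per pass) strictly outlives
# all the others. Assumptions: independent processes, geometric lifetimes,
# deaths on the same pass count as ties, infinite sum truncated at t_max passes.
prob_strictly_last <- function(p, t_max = 10000) {
  t <- seq_len(t_max)
  # P(process 1 survives t - 1 passes and then dies on pass t)
  dies_at_t <- p[1]^(t - 1) * (1 - p[1])
  # P(every other process has already died before pass t); init = 1 handles length(p) == 1
  others_dead <- Reduce(`*`, lapply(p[-1], function(q) 1 - q^(t - 1)), 1)
  sum(dies_at_t * others_dead)
}

rates <- (99:91) / 100  # the first process has the highest survival rate, .99
prob_strictly_last(rates)
```

As a sanity check, two processes with the same rate p give p / (1 + p), e.g. 1/3 for p = 0.5, since ties take away probability (1 - p) / (1 + p) and the rest splits evenly.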

Binning with Weights

April 21, 2019 | 0 Comments

After working on the MOB package, I received requests from multiple users asking whether I could write a binning function that takes the weighting scheme into consideration. It is a legitimate request from a practical standpoint. For instance, in the development of fraud detection models, we would often sample down non-fraud ... [Read more...]
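A minimal sketch of what weight-aware binning can mean, assuming case weights that undo the down-sampling (for example, each retained non-fraud record weighted by the inverse of its sampling rate). The cut points are weighted quantiles, so each bin holds roughly equal total weight rather than an equal number of rows, and the event rate in each bin is weighted as well. The helper weighted_bins and its equal-weight rule are hypothetical illustrations, not the MOB package's method.

```r
# Hypothetical helper: equal-weight binning of x with case weights w.
# Cut points are weighted quantiles of x; each bin's event rate is weighted by w.
weighted_bins <- function(x, y, w, n_bins = 5) {
  o <- order(x)
  x <- x[o]; y <- y[o]; w <- w[o]
  cum_w <- cumsum(w) / sum(w)
  # For each target share k / n_bins, the first x whose cumulative weight reaches it
  cuts <- unique(sapply(seq_len(n_bins - 1) / n_bins,
                        function(q) x[which(cum_w >= q)[1]]))
  # Bins are (-Inf, c1], (c1, c2], ..., (c_last, Inf)
  bin <- findInterval(x, cuts, left.open = TRUE) + 1
  data.frame(
    bin = sort(unique(bin)),
    weight = as.vector(tapply(w, bin, sum)),
    event_rate = as.vector(tapply(w * y, bin, sum) / tapply(w, bin, sum))
  )
}

set.seed(1)
x <- runif(200)
y <- rbinom(200, 1, x)
w <- ifelse(y == 1, 1, 5)  # e.g. non-events were kept at a 1-in-5 sampling rate
weighted_bins(x, y, w)
```

Ignoring w (setting every weight to 1) falls back to ordinary equal-frequency binning, which is an easy way to see how much the weights move the cut points.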

FizzBuzz in R and Python

April 21, 2019 | 0 Comments

In this post, we will solve a simple problem (called “FizzBuzz”) that is asked by some employers in data scientist job interviews. The question seeks to ascertain the applicant’s familiarity with basic programming concepts. We will see 2 different ways to solve the problem in 2 different statistical programming languages: R ... [Read more...]
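The excerpt does not restate the rules, so this sketch assumes the usual ones: print the numbers 1 to n, but "Fizz" for multiples of 3, "Buzz" for multiples of 5 and "FizzBuzz" for multiples of both. It is a vectorised base-R version written for this listing, not necessarily either of the post's two approaches; the function name fizzbuzz is illustrative.

```r
# FizzBuzz for 1..n: multiples of 3 -> "Fizz", of 5 -> "Buzz", of 15 -> "FizzBuzz".
fizzbuzz <- function(n = 100) {
  i <- seq_len(n)
  out <- as.character(i)
  out[i %% 3 == 0] <- "Fizz"
  out[i %% 5 == 0] <- "Buzz"
  out[i %% 15 == 0] <- "FizzBuzz"  # assigned last so it overrides the two above
  out
}

fizzbuzz(15)
```

Ordering the assignments from least to most specific avoids the nested if/else that loop-based solutions usually need.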

Using data.table with magrittr pipes: best of both worlds

April 20, 2019 | 0 Comments

Should we use magrittr pipes with data.table? Why ask the question? If you are fairly new to R, you might find it puzzling or intriguing that R questions on Stack Overflow tend to attract a range of solutions which all have different syntax “styles”, but almost all seem to be ...
[Read more...]

Process Mining (Part 2/3): More on bupaR package

April 20, 2019 | 0 Comments

Recap: In the last post, event logs and the discipline of process mining were defined, and the bupaR package was introduced as a tool for process mining in R. Objectives for this post: visualize the workflow, and understand the concept of activity reoccurrences. We will use the pre-loaded dataset sepsis from the ... [Read more...]

Batch Deployment of WoE Transformations

April 20, 2019 | 0 Comments

After wrapping up the function batch_woe() today, which allows users to apply WoE transformations to many independent variables simultaneously, I have completed the development of the major functions in the MOB package that are usable for model development in a production setting. The function batch_... [Read more...]
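To show what applying WoE transformations "to many independent variables simultaneously" involves, here is a base-R sketch; it is not the MOB package's batch_woe(), and the helpers woe_map and batch_woe_sketch are hypothetical. It assumes each predictor is already binned (categorical) and defines WoE as log(share of events in the bin / share of non-events in the bin). Other texts flip the ratio, and a bin with zero events or zero non-events yields an infinite value.

```r
# Weight of evidence per bin, assuming y is coded 1 = event, 0 = non-event.
# Convention used here: log(share of events / share of non-events).
woe_map <- function(bin, y) {
  bin <- as.character(bin)
  ev <- tapply(y == 1, bin, sum)  # events per bin
  ne <- tapply(y == 0, bin, sum)  # non-events per bin
  woe <- log((ev / sum(ev)) / (ne / sum(ne)))
  setNames(as.vector(woe), names(woe))
}

# Apply the mapping to every column of a data frame of binned predictors.
batch_woe_sketch <- function(df, y) {
  out <- lapply(df, function(col) {
    woe <- woe_map(col, y)
    unname(woe[as.character(col)])  # look up each row's bin
  })
  as.data.frame(out)
}

df <- data.frame(a = c("x", "x", "y", "y", "y", "x"), b = c(1, 2, 1, 2, 1, 2))
y <- c(1, 0, 1, 0, 0, 1)
batch_woe_sketch(df, y)
```

In a production setting the mapping from woe_map would be computed once on training data and stored, so that new data is transformed with the same values rather than re-estimated ones.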

modelplotr vignette

April 20, 2019 | 0 Comments

Why ROC curves are a bad idea to explain your model to business people: The modelplotr package makes it easy to create a number of valuable evaluation plots to assess the business value of a predictive model. Using these plots, it can be shown how implementation of the model will ...
[Read more...]

Styling DataTables

April 19, 2019 | 0 Comments

Most Shiny apps have tables as a primary component. Now let's say you want to prettify your app and style the tables. All you need is to understand how tables are built using HTML. This is what the default datatable looks like in the app. In order to build the ... [Read more...]

Quick Example of Latent Profile Analysis in R

April 19, 2019 | 0 Comments

Latent Profile Analysis (LPA) tries to identify clusters of individuals (i.e., latent profiles) based on responses to a series of continuous variables (i.e., indicators). LPA assumes that there are unobserved latent profiles that generate patterns of responses on indicator items. Here, I will go through a quick example ...
[Read more...]

Control Charts Another Package

April 19, 2019 | 0 Comments

I got an email from Alex Zanidean, who runs the xmrr package: “You might enjoy my package xmrr for similar charts – but mine recalculate the bounds automatically”, and if we go to the vignette, “XMRs combine X-Bar control charts and Moving Range control charts. These functions also will recalculate the ...
[Read more...]
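For readers new to XmR charts, a base-R sketch of the textbook limits (not the xmrr package's code, which per the excerpt also recalculates the bounds automatically): the individuals chart is centred on the mean with limits at mean ± 2.66 × average moving range, and the moving-range chart's upper limit is about 3.267 × average moving range. The constants are the standard ones for moving ranges of two consecutive points; the function name xmr_limits and the sample series are illustrative.

```r
# Textbook XmR (individuals and moving range) limits for a numeric series x.
xmr_limits <- function(x) {
  mr <- abs(diff(x))   # moving ranges of consecutive points
  mr_bar <- mean(mr)
  centre <- mean(x)
  list(
    centre = centre,
    lower = centre - 2.66 * mr_bar,  # 2.66 ~ 3 / d2, with d2 = 1.128 for n = 2
    upper = centre + 2.66 * mr_bar,
    mr_bar = mr_bar,
    mr_upper = 3.267 * mr_bar        # D4 for n = 2 (also quoted as 3.268)
  )
}

lims <- xmr_limits(c(5, 7, 6, 8, 7, 6, 9, 7))
str(lims)
```

Recalculating the bounds, as the package description mentions, amounts to calling the same computation again on a new stretch of the series once the process has shifted.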