CRAN does not validate R packages!

July 9, 2019

A friend called me the other day for advice on how to submit an R package to CRAN along with a proof his method was mathematically sound. I replied with some items of advice taken from my (limited) experience with submitting packages. And with the remark that CRAN would not validate the mathematical contents of

Read more »

Stress Testing Dynamic R/exams Exercises

July 9, 2019

Before actually using a dynamic exercise in a course it should be thoroughly tested. While certain aspects require critical reading by a human, other aspects can be automatically stress-tested in R. Motivation After a dynamic exercise has been developed, thorough testing...

Read more »

How to Use R in AWS Lambda

July 9, 2019

With AWS Lambda, small and frequent jobs can be run without setting up an instance and keeping it “alive” while it waits for requests.

Read more »

Teach R to read handwritten Digits with just 4 Lines of Code

July 9, 2019

What is the best way for me to find out whether you are rich or poor, when the only thing I know is your address? Looking at your neighbourhood! That is the big idea behind the k-nearest neighbour (or KNN) algorithm, where k stands for the number of neighbours to look at. The idea couldn’t … Continue reading "Teach...

Read more »

Monotonic Binning Driven by Decision Tree

July 8, 2019

After the development of the MOB package (https://github.com/statcompute/MonotonicBinning), I was asked by a couple of users about the possibility of using a decision tree to drive the monotonic binning. Although I am not aware of any R package implementing a decision tree with a monotonic constraint, I did manage to find a solution based upon the decision

Read more »

Dividend Sleuthing with R

July 8, 2019

Welcome to a mid-summer edition of Reproducible Finance with R. Today, we’ll explore the dividend histories of some stocks in the S&P 500. By way of history for all you young tech IPO and crypto investors out there: way back, a long time ago in the dark ages, companies used to take pains to generate free cash flow and...

Read more »

More on those stepped-wedge design assumptions: varying intra-cluster correlations over time

July 8, 2019

In my last post, I wrote about within- and between-period intra-cluster correlations in the context of stepped-wedge cluster randomized study designs. These are quite important to understand when figuring out sample size requirements (and models for analysis, which I’ll be writing about soon.) Here, I’m extending the constant ICC assumption I presented last time around by introducing some complexity...

Read more »

Complexity is a source of income in open source ecosystems

July 8, 2019

I am someone who regularly uses R, and my interest in programming languages means that on a semi-regular basis I spend time reading blog posts about the language. Over the last year or so, I had noticed several patterns of behavior, and after reading a recent blog post things started to make sense (the blog post

Read more »

R/exams @ useR! 2019

July 8, 2019

Pre-conference tutorial about R/exams at useR! 2019 (The R User Conference) in Toulouse: Slides, example code, and links to further information. Tutorial at useR! 2019 Today, the pre-conference tutorials kick off this...

Read more »

How to use `recipes` package from `tidymodels` for one hot encoding 🛠

July 8, 2019

A quick introduction to the `recipes` package, from the `tidymodels` family, using one hot encoding as the example. Useful for automating some data preparation tasks.

Read more »

Excel to R, Part 2 – Speed Up Exploratory Data Analysis 100X (R Code!)

July 7, 2019

You’re a Business Analyst - well versed in tools like Tableau, PowerBI, and maybe even SQL, but you want to take your data analytics abilities to the next level, by improving productivity and making predictive business insights with data science (rather than just descriptive insights). Then R is the language for you. In this article, you’ll learn how to...

Read more »

Clean, Consistent Column Names

July 7, 2019

I like to standardize the column names of data I’m reading into R so that I don’t have to match column names from one dataset that has an i.d. column and another that has an id column or maybe an ID column. Keep it simple: lower case with a single underscore separator between words. My … Continue reading Clean,...

Read more »
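The naming convention described in the teaser above (lower case, a single underscore between words) can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper name `clean_names_sketch` and the sample column names are hypothetical and are not taken from the linked post, which may use a different approach or a dedicated package.

```r
# Hypothetical helper: standardize column names to lower case with
# single underscores. Not the original author's code.
clean_names_sketch <- function(x) {
  x <- tolower(x)                    # lower-case everything
  x <- gsub("[^a-z0-9]+", "_", x)    # any run of other characters becomes one underscore
  x <- gsub("^_+|_+$", "", x)        # drop leading and trailing underscores
  x
}

# Made-up example data frame with inconsistent column names.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("ID", "First Name", "total..sales")
names(df) <- clean_names_sketch(names(df))
print(names(df))
```

Note that this sketch only normalizes case and separators; deciding that two differently punctuated names refer to the same column would still need a rule of its own.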

Tableau – Creating a Waffle Chart

July 7, 2019

Waffles are for Breakfast It’s been a long time since my last update and I’ve decided to start with Tableau, of all topics! Although open source advocates do not look kindly upon Tableau, I find myself using it frequently and relearning all the...

Read more »

Chat with rOpenSci Contributors at useR!2019


Three members of the rOpenSci team - Scott Chamberlain, Jenny Bryan, and Rich FitzJohn - as well as many community members will give talks at useR!2019. Many other package authors, maintainers, reviewers and unconf participants will be there too. Don’t hesitate to ask them about rOpenSci packages, software peer review, community, or just say hello if you’re looking for...

Read more »

swephR v0.2.1

July 7, 2019

This morning swephR version 0.2.1 made it onto CRAN and is now propagating to the mirrors. The goal of swephR is to provide an R interface to the Swiss Ephemeris, a high precision ephemeris based upon the DE431 ephemeris from NASA’s JPL. It covers the time range 13201 BCE to 17191 CE. This new version comes closely after last week’s release and contains only a single albeit...

Read more »

Introducing tidylo

July 7, 2019

Today I am so pleased to introduce a new package for calculating weighted log odds ratios, tidylo. Often in data analysis, we want to measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents. One statistic often used to find these kinds of differences in text data is tf-idf....

Read more »

Chunk Averaging of GLM

July 7, 2019

Chunk Average (CA) is an interesting concept proposed by Matloff in chapter 13 of his book “Parallel Computing for Data Science”. The basic idea is to partition the entire model estimation sample into chunks and then to estimate a glm for each chunk. Under the i.i.d. assumption, the CA estimator with the chunked data

Read more »
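The chunk-averaging idea in the teaser above (partition the sample, fit a glm per chunk, combine the estimates) can be sketched as follows. This is a minimal illustration, not the linked post's code: the data are simulated, the names `df`, `k`, `chunks` and `ca_coef` are assumptions made for the example, and the per-chunk coefficients are combined with a plain unweighted mean over roughly equal-sized chunks.

```r
# Chunk-averaging sketch on simulated data (all names are illustrative).
set.seed(2019)
n <- 2000
df <- data.frame(x = rnorm(n))
df$y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.0 * df$x))

# Partition the rows into k roughly equal chunks.
k <- 4
chunk_id <- rep(seq_len(k), length.out = n)
chunks <- split(df, chunk_id)

# Fit the same logistic regression on every chunk and keep the coefficients.
chunk_coefs <- sapply(chunks, function(ch) {
  coef(glm(y ~ x, data = ch, family = binomial()))
})

# The chunk-averaged estimate is the mean of the per-chunk coefficients.
ca_coef <- rowMeans(chunk_coefs)

# Full-sample fit for comparison.
full_coef <- coef(glm(y ~ x, data = df, family = binomial()))
print(rbind(chunk_average = ca_coef, full_sample = full_coef))
```

Because each chunk is fitted independently, the `sapply()` call is the step that could be handed to a parallel backend.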

Sampling paths from a Gaussian process

July 7, 2019

Gaussian processes are a widely employed statistical tool because of their flexibility and computational tractability. (For instance, one recent area where Gaussian processes are used is in machine learning for hyperparameter optimization.) A stochastic process is a Gaussian process if … Continue reading →

Read more »

Le Monde puzzle [#1105]

July 7, 2019

Another token game as Le Monde mathematical puzzle: Archibald and Beatrix play with a pile of n__100 tokens, sequentially picking m tokens from the pile with m being a prime number or a multiple of 6, the winner taking the last tokens. If Beatrix knows n and proposes to Archibald to start, what

Read more »

CRAN Release of R/exams 2.3-3 and 2.3-4

July 7, 2019

New minor releases of the R/exams package to CRAN, containing a new dedicated function for online quizzes/exams in the Canvas learning management system. Moreover, the update provides a range of smaller improvements and bug fixes. ...

Read more »

Link Functions versus Data Transforms

July 7, 2019

In the linear regression section of our book Practical Data Science in R, we use the example of predicting income from a number of demographic variables (age, sex, education and employment type). In the text, we choose to regress against log10(income) rather than directly against income. One obvious reason for not regressing directly against income … Continue reading Link...

Read more »

Latin Hypercube Sampling in Hyper-Parameter Optimization

July 6, 2019

In my previous post https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization/, I’ve shown the difference between the uniform pseudo random and the quasi random number generators in the hyper-parameter optimization of machine learning. Latin Hypercube Sampling (LHS) is another interesting way to generate near-random sequences with a very simple idea. Let’s assume that we’d like to perform LHS for 10 data

Read more »

Visualize monthly precipitation anomalies

July 6, 2019

Normally when we visualize monthly precipitation anomalies, we simply use a bar graph indicating negative and positive values with red and blue. However, it does not explain the general context of these anomalies. For example, what was the highest or lowest anomaly in each month? In principle, we could use a boxplot to visualize the distribution of the anomalies,...

Read more »

Making a Cheat Sheet with Rmarkdown

July 6, 2019

Unfortunately, I haven’t had as much time to make blog posts in the past year or so. I started taking classes as part of Georgia Tech’s Online Master of Science in Analytics (OMSA) program last summer (2018) while continuing to work full-time, so extra time to code and write hasn’t been abundant for me. Anyways, I figured I would share one neat thing I learned as...

Read more »

Use the k-means clustering, Luke

July 6, 2019

In my last post I scraped some character statistics from the mobile game Star Wars: Galaxy of Heroes. In this post, I’ll be aiming to try out k-means clustering in order to see if it comes out with an intuitive result, and to learn how to integrate this kind of analysis into a tidy workflow using broom. First I’ll load...

Read more »

Automatic differentiation in pqR

July 6, 2019

I’ve released a version of my pqR implementation of R that has extensions for automatic differentiation. This is not a stable release, but it can be downloaded from pqR-project.org — look for the test version at the bottom — and installed the same as other pqR versions (from source, so you’ll need C and Fortran compilers).

Read more »

Programming Over lm() in R

July 6, 2019

Here is a simple modeling problem in R. We want to fit a linear model where the names of the data columns carrying the outcome to predict (y), the explanatory variables (x1, x2), and per-example row weights (wt) are given to us as strings. Let’s start with our example data and parameters. The point is: we … Continue reading Programming...

Read more »

Rhombuses

July 6, 2019

For a lonely soul, you’re having such a nice time (Nothing in my way, Keane) In my previous post, I created the P2 Penrose tessellation according to the instructions of this post. Now it’s time to create the P3 tessellation following the same technique I described already. This is the image of the P3 tessellation: … Continue reading Rhombuses...

Read more »

Why I love data.table

July 5, 2019

I’ve been an R user for a few years now and the data.table package has been my staple package for most of that time. In this post I wanted to talk about why almost every script and RMarkdown report I write starts with: library(data.table) My memory issues I started working on my licentiate thesis (the Argentinian equivalent to a Master’s degree) around mid...

Read more »
