## Tableau – Creating a Waffle Chart

July 7, 2019

Waffles are for Breakfast. It’s been a long time since my last update and I’ve decided to start with Tableau, of all topics! Although open source advocates do not look kindly upon Tableau, I find myself using it frequently and relearning all the...

## Chat with rOpenSci Contributors at useR!2019

Three members of the rOpenSci team - Scott Chamberlain, Jenny Bryan, and Rich FitzJohn - as well as many community members will give talks at useR!2019. Many other package authors, maintainers, reviewers and unconf participants will be there too. Don’t hesitate to ask them about rOpenSci packages, software peer review, community, or just say hello if you’re looking for...

## swephR v0.2.1

July 7, 2019

This morning swephR version 0.2.1 made it onto CRAN and is now propagating to the mirrors. The goal of swephR is to provide an R interface to the Swiss Ephemeris, a high precision ephemeris based upon the DE431 ephemeris from NASA’s JPL. It covers the time range 13201 BCE to 17191 CE. This new version comes closely after last week’s release and contains only a single albeit...

## Introducing tidylo

July 7, 2019

Today I am so pleased to introduce a new package for calculating weighted log odds ratios, tidylo. Often in data analysis, we want to measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents. One statistic often used to find these kinds of differences in text data is tf-idf....

## Chunk Averaging of GLM

July 7, 2019

Chunk Average (CA) is an interesting concept proposed by Matloff in chapter 13 of his book “Parallel Computing for Data Science”. The basic idea is to partition the entire model estimation sample into chunks and then to estimate a glm for each chunk. Under the i.i.d. assumption, the CA estimator with the chunked data
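The partition-then-estimate idea can be sketched in a few lines. This is only an illustration, not the post's code: to stay within the standard library it fits a one-predictor least-squares line (a Gaussian model with identity link) on each chunk instead of a general glm, and the helper names `fit_line` and `chunk_average` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def chunk_average(xs, ys, n_chunks):
    """Fit the model on each contiguous chunk, then average the coefficients.

    Assumption: rows are already in random order (i.i.d.), so contiguous
    chunks are fine. Trailing rows that do not fill a chunk are dropped.
    """
    size = len(xs) // n_chunks
    fits = []
    for k in range(n_chunks):
        start, stop = k * size, (k + 1) * size
        fits.append(fit_line(xs[start:stop], ys[start:stop]))
    intercept = sum(f[0] for f in fits) / n_chunks
    slope = sum(f[1] for f in fits) / n_chunks
    return intercept, slope


# Noise-free toy data on the line y = 2 + 3x
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x for x in xs]
print(chunk_average(xs, ys, n_chunks=4))
```

Each chunk fit is independent of the others, which is what makes the approach easy to run in parallel.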

## Sampling paths from a Gaussian process

July 7, 2019

Gaussian processes are a widely employed statistical tool because of their flexibility and computational tractability. (For instance, one recent area where Gaussian processes are used is in machine learning for hyperparameter optimization.) A stochastic process is a Gaussian process if …
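As a rough illustration of what sampling a path can look like in practice, the sketch below draws the values of a zero-mean Gaussian process on a finite grid by multiplying a Cholesky factor of the covariance matrix with standard normal draws. The squared-exponential kernel, the jitter value, and all function names are assumptions made for this example, not taken from the post.

```python
import math
import random


def sq_exp_kernel(s, t, length_scale=1.0):
    """Squared-exponential covariance between inputs s and t (assumed kernel)."""
    return math.exp(-0.5 * ((s - t) / length_scale) ** 2)


def covariance_matrix(grid, jitter=1e-6):
    """Kernel matrix on the grid, with a small diagonal jitter for stability."""
    n = len(grid)
    return [
        [sq_exp_kernel(grid[i], grid[j]) + (jitter if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular factor `low` such that low * low^T equals a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - partial)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low


def sample_path(grid, seed=None):
    """One draw of the process values at the grid points (zero mean)."""
    rng = random.Random(seed)
    low = cholesky(covariance_matrix(grid))
    z = [rng.gauss(0.0, 1.0) for _ in grid]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(grid))]


grid = [0.0, 0.5, 1.0, 1.5, 2.0]
print(sample_path(grid, seed=42))
```

The jitter only guards against numerical trouble when grid points are close together; a denser grid may need a larger value.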

## Le Monde puzzle [#1105]

July 7, 2019

Another token game as Le Monde mathematical puzzle: Archibald and Beatrix play with a pile of n ≤ 100 tokens, sequentially picking m tokens from the pile with m being a prime number or a multiple of 6, the winner taking the last tokens. If Beatrix knows n and proposes to Archibald to start, what
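A game like this can be explored by brute force with a short dynamic program over pile sizes. The sketch below is one possible reading, not the post's solution: it assumes that a player with no legal move (for instance, facing a single token) loses, and the names `is_prime` and `losing_positions` as well as the bound `limit` are hypothetical.

```python
def is_prime(k):
    """Trial-division primality test, adequate for small k."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def losing_positions(limit):
    """Pile sizes 1..limit from which the player about to move loses.

    Assumption: a player who cannot make a legal move loses, and taking
    the last tokens wins.
    """
    moves = [m for m in range(1, limit + 1) if is_prime(m) or m % 6 == 0]
    wins = [False] * (limit + 1)  # wins[0] is False: nothing left to take
    for n in range(1, limit + 1):
        # A position is winning if some legal move leaves a losing position.
        wins[n] = any(m <= n and not wins[n - m] for m in moves)
    return [n for n in range(1, limit + 1) if not wins[n]]


print(losing_positions(20))  # [1, 9, 10]
```

Under these assumptions, the pile sizes where the second player wins are exactly the values returned, so calling `losing_positions(100)` lists the candidates for n.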

## CRAN Release of R/exams 2.3-3 and 2.3-4

July 7, 2019

New minor releases of the R/exams package to CRAN, containing a new dedicated function for online quizzes/exams in the Canvas learning management system. Moreover, the update provides a range of smaller improvements and bug fixes. ...

## Link Functions versus Data Transforms

July 7, 2019

In the linear regression section of our book Practical Data Science in R, we use the example of predicting income from a number of demographic variables (age, sex, education and employment type). In the text, we choose to regress against log10(income) rather than directly against income. One obvious reason for not regressing directly against income …

## Latin Hypercube Sampling in Hyper-Parameter Optimization

July 6, 2019

In my previous post https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization/, I’ve shown the difference between the uniform pseudo random and the quasi random number generators in the hyper-parameter optimization of machine learning. Latin Hypercube Sampling (LHS) is another interesting way to generate near-random sequences with a very simple idea. Let’s assume that we’d like to perform LHS for 10 data
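The simple idea behind LHS can be sketched as follows: split each dimension into as many equal-width strata as there are points, draw one value inside each stratum, and shuffle the strata independently per dimension. This is a generic illustration on the unit cube using only the standard library; the function name `latin_hypercube` and its parameters are assumptions, not the post's code.

```python
import random


def latin_hypercube(n_points, n_dims, seed=None):
    """Return n_points tuples in [0, 1)^n_dims with one point per stratum
    along every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_points equal-width strata.
        column = [(i + rng.random()) / n_points for i in range(n_points)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_points)]


for point in latin_hypercube(10, 2, seed=1):
    print(point)
```

To use the points for hyper-parameter search, each coordinate would still need to be rescaled from [0, 1) to the range of the corresponding hyper-parameter.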

## Visualize monthly precipitation anomalies

July 6, 2019

Normally when we visualize monthly precipitation anomalies, we simply use a bar graph indicating negative and positive values with red and blue. However, it does not explain the general context of these anomalies. For example, what was the highest or lowest anomaly in each month? In principle, we could use a boxplot to visualize the distribution of the anomalies,...

## Making a Cheat Sheet with Rmarkdown

July 6, 2019

Unfortunately, I haven’t had as much time to make blog posts in the past year or so. I started taking classes as part of Georgia Tech’s Online Master of Science in Analytics (OMSA) program last summer (2018) while continuing to work full-time, so extra time to code and write hasn’t been abundant for me. Anyways, I figured I would share one neat thing I learned as...

## Use the k-means clustering, Luke

July 6, 2019

In my last post I scraped some character statistics from the mobile game Star Wars: Galaxy of Heroes. In this post, I’ll be aiming to try out k-means clustering in order to see if it comes out with an intuitive result, and to learn how to integrate this kind of analysis into a tidy workflow using broom. First I’ll load...

## Automatic differentiation in pqR

July 6, 2019

I’ve released a version of my pqR implementation of R that has extensions for automatic differentiation. This is not a stable release, but it can be downloaded from pqR-project.org — look for the test version at the bottom — and installed the same as other pqR versions (from source, so you’ll need C and Fortran compilers).

## Programming Over lm() in R

July 6, 2019

Here is a simple modeling problem in R. We want to fit a linear model where the names of the data columns carrying the outcome to predict (y), the explanatory variables (x1, x2), and per-example row weights (wt) are given to us as strings. Let’s start with our example data and parameters. The point is: we …

## Rhombuses

July 6, 2019

For a lonely soul, you’re having such a nice time (Nothing in my way, Keane). In my previous post, I created the P2 Penrose tessellation according to the instructions of this post. Now it’s time to create the P3 tessellation following the same technique I described already. This is the image of the P3 tessellation: …

## Why I love data.table

July 5, 2019

I’ve been an R user for a few years now and the data.table package has been my staple package for most of it. In this post I wanted to talk about why almost every script and RMarkdown report I write starts with `library(data.table)`. My memory issues: I started working on my licentiate thesis (the Argentinian equivalent of a Master’s degree) around mid...

## A Short Essay on Duplicated R Artefacts

July 5, 2019

Organic Development of R Artefacts. In a previous post, I alluded to the point that one of the great strengths (but also one of the challenges) of R is the organic way in which R ‘artefacts’ are developed. One characteristic of this “organic d...

## Integration in R

July 5, 2019

Integration is the process of evaluating integrals. It is one of the two central ideas of calculus and is the inverse of the other central idea of calculus, differentiation. Generally, we can speak of integration in two different contexts: the...
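When an integral has to be evaluated numerically, a quadrature rule approximates it from function values on a grid. The sketch below uses composite Simpson's rule as one common choice; it is a generic illustration, not the method from the post, and the function name `simpson` and its default panel count are assumptions.

```python
import math


def simpson(f, a, b, n=100):
    """Approximate the integral of f over [a, b] with composite Simpson's rule.

    n is the number of subintervals; it is bumped to the next even number
    because the rule works on pairs of subintervals.
    """
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd) and 2 (even).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


print(simpson(lambda x: x * x, 0.0, 3.0))  # close to the exact value 9
print(simpson(math.sin, 0.0, math.pi))     # close to the exact value 2
```

Simpson's rule is exact for polynomials up to degree three, which is why the first example matches the exact answer up to rounding error.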

## Optimal transport on large networks

July 4, 2019

With Alfred Galichon and Lucas Vernet, we recently uploaded a paper entitled “Optimal transport on large networks” to arXiv. This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our setting, the market is described by a network that maps the cost of...

## pkginfo: Tools for Retrieving R Package Information

July 4, 2019

Motivation: There are several wonderful tools for retrieving information about R packages, some of which are listed below: cranlogs, dlstats and packageRank for R package download stats; pkgsearch and packagefinder for searching CRAN R packages; crandb, which provides an API for programmatically accessing metadata; and cchecks for CRAN check results. We have used some or all of these to track/monitor our own R packages available on CRAN. Over...

## swephR v0.2.0

July 4, 2019

This morning swephR version 0.2.0 made it onto CRAN and is now propagating to the mirrors. The goal of swephR is to provide an R interface to the Swiss Ephemeris, a high precision ephemeris based upon the DE431 ephemeris from NASA’s JPL. It covers the time range 13201 BCE to 17191 CE. The new version 0.2.0 brings two important changes. First, the version of the included Swiss...

## 79th #TokyoR Meetup: {tidyr} 1.0.0, RAW image processing, and more!

July 4, 2019

As the monsoon rains fall, another TokyoR User Meetup! On June 29th, useRs from all over Tokyo flocked to Hanzomon, Tokyo for another jam-packed session of #rstats hosted by Infocom. In line with my previous roundup posts: ...

## groupdata2 version 1.1.0 released on CRAN

July 4, 2019

A few days ago, I released a new version of my R package, groupdata2, on CRAN. groupdata2 contains a set of functions for grouping data, such as creating balanced partitions…

## digest 0.6.20

This morning, digest version 0.6.20 went to CRAN, and I will send a package to Debian shortly as well. digest creates hash digests of arbitrary R objects (using the md5, sha-1, sha-256, sha-512, crc32, xxhash32, xxhash64, murmur32, and spookyhash algorithms) permitting easy comparison of R language objects. This version contains only internal changes with a switch to the (excellent) tinytest package....

## cvms 0.1.0 released on CRAN

July 4, 2019

After a fairly long life on GitHub, my R package, cvms, for cross-validating linear and logistic regression, is finally on CRAN! With a few additions in the past months, this…

## 10th MilanoR meeting: photos and resources

July 4, 2019

Curious to know how our last MilanoR Meeting went? Check out some photos, resources and the highlights of the night. The post 10th MilanoR meeting: photos and resources appeared first on MilanoR.

## Communication between modules and its whims

July 4, 2019

As part of the development of a Shiny application for production using {golem}, we recommend, among other things, working with Shiny-modules. The communication of data between the different modules can be complex. At ThinkR we use a strategy we call the “stratégie du petit r” (the “little r strategy”). We explain everything in this article. What is a module? A module is the combination of...