Tableau – Creating a Waffle Chart

July 7, 2019

Waffles are for Breakfast: It’s been a long time since my last update and I’ve decided to start with Tableau, of all topics! Although open source advocates do not look kindly upon Tableau, I find myself using it frequently and relearning all the...

Read more »

Chat with rOpenSci Contributors at useR!2019

Three members of the rOpenSci team - Scott Chamberlain, Jenny Bryan, and Rich FitzJohn - as well as many community members will give talks at useR!2019. Many other package authors, maintainers, reviewers and unconf participants will be there too. Don’t hesitate to ask them about rOpenSci packages, software peer review, community, or just say hello if you’re looking for...

Read more »

swephR v0.2.1

July 7, 2019

This morning swephR version 0.2.1 made it onto CRAN and is now propagating to the mirrors. The goal of swephR is to provide an R interface to the Swiss Ephemeris, a high-precision ephemeris based upon the DE431 ephemeris from NASA’s JPL. It covers the time range 13201 BCE to 17191 CE. This new version comes closely after last week’s release and contains only a single albeit...

Read more »

Introducing tidylo

July 7, 2019

Today I am so pleased to introduce a new package for calculating weighted log odds ratios, tidylo. Often in data analysis, we want to measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents. One statistic often used to find these kinds of differences in text data is tf-idf....

Read more »

Chunk Averaging of GLM

July 7, 2019

Chunk Average (CA) is an interesting concept proposed by Matloff in chapter 13 of his book “Parallel Computing for Data Science”. The basic idea is to partition the entire model estimation sample into chunks and then to estimate a glm for each chunk. Under the i.i.d. assumption, the CA estimator with the chunked data

Read more »

Sampling paths from a Gaussian process

July 7, 2019

Gaussian processes are a widely employed statistical tool because of their flexibility and computational tractability. (For instance, one recent area where Gaussian processes are used is in machine learning for hyperparameter optimization.) A stochastic process is a Gaussian process if …

Read more »

Le Monde puzzle [#1105]

July 7, 2019

Another token game as Le Monde mathematical puzzle: Archibald and Beatrix play with a pile of n≤100 tokens, sequentially picking m tokens from the pile with m being a prime number or a multiple of 6, the winner taking the last tokens. If Beatrix knows n and proposes to Archibald to start, what

Read more »

CRAN Release of R/exams 2.3-3 and 2.3-4

July 7, 2019

New minor releases of the R/exams package to CRAN, containing a new dedicated function for online quizzes/exams in the Canvas learning management system. Moreover, the update provides a range of smaller improvements and bug fixes. ...

Read more »

Link Functions versus Data Transforms

July 7, 2019

In the linear regression section of our book Practical Data Science in R, we use the example of predicting income from a number of demographic variables (age, sex, education and employment type). In the text, we choose to regress against log10(income) rather than directly against income. One obvious reason for not regressing directly against income …

Read more »

Latin Hypercube Sampling in Hyper-Parameter Optimization

July 6, 2019

In my previous post (https://statcompute.wordpress.com/2019/02/03/sobol-sequence-vs-uniform-random-in-hyper-parameter-optimization/), I showed the difference between uniform pseudo-random and quasi-random number generators in hyper-parameter optimization for machine learning. Latin Hypercube Sampling (LHS) is another interesting way to generate near-random sequences with a very simple idea. Let’s assume that we’d like to perform LHS for 10 data

Read more »

Visualize monthly precipitation anomalies

July 6, 2019

Normally when we visualize monthly precipitation anomalies, we simply use a bar graph indicating negative and positive values with red and blue. However, it does not explain the general context of these anomalies. For example, what was the highest or lowest anomaly in each month? In principle, we could use a boxplot to visualize the distribution of the anomalies,...

Read more »

Making a Cheat Sheet with Rmarkdown

July 6, 2019

Unfortunately, I haven’t had as much time to make blog posts in the past year or so. I started taking classes as part of Georgia Tech’s Online Master of Science in Analytics (OMSA) program last summer (2018) while continuing to work full-time, so extra time to code and write hasn’t been abundant for me. Anyways, I figured I would share one neat thing I learned as...

Read more »

Use the k-means clustering, Luke

July 6, 2019

In my last post I scraped some character statistics from the mobile game Star Wars: Galaxy of Heroes. In this post, I’ll be aiming to try out k-means clustering in order to see if it comes out with an intuitive result, and to learn how to integrate this kind of analysis into a tidy workflow using broom. First I’ll load...

Read more »

Automatic differentiation in pqR

July 6, 2019

I’ve released a version of my pqR implementation of R that has extensions for automatic differentiation. This is not a stable release, but it can be downloaded from pqR-project.org — look for the test version at the bottom — and installed the same as other pqR versions (from source, so you’ll need C and Fortran compilers).

Read more »

Programming Over lm() in R

July 6, 2019

Here is a simple modeling problem in R. We want to fit a linear model where the names of the data columns carrying the outcome to predict (y), the explanatory variables (x1, x2), and per-example row weights (wt) are given to us as strings. Let’s start with our example data and parameters. The point is: we …

Read more »

Rhombuses

July 6, 2019

“For a lonely soul, you’re having such a nice time” (Nothing in my way, Keane). In my previous post, I created the P2 Penrose tessellation according to the instructions of this post. Now it’s time to create the P3 tessellation following the same technique I described already. This is the image of the P3 tessellation: …

Read more »

Why I love data.table

July 5, 2019

I’ve been an R user for a few years now and the data.table package has been my staple package for most of it. In this post I wanted to talk about why almost every script and RMarkdown report I write starts with: library(data.table). My memory issues: I started working on my licentiate thesis (the Argentinian equivalent to a Master’s degree) around mid...

Read more »

A Short Essay on Duplicated R Artefacts

July 5, 2019

Organic Development of R Artefacts: In a previous post, I alluded to the point that one of the great strengths (but also one of the challenges) of R is the organic way in which R ‘artefacts’ are developed. One characteristic of this “organic d...

Read more »

Integration in R

July 5, 2019

Integration is the process of evaluating integrals. It is one of the two central ideas of calculus and is the inverse of the other central idea of calculus, differentiation. Generally, we can speak of integration in two different contexts: the...

Read more »

Optimal transport on large networks

July 4, 2019

With Alfred Galichon and Lucas Vernet, we recently uploaded a paper entitled “Optimal transport on large networks” to arXiv. This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our setting, the market is described by a network that maps the cost of...

Read more »

pkginfo: Tools for Retrieving R Package Information

July 4, 2019

Motivation: There are several wonderful tools for retrieving information about R packages, some of which are listed below: cranlogs, dlstats and packageRank for R package download stats; pkgsearch and packagefinder for searching CRAN R packages; crandb, which provides an API for programmatically accessing metadata; and cchecks for CRAN check results. We have used some or all of these to track/monitor our own R packages available on CRAN. Over...

Read more »

swephR v0.2.0

July 4, 2019

This morning swephR version 0.2.0 made it onto CRAN and is now propagating to the mirrors. The goal of swephR is to provide an R interface to the Swiss Ephemeris, a high-precision ephemeris based upon the DE431 ephemeris from NASA’s JPL. It covers the time range 13201 BCE to 17191 CE. The new version 0.2.0 brings two important changes. First, the version of the included Swiss...

Read more »

79th #TokyoR Meetup: {tidyr} 1.0.0, RAW image processing, and more!

July 4, 2019

As the monsoon rains fall, another TokyoR User Meetup! On June 29th, useRs from all over Tokyo flocked to Hanzomon, Tokyo for another jam-packed session of #rstats hosted by Infocom. In line with my previous round up posts: ...

Read more »

groupdata2 version 1.1.0 released on CRAN

July 4, 2019

A few days ago, I released a new version of my R package, groupdata2, on CRAN. groupdata2 contains a set of functions for grouping data, such as creating balanced partitions…

Read more »

digest 0.6.20

This morning, digest version 0.6.20 went to CRAN, and I will send a package to Debian shortly as well. digest creates hash digests of arbitrary R objects (using the md5, sha-1, sha-256, sha-512, crc32, xxhash32, xxhash64, murmur32, and spookyhash algorithms) permitting easy comparison of R language objects. This version contains only internal changes with a switch to the (excellent) tinytest package....

Read more »

cvms 0.1.0 released on CRAN

July 4, 2019

After a fairly long life on GitHub, my R package, cvms, for cross-validating linear and logistic regression, is finally on CRAN! With a few additions in the past months, this…

Read more »

10th MilanoR meeting: photos and resources

July 4, 2019

Curious to know how our last MilanoR meeting went? Check out some photos, resources and the highlights of the night. The post 10th MilanoR meeting: photos and resources appeared first on MilanoR.

Read more »

Communication between modules and its whims

July 4, 2019

As part of the development of a Shiny application for production using {golem}, we recommend, among other things, working with Shiny-modules. The communication of data between the different modules can be complex. At ThinkR we use a strategy: the stratégie du petit r. We explain everything in this article. What is a module? A module is the combination of...

Read more »

compareWith: Easy diff and merge in RStudio

July 4, 2019

We are happy to announce the R package compareWith, providing user-friendly RStudio addins that simplify diff and merge tasks. Just ahead of the upcoming useR!2019 Toulouse, where Miraier Nikki will be introducing the package in a short talk during ...

Read more »
