Building A base dplyr With Primitives: Grouped Operations, Pipes and More!

February 27, 2020

Introduction: In my last post we looked at how we can recreate base equivalents of the dplyr functions select(), filter(), mutate() and arrange(), amongst others. I wrote these functions and presented them in a new package called poorman. In this post I will be discussing new functionality that I have since added to poorman including grouped operations, renaming columns, summarising...

Read more »

Decision Boundary for a Series of Machine Learning Models

Machine Learning at the Boundary: There is nothing new in the fact that machine learning models can outperform traditional econometric models, but as part of my research I want to show why and how some models make given predictions or, in this instance, classifications. I wanted to show the decision boundary that my binary classification model was drawing. That is,...

Read more »

Version 0.4.0 of nnetsauce, with fruits and breast cancer classification

February 27, 2020

Read more »

Student’s t-test in R and by hand: how to compare two groups under different scenarios

February 27, 2020

Introduction; Null and alternative hypothesis; Hypothesis testing; Different versions of the Student’s t-test; How to compute Student’s t-test by hand?; Scenario 1: Independent samples with 2 known variances; Scenario 2: Independent samples with 2 equal but unknown variances; Scenario 3: Independent samples with 2 unequal and unknown variances; Scenario 4: Paired samples where the variance of the differences is known; Scenario 5: Paired samples where the variance of...

Read more »

Data Science in Manufacturing: An Overview

February 27, 2020

Original article published on opendatascience.com. In the last couple of years, data science has spread rapidly into industrial applications across the board. Today, we can see data science applied in health care, customer service, governments, cyber security, mechanical, aerospace, and other industrial applications. Among these, manufacturing has gained more prominence to achieve...

Read more »

Developing a complex R Shiny app – the good, the bad and the ugly

February 27, 2020

Together with Clara Bicalho (UC Berkeley) and Sisi Huang (WZB), I recently developed a web application that acts as a …

Read more »

MLOPS for R with Azure Machine Learning

February 26, 2020

The video recording of my RStudio::conf talk, MLOPS for R with Azure Machine Learning, is now available for streaming thanks to the fine folks at RStudio. The talk begins with a general discussion of MLOps (Machine Learning Operations) and how it differs from DevOps as applied to traditional (non-ML-based) applications. This is a theme I plan to develop further...

Read more »

RStudio Package Manager 1.1.2 – Windows

February 26, 2020

RStudio Package Manager 1.1.2 introduces beta support for Windows package binaries. These binaries make it easier and faster to install R packages on Windows Desktop. With this release, all the benefits of Package Manager are available to ...

Read more »

if … else and ifelse

February 26, 2020

Let’s make this a quick and quite basic one. There is this incredibly useful function in R called ifelse(). It’s basically a vectorized version of the if … else control structure that every programming language has in one way or another. ifelse() has, in my view, two major advantages over if … else: (1) It’s super fast. (2) It’s more convenient to use. The...
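To make the difference concrete, here is a minimal base R sketch. The vector `x` and the labels are made up for illustration and do not come from the original post.

```r
# Hypothetical example data (an assumption, not taken from the post)
x <- c(-2, 0, 3, 7)

# if ... else tests a single condition, so handling a whole vector
# requires an explicit loop over its elements
labels_loop <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 0) {
    labels_loop[i] <- "positive"
  } else {
    labels_loop[i] <- "non-positive"
  }
}

# ifelse() applies the test to every element in a single call
labels_vec <- ifelse(x > 0, "positive", "non-positive")

# Both approaches give the same character vector
identical(labels_loop, labels_vec)
```

The loop and the ifelse() call produce the same result here; the vectorized form simply expresses it in one line.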

Read more »

chain of lynx and drove of hares

February 26, 2020

A paper (and an introduction to the paper) in Nature this week seems to have made progress on the existence of indefinite predator-prey cycles, as in the lynx/hare dataset available in R. The paper focuses on another pair, an invertebrate and its prey, an alga, for which the authors managed a 50-cycle sequence.

Read more »

A New Baby Boom Poster

February 26, 2020

I wanted to work through a few examples of more polished graphics done mostly but perhaps not entirely in R. So, I revisited the Baby Boom visualizations I made a while ago and made a new poster with them. This allowed me to play around with a few packages that I either hadn’t made use of or that weren’t...

Read more »

Testing REST APIs with Newman

February 26, 2020

Newman and Postman form a great team to test your REST API. I will give you a quick tour of both tools and their interplay: define requests and tests, export them, and let them run with the CLI and within Jenkins. The post Testing REST APIs with Newman first appeared on STATWORX.

Read more »

R for Excel Users: Pivot Tables, VLOOKUPs in R

February 25, 2020

New business and financial analysts are finding R every day. Most of these new userRs (R users) are coming from a non-programming background. They have ample domain experience in functions like finance, marketing, and business, but their tool of choice...

Read more »

Including Function Factories in an R Package: Using Collate

February 25, 2020

Introduction: This week I was working on a package which included a function factory. A function factory is a function which returns a function. The problem I faced was that when I ran R CMD check on my package, the check informed me my package had several issues which at first glance were confusing and seemingly shouldn’t have been...
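As a quick illustration of the term, here is a minimal sketch of a function factory in base R. The names `power_factory`, `square` and `cube` are hypothetical and are not taken from the author's package.

```r
# Hypothetical function factory (illustration only, not the author's code)
power_factory <- function(exponent) {
  # Evaluate the argument now so the returned closure captures its value
  force(exponent)
  function(x) x^exponent
}

# Each call to the factory manufactures a new function
square <- power_factory(2)
cube <- power_factory(3)

square(4)  # 16
cube(2)    # 8
```

Inside a package, functions such as `square` are created when the package code is evaluated, so the factory has to be defined before the files that call it are processed.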

Read more »

The p-direction: A Bayesian equivalent of the p-value?

February 25, 2020

The Bayesian framework is powerful and allows for an incredible amount of flexibility and control over your analysis. That being said, newcomers often struggle with a lot of new concepts and tools and could benefit from some familiar grounding. And the p-value is a very familiar index (although paradoxically often misunderstood, but that’s another topic). Is there an equivalent of...

Read more »

A collection of self-starters for nonlinear regression in R

Usually, the first step of every nonlinear regression analysis is to select the function \(f\) that best describes the phenomenon under study. The next step is to fit this function to the observed data, possibly by using some sort of nonlinear least squares algorithm. These algorithms are iterative, in the sense that they start from some initial values of...

Read more »

New xgboost defaults

February 25, 2020

xgboost is the most famous R package for gradient boosting and it has been on the market for a long time. In one of my publications, I created a framework for providing defaults (and tunability measures) and one of the packages that I used there was xgboost. The results provided a default with the parameter nrounds=4168, which leads to long runtimes....

Read more »

3 recommended books on learning R

February 24, 2020

I sometimes get asked how I got started learning R. I thought I would use this post to go through a few books I read along the way which have been highly useful. The Art of R Programming: A Tour of Statistical Software Design is one of the first R...

Read more »

R Robustreg Package Downloads

February 24, 2020

I built robustreg in 2006, when the major stat packages did not have robust regression available. Below are graphs of weekly and cumulative downloads from just the RStudio mirror. I would estimate total downloads at over 150,000. The median_rcpp() function is written in C++ and is multiple times faster than the R base function median(). r_norm...

Read more »

Book slides – Analyzing Financial and Economic Data with R

February 24, 2020

The slides for my newly released book Analyzing Financial and Economic Data with R are finally ready! I apologize for keeping you waiting. The slides are available as independent .Rmd files for all book chapters, including:

## "afedR-Slides_Chapter-01_Introduction.Rmd"
## "afedR-Slides_Chapter-02_BasicOperations.Rmd"
## "afedR-Slides_Chapter-03_ResearchScripts.Rmd"
...

Read more »

RStudio 1.3 Preview: Integrated Tutorials

February 24, 2020

This blog post is part of a series on new features in RStudio 1.3, currently available as a preview release. We’re excited to announce that RStudio v1.3 will gain a newly-minted pane: the Tutorial pane, used to host tutorials powered by the learnr package. The learnr package makes it easy to turn any R Markdown document into an interactive tutorial. Here...

Read more »

opentripplanner: Fast and Easy Multimodal Trip Planning in R with OpenTripPlanner

With services like Google Maps, finding the fastest route from A to B has become quick, cheap, and easy. Not just for driving but walking, cycling and public transport too. But in the field of transport studies, we often want not only a single route, but thousands or millions of routes. This is where we hit a problem for...

Read more »

multiplying the bars

February 24, 2020

The latest Riddler makes the remark that the expression |-1|-2|-3| has no unique meaning (and hence value), since it could be \(|-1 \times |-2| - 3| = 5\) or \(|-1| - 2 \times |-3| = -5\) depending on the position of the multiplication sign, and asks for all the possible values of |-1|-2|…|-9| which can be explored by a

Read more »

Gender balance in the Irish elections a.k.a. an excellent excuse to learn how to create stacked point plots and butterfly plots in R!

February 23, 2020

With the most recent Irish General Election having concluded last month, I got interested in looking at some of the available data and trying to see how they could be visualized with R. Here, we’ll look at how to use stacked dot plots and butt...

Read more »

January 2020: “Top 40” New R Packages

February 23, 2020

One hundred forty-seven new packages made it to CRAN in January. Here are my “Top 40” picks in nine categories: Computational Methods, Genomics, Machine Learning, Mathematics, Medicine, Statistics, Time Series, Utilities and Visualization. Computational Methods FSSF v0.1.1: Provides three methods proposed by Shang & Apley (2019) to generate fully-sequential space-filling designs inside a unit hypercube. seagull v1.0.5: Implements a proximal gradient descent...

Read more »

Le Monde puzzle [#1132]

February 23, 2020

A vaguely arithmetic challenge as Le Monde’s current weekly mathematical puzzle: given two boxes containing x and 2N+1-x balls respectively, if one proceeds by repeatedly transferring half the balls from the even box to the odd box, what is the largest value of N for which the resulting sequence in one of the boxes covers

Read more »

Synthetic micro-datasets: a promising middle ground between data privacy and data analysis

February 22, 2020

Intro: the need for microdata, and the risk of disclosure. Survey and administrative data are essential for scientific research; however, accessing such datasets can be very tricky, or even impossible. In my previous job I was responsible for getting access to such “scientific micro-datasets” from institutions like Eurostat. In general, getting access to these micro datasets was only a question of filling out...

Read more »

The significance of the sector on the salary in Sweden, a comparison between different occupational groups, part 2

February 22, 2020

In my last post, I examined the significance of the sector on the salary for different occupational groups using statistics from different regions. In previous posts I have shown a correlation between salary and experience and also between salary and education. In this post, I will examine the correlation between salary and sector using statistics for age. The F-value from...

Read more »
