Stack Overflow questions around the world

April 10, 2018

I am so lucky to work with so many generous, knowledgeable, and amazing people at Stack Overflow, including Ian Allen and Kirti Thorat. Both Ian and Kirti are part of biweekly sessions we have at Stack Overflow where several software developers join me in practicing R, data science, and modeling skills. This morning, the two of them went to...


A brief history of time series forecasting competitions

April 10, 2018

Prediction competitions are now so widespread that it is often forgotten how controversial they were when first held, and how influential they have been over the years. To keep this exercise manageable, I will restrict attention to time series forecast...


Statistics from R-bloggers

April 10, 2018

Tal Galili's R-bloggers.com has been syndicating blog posts about R for quite a while — from memory I'd say about 8 years, but I couldn't find the exact date it started aggregating. Anyway, it contains a wealth of information about activity in the R ecosystem, but without any easy way to access that information other than the blog post...


Data science courses in R (/python/etc.) for $11 at Udemy (Sitewide Sale until April 16th)

April 10, 2018

Udemy is offering readers of R-bloggers access to its global online learning marketplace for $15 per course! This deal (discounts of 50% to 90%) covers hundreds of their courses, including many on R programming, data science, machine learning, etc. Click here to browse ALL (R and non-R) courses. Advanced R courses: Regression modelling: Comprehensive Linear Modeling with R (15 Hours of video), Linear regression in R...


How to call bullshit on AI companies (aka a short lesson on recall)

April 10, 2018

Now that software ate the world, what’s for dessert? Those in the know know that it’s AI. It seems everyone …


R Tip: Use match_order() to Align Data

April 10, 2018

R tip: use wrapr::match_order() to align data. Suppose we have data in two data frames, and both of these data frames have common row-identifying columns called “idx”. library("wrapr") d1


Data science at DataCamp

April 10, 2018

In January, I was excited to make an announcement about a shift in my career: “I have some exciting news: today I'm joining @DataCamp as their Chief Data Scientist 🎉📊📈 pic.twitter.com/wiN9J4qSjx” (David Robinson (@drob), January 29, 2018). When I first discussed the role with the DataCamp CEO, I described my goal as to “Make DataCamp as good at doing data science...


Mapping the best states for business

April 10, 2018

Data science gives you tools to find opportunities. A few months ago, I wrote a blog post about Amazon’s search for a location for a second headquarters. For those of you who don’t know about it, the technology giant – which currently has its headquarters in Seattle, Washington – announced last September that it...


Weighted survey data with Power BI compared to dplyr, SQL or survey by @ellis2013nz

April 10, 2018

A conundrum for Microsoft Power BI. I’ve been familiarising myself with Microsoft Power BI, which features prominently in any current discussion on data analysis and dissemination tools for organisations. It’s a good tool with some nice features. I won’t try to do a full review here, but just ruminate on one aspect - setting it up for non-specialists...


Get basic summary statistics for all the variables in a data frame

I have added a new function to my {brotools} package, called describe(), which takes a data frame as an argument and returns another data frame with descriptive statistics. It is very much inspired by the {skimr} package but also by assist::describe() (click on the packages to be redirected to the respective GitHub repos), but I wanted to write my...


Writing better R functions part two – April 10, 2018

April 9, 2018

In my last post I started to build two functions that took pairs of variables from a dataset and produced some nice useful ggplot plots from them. We started with the simplest case, plotting counts of how two variables cross-tabulate. Then we worked our way up to being able to automate the process of plotting lots of pairings of...


How to visualize data with Highcharter: exercises

April 9, 2018

INTRODUCTION: Highcharter is an R wrapper for the Highcharts JavaScript library and its modules. Highcharts is a very mature and flexible JavaScript charting library with a great and powerful API. Before proceeding, please follow our short tutorial. Look at the examples given and try to understand the logic behind them. Then, try to solve the... Related exercise sets: Data Visualization...


A New Package (hhi) for Quick Calculation of Herfindahl-Hirschman Index scores

April 9, 2018

The Herfindahl-Hirschman Index (HHI) is a widely used measure of concentration in a variety of fields, including business, economics, political science, finance, and many others. Though simple to calculate (summed squared market shares of firms/actors in a single market/space), calculation of the HHI can get onerous, especially as the number of firms/actors increases and the …


P-Values, Sample Size and Data Mining


Recently, a paper was presented at our university that showed a significant effect for a variable of interest but had a relatively small number of observations. One colleague suggested that we should consider the significance of the results with care since the number of observations was fairly small. This ignited some discussion. Given that the significance test computed exact p-values,...


How to use dplyr’s mutate in R without a vectorized function

April 9, 2018

TL;DR: Use the Vectorize() function! If you’re reading this, you’ve either encountered this problem before, or you just got to this article out of curiosity (in which case you probably don’t know what problem I’m talking about). A few days ago I was given code by a client for a function that, given a path to a patient’s file, generates a...


Building tidy tools workshop

April 8, 2018

Join RStudio Chief Data Scientist Hadley Wickham for his popular Building tidy tools workshop in San Francisco! If you missed the sold-out course at rstudio::conf 2018, now is your chance. Register here: https://www.rstudio.com/workshops/extending-the-tidyverse/ You should take this class if you have some experience programming in R and you want to learn how to tackle larger-scale problems. You’ll get...


Struggle with Harry Potter Data

April 8, 2018

Notes about the creation of the Harry Potter Books Survey. It is not over; I need your help. Prologue: Right now I am in the final stage of developing two packages devoted to results of abstract competitions (still not perfectly ready, so use with caution): comperes - infrastructure package for dealing with different formats...


Masterclass in Bayesian Statistics in Marseilles next Fall

April 8, 2018

This post is to announce a second occurrence of the exciting “masterclass in Bayesian Statistics” that we organised in 2016, near Marseilles. It will take place on 22-26 October 2018 once more at CIRM (Centre International de Recherches Mathématiques, Luminy, Marseilles, France). The targeted audience includes all scientists interested in learning how Bayesian inference may


Generating Text From An R DataFrame using PyTracery, Pandas and Reticulate

April 8, 2018

In a couple of recent posts (Textualisation With Tracery and Database Reporting 2.0 and More Tinkering With PyTracery) I’ve started exploring various ways of using the pytracery port of the tracery story generation tool to generate a variety of texts from Python pandas data frames. For my F1DataJunkie tinkerings I’ve been using R + SQL as


tint 0.1.0

April 8, 2018

A new release of the tint package just arrived on CRAN. Its name expands to “tint is not tufte”, as the package offers a fresher take on the Tufte style for html and pdf presentations. This version adds support for the tufte-book latex style. The pack...


Dissecting R Package “Utility Belts”

April 8, 2018

Many R package authors (including myself) lump a collection of small, useful functions into some type of utils.R file and usually do not export the functions since they are (generally) designed to work on package internals rather than expose their functionality via the exported package API. Just like Batman’s utility belt, which can be customized...


MCMC Using STAN – Diagnostics With The Bayesplot Package: Exercises

April 8, 2018

This exercise set will continue to present the STAN platform, but with another useful tool: the bayesplot package. This package is very useful for constructing diagnostics that give insight into the convergence of the MCMC sampling, since the convergence of the generated chains is the main issue in most STAN models. Related exercise sets: Spatial Data...


anomalize: Tidy Anomaly Detection


We recently had an awesome opportunity to work with a great client that asked Business Science to build an open source anomaly detection algorithm that suited their needs. The business goal was to accurately detect anomalies for various marketing data consisting of website actions and marketing feedback spanning thousands of time series across multiple customers and web sources. Enter...


Access your data in Google BigQuery with Python and R

April 7, 2018

Some time ago we discussed how you can access data that are stored in Amazon Redshift and PostgreSQL with Python and R. Let’s say you did find an easy way to store a pile of data in your BigQuery data warehouse and keep them in sync. Now you want to start messing with it using statistical techniques, maybe build...


Sketch – Data Trivia

April 7, 2018

A bit more tinkering with F1 data from the ergast db, this time trying to generate trivia / facts around races. The facts are identified using SQL queries. Some of the queries also embed query fragments, which I intend to develop further… I'm using knitr to generate Github flavoured markdown (gfm) from my Rmd docs


Simple Numerical Modeling in R – Part 2: Exercises

April 6, 2018

In this exercise, we will continue to build our model from our previous exercise here, specifically to revise the errors that may be generated from the model, including rounding and truncating errors. Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel... Related exercise sets: 3D plotting...


The smaller the p-value, the higher the likelihood ratio: true or false?

April 6, 2018

Someone recently said to me that the lower the p-value, the higher the likelihood ratio under the alternative vs the null. The arXiv paper by Michael Lew makes analogous points (thanks to Titus von der Malsb...


magrittr and wrapr Pipes in R, an Examination

April 6, 2018

Let’s consider piping in R both using the magrittr package and using the wrapr package. magrittr pipelines: The magrittr pipe glyph “%>%” is the most popular piping symbol in R. The magrittr documentation describes %>% as follows. Basic piping: x %>% f is equivalent to f(x); x %>% f(y) is equivalent to f(x, y); x %>% …


Tinkering with Competitive Supertimes

April 6, 2018

I’m back on the R thang with F1 data from ergast.com, and started having a look at how drivers and teams compare at a circuit. One metric I came across for comparing teams over a season is the supertime, typically calculated for each manufacturer as the average of their fastest single lap recorded by the team

