An R vlookup? Not so silly idea

It started out as a joke, but Jenny Bryan recently posted a vlookup implementation in R. Here is the original post as seen on Twitter: "Sometimes you just need to party like it’s VLOOKUP time 😁🖇 … seriously, sometimes a join doesn’t fit the bill pic.twitter.com/jz8StfQdNg" (Jenny Bryan, @JennyBryan, April 3, 2018). The argument for the creation of this kind of...

anomalize: Tidy Anomaly Detection

We recently had an awesome opportunity to work with a great client that asked Business Science to build an open source anomaly detection algorithm that suited their needs. The business goal was to accurately detect anomalies for various marketing data consisting of website actions and marketing feedback spanning thousands of time series across multiple customers and web sources. Enter...

Access your data in Google BigQuery with Python and R

April 7, 2018

Some time ago we discussed how you can access data that are stored in Amazon Redshift and PostgreSQL with Python and R. Let’s say you did find an easy way to store a pile of data in your BigQuery data warehouse and keep them in sync. Now you want to start messing with it using statistical techniques, maybe build...

G is for GLM Function

April 7, 2018

In the Beta post, I used linear regression and demonstrated how to request standardized regression coefficients (betas). You may remember that I mentioned in that post that there are other types of regression. Linear regression is used with a continuous outcome. But sometimes you want to predict outcomes that aren't continuous - perhaps they're binary outcomes (0...

Sketch – Data Trivia

April 7, 2018

A bit more tinkering with F1 data from the ergast db, this time trying to generate trivia / facts around races. The facts are identified using SQL queries. Some of the queries also embed query fragments, which I intend to develop further… I'm using knitr to generate GitHub flavoured markdown (gfm) from my Rmd docs.

Simple Numerical Modeling in R – Part 2: Exercises

April 6, 2018

In this exercise, we will continue to build our model from our previous exercise here, specifically to revise the errors that may be generated from the model, including rounding and truncating errors. Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel... Related exercise sets: 3D plotting...

R:case4base – reshape data with base R

April 6, 2018

Contents: Introduction, How to use this article, Basic wide to long reshape, Basic long to wide reshape, Advanced reshape, Alternatives to base R, TL;DR - Just want the code, Exercises, References, Exercise answers, Discuss the article. Introduction: This is the first post in the R:case4base series. The aim of the series is to elaborate on very useful features of base R that are lesser known and often substituted with custom functionality...

The smaller the p-value, the higher the likelihood ratio: true or false?

April 6, 2018

Someone recently said to me that the lower the p-value, the higher the likelihood ratio under the alternative vs the null. The arXiv paper by Michael Lew makes analogous points (thanks to Titus von der Malsb...

magrittr and wrapr Pipes in R, an Examination

April 6, 2018

Let’s consider piping in R both using the magrittr package and using the wrapr package. magrittr pipelines: The magrittr pipe glyph “%>%” is the most popular piping symbol in R. The magrittr documentation describes %>% as follows. Basic piping: x %>% f is equivalent to f(x); x %>% f(y) is equivalent to f(x, y); x %>% …

Tinkering with Competitive Supertimes

April 6, 2018

I’m back on the R thang with F1 data from ergast.com, and started having a look at how drivers and teams compare at a circuit. One metric I came across for comparing teams over a season is the supertime, typically calculated for each manufacturer as the average of their fastest single lap recorded by the team

Adding macOS Touch Bar Support to RStudio

April 6, 2018

Modern MacBook Pros have a fairly useless (c’mon, admit it!) “Touch Bar” that did little more than cause severe ire in the developer community after turning a full-fledged, tactile Escape key into a hollow version of its former self. Having said that, some apps do make OK use of it, with Fantastical and Omnigraffle being...

Benchmarking the six most used manipulations for data.tables in R

April 6, 2018

Almost everybody that handles large data sets in R is familiar with the data.table package. It provides several functions to subset, merge, and manipulate tabular data. The post Benchmarking the six most used manipulations for data.tables in R appeared first on Opremic.

Tips for Lightning Talks

April 6, 2018

It seems a little counter-intuitive, but a 5 minute lightning talk is far more difficult to prepare (and present!) than a standard 20 minute or longer talk. The principal challenge is fitting everything that you want to say into the allotted time, while still maintaining an engaging narrative. At the recent satRday conference in Cape Town (17 March 2018) we...

Writing better R functions part one – April 6, 2018

April 5, 2018

One of the nicest things about working with R is that with very little effort you can customize and automate activities to produce the output you want – just the way you want it. You can contrast that with more monolithic packages that may allow you to do a bit of scripting, but for the most part, the price...

Where is the value in package peer review?

If you read my reflection #1 on rOpenSci Onboarding, then you know I see value in the Onboarding process. A LOT of value even. This post is about where that value lies. This question has important corollaries which I will explore here based on my experience as a reviewer of bowerbird: How is a package peer reviewer’s time best spent? When is...

A few podcast recommendations

April 5, 2018

After avoiding the entire medium for years, I've been rather getting into listening to podcasts lately. As a worker-from-home I don't have a commute (the natural use case of podcasts, I guess), but I have been travelling a lot more recently and it's been great to listen to during long flights. It turns out there are a lot of...

Advanced Raster Data: Exercises

April 5, 2018

Geospatial data is becoming increasingly used to solve numerous ‘real-life’ problems (check out some examples here). In turn, R is becoming a powerful open-source solution to handle this type of data, currently providing an exceptional range of functions and tools for GIS and Remote Sensing data analysis. In particular, raster data provides support for representing... Related exercise sets: Advanced Techniques...

Laminar flow with ggplot2 and gganimate

April 5, 2018

Preface: I’ve realized that all my previous posts were quite substantial in length and took quite a long time to create. From this point forward I’ll be generating posts of shorter length (partially for my sanity and more for my impulsivity with ideas). A few of these posts won’t be public health related (like...

Posterior probability of the null hypothesis being true, given a significant effect

April 5, 2018

For some reason, I am unable to load this post to google blogger. I have linked the post to an html file on my home page. Please comment here on this blog. Here is the post: Posterior probability of the null hypothesis being true, given a significant...

Clojure Integration with R

April 4, 2018

(require ' ' ' ') ;; CREATE A TOY DATA (def ds [{:id 1.0 :name "name1"} {:id 2.0 :n...

Not Hotdog: An R image classification application, using the Custom Vision API

April 4, 2018

If you're a fan of the HBO show Silicon Valley, you probably remember the episode where Jian Yang creates an application to identify food using a smartphone camera. Surprisingly, the app in that scene isn't just a pre-recorded special effect: the producers actually developed a smartphone application using TensorFlow (and you can even download the app for your...

The Travelling Salesman Portrait

April 4, 2018

"I have noticed even people who claim everything is predestined, and that we can do nothing to change it, look before they cross the road" (Stephen Hawking). Imagine a salesman and a set of cities. The salesman has to visit each one of the cities starting from a certain one and returning to the same …

Four Years of Practical Data Science with R

April 4, 2018

Four years ago today authors Nina Zumel and John Mount received our author’s copies of Practical Data Science with R! It has its imitators, but it remains the best “I have R, now what do I do with it?” book (as it works the user through non-trivial projects, analyses, presentations, predictive analytics, data science, and …

Constricted development with reticulate

April 4, 2018

I’ve been using the reticulate package occasionally for a while now, so I was surprised to see that it had only just been officially released. reticulate: R interface to Python https://t.co/qVWmwoMQAP. Comprehensive set of interoperability tools including R Markdown Python...

Exploring R-Bloggers Posts with the Feedly API

April 4, 2018

There’s a yuge chance you’re reading this post (at least initially) on R-Bloggers right now (though you should also check out R Weekly and add their live feed to your RSS reader pronto!). It’s a central “watering hole” for R folks and is read by many (IIRC over 20,000 Feedly users have it in their...

Moving from RPubs to Github documents

April 4, 2018

If you still follow my Twitter feed – I pity you, as it’s been rather boring of late. Consisting largely of Github commit messages, many including the words “knit to github document”. Here’s why. RPubs, an early offering from RStudio, has been a great platform for easy and free publishing of HTML documents generated from …

Design Patterns in R

April 4, 2018

These notes are inspired by a talk by Stuart Sierra on Design Patterns in Functional Programming and some thoughts I found on F# for fun and profit, and are a reflection on how I use different strategies to solve things in R. Design Pattern seems...

What is tidy eval and why should I care?

April 3, 2018

Nic Crane, Data Scientist. This article was first published on Nic Crane's Blog and kindly contributed to the Mango Blog. I’m going to begin this post somewhat backwards, and start with the conclusion: tidy eval is important to anyone who writes R functions and uses dplyr and/or tidyr. I’m going to load a couple of packages, and then show you exactly why. library(dplyr) library(rlang) Data...

Dominik is coming back on-board to manage Appsilon’s Open Source

Meet Dominik, our Open Source Tech Lead. Before Dominik took a leadership role, he was a Data Scientist at Appsilon. He has five years of experience in Python and R programming, mostly from data science and machine learning related project...
