A Quick and Tidy Look at the 2018 GSS

March 22, 2019

The data from the 2018 wave of the General Social Survey was released during the week, leading to a flurry of graphs showing various trends. The GSS is one of the most important sources of information on various aspects of U.S. society. One of the best things about it is that the data is freely available for more than...

AFL teams Elo ratings and footy-tipping by @ellis2013nz

March 22, 2019

So now that I live in Melbourne, to blend in with the locals I need to at least vaguely follow the AFL (Australian Football League). For instance, my work like many others has an AFL footy-tipping competition. I was initially going to choose my tips ba...

Python or R? Why not both?

March 22, 2019

How to use Python code inside R

Human Face Detection with R

March 22, 2019

Doing human face detection with computer vision is probably something you do once, unless you work for a police department, in the surveillance industry, or for the Chinese government. In order to reduce the time you lose on that small exercise, bnosac created a small R package (source code available at https://github.com/bnosac/image) which wraps the weights of a...

How to Speed Up Gradient Boosting by a Factor of Two

March 22, 2019

Our latest tool development at STATWORX: random boost, an algorithm twice as fast as gradient boosting, with comparable prediction performance. The post How to Speed Up Gradient Boosting by a Factor of Two first appeared on STATWORX.

How long since your team scored 100+ points? This blog’s first foray into the fitzRoy R package

March 21, 2019

When this blog moved from bioinformatics to data science I ran a Twitter poll to ask whether I should start afresh at a new site or continue here. “Continue here”, you said. So let’s test the tolerance of the long-time audience and celebrate the start of the 2019 season as we venture into the world …

How to Choose the Best Open Source Software

March 21, 2019

After reading the O’Reilly book “Foundations for Architecting Data Solutions”, by Ted Malaska and Jonathan Seidman, I reflected on how I chose software/tools/solutions in the past and how I should choose them going forward. As a bioinformatician you need to be able to quickly discern whether a publication/tool is really a major advancement or just marginally better....

RStudio Connect 1.7.2

March 21, 2019

RStudio Connect 1.7.2 is ready to download, and this release contains some long-awaited functionality that we are excited to share. Several authentication and user-management tooling improvements have been added, including the ability to change authentication providers on an existing server, new group support options, and the official introduction of SAML as a supported authentication provider (currently a beta feature*). But that’s not all… keep...

Upcoming talks in spring 2019

March 21, 2019

This spring, I’ll be giving talks at a couple of Meetups and conferences. March 26th: At the data lounge Bremen, I’ll be talking about Explainable Machine Learning. April 11th: At the Data Science Meetup Bielefeld, I’ll be talking about Bu...

lconnect connectivity metrics

March 21, 2019

In our package lconnect we use the Integral Index of Connectivity to obtain patch importance, but several other metrics are currently available. A description of each of these metrics can be found below. For more information about each metric, please see the references provided. At the end of the post an example using the function …

Integrating Qlik Sense and R

March 21, 2019

Qlik Sense is a tool for exploratory data analysis and visualisation. It’s powerful and versatile. It can, however, be significantly enhanced by interfacing with R. Qlik Sense does not currently integrate directly with R. However, it’s not too tricky to get the two systems talking to each other. We’ll need two things to make this happen: Rserve, a TCP/IP...

Package lconnect: patch connectivity metrics and patch prioritization

March 20, 2019

Today we are presenting a new package, lconnect. This package is intended to be a very simple approach to derive landscape connectivity metrics. Many of these metrics come from the interpretation of landscapes as graphs. It also provides a function to prioritize landscape patches based on their contribution to the overall landscape connectivity. For now …

How to Avoid Publishing Credentials in Your Code

March 20, 2019

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When accessing an API or database in R, it is often necessary to provide credentials such as a login name and password. You may find yourself being prompted with something like this: When writing an R script that requires a user to provide credentials, you will want...

All Around The World: Maps and Flags in R

March 20, 2019

Our lab is international. People born all over the world have come to work in my group. I’m proud of this fact, especially in the current political climate. I’ve previously used the Google Maps API to display a heat map on our lab webpage. It shows where in the world people in the lab come from.

RSAGA 1.0.0

March 19, 2019

RSAGA 1.0.0 has been released on CRAN. The RSAGA package provides an interface between R and the open-source geographic information system SAGA, which offers a variety of geoscientific methods to analyse spatial data. SAGA GIS is supp...

Best Fantasy Player of All Time

March 19, 2019

One of the things I have always wondered about AFL fantasy is just who is the best fantasy player of all time? Not the fan who wins the most, but who is the best player. So one possible idea would be to work out the fantasy scores of players going back as far as possible (YAY fitzRoy!)....

Pivoting data frames just got easier thanks to `pivot_wide()` and `pivot_long()`

There’s a lot going on in the development version of {tidyr}. New functions for pivoting data frames, pivot_wide() and pivot_long(), are coming and will replace the current functions, spread() and gather(). spread() and gather() will remain in the package though: You may have heard a rumour that gather/spread are going away. This is simply not true (they’ll stay around forever) but I...

Data Science Software Reviews: Forrester vs. Gartner

March 19, 2019

In my previous post, I discussed Gartner's reviews of data science software companies. In this post, I show Forrester's coverage and discuss how radically different it is. As usual, this post is already integrated into my regularly-updated article, The Popularity of Data Science Software.

The importance of Graphing Your Data – Anscombe’s Clever Quartet!

March 19, 2019

Francis Anscombe's seminal paper "Graphs in Statistical Analysis" (American Statistician, 1973) effectively makes the case that looking at summary statistics of data is insufficient to identify the relationship between variables. He demonstrates this by generating four different data sets (Anscombe's quartet) which have nearly identical summary statistics. His data have the same mean and variance for x...

Scooters, mapped

March 19, 2019

Do you know where all the scooters are in your city? You may have been following the scooter craze or have seen scooters or e-bikes pop up on sidewalks throughout your neighborhood. Have you ever been curious about where all the scooters are located in your city? This map, built using R’s Shiny library, shows the current location of...

R and labelled data: Using quasiquotation to add variable and value labels #rstats

March 19, 2019

Labelling data is typically a task for end-users and is applied in their own scripts or functions rather than in packages. However, sometimes it can be useful for both end-users and package developers to have a flexible way to add variable and value labels to their data. In such cases, quasiquotation is helpful. This vignette demonstrates how to …

Tidyverse users: gather/spread are on the way out

March 19, 2019

From https://twitter.com/sharon000/status/1107771331012108288: From https://tidyr.tidyverse.org/dev/articles/pivot.html: There are two important new features inspired by other R packages that have been advancing reshaping in R: The reshaping operation can be specified with a data frame that describes precisely how metadata stored in column names becomes data variables (and vice versa). This is inspired by the cdata package …

Learning Data Science: Predicting Income Brackets

March 19, 2019

As promised in the post Learning Data Science: Modelling Basics, we will now go a step further and try to predict income brackets with real-world data and different modelling approaches. We will learn a thing or two along the way, e.g. about the so-called Accuracy-Interpretability Trade-Off, so read on… The data we will use …

Assumptions Matter More Than Dependencies

March 18, 2019

There’s been a lot of talk about “dependencies” in the R universe of late. This is not really a post about that but more of a “really, don’t do this” if you decide you want to poke the dependency bear by trying to build a deeply flawed model off of CRAN package metadata. CRAN packages undergo...

Using Scoped dplyr verbs

March 18, 2019

Over the past several months, I have really started to increase the amount that I use scoped dplyr verbs. For those of you who don’t know about these functions, they are handy variants of the normal dplyr verbs, such as filter, mutate, and summarize, that allow you to target multiple columns or all of your columns. These...

The Credibility Crisis in Data Science

March 18, 2019

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Skipper Seabold, a Director of Data Science at Civis Analytics. Introducing Skipper Seabold. Hugo: Hi there, Skipper, and welcome to DataFramed. Skipper: Thanks. Happy to...

RStudio Connect Quickstart

March 18, 2019

RStudio have recently announced ‘RStudio Connect QuickStart’, which is a VM containing a full suite of RStudio’s pro tools, available to be trialled for a 45-day period. RStudio Connect Quickstart gives R users, and people exploring the idea of using R in production, a quick and easy way to set up a full, production-like environment that contains all of...

A gentle introduction to SHAP values in R

March 18, 2019

Opening the black box in complex models: SHAP values. What are they, and how can you draw conclusions from them? With an R code example!

Quantifying R Package Dependency Risk

March 18, 2019

We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)? Well, it turns out we can quantify it: each additional non-core …
