Pivoting data frames just got easier thanks to `pivot_wide()` and `pivot_long()`

There’s a lot going on in the development version of {tidyr}. New functions for pivoting data frames, `pivot_wide()` and `pivot_long()`, are coming and will replace the current functions, `spread()` and `gather()`. `spread()` and `gather()` will remain in the package, though: You may have heard a rumour that gather/spread are going away. This is simply not true (they’ll stay around forever) but I...

Data Science Software Reviews: Forrester vs. Gartner

March 19, 2019

In my previous post, I discussed Gartner's reviews of data science software companies. In this post, I show Forrester's coverage and discuss how radically different it is. As usual, this post is already integrated into my regularly updated article, The Popularity of Data Science Software.

The importance of Graphing Your Data – Anscombe’s Clever Quartet!

March 19, 2019

Francis Anscombe's seminal paper "Graphs in Statistical Analysis" (The American Statistician, 1973) effectively makes the case that looking at summary statistics of data is insufficient to identify the relationship between variables. He demonstrates this by generating four different data sets (Anscombe's quartet) which have nearly identical summary statistics. His data have the same mean and variance for x...

R and labelled data: Using quasiquotation to add variable and value labels #rstats

March 19, 2019

Labelling data is typically a task for end-users and is applied in one's own scripts or functions rather than in packages. However, sometimes it can be useful for both end-users and package developers to have a flexible way to add variable and value labels to their data. In such cases, quasiquotation is helpful. This vignette demonstrates how to …

Tidyverse users: gather/spread are on the way out

March 19, 2019

From https://twitter.com/sharon000/status/1107771331012108288: From https://tidyr.tidyverse.org/dev/articles/pivot.html: There are two important new features inspired by other R packages that have been advancing reshaping in R: The reshaping operation can be specified with a data frame that describes precisely how metadata stored in column names becomes data variables (and vice versa). This is inspired by the cdata package …

Learning Data Science: Predicting Income Brackets

March 19, 2019

As promised in the post Learning Data Science: Modelling Basics, we will now go a step further and try to predict income brackets with real-world data and different modelling approaches. We will learn a thing or two along the way, e.g. about the so-called Accuracy-Interpretability Trade-Off, so read on… The data we will use …

Assumptions Matter More Than Dependencies

March 18, 2019

There’s been a lot of talk about “dependencies” in the R universe of late. This is not really a post about that but more of a “really, don’t do this” if you decide you want to poke the dependency bear by trying to build a deeply flawed model off of CRAN package metadata. CRAN packages undergo...

The Credibility Crisis in Data Science

March 18, 2019

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Skipper Seabold, a Director of Data Science at Civis Analytics. Introducing Skipper Seabold Hugo: Hi there, Skipper, and welcome to Data Framed. Skipper: Thanks. Happy to...

RStudio Connect Quickstart

March 18, 2019

RStudio have recently announced ‘RStudio Connect QuickStart’, a VM containing a full suite of RStudio’s pro tools, available to be trialled for a 45-day period. RStudio Connect QuickStart gives R users, and people exploring the idea of using R in production, a quick and easy way to set up a full, production-like environment that contains all of...

A gentle introduction to SHAP values in R

March 18, 2019

Opening the black box in complex models: SHAP values. What are they, and how can you draw conclusions from them? With an R code example!

Quantifying R Package Dependency Risk

March 18, 2019

We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)? Well, it turns out we can quantify it: each additional non-core …

Download and Plot Factor Returns from the Fama-French Research Data Library

March 18, 2019

Since the initial publication of the Three Factor Model by Eugene Fama and Kenneth French in their influential 1993 paper (Common Risk Factors in the Returns on Stocks and Bonds), a lot of academic research has been dedicated to the analysis of factors driving security returns. With the rise of quantitative investment management, this field...

Handling & Sharing PCAPs Like a Boss with PacketTotal

March 17, 2019

The fine folks over at @PacketTotal bestowed an API token on me, so I cranked out an R package for it to enable more dynamic investigations work (RStudio makes for an amazing incident responder investigations console given that you can script in multiple languages, code in C, and write documentation all at the same time...

Are R ecosystems the future?

March 17, 2019

Some random thoughts… Over the past 6 months I’ve been creating, refining, and delivering a variety of ‘Introduction to R’ training courses. The more I do this, the more I come to the view that not nearly enough is made of taking an ecosystem-oriented view of packages. A good way of talking about #rstats functionality is in terms of ecosystems, rather...

The reticulate package solves the hardest problem in data science: people

March 17, 2019

Andrew Mangano is the Director of eCommerce Analytics at Albertsons Companies. Part I - Modelling: The reticulate package integrates Python within R and, when used with RStudio 1.2, brings the two languages together like never before. Much more important than the technical details of how it all works is the impact that it has on both individuals and teams by...

R meta programmation

March 17, 2019

Metaprogramming impact

Network Analysis of Emotions

March 17, 2019

In this month’s post, I set out to create a visual network of emotions. Emotion Dynamics tells us that different emotions are highly interconnected, such that one emotion morphs into another and so on. I’ll be using a large dataset from an original study published in PLOS ONE by Trampe, Quoidbach, and Taquet (2015). Thanks to Google Dataset Search,...

drake transformed

Version 7.0.0 of drake just arrived on CRAN, and it is faster and easier to use than previous releases. Install it with `install.packages("drake")`. Recap: Data analysis can be slow. A round of scientific computation can take several minutes, hours, or even days to complete. After it finishes, if you update your code or data, your hard-earned results may no longer be valid. How much of...

Code and Data in a large Machine Learning project

March 17, 2019

We did a large machine learning project at work recently. It involved two data scientists, two backend engineers and a data engineer, all working on-and-off on the R code during the project. Many aspects of the project were new and interesting to me, among them doing data science in an agile-ish way and how to keep track of the different...

Access the free economic database DBnomics with R

March 17, 2019

DBnomics: the world’s economic database. Explore all the economic data from different providers (national and international statistical institutes, central banks, etc.), for free, at db.nomics.world. You can also retrieve all the economic data through the rdbnomics package. This blog post describes the different ways to do so. Fetch time series by ids: First, let’s assume that we know which series we...

RQuantLib 0.4.8: Small updates

March 17, 2019

A new version 0.4.8 of RQuantLib reached CRAN and Debian. This release was triggered by a CRAN request for an update to the configure.ac script which was easy enough (and which, as it happens, did not result in changes in the configure script produce...

Rcpp 1.0.1: Updates

March 17, 2019

Following up on the 10th anniversary and the 1.0.0 release, we are excited to share the news of the first update release, 1.0.1, of Rcpp. The package turned ten on Monday, and we used the opportunity to mark the current version as 1.0.0! It arrived at CRAN ov...

Tipster Season

March 16, 2019

So the AFL men's season is approaching, which means that soon everyone's Twitter feed, Facebook and emails will get clogged up with various tipsters: people saying they won 60% of the time over last season, and therefore you should pay them money and follow their tips! But how can you assess the accuracy of a tipster? Very few...

wrapr::let()

March 16, 2019

I would like to once again point our readers to our note on wrapr::let(), an R function that can help you eliminate many problematic NSE (non-standard evaluation) interfaces (and their associated problems) from your R programming tasks. The idea is to imitate the following lambda-calculus idea: let x be y in z := ( λ …

How to create professional reports from R scripts, with custom styles.

March 16, 2019

Introduction: In the practical tips for R Markdown post, we talked briefly about how we can easily create professional reports directly from R scripts, without the need for converting them manually to Rmd and creating code chunks. In this one, we will provide useful tips on advanced options for styling, using themes and producing lightweight HTML reports directly from R...

Version 0.7.1 of NIMBLE released

March 15, 2019

We’ve released the newest version of NIMBLE on CRAN and on our website. NIMBLE is a system for building and sharing analysis methods for statistical models, especially for hierarchical models and computationally-intensive methods (such as MCMC and SMC). Version 0.7.1 is primarily a maintenance release with a couple of important bug fixes and a few additional...

Adding Custom Fonts to ggplot in R

March 15, 2019

ggplot – You can spot one from a mile away, which is great! And when you do it’s a silent... The post Adding Custom Fonts to ggplot in R appeared first on Daniel Oehm | Gradient Descending.

littler 0.3.7: Small tweaks

March 15, 2019

The eighth release of littler as a CRAN package is now available, following in the thirteen-ish year history as a package started by Jeff in 2006, and joined by me a few weeks later. littler is the first command-line interface for R and predates Rscript. And it is (in my very biased eyes) better as it allows for piping as...

Scraping old player data

March 15, 2019

As it's been pointed out to me, it would be handy if fitzRoy contained past player data from footywire. So here is roughly how to do that. Step 1 - get all the packages you need: `library(rvest)` and `library(tidyverse)` ...
