Oh hai, my first post

July 18, 2019

What’s all this then? Hi. My name is Seth, and this is my new blog. I plan to use this blog to noodle around with the statistical programming language R, hopefully in fun and interesting ways, to continue to hone my programming and data analysis skills. I’ve been thinking about creating an R blog for some time, but the task always...

Statistical matching, or when one single data source is not enough

I was recently asked how to go about matching several datasets in which different samples of individuals were interviewed. This sounds like a big problem: say that you have datasets A and B, where A contains one sample of individuals and B another. How could you possibly match the datasets? Matching datasets requires a common identifier; for instance, suppose...

An R Users Guide to JSM 2019

July 18, 2019

If you are like me and tend to leave planning until the last minute, you are just starting to think about how to get the most out of JSM 2019, which begins in just a few days. My plans always begin with an attempt to sleuth out the R-related sessions. While in the past it took quite...

Dotplot – the single most useful yet largely neglected dataviz type

July 18, 2019

I have to confess that the core message of this post is not really a fresh saying. But if I were given a chance to deliver one piece of dataviz advice to every (ha-ha-ha) listening mind, I’d choose this: forget multi-category bar plots and use dotplots instead.

Wordcloud of conference abstracts – FOSS4G Edinburgh

July 18, 2019

FOSS4G conference wordcloud of abstracts. Code included!

RStudio Trainer Directory Launches

July 17, 2019

Several dozen people have taken part in RStudio’s instructor training and certification program since it was announced earlier this year. Since our last update, many of them have completed certification, so we are pleased to announce a preview of our trainers’ directory. Each of the people listed there has completed an exam on modern evidence-based teaching practices, as well...

Plotting Bayes Factors for multiple comparisons using ggsignif

This week my post is relatively short and very focused. What makes it interesting (at least to me) is whether it will be seen as a useful “bridge” between frequentist and Bayesian methods or as an abomination to both! There’s some reasonably decent code and explanation in this post, but before I spend much more time on the functionality I definitely want...

Processing satellite image collections in R with the gdalcubes package

July 17, 2019

Contents: The problem; Introduction and overview of gdalcubes; Installation; Demo dataset; Creating image collections; Creating and processing data cubes; Chaining data cube operations; User-defined functions; Interfacing with stars; Future work [v...

rOpenSci Hiring for New Position in Statistical Software Testing and Peer Review

Are you passionate about statistical methods and software? If so we would love for you to join our team to dig deep into the world of statistical software packages. You’ll develop standards for evaluating and reviewing statistical t...

Combining momentum and value into a simple strategy to achieve higher returns

July 17, 2019

In this post I'll introduce a simple investing strategy that is well diversified and has been shown to work across different markets. In short, buying cheap and uptrending stocks has historically led to notably higher returns. The strategy is a combination of these two different investment styles, value and momentum. In a previous post I explained how the range of...

An Ad-hoc Method for Calibrating Uncalibrated Models

July 16, 2019

In the previous article in this series, we showed that common ensemble models like random forest and gradient boosting are uncalibrated: they are not guaranteed to estimate aggregates or rollups of the data in an unbiased way. However, they can be preferable to calibrated models such as linear or generalized linear regression when they make…

Three Strategies for Working with Big Data in R

July 16, 2019

For many R users, it’s obvious why you’d want to use R with big data, but not so obvious how. In fact, many people (wrongly) believe that R just doesn’t work very well for big data. In this article, I’ll share three strategies for thinking about how to use big data in R, as well as some examples of how...

101 Machine Learning Algorithms for Data Science with Cheat Sheets

July 16, 2019

Your one-stop-shop for machine learning algorithms. Each algorithm is complete with a short description and links to examples. If you would like to take the algorithms with you, click the little 'embed' button in the lower left-hand corner.

shinymeta — a revolution for reproducibility

July 16, 2019

Joe Cheng presented shinymeta, which enables reproducibility in Shiny, at useR! in July 2019. This is a simple application using shinymeta. You will see that reactivity and reproducibility are not mutually exclusive. I am really thankful to Joe Cheng for realizing the shinymeta project.

Introduction

In 2018, at the R/Pharma conference, I first heard of the concept of using quotations. With quotations to make...

Shiny Modules

July 16, 2019

“Tidiness is half of life” is a German saying that you don’t necessarily have to live by. In programming, however, it becomes essential, at least in my opinion, because when you do not invest a little time into the order and structure of your...

eRum2020 in Milan

July 16, 2019

The European R conference will visit Milan in 2020! Mirai Solutions is delighted to actively support and participate in the organization of the event. The European R Users Meeting (eRum) is a biennial conference, taking place in Europe during those...

Reinforcement Learning: Life is a Maze

July 16, 2019

It can be argued that most important decisions in life are some variant of an exploitation-exploration problem. Shall I stick with my current job or look for a new one? Shall I stay with my partner or seek a new love? Shall I continue reading the book or watch the movie instead? In all of…

Posts

July 15, 2019

To leave a comment for the author, please follow the link and comment on their blog: Pachá.

Bojack Horseman and Tidy Data Principles (Part 1)

July 15, 2019

Motivation

After reading The Life Changing Magic of Tidying Text and A tidy text analysis of Rick and Morty, I wanted to do something similar for Rick and Morty, and I did. Now I’m doing something similar for Bojack Horseman. In this post I’ll focus on the Tidy Data principles. However, here is the GitHub repo with the scripts to scrap...

Pricing floating legs of interest rate swaps

In this post we will close the trilogy on (old-style) swap pricing. In particular, we will look at how to download the data for the variable rate needed to calculate the variable leg accrual. Part 1 gave the general idea behind tidy pricing of interest rate swaps using a 7-line pipe. Part 2 went into much more detail and priced some real-world contract...

Aggregating spatial data with the grainchanger package

The grainchanger package provides functionality for data aggregation to a coarser resolution via moving-window or direct methods. Why do we need new methods for data aggregation? As landscape ecologists and macroecologists, we often need to aggregate data in order to harmonise datasets. In doing so, we often lose a lot of information about the spatial structure and environmental heterogeneity of data...

Estimating treatment effects (and ICCs) for stepped-wedge designs

July 15, 2019

In the last two posts, I introduced the notion of time-varying intra-cluster correlations in the context of stepped-wedge study designs. (See here and here). Though I generated lots of data for those posts, I didn’t fit any models to see if I could recover the estimates and any underlying assumptions. That’s what I am doing now. My focus here is...

Quick Hit: A Different (Diminutive) Look At Distributions With {ggeconodist}

July 15, 2019

Despite being a full-on denizen of all things digital, I receive a fair number of dead-tree print magazines, as there’s nothing quite like seeing an amazing, large, full-color, data-driven print visualization up close and personal. I also like supporting data journalism through the subscriptions, since without cash we will only have insane, extreme left/right-wing perspectives...

Is Scholarly Use of R Beating SPSS Already?

July 15, 2019

by Bob Muenchen & Sean Mackinnon

One of us (Muenchen) has been tracking The Popularity of Data Science Software using a variety of different approaches. One approach is to use Google Scholar to count the number of scholarly articles found…

Twitter coverage of the useR! 2019 conference

July 15, 2019

Very briefly: last week was useR! conference time again, coming to you this time from Toulouse, France. I’ve retrieved 8 318 tweets that mention #user2019 and run them through my report generator, and here are the results. Take-home message this year: the R Ladies rock!

Looking at flood insurance claims with choroplethr

July 14, 2019

I recently learned how to use the choroplethr package through a short tutorial by the package author Ari Lamstein (YouTube link here). To cement what I learned, I thought I would use this package to visualize flood insurance claims. I…

Recreating ‘Unknown Pleasures’ graphic

July 14, 2019

For some time I’ve wanted to recreate the cover art from Joy Division’s Unknown Pleasures album. The visualisation depicts successive pulses from the pulsar PSR B1919+21, discovered by Jocelyn Bell in 1967.

Album art.

Data

The first obstacle was acquiring the data. I found a D3 visualisation by Mike Bostock. This in turn pointed me to a CSV file in a gist...

Distribution of Headline Sentiment

July 14, 2019

My web scraping project explored the distribution of headline sentiment by news source. To do this, I scraped the Nasdaq latest market headlines page and applied sentiment analysis to the retrieved text. It should be noted that I only scraped one web page, but this page aggregates headlines from multiple sources. I wanted to see

Collecting and Analyzing Twitter Data Using R

July 14, 2019

How do you access Twitter’s API, collect a stream of tweets, and analyze the retrieved data? What potential, challenges, and limitations does using Twitter data bring for social science research? This Methods Bites Tutorial by Denis Cohen, based on a workshop by Simon Kühne (Bielefeld University) in the MZES Social Science Data Lab in Spring 2019, aims to...
