Articles by Unknown

TV Shows on the “Big 3” Streaming Services

August 10, 2020 | Unknown

2020 has been a tough year, and I've been doing my best to keep busy (and distracted from all the insanity - both at the personal and worldwide levels). Earlier this year, I took a course in machine learning techniques and have been working on applying...
[Read more...]

Statistics Sunday: My 2019 Reading

May 3, 2020 | Unknown

I've spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I'd bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year. ...
[Read more...]

Z is for Additional Axes

April 30, 2020 | Unknown

Here we are at the last post in Blogging A to Z! Today, I want to talk about adding additional axes to your ggplot, using the options for fill or color. While these aren't true z-axes in the geometric sense, I think of them as a third, z, axis. Some ...
[Read more...]

Y is for scale_y

April 29, 2020 | Unknown

Yesterday, I talked about scale_x. Today, I'll continue on that topic, focusing on the y-axis. The key to using any of the scale_ functions is to know what sort of data you're working with (e.g., date, continuous, discrete). Yesterday, I talked about sc...
[Read more...]

X is for scale_x

April 28, 2020 | Unknown

These next two posts will deal with formatting scales in ggplot2 - x-axis, y-axis - so I'll try to limit the amount of overlap and repetition. Let's say I wanted to plot my reading over time, specifically as a cumulative sum of pages across the year. My...
[Read more...]

W is for Write and Read Data – Fast

April 27, 2020 | Unknown

Once again, I'm dipping outside of the tidyverse, but this package and its functions have been really useful in getting data quickly in (and out) of R. For work, I have to pull in data from a few different sources, and manipulate and work with them to g...
[Read more...]

V is for Verbs

April 25, 2020 | Unknown

In this series, I've covered five terms for data manipulation: arrange, filter, mutate, select, and summarise. These are the verbs that make up the grammar of data manipulation. They all work with group_by to perform these functions groupwise. There are scoped version...
[Read more...]

U is for Useful Trick

April 24, 2020 | Unknown

This will be a very short post for a line of code I've found unbelievably useful as I analyze data for work. I'm working with datasets containing millions of rows of data. (The most recent one I worked with had about 13 million records.) Because R load...
[Read more...]

T is for Themes

April 23, 2020 | Unknown

One of the easiest ways to make a beautiful ggplot is by using a theme. ggplot2 comes with a variety of pre-existing themes. I'll use the genre statistics summary table I created in yesterday's post, and create the same chart with different themes. libr...
[Read more...]

S is for summarise

April 22, 2020 | Unknown

Today, we'll finally talk about summarise! It's very similar to mutate, but instead of adding or altering a variable in a dataset, it aggregates your data, creating a new tibble with the columns containing your requested summary data. The number of rows will be equal to the number of groups ...
[Read more...]

R is for read_

April 21, 2020 | Unknown

The tidyverse is full of functions for reading data, beginning with "read_". The read_csv I've used to access my reads2019 data is one example, falling under the read_delim functions. read_tsv allows you to quickly read in tab-delimited files. And you ...
[Read more...]

Q is for qplot versus ggplot

April 20, 2020 | Unknown

Two years ago, when I did Blogging A to Z of R, I talked about qplots. qplots are great for quick plots - which is why they're named as such - because they use variable types to determine the best plot to generate. For instance, if I give it a ...
[Read more...]

P is for percent

April 18, 2020 | Unknown

We've used ggplots throughout this blog series, but today, I want to introduce another package that helps you customize scales on your ggplots - the scales package. I use this package most frequently to format scales as percent. There aren't a lot of g...
[Read more...]

O is for order_by

April 17, 2020 | Unknown

This will be a quick post on another tidyverse function, order_by. I'll admit, I don't use this one as often as arrange. It can be useful, though, if you don't want to permanently change the order of your dataset but want to use functions that require ...
[Read more...]

N is for n_distinct

April 16, 2020 | Unknown

Today, we'll start digging into some of the functions used to summarise data. The full summarise function will be covered for the letter S. For now, let's look at one function from the tidyverse that can give some overall information about a dataset: n...
[Read more...]

M is for mutate

April 15, 2020 | Unknown

Today, we finally talk about the mutate function! I've used it a lot throughout the series so far, so it's nice to get to discuss what it is and how it works. The mutate function is used anytime you want to create or modify a variable. It works with pretty much ...
[Read more...]

L is for Log Transformation

April 14, 2020 | Unknown

When visualizing data, outliers and skewed data can have a huge impact, potentially making your visualization difficult to understand. We can use many of the tricks covered so far to deal with those issues, such as using filters to remove extreme value...
[Read more...]