# Articles by Unknown

### TV Shows on the “Big 3” Streaming Services

August 10, 2020 |

2020 has been a tough year, and I've been doing my best to keep busy (and distracted from all the insanity - both at the personal and worldwide levels). Earlier this year, I took a course in machine learning techniques and have been working on applying...

### Flying Saucers and Bright Lights: A Data Visualization

June 25, 2020 |

UFO Sightings by Shape and Year Earlier last week, I taught part 2 of a course on using R and tidyverse for my work colleagues. I wanted a fun dataset to use as an example for coding exercises throughout. There was really only one choice. I found this great dataset through ...

### Statistics Sunday: My 2019 Reading

May 3, 2020 |

I've spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I'd bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year. library(tidyverse) ## -- Attaching ...

### Z is for Additional Axes

April 30, 2020 |

Here we are at the last post in Blogging A to Z! Today, I want to talk about adding additional axes to your ggplot, using the options for fill or color. While these aren't true z-axes in the geometric sense, I think of them as a third, z, axis. Some ...

### Y is for scale_y

April 29, 2020 |

Yesterday, I talked about scale_x. Today, I'll continue on that topic, focusing on the y-axis. The key to using any of the scale_ functions is to know what sort of data you're working with (e.g., date, continuous, discrete). ...

### X is for scale_x

April 28, 2020 |

These next two posts will deal with formatting scales in ggplot2 - x-axis, y-axis - so I'll try to limit the amount of overlap and repetition. Let's say I wanted to plot my reading over time, specifically as a cumulative sum of pages across the year. My...

### W is for Write and Read Data – Fast

April 27, 2020 |

Once again, I'm dipping outside of the tidyverse, but this package and its functions have been really useful in getting data quickly in (and out) of R. For work, I have to pull in data from a few different sources, and manipulate and work with them to g...

### V is for Verbs

April 25, 2020 |

In this series, I've covered five terms for data manipulation: arrange, filter, mutate, select, and summarise. These are the verbs that make up the grammar of data manipulation. They all work with group_by to perform these functions groupwise. There are scoped version...

### U is for Useful Trick

April 24, 2020 |

This will be a very short post for a line of code I've found unbelievably useful as I analyze data for work. I'm working with datasets containing millions of rows of data. (The most recent one I worked with had about 13 million records.) Because R load...

### T is for Themes

April 23, 2020 |

One of the easiest ways to make a beautiful ggplot is by using a theme. ggplot2 comes with a variety of pre-existing themes. I'll use the genre statistics summary table I created in yesterday's post, and create the same chart with different themes. libr...

### S is for summarise

April 22, 2020 |

Today, we'll finally talk about summarise! It's very similar to mutate, but instead of adding or altering a variable in a dataset, it aggregates your data, creating a new tibble with the columns containing your requested summary data. The number of rows will be equal to the number of groups ...

April 21, 2020 |

The tidyverse is full of functions for reading data, beginning with "read_". The read_csv I've used to access my reads2019 data is one example, falling under the read_delim functions. read_tsv allows you to quickly read in tab-delimited files. And you ...

### Q is for qplot versus ggplot

April 20, 2020 |

Two years ago, when I did Blogging A to Z of R, I talked about qplots. qplots are great for quick plots - which is why they're named as such - because they use variable types to determine the best plot to generate. For instance, if I give it a ...

### P is for percent

April 18, 2020 |

We've used ggplots throughout this blog series, but today, I want to introduce another package that helps you customize scales on your ggplots - the scales package. I use this package most frequently to format scales as percent. There aren't a lot of g...

### O is for order_by

April 17, 2020 |

This will be a quick post on another tidyverse function, order_by. I'll admit, I don't use this one as often as arrange. It can be useful, though, if you don't want to permanently change the order of your dataset but want to use functions that require ...

### N is for n_distinct

April 16, 2020 |

Today, we'll start digging into some of the functions used to summarise data. The full summarise function will be covered for the letter S. For now, let's look at one function from the tidyverse that can give some overall information about a dataset: n...

### M is for mutate

April 15, 2020 |

Today, we finally talk about the mutate function! I've used it a lot throughout the series so far, so it's nice to get to discuss what it is and how it works. The mutate function is used anytime you want to create or modify a variable. It works with pretty much ...