Articles by Unknown

TV Shows on the “Big 3” Streaming Services

August 10, 2020 | Unknown

2020 has been a tough year, and I've been doing my best to keep busy (and distracted from all the insanity - both at the personal and worldwide levels). Earlier this year, I took a course in machine learning techniques and have been working on applying...

Flying Saucers and Bright Lights: A Data Visualization

June 25, 2020 | Unknown

UFO Sightings by Shape and Year. Earlier last week, I taught part 2 of a course on using R and tidyverse for my work colleagues. I wanted a fun dataset to use as an example for coding exercises throughout. There was really only one choice. I found this great dataset through ...

Statistics Sunday: My 2019 Reading

May 3, 2020 | Unknown

I've spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I'd bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year. library(tidyverse) ## -- Attaching ...

Z is for Additional Axes

April 30, 2020 | Unknown

Here we are at the last post in Blogging A to Z! Today, I want to talk about adding additional axes to your ggplot, using the options for fill or color. While these aren't true z-axes in the geometric sense, I think of them as a third, z, axis. Some ...

Y is for scale_y

April 29, 2020 | Unknown

Yesterday, I talked about scale_x. Today, I'll continue on that topic, focusing on the y-axis. The key to using any of the scale_ functions is to know what sort of data you're working with (e.g., date, continuous, discrete). ...

X is for scale_x

April 28, 2020 | Unknown

These next two posts will deal with formatting scales in ggplot2 - x-axis, y-axis - so I'll try to limit the amount of overlap and repetition. Let's say I wanted to plot my reading over time, specifically as a cumulative sum of pages across the year. My...

W is for Write and Read Data – Fast

April 27, 2020 | Unknown

Once again, I'm dipping outside of the tidyverse, but this package and its functions have been really useful in getting data quickly in (and out) of R. For work, I have to pull in data from a few different sources, and manipulate and work with them to g...

V is for Verbs

April 25, 2020 | Unknown

In this series, I've covered five terms for data manipulation: arrange, filter, mutate, select, and summarise. These are the verbs that make up the grammar of data manipulation. They all work with group_by to perform these functions groupwise. There are scoped version...

U is for Useful Trick

April 24, 2020 | Unknown

This will be a very short post for a line of code I've found unbelievably useful as I analyze data for work. I'm working with datasets containing millions of rows of data. (The most recent one I worked with had about 13 million records.) Because R load...

T is for Themes

April 23, 2020 | Unknown

One of the easiest ways to make a beautiful ggplot is by using a theme. ggplot2 comes with a variety of pre-existing themes. I'll use the genre statistics summary table I created in yesterday's post, and create the same chart with different themes. libr...

S is for summarise

April 22, 2020 | Unknown

Today, we'll finally talk about summarise! It's very similar to mutate, but instead of adding or altering a variable in a dataset, it aggregates your data, creating a new tibble with the columns containing your requested summary data. The number of rows will be equal to the number of groups ...

R is for read_

April 21, 2020 | Unknown

The tidyverse is full of functions for reading data, beginning with "read_". The read_csv I've used to access my reads2019 data is one example, falling under the read_delim functions. read_tsv allows you to quickly read in tab-delimited files. And you ...

Q is for qplot versus ggplot

April 20, 2020 | Unknown

Two years ago, when I did Blogging A to Z of R, I talked about qplots. qplots are great for quick plots - which is why they're named as such - because they use variable types to determine the best plot to generate. For instance, if I give it a ...

P is for percent

April 18, 2020 | Unknown

We've used ggplots throughout this blog series, but today, I want to introduce another package that helps you customize scales on your ggplots - the scales package. I use this package most frequently to format scales as percent. There aren't a lot of g...

O is for order_by

April 17, 2020 | Unknown

This will be a quick post on another tidyverse function, order_by. I'll admit, I don't use this one as often as arrange. It can be useful, though, if you don't want to permanently change the order of your dataset but want to use functions that require ...

N is for n_distinct

April 16, 2020 | Unknown

Today, we'll start digging into some of the functions used to summarise data. The full summarise function will be covered for the letter S. For now, let's look at one function from the tidyverse that can give some overall information about a dataset: n...

M is for mutate

April 15, 2020 | Unknown

Today, we finally talk about the mutate function! I've used it a lot throughout the series so far, so it's nice to get to discuss what it is and how it works. The mutate function is used anytime you want to create or modify a variable. It works with pretty much ...

L is for Log Transformation

April 14, 2020 | Unknown

When visualizing data, outliers and skewed data can have a huge impact, potentially making your visualization difficult to understand. We can use many of the tricks covered so far to deal with those issues, such as using filters to remove extreme value...