Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

(TL;DR: Come on. This is pretty short. Productivity level up by harnessing the power of RStudio!)

# Motivation

Say we’re at work and we’ve received some data in Excel.

Let’s say that this Excel workbook contains a subset of a broader dataset. For example, maybe our Excel file contains:

• a list of customer ID numbers that we need to look into, or
• a list of dates where our data looks weird and we need to find out what has gone wrong.

Maybe the broader dataset is located within a database. Maybe we have a larger dataset within R itself. In any case, we want to use this subset of data to filter a broader dataset and get more info that might help us in our data analysis adventure!

# What I’ve seen a lot of people do in Excel (including myself!)

Let’s proceed with a date and time related example given my recent obsession with time zones. We’ve been given some file of dates to look into and it looks like this:

We want to collapse these into quoted, comma-separated strings that we’ll pass into the c() function. This will give us a character vector. Once we have a character vector, we want to convert these into POSIXct values using the power of vectorisation!

In the past, I would have embarassingly done something like this to enclose each date in single quotes with a trailing comma at the end of each line:

But this requires far too many keystrokes! My new mechanical keyboard with CHERRY MX Brown keyswitches are rated for only 50 million keystrokes per switch. I must be frugal with my keystrokes!

# Lazy is good – let’s try this instead

Let’s harness the power of RStudio.

1. We’ll copy the values in the spreadsheet column and paste them into a new script in RStudio:

2. If we’re using a PC keyboard, let’s hold down Alt and click and drag the cursor down the left-hand side of each row until we reach the last line. If we’re using a Mac, we’ll have to do some Googling…but I can only assume that we should use the option key instead of the Alt key!

3. Let’s highlight all of our pasted text row-wise. Let’s hold down Alt+Shift, then press the right arrow.

4. We’ll quote them lines using another shortcut – Shift+' (Shift and the quotation mark key):

5. While our cursor is still across all lines of the last column, we’ll press the comma key:

6. Once we’ve removed the annoying trailing comma on the last line, we’ve got ourselves some quoted, comma-separated strings!

Now we can pass these into the c() function and continue on with our lives. For example, we now have this:

c("26/04/2019 08:00:00",
"25/04/2019 21:30:00",
"25/04/2019 22:00:00",
"25/04/2019 18:45:00",
"25/04/2019 14:40:00",
"25/04/2019 22:00:00",
"25/04/2019 16:00:00")

## [1] "26/04/2019 08:00:00" "25/04/2019 21:30:00" "25/04/2019 22:00:00"
## [4] "25/04/2019 18:45:00" "25/04/2019 14:40:00" "25/04/2019 22:00:00"
## [7] "25/04/2019 16:00:00"


We assign it to a variable:

date_time_str <- c("26/04/2019 08:00:00",
"25/04/2019 21:30:00",
"25/04/2019 22:00:00",
"25/04/2019 18:45:00",
"25/04/2019 14:40:00",
"25/04/2019 22:00:00",
"25/04/2019 16:00:00")


We use the dmy_hms() function from lubridate to convert each element of our vector into POSIXct objects. Since we’re in Sydney, Australia, we’ll assign each the ‘Australia/Sydney’ time zone:

library(lubridate)
date_time_values <- dmy_hms(date_time_str, tz='Australia/Sydney')
print(date_time_values)

## [1] "2019-04-26 08:00:00 AEST" "2019-04-25 21:30:00 AEST"
## [3] "2019-04-25 22:00:00 AEST" "2019-04-25 18:45:00 AEST"
## [5] "2019-04-25 14:40:00 AEST" "2019-04-25 22:00:00 AEST"
## [7] "2019-04-25 16:00:00 AEST"


Hooray!

# And since we’re in Sydney, Australia…

We’ll end with this:

“Bloody brilliant!”

Justin