XKCD-Gutenberg Passwords

[This article was first published on R on jmarriott.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I have been inspired by this informative XKCD comic on password security, and I often follow its advice by using random-word-generating websites. But I have to wonder what dictionaries these sites use and how random the words are that they spit out. So I thought that it would be fun to make my own generator using books from https://www.gutenberg.org/ as my dictionaries, that way I would at least know where they are coming from. As long as I chose books somewhat randomly, I think that should be pretty secure.

First I will load the required packages.

library(tidyverse)
library(gutenbergr)
library(tidytext)

Since Pride and Prejudice is currently the most downloaded book on gutenberg over the past 30 days and I happen to like it myself, I’ll use that as my example book. It is easy to download the full text with the gutenbergr package.

book_text <- gutenberg_download(1342)

data("stop_words") # load stop words

book_text %>%
  unnest_tokens(word, text) %>% # turn the text into a single column of words
  mutate(word = str_extract(string = word, pattern = "[[:alpha:]]+")) %>% # remove any non-alphanumeric characters 
  select(word) %>% # get rid of the extra columns
  unique() %>% # get rid of duplicate words
  anti_join(stop_words, by = "word") %>% # get rid of boring "stop" words
  drop_na() %>% # drop anything that didn't make it through cleanly
  unlist() %>% # turn the column into a vector that sample() knows what to do with
  sample(4) # chose four words at random
##        word273       word4600        word632       word4122 
##       "choose"      "content" "circumstance"      "mingled"

Well that was super easy, wasn’t it? The only thing that isn’t easy with this setup is looking up a book to use. But that can readily be solved with a shiny app. You can see the code below (at the time this post was knit—the current code will always be on github here), and the live app is at https://jameson-marriott.shinyapps.io/Password_App/.

library(shiny)
library(shinythemes)
library(gutenbergr)
library(dplyr)
library(tidyr)
library(tidytext)
library(stringr)
library(rclipboard)

# get all the titles for the drop-down menu
titles <- gutenberg_works(only_text = TRUE, distinct = TRUE) %>%
    select(title) %>%
    drop_na()

# load the stop words so that we don't have to reload it later
data("stop_words")

ui <- fluidPage(theme = shinytheme("cerulean"),
                
                rclipboardSetup(), # what it sounds like
                
    verticalLayout(
        fluidRow(
            column(width = 8, offset = 1,
                   titlePanel(title = "XKCD-Inspired, Gutenberg-Sourced Passwords"),
                   p("This web-app lets you generate passwords inspired by ",
                      a(href = "https://xkcd.com/936/", "this xkcd comic."),
                      br(),
                      "First select a book from ",
                      a(href = "https://www.gutenberg.org/", "Project Gutenberg"),
                      " and then chose the number of words you want to use from that book for your password.")
                   ),
        ),
        fluidRow(
            column(width = 6, offset = 1,
                   selectizeInput(inputId = "book_title", 
                                  label = "Book Title",
                                  choices = c("Chose one" = "", titles), # removes the default selection, but needs error handling for the down-stream items
                                  selected = NULL),
                                  #choices = titles,
                                  #selected = "Pride and Prejudice"),
                   p(textOutput("book_length")),
                   sliderInput("number_of_words",
                               "Number of words to chose",
                               min = 1,
                               max = 10,
                               value = 4))
        ),

        # Show the password
        fluidRow(
            column(width = 6, offset = 1,
                   tags$hr(),
                   textOutput("password", container = tags$strong)
            ),
        ),
        # Show the password without spaces
        fluidRow(
            column(width = 6, offset = 1,
                   uiOutput("password_no_spaces"))
        )
    )
)

server <- function(input, output) {
    
    # get the book
    gutenberg_book <- reactive({
        validate(
            need(input$book_title != "", "Please chose a book.")
        )
        gutenberg_works(title == input$book_title) %>% # get the gutenberg id
            gutenberg_download() %>% 
            unnest_tokens(word, text) %>% # turn the text into a single column of words
            mutate(word = str_extract(string = word, pattern = "[[:alpha:]]+")) %>% # remove any non-alphanumeric characters. 
            select(word) %>% # get rid of the extra columns
            unique() %>% # get rid of duplicate words
            anti_join(stop_words, by = "word") %>% # get rid of boring, "stop" words
            drop_na() %>% # drop anything that didn't make it through cleanly
            unlist()
    })
    
    # Report the number of unique words in the book
    output$book_length <- renderText({
        length <- gutenberg_book() %>%
            length() %>%
            format(big.mark = ",") # Add some nice formating
        
        paste0("There are ", length, " unique words in this book (including diffent forms of the same word).")
    })
     
    # Generate the actual password from the book
    password <- reactive({
        validate(
            need(input$book_title != "", "")
        )
        gutenberg_book() %>%
            sample(input$number_of_words) %>% # chose words at random
            paste0() # drop the names
    })
    
    # Output the password for the UI
    output$password <- renderText({
        password()
    })
    
    # Make the button to copy the password to the clipboard
    output$password_no_spaces <- renderUI({
        rclipButton("clip_button", paste0("Copy \"", str_flatten(password()), "\""), str_flatten(password()))
    })
}

# Run the application 
shinyApp(ui = ui, server = server)

To leave a comment for the author, please follow the link and comment on their blog: R on jmarriott.com.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)