Simulating Text Files with R to Test the Emacs Denote Package

[This article was first published on Having Fun and Creating Value With the R Language on Lucid Manager, and kindly contributed to R-bloggers].

Emacs is the most user-friendly piece of software ever invented by humanity. I use it for 90% of my computing tasks, including keeping my digital knowledge garden of notes. Several note-taking packages exist, with Org Roam as the most popular and fully featured. I have used this package for a while now, but it relies on a database and has grown a feature set far beyond my needs.

Protesilaos (Prot) Stavrou is developing the Denote package, which goes back to the basics of Emacs. The defining feature of this package is a file-naming convention that acts as metadata to find your notes. The basic structure is: YYYYMMDDTHHMMSS--file-name-dashed__keyword1_keyword2.extension. The filename starts with a timestamp at one-second resolution to ensure unique file names (unless you create more than one note per second). This timestamp also acts as the unique identifier used to link notes. The timestamp is followed by two dashes and the slugified file name. Two underscores after the file name indicate the start of the keywords, which are separated by single underscores. This convention provides a convenient heuristic to find notes based on dates, titles and keywords. Denote supports Org mode, plain text and Markdown files.
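
The naming convention can be illustrated with a few lines of R. This is only a sketch: the helper functions slugify, denote_filename and parse_denote_filename are hypothetical names made up for this example and are not part of Denote, and the regular expression assumes lower-case ASCII titles and keywords, which is stricter than what Denote itself accepts.

```r
# Hypothetical helpers for illustration only; not part of Denote.

# Turn a free-form title into a lower-case, dash-separated slug.
slugify <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "-", x)  # runs of other characters become one dash
  gsub("^-+|-+$", "", x)           # trim leading and trailing dashes
}

# Assemble a file name following the convention described above.
denote_filename <- function(title, keywords, time = Sys.time(), ext = "org") {
  identifier <- format(time, "%Y%m%dT%H%M%S")
  paste0(identifier, "--", slugify(title), "__",
         paste(keywords, collapse = "_"), ".", ext)
}

# Split a file name back into its metadata (simplified ASCII-only pattern).
parse_denote_filename <- function(filename) {
  pattern <- "^([0-9]{8}T[0-9]{6})--([a-z0-9-]+)__([a-z0-9_]+)\\.([a-z]+)$"
  parts <- regmatches(filename, regexec(pattern, filename))[[1]]
  list(identifier = parts[2],
       title      = parts[3],
       keywords   = strsplit(parts[4], "_")[[1]],
       extension  = parts[5])
}

stamp <- as.POSIXct("2022-10-05 14:30:00", tz = "UTC")
fname <- denote_filename("Testing Denote with R", c("emacs", "rstats"), stamp)
fname
# "20221005T143000--testing-denote-with-r__emacs_rstats.org"
parse_denote_filename(fname)
```

Because all metadata lives in the file name, the same regular expression is enough to filter notes by date, title or keyword without opening a single file.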

The simplicity of Denote makes it easy to integrate with other Emacs packages and to extend with some Emacs Lisp code. I am working on a package to integrate it with Citar so that notes can be linked to a bibliography.

I decided to have a play with this package and considered moving away from Org Roam to the monastic simplicity of Denote. But before converting my existing knowledge base, I wanted to see how Denote behaves with thousands of files in a single folder. Rather than converting my existing files, I decided to generate some random files to see how it performs.

Generating Random Text Files for Emacs Denote

My coding chops in R are much better than in Emacs Lisp, so I decided to write some R code to generate random text files and put Denote through its paces.

This code uses the Collins Scrabble Words list to generate random file names and keywords. Download this file to your working directory before using this code. The code reads the file and samples a set of 50 keywords from the words of at most five letters. Random timestamps are set somewhere in the distant future, between 600 and 666 million seconds (roughly 19 to 21 years) from now. Each file has a template for the front matter.

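  ## Simulate n files in denote folder

  ## Initiation

  n <- 10000  # number of notes to generate
  k <- 50     # number of distinct keywords

  ## Read the word list and convert it to lower case
  wordlist <- readLines("collins-scrabble-words-2019.txt")
  wordlist <- tolower(wordlist)
  ## Keywords are short words of at most five letters
  tag_words <- sample(wordlist[nchar(wordlist) <= 5], k)
  ## Unique offsets in seconds, far into the future
  timestamps <- Sys.time() + sample(600E6:666E6, n)
  template <- c("#+title:      ",
                "#+date:       ",
                "#+filetags:   ",
                "#+identifier: ")
  denote_directory <- "~/denote-sim/"
  ## Create the target folder if it does not exist yet
  dir.create(denote_directory, showWarnings = FALSE)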
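
Since Denote identifiers only have one-second resolution, it is worth confirming that the simulated timestamps cannot collide. The sketch below repeats the sampling step on its own. The seed and the UTC time zone are assumptions added for this example: the seed only makes the run reproducible, and formatting in UTC sidesteps the repeated hour at the end of daylight saving time.

```r
set.seed(42)  # assumption: only here to make the sketch reproducible
n <- 10000

# sample() draws without replacement, so every offset (in seconds) is unique
offsets <- sample(600E6:666E6, n)
timestamps <- Sys.time() + offsets

# Format in UTC so that no wall-clock second can occur twice
identifiers <- format(timestamps, "%Y%m%dT%H%M%S", tz = "UTC")

# TRUE when all identifiers are distinct
anyDuplicated(identifiers) == 0

# How far into the future do the notes land, in years?
round(range(offsets) / (365.25 * 24 * 3600), 1)
```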

This next code snippet generates n Org mode files in the denote_directory folder. Titles are created by sampling the word list, and the tags (keywords) are sampled from the 50 defined tags. The front matter includes the title, the creation date, the keywords (called filetags in Org mode) and the identifier. The Lorem Ipsum generator in the stringi package generates some paragraphs of text. The last part of the code generates some links to random posts.

  ## Generate n random posts

  library(stringr)  # str_to_title() and str_replace_all()

  for (i in 1:n) {
      title <- paste(sample(wordlist, sample(2:5, 1)), collapse = "-")
      tags <- paste(sample(tag_words, sample(4, 1)), collapse = "_")
      identifier <- format(timestamps[i], "%Y%m%dT%H%M%S")
      ## Front matter: title, date, filetags and identifier
      front_matter <- c(paste0(template[1],
                               str_to_title(str_replace_all(title, "-", " "))),
                        paste0(template[2],
                               paste0("[", format(timestamps[i], "%F %a %H:%M"), "]")),
                        paste0(template[3],
                               paste0(":", str_replace_all(tags, "_", ":"), ":")),
                        paste0(template[4], identifier))
      ## Between one and five links to random notes
      links_list <- vector()
      for (j in 1:(sample(1:5, 1))) {
          links_list[j] <- paste0("- ", "[[denote:",
                                  sample(format(timestamps, "%Y%m%dT%H%M%S"), 1), "]]")
      }
      ## Heading, Lorem Ipsum text and links
      ## (the number of paragraphs is an assumption)
      content <- c(front_matter,
                   "",
                   paste("*", str_to_title(paste(sample(wordlist,
                                                        sample(1:3, 1)),
                                                 collapse = " "))),
                   stringi::stri_rand_lipsum(sample(1:3, 1)),
                   "",
                   links_list)
      filename <- paste0(denote_directory, identifier, "--", title, "__", tags, ".org")
      writeLines(content, filename)
  }

Generating thousands of files will take a few minutes …
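
Once the loop has finished, a quick check from R can confirm that the file names follow the Denote convention and show how the keywords are distributed. The sketch below is not part of the original script. It uses a temporary folder with three made-up file names so that it runs on its own; to inspect the real output, point denote_directory at the simulated folder instead. The pattern is a simplified one that assumes lower-case ASCII names.

```r
# Stand-in for the simulated folder (assumption: swap in "~/denote-sim/")
denote_directory <- file.path(tempdir(), "denote-check")
dir.create(denote_directory, showWarnings = FALSE)

# Three made-up notes so that the sketch runs on its own
demo_files <- c("20420101T101500--aa-zebra__emacs_notes.org",
                "20420102T111500--quiz-fjord-lynx__notes.org",
                "20420103T121500--oxen-waltz__emacs_r_test.org")
invisible(file.create(file.path(denote_directory, demo_files)))

files <- list.files(denote_directory, pattern = "\\.org$")

# Simplified pattern: identifier, dashed title, underscore-separated keywords
pattern <- "^[0-9]{8}T[0-9]{6}--[a-z-]+__[a-z_]+\\.org$"
all(grepl(pattern, files))

# Tally how often each keyword occurs across the folder
keyword_part <- sub("^.*__(.*)\\.org$", "\\1", files)
keyword_counts <- table(unlist(strsplit(keyword_part, "_")))
sort(keyword_counts, decreasing = TRUE)
```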

Using this code, I generated ten thousand notes and used them to test whether the Denote package works at a large scale. This test shows that Prot's approach is perfectly capable of working with thousands of notes. Just for kicks, I also synchronised these files with an Org Roam setup. My laptop struggled with the computational load, and I was unable to access the notes properly because of the sheer number of files. So, case closed: I am moving to Denote and will teach myself more Emacs Lisp to build my ideal zettelkasten.
