Snapshot Testing in R: Beyond Screenshots

Jakub Sobolewski

21 hours ago

[This article was first published on Jakub Sobolewski, and kindly contributed to R-bloggers.]

Snapshot testing is not about screenshots.

Most people meet it through UI regression tests: render a component, save a picture, fail the build when the picture changes. So the technique gets filed away as “the thing that compares images.” That is one use. But not the only one.

The mechanic underneath is general. Capture some output, save it to a file, and on every later run compare fresh output against the saved copy. The output can be a plot. It can also be console text, a log, a data frame, an error message, or a deeply nested list. Anything you can serialize, you can snapshot.

What makes it powerful is also what makes it dangerous: you are the test oracle. There is no expect_equal(result, 42) stating the answer up front. You accept the first snapshot because you read it and judged it correct. Get that review wrong, or skip it, and you have pinned a bug in place and called it a passing test.

In this post I want to walk through using snapshot testing for what it is good for, and the practices that make it efficient.

What snapshot testing actually is

In testthat’s third edition the entry points are expect_snapshot() and expect_snapshot_file(). The first run records output into a _snaps/ directory next to your tests, as a .md file named after the test file. Every run after that compares against what’s recorded. A mismatch fails the test and shows you a diff.

test_that("summary prints a one-line overview", {
  expect_snapshot(print(summary(1:10)))
})

The first time, testthat writes the printed output to _snaps/summary.md and the test passes (with a note that a new snapshot was recorded). From then on, that file is the expected value.

You reach for this in these situations:

The output is large or tedious to assert field by field, but you can recognize whether it’s correct by looking at it.
The output is impossible to express in code: a rendered plot, a rendered table, any image.
The output is impractical to express in code: a formatted CLI report, a full console transcript with its alignment. You can’t write expect_equal() for “the table is laid out correctly.” You can look at it and know.

Five practices that keep snapshots trustworthy

Snapshot suites rot in predictable ways: noise the diff engine flags as failures, snapshots nobody can review, tests that flake on a different machine, and (as with any other test) titles that say nothing.

These practices prevent that.

1. Scope to exactly what proves the behavior

Capture the plot, not the page the plot lives on.

If you’re testing that a chart colors points correctly, snapshot the chart. Not the dashboard it’s embedded in, with its header, its sidebar, the current date in the corner, and a “last refreshed” timestamp. Every one of those is unrelated to the behavior under test, and every one is a reason for the comparison to fail when nothing you care about changed.

A snapshot’s diff engine is literal. It flags any difference. So don’t hand it differences that don’t matter. Scope the capture down to the smallest thing that demonstrates the behavior, and the only way the test can fail is if that behavior breaks.

Don’t give the diff engine extra reasons to make false positives.

2. Make snapshots human-readable

You are going to review these files by eye. So they have to be readable by eye.

Store snapshots as text: markdown, CSV, SVG, JSON. Never as a binary blob. I heard on the R Weekly podcast that some teams keep snapshots as .rds files, and I’d push back on that hard. A binary snapshot can’t be read in an editor, can’t be reviewed in a pull request, and can’t be diffed when it changes. It defeats the entire premise. The whole technique rests on a human being able to look at the recorded output and decide it’s right. Your code reviewers need to be able to do that too, so don’t hide the “truth” in a binary file. This matters most when the first snapshot is accepted as “the truth”: make it easy for your collaborators to read and judge it.

Don’t introduce extra points of friction. Keep it simple.

3. Remove nondeterminism, or filter what’s left

A snapshot that changes on every run is useless. Timestamps, random IDs, elapsed-time measurements, unordered query results. Any of these will make the file churn and train you to accept changes blindly.

Fix it at the source first. Inject the things that vary so the test controls them: pass a fixed clock instead of calling Sys.time(), set a seed, supply IDs rather than generating them. This is dependency injection, the same move that makes any code testable.
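A minimal sketch of that move in base R. The functions format_report_header() and draw_sample_ids() are hypothetical names made up for illustration, not part of any package; the point is only that the clock and the seed become arguments the test controls.

```r
# Hypothetical production function: the clock is a parameter with a
# sensible default, so callers get real time and tests get a fixed time.
format_report_header <- function(title, now = Sys.time()) {
  sprintf(
    "%s (generated %s)",
    title,
    format(now, "%Y-%m-%d %H:%M:%S", tz = "UTC")
  )
}

# Hypothetical function: randomness is controlled by an explicit seed.
# Note that set.seed() changes the global RNG state for the session.
draw_sample_ids <- function(n, seed) {
  set.seed(seed)
  sample.int(1000, n)
}

# In a test, inject a fixed clock so the output is identical on every run.
fixed_now <- as.POSIXct("2024-01-15 09:30:00", tz = "UTC")
header <- format_report_header("Weekly sales", now = fixed_now)
print(header)
#> [1] "Weekly sales (generated 2024-01-15 09:30:00)"

# The same seed always yields the same IDs.
identical(draw_sample_ids(3, seed = 1), draw_sample_ids(3, seed = 1))
#> [1] TRUE
```

With the varying inputs pinned like this, whatever you snapshot downstream of them stays the same from run to run.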

When you can’t remove the variation, filter it. testthat’s expect_snapshot() takes a transform argument: a function that cleans each line of output before it’s compared. Strip the timestamps, drop the spinner characters, normalize the paths. For data, impose a deterministic order before you serialize.
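As a rough illustration, here is what such a transform could look like in base R. The name scrub_volatile() and the sample log lines are assumptions made up for this sketch; the only contract taken from testthat is that the function receives a character vector of lines and returns the cleaned lines.

```r
# Hypothetical transform: takes the captured output lines and returns
# the lines that should be recorded and compared.
scrub_volatile <- function(lines) {
  # Replace timestamps such as 2024-01-15 09:30:00
  lines <- gsub(
    "[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}",
    "<timestamp>",
    lines
  )
  # Replace elapsed times such as [0.3s] or [12.45s]
  lines <- gsub("\\[[0-9]+\\.[0-9]+s\\]", "[<elapsed>]", lines)
  # Replace the session temp directory, which differs on every run
  lines <- gsub(tempdir(), "<tempdir>", lines, fixed = TRUE)
  lines
}

raw <- c(
  "2024-01-15 09:30:00 INFO loading data [0.3s]",
  "2024-01-15 09:30:01 INFO done [12.45s]"
)
cleaned <- scrub_volatile(raw)
print(cleaned)
#> [1] "<timestamp> INFO loading data [<elapsed>]"
#> [2] "<timestamp> INFO done [<elapsed>]"

# Inside a test it would be passed along like this (run_job() is hypothetical):
# expect_snapshot(run_job(), transform = scrub_volatile)
```

Replacing the noise with a placeholder, rather than deleting it, keeps the shape of each line visible in the recorded snapshot.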

Don’t let snapshots change when there is no reason for them to change.

4. Stabilize platform differences

A rendered snapshot depends on more than your code. Fonts render differently on macOS and Linux. A new release of a plotting or formatting dependency shifts the output by a pixel or a label. R itself changes between versions. None of that is a regression, but a literal diff engine can’t tell, so a snapshot recorded on your laptop fails the moment it runs anywhere else.

Two tools handle this, and they work together.

A. Variants keep incompatible environments from overwriting each other. Both expect_snapshot() and expect_snapshot_file() take a variant argument. testthat stores each variant in its own subdirectory, _snaps/{variant}/, so the macOS render and the Linux render sit side by side instead of clobbering one another. Key the variant on whatever actually moves the output: the operating system, the R version, a specific dependency’s version, or a combination. You decide what relevant axes of variation are, and you key the snapshots to them. Maybe you want to support rendering on different platforms, and you want to support different versions of a plotting library. Then the variant should include both the platform and the library version.

variant = paste(platform_variant(), packageVersion("echarts4r"), sep = "-")
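If you would rather not depend on a helper package for this, the variant string can be assembled from base R. The sketch below is one possible shape, with a hypothetical function name (snapshot_variant_key()) and an arbitrary choice of axes: operating system, major.minor R version, and an optional package version.

```r
# Hypothetical helper: build a variant string from the axes that
# actually move the rendered output.
snapshot_variant_key <- function(
  os = tolower(Sys.info()[["sysname"]]),
  r_version = paste(
    R.version$major,
    sub("\\..*$", "", R.version$minor), # keep only the minor number
    sep = "."
  ),
  pkg_version = NULL
) {
  parts <- c(
    os,
    r_version,
    if (!is.null(pkg_version)) as.character(pkg_version)
  )
  paste(parts, collapse = "-")
}

# Explicit values, to show the shape of the result
key <- snapshot_variant_key(
  os = "linux",
  r_version = "4.4",
  pkg_version = package_version("0.4.5")
)
print(key)
#> [1] "linux-4.4-0.4.5"

# In a test (assuming echarts4r is installed):
# expect_snapshot_file(
#   path, name,
#   variant = snapshot_variant_key(pkg_version = packageVersion("echarts4r"))
# )
```

Every extra axis multiplies the number of snapshot copies you have to review and keep current, so include only the ones that really change the output.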

B. Let one platform generate the truth, and let the whole team use it. Variants solve the storage problem. They don’t solve the contribution problem: your developers are on different operating systems, and you don’t want each regenerated snapshot to depend on whose machine produced it. If your team works on Windows, macOS and Linux, you may not want to check three slightly different copies of the same thing into the repository. Nominate a single canonical environment, your CI runner, and treat the snapshots it produces as authoritative.

When a snapshot test fails on GitHub Actions, the files that run produced are uploaded as build artifacts. testthat gives you a helper to pull them straight into your local checkout:

testthat::snapshot_download_gh(
  repository = "your-org/your-package",
  run_id = "47905180716"
)

This is quite a recent addition to testthat. Worth knowing.

You rarely have to look the call up. When snapshots fail inside an R CMD check job, testthat prints the exact snapshot_download_gh() line in the CI log, ready to copy. Run it, review the downloaded files the way you’d review any first snapshot, and commit them.

That turns snapshot testing into a team practice. A contributor on Windows can change a plot, open a pull request, and let CI render the canonical image. The reviewer accepts the snapshot CI produced, not one tied to a particular laptop. The truth comes from one place, and everyone contributes to it through the same door.

Both options have their advantages and disadvantages. Now that you know them, try them out and decide which one fits your team and workflow better.

5. Name the test and the snapshot so they stand alone

A test title should state the precondition and the expected output.

The same holds for snapshot tests. Not “reporter works.” Something like “progress reporter shows survived mutants in summary.” The title is the first thing a reviewer reads when the snapshot changes.

But snapshot tests aren’t self-contained.

The assertion of a snapshot test is a file. That means you read the test and then have to open the file to understand what the expected outcome really is. There is also the opposite workflow: you browse the snapshots directory first to get a grasp of what the code produces.

With expect_snapshot_file() you also name the snapshot file yourself. Use that. A file called scatterplot_colors_points_in_the_band.png tells you what it should contain before you even read the test itself. The filename and its content should tell the story on their own, without you having to dig up the test that produced them.

The rest of this post is five worked examples, each leaning on these five practices.

Example 1: plots, from simple to interactive

ggplot: snapshot the SVG, not a PNG

For ggplot, the right tool is vdiffr. It renders a plot to SVG, which is text, and snapshots that.

test_that("points below lower threshold are green, above upper are red, inside are yellow", {
  # Arrange
  data <- data.frame(
    x = seq_date(3),
    y = c(10, 20, 30)
  )

  # Act
  p <- threshold_plot(data, lower = 15, upper = 25)

  # Assert
  vdiffr::expect_doppelganger("threshold_plot_below_threshold_green_above_threshold_red_inside_threshold_yellow", p)
})

Two of our practices fall out of this for free. The snapshot is scoped to the plot object itself, not a Shiny page that embeds it. And it’s human-readable: the recorded .svg is text you can open, and testthat ships a Shiny app (testthat::snapshot_review()) that shows the old and new render side by side when something changes. (Older vdiffr releases had their own app, vdiffr::manage_cases(); since vdiffr 1.0.0 the review app lives in testthat.) You review the picture, but the artifact under version control is inspectable text.

Here’s the actual plot that test captures:

And here are the first lines of the SVG vdiffr would record. This is the whole point of practice #2: the snapshot under version control is text you can read.

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="450" viewBox="0 0 504 288">
<defs>
<g>
<g id="glyph-0-0">

Comparing text is easier than comparing pixels. If saving the image as SVG is possible, prefer it to a PNG. Not only can you view an SVG both as text and as an image, but the diff engine can also tell you exactly what changed in the markup, instead of just showing a pixel difference.

But there are plenty of cases when SVG isn’t an option.

htmlwidgets: when you’re forced back to pixels

vdiffr can’t render an htmlwidget or a Shiny tag list, because there’s no static SVG to produce. Here you fall back to rendering the thing to a real PNG and comparing images. That’s a harder problem, and it’s where scoping and determinism stop being nice-to-haves.

The shape of the helper: render to HTML, screenshot to PNG with webshot, then hand the PNG to expect_snapshot_file().

expect_plot <- function(x, name, ...) {
  UseMethod("expect_plot")
}

expect_plot.htmlwidget <- function(
  x,
  name,
  variant = shinytest2::platform_variant(),
  width = 992,
  height = 744
) {
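  testthat::local_edition(3)
  html_temp <- fs::path(tempdir(), name, ext = "html")
  png_temp <- fs::path(tempdir(), name, ext = "png")
  # saveWidget(selfcontained = FALSE) also writes a "<name>_files" directory
  files_temp <- fs::path(tempdir(), paste0(name, "_files"))
  # Clean up only what this helper created, never tempdir() itself
  on.exit(unlink(c(html_temp, png_temp, files_temp), recursive = TRUE))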

  htmlwidgets::saveWidget(
    x,
    file = html_temp,
    selfcontained = FALSE
  )
  webshot::webshot(
    url = html_temp,
    file = png_temp,
    delay = 0.5,
    quiet = TRUE,
    vwidth = width,
    vheight = height
  )

  testthat::expect_snapshot_file(
    png_temp,
    name = fs::path(name, ext = "png"),
    variant = variant
  )
}

I’ve used this pattern across many projects and it has worked well for me.

The threshold. A pixel-exact comparison of a rendered chart will fail on trivial, invisible differences in anti-aliasing. So the full helper passes a compare function (not shown in the listing above) that allows a small per-pixel difference budget before it calls a mismatch. Locally (interactive()) the threshold is 0, because you want to see every change as you work. On CI it’s relaxed, because the CI renderer isn’t identical to your laptop.

The variant. Fonts render differently across operating systems, so a snapshot recorded on macOS will not match one produced on Linux. platform_variant() keys the snapshot to the platform, and expect_snapshot_file() keeps a separate recorded file per variant (_snaps/mac/..., _snaps/linux/...), so cross-platform rendering differences never masquerade as a regression. This is the storage half of practice #4; pair it with snapshot_download_gh() so CI generates the canonical files the whole team commits.
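The threshold idea can be sketched as follows. This is not the helper used in the projects mentioned above; pixel_diff_ratio() and compare_png() are hypothetical names, the tolerance values are arbitrary, and reading the files assumes the png package is installed. What it relies on from testthat is that the compare argument of expect_snapshot_file() takes a function of two file paths that returns TRUE when they match.

```r
# Hypothetical helper: share of pixel values that differ by more than
# `tolerance`. `old` and `new` are numeric arrays with values in [0, 1],
# which is the format png::readPNG() returns.
pixel_diff_ratio <- function(old, new, tolerance = 0.02) {
  if (!identical(dim(old), dim(new))) {
    return(1) # different image size: treat as fully different
  }
  mean(abs(old - new) > tolerance)
}

# Hypothetical factory for expect_snapshot_file(compare = ...).
# Strict when working interactively, relaxed elsewhere (for example on CI).
compare_png <- function(threshold = if (interactive()) 0 else 0.01) {
  force(threshold)
  function(old, new) {
    pixel_diff_ratio(png::readPNG(old), png::readPNG(new)) <= threshold
  }
}

# Demonstration on in-memory "images", no files needed
a <- array(0.5, dim = c(10, 10, 3))
b <- a
b[1, 1, ] <- 1 # change one pixel in all three channels
ratio <- pixel_diff_ratio(a, b) # 3 of 300 values differ

# In the helper above it could be wired in like this:
# testthat::expect_snapshot_file(png_temp, name = ..., compare = compare_png())
```

Keeping the ratio calculation separate from the file reading means the tolerance logic can be unit tested on plain arrays, the same way compare_df is tested later in this post.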

The test that uses it reads like any other, with a descriptive title and a named snapshot:

it("colors timepoints between thresholds", {
  # Arrange
  data <- data.frame(
    x = seq_date(3),
    y = c(10, 20, 30)
  )

  # Act
  plot <- threshold_plot(data, lower = 15, upper = 25)

  # Assert
  expect_plot(
    plot,
    name = "colors_timepoints_between_thresholds"
  )
})

Interactive plots: you can even snapshot an interaction

An htmlwidget is just HTML and JavaScript. That means you can drive it into a specific state and snapshot that. Here’s an echarts4r line chart where the behavior under test is the tooltip content.

A small helper dispatches the chart’s own “show tooltip” action when the widget loads, which simulates a user interacting with the plot:

#' tests/testthat/setup-trigger_tooltip.R
trigger_tooltip <- function(x, series_index, data_index) {
  htmlwidgets::onRender(
    x,
    sprintf(
      "function(el, x, data) {
        const chart = echarts.getInstanceByDom(el);
        chart.dispatchAction({
          type: 'showTip',
          seriesIndex: %s,
          dataIndex: %s,
        });
      }",
      series_index,
      data_index
    )
  )
}

Then the test renders the chart, triggers the tooltip, and snapshots the result (notice that the # Act is triggering the tooltip, not creating the plot):

it("shows tooltip content from the specified tooltip column", {
  # Arrange
  data <- data.frame(
    date = as.Date(
      c("2020-01-01", "2020-02-01", "2020-03-01")
    ),
    value = c(1, 3, 2),
    tooltip = "TOOLTIP"
  )
  plot <- line_plot(
    data,
    date = "date",
    value = "value",
    tooltip = "tooltip"
  )

  # Act
  result <- plot %>%
    trigger_tooltip(series_index = 0, data_index = 1)

  # Assert
  expect_plot(result, "line_plot_tooltip")
})

The tooltip text is doing double duty. It’s the data the test checks, and it’s a human-readable marker that tells the reviewer exactly what to look for when accepting the snapshot. Here’s the PNG that snapshot captures:

Example 2: printed output (reporters, loggers, CLI)

Some objects exist to print.

Test reporters, loggers, CLI tools. Their whole job is to render formatted text to the console.

That text is the behavior, and expect_snapshot() captures it verbatim into a readable .md file.

I use this in muttest, a mutation testing package, to pin down exactly what the progress reporter prints. The test is small:

test_that("progress reporter shows all killed", {
  .with_example_dir("shipping/", {
    mutators <- list(operator(">", "<"))
    plan <- muttest_plan(mutators, fs::dir_ls("R"))
    .expect_snapshot(
      muttest(
        plan,
        reporter = ProgressMutationReporter$new(
          min_time = Inf,
          survived_detail = "none"
        )
      )
    )
  })
})

But a reporter’s output is full of nondeterminism: spinner frames, blank lines, and per-step timings like [0.3s] and Duration: 1.2s. Snapshot that raw and it fails on every run. The fix is practice #3: a transform that strips the noise before comparison. I wrap expect_snapshot() once, in setup.R, so every test in the suite gets the cleaning for free:

.expect_snapshot <- purrr::partial(
  testthat::expect_snapshot,
  transform = function(lines) {
    lines |>
      stringr::str_subset("^[\\|/\\-\\\\] \\|", negate = TRUE) |>
      stringr::str_subset("^$", negate = TRUE) |>
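      # Escape the decimal point so it matches a literal "." only
      stringr::str_remove_all("\\s\\[\\d+\\.\\d+s\\]") |>
      # Allow "Duration: 1.2s" as well as "Duration: 1.2 s"
      stringr::str_remove_all("Duration:\\s\\d+\\.\\d+\\s*s") |>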
      stringr::str_trim()
  }
)

The two str_subset calls drop spinner lines and blanks. The two str_remove_all calls delete the timing fragments. What’s left is the stable, meaningful part of the output, and that’s what lands in the snapshot:

# progress reporter shows all killed

    Code
      muttest(plan, reporter = ProgressMutationReporter$new(min_time = Inf,
        survived_detail = "none"))
    Output
      i Mutation Testing
        |   K |   S |   E |   T |   % | Mutator  | File
      v |   1 |   0 |   0 |   1 | 100 | > → <    | shipping.R
      -- Results ---------------------------------------------------------------------
      [ KILLED 1 | SURVIVED 0 | ERRORS 0 | TOTAL 1 | SCORE 100.0% ]

This is a snapshot doing exactly what it should. The table alignment, the symbols, the score line. None of that is pleasant to assert by hand, but all of it is obviously correct (or obviously wrong) at a glance. The full reporter source, setup, and recorded snapshots are in the muttest repo: setup.R, the test, and the snapshot.

Example 3: data frames as CSV

Data frames are a classic snapshot candidate, and a classic way to get it wrong.

The tempting move is expect_snapshot(print(df)). Don’t. Printed data frames are truncated past a certain size, formatted to your console width, and shown in whatever order the rows happen to be in. You’re snapshotting the print method, not the data.

Write the data frame to CSV. CSV is text, it diffs cleanly, and it’s the obvious human-readable representation of tabular data.

I find snapshotting tables especially useful when you need a business expert to sign off on a business logic calculation. Instead of showing a table created in code, you can hand over the CSV or even a formatted markdown table for review. The expert can then sign off on the calculation without needing to read the code.

Following the same S3 pattern as expect_plot, here’s a custom expectation. The comparison is the part worth getting right, so it lives in its own named function:

compare_df <- function(old, new) {
  # Compare parsed data frames, not raw CSV text
  # Notice you can use custom comparison functions here
  isTRUE(all.equal(
    read.csv(old, stringsAsFactors = FALSE),
    read.csv(new, stringsAsFactors = FALSE)
  ))
}

expect_snap <- function(x, name, ...) {
  UseMethod("expect_snap")
}

expect_snap.data.frame <- function(x, name, ...) {
  local_edition(3)

  # Practice #3: deterministic order so row shuffling never breaks the test
  # unname() keeps column names from being matched to order()'s own arguments
  x <- x[do.call(order, unname(as.list(x))), , drop = FALSE]
  rownames(x) <- NULL

  path <- fs::path(tempdir(), name, ext = "csv")
  on.exit(unlink(path))

  expect_snapshot_file(
    path = local({
      write.csv(x, path, row.names = FALSE)
      path
    }),
    name = fs::path(name, ext = "csv"),
    compare = compare_df
  )
}

Two choices make this robust. Sorting on all columns before writing (that’s practice #3) makes the snapshot order-independent. And compare_df reads both files back into data frames and compares those, not the raw text, so a trailing newline, a quoting difference, or an integer written as 1 versus 1.0 never fails the test.

That second claim is the one worth verifying, and compare_df is an ordinary function, so it gets an ordinary unit test. Two files holding the same data but different CSV text must compare equal; a genuine change must not:

test_that("compare_df ignores CSV formatting but catches value changes", {
  # Arrange
  recorded    <- tempfile(fileext = ".csv")
  reformatted <- tempfile(fileext = ".csv")
  changed     <- tempfile(fileext = ".csv")
  writeLines(c("x,y",     "1,10.5", "2,20.1"),     recorded)
  # quoted, trailing newline
  writeLines(c('"x","y"', "1,10.5", "2,20.1", ""), reformatted)
  # a real change
  writeLines(c("x,y",     "1,10.5", "2,99.9"),     changed)

  # Act & Assert
  expect_true(compare_df(recorded, reformatted))
  expect_false(compare_df(recorded, changed))
})
Test passed with 2 successes 😀.

The test passes: formatting noise doesn’t fail the comparison, a changed value does. You snapshot to a readable format, but you compare on meaning.

Example 4: errors and conditions

User-facing messages are a contract. When a function fails, the error text is part of its behavior, and expect_snapshot() pins it.

test_that("withdrawing more than the balance reports the shortfall", {
  expect_snapshot(
    withdraw(account(balance = 50), amount = 80),
    error = TRUE
  )
})

The error = TRUE tells testthat the code is expected to throw and to capture the condition instead of failing the test. The message goes into the snapshot:

# withdrawing more than the balance reports the shortfall

    Code
      withdraw(account(balance = 50), amount = 80)
    Condition
      Error in `withdraw()`:
      ! Cannot withdraw 80 from an account with balance 50.
      i Available to withdraw: 50.

Now if someone changes that message (softens it, drops the available balance, mangles the formatting), the test fails and shows the diff. The same works for warnings and messages. It’s the cleanest way to keep error messages from silently degrading. (The flip side: a message worth snapshotting is a message worth writing carefully. See the Mystery Guest and Overspecification smells for the failure modes nearby.)

Example 5: nested data structures

Some outputs are big nested lists: a parsed config, an API response, a model object’s metadata. Asserting them field by field is miserable:

expect_equal(result$user$name, "Ada")
expect_equal(result$user$roles, c("admin", "editor"))
expect_equal(result$settings$theme, "dark")
expect_equal(result$settings$notifications$email, TRUE)
# ... twenty more lines

Each line is a place to make a typo, and together they still might not cover every field. Snapshot the whole structure once, review it once.

The only real decision is the serialization format, and practice #2 decides it. Don’t use dput(); its output is valid R but painful to read. Serialize to pretty JSON or YAML, which a human can actually scan:

result <- list(
  user = list(name = "Ada", roles = c("admin", "editor")),
  settings = list(
    theme = "dark",
    notifications = list(email = TRUE, sms = FALSE)
  )
)

cat(jsonlite::toJSON(result, pretty = TRUE, auto_unbox = TRUE))
{
  "user": {
    "name": "Ada",
    "roles": ["admin", "editor"]
  },
  "settings": {
    "theme": "dark",
    "notifications": {
      "email": true,
      "sms": false
    }
  }
}

Wrap that in a snapshot expectation and the recorded file is a clean, indented JSON document. One review covers the entire structure, and any change to any field shows up as a precise diff.
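One way that wrapper could look, as a sketch rather than a finished expectation. Both sort_keys() and expect_snap_json() are hypothetical names. Sorting the keys is an extra step added here on the assumption that you want the file to be independent of how the list was built; it is meant for plain nested lists, not data frames. The expectation itself assumes jsonlite and testthat are installed and must be called inside a test.

```r
# Hypothetical helper: recursively sort named lists by name, so the key
# order in the JSON does not depend on how the list was constructed.
sort_keys <- function(x) {
  if (!is.list(x)) {
    return(x)
  }
  if (!is.null(names(x))) {
    x <- x[order(names(x))]
  }
  lapply(x, sort_keys)
}

# Hypothetical expectation: serialize to pretty JSON and snapshot the file.
expect_snap_json <- function(x, name) {
  path <- file.path(tempdir(), paste0(name, ".json"))
  on.exit(unlink(path))
  json <- jsonlite::toJSON(sort_keys(x), pretty = TRUE, auto_unbox = TRUE)
  writeLines(json, path)
  testthat::expect_snapshot_file(path, name = paste0(name, ".json"))
}

# sort_keys() on its own, without touching the file system
sorted <- sort_keys(list(
  settings = list(theme = "dark", notifications = list(sms = FALSE, email = TRUE)),
  user = list(roles = c("admin", "editor"), name = "Ada")
))
names(sorted$user)
#> [1] "name"  "roles"

# Inside a test (parse_config() is a placeholder for your own code):
# test_that("config parser returns user and settings", {
#   expect_snap_json(parse_config("config.yml"), "parsed_config_user_and_settings")
# })
```

If the order of keys is itself part of the behavior you care about, drop the sort_keys() call and serialize the list as it is.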

Use it sparingly. Not every big output needs a snapshot test; sometimes it’s better to assert on the shape and values that actually matter.

The responsibility

Here is where snapshot testing lives or dies.

The first snapshot is a decision, not a fact. When testthat records a new snapshot, the test passes. That green check does not mean the output is correct. It means the output now exists. The only thing that makes it correct is you reading it and deciding it is. Accept a snapshot without reading it and you’ve written a test that asserts “the code does whatever it currently does,” which is no test at all.

When a snapshot changes, testthat tells you and gives you tools to review:

# opens a diff app for changed snapshots
testthat::snapshot_review()
# accept changes once you've reviewed them
testthat::snapshot_accept()

snapshot_review() is the honest path: it shows you old versus new and makes you look. snapshot_accept() without looking is how snapshot suites become worthless. The reason practice #2 (human-readable) matters so much is that it’s what makes this review possible. A binary blob can’t be reviewed, so you’d rubber-stamp it by necessity.

And snapshots are your dependencies. They’re checked into version control and they show up in pull requests. A changed snapshot in a diff deserves the same scrutiny as a changed function, often more, because it’s the line where “the behavior changed” becomes visible.

Build your own snapshot expectations

Notice what expect_plot, expect_snap.data.frame, and the transform-wrapped .expect_snapshot have in common. Each one is a domain-specific expectation that bakes the practices above into a reusable function:

expect_plot handles scoping, the difference threshold, and platform variants.
expect_snap.data.frame handles deterministic ordering and meaning-based comparison.
.expect_snapshot handles filtering nondeterministic console output.

You decide once how a given kind of output should be captured, made readable, and made deterministic. Then every test that uses the expectation gets it right for free. That’s the real payoff. You can use expect_snapshot() as is, but you can also tailor available testthat functions to better fit your needs and make them more reusable, more expressive. The expectations you build on top of it are where snapshot testing becomes a tool your whole suite can lean on.

Cheat sheet

Output type	Capture as	How to keep it deterministic
ggplot	SVG (`vdiffr`)	Fixed data; vdiffr handles rendering
htmlwidget / Shiny	PNG (`webshot`)	Difference threshold + platform variant
Interactive widget	PNG of a state	Drive a deterministic action, then screenshot
Console / reporter	`.md` text	`transform` to strip timing/spinners
Data frame	CSV	Sort rows; compare parsed frames
Error / warning	`.md` text	`error = TRUE`; message is already fixed
Nested structure	pretty JSON/YAML	Stable key order from the serializer

The technique is the same everywhere: capture, save, compare. What changes is the format and how you tame the noise. Keep snapshots scoped so failures mean something, readable so you can review them, deterministic so they don’t flake, and well-named so they stand on their own.

Do that, and snapshot testing covers far more ground than the screenshots it’s famous for.

Apply this

Reading about the practices is easy. Applying them to a real suite is the work. The fastest way to internalize them is to point an AI agent at your own snapshot tests, have it find where they drift from the five practices, then fix the worst few yourself; that’s how you learn.

Open your test files in your AI coding agent (Claude Code, Cursor, Copilot Chat) and paste this prompt:

You are a senior R engineer reviewing a test suite's use of snapshot testing (testthat 3rd edition: expect_snapshot / expect_snapshot_file, plus vdiffr for plots).

Scope: audit the test file(s) I've shared AND the production code they exercise. Snapshot quality usually can't be judged from the test alone — you need to see what's being captured and where any nondeterminism comes from.

Judge every snapshot against five practices, and flag where each is violated:

1. Scoped — the snapshot captures exactly the behavior under test, nothing more. A test for one chart that snapshots a whole dashboard (header, sidebar, "last refreshed" timestamp) fails on unrelated changes. Fix: capture the smallest artifact that proves the behavior.
2. Human-readable — the snapshot is text a reviewer can read in a diff: .md, .csv, .svg, pretty JSON/YAML. Flag binary or .rds snapshots; they can't be reviewed, so they get rubber-stamped. Fix: serialize to a text format.
3. Deterministic — the snapshot is identical run to run. Flag captured timestamps, random IDs, elapsed-time/durations, unordered query results, locale-dependent formatting. Fix at the source first (inject a fixed clock / seed / IDs — dependency injection); if you can't, filter with the `transform` argument, or sort rows before serializing.
4. Platform-stable — rendered (image) snapshots use a `variant` keyed on OS / R version / key dependency version, and image comparison allows a tolerance instead of pixel-exact equality. Flag a single _snaps file shared across platforms, or a zero-threshold pixel compare on CI. Mention the CI-as-source-of-truth workflow (testthat::snapshot_download_gh) where relevant.
5. Well-named — the test title states the precondition and the expected output (not "<fn> works"), and expect_snapshot_file snapshots get an explicit, descriptive name. The filename and title should explain the artifact without opening the test.

Also check suitability and reuse:
- Wrong tool: a snapshot of a single scalar or boolean should be expect_equal(); a 40-line field-by-field expect_equal() on a big nested object could be a snapshot. Flag both directions.
- Missing abstraction: if the same capture / clean / serialize logic repeats across tests, recommend extracting a domain-specific expectation (e.g. expect_plot(), expect_snap.data.frame()) that bakes the practices in once.

Rules:
- Only flag clear instances. Don't invent issues to look thorough.
- Quote the offending lines and cite file:line for every finding.
- Never change what the production code does — only its testability. If a fix needs dependency injection (a signature change), say so and describe the new interface.

Output:
1. A triage table: practice violated | file:line | severity (high/med/low) | one-line why.
2. Then fix the highest-severity findings as before/after code blocks — the smallest change that removes the problem. Stop after three and ask before continuing if more remain.
3. Tell me how to re-run just these tests, and which snapshots I'll need to review and accept.

Before you accept a snapshot, run it past this checklist:

You read the recorded output and decided it’s correct — you didn’t just accept the green check.
The snapshot captures only the behavior under test, so a failure means something.
It’s stored in human-readable format, not binary, so you can read and diff it in a pull request.
Nothing in it changes run to run — no timestamps, random IDs, or incidental ordering.
The test title and snapshot filename state the behavior on their own.

Want to turn these habits into a path you can work through for your whole suite? The R testing roadmap lays out the steps.

References

testthat — Snapshot tests
vdiffr — Visual regression testing for ggplot2
muttest — progress reporter snapshot tests
11 Test Smells That Make Your Tests Lie to You
