
From RUnit to testthat with Coding Agent Support

[This article was first published on Mirai Solutions, and kindly contributed to R-bloggers.]

A case study of AI-supported maintenance in an R package


Migrating a Test Suite

Going from one testing framework to another in a mature R package can be tedious and still requires some understanding of the codebase’s overall structure and goals, as well as local test context. This makes it impossible to fully automate, which is why XLConnect, our open source Excel connector, still used RUnit until recently.
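To illustrate the kind of transformation involved, here is a schematic example (not an actual XLConnect test; the workbook object, sheet name and values are made up). Note that RUnit’s checkEquals takes the expected (target) value first, while testthat’s expect_equal takes the actual value first:

    # RUnit: a test function using check* assertions; expected (target) value first
    test.readWorksheet <- function() {
      checkEquals(c(1, 2, 3), readWorksheet(wb, sheet = "data")$x)
    }

    # testthat: a test_that() block using expect_* functions; actual value first
    test_that("worksheet data is read back", {
      expect_equal(readWorksheet(wb, sheet = "data")$x, c(1, 2, 3))
    })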

Coding tools based on Large Language Models (LLMs) have now become widely available. After positive experiences with these tools on tasks requiring detailed instructions and context, we decided to use them to support the migration of our test suite to testthat.

This work affects a large number of files and is quite repetitive, which makes it an ideal task for a category of tools known as coding agents: tools that aim to accomplish tasks spanning multiple files based on a single set of instructions and context (a prompt). Coding agents come in two types: local agents, which run as a program on the developer’s machine, and asynchronous agents, which run in a dedicated Virtual Machine (VM).

For this case study we used the async agent Google Jules. As of November 2025 it is powered by Gemini 2.5 Pro (soon the new Gemini 3 Pro), and its free tier provides 15 sessions per day and lets you opt out of having your sessions used for model training. These features, plus a quick-to-provision isolated VM, make Jules a convenient way to experiment with coding agents. Async agents also reduce some of the risks of agent-assisted programming, because they execute commands in a managed environment rather than directly on your local machine. A local agent requires constant supervision, restricted permissions that limit its usefulness, or a sandboxed environment you have to set up and maintain; otherwise, the agent could perform dangerous operations. A managed VM also reduces the risk of leaking sensitive information from your machine to the internet via prompt injection, though any sensitive code or data given to the agent is still exposed.

Context Preparation

Coding agents have settled on using an AGENTS.md file, which contains instructions you want tools to follow whenever they work in your project. We created this file in the root directory of our repository. A good starting point is the development information found in your README.md or elsewhere in your project documentation. It is also possible to make AGENTS.md a symlink to your README.md file, but the AGENTS.md file will likely need to become more detailed over time, and you may want to omit README.md information that is irrelevant for development.
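As a rough illustration (the actual XLConnect AGENTS.md differs), such a file might contain plain Markdown instructions like:

    ## Build and test
    - After changing code under R/, reinstall the package with R CMD INSTALL .
    - Run the test suite with Rscript -e 'testthat::test_dir("tests/testthat")'

    ## Style
    - Format all R code with Air before committing.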

Environment Setup

For Google Jules and other async agents, it is usually necessary to customize their development environment. In Jules, this is currently done through a setup script that needs to install R, your package’s dependencies, and any other tools typically required for development. The script runs after your repository is cloned, allowing you to install your package (for example, with R CMD INSTALL .) as part of the setup.

If one of the setup steps fails, or the setup times out, you can try moving some setup steps into AGENTS.md under a ## Setup section. This is less reliable, but it can avoid timing out the initial setup, since Jules runs each command separately.

The setup script for XLConnect in Jules, in the final version used for the present task, is the following:

# point the user R library to a writable directory via ~/.Renviron
rc="${HOME}/.Renviron"
line="R_LIBS_USER=$HOME/r-libs"
grep -qxF "$line" "$rc" || printf '\n%s\n' "$line" >> "$rc"
# make the variable available in this shell and create the library directory
source $rc
mkdir -p $R_LIBS_USER
# also place .Renviron in the repository, so R sessions started there pick it up
cp $rc .Renviron
# install the Air formatter
curl -LsSf https://github.com/posit-dev/air/releases/latest/download/air-installer.sh | sh
# install R and the system libraries needed to compile rJava
sudo apt-get update && sudo apt-get install --no-install-recommends -y r-base libpcre2-dev libtirpc-dev libicu-dev
# configure R's Java support, then install rJava plus the testing frameworks
sudo R CMD javareconf
Rscript -e "install.packages(c('rJava'))"
Rscript -e "install.packages(c('RUnit', 'testthat'))"
# install XLConnect itself from the cloned sources
R CMD INSTALL .

We use .Renviron to set the user R library directory, which avoids modifying preexisting .profile or .bashrc files and producing potentially unwanted side-effects.

Note that we would no longer need ‘RUnit’ in future tasks, as we are migrating away from it.

If you haven’t already, configure a code formatter for your project, along with instructions to apply it consistently. Otherwise, the code produced by the agent will be more difficult to review. We use Posit’s Air.

First Task and Iterating on Instructions

When running your first task, you may notice that Jules does not behave as you expected! It took some time for us to arrive at a combination of setup script and instructions that works well. We hope the present example can help you get to a working setup more quickly.

Starting from an initial set of mechanically ported tests, we first simply asked Jules to fix the test code that was not running, telling it how to run the new test suite.
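For reference, a testthat suite in the standard tests/testthat layout can typically be run as follows (the exact command we gave Jules may have differed):

    # from the package root, after installing the package
    Rscript -e 'testthat::test_dir("tests/testthat")'
    # or, from an R session, with devtools available
    # devtools::test()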

After submitting a request, Jules creates a plan and asks you to review it. If you do nothing within a short time (~2 minutes), Jules will start working based on its plan. It is a good idea to review the plan and provide feedback if necessary; this helps you formulate subsequent requests. In addition, Jules remembers general preferences based on your requests (this “Memories” feature can be disabled).

In some cases, Jules asks you to specifically clarify certain aspects. This is a chance to catch ambiguities and conceptual issues early.

The first change that Jules made was copying the test Excel files to a more convenient location, and simplifying the test code that reads them. Though this was a large change in terms of lines of code, it was not very complex; most of the time was spent by Jules in setting up its development environment.
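A common way to locate such resources in testthat, not necessarily exactly what Jules produced, is testthat::test_path(), which resolves paths relative to tests/testthat (file and sheet names below are made up):

    # resolve the test resource independently of the current working directory
    file <- testthat::test_path("resources", "workbook.xlsx")
    wb <- loadWorkbook(file)
    df <- readWorksheet(wb, sheet = "data")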

Watching Jules work in this first task and subsequent ones helped us arrive at the environment setup described above. Reviewing the steps taken by Jules can reveal issues with the provided instructions or environment.

This can be the case even if the results are satisfactory: Jules will work around many issues and still manage to achieve its goal. For example, it ran the following because the rJava package had not been installed:

$ Rscript -e "install.packages(c('rJava'))"

This resulted in

  /usr/bin/ld: cannot find -lpcre2-8: No such file or directory
  /usr/bin/ld: cannot find -ltirpc: No such file or directory
  ERROR: compilation failed for package ‘rJava’

This was enough for Jules to install the missing system libraries:

$ sudo apt-get update && sudo apt-get install -y libpcre2-dev libtirpc-dev

It continued to try, and finally succeeded in installing rJava and further dependencies, solving other such issues along the way.

Another example: we were initially using export ... to set R_LIBS_USER in the setup script. This turned out not to be effective, and commands like the following started failing:

$ Rscript -e "install.packages('testthat', repos = 'https://cloud.r-project.org')"

  ...
  Warning in install.packages(...) : 'lib = "/usr/local/lib/R/site-library"' is not writable
  ...

Jules worked around this by prefixing export R_LIBS_USER=... to each command.

Note that Jules now includes functionality to set environment variables explicitly. More generally, Jules sometimes interprets instructions in unexpected ways or draws on its internal knowledge to find creative workarounds for problems.

Nevertheless, it is preferable to have a clean setup: tasks will complete faster and more reliably, and any issues will be easier to resolve.

A Tedious but Simple Task: expect_equal Argument Order

As part of refactoring the test suite, we wanted to make sure the expected vs. actual values were passed in the correct order. This makes the test output more informative when failures occur: it is clear when comparing complex diffs which value was expected and which was actually produced. This is a specific example of a task that is relatively trivial but cannot be solved mechanically: the notions of expected and actual, though simple, are semantic — they cannot be inferred purely from syntax. We proceeded with the following prompt:

expect_equal’s first argument is the actual value under test, whereas the second argument is the expected value. Make sure these arguments are passed in the correct order. The expected value is typically a literal value or constructed in the test itself, whereas the actual value under test is typically the output of the function under test, or derived from that output.
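For illustration, a call before and after such a reordering might look like this (wb is a hypothetical workbook object):

    # before: expected value passed first (a habit carried over from RUnit)
    expect_equal(c("mtcars", "iris"), getSheets(wb))

    # after: actual value under test first, expected value second
    expect_equal(getSheets(wb), c("mtcars", "iris"))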

This kind of prompt can work quite well, but there is a limitation due to the number of files involved. If you do not specify a scope in which Jules should perform modifications, it will often only perform the change in a few files. We therefore added the following to the prompt:

Modify only the following test files under tests/testthat:

test.workbook.existsName.R

test.workbook.existsSheet.R

We also split the task into 3 groups of 20 files. Whether this is necessary will likely depend on implementation details or even on the current load on the Jules service; you could also create a single task with all the files. Jules frequently seeks confirmation on its progress during complex, multi-step tasks to ensure it’s on the right track.

This could also be a reason to split such tasks into multiple sessions, if you are not close to using up your quota. Doing so is now easier with the Jules Tools CLI and the API.
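A small R snippet along these lines can generate the file lists to paste into each prompt (assuming the test.<name>.R naming convention):

    # list all testthat test files and split them into batches of 20
    files <- list.files("tests/testthat", pattern = "^test\\..*\\.R$")
    batches <- split(files, ceiling(seq_along(files) / 20))
    # print the first batch, to be pasted into a prompt
    writeLines(batches[[1]])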

Explicit Self-Review

After a number of iterations, we wanted to systematically investigate the fidelity of our ported test suite to the original in RUnit. To do so, we first asked Jules itself to perform a review with this goal in mind:

Check the testthat tests against the RUnit reference implementation according to the following plan:

  1. List all files matching the patterns runit.<name>.R (RUnit) and test.<name>.R (testthat).
  2. Pair files with matching <name>
  3. For each pair, compare their logic and note any discrepancies that could cause different test results.

1. Pairing RUnit and testthat Files

For each file in unitTests with the pattern runit.<name>.R, there is a corresponding file in testthat with the pattern test.<name>.R. For example:

  • runit.writeAndReadWorksheet.R → test.writeAndReadWorksheet.R
  • runit.workbook.getReferenceCoordinates.R → test.workbook.getReferenceCoordinates.R
  • …and so on for each <name>.

2. Comparison Process

For each pair:

  • Compare the logic and assertions.
  • Note any logical discrepancies that could cause the test results to differ:
    • Differences in test data or parameters.
    • Differences in assertion logic.
    • Missing or extra tests in one file.
    • Test coverage (are all scenarios tested in both?).
    • Assertion logic (do they check the same things, in the same way?).
    • Test data (are the same inputs used?).
    • Exception handling (are errors expected and checked the same way?).

3. Trigger differences

If any pairs seem to diverge in a significant way, try to make a small modification in the code under R/ to rerun the tests and see if you manage to trigger different behavior in the two test suites. In order for code changes to be effective, you must reinstall the source package. (See AGENTS.md for this and running tests).

If you can’t trigger these, still make a note of these in your final message. If you have a code change that triggers a behavior difference in the suites, commit that change individually.

4. Summary Table (Example)

Test Name                | Logical Discrepancy? | Notes
writeAndReadWorksheet    | No                   | Fully equivalent
workbook.readNamedRegion | Check                | Ensure all checkException/expect_error and attribute checks match
workbook.readWorksheet   | Check                | Many error/edge cases—verify all are ported

The above prompt was generated with an interactive LLM tool (in this case GitHub Copilot), then slimmed down and edited manually. It is often useful to generate more detailed instructions based on your requirements; even if you end up not using them, they will often specify aspects of the task that were implicit or ambiguous in your initial problem statement.

In this task, Jules produced a response with many detailed notes on the differences between the RUnit and testthat implementations. Most of them concerned cosmetic differences, but some hinted at issues in the ported code where previous tasks had incorrectly simplified behavior. For example, in the case of workbook.readNamedRegion, the notes were:

Assertion styles differ. Differences in explicit attribute checking (worksheetScope), but core logic for numerous scenarios (data types, keep/drop, errors, cached values, bug fixes) is equivalent.

This was an understated hint to review the test.workbook.readNamedRegion.R file for inconsistencies, and to look in more detail at the differences between RUnit and testthat regarding attribute checks.

Note that point three in the prompt did not elicit usable output from Jules – it was not able to find differences between the test suites (this would be quite a difficult task in any case, but we wanted to see what would happen here).

We adapted some code manually for a few cases to understand how to fix the issue – see the waldo::compare documentation for details. We then had Jules apply the solution across all test files:

in testthat tests, replace ignore_attr = TRUE, check.names = TRUE in expect_equals calls with a value of ignore_attr=c("worksheetScope"), which will result in ignoring the relevant attribute only….
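Concretely, the intended change looks roughly like this (object names are placeholders):

    # before: every attribute difference is ignored
    expect_equal(res, expected, ignore_attr = TRUE, check.names = TRUE)

    # after: only the worksheetScope attribute is excluded from the comparison
    expect_equal(res, expected, ignore_attr = c("worksheetScope"))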

In this case, Jules initially produced a “lazy” plan:

  1. Modify the test file

I will run the tests in tests/testthat/test.writeAndReadNamedRegion.R to ensure that my changes are correct and have not introduced any regressions. …

We had to follow up to force it to widen its search:

Please thoroughly check the code for all calls. There are more files with this pattern. sometimes the arguments are not all on the same line.

This hints at the advantage of directly specifying the scope of a change in terms of target files or directories, when possible. This is now easier with the recently added file selector.

“Third-Party” Code Review

As an additional review step, we wanted to use a different model and agent to check the tests. To iterate over each file pair, we used a local agent: Aider, an open source program that can connect to most models via API. Here we used the Anthropic API with Claude 4 Sonnet to compare the RUnit and testthat scripts. We chose Aider based on familiarity, the ability to script it so as to automate the pairwise comparison (each pair of files being a distinct task), and the ability to connect it to different model providers. The script gathers the file pairs and then runs an Aider command for each of them. That command looks like the following:

    # ... inside a loop; we have defined paths to a RUnit test file ($unitfile), a testthat test file ($testfile), and a correspondingly named markdown file ($md)
    aider \
      --no-auto-commits \
      --message "Compare RUnit vs testthat for $name, summarise any differences in $md" \
      "$unitfile" "$testfile" "$md" || {
        echo "Warning: aider failed for $name, continuing..."
        continue
    }

We then summarized the findings, again with Aider, using the prompt

these files contain many duplicate comments regarding RUnit tests being migrated to testthat. Summarize the changes, making a single entry for each point, and avoid repeating the same point when it is made across multiple files.

This left us with a handful of issues, few enough to investigate and fix manually if necessary.

Conclusion

After a final “classic” coverage check, we can now be confident that the migrated testthat suite is faithful to the semantics and scope of the original RUnit suite! Jules and other tools were important in achieving this. The experience so far shows that experimentation and detailed review of outputs are very much needed to get useful results out of AI-enabled tools. Trying multiple tools, sometimes in combination, is a good way to discover how they can support your workflow. It is also important to remember that things are evolving fast: Jules itself sees significant improvements and additions on a monthly basis, and has already evolved since the work described in this post.
