LLMs can’t be trusted to do scientific coding accurately, but humans make mistakes too

I often hear the comment that LLMs (large language models) and other generative AI can't be trusted for research tasks.

[Image: generated by Google's Nano Banana with the prompt "Generate an image of a male African researcher holding a balloon that is pulling them up above a tidal wave of AI generated slop that is full of errors. The balloon has a research paper inside of it. Generate the image in the style of a Simpsons cartoon."]

But this is the wrong way to think about LLMs. Humans also can’t be trusted to do scientific research accurately. They make mistakes. That’s why we have systems for review.

The more important question is: Are LLMs more accurate than humans at completing a given task?

I actually think LLMs might lead to better scientific coding and statistical analysis.

Writing code and performing statistical analyses are common examples of what LLMs get criticised for. The LLM might hallucinate, or at least mislead you into thinking the analysis you have done is scientifically sound.

The implication is that we should not be using them for particular tasks, like designing statistical models.

It's right to be skeptical of AI-produced output. However, we also need to be skeptical of human-produced output. Humans make mistakes as well.

As scientists, we have peer review baked into our culture. But code review is much rarer. Nor do we have many systematic reviews quantifying the rate of mistakes in scientific code.

I suspect that mistakes in scientific coding are more common than we’d like to believe.

In one (rare) example, researchers reviewed population modelling analyses and found that mathematical errors were common. One type of error occurred in 62% of the studies reviewed!
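
To make that concrete, here is a hypothetical illustration (not necessarily the error type that study found) of how easy it is to slip mathematically in a population model: mixing up the finite growth rate (lambda) with the instantaneous rate (r).

```r
# Hypothetical example: confusing a finite growth rate (lambda)
# with an instantaneous rate (r) in a population projection.
lambda <- 1.1      # finite rate: 10% growth per year
r <- log(lambda)   # equivalent instantaneous rate
N0 <- 100          # starting population size

# These two 10-year projections agree, as they should:
N0 * lambda^10     # ~259
N0 * exp(r * 10)   # ~259

# But plug lambda in where r belongs and growth is wildly overstated:
N0 * exp(lambda * 10)  # ~6 million
```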

Now, I haven't set an LLM agent the task of building equivalent population models to measure its error rate. However, my own tests (currently under review) of agents on quite complicated statistical and ecological modelling tasks are showing 80-90% accuracy at completing them.

So the LLM agents are potentially doing better than the humans and making fewer mistakes.

The reason I think LLMs might lead to better research is that they give us more time for code review.

As an ecological modeller, I invest a ton of time in writing code, then checking that it works the way I want (and is mathematically accurate).
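
The kind of check I mean looks something like this minimal sketch (assuming the testthat package; the logistic model and tolerance here are illustrative, not from my actual workflow): simulate a simple growth model and confirm it settles where the theory says it should.

```r
# Minimal sketch of a mathematical-accuracy check for model code,
# using testthat. The model and tolerance are illustrative.
library(testthat)

# Discrete-time logistic growth
logistic_growth <- function(N0, r, K, t_max) {
  N <- numeric(t_max)
  N[1] <- N0
  for (t in 2:t_max) {
    N[t] <- N[t - 1] + r * N[t - 1] * (1 - N[t - 1] / K)
  }
  N
}

# Theory says the population should equilibrate at carrying capacity K
test_that("population equilibrates at carrying capacity", {
  N <- logistic_growth(N0 = 10, r = 0.5, K = 100, t_max = 200)
  expect_equal(N[length(N)], 100, tolerance = 1e-6)
})
```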

LLMs are now doing more of the code writing for me. Used effectively, this gives me more time to review the code for accuracy and to check that it is a faithful representation of the scientific theory.

A human with an LLM partner can choose to: (1) produce crap work faster than they did pre-LLM, OR (2) produce higher quality work in a similar amount of time to what it took them pre-LLM.

I'm arguing that we should be aiming to produce the higher quality work. We can do this if we use LLMs to speed up coding, then spend the time saved on more quality assurance.

More generally, don’t get fooled by the argument that “genAI makes mistakes, so it can’t be trusted”.

It's the wrong way to think about the problem, and I think it will lead to us being blindsided by the oncoming flood of research slop created with genAI.

A better way to think about it is: "genAI and humans both make mistakes, so how can we design workflows where their strengths complement each other and we produce higher quality work?"

This will give us outcomes of higher quality than in the pre-LLM world, and hopefully work that rises above the huge quantity of AI-generated slop now being produced.
