Functions over Idioms – Writing R in Python with rfuns

Jonathan Carroll

10 hours ago

[This article was first published on rstats on Irregularly Scheduled Programming, and kindly contributed to R-bloggers].

If you’ve read any of my past posts you know I like to program in several different languages, some of which I like more than others. Sometimes a problem calls for a particular language to be used, and with that comes adjusting one’s brain to thinking in that language and using the appropriate idioms to leverage that language’s features. But what if I don’t want to?

I don’t want to

The line between R and Python has been heavily blurred over the last few years, particularly with {reticulate} enabling us to use Python within R code, RStudio rebranding as Posit and taking on a strong Python development effort, releasing Positron as a multi-language IDE, and Quarto being a multi-language rethink of R Markdown.

I occasionally need to use Python directly – an SDK wrapping an API exists and I don’t particularly want to spend a lot of time writing my own R version, especially before I know what I want to get out of the endpoints. At this point I tend to bump up against my muscle-memory from R and try to use functions I’m familiar with from R, but which don’t actually exist in Python. Now, that might sometimes be because the pattern I’m trying to encode simply has a different name in Python; instead of an sapply(x, f)

sapply(c(2, 3, 4, 5), \(x) x ^ 2)
## [1]  4  9 16 25

I should reach for map, in which case I am reminded that this produces a lazy iterator that doesn’t show me the results

map(lambda x: x ** 2, [2, 3, 4, 5])
## <map object at 0x10d7fbee0>

and so I need to wrap it into a list to get the values out

list(map(lambda x: x ** 2, [2, 3, 4, 5]))
## [4, 9, 16, 25]

Or, I could use a list comprehension which isn’t lazy

[v ** 2 for v in [2, 3, 4, 5]]
## [4, 9, 16, 25]

That’s the idiom that I should be reaching for. Sure.

Other times there’s a package I need to use and a slightly different way of approaching the problem. In R I love the table() function for getting histogram-like counts of the unique values of a vector

table(c("b", "a", "c", "a", "b", "a"))
## 
## a b c 
## 3 2 1

which in Python looks like

from collections import Counter

sorted(Counter(["b", "a", "c", "a", "b", "a"]).items())
## [('a', 3), ('b', 2), ('c', 1)]

Probably Pythonistas remember that idiom and the package to import and the .items() extractor and the fact that they maybe want to sort the result. But I kept coming back to a question I ask myself: what if I don’t want to? Why is there not a function that wraps this idiom? If there was, why not just call it “table”? Admittedly, it’s far from the catchiest, most memorable, or most useful name, but it’s immediately recognisable to an R user (ditto for “sapply”).

One approach I considered here was to just call R from Python. That can be done, but I doubt I or anyone else wants to deal with that every time we want to iterate over a list. There’s a package on the Python Package Index which seems to support this nicely: https://pypi.org/project/r-functions/ but it provides wrappers around individual R files, run via Rscript. I’m thinking more along the lines of ‘native Python with an R interface’.

Python is an object-oriented language, but it has functions, so why not make one

from collections import Counter

def table(x):
    return dict(sorted(Counter(x).items()))

table(["b", "a", "c", "a", "b", "a"])
## {'a': 3, 'b': 2, 'c': 1}

def sapply(x, func):
    return [func(v) for v in x]
  
sapply([2, 3, 4, 5], lambda x: x ** 2)
## [4, 9, 16, 25]

and have a nicer function interface to apply these idioms? I thought about this a bit longer, and realised there’s lots of functions I use in R that I wish I could use in Python. An idiom for finding the index of elements of a ‘vector’ (list in Python) which are true (TRUE in R, True in Python) is

[i for i, v in enumerate(x) if v]

but I just want to call which(x)

which(c(FALSE, FALSE, TRUE, FALSE, TRUE))
## [1] 3 5

so why not define this

def which(x):
    return [i for i, v in enumerate(x) if v]
  
which([False, False, True, False, True])
## [2, 4]

(remembering that Python is 0-indexed).

How far could one take this? Quite a long way!

I thought more about what differences would need to be accounted for, and one that immediately came to mind was that R is vectorised. If I was to recreate R’s character counting function nchar(s) as essentially len(s), I’d need to consider whether I wanted it to work on a single string or a ‘vector’ of strings.

In R:

nchar(c("these", "all", "have", "different", "lengths"))
## [1] 5 3 4 9 7

But in Python, len() expects a single value, so it calculates the length of the list

len(["these", "all", "have", "different", "lengths"])
## 5

The ‘proper’ way to do it is to map over the list

[len(s) for s in ["these", "all", "have", "different", "lengths"]]
## [5, 3, 4, 9, 7]

but again, why do I need to use an idiom for this? What if I just made a decorator to change a regular function to a vectorised one by applying this list comprehension internally when it’s passed a list (or a tuple), and which otherwise just evaluates the function with the argument?

import functools

def make_vec(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if isinstance(args[0], (list, tuple)):
            return [func(xi, *args[1:], **kwargs) for xi in args[0]]
        return func(*args, **kwargs)
    return wrapper

@make_vec
def my_len(s):
    return len(s)

my_len(["these", "all", "have", "different", "lengths"])
## [5, 3, 4, 9, 7]

and I could name it… “nchar”!
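The decorator above only inspects the first positional argument. As a rough sketch of how the same idea might stretch to several arguments, the hypothetical `vectorise_over` below (an illustrative name, not necessarily how rfuns does it) binds the call with `inspect.signature` and loops over whichever of the named parameters received a list or tuple, holding everything else fixed. It assumes all vectorised arguments have the same length rather than recycling them the way R would.

```python
import functools
import inspect


def vectorise_over(*names):
    """Hypothetical decorator factory: vectorise func over the named parameters."""
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Keep only the named parameters that actually received a list or tuple
            vectors = {
                name: bound.arguments[name]
                for name in names
                if isinstance(bound.arguments[name], (list, tuple))
            }
            if not vectors:
                return func(*bound.args, **bound.kwargs)
            lengths = {len(v) for v in vectors.values()}
            if len(lengths) != 1:
                raise ValueError("vectorised arguments must have equal lengths")
            results = []
            for i in range(lengths.pop()):
                # Swap in the i-th element of each vector, then call func
                for name, values in vectors.items():
                    bound.arguments[name] = values[i]
                results.append(func(*bound.args, **bound.kwargs))
            return results
        return wrapper
    return decorator


@vectorise_over("x", "prefix")
def starts_with(x, prefix):
    return x.startswith(prefix)


print(starts_with(["carpet", "dart"], "car"))
## [True, False]
print(starts_with(["carpet", "dart"], ["car", "da"]))
## [True, True]
print(starts_with("carpet", "car"))
## True
```

Rebuilding the call from `bound.args` and `bound.kwargs` means any extra positional arguments captured by a `*args` parameter are passed through untouched.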

The other use-case that came to mind was Elio venting (and referencing a post to which I also wrote a sort of response) that they needed to list the files in the current directory

Post by @eliocamp@mastodon.social

with the idiom

import os

[os.path.join(path, f) for f in os.listdir(path)]

The supplied suggestions included

from pathlib import Path

list(Path(path).iterdir())

(just rolls off the tongue, doesn’t it?) which returns a list of PosixPath() objects and is hardly easy to parse visually.

So, why not have a function?!?

import os

def list_files(path):
    return [os.path.join(path, f) for f in os.listdir(path)]

path = "path/to/files"

list_files(path)
## ['path/to/files/file1.txt', 'path/to/files/file2.txt', 'path/to/files/file3.csv']

I would have liked to call this list.files() but, since Python uses the dot for attribute access (including method calls), a function name can’t contain one.

This then raises the question of “should I support the arguments already in the R functions?” In this case, should it support a recursive argument? Yes, that adds complexity, but it’s surely do-able. At this point I reached for some AI assistance and had Claude help me to implement as many functions as we could think of, supporting as many common arguments as possible. This involved extending the decorator to support vectorising other arguments (which also need to be careful about dots).
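To give a feel for the extra complexity, here is a minimal sketch of a `recursive` argument built on `os.walk`. It is an illustration only and not the code that ended up in the package; it keeps the full-path behaviour of the simple version above, sorts the result, and (as an assumption about the desired behaviour) returns only files, not directories, when recursing.

```python
import os
import tempfile


def list_files(path=".", recursive=False):
    """Sketch: list entries under path, optionally descending into subdirectories."""
    if not recursive:
        # Same idiom as before: every entry directly inside path
        return sorted(os.path.join(path, f) for f in os.listdir(path))
    found = []
    # os.walk yields (directory, subdirectory names, file names) for each level
    for root, _dirs, files in os.walk(path):
        found.extend(os.path.join(root, f) for f in files)
    return sorted(found)


# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(tmp, name), "w").close()
    print([os.path.relpath(f, tmp) for f in list_files(tmp)])
    ## ['a.txt', 'sub']
    print([os.path.relpath(f, tmp) for f in list_files(tmp, recursive=True)])
    ## ['a.txt', 'sub/b.txt']   (with POSIX path separators)
```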

On testing it out, it looked like we had something viable.

One last piece I wanted to support, though: the which() example above extracts the elements of a logical vector which are True, but in order to build that vector in the first place, I would naturally leverage R’s vectorisation as an array language. The two steps involved here are to first compute the comparison resulting in a logical vector, then to use which() to identify the indices of those which are true

which(c("c", "b", "a", "c", "a", "b") == "a")
## [1] 3 5

The vectorisation decorator above doesn’t help here, because it’s at the point of == that we want to vectorise

['c', 'b', 'a', 'c', 'a', 'b'] == 'a'
## False

This is False because Python compares the whole list with the string 'a', and a list never equals a string.

The appropriate idiom is once again to use a list comprehension

which([x == 'a' for x in ['c', 'b', 'a', 'c', 'a', 'b']])
## [2, 4]

The solution I’m fond of is to create a new ‘Vec’ class which wraps binary operators with a list comprehension, again abstracting away this detail. This means implementing __eq__, __add__, __and__ and lots of other binary operations, but with that, and a wrapper to create such an object, the comparison operators can be vectorised

vals = vec(['c', 'b', 'a', 'c', 'a', 'b'])
which(vals == 'a')
## [2, 4]

Not pristine, but quite clean, if you ask me.
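For the curious, a stripped-down sketch of such a class might look like the following. It is an assumption about the general shape rather than the class shipped in rfuns: only a handful of operators are covered, a scalar operand is reused against every element, and two vectors must have equal lengths (no R-style recycling).

```python
import operator


class Vec:
    """Hypothetical list wrapper whose binary operators act element-wise."""

    def __init__(self, values):
        self.values = list(values)

    def _elementwise(self, op, other):
        if isinstance(other, Vec):
            other = other.values
        if isinstance(other, (list, tuple)):
            if len(other) != len(self.values):
                raise ValueError("operands must have the same length")
            return Vec(op(a, b) for a, b in zip(self.values, other))
        # Scalar operand: reuse it against every element
        return Vec(op(a, other) for a in self.values)

    # Defining __eq__ makes instances unhashable, which is acceptable here
    def __eq__(self, other):
        return self._elementwise(operator.eq, other)

    def __lt__(self, other):
        return self._elementwise(operator.lt, other)

    def __gt__(self, other):
        return self._elementwise(operator.gt, other)

    def __add__(self, other):
        return self._elementwise(operator.add, other)

    def __and__(self, other):
        return self._elementwise(operator.and_, other)

    def __iter__(self):
        return iter(self.values)

    def __repr__(self):
        return f"Vec({self.values!r})"


def vec(x):
    return Vec(x)


def which(x):
    return [i for i, v in enumerate(x) if v]


vals = vec(["c", "b", "a", "c", "a", "b"])
print(which(vals == "a"))
## [2, 4]
nums = vec([1, 5, 10])
print(nums + 1)
## Vec([2, 6, 11])
print(which((nums > 2) & (nums < 8)))
## [1]
```

Because `__iter__` is defined, the plain `which()` from earlier works on a `Vec` without modification.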

With all these pieces in place, adding implementations for common base R functions including most arguments and a way to vectorise lists, I wrapped everything up into a Python package (my first) to learn how to do it.

The workflow isn’t particularly painful, with my biggest complication being different versions of Python supporting different requirements in pyproject.toml, and so some GitHub Actions are failing because of that.

As part of building out the implementations I had Claude add tests for each of the functions with some expected values – if I do want to improve some of the idioms internally, I want to ensure I don’t change the values produced. That works for having any testing at all, but how can I be sure that I’m reproducing what I would get if I was working in R? One option was to just run all of the test functions by hand and confirm that the values look similar enough, accounting for list vs vector and 0 vs 1 indexing. Instead, Claude managed to write an adaptor for pytest which does the realignment of e.g. list_files to list.files (and similarly for arguments), realigns the indexing where needed, and runs all existing tests directly in R via rpy2 (skipping over some for which I don’t have tests yet). I’m disabling automated testing of this because I suspect it could get flaky dealing with both R and Python on GitHub Actions, but I can confirm that all the current tests pass.
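The realignment step can be pictured with a few small helpers. This is a hypothetical sketch of the translation rules only (the names and lookup tables are made up for illustration, and the part that actually calls R through rpy2 is left out): most names just swap underscores for dots, but exceptions such as seq_along or the `no..` argument of list.files() need an explicit entry or a pass-through.

```python
# Hypothetical lookup tables; a real adaptor would need an entry for every renamed function
R_FUNCTION_NAMES = {
    "list_files": "list.files",
    "file_exists": "file.exists",
    "which_min": "which.min",
    "r_range": "range",  # renamed in Python to avoid clashing with range()
}
R_ARGUMENT_NAMES = {
    "no_dot": "no..",  # list.files() really does spell it this way
}


def to_r_function(py_name):
    """Python function name -> R name; names like seq_along pass through unchanged."""
    return R_FUNCTION_NAMES.get(py_name, py_name)


def to_r_argument(py_arg):
    """Python keyword -> R argument name, e.g. from_ -> from, na_rm -> na.rm."""
    if py_arg in R_ARGUMENT_NAMES:
        return R_ARGUMENT_NAMES[py_arg]
    return py_arg.rstrip("_").replace("_", ".")


def to_python_indices(r_indices):
    """Shift R's 1-based positions to Python's 0-based ones."""
    return [i - 1 for i in r_indices]


print(to_r_function("list_files"), to_r_function("seq_along"))
## list.files seq_along
print(to_r_argument("from_"), to_r_argument("na_rm"), to_r_argument("no_dot"))
## from na.rm no..
print(to_python_indices([3, 5]))
## [2, 4]
```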

I wanted to have a documentation website similar to what we have via {pkgdown} and came across quartodoc which is what the Python version of {pins} uses. Getting that to work required downgrading a specific Python dependency, but was otherwise painless.

I have a working package locally – how do I share it? This seemed like the perfect opportunity to learn what the release process looks like for Python. I have a handful of packages on CRAN and one on Bioconductor, and the process there is far from frictionless, with the side-effect that there’s some trust you can place in the interoperability of packages and in at least a minimal level of (automated) code checking. While Python is more ‘wild west’ in terms of what can be uploaded, it’s really nice to see that they do have an entirely separate test server where you can upload your package and see how it looks. I’m reminded of the quote

Everybody has a testing environment. Some people are lucky enough to have a totally separate environment to run production in.

Given that it’s not currently possible to run 100% of the CRAN checks locally (and even some that you can run give a different result to what’s on their systems) this does make me a little jealous. I wonder whether the decrease in load from rejecting failing submissions would offset the cost of supporting a test submission server.

All went well pushing to the test server (via an authentication key) and I managed to build up the courage to push to the production instance… it’s live!

rfuns logo – R functions in Python… are fun

and the documentation site isn’t too bad, either (in my opinion).

This means that you can now run

uv add rfuns

(or the equivalent in whatever virtual environment management configuration you’re using, e.g. pip install rfuns) and start using some R functions directly in Python!

Depending on how you like to manage your imports, you can import everything

from rfuns import *

which([False, False, True, False, True])
## [2, 4]

or, if you prefer to namespace

import rfuns as r

r.which([False, False, True, False, True])
## [2, 4]

The list of functions currently imported, grouped into sections, is:

Strings

nchar(x)
nzchar(x)
paste(*args, sep=" ", collapse=None)
paste0(*args, collapse=None)
grepl(pattern, x, ignore_case=False, fixed=False)
grep(pattern, x, ignore_case=False, fixed=False, value=False, invert=False)
gsub(pattern, replacement, x, ignore_case=False, fixed=False)
sub(pattern, replacement, x, ignore_case=False, fixed=False)
trimws(x, which="both", whitespace=r"[ \t\r\n]")
toupper(x)
tolower(x)
startsWith(x, prefix)
endsWith(x, suffix)
strsplit(x, split, fixed=False)
substr(x, start, stop)
chartr(old, new, x)
formatC(x, digits=6, format="g", width=None)

Vectors

which(x)
which_min(x)
which_max(x)
diff(x, lag=1)
cumsum(x)
cumprod(x)
cummax(x)
cummin(x)
rev(x)
duplicated(x)
setdiff(x, y)
intersect(x, y)
union(x, y)
unique(x)
seq_along(x)
seq_len(n)
seq(from_=0, to=None, by=None, length_out=None) (from is a reserved keyword)
sign(x)
r_range(x) (renamed to not conflict with range())

Math

sign(x)
trunc(x)
ceiling(x)
floor(x)
sqrt(x)
log(x, base=None)
log2(x)
log10(x)
exp(x)
abs(x)
var(x, na_rm=False)
sd(x, na_rm=False)
mean(x, na_rm=False)
median(x, na_rm=False)
quantile(x, probs=None, na_rm=False)
scale(x, center=True, scale_=True)
round(x, digits=0)

Files

list_files(path=".", pattern=None, all_files=False, full_names=False, recursive=False, ignore_case=False, include_dirs=False, no_dot=False)
file_exists(path)
dir_exists(path)
basename(path)
dirname(path)
file_path(*args)

Table

table(x)
prop_table(x)
margin_table(x)

Functional

lapply(x, func)
sapply(x, func)
vapply(x, func, expected_type)
tapply(x, index, func)
rapply(x, func)
Filter(func, x)
Map(func, *args)
Reduce(func, x, init=None, accumulate=False)

Inspect

head(x, n=6)
tail(x, n=6)
length(x)
nrow(x)
ncol(x)
dim(x)
summary(x)
rstr(x) (renamed to not conflict with str())

Utils

vec(x)

Some of these are vectorised

nchar(["these", "all", "have", "different", "lengths"])
## [5, 3, 4, 9, 7]
grepl("ar", ["frog", "carpet", "basket", "dart"])
## [False, True, False, True]
sqrt([36, 81, 9])
## [6.0, 9.0, 3.0]

while others (approximately, up to 0-indexing) preserve the R behaviour, such as how seq() works

seq(5)
## [0, 1, 2, 3, 4]
seq(from_=0, to=10, by=2)
## [0, 2, 4, 6, 8, 10]

(note that from is a keyword in Python, so the argument here is now from_) and set operations

setdiff([5, 2, 4, 1], [2, 1])
## [5, 4]

whereas this does not preserve order

set([5, 2, 4, 1]) - set([2, 1])
## {4, 5}
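One way to get the order-preserving behaviour is to walk the first list once while using sets only for membership checks. The sketch below is an illustration rather than the package's actual implementation, and it assumes the elements are hashable; like R's setdiff(), it also drops duplicates.

```python
def setdiff(x, y):
    """Unique elements of x that are not in y, in order of first appearance."""
    exclude = set(y)  # fast membership test for values to drop
    seen = set()      # values already emitted, so duplicates are skipped
    result = []
    for v in x:
        if v not in exclude and v not in seen:
            seen.add(v)
            result.append(v)
    return result


print(setdiff([5, 2, 4, 1], [2, 1]))
## [5, 4]
print(setdiff([3, 3, 1, 7], [1]))
## [3, 7]
```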

Doing all of this myself would have taken quite some time, so I’m grateful to be able to direct an agent towards accomplishing some of the tedious parts of this project. I still drove the decision making and made sure to verify outputs, so I don’t consider this a ‘vibe-coded’ project.

I’m not recommending you use this in production at all – I’ve taken whatever idiom I could find (or generate) for the internals of all of these, and haven’t paid any attention to their performance. The goal was to make it easier for me to work interactively in a REPL when I’m reaching for particular functions. That being said, I’ll gladly do my best to understand the Pythonic versions so that I can better appreciate native Python and use the idioms when my helper package isn’t available (or is unsuitable). I’d say it’s fair to argue that R users using Python should learn how to do things in a Pythonic way, but I also just want to get some small things done occasionally, so I’m happy this now exists.

If you’re working with non-R colleagues then introducing these abstractions — while they may make your life simpler in the moment — will probably result in confusion as you’re hiding away the implementation and giving it a name they won’t recognise. That’s precisely what functions are for (with helpful names), of course, but unless this package becomes popular, I’ll bet that the inline idioms are more welcomed in a codebase.

I’d love to hear what people think about this, although I’m entirely fine with me being the sole user of it. Should I just force my muscle-memory to take on the Python idioms? Am I going to be punished for ‘crossing the streams’ of two incompatible languages? Would this be helpful to you? Are there other considerations I’ve missed? As always, I can be found on Mastodon and the comment section below.

Shoutouts to Elio Campitelli and Michael Sumner for feedback on a draft of this post.

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.5.3 (2026-03-11)
##  os       macOS Tahoe 26.3.1
##  system   aarch64, darwin20
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Australia/Adelaide
##  date     2026-05-22
##  pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
##  quarto   1.7.31 @ /usr/local/bin/quarto
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.23    2026-01-18 [1] CRAN (R 4.5.2)
##  bookdown      0.46    2025-12-05 [1] CRAN (R 4.5.2)
##  bslib         0.10.0  2026-01-26 [1] CRAN (R 4.5.2)
##  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.5.0)
##  cli           3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
##  devtools      2.4.6   2025-10-03 [1] CRAN (R 4.5.0)
##  digest        0.6.39  2025-11-19 [1] CRAN (R 4.5.2)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.5.0)
##  evaluate      1.0.5   2025-08-27 [1] CRAN (R 4.5.0)
##  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
##  fs            1.6.7   2026-03-06 [1] CRAN (R 4.5.2)
##  glue          1.8.1   2026-04-17 [1] CRAN (R 4.5.2)
##  htmltools     0.5.9   2025-12-04 [1] CRAN (R 4.5.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.5.0)
##  jsonlite      2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
##  knitr         1.51    2025-12-20 [1] CRAN (R 4.5.2)
##  lattice       0.22-9  2026-02-09 [1] CRAN (R 4.5.3)
##  lifecycle     1.0.5   2026-01-08 [1] CRAN (R 4.5.2)
##  magrittr      2.0.4   2025-09-12 [1] CRAN (R 4.5.0)
##  Matrix        1.7-4   2025-08-28 [1] CRAN (R 4.5.3)
##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.5.0)
##  otel          0.2.0   2025-08-29 [1] CRAN (R 4.5.0)
##  pkgbuild      1.4.8   2025-05-26 [1] CRAN (R 4.5.0)
##  pkgload       1.5.0   2026-02-03 [1] CRAN (R 4.5.2)
##  png           0.1-9   2026-03-15 [1] CRAN (R 4.5.2)
##  purrr         1.2.2   2026-04-10 [1] CRAN (R 4.5.2)
##  R6            2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
##  Rcpp          1.1.1   2026-01-10 [1] CRAN (R 4.5.2)
##  remotes       2.5.0   2024-03-17 [1] CRAN (R 4.5.0)
##  reticulate    1.45.0  2026-02-13 [1] CRAN (R 4.5.2)
##  rlang         1.1.7   2026-01-09 [1] CRAN (R 4.5.2)
##  rmarkdown     2.30    2025-09-28 [1] CRAN (R 4.5.0)
##  rstudioapi    0.18.0  2026-01-16 [1] CRAN (R 4.5.2)
##  sass          0.4.10  2025-04-11 [1] CRAN (R 4.5.0)
##  sessioninfo   1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
##  usethis       3.2.1   2025-09-06 [1] CRAN (R 4.5.0)
##  vctrs         0.7.1   2026-01-23 [1] CRAN (R 4.5.2)
##  withr         3.0.2   2024-10-28 [1] CRAN (R 4.5.0)
##  xfun          0.56    2026-01-18 [1] CRAN (R 4.5.2)
##  yaml          2.3.12  2025-12-10 [1] CRAN (R 4.5.2)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
## 
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:         /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/bin/python
##  libpython:      /Users/jono/.local/share/uv/python/cpython-3.12.12-macos-aarch64-none/lib/libpython3.12.dylib
##  pythonhome:     /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV:/Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV
##  virtualenv:     /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/bin/activate_this.py
##  version:        3.12.12 (main, Oct 28 2025, 11:52:25) [Clang 20.1.4 ]
##  numpy:          /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/lib/python3.12/site-packages/numpy
##  numpy_version:  2.4.6
##  
##  NOTE: Python version was forced by VIRTUAL_ENV
## 
## ──────────────────────────────────────────────────────────────────────────────
