Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
If you’ve read any of my past posts you know I like to program in several different languages, some of which I like more than others. Sometimes a problem calls for a particular language to be used, and with that comes adjusting one’s brain to thinking in that language and using the appropriate idioms to leverage that language’s features. But what if I don’t want to?
The line between R and Python has been heavily blurred over the last few years, particularly with {reticulate} enabling us to use Python within R code, RStudio rebranding as Posit and taking on a strong Python development effort, the release of Positron as a multi-language IDE, and Quarto being a multi-language rethink of R Markdown.
I occasionally need to use Python directly – an SDK wrapping an API exists and
I don’t particularly want to spend a lot of time writing my own R version,
especially before I know what I want to get out of the endpoints. At this point
I tend to bump up against my muscle-memory from R and try to use functions I’m
familiar with from R, but which don’t actually exist in Python. Now, that might
sometimes be because the pattern I’m trying to encode simply has a different name
in Python; instead of an sapply(x, f)
sapply(c(2, 3, 4, 5), \(x) x ^ 2) ## [1] 4 9 16 25
I should reach for map, in which case I am reminded that this produces a lazy
iterator that doesn’t show me the results
map(lambda x: x ** 2, [2, 3, 4, 5]) ## <map object at 0x10d7fbee0>
and so I need to wrap it into a list to get the values out
list(map(lambda x: x ** 2, [2, 3, 4, 5])) ## [4, 9, 16, 25]
Or, I could use a list comprehension which isn’t lazy
[v ** 2 for v in [2, 3, 4, 5]] ## [4, 9, 16, 25]
That’s the idiom that I should be reaching for. Sure.
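As an aside on that laziness, a map object is a one-shot iterator, so it can only be consumed once. A minimal sketch (the variable names are purely illustrative):

```python
# map() returns a one-shot iterator: values are computed on demand
squares = map(lambda x: x ** 2, [2, 3, 4, 5])

first_pass = list(squares)   # consumes the iterator
second_pass = list(squares)  # nothing is left to consume

print(first_pass)   # [4, 9, 16, 25]
print(second_pass)  # []
```

A list comprehension builds the whole list up front, so it can be reused as often as needed.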
Other times there’s a package I need to use and a slightly different way of
approaching the problem. In R I love the table() function for getting
histogram-like counts of the unique values of a vector
table(c("b", "a", "c", "a", "b", "a"))
##
## a b c
## 3 2 1
which in Python looks like
from collections import Counter
sorted(Counter(["b", "a", "c", "a", "b", "a"]).items())
## [('a', 3), ('b', 2), ('c', 1)]
Probably Pythonistas remember that idiom and the package to import and the
.items() extractor and the fact that they maybe want to sort the result. But I
kept coming back to a question I ask myself: what if I don’t want to? Why is
there not a function that wraps this idiom? If there was, why not just call it
“table”? Admittedly, it’s far from the catchiest, most memorable, or most useful
name, but it’s immediately recognisable to an R user (ditto for “sapply”).
One approach I considered here was to just call R from Python. That can be done, but I doubt I or anyone else wants to deal with that every time we want to iterate over a list. There’s a package on the Python package index which seems to support this nicely: https://pypi.org/project/r-functions/ but it’s a set of wrappers around individual R files, via Rscript. I’m thinking more along the lines of ‘native Python with an R interface’.
Python is an object-oriented language, but it has functions, so why not make one
from collections import Counter

def table(x):
    return dict(sorted(Counter(x).items()))

table(["b", "a", "c", "a", "b", "a"])
## {'a': 3, 'b': 2, 'c': 1}

def sapply(x, func):
    return [func(v) for v in x]

sapply([2, 3, 4, 5], lambda x: x ** 2)
## [4, 9, 16, 25]
and have a nicer function interface to apply these idioms? I thought about this
a bit longer, and realised there are lots of functions I use in R that I wish
I could use in Python. An idiom for finding the index of elements of a ‘vector’
(list in Python) which are true (TRUE in R, True in Python) is
[i for i, v in enumerate(x) if v]
but I just want to call which(x)
which(c(FALSE, FALSE, TRUE, FALSE, TRUE)) ## [1] 3 5
so why not define this
def which(x):
    return [i for i, v in enumerate(x) if v]

which([False, False, True, False, True])
## [2, 4]
(remembering that Python is 0-indexed).
How far could one take this? Quite a long way!
I thought more about what differences would need to be accounted for, and one that
immediately came to mind was that R is vectorised. If I was to recreate R’s
character counting function nchar(s) as essentially len(s), I’d need to consider
whether I wanted it to work on a single string or a ‘vector’ of strings.
In R:
nchar(c("these", "all", "have", "different", "lengths"))
## [1] 5 3 4 9 7
But in Python, len() expects a single value, so it calculates the length of
the list
len(["these", "all", "have", "different", "lengths"]) ## 5
The ‘proper’ way to do it is to map over the list
[len(s) for s in ["these", "all", "have", "different", "lengths"]] ## [5, 3, 4, 9, 7]
but again, why do I need to use an idiom for this? What if I just made a decorator to change a regular function to a vectorised one by applying this list comprehension internally when it’s passed a list (or a tuple), and which otherwise just evaluates the function with the argument?
import functools

def make_vec(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if isinstance(args[0], (list, tuple)):
            return [func(xi, *args[1:], **kwargs) for xi in args[0]]
        return func(*args, **kwargs)
    return wrapper

@make_vec
def my_len(s):
    return len(s)

my_len(["these", "all", "have", "different", "lengths"])
## [5, 3, 4, 9, 7]
and I could name it… “nchar”!
The other use-case that came to mind was Elio venting (and referencing a post to which I also wrote a sort of response) that they needed to list the files in the current directory
Post by @eliocamp@mastodon.social (view on Mastodon)
with the idiom
import os
[os.path.join(path, f) for f in os.listdir(path)]
The supplied suggestions included
from pathlib import Path
list(Path(path).iterdir())
(just rolls off the tongue, doesn’t it?) which returns a list of PosixPath()
objects and is hardly easy to parse visually.
So, why not have a function?!?
import os

def list_files(path):
    return [os.path.join(path, f) for f in os.listdir(path)]

path = "path/to/files"
list_files(path)
## ['path/to/files/file1.txt', 'path/to/files/file2.txt', 'path/to/files/file3.csv']
I would have liked to call this list.files() but, since Python reserves
the dot for attribute access, it can’t be that.
This then raises the question of “should I support the arguments already in the R
functions?” In this case, should it support a recursive argument? Yes, that
adds complexity, but it’s surely do-able. At this point I reached for some AI
assistance and had Claude help me to implement as many functions as we could think
of, supporting as many common arguments as possible. This involved extending the
decorator to support vectorising other arguments (which also need to be careful
about dots).
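To give a feel for what supporting a recursive argument involves, here is a minimal sketch. It is not the package's implementation: the signature is cut down to two arguments, and the sorting and the choice to return only files (no directories) in the recursive case are assumptions modelled on R's list.files() defaults.

```python
import os

def list_files(path=".", recursive=False):
    """Hypothetical sketch: full paths of the entries under `path`."""
    if not recursive:
        # Like R's default, the non-recursive listing includes directories
        return sorted(os.path.join(path, f) for f in os.listdir(path))
    found = []
    # os.walk visits every subdirectory; only file names are collected
    for root, _dirs, files in os.walk(path):
        found.extend(os.path.join(root, f) for f in files)
    return sorted(found)
```

Calling list_files("path/to/files", recursive=True) would then also return files from any subdirectories.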
On testing it out, it looked like we had something viable.
One last piece I wanted to support, though: the which() example above extracts
the elements of a logical vector which are True, but in order to build that vector
in the first place, I would naturally leverage R’s vectorisation as an array
language. The two steps involved here are to first compute the comparison resulting
in a logical vector, then to use which() to identify the indices of those which are
true
which(c("c", "b", "a", "c", "a", "b") == "a")
## [1] 3 5
The vectorisation decorator above doesn’t help here, because it’s at the point of
== that we want to vectorise
['c', 'b', 'a', 'c', 'a', 'b'] == 'a' ## False
This is False because Python compares the list as a whole against the string 'a', rather than element by element.
The appropriate idiom is once again a comprehension, here written as a generator expression passed straight to which()
which(x == 'a' for x in ['c', 'b', 'a', 'c', 'a', 'b']) ## [2, 4]
The solution I’m fond of is to create a new ‘Vec’ class which wraps binary operators
with a list comprehension, again abstracting away this detail. This means
implementing __eq__, __add__, __and__ and lots of other binary operations,
but with that, and a wrapper to create such an object, the comparison operators
can be vectorised
vals = vec(['c', 'b', 'a', 'c', 'a', 'b'])
which(vals == 'a')
## [2, 4]
Not pristine, but quite clean, if you ask me.
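To make the idea concrete, here is a minimal sketch of such a wrapper. It is not the package's actual code: the class body, the `_binary` helper, and the decision to raise an error on mismatched lengths (rather than recycle values as R does) are assumptions made for illustration.

```python
import operator

class Vec:
    """Hypothetical minimal wrapper giving a list element-wise operators."""

    def __init__(self, values):
        self.values = list(values)

    def _binary(self, other, op):
        # Unwrap another Vec so it is handled like a plain list
        if isinstance(other, Vec):
            other = other.values
        if isinstance(other, (list, tuple)):
            # Simplification: no R-style recycling, lengths must match
            if len(other) != len(self.values):
                raise ValueError("lengths differ")
            return Vec(op(a, b) for a, b in zip(self.values, other))
        # Otherwise treat `other` as a scalar applied to every element
        return Vec(op(a, other) for a in self.values)

    def __eq__(self, other):
        return self._binary(other, operator.eq)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __and__(self, other):
        return self._binary(other, operator.and_)

    def __iter__(self):
        return iter(self.values)

    def __len__(self):
        return len(self.values)

    def __repr__(self):
        return f"Vec({self.values!r})"

def vec(x):
    return Vec(x)

def which(x):
    return [i for i, v in enumerate(x) if v]

vals = vec(['c', 'b', 'a', 'c', 'a', 'b'])
print(which(vals == 'a'))  # [2, 4]
```

One thing to watch when combining comparisons: `&` binds more tightly than `==` in Python, so each comparison needs its own parentheses, as in `(vals == 'a') & (other == 'b')`.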
With all these pieces in place, adding implementations for common base R functions including most arguments and a way to vectorise lists, I wrapped everything up into a Python package (my first) to learn how to do it.
The workflow isn’t particularly painful, with my biggest complication being
different versions of Python supporting different requirements in pyproject.toml,
and so some GitHub Actions are failing because of that.
As part of building out the implementations I had Claude add tests for each of the
functions with some expected values – if I do want to improve some of the idioms
internally, I want to ensure I don’t change the values produced. That works for
having any testing at all, but how can I be sure that I’m reproducing what I
would get if I was working in R? One option was to just run all of the test
functions by hand and confirm that the values look similar enough, accounting for
list vs vector and 0 vs 1 indexing. Instead, Claude managed to write an adaptor
for pytest which does the realignment of e.g. list_files to list.files
(and similarly for arguments), realigns the indexing where needed, and runs all
existing tests directly in R via rpy2 (skipping over some for which I don’t
have tests yet). I’m disabling automated testing of this because I suspect it
could get flaky dealing with both R and Python on GitHub Actions, but I can
confirm that all the current tests pass.
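The realignment step can be sketched in plain Python without touching R at all. The helper names `to_r_name` and `to_r_index` and the override tables below are hypothetical, not taken from the actual test adaptor; they only illustrate the kind of translation involved, including the cases where a blanket underscore-to-dot rule would be wrong (R's seq_along() really does contain an underscore).

```python
# Hypothetical tables of names that the simple rule would get wrong
FUNCTION_OVERRIDES = {
    "seq_along": "seq_along",  # R keeps the underscore here
    "seq_len": "seq_len",
    "r_range": "range",        # renamed in Python to avoid range()
    "rstr": "str",             # renamed in Python to avoid str()
}
ARGUMENT_OVERRIDES = {
    "no_dot": "no..",          # list.files() spells this argument `no..`
}

def to_r_name(py_name, overrides=FUNCTION_OVERRIDES):
    """Translate a Python-side name to its R spelling."""
    if py_name in overrides:
        return overrides[py_name]
    # Drop a trailing underscore (from_ -> from), then swap _ for .
    return py_name.rstrip("_").replace("_", ".")

def to_r_index(py_indices):
    """Shift 0-based Python indices to 1-based R indices."""
    return [i + 1 for i in py_indices]

print(to_r_name("list_files"))                # list.files
print(to_r_name("na_rm", ARGUMENT_OVERRIDES)) # na.rm
print(to_r_index([2, 4]))                     # [3, 5]
```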
I wanted to have a documentation website similar to what we have via {pkgdown} and came across quartodoc which is what the Python version of {pins} uses. Getting that to work required downgrading a specific Python dependency, but was otherwise painless.
I have a working package locally – how do I share it? This seemed like the perfect opportunity to learn what the release process looks like for Python. I have a handful of packages on CRAN and one on Bioconductor, and the process there is far from frictionless, with the side-effect that there’s some trust you can place on the interoperability of packages and minimal (automated) code checking. While Python is more ‘wild west’ in terms of what can be uploaded, it’s really nice to see that they do have an entirely separate test server where you can upload your package and see how it looks. I’m reminded of the quote
Everybody has a testing environment. Some people are lucky enough to have a totally separate environment to run production in.
Given that it’s not currently possible to run 100% of the CRAN checks locally (and even some of those you can run give a different result from what you get on their systems), this does make me a little jealous. I wonder whether the decrease in load from rejecting failing submissions would offset the cost of supporting a test submission server.
All went well pushing to the test server (via an authentication key) and I managed to build up the courage to push to the production instance… it’s live!
and the documentation site isn’t too bad, either (in my opinion).
This means that you can now run
uv add rfuns
(or the equivalent in whatever virtual environment management configuration you’re
using, e.g. pip install rfuns) and start using some R functions directly in
Python!
Depending on how you like to manage your imports, you can import everything
from rfuns import *
which([False, False, True, False, True])
## [2, 4]
or, if you prefer to namespace
import rfuns as r
r.which([False, False, True, False, True])
## [2, 4]
The list of functions currently imported, grouped into sections, is:
Strings
nchar(x)
nzchar(x)
paste(*args, sep=" ", collapse=None)
paste0(*args, collapse=None)
grepl(pattern, x, ignore_case=False, fixed=False)
grep(pattern, x, ignore_case=False, fixed=False, value=False, invert=False)
gsub(pattern, replacement, x, ignore_case=False, fixed=False)
sub(pattern, replacement, x, ignore_case=False, fixed=False)
trimws(x, which="both", whitespace=r"[ \t\r\n]")
toupper(x)
tolower(x)
startsWith(x, prefix)
endsWith(x, suffix)
strsplit(x, split, fixed=False)
substr(x, start, stop)
chartr(old, new, x)
formatC(x, digits=6, format="g", width=None)
Vectors
which(x)
which_min(x)
which_max(x)
diff(x, lag=1)
cumsum(x)
cumprod(x)
cummax(x)
cummin(x)
rev(x)
duplicated(x)
setdiff(x, y)
intersect(x, y)
union(x, y)
unique(x)
seq_along(x)
seq_len(n)
seq(from_=0, to=None, by=None, length_out=None) (from is a reserved keyword)
sign(x)
r_range(x) (renamed to not conflict with range())
Math
sign(x)
trunc(x)
ceiling(x)
floor(x)
sqrt(x)
log(x, base=None)
log2(x)
log10(x)
exp(x)
abs(x)
var(x, na_rm=False)
sd(x, na_rm=False)
mean(x, na_rm=False)
median(x, na_rm=False)
quantile(x, probs=None, na_rm=False)
scale(x, center=True, scale_=True)
round(x, digits=0)
Files
list_files(path=".", pattern=None, all_files=False, full_names=False, recursive=False, ignore_case=False, include_dirs=False, no_dot=False)
file_exists(path)
dir_exists(path)
basename(path)
dirname(path)
file_path(*args)
Table
table(x)
prop_table(x)
margin_table(x)
Functional
lapply(x, func)
sapply(x, func)
vapply(x, func, expected_type)
tapply(x, index, func)
rapply(x, func)
Filter(func, x)
Map(func, *args)
Reduce(func, x, init=None, accumulate=False)
Inspect
head(x, n=6)
tail(x, n=6)
length(x)
nrow(x)
ncol(x)
dim(x)
summary(x)
rstr(x) (renamed to not conflict with str())
Utils
vec(x)
Some of these are vectorised
nchar(["these", "all", "have", "different", "lengths"])
## [5, 3, 4, 9, 7]
grepl("ar", ["frog", "carpet", "basket", "dart"])
## [False, True, False, True]
sqrt([36, 81, 9])
## [6.0, 9.0, 3.0]
while others (approximately, up to 0-indexing) preserve the R behaviour, such as
how seq() works
seq(5)
## [0, 1, 2, 3, 4]
seq(from_=0, to=10, by=2)
## [0, 2, 4, 6, 8, 10]
(note that from is a keyword in Python, so the argument here is now from_)
and set operations
setdiff([5, 2, 4, 1], [2, 1]) ## [5, 4]
whereas this does not preserve order
set([5, 2, 4, 1]) - set([2, 1])
## {4, 5}
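An order-preserving difference is short to write by hand. The following is a sketch of one way to get that behaviour, not necessarily how the package does it; like R's setdiff(), it also drops duplicates from the result, and it assumes the elements are hashable.

```python
def setdiff(x, y):
    """Sketch: elements of x not in y, first-seen order, no duplicates."""
    exclude = set(y)   # fast membership checks against y
    seen = set()       # values already emitted
    out = []
    for v in x:
        if v not in exclude and v not in seen:
            seen.add(v)
            out.append(v)
    return out

print(setdiff([5, 2, 4, 1], [2, 1]))  # [5, 4]
```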
Doing all of this myself would have taken quite some time, so I’m grateful to be able to direct an agent towards accomplishing some of the tedious parts of this project. I still drove the decision making and made sure to verify outputs, so I don’t consider this a ‘vibe-coded’ project.
I’m not recommending you use this in production at all – I’ve taken whatever idiom I could find (or generate) for the internals of all of these, and haven’t paid any attention to their performance. The goal was to make it easier for me to work interactively in a REPL when I’m reaching for particular functions. That being said, I’ll gladly do my best to understand the Pythonic versions so that I can better appreciate native Python and use the idioms when my helper package isn’t available (or is unsuitable). I’d say it’s fair to argue that R users using Python should learn how to do things in a Pythonic way, but I also just want to get some small things done occasionally, so I’m happy this now exists.
If you’re working with non-R colleagues then introducing these abstractions — while they may make your life simpler in the moment — will probably result in confusion as you’re hiding away the implementation and giving it a name they won’t recognise. That’s precisely what functions are for (with helpful names), of course, but unless this package becomes popular, I’ll bet that the inline idioms are more welcomed in a codebase.
I’d love to hear what people think about this, although I’m entirely fine with me being the sole user of it. Should I just force my muscle-memory to take on the Python idioms? Am I going to be punished for ‘crossing the streams’ of two incompatible languages? Would this be helpful to you? Are there other considerations I’ve missed? As always, I can be found on Mastodon and the comment section below.
Shoutouts to Elio Campitelli and Michael Sumner for feedback on a draft of this post.
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.5.3 (2026-03-11)
##  os       macOS Tahoe 26.3.1
##  system   aarch64, darwin20
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Australia/Adelaide
##  date     2026-05-22
##  pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
##  quarto   1.7.31 @ /usr/local/bin/quarto
##
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.23    2026-01-18 [1] CRAN (R 4.5.2)
##  bookdown      0.46    2025-12-05 [1] CRAN (R 4.5.2)
##  bslib         0.10.0  2026-01-26 [1] CRAN (R 4.5.2)
##  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.5.0)
##  cli           3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
##  devtools      2.4.6   2025-10-03 [1] CRAN (R 4.5.0)
##  digest        0.6.39  2025-11-19 [1] CRAN (R 4.5.2)
##  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.5.0)
##  evaluate      1.0.5   2025-08-27 [1] CRAN (R 4.5.0)
##  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
##  fs            1.6.7   2026-03-06 [1] CRAN (R 4.5.2)
##  glue          1.8.1   2026-04-17 [1] CRAN (R 4.5.2)
##  htmltools     0.5.9   2025-12-04 [1] CRAN (R 4.5.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.5.0)
##  jsonlite      2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
##  knitr         1.51    2025-12-20 [1] CRAN (R 4.5.2)
##  lattice       0.22-9  2026-02-09 [1] CRAN (R 4.5.3)
##  lifecycle     1.0.5   2026-01-08 [1] CRAN (R 4.5.2)
##  magrittr      2.0.4   2025-09-12 [1] CRAN (R 4.5.0)
##  Matrix        1.7-4   2025-08-28 [1] CRAN (R 4.5.3)
##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.5.0)
##  otel          0.2.0   2025-08-29 [1] CRAN (R 4.5.0)
##  pkgbuild      1.4.8   2025-05-26 [1] CRAN (R 4.5.0)
##  pkgload       1.5.0   2026-02-03 [1] CRAN (R 4.5.2)
##  png           0.1-9   2026-03-15 [1] CRAN (R 4.5.2)
##  purrr         1.2.2   2026-04-10 [1] CRAN (R 4.5.2)
##  R6            2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
##  Rcpp          1.1.1   2026-01-10 [1] CRAN (R 4.5.2)
##  remotes       2.5.0   2024-03-17 [1] CRAN (R 4.5.0)
##  reticulate    1.45.0  2026-02-13 [1] CRAN (R 4.5.2)
##  rlang         1.1.7   2026-01-09 [1] CRAN (R 4.5.2)
##  rmarkdown     2.30    2025-09-28 [1] CRAN (R 4.5.0)
##  rstudioapi    0.18.0  2026-01-16 [1] CRAN (R 4.5.2)
##  sass          0.4.10  2025-04-11 [1] CRAN (R 4.5.0)
##  sessioninfo   1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
##  usethis       3.2.1   2025-09-06 [1] CRAN (R 4.5.0)
##  vctrs         0.7.1   2026-01-23 [1] CRAN (R 4.5.2)
##  withr         3.0.2   2024-10-28 [1] CRAN (R 4.5.0)
##  xfun          0.56    2026-01-18 [1] CRAN (R 4.5.2)
##  yaml          2.3.12  2025-12-10 [1] CRAN (R 4.5.2)
##
##  [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
##
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:        /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/bin/python
##  libpython:     /Users/jono/.local/share/uv/python/cpython-3.12.12-macos-aarch64-none/lib/libpython3.12.dylib
##  pythonhome:    /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV:/Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV
##  virtualenv:    /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/bin/activate_this.py
##  version:       3.12.12 (main, Oct 28 2025, 11:52:25) [Clang 20.1.4 ]
##  numpy:         /Users/jono/.cache/uv/archive-v0/Q1veGTfRq3GBaNYBXjagV/lib/python3.12/site-packages/numpy
##  numpy_version: 2.4.6
##
##  NOTE: Python version was forced by VIRTUAL_ENV
##
## ──────────────────────────────────────────────────────────────────────────────
