Running local LLMs on your NPU from R with Foundry Local and ellmer
TLDR: this post summarizes how I was able to leverage my Surface Pro 11’s Neural Processing Unit (NPU) to chat with Large Language Models (LLMs) from R (for some reason). The code has been adapted from this guide from Microsoft.

While I’ve been obsessively utilizing LLMs since the hype train first set off, in the last 12 months I’ve been experimenting with local LLMs so I can better utilize them in my work. For the most part, I’ve been using my laptop’s RTX 4060 for this, but I’ve long been curious what the integrated NPU in my Surface Pro 11 can (or can’t) do. Unfortunately, at the time of writing, neither Ollama nor LM Studio¹ natively supports NPU inference, which has made satisfying my curiosity trickier than I’d like.
At the outset, I’m still not entirely sure this is a question worth answering (or a blog worth writing) given the RTX 4060’s performance thoroughly trumps that of my NPU, but I’m a tinkerer and the idea of being stranded without a stochastic parrot on hand is nothing short of terrifying.
Foundry Local
Before starting, you’ll need to install Foundry Local, which can be done from a terminal with the command below:
npm install foundry-local-sdk-winml openai
You’ll also need to make sure that Foundry is available on PATH, so your OS knows how to launch it (see here).
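A quick way to confirm the PATH setup from inside R is to ask the operating system where the executable lives. The sketch below uses base R's `Sys.which()`; the helper name `cli_on_path()` is hypothetical and only there for illustration.

```r
# Hypothetical helper: TRUE when an executable with this name is found on PATH.
cli_on_path <- function(name) {
  path <- Sys.which(name)            # "" when the executable is not found
  unname(!is.na(path) & nzchar(path))
}

# Report whether the Foundry Local CLI can be launched from this R session.
if (cli_on_path("foundry")) {
  message("foundry found at: ", Sys.which("foundry"))
} else {
  message("foundry is not on PATH for this R session.")
}
```

If the check fails right after editing PATH, it may be worth restarting R, since a running session usually keeps the PATH it was started with.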
The R Code
Microsoft provides a getting started guide here, which includes the Python code that the R script below is based on (adapted with Claude’s help):
# Load necessary packages ---------------------------------------------------
library(ellmer) # LLM chat interface (chat_openai_compatible)
library(httr2) # resolve the exact loaded model id from /v1/models
# Set project assumptions and define functions ------------------------------
ref_model_alias <- "qwen2.5-0.5b"
ref_prompt <- "What is the golden ratio?"
# fnc_foundry_load: ensure the service is up and the model is loaded.
fnc_foundry_load <- function(alias) {
  if (Sys.which("foundry") == "") {
    stop("`foundry` CLI not found on PATH. Install Foundry Local first.")
  }
  system2("foundry", c("service", "start"))
  system2("foundry", c("model", "download", alias))
  system2("foundry", c("model", "load", alias))
  invisible(alias)
}
# fnc_foundry_endpoint: discover the service base URL (port is dynamic).
fnc_foundry_endpoint <- function() {
  tmp_status <- system2("foundry", c("service", "status"), stdout = TRUE)
  tmp_status <- iconv(paste(tmp_status, collapse = " "),
                      to = "ASCII", sub = " ")
  tmp_hostport <- regmatches(
    tmp_status,
    regexpr("[0-9]{1,3}(\\.[0-9]{1,3}){3}:[0-9]+", tmp_status)
  )
  if (length(tmp_hostport) == 0) {
    stop("Could not parse endpoint from status: ", tmp_status)
  }
  paste0("http://", tmp_hostport[1])
}
# fnc_model_id: resolve the concrete model id required by the REST API.
fnc_model_id <- function(base, alias) {
  tmp_models <- request(paste0(base, "/v1/models")) |>
    req_perform() |>
    resp_body_json(simplifyVector = FALSE)
  tmp_ids <- vapply(tmp_models$data, \(m) m$id, character(1))
  tmp_hit <- tmp_ids[grepl(alias, tmp_ids, fixed = TRUE)]
  if (length(tmp_hit)) tmp_hit[1] else tmp_ids[1]
}
# fnc_foundry_unload: release the model from memory.
fnc_foundry_unload <- function(alias) {
  system2("foundry", c("model", "unload", alias))
  invisible(alias)
}
fnc_foundry_load(ref_model_alias)
ref_endpoint <- fnc_foundry_endpoint()
ref_model_id <- fnc_model_id(ref_endpoint, ref_model_alias)
cat("Model loaded and ready.\n")
# Point ellmer at the local OpenAI-compatible endpoint. Foundry Local needs
# no real API key, so a placeholder credential is passed instead.
mod_chat <- chat_openai_compatible(
  base_url = paste0(ref_endpoint, "/v1"),
  name = "foundry-local",
  credentials = \() "not-needed",
  model = ref_model_id,
  echo = "output"
)
# Send the prompt to the local model
rlt_reply <- mod_chat$chat(ref_prompt)
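The script defines `fnc_foundry_unload()` but never calls it, so the model stays in memory after the chat finishes. One possible pattern, sketched below, is to wrap the chat call in a function that registers the unload step with `on.exit()`, so that it runs even if the request fails. The name `with_model_cleanup()` and its arguments are hypothetical; the chat and unload steps are passed in as functions so the pattern can be tried without Foundry Local installed.

```r
# Hypothetical wrapper: run a chat call, then always run the cleanup step.
with_model_cleanup <- function(prompt, chat_fn, unload_fn) {
  # on.exit() fires when this function exits, normally or via an error.
  on.exit(unload_fn(), add = TRUE)
  chat_fn(prompt)
}

# Assumed usage with the objects defined in the script above:
# rlt_reply <- with_model_cleanup(
#   ref_prompt,
#   chat_fn   = mod_chat$chat,
#   unload_fn = \() fnc_foundry_unload(ref_model_alias)
# )
```

Whether unloading after every prompt is desirable depends on the workflow: for a multi-turn session it is probably cheaper to keep the model loaded and unload once at the end.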
1. Although there were rumors that LM Studio was working on this feature, based on this video.
The post Running local LLMs on your NPU from R with Foundry Local and ellmer appeared first on Giles.