Running local LLMs on your NPU from R with Foundry Local and ellmer

Posted on June 27, 2026 by Giles Dickenson-Jones in R bloggers

[This article was first published on Data Analytics and AI Archives - Giles, and kindly contributed to R-bloggers].


TL;DR: This post summarizes how I used my Surface Pro 11’s Neural Processing Unit (NPU) to chat with Large Language Models (LLMs) from R (for some reason). The code is adapted from this guide from Microsoft.

While I’ve been obsessively using LLMs since the hype train first set off, over the last 12 months I’ve been experimenting with local LLMs so I can make better use of them in my work. For the most part, I’ve been using my laptop’s RTX 4060 for this, but I’ve long been curious about what the integrated NPU in my Surface Pro 11 can (or can’t) do. Unfortunately, at the time of writing, neither Ollama nor LM Studio¹ natively supports running models on the NPU, which has made satisfying my curiosity trickier than I’d like.

At the outset, I’m still not entirely sure this is a question worth answering (or a blog worth writing) given the RTX 4060’s performance thoroughly trumps that of my NPU, but I’m a tinkerer and the idea of being stranded without a stochastic parrot on hand is nothing short of terrifying.

Foundry Local

Before starting, you’ll need to install Foundry Local, which can be done from a terminal using the command below:

npm install foundry-local-sdk-winml openai

You’ll also need to make sure that Foundry is available on PATH, so your OS knows how to launch it (see here).
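
A quick way to confirm the PATH requirement from inside R is sketched below. It uses only base R’s `Sys.which()`; the variable name and messages are illustrative and not part of Foundry Local’s own tooling.

```r
# Look up the `foundry` executable on the PATH that this R session sees.
# Sys.which() returns an empty string when the command cannot be found.
foundry_path <- unname(Sys.which("foundry"))

if (nzchar(foundry_path)) {
  message("Found foundry at: ", foundry_path)
} else {
  message("`foundry` is not on PATH; check the install, then restart R.")
}
```

If you installed Foundry Local while R or RStudio was already open, restart the session first, since R inherits PATH when it starts.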

The R Code

Microsoft provides a getting started guide here, which includes the Python code that the R version below is based on (with Claude’s help):

# Load necessary packages ---------------------------------------------------
library(ellmer)   # LLM chat interface (chat_openai_compatible)
library(httr2)    # resolve the exact loaded model id from /v1/models

# Set project assumptions and define functions ------------------------------
ref_model_alias <- "qwen2.5-0.5b"
ref_prompt      <- "What is the golden ratio?"

# fnc_foundry_load: ensure the service is up and the model is loaded.
fnc_foundry_load <- function(alias) {
  if (Sys.which("foundry") == "") {
    stop("`foundry` CLI not found on PATH. Install Foundry Local first.")
  }
  system2("foundry", c("service", "start"))
  system2("foundry", c("model", "download", alias))
  system2("foundry", c("model", "load", alias))
  invisible(alias)
}

# fnc_foundry_endpoint: discover the service base URL (port is dynamic).
fnc_foundry_endpoint <- function() {
  tmp_status   <- system2("foundry", c("service", "status"), stdout = TRUE)
  tmp_status   <- iconv(paste(tmp_status, collapse = " "),
                        to = "ASCII", sub = " ")
  tmp_hostport <- regmatches(
    tmp_status,
    regexpr("[0-9]{1,3}(\\.[0-9]{1,3}){3}:[0-9]+", tmp_status)
  )
  if (length(tmp_hostport) == 0) {
    stop("Could not parse endpoint from status: ", tmp_status)
  }
  paste0("http://", tmp_hostport[1])
}

# fnc_model_id: resolve the concrete model id required by the REST API.
fnc_model_id <- function(base, alias) {
  tmp_models <- request(paste0(base, "/v1/models")) |>
    req_perform() |>
    resp_body_json(simplifyVector = FALSE)
  tmp_ids <- vapply(tmp_models$data, \(m) m$id, character(1))
  tmp_hit <- tmp_ids[grepl(alias, tmp_ids, fixed = TRUE)]
  if (length(tmp_hit)) tmp_hit[1] else tmp_ids[1]
}

# fnc_foundry_unload: release the model from memory.
fnc_foundry_unload <- function(alias) {
  system2("foundry", c("model", "unload", alias))
  invisible(alias)
}

fnc_foundry_load(ref_model_alias)
ref_endpoint <- fnc_foundry_endpoint()
ref_model_id <- fnc_model_id(ref_endpoint, ref_model_alias)
cat("Model loaded and ready.\n")

# Point ellmer at the local OpenAI-compatible endpoint. No real API key is
# needed locally, so a placeholder credential is passed.
mod_chat <- chat_openai_compatible(
  base_url    = paste0(ref_endpoint, "/v1"),
  name        = "foundry-local",
  credentials = \() "not-needed",
  model       = ref_model_id,
  echo        = "output"
)

# Send the prompt to the local model
rlt_reply <- mod_chat$chat(ref_prompt)

# Release the model from memory when finished
fnc_foundry_unload(ref_model_alias)
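
The two parsing steps in the script (pulling the host and port out of the status text, and matching the alias against the listed model ids) can be tried without a running service. The sketch below reuses the same regular expression and matching logic on made-up inputs. The status string, port, and model ids are hypothetical examples, not real Foundry Local output.

```r
# Hypothetical status text; the real wording of `foundry service status`
# may differ.
sample_status <- paste(
  "Model management service is running on",
  "http://127.0.0.1:5273/openai/status"
)

# Same pattern as fnc_foundry_endpoint(): an IPv4 address followed by a port.
sample_hostport <- regmatches(
  sample_status,
  regexpr("[0-9]{1,3}(\\.[0-9]{1,3}){3}:[0-9]+", sample_status)
)
sample_base <- paste0("http://", sample_hostport[1])
print(sample_base)

# Hypothetical parsed body of GET /v1/models (ids invented for illustration).
sample_models <- list(data = list(
  list(id = "example-model-a-cpu"),
  list(id = "qwen2.5-0.5b-example-npu")
))

# Same logic as fnc_model_id(): first id containing the alias, else first id.
sample_ids <- vapply(sample_models$data, \(m) m$id, character(1))
sample_hit <- sample_ids[grepl("qwen2.5-0.5b", sample_ids, fixed = TRUE)]
sample_id  <- if (length(sample_hit)) sample_hit[1] else sample_ids[1]
print(sample_id)
```

Note that the fallback returns the first listed model when the alias matches nothing, so a mistyped alias would silently select a different model. It may be worth printing the resolved id before chatting.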

¹ Although there were rumors, based on this video, that LM Studio was working on this feature.

