Fitting a rational function in R using ordinary least-squares regression

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Srini Kumar, VP of Product Management and Data Science, LevaData; and Bob Horton, Senior Data Scientist, Microsoft

A rational function is defined as the ratio of two functions. The Padé Approximant uses a ratio of polynomials to approximate functions:

$$ R(x)= \frac{\sum_{j=0}^m a_j x^j}{1+\sum_{k=1}^n b_k x^k}=\frac{a_0+a_1x+a_2x^2+\cdots+a_mx^m}{1+b_1 x+b_2x^2+\cdots+b_nx^n} $$ Here we show a way to fit this type of function to a set of data points using the lm function in R.

We’ll use a data-simulating function based on a ratio of polynomials to generate a set of y values for some random x values, add some noise, then see how well we can fit a curve to these noisy points.

N <- 1000
x <- rnorm(N)
f <- function(x) 50*x^2/(1 + 4*x) # data-simulating function

y <- f(x) + rnorm(N, sd=3)

point_data <- data.frame(x, y)


ggplot(point_data, aes(x=x, y=y)) + 
  geom_point() + 
  ylim(-100, 100) + 
  ggtitle("simulated data points")


Note that in the formula for the Padé approximant the numerator polynomial has an zeroth-order coefficient (\(a_0\)), where the denominator just has a constant \( 1\). This makes it easy to rearrange the equation to express \(y\) as a linear combination of terms.

$$ y = \frac{a_0 + a_1 x + a_2 x^2}{1 + b_1 x + b_2 x^2} $$ $$ y (1 + b_1 x + b_2 x^2) = a_0 + a_1 x + a_2 x^2 $$ $$ y = a_0 + a_1 x + a_2 x^2 - b_1 x y - b_2 x^2 y $$

Now we have a form that lm can work with. We just need to specify a set of inputs that are powers of x (as in a traditional polynomial fit), and a set of inputs that are y times powers of x. This may seem like a strange thing to do, because we are making a model where we would need to know the value of y in order to predict y. But the trick here is that we will not try to use the fitted model to predict anything; we will just take the coefficients out and rearrange them in a function. The fit_pade function below takes a dataframe with x and y values, fits an lm model, and returns a function of x that uses the coefficents from the model to predict y:

fit_pade <- function(point_data){
  fit <- lm(y ~ x + I(x^2) + I(y*x) + I(y*x^2), point_data)
  lm_coef <- as.list(coef(fit))
  names(lm_coef) <- c("a0", paste0(rep(c('a','b'), each=2), 1:2))

  with(lm_coef, function(x)(a0 + a1*x + a2*x^2)/(1 - b1*x - b2*x^2))

We'll use ggplot2 to visualize the results; since it drops points outside the plotted range, it does not try to connect the points on either side of the discontinuity, as long as the data gives expected y values past those limits.

plot_fitted_function <- function(x_data, fitted_fun, title){
  x_data$y_hat <- fitted_fun(x_data$x)
  g <- ggplot(x_data, aes(x=x, y=y)) + 
    geom_point() + ylim(-100, 100) +
    geom_line(aes(y=y_hat), col="red", size=1) + 


When we draw the fitted values over the input data points, we see that in this case the function describes the points quite well:

pade_approx <- fit_pade(point_data)

plot_fitted_function(point_data, pade_approx, title="fitted function")


Here are some more examples, showing that this approach can work with some interesting patterns of points:

function_list <- list(
  function(x) (100 - 50*x - 100*x^2)/(1 - 10*x - 10*x^2),
  function(x) (100 - 50*x - 100*x^2)/(1 - 10*x - 5*x^2),
  function(x) (100 - 50*x - 100*x^2)/(1 - 50*x - 5*x^2)

for (f in function_list){
  sim_data <- point_data %>% mutate(y=f(x) + 
      rnorm(nrow(point_data), sd=3))
  plot_fitted_function(sim_data, fit_pade(sim_data), 




Here we have used linear regression by ordinary least squares (with lm) to fit distinctly nonlinear rational functions. For simplicity, these examples focus on equations of second order (or less) in both numerator and denominator, but the idea extends to higher orders. On a cautionary note, this approach seems to have numerical stability issues with some inputs. For example, if you take one of the data-simulating equations above and make selected coefficients much larger, you can create datasets that are fitted poorly by this method. And if the data-simulating function does not have the correct form (for example, if the zeroth order term in the denominator is not 1), the fitted curves can be completely wrong. For practical purposes it might be preferable to use a nonlinear least squares approach (e.g., the nls function). Still, this approach works well in many examples, and it lets you fit some curves that cannot be represented by ordinary polynomials.

To leave a comment for the author, please follow the link and comment on their blog: Revolutions. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)