**R – Brokering Closure**, and kindly contributed to R-bloggers)

I have just published a small package `lspline`

on CRAN that implements linear splines using convenient parametrisations such that

- coefficients are slopes of consecutive segments
- coefficients capture slope change at consecutive knots

Knot locations can be specified

- manually (with
`lspline()`

) - at breaks dividing the range of
`x`

into`q`

equal-frequency intervals (with`qlspline()`

) - at breaks dividing the range of
`x`

into`n`

equal-width intervals (with`elspline()`

)

The implementation follows Greene (2003, chapter 7.5.2).

The package sources are on GitHub here.

# Examples

We will use the following artificial data with knots at `x=5`

and `x=10`

:

```
set.seed(666)
n <- 200
d <- data.frame(
x = scales::rescale(rchisq(n, 6), c(0, 20))
)
d$interval <- findInterval(d$x, c(5, 10), rightmost.closed = TRUE) + 1
d$slope <- c(2, -3, 0)[d$interval]
d$intercept <- c(0, 25, -5)[d$interval]
d$y <- with(d, intercept + slope * x + rnorm(n, 0, 1))
```

Plotting `y`

against `x`

:

```
library(ggplot2)
fig <- ggplot(d, aes(x=x, y=y)) +
geom_point(aes(shape=as.character(slope))) +
scale_shape_discrete(name="Slope") +
theme_bw()
fig
```

The slopes of the consecutive segments are 2, -3, and 0.

## Setting knot locations manually

We can parametrize the spline with slopes of individual segments (default `marginal=FALSE`

):

```
library(lspline)
m1 <- lm(y ~ lspline(x, c(5, 10)), data=d)
knitr::kable(broom::tidy(m1))
```

term | estimate | std.error | statistic | p.value |
---|---|---|---|---|

(Intercept) | 0.1343204 | 0.2148116 | 0.6252941 | 0.5325054 |

lspline(x, c(5, 10))1 | 1.9435458 | 0.0597698 | 32.5171747 | 0.0000000 |

lspline(x, c(5, 10))2 | -2.9666750 | 0.0503967 | -58.8664832 | 0.0000000 |

lspline(x, c(5, 10))3 | -0.0335289 | 0.0518601 | -0.6465255 | 0.5186955 |

Or parametrize with coeficients measuring change in slope (with `marginal=TRUE`

):

```
m2 <- lm(y ~ lspline(x, c(5,10), marginal=TRUE), data=d)
knitr::kable(broom::tidy(m2))
```

term | estimate | std.error | statistic | p.value |
---|---|---|---|---|

(Intercept) | 0.1343204 | 0.2148116 | 0.6252941 | 0.5325054 |

lspline(x, c(5, 10), marginal = TRUE)1 | 1.9435458 | 0.0597698 | 32.5171747 | 0.0000000 |

lspline(x, c(5, 10), marginal = TRUE)2 | -4.9102208 | 0.0975908 | -50.3143597 | 0.0000000 |

lspline(x, c(5, 10), marginal = TRUE)3 | 2.9331462 | 0.0885445 | 33.1262479 | 0.0000000 |

The coefficients are

`lspline(x, c(5, 10), marginal = TRUE)1`

– the slope of the first segment`lspline(x, c(5, 10), marginal = TRUE)2`

– the change in slope at knot*x*= 5; it is changing from 2 to -3, so by -5`lspline(x, c(5, 10), marginal = TRUE)3`

– tha change in slope at knot*x*= 10; it is changing from -3 to 0, so by 3

The two parametrisations (obviously) give identical predicted values:

```
all.equal( fitted(m1), fitted(m2) )
## [1] TRUE
```

graphically

```
fig +
geom_smooth(method="lm", formula=formula(m1), se=FALSE) +
geom_vline(xintercept = c(5, 10), linetype=2)
```

## Knots at `n`

equal-length intervals

Function `elspline()`

sets the knots at points dividing the range of `x`

into `n`

equal length intervals.

```
m3 <- lm(y ~ elspline(x, 3), data=d)
knitr::kable(broom::tidy(m3))
```

term | estimate | std.error | statistic | p.value |
---|---|---|---|---|

(Intercept) | 3.5484817 | 0.4603827 | 7.707678 | 0.00e+00 |

elspline(x, 3)1 | 0.4652507 | 0.1010200 | 4.605529 | 7.40e-06 |

elspline(x, 3)2 | -2.4908385 | 0.1167867 | -21.328105 | 0.00e+00 |

elspline(x, 3)3 | 0.9475630 | 0.2328691 | 4.069080 | 6.84e-05 |

Graphically

```
fig +
geom_smooth(aes(group=1), method="lm", formula=formula(m3), se=FALSE, n=200)
```

## Knots at `q`

uantiles of `x`

Function `qlspline()`

sets the knots at points dividing the range of `x`

into `q`

equal-frequency intervals.

```
m4 <- lm(y ~ qlspline(x, 4), data=d)
knitr::kable(broom::tidy(m4))
```

term | estimate | std.error | statistic | p.value |
---|---|---|---|---|

(Intercept) | 0.0782285 | 0.3948061 | 0.198144 | 0.8431388 |

qlspline(x, 4)1 | 2.0398804 | 0.1802724 | 11.315548 | 0.0000000 |

qlspline(x, 4)2 | 1.2675186 | 0.1471270 | 8.615132 | 0.0000000 |

qlspline(x, 4)3 | -4.5846478 | 0.1476810 | -31.044273 | 0.0000000 |

qlspline(x, 4)4 | -0.4965858 | 0.0572115 | -8.679818 | 0.0000000 |

Graphically

```
fig +
geom_smooth(method="lm", formula=formula(m4), se=FALSE, n=200)
```

# Installation

Stable version from CRAN or development version from GitHub with

```
devtools::install_github("mbojan/lspline", build_vignettes=TRUE)
```

# Acknowledgements

Inspired by Stata command `mkspline`

and function `ares::lspline`

from Junger & Ponce de Leon (2011). As such, the implementation follows Greene (2003), chapter 7.5.2.

- Greene, William H. (2003)
*Econometric analysis*. Pearson Education - Junger & Ponce de Leon (2011)
. R package version 0.7.2 retrieved from CRAN archives.`ares`

: Environment air pollution epidemiology: a library for timeseries analysis

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Brokering Closure**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...