Principal curves example (Elements of Statistical Learning)

April 21, 2016
By

(This article was first published on R – BioStatMatt, and kindly contributed to R-bloggers)

The bit of R code below illustrates the principal curves methods as described in The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman (Ch. 14; the book is freely available from the authors’ website). Specifically, the code generates some bivariate data that have a nonlinear association, initializes the principal curve using the first (linear) principal component, and then computes three iterations of the algorithm described in section 14.5.2. I used the ‘animation’ package to generate the following animated GIF, which illustrates these steps.

principal curves animation


## generate some bivariate data
set.seed(42)
x1 <- seq(1,10,0.3)
w = .6067;
a0 = 1.6345;
a1 = -.6235;
b1 = -1.3501;
a2 = -1.1622;
b2 = -.9443;
x2 = a0 + a1*cos(x1*w) + b1*sin(x1*w) + a2*cos(2*x1*w) +
          b2*sin(2*x1*w) + rnorm(length(x1),0,3/4)
x <- scale(cbind(x1,x2))
alim <- extendrange(x, f=0.1)
alim_ <- range(x)

## plot centered data
plot(x[,1], x[,2], bty='n',
     xlab=expression(x[1]),
     ylab=expression(x[2]),
     xlim=alim, ylim=alim)
legend("topleft", legend=c("Initialize"), bty="n")

## plot first principal component line
svdx <- svd(x)
clip(alim_[1],alim_[2],alim_[1],alim_[2])
with(svdx, abline(a=0, b=v[2,1]/v[1,1]))

## plot projections of each point onto line
z1 <- with(svdx, x%*%v[,1]%*%t(v[,1]))
segments(x0=x[,1],y0=x[,2],
         x1=z1[,1],y1=z1[,2])

## compute initial lambda (arc-lengths associated with
## orthogonal projections of data onto curve)
lam <- with(svdx, as.numeric(u[,1]*d[1]))

for(itr in 1:3) {
  
  #### step (a) of iterative algorithm ####
  
  ## compute scatterplot smoother in either dimension
  ## increase 'df' to make the curve more flexible
  fit1 <- smooth.spline(x=lam, y=x[,1], df=4)
  fit2 <- smooth.spline(x=lam, y=x[,2], df=4)

  ## plot data and the principal curve for a sequence of lambdas
  plot(x[,1], x[,2], bty='n',
       xlab=expression(x[1]),
       ylab=expression(x[2]),
       xlim=alim, ylim=alim)
  legend("topleft", legend=c("Step (a)"), bty="n")
  seq_lam <- seq(min(lam),max(lam),length.out=100)
  lines(predict(fit1, seq_lam)$y, predict(fit2, seq_lam)$y)

  ## show points along curve corresponding
  ## to original lambdas
  z1 <- cbind(predict(fit1, lam)$y, predict(fit2, lam)$y)
  segments(x0=x[,1],y0=x[,2],
           x1=z1[,1],y1=z1[,2])


  #### step (b) of iterative algorithm ####

  ## recompute lambdas 
  euc_dist <- function(l, x, f1, f2)
    sum((c(predict(f1, l)$y, predict(f2, l)$y) - x)^2)
  lam <- apply(x,1,function(x0) optimize(euc_dist,
    interval=extendrange(lam, f=0.50),
    x=x0, f1=fit1, f2=fit2)$minimum)


  ## show projections associated with recomputed lambdas
  plot(x[,1], x[,2], bty='n',
       xlab=expression(x[1]),
       ylab=expression(x[2]),
       xlim=alim, ylim=alim)
  legend("topleft", legend=c("Step (b)"), bty="n")
  seq_lam <- seq(min(lam),max(lam),length.out=100)
  lines(predict(fit1, seq_lam)$y, predict(fit2, seq_lam)$y)
  z1 <- cbind(predict(fit1, lam)$y, predict(fit2, lam)$y)
  segments(x0=x[,1],y0=x[,2],
           x1=z1[,1],y1=z1[,2])
}

To leave a comment for the author, please follow the link and comment on their blog: R – BioStatMatt.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)