Beautiful plots while simulating loss in the two-part Procrustes problem


Today I was working on a two-part Procrustes problem and wanted to find out why my minimization algorithm sometimes does not converge properly or renders unexpected results. The loss function to be minimized is

\displaystyle  L(\mathbf{Q},c) = \| c \mathbf{A_1Q} - \mathbf{B_1} \|^2 + \| \mathbf{A_2Q} - \mathbf{B_2} \|^2 \rightarrow \min

with \| \cdot \| denoting the Frobenius norm, c an unknown scalar and \mathbf{Q} an unknown rotation matrix, i.e. \mathbf{Q}^T\mathbf{Q}=\mathbf{I}. \mathbf{A_1}, \mathbf{A_2}, \mathbf{B_1}, and \mathbf{B_2} are four real-valued matrices. The minimum for c is easily found by setting the partial derivative of L(\mathbf{Q},c) w.r.t. c equal to zero.

\displaystyle  c = \frac {tr \; \mathbf{Q}^T \mathbf{A_1}^T \mathbf{B_1}}  { \| \mathbf{A_1} \|^2 }
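For completeness, the intermediate step: expanding the first term of L(\mathbf{Q},c) gives c^2 \| \mathbf{A_1Q} \|^2 - 2c \; tr \; \mathbf{Q}^T \mathbf{A_1}^T \mathbf{B_1} + \| \mathbf{B_1} \|^2, and since \mathbf{Q} is orthogonal, \| \mathbf{A_1Q} \| = \| \mathbf{A_1} \|. Setting the derivative with respect to c to zero,

\displaystyle  \frac{\partial L}{\partial c} = 2c \, \| \mathbf{A_1} \|^2 - 2 \; tr \; \mathbf{Q}^T \mathbf{A_1}^T \mathbf{B_1} = 0

which solves to the expression for c above.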

By plugging c into the loss function L(\mathbf{Q},c) we get a new loss function L(\mathbf{Q}) that only depends on \mathbf{Q}.
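Written out, the substitution yields

\displaystyle  L(\mathbf{Q}) = \| \mathbf{B_1} \|^2 - \frac{ \left( tr \; \mathbf{Q}^T \mathbf{A_1}^T \mathbf{B_1} \right)^2 }{ \| \mathbf{A_1} \|^2 } + \| \mathbf{A_2Q} - \mathbf{B_2} \|^2 = \| \mathbf{B_1} \|^2 - c^2 \| \mathbf{A_1} \|^2 + \| \mathbf{A_2Q} - \mathbf{B_2} \|^2

with c understood as the optimal scaling factor for the given \mathbf{Q}. This is the starting situation.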

When trying to find out why the algorithm to minimize L(\mathbf{Q}) did not work as expected, I got stuck. So I decided to conduct a small simulation and generate random rotation matrices to study the relation between the parameter c and the value of the loss function L(\mathbf{Q}). Before looking at the results for the entire two-part Procrustes problem from above, let's visualize the results for the first part of the loss function only, i.e.

\displaystyle  L(\mathbf{Q},c) = \| c \mathbf{A_1Q} - \mathbf{B_1} \|^2 \rightarrow \min

Here, the minimizing c is the same as for the full loss function above, since the second part of the loss does not depend on c. For the simulation I used

\displaystyle  \mathbf{A_1}= \begin{pmatrix}  0.0 & 0.4 & -0.5 \\  -0.4 & -0.8 & -0.5 \\  -0.1 & -0.5 & 0.2 \\  \end{pmatrix} \qquad \text{and} \qquad \mathbf{B_1}= \begin{pmatrix}  -0.1 & -0.8 & -0.1 \\  0.3 & 0.2 & -0.9 \\  0.1 & -0.3 & -0.5 \\  \end{pmatrix}

as input matrices. Generating many random rotation matrices \mathbf{Q} and plotting c against the value of the loss function yields the following plot.

This is a well-behaved relation: for each value of the scaling parameter c the loss is identical, so all points fall on a single curve.
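The reason is visible in the concentrated loss written out above: without the second term, L(\mathbf{Q}) depends on \mathbf{Q} only through c,

\displaystyle  L = \| \mathbf{B_1} \|^2 - c^2 \| \mathbf{A_1} \|^2

so the points necessarily fall on a downward-opening parabola in c.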

Now let's look at the full two-part loss function. As input matrices I used

\displaystyle  \mathbf{A_1}= \begin{pmatrix}  0.0 & 0.4 & -0.5 \\  -0.4 & -0.8 & -0.5 \\  -0.1 & -0.5 & 0.2 \\  \end{pmatrix} \qquad , \qquad \mathbf{B_1}= \begin{pmatrix}  -0.1 & -0.8 & -0.1 \\  0.3 & 0.2 & -0.9 \\  0.1 & -0.3 & -0.5 \\  \end{pmatrix}

\displaystyle  \mathbf{A_2}= \begin{pmatrix}  0 & 0 & 1 \\  1 & 0 & 0 \\  0 & 1 & 0 \\  \end{pmatrix} \qquad , \qquad \mathbf{B_2}= \begin{pmatrix}  0 & 0 & 1 \\  1 & 0 & 0 \\  0 & 1 & 0 \\  \end{pmatrix}

and the following R code.

# trace function
tr <- function(X) sum(diag(X))

# random matrix type 1: uniform entries between min and max
rmat_1 <- function(n=3, p=3, min=-1, max=1){
  matrix(runif(n*p, min, max), ncol=p)
}

# random matrix type 2: random permutation matrix (sparse)
rmat_2 <- function(p=3) {
  diag(p)[, sample(1:p, p)]   # randomly permute the columns of the identity matrix
}

# generate a random orthogonal matrix Q; based on Q find the
# optimal scaling factor c and calculate the loss function value
#
one_sample <- function(p=3)
{
  Q <- mixAK::rRotationMatrix(n=1, dim=p) %*%         # random rotation matrix, det(Q) = 1
    diag(sample(c(-1,1), p, replace=TRUE))            # random reflections, so det(Q) in {-1,1}
  s <- tr( t(Q) %*% t(A1) %*% B1 ) / norm(A1, "F")^2  # optimal scaling factor c for this Q
  rss <- norm(s*A1 %*% Q - B1, "F")^2 +               # loss value (residual sum of squares)
         norm(A2 %*% Q - B2, "F")^2
  c(s=s, rss=rss)
}

# find c and rss for many random rotation matrices
#
library(plyr)   # for rdply()

set.seed(10)    # nice case for 3 x 3
n <- 3
p <- 3
A1 <- round(rmat_1(n, p), 1)
B1 <- round(rmat_1(n, p), 1)
A2 <- rmat_2(p)
B2 <- rmat_2(p)

x <- rdply(40000, one_sample(p))
plot(x$s, x$rss, pch=16, cex=.4, xlab="c", ylab="L(Q)", col="#00000010")
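As an aside, a sketch in case the mixAK package is not at hand: a random orthogonal matrix can also be drawn with base R via the QR decomposition of a Gaussian matrix. The helper name rortho below is my own; it could stand in for the rRotationMatrix() call above, and since it is uniform over all orthogonal matrices it already covers both determinant signs, making the extra reflection step unnecessary.

# base-R alternative to mixAK::rRotationMatrix(): QR decomposition of a
# Gaussian matrix, with a sign correction on the columns so that the
# result is Haar- (i.e. uniformly) distributed over the orthogonal group
rortho <- function(p) {
  QR <- qr(matrix(rnorm(p*p), ncol=p))
  Q  <- qr.Q(QR)
  Q %*% diag(sign(diag(qr.R(QR))))   # fix column signs for uniformity
}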

This time the result turns out to be very different and … beautiful 🙂

Here, we no longer have a one-to-one relation between the scaling parameter and the loss function: with the second term added, the loss is no longer a function of c alone. I do not quite know what to make of this yet. But for now I am happy that it has aesthetic value. Below you find some more beautiful graphics with different matrices as inputs.

Cheers!
