“Hatter?” asked Alice, “why are support vector machines so hard to understand?” Suddenly, before you could ask yourself why Alice was studying machine learning in the middle of the 19th century, the Hatter disappeared. “Where did he go?” thought Alice as she looked down to see a compass painted on the floor below her. Arrows pointed in every direction, each one associated with a word or phrase. One arrow pointed toward the label “Tea Party.” Naturally, Alice associated Tea Party with the Hatter, so she walked in that direction and ultimately found him.
“And now,” the Hatter said while taking Alice’s hand and walking through the looking glass. Once again, the Hatter was gone. This time there was no compass on the floor. However, the room was filled with characters, some who looked more like Alice and some who seemed a little closer in appearance to the Hatter. With so many blocking her view, Alice could see clearly only those nearest to her. She identified the one who most closely resembled the Hatter and moved in that direction. Soon she saw another who might have been his relative. Repeating this process over and over again, she finally found the Mad Hatter.
Alice did not fully comprehend what the Hatter told her next. “The compass works only when the input data separates Hatters from everyone else. When it fails, you go through the looking glass into the observation space where all we have is resemblance or similarity. Those who know me will recognize me and all that resemble me. Try relying on a feature like red hair and you might lose your head to the Red Queen. We should have some tea with Wittgenstein and discuss family resemblance. It’s derived from features constructed out of input that gets stretched, accentuated, masked and recombined in the most unusual ways.”
The Hatter could tell that Alice was confused. Reassuringly, he added, “It’s an acquired taste that takes some time. We know we have two classes that are not the same. We just can’t separate them from the data as given. You have to look at it in just the right way. I’ll teach you the Kernel Trick.” The Mad Hatter could not help but laugh at his last remark – looking at it in just the right way could be the best definition of support vector machines.
Note: Joseph Rickert’s post in R Bloggers shows you the R code to run support vector machines (SVMs) along with a number of good references for learning more. My little fantasy was meant to draw some parallels between the linear algebra and human thinking (see Similarity, Kernels, and the Fundamental Constraints on Cognition for more). Besides, Tim Burton will soon be releasing his movie Alice Through the Looking Glass, and the British Library is celebrating 150 years since the publication of Alice in Wonderland. Both Alice and SVMs invite you to go beyond the data as inputted and derive “impossible” features that enable differentiation and action in a world at first unseen.
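For readers who would like to see the Hatter’s “looking at it in just the right way” in numbers, here is a minimal sketch in Python with NumPy. It is not taken from Rickert’s post, and everything in it is an assumption made for illustration: the two rings of points, the helper names `phi` and `poly_kernel`, and the hand-picked separating plane. It checks two things: that a degree-2 polynomial kernel computed on the raw inputs equals an ordinary inner product between constructed features, and that two classes which no straight line can separate in the input space are split by a flat plane in the feature space.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for a 2-D point (hypothetical helper)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel, computed on the raw inputs."""
    return float(np.dot(x, y)) ** 2


# Illustrative data (an assumption): an inner ring of radius 1 and an
# outer ring of radius 3. No straight line in the plane separates them.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
inner = 1.0 * np.column_stack([np.cos(angles[:50]), np.sin(angles[:50])])
outer = 3.0 * np.column_stack([np.cos(angles[50:]), np.sin(angles[50:])])

# The kernel trick: the kernel value on the inputs matches the inner
# product of the constructed features, so phi never has to be computed.
x, y = inner[0], outer[0]
kernel_matches = bool(np.isclose(poly_kernel(x, y), np.dot(phi(x), phi(y))))

# In feature space, z1 + z3 equals the squared radius, so the plane
# z1 + z3 = 4 (chosen by hand, not fitted) sits between the two rings.
w = np.array([1.0, 0.0, 1.0])
b = -4.0
inner_scores = np.array([phi(p) @ w + b for p in inner])
outer_scores = np.array([phi(p) @ w + b for p in outer])
separated = bool(np.all(inner_scores < 0) and np.all(outer_scores > 0))

print(kernel_matches, separated)
```

An actual SVM would learn the separating plane from the data rather than have it written down by hand, and it would work only with kernel values between pairs of observations, which is the “resemblance” Alice relies on once the compass fails.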