Visualizing Gestures as Paths

[This article was first published on David Chudzicki's Blog, and kindly contributed to R-bloggers.]

Kaggle is hosting an exciting new competition in which the object is to learn to identify sequences of gestures from just one example of each gesture. I would bet this competition has a lot of potential to attract academics interested in machine learning.

The competition comes with sample code for importing the datasets (AVI videos) into MATLAB, but right now I don’t have MATLAB (although a recent post from one of my favorite bloggers had already reminded me to get it, even before this annoyance came up).

The other tools I’ve used for data analysis are R and Octave (a free program similar to MATLAB). The best option I found for importing the data was Octave’s ‘video’ package (see below the fold for installation tips). Please let me know if you find other possibilities!

The data come in batches, so I imported the first batch of data, saved it, and loaded it in R. When I imported the data, I also shrank each image, for two reasons:
  1. A smaller dataset is easier to deal with (shrinking each dimension by a factor of 3 shrinks the final dataset by a factor of 9).
  2. I also hoped that blurring away the fine detail of the larger image might make each video trace out a more continuous path.
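
To make the shrinking step concrete, here is a minimal sketch in Python (not the Octave code I actually used). It assumes the shrinking is done by averaging non-overlapping 3x3 blocks of pixels, which is one simple way to get both the size reduction and the blurring; the function name shrink_frame and the toy 6x6 frame are made up for illustration.

```python
def shrink_frame(frame, factor=3):
    """Shrink a 2-D grayscale frame by averaging non-overlapping
    factor x factor blocks of pixels (assumed method, see lead-in)."""
    rows = len(frame) // factor
    cols = len(frame[0]) // factor
    small = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the pixels of one block and replace them by their mean.
            block = [
                frame[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        small.append(out_row)
    return small


# Toy 6x6 "frame": the pixel value is row * 6 + column.
frame = [[r * 6 + c for c in range(6)] for r in range(6)]
small = shrink_frame(frame, factor=3)

# Pixel count before and after: a factor of 3 per dimension gives a factor of 9.
print(len(frame) * len(frame[0]), "->", len(small) * len(small[0]))  # prints: 36 -> 4
```

Averaging (rather than just keeping every third pixel) is what gives the blurring effect mentioned in the second reason.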

Then, treating each frame (image) as one “row” of data, I plotted the first two principal components (computed from the training data only) as a ‘quick and dirty’ way to visualize the data. In this plot, the points represent frames, and the colors encode which video each frame came from:

  • I was surprised that the videos don’t trace out more continuous paths.
  • They do, however, trace out somewhat continuous paths.
  • Each gesture/video traces out a somewhat distinctive path.
  • Most gestures/videos begin and end in roughly the same region (this makes sense: each video seems to begin and end with the person in roughly the same position).
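
For readers who want to see the projection itself, here is a rough sketch in Python (my own analysis was done in R; this is not that code). Each frame is flattened into one row, the columns are centered, and the first two principal directions are found by power iteration with deflation, which I've swapped in here only to keep the example free of dependencies. The six 4-pixel "frames" and the video labels are invented toy data.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center_columns(rows):
    """Subtract each column's mean so every pixel position averages to zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def leading_direction(rows, iterations=200, seed=0):
    """Power iteration for the top eigenvector of rows^T rows."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in rows[0]]
    for _ in range(iterations):
        scores = [dot(row, v) for row in rows]                 # X v
        v = [sum(s * row[j] for s, row in zip(scores, rows))   # X^T (X v)
             for j in range(len(v))]
        norm = math.sqrt(dot(v, v))
        if norm == 0.0:  # no variance left in the data
            break
        v = [w / norm for w in v]
    return v


def first_two_components(frames):
    """Return per-frame (PC1, PC2) scores plus the two directions."""
    centered = center_columns(frames)
    pc1 = leading_direction(centered)
    # Deflate: remove each row's part along pc1, then find the next direction.
    deflated = []
    for row in centered:
        s = dot(row, pc1)
        deflated.append([x - s * w for x, w in zip(row, pc1)])
    pc2 = leading_direction(deflated)
    scores = [(dot(row, pc1), dot(row, pc2)) for row in centered]
    return scores, pc1, pc2


# Toy data: two hypothetical videos, three flattened 4-pixel frames each.
frames = [[0, 0, 1, 2], [1, 0, 2, 2], [2, 1, 3, 2],
          [5, 4, 0, 1], [6, 5, 1, 0], [7, 5, 2, 1]]
video = ["A", "A", "A", "B", "B", "B"]

scores, pc1, pc2 = first_two_components(frames)
for label, (s1, s2) in zip(video, scores):
    print(label, round(s1, 2), round(s2, 2))
```

Plotting the two score columns against each other, colored by the video label and joined in frame order, gives the kind of path picture described above.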
Future possibilities:
  • It would probably make more sense to include the “test” data in my PCA.
  • The real question is how closely the appropriate segment of each path in the “test” data resembles the corresponding path in the training data.
  • I want to visualize the data (and do my learning) with an embedding/transformation that makes more sense than PCA. Presumably there is some structure in the set of all images, and a method like Laplacian Eigenmaps or ISOMAP should do a better job of taking advantage of it.

All of my code for this is available on Github.

Installing Octave’s video package:

I’m roughly following some advice found here:

Get the package:

tar xf video-1.0.2.tar.gz video-1.0.2/
cd video-1.0.2/src

Try to configure:


Apparently I need something called “ffmpeg”. I’ll get to that in a minute. The site above also mentioned something about these. I don’t know if I need them or not:

sudo aptitude install octave3.0-headers
sudo aptitude install build-essential

Now get ffmpeg:

tar xf ffmpeg-0.5.tar.bz2 
cd ffmpeg-0.5/
./configure --enable-gpl --enable-shared --enable-swscale --prefix=/usr && make
sudo make install

Check installation:

ldd ./ffmpeg

Now try to install the video package again:

cd ../video-1.0.2/src/

It worked!
