Clickable list of the best animations since 1900, gathered the geek way.

October 22, 2015
(This article was first published on R – AmitKohli.com, and kindly contributed to R-bloggers)

In the midst of our random data exploration, Laure and I started playing around with Hadley’s movies dataset and noticed that there were a lot of old cartoon animations… I mean REALLY old. So we got excited and wondered if we could find Youtube links for all these old animations. Indeed we could! Here’s how we did it. As always, the full analysis is on github.

  1. In R, load ggplot2::movies, which has something like 60 000 movies (newer ggplot2 releases moved this dataset to the ggplot2movies package).
  2. Reduce this dataset to animations of only 10 minutes or less and then arrange by year and descending rating, then select only 1 best animation per year. Hrm… actually, select each year’s 3 best-rated cartoons. You’ll see why later. This reduced dataset has <300 rows… much more manageable. Of this, only keep the title and year, you don’t need the rest.
  3. Now let’s create a column that will give us good search results. To do that, prepend the word Cartoon, wrap the title in quotes, and put the year between parentheses so that each row looks like this: “Cartoon \"It’s the Cat\" (2004)” (the backslashes escape the quotes inside the string… dontworriboutem).
  4. Now feed each one of these lines into bing.com (google/yahoo don’t let us!) and capture the results using httr::GET()
  5. Now we have the whole webpage with the results inside. We use XPath to try our best to grab only the parts of the webpage that we want, namely the search results. Find the first Youtube hit. It takes a LOT of cleaning to figure out what you need and what needs to be thrown out. From this, grab the title of each search result page and the link (by the way, at this point, we have a shorter list because not every movie has a Youtube link).
  6. Now we have to face the reality that the search result may have given us a Youtube link that wasn’t the film we wanted. To check whether it did, we use a package called stringdist. It measures the difference between the title we were looking for and the title of the Youtube hit we got. Sometimes you look for “Unsteady Chough, The” and get back “The unsteady chorough”. For this analysis, I found the qgram method to perform best. Unfortunately, sometimes even the string distance alone isn’t a good predictor: one or two misspellings in a title with 3 letters is a bigger deal than 5 misspellings in a title with 30 letters. Therefore:
  7. Come up with a Percent parameter (Perc) where you divide the stringdist() result by the number of characters in the title you were looking for. This gives a good indication of how much error there is relative to the length of the title.
  8. Based on an evaluation of the string distances and percents, we realized that very few cartoons have good correspondence between the title looked for and the title found. This is why one movie per year doesn’t work and we had to go back and select the 3 best cartoons.
  9. After analyzing the results carefully, we decided to cut the dataset to only stringdist() < 8 and Perc < 100 (although results vary a lot here). This left about a hundred cartoons.
  10. So now that you have good cartoons and their Youtube links, put some html code in front and in back, and use cat() to push it all out into an html file!
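
For the curious, here is a minimal sketch of steps 2, 3, 6, 7, 9 and 10 in R. It is not the code from the github repo: the data frame is a made-up stand-in that only borrows column names from the movies dataset, qgram_dist() is a hypothetical base-R helper standing in for the stringdist package, the link is a placeholder, and multiplying by 100 to get Perc is our assumption about how the percent is scaled. The scraping steps (4 and 5) are left out because they depend on whatever markup the search engine serves.

```r
# Toy stand-in for the movies data: the rows are invented, only the column
# names (title, year, length, rating, Animation) mirror the real dataset.
movies <- data.frame(
  title     = c("Cat A", "Cat B", "Cat C", "Cat D", "Long One", "Dog E"),
  year      = c(1950L, 1950L, 1950L, 1950L, 1950L, 1951L),
  length    = c(7, 8, 6, 9, 90, 7),
  rating    = c(6.1, 8.2, 7.5, 7.9, 9.0, 5.0),
  Animation = 1,
  stringsAsFactors = FALSE
)

# Step 2: animations of 10 minutes or less, sorted by year and descending
# rating, keeping the 3 best-rated titles per year.
shorts <- movies[movies$Animation == 1 & movies$length <= 10, ]
shorts <- shorts[order(shorts$year, -shorts$rating), ]
rank_in_year <- ave(shorts$rating, shorts$year, FUN = seq_along)
top3 <- shorts[rank_in_year <= 3, c("title", "year")]

# Step 3: build the search string, e.g. Cartoon "Cat B" (1950)
top3$query <- sprintf('Cartoon "%s" (%d)', top3$title, top3$year)

# Step 6: hypothetical helper computing the q-gram distance in base R
# (sum of absolute differences between the q-gram counts of both strings).
qgram_dist <- function(a, b, q = 1) {
  grams <- function(s) {
    n <- nchar(s)
    if (n < q) return(character(0))
    substring(s, 1:(n - q + 1), q:n)
  }
  ga <- grams(a)
  gb <- grams(b)
  keys <- union(ga, gb)
  count <- function(g) tabulate(match(g, keys), nbins = length(keys))
  sum(abs(count(ga) - count(gb)))
}

wanted <- "Unsteady Chough, The"
found  <- "The unsteady chorough"

d <- qgram_dist(wanted, found)     # 7

# Step 7: distance relative to the length of the title we searched for.
# Scaling by 100 is an assumption, made so the Perc < 100 cut-off makes sense.
perc <- 100 * d / nchar(wanted)    # 35

# Step 9: keep only hits that pass both cut-offs.
keep <- d < 8 && perc < 100

# Step 10: wrap the kept title and its link in HTML and push it out with cat().
link <- "https://www.youtube.com/watch?v=PLACEHOLDER"  # placeholder, not a real hit
out  <- tempfile(fileext = ".html")
if (keep) {
  cat(sprintf('<p><a href="%s">%s</a></p>', link, wanted), file = out, sep = "\n")
}
```

With the stringdist package installed, stringdist::stringdist(wanted, found, method = "qgram") is the call this helper imitates (its default is q = 1). Because single-character q-grams ignore letter order, moving “The” from the back to the front costs nothing; the distance of 7 here comes from the upper/lower case differences, the comma, and the extra “ro”.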

Final thoughts

  • We tried to find an image for each video to entice viewers to watch the cartoon, but this was reeeeeally hard to do! All the search engines we found push the images as data or in iframes so we couldn’t capture the image with GET. We finally gave up :(
  • We could probably have found more hits if we had looked at more movie sites than Youtube, but we didn’t like the idea of going to whatever site… who knows what ads they might have pop up or whatever.
  • It’s lots of fun to see how cartoons have changed through the ages! It’s amazing to see what they could already do in 1922 and how quickly cartoons improved over time! Click around and see for yourself!
  • Some of the early attempts were full of errors, but in a way it feels a bit psychedelic to click on these links… it’s Youtube roulette! I dare you! Click around >>here<<.

RESULTS HERE BELOW! Click to go to the youtube link!

 

1906 – Humorous Phases of Funny Faces

1911 – Little Nemo

1916 – R.F.D. 10,000 B.C.

1916 – Krazy Kat, Bugologist

1919 – Feline Follies

1922 – Puss in Boots

1925 – Alice’s Egg Plant

1928 – Vormittagsspuk

1929 – Springtime

1931 – Bimbo’s Initiation

1932 – Flowers and Trees

1933 – Une nuit sur le mont chauve

1933 – Three Little Pigs

1934 – Tale of the Vienna Woods

1934 – China Shop, The

1935 – Three Orphan Kittens

1937 – Lonesome Ghosts

1938 – Porky in Wackyland

1939 – Peace on Earth

1939 – Blue Danube, The

1941 – Dance of the Weed

1941 – Fox and the Grapes, The

1942 – Horton Hatches the Egg

1943 – Fighting Tools

1943 – Red Hot Riding Hood

1947 – Tubby the Tuba

1948 – Cat That Hated People, The

1948 – Mouse Wreckers

1950 – Morris the Midget Moose

1950 – Ventriloquist Cat

1951 – Dude Duck

1951 – Symphony in Slang

1952 – Rock-a-Bye Bear

1955 – You and Your Senses of Smell and Taste

1959 – Short and Suite

1960 – High Note

1962 – Self Defense… for Cowards

1962 – Now Hear This

1965 – Go Go Amigo

1967 – Bear That Wasn’t, The

1968 – Windy Day

1969 – Corbeau et le renard, Le

1969 – Bambi Meets Godzilla

1970 – Is It Always Right to Be Right?

1971 – Thank You Mask Man

1972 – Balablok

1978 – Afterlife

1978 – Special Delivery

1979 – Harpya

1979 – Canada Vignettes: Log Driver’s Waltz

1979 – Every Child

1985 – Broken Down Film

1985 – Concerto Grosso Modo

1986 – Luxo Jr.

1988 – Cat Came Back, The

1995 – Life of Larry, The

1995 – Chicken from Outer Space, The

1999 – Pinocchio
