What are the most overrated films?

May 5, 2014

(This article was first published on Benomics » R, and kindly contributed to R-bloggers)

“Overrated” and “underrated” are slippery terms to try to quantify. An interesting way of looking at this, I thought, would be to compare the reviews of film critics with those of Joe Public, reasoning that a film which is roundly lauded by the Hollywood press but proves disappointing for the real audience would be “overrated”, and vice versa.

To get some data for this I turned to the most prominent review aggregator: Rotten Tomatoes. All this analysis was done in the R programming language, and full code to reproduce it will be attached at the end.

Rotten Tomatoes API

This API is nicely documented, easy to access and permissive with rate limits, but cripplingly restrictive in what data it presents. Want a list of all films in the database? Nope. Most reviewed? Top rated? Highest box-office takings? Nope.

The related forum is full of what seem like simple requests that should be available through the API but aren’t: top 100 lists? Search using multiple IDs at once? Get audience reviews? All are unanswered or not currently implemented.

So the starting point (a big list of films) is actually kinda hard to get at. The Rube Goldbergian method I eventually used was this:

  1. Get the “Top Rentals” list of movie details (max: 50)
  2. Search each one for “Similar films” (max: 5)
  3. Get the unique film IDs from step 2 and iterate
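As a rough sketch of that loop (not the code from the gist), here is the same idea run against a made-up lookup table instead of the real API. The film IDs, the similar_films table and the get_similar() and grow_film_set() helpers are all hypothetical stand-ins; a real version would make one API request per film.

```r
# Toy stand-in for the API: each film ID maps to its "similar films".
# These IDs and links are invented purely for illustration.
similar_films <- list(
  hp1   = c("hp2", "hp3", "hp4"),
  hp2   = c("hp1", "hp3", "hp4"),
  hp3   = c("hp1", "hp2", "hp4"),
  hp4   = c("hp1", "hp2", "hp3"),
  heist = c("hp1", "caper"),
  caper = c("heist", "noir"),
  noir  = c("caper")
)

# Hypothetical lookup: returns at most 5 similar film IDs.
get_similar <- function(id) {
  hits <- similar_films[[id]]
  if (is.null(hits)) character(0) else head(hits, 5)
}

# Grow the set of known films round by round, only querying
# films that were newly discovered in the previous round.
grow_film_set <- function(seed, rounds = 3) {
  known <- unique(seed)
  frontier <- known
  sizes <- length(known)
  for (i in seq_len(rounds)) {
    found <- unique(unlist(lapply(frontier, get_similar)))
    frontier <- setdiff(found, known)
    known <- c(known, frontier)
    sizes <- c(sizes, length(known))
    if (length(frontier) == 0) break  # nothing new, so stop early
  }
  list(films = known, sizes = sizes)
}

res <- grow_film_set(c("hp1", "heist"))
res$sizes  # number of unique films known after each round
```

In this toy example the set stalls quickly, because the tightly linked "hp" films mostly point back at each other.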

(N.B. This wasn’t my idea but one from a post in the API forums; unfortunately I didn’t save the link.)

In theory this grows your set of films at a reasonable pace, but in reality the number of unique films being returned was significantly lower (shown below). I guess this was due to pulling “walled gardens” into my dataset, e.g. if a Harry Potter film was hit, each further round would just pull in the 5 other Harry Potter films as most similar.

Films returned

Results

Here’s an overview of the critic and audience scores I collected through the Rotten Tomatoes API, with some outliers labelled.

Most over- and underrated films

On the whole it should be noted that critics and audience agree most of the time, as shown by the Pearson correlation coefficient between the two scores (0.71 across >1200 films).
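For anyone wanting to run the same check, a minimal sketch with R's built-in cor() is below. The data frame and its column names are my own invention with made-up scores, not the dataset behind the 0.71 figure.

```r
# Made-up scores out of 100, for illustration only.
films <- data.frame(
  title    = c("A", "B", "C", "D", "E"),
  critics  = c(90, 75, 40, 20, 60),
  audience = c(85, 60, 55, 35, 70)
)

# Pearson correlation between critic and audience scores.
r <- cor(films$critics, films$audience, method = "pearson")
r
```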

Most underrated films

Using our earlier definition it’s easy to build a table of those films where the audience ended up really liking a film that was panned by critics.

Scores are shown out of 100 for both aggregated critics and members of Rotten Tomatoes.
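A table like this only needs a difference column and a sort. The sketch below assumes a data frame with critics and audience columns (names and scores are made up here, not taken from the real data).

```r
# Made-up scores out of 100, for illustration only.
films <- data.frame(
  title    = c("Film A", "Film B", "Film C", "Film D"),
  critics  = c(15, 88, 50, 93),
  audience = c(85, 45, 52, 90),
  stringsAsFactors = FALSE
)

# Positive difference: the audience liked it more than the critics did.
films$difference <- films$audience - films$critics

# Most "underrated" first: largest audience-minus-critic gap at the top.
underrated <- films[order(-films$difference), ]
head(underrated, 2)
```

Sorting the same column in ascending order puts the critic-favoured films at the top instead.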

Somewhat surprisingly, the top of the table is Facing the Giants (2006), an evangelical Christian film. I guess non-Christians might have stayed away, and presumably it struck a chord within its target demographic — but after watching the trailer, I’d probably agree with the critics on this one.

This showed that some weighting of the difference might be needed, at the very least weighting by number of reviews, but the Rotten Tomatoes API doesn’t provide that data.

In addition, the Rotten Tomatoes page for the film shows a “want to see” percentage rather than an audience score. This came up a few times and I’ve seen no explanation for it. Presumably the “want to see” rating is for unreleased films, but the API returns a separate (and undisclosed?) audience score for these films as well.

Above shows a “want to see” rating, different to the “liked it” rating returned by the API and shown below. Note: these screenshots from RottenTomatoes.com are not CC licensed and are shown here under a claim of Fair Use, reproduced for comment/criticism.

Looking over the rest of the table, it seems the public is more fond of gross-out or slapstick comedies (such as Diary of a Mad Black Woman (2005), Grandma’s Boy (2006)) than the critics are. Again, not films I’d jump to defend as underrated. Bad Boys II however…

Most overrated films

Here we’re looking at those films which the critics loved, but which paying audiences were less enthused about.

As before, scores are out of 100 and they're ranked by difference between audience and critic scores.

Strangely, the top 15 (by difference) contains both the original 2001 Spy Kids and the sequel Spy Kids 2: The Island of Lost Dreams (2002). What did critics see in these films that the public didn’t? One possibility is bias in the audience reviews collected: the target audience for these films is young children, and they are probably underrepresented amongst Rotten Tomatoes reviewers. Maybe there’s even an enrichment for disgruntled parent chaperones.

Thankfully, though, this table also contains the type of film we might more readily associate with being “overrated” by critics. Momma’s Man (2008) is an indie drama that debuted at the 26th Torino Film Festival. Essential Killing is a 2010 drama and political thriller from Polish director and screenwriter Jerzy Skolimowski.

There’s also a smattering of Rom-Coms (Friends with Money (2006), Splash (1984)) — if the API returned genre information it would be interesting to look for overall trends but, alas. Additional interesting variables to consider might be budget, the lead, reviews of producer’s previous films… There’s a lot of scope for interesting analysis here but it’s currently just not possible with the Rotten Tomatoes API.

Caveats / Extensions

The full code will be posted below, so if you want to do a better job with this analysis, please do so and send me a link! :)

  • Difference is too simple a metric. A better measure might be weighted by (e.g.) critic ranking. A film that critics give 95% but audiences 75% might be more interesting than the same 20-point difference for a film rated 60/40.
  • There’s something akin to a “founder effect” from my initial chosen films that makes it hard to diversify the dataset, especially to films from previous decades and classics.
  • The Rotten Tomatoes API provides an IMDb ID for cross-referencing; maybe that’s a path to getting more data and building a better film list.
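
To make the first caveat concrete, here is one possible weighting, purely as an assumption on my part: scale the critic-minus-audience gap by the critic score, so that the same 20-point gap counts for more on a highly acclaimed film. Other weightings would be just as defensible.

```r
# The two examples from the caveat above: 95/75 and 60/40.
films <- data.frame(
  title    = c("Acclaimed", "Middling"),
  critics  = c(95, 60),
  audience = c(75, 40)
)

# Raw gap: identical (20 points) for both films.
films$difference <- films$critics - films$audience

# Hypothetical weighting: multiply the gap by the critic score as a fraction.
films$weighted <- films$difference * films$critics / 100
films
```

Under this weighting the 95/75 film ranks above the 60/40 film, even though the raw difference is the same.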

Full code to reproduce analysis

Note: If you’re viewing this on r-bloggers, you may need to visit the Benomics version to see the attached gist.

 

