OkCupid: Finding your Valentine with R

February 14, 2011

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Free dating site OkCupid (which was recently acquired by match.com) collects a lot of data. Many of its more than 3 million members have provided extensive personal details in their dating profiles, including preferences, lifestyle, sexuality and hobbies, giving the site a wealth of information from which to identify trends in the love lives of a typical OkCupid member.

On their informative, entertaining and sometimes controversial blog OkTrends, co-founder Christian Rudder (with the assistance of data scientist Max Shron) analyzes the data to report aggregate trends and insights, such as the differences in preferences between white and black people, how the behaviors of gay members are at odds with some pernicious gay stereotypes, and how religion relates to reading and writing levels. With its blend of data analysis and humor, the OkTrends blog addresses interesting facts and topics that not many others have the will — or the data — to write openly about.

Rudder tells me that when the blog first launched, the data analyses were run manually in Microsoft Excel. Six months later, Max Shron introduced the OkCupid team to R, which enabled more interesting analyses and let them use more of the data. According to co-founder Sam Yagan, once they ran data on R, everything got a lot "better and faster", and they were able to produce posts more quickly and write about more intricate data with better visualizations.

Today (appropriately, on St Valentine's Day), GigaOM has published an in-depth article by Revolution Analytics' Mike Minelli about OkCupid's use of R. Read the article for more details about the data, analyses and reporting OkCupid does with R to reveal hidden facts about our love lives and, ultimately, to find our Valentine.

GigaOM: OkCupid Demystifies Dating with Big Data



