OkCupid: Finding your Valentine with R

February 14, 2011

Free dating site OkCupid (which was recently acquired by Match.com) collects a lot of data. It has over 3 million members, and many of them have filled their dating profiles with extensive personal details, including preferences, lifestyle, sexuality and hobbies. That gives OkCupid a wealth of information from which to identify trends in the love lives of a typical member.

On OkTrends, OkCupid's informative, entertaining and sometimes controversial blog, co-founder Christian Rudder (with the assistance of data scientist Max Shron) analyzes the data to report aggregate trends and insights. Examples include the differences in preferences between white and black people, how the behaviors of gay members are at odds with some pernicious gay stereotypes, and how religion relates to reading and writing levels. With its blend of data analysis and humor, the OkTrends blog addresses interesting facts and topics that few others have the will, or the data, to write openly about.

Rudder tells me that when the blog first launched, the data analyses were run manually in Microsoft Excel. Six months later, Max Shron introduced the OkCupid team to R, which enabled more interesting analyses and let them use more of the data. According to co-founder Sam Yagan, once they started running their data through R, everything got a lot "better and faster". They were able to produce posts more quickly and write about more intricate data with better visualizations.

Today (appropriately, on St. Valentine's Day), GigaOM has published an in-depth article about OkCupid's use of R, by Revolution Analytics' Mike Minelli. Read the article for more details about the data, analyses and reporting OkCupid does with R to reveal hidden facts about our love lives and, ultimately, to find our Valentine.

GigaOM: OkCupid Demystifies Dating with Big Data


