Birds of a feather shop together

[This article was first published on Decision Science News » R, and kindly contributed to R-bloggers].


This week, Decision Science News is doing a special cross-posting with Messy Matters. The post below is by Sharad Goel and describes work that he and your Decision Science News editor Dan Goldstein are jointly undertaking at Yahoo!

Do you know what the #$*! your social media strategy is? Perhaps it’s “to facilitate audience conversations and drive engagement with social currency”? Or maybe, “to amplify word of mouth by motivating influencers”? Well, given all the lies and damned lies being told about social, fellow yahoo Dan Goldstein and I decided to enter the fray with statistics. We measured the extent to which your friends’ behavior predicts your own, and found that in several consumer domains the effect is substantial, complementing traditional demographic and behavioral predictors.

That friends are similar along a variety of dimensions is a long-observed empirical regularity—a pattern sociologists call homophily. As McPherson et al. write in their canonical review on the subject, “homophily limits people’s social worlds in a way that has powerful implications for the information they receive, the attitudes they form, and the interactions they experience.” Turning this statement around, where there is homophily, one can in principle predict an individual’s behavior based on the attributes and actions of his or her associates.

To assess the quality of such network-based predictions, we merged a large social network (based on email and IM exchanges) with offline sales data at an upscale, national department store chain. Thus, for each of over one million users, we had their past purchase amounts in dollars, and had the same information for each of their network contacts. Think about this for a minute: we not only know how much these individuals themselves spent at an offline retailer, but also how much their social contacts spent, a testament to how profoundly the Internet is changing the way we study human behavior. (Despite bolstering social science research, these newfound tools raise serious privacy issues. We left the matching to a third party that specializes in doing this securely, so neither we nor the department store had access to the other’s complete customer database.)

The plot below summarizes our findings. First, as indicated by the top line, consumers whose friends spent a lot also spent a lot themselves, consistent with the hypothesis that homophily extends to consumer behavior. When friends (alters) on average spent $400 during the six-month observation period, the consumer herself (ego) spent nearly $600, more than twice as much as the typical consumer (indicated by the dotted line). As our aim is prediction, however, the relevant question is not just whether friends are similar in their purchasing behavior, but rather how much information is conveyed by social ties relative to other attributes. One might conjecture that ties simply indicate demographic similarity (i.e., in age and sex): that those who spend a lot are more likely to be middle-aged women (the primary market segment for this department store) and that friends of middle-aged women tend also to be middle-aged women. To test this hypothesis, we first paired each individual with a randomly chosen consumer of identical age and sex. The bottom line shows that this demographically matched group is, perhaps surprisingly, pretty ordinary. In other words, looking only at age and sex, you can’t identify consumers whose friends spend a lot (and who we know spend a lot themselves).
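For readers who want to see how the ego-versus-alter comparison might be computed, here is a minimal Python sketch. It is not the code used in the study: the `spend` dictionary, the `edges` list, the function names, and the $100 bin width are all assumptions made for illustration, and the toy numbers are invented.

```python
from collections import defaultdict
from statistics import mean


def mean_alter_spend(spend, edges):
    """Return {ego: mean spend of ego's contacts} for egos with at least one contact."""
    contacts = defaultdict(set)
    for a, b in edges:  # ties are treated as undirected (an assumption)
        contacts[a].add(b)
        contacts[b].add(a)
    return {
        ego: mean(spend[alter] for alter in alters)
        for ego, alters in contacts.items()
    }


def ego_spend_by_alter_bin(spend, edges, bin_width=100):
    """Average ego spend, grouped by the binned mean spend of the ego's contacts."""
    groups = defaultdict(list)
    for ego, alter_avg in mean_alter_spend(spend, edges).items():
        bin_floor = int(alter_avg // bin_width) * bin_width
        groups[bin_floor].append(spend[ego])
    return {b: mean(v) for b, v in sorted(groups.items())}


# Invented toy data: dollars spent per consumer, and who is tied to whom.
spend = {"ann": 600, "bob": 400, "cat": 400, "dan": 50, "eve": 20}
edges = [("ann", "bob"), ("ann", "cat"), ("dan", "eve")]

result = ego_spend_by_alter_bin(spend, edges)
print(result)
```

Each key in the result is the lower edge of an alter-spend bin, and each value is the average spend of the egos whose contacts fall in that bin. Plotting values against keys would give a curve of the same kind as the top line described above.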

Though it’s standard marketing practice to target consumers based on their demographics, it’s an admittedly noisy profiling technique. So, to put social through the wringer, we next took the “socially select” group—consumers whose friends spent a lot—and matched them to random consumers with identical age, sex, and past purchase amounts. Each social candidate, that is, was matched to a consumer not only of the same age and sex, but one who spent approximately the same amount as the social candidate during the previous six months. Even relative to this formidable baseline, social cues still provide considerable information. As the middle line indicates, knowing a consumer’s age, sex and past purchases, but not that their friends are shopaholics, one would still underestimate their future sales.[1]
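The matched-baseline idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure we actually ran: the record fields (`id`, `age`, `sex`, `past_spend`, `future_spend`, `friends_big`), the function names, and the $50 bucket used to stand in for "approximately the same amount" are all hypothetical, and the data are invented.

```python
import random
from collections import defaultdict


def match_controls(targets, pool, bucket=50, seed=0):
    """Pair each target with a random pool member of the same age, sex and
    past-spend bucket. Targets with no eligible match are skipped.
    Controls are drawn with replacement, so one control may be reused."""
    rng = random.Random(seed)

    def key(c):
        # "Approximately the same" past spend is modeled as the same bucket.
        return (c["age"], c["sex"], c["past_spend"] // bucket)

    target_ids = {t["id"] for t in targets}
    by_key = defaultdict(list)
    for c in pool:
        if c["id"] not in target_ids:  # never match a target to another target
            by_key[key(c)].append(c)

    pairs = []
    for t in targets:
        candidates = by_key.get(key(t))
        if candidates:
            pairs.append((t, rng.choice(candidates)))
    return pairs


def mean_future_gap(pairs):
    """Average of (target future spend - matched control future spend)."""
    return sum(t["future_spend"] - c["future_spend"] for t, c in pairs) / len(pairs)


# Invented toy records; friends_big marks the "socially select" group.
consumers = [
    {"id": 1, "age": 45, "sex": "F", "past_spend": 210, "future_spend": 580, "friends_big": True},
    {"id": 2, "age": 45, "sex": "F", "past_spend": 230, "future_spend": 300, "friends_big": False},
    {"id": 3, "age": 30, "sex": "M", "past_spend": 60, "future_spend": 150, "friends_big": True},
    {"id": 4, "age": 30, "sex": "M", "past_spend": 90, "future_spend": 70, "friends_big": False},
    {"id": 5, "age": 52, "sex": "F", "past_spend": 400, "future_spend": 410, "friends_big": False},
]

targets = [c for c in consumers if c["friends_big"]]
pairs = match_controls(targets, consumers)
gap = mean_future_gap(pairs)
print(len(pairs), gap)
```

A positive gap means the socially select consumers went on to spend more than their matched counterparts, which is the sense in which a baseline built on age, sex, and past purchases underestimates them.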

We repeated this analysis for two other domains—examining signups for Yahoo! Fantasy Football, and clicks on ten online banner ads for movies, apparel, government programs, and beyond—again finding that the predictive power of social persists even after adjusting for age, sex, and past behavior. Lest you run off to rejigger your social strategy, we should mention a couple of caveats. First, we have shown that consumers with big-spending friends tend to spend a lot—more, in fact, than demographics and past purchases alone would suggest. But since most people, even premium customers, don’t have shopaholic friends, social cues do not substantially boost average predictive performance. Second, though social signals help predict how much consumers spend, they don’t always help identify which consumers will spend the most. Those who recently spent fifty grand on sartorial elegance are likely to be habitual top spenders, regardless of what you know about their friends.

Assessing the value of social, as with most things, is a messy affair. On the one hand, network ties convey information not captured by the usual egocentric metrics, a conclusion that at the very least we find scientifically interesting. On the other hand, it’s not immediately obvious how to use that knowledge to take over the world. Well, rest assured that an army of social strategy gurus are waiting in the wings with a game-changing, technology-disrupting way to, you know, “leverage the social graph to deliver personalized experiences” or something.

N.B. Thanks to Randall Lewis and David Reiley for acquiring the sales data, Jake Hofman for assembling the email data, and Duncan Watts and Dan Reeves for comments. For related work in the telecom domain, check out the paper, “Network-Based Marketing: Identifying Likely Adopters via Consumer Networks,” by Shawndra Hill, Foster Provost, and Chris Volinsky.

Illustration by Kelly Savage


[1] It’s perhaps tempting to conclude from these results that shopping is contagious (i.e., to assert causation where only correlation has been shown). Though there is probably some truth to that claim, establishing it is neither our objective nor justified by our analysis.
