Site icon R-bloggers

Information Transmission in a Social Network: Dissecting the Spread of a Quora Post

[This article was first published on Edwin Chen's Blog » r, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

tl;dr See this movie visualization for a case study on how a post propagates through Quora.

How does information spread through a network? Much of Quora’s appeal, after all, lies in its social graph — and when you’ve got a network of users, all broadcasting their activities to their neighbors, information can cascade in multiple ways. How do these social designs affect which users see what?

Think, for example, of what happens when your kid learns a new slang word at school. He doesn’t confine his use of the word to McKinley Elementary’s particular boundaries, between the times of 9-3pm — he introduces it to his friends from other schools at soccer practice as well. A couple months later, he even says it at home for the first time; you like the word so much, you then start using it at work. Eventually, Justin Bieber uses the word in a song, at which point the word’s popularity really starts to explode.

So how does information propagate through a social network? What types of people does an answer on Quora reach, and how does it reach them? (Do users discover new answers individually, or are hubs of connectors more key?) How does the activity of a post on Quora rise and fall? (Submissions on other sites have limited lifetimes, fading into obscurity soon after an initial spike; how does that change when users are connected and every upvote can revive a post for someone else’s eyes?)

To start answering some of these questions, I dug into one of my more popular posts, on a layman’s introduction to random forests.

Users, Topics

Before looking deeper into the voting dynamics of my post, let’s first get some background on what kinds of users my answer reached.

Here’s a graph of the topics that question upvoters follow. (Each node is a topic, and every time upvoter X follows both topics A and B, I add an edge between A and B.)

We can see from the graph that upvoters tend to be interested in three kinds of topics:

Also, here’s the network of the upvoters themselves (there’s an edge between users A and B if A follows B):

We can see three main clusters of users:

Digging into how these topic and user graphs are related:

More interestingly, though, we can ask: how do the connections between upvoters relate to the way the post spread?

Social Voting Dynamics

So let’s take a look. Here’s a visualization I made of upvotes on my answer across time (click here for a larger view).

To represent the social dynamics of these upvotes, I drew an edge from user A to user B if user A transmitted the post to user B through an upvote. (Specifically, I drew an edge from Alice to Bob if Bob follows Alice and Bob’s upvote appeared within five days of Alice’s upvote; this is meant to simulate the idea that Alice was the key intermediary between my post and Bob.) Also,

Here’s a play-by-play of the video:

(Already we can see some relationships between the graph of user connections and the way the post propagated. Many of the users from the orange cluster, for example, come from Alex Kamil and Christian Langreiter’s upvotes, and many of the users from the green cluster come from Aditya Sengupta and Marc Bodnick’s upvotes. What’s interesting, though, is, why didn’t the cluster of green users appear all at once, like the orange cluster did? People like Kah Seng Tay, Tracy Chou, Venkatesh Rao, and Chad Little upvoted the answer pretty early on, but it wasn’t until Aditya Sengupta’s upvote a couple months later that people like Marc Bodnick, Edmond Lau, and many of the other green users (who do indeed follow that first set of folks) discovered the answer. Did the post simply get lost in users’ feeds the first time around? Was the post perhaps ignored until it received enough upvotes to be considered worth reading? Are some users’ upvotes just trusted more than others’?)

For another view of the upvote dynamics, here’s a static version of the movie, where we can again easily see the clusters of activity:

Fin

To summarize a bit what we’ve found, here are some statistics on the role the social graph played in spreading the answer:

So it looks like the social graph played quite a large part in the post’s propagation, and I’ll end with a big shoutout to Stormy Shippy, who provided an awesome set of scripts I used to collect a lot of this data.


To leave a comment for the author, please follow the link and comment on their blog: Edwin Chen's Blog » r.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.