Visualizing Twitter Followers Using Pointillism

[This article was first published on Decisions and R, and kindly contributed to R-bloggers.]

A funny thing about social media in the 21st century is that it allows us to connect with a lot of people. By a lot, I mean so many that it's easy to lose any sense of scale. Maybe others are better at this, but I have a hard time wrapping my head around what (say) 8,000 Twitter followers looks like.

To try to get a grip on this, I thought it'd be fun to try to represent the number of followers a person has by creating plots with a point for each follower. Using R, this turns out to be really easy!

In order to make these plots a bit more interesting than just a mass of dots, I decided to use twitter profile pictures as a source of color. The result is pretty cool – we get a plot with a 'pointillist' representation of the profile picture. To try this out, I've created representations for a few famous R bloggers – David Smith above, Tal Galili and Hadley Wickham below.

While there are a bunch of ways to do this, the code below (roughly) samples a bunch of (x,y) coordinates from the same dimensions as the picture, and finds a 'close color' from the original image, then replots this set of points.
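
As a rough illustration of the sampling-and-lookup idea, here's a minimal sketch on a synthetic image, so it runs without Twitter credentials or an image file. The toy image, `n.points`, and the direct index arithmetic (used here in place of a merge) are assumptions made for the example, not necessarily how the script does it.

```r
# Minimal sketch on a synthetic image; no Twitter access or image file needed.
set.seed(1)

# Toy 'image': 20 rows x 30 columns x 3 channels (r, g, b), intensities in [0, 1].
n.row = 20
n.col = 30
toy.img = array(runif(n.row * n.col * 3), dim = c(n.row, n.col, 3))

# One row per pixel, in the column-major order that matrix(image, ncol = 3) produces.
pix = data.frame(matrix(toy.img, ncol = 3))
colnames(pix) = c("r", "g", "b")
pix$y = rep(1:n.row, n.col)
pix$x = rep(1:n.col, each = n.row)

# Reduce the palette to 5 colors; fitted() returns each pixel's cluster center.
k2 = kmeans(pix[, c("r", "g", "b")], centers = 5, iter.max = 50)
centers = fitted(k2)

# Sample one (x, y) location per 'follower' and snap it to the nearest pixel.
n.points = 500  # hypothetical follower count
px = pmax(round(runif(n.points, 0, n.col)), 1)
py = pmax(round(runif(n.points, 0, n.row)), 1)

# Row index of pixel (px, py) in pix, given the column-major layout above.
idx = (px - 1) * n.row + py
point.cols = rgb(centers[idx, 1], centers[idx, 2], centers[idx, 3])

plot(px, py, col = point.cols, pch = 16)
```

Each sampled point takes the k-means center of the pixel it lands on, which is what gives the plot its limited, 'painted' palette.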

In order to preserve the pointillist aesthetic for very small and very large numbers of followers, the size of the points is a decreasing function of the number of followers.
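
One way to write such a decreasing step rule is as a lookup table. The sketch below mirrors the thresholds used by `dot.size.func` in the script, assuming follower counts are whole numbers; the name `dot.size.lookup` is made up for this example.

```r
# Hypothetical lookup-table version of the dot-size rule.
# Assumes n is a whole number, so "n > 10000" is the same as "n >= 10001".
dot.size.lookup = function(n){
  breaks = c(200, 500, 1000, 2000, 5000, 10001, 30001)
  sizes  = c(6, 5, 4, 3, 2, 1, 0.5, NA)  # NA: too many followers to plot
  # findInterval() counts how many breaks are <= n (0 to 7)
  sizes[findInterval(n, breaks) + 1]
}

dot.size.lookup(c(150, 800, 8000, 25000, 40000))
```

Unlike the chain of `if` statements, this version is vectorized, so it can size dots for several accounts in one call.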

There are a bunch of ways this function could be improved – right now, it only works if the original image is a jpeg file. Also, I've limited the number of points the function will visualize to 30,000.

The script pulls Twitter profile pictures and follower counts using Jeff Gentry's twitteR package. Getting this running can be a bit of a pain, but there's help here.

Here's the code – please feel free to improve on it; it's pretty hacked-together:

## required libraries:
## note: you need to register twitteR credentials before running!
library(twitteR)  # getUser()
library(EBImage)  # readImage(); assumption: the original library() lines were lost
library(ggplot2)
library(grid)     # rasterGrob()

## flatten the image into one row per pixel and reduce it to k.cols colors:
im.func.1 = function(image, k.cols = 10){
  # creating a dataframe:
  test.mat = matrix(image,ncol = 3)
  df = data.frame(test.mat)
  colnames(df) = c("r","g","b")
  df$y = rep(1:dim(image)[1],dim(image)[2])
  df$x = rep(1:dim(image)[2], each = dim(image)[1])
  # extracting colors:
  k2 = kmeans(df[,1:3],k.cols)
  # adding centers back:
  fit.test = fitted(k2)
  df$r.pred = fit.test[,1]
  df$g.pred = fit.test[,2]
  df$b.pred = fit.test[,3]
  return(df)
}

## convert a number in 0:15 to a single hex digit
## (the helper's original name was lost; hex.func is a stand-in):
hex.func = function(x1){
  ref.dat = data.frame(num = 10:15, let = LETTERS[1:6])
  out = as.character(x1)
  if(x1 %in% 10:15){out = as.character(ref.dat$let[which(ref.dat$num == x1)])}
  return(out)
}

## convert an (r, g, b) triple to a hex color string such as "#FF7F00":
rgb.func = function(vec){
  #note: vec is a triple of color intensities
  r1 = floor(255*vec[1])
  g1 = floor(255*vec[2])
  b1 = floor(255*vec[3])
  # split each channel into its two hex digits:
  x1 = r1 %/% 16
  x2 = r1 %% 16
  x3 = g1 %/% 16
  x4 = g1 %% 16
  x5 = b1 %/% 16
  x6 = b1 %% 16
  x1 = hex.func(x1)
  x2 = hex.func(x2)
  x3 = hex.func(x3)
  x4 = hex.func(x4)
  x5 = hex.func(x5)
  x6 = hex.func(x6)
  out = paste("#",x1,x2,x3,x4,x5,x6, sep = "")
  return(out)
}

## point size shrinks as the follower count grows; NA above 30,000:
dot.size.func = function(n){
  dot = 1
  if(n>10000){dot = .5}
  if(n<5000){dot = 2}
  if(n<2000){dot = 3}
  if(n<1000){dot = 4}
  if(n<500){dot = 5}
  if(n<200){dot = 6}
  if(n>30000){dot = NA}
  return(dot)
}

## build the pointillist plot for one twitter user:
general.func = function(user){
  get.em = getUser(user, cainfo = "cacert.pem")
  img = readImage(get.em$profileImageUrl)
  n.follow = get.em$followersCount
  dot.size = dot.size.func(n.follow)
  # one random (x, y) location per follower:
  dat1 = data.frame(x = runif(n.follow,0,dim(img)[2]),
                    y = runif(n.follow,0,dim(img)[1]),
                    radius = rep(dot.size,n.follow))
  temp = im.func.1(img,k.cols = 10)
  # snap each point to the nearest pixel (pixel indices start at 1):
  dat1$x.round = round(dat1$x,0)
  dat1$y.round = round(dat1$y,0)
  dat1$x.round[dat1$x.round == 0] = 1
  dat1$y.round[dat1$y.round == 0] = 1
  dat1 = merge(dat1,temp,by.x = c("x.round","y.round"), by.y = c("x","y"))
  # splice in the colors (columns 9:11 are r.pred, g.pred, b.pred):
  dat1$col = apply(dat1[9:11],1,rgb.func)
  ## trying out a different plot:
  dat1$x = max(dat1$x) - dat1$x
  g = rasterGrob(img, interpolate = TRUE)
  p = ggplot(dat1,aes(x = y, y = x, col = col)) + geom_point(size = dat1$radius) + scale_colour_identity() +
    ylim(min(temp$x),max(temp$x)) +
    xlim(min(temp$y),max(temp$y + 50)) +
    theme_bw() +
    theme(line = element_blank(),
          text = element_blank(),
          title = element_blank()) +
    annotation_custom(g, xmin = max(temp$y) + 2, xmax = max(temp$y) + 50, ymin = -Inf, ymax = Inf)
  return(p)
}

t.start = Sys.time()
# the original call between the two timestamps was lost; "some_user" is a placeholder handle
p = general.func("some_user")
print(p)
t.end = Sys.time()
t.end - t.start
ggsave("RD.png", plot = p, width = 8.25, height =4.42)
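
A possible simplification: base R's `rgb()` already turns color intensities into hex strings, so the hand-rolled hex conversion in `rgb.func` could be replaced by something like the sketch below. The name `rgb.base.func` is made up for this example; flooring to 0-255 first keeps the same rounding as the original.

```r
# vec is a triple of color intensities in [0, 1], as in rgb.func.
rgb.base.func = function(vec){
  # floor(255 * v) mirrors the rounding used in the hand-rolled conversion
  rgb(floor(255 * vec[1]), floor(255 * vec[2]), floor(255 * vec[3]),
      maxColorValue = 255)
}

rgb.base.func(c(1, 0.5, 0))  # "#FF7F00"
```

Since `rgb()` is vectorized, whole columns of red, green, and blue values can also be passed in directly, which avoids the row-by-row `apply` call.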
