Data sonification with R: the sound of Twitter data

May 29, 2013
By

(This article was first published on SoMe Lab » r-project, and kindly contributed to R-bloggers)

tweet_waveWhat does a tweet sound like? Not the kind that flies around in the air, but the kind that zips to and from our mobile devices. I’m intensely interested in finding ways to make sense of data. Sonification of data – representing data with sound – offers one way to do that. This post steps through R code to take the text of tweets and turn them into short chirping sounds. It also uses different tones for different users so that each user has a “voice”. In other words, this post shows how to use R to make Twitter data sing.

?Download download.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
# Author: Jeff Hemsley jhemsley at uw dot edu
# Twitter: @JeffHemsley
# Created: Sometime in 2013
# 
# the location of the files I used are at:
# http://somelab.net/wp-content/uploads/2013/05/test_tweet_like_data.txt
# Grab these files or make your own and fix the path info below
#
# load the tuneR package
library(tuneR)
 
dir.path <- "c:/r/rt_nets/"
dir.path.dat <- "c:/r/rt_nets/dat/"
#dir.path.dat <- "c:/r/sound/"
tweet.data.file.name <- "earthRTs.txt"
#tweet.data.file.name <- "test_tweet_like_data.txt"
tweet.data.file <- paste(dir.path.dat, tweet.data.file.name, sep="")
 
# tweet data is stored in a file, often a big one with a tab as the separater
tweet.data <- data.frame(read.delim(file=tweet.data.file, sep='\t', stringsAsFactors=F, row.names=NULL))
 
# ok, just so we can see what we got...
colnames(tweet.data)
dim(tweet.data)
tweet.data[1,]
 
# now, we want have certain characters given some sounds, everything else we
# treat as a pause so as to sort of mimic language and bird chirps. So
# here is a list of the characters we will create sound for. Spaces give us
# us something like different words.
chars.to.sonify <- c("#", "@", "-", ",", ".", "'", letters, as.character(0:9))
chars.to.sonify.length <- length(chars.to.sonify)
 
# sampling rate. this is how many data points (I think) per second
sampeling.rate <- 6000
 
# how rich the sound is?
bits <- 8
 
long.pause <- .5 # in seconds
short.pause <- .1 # in seconds
character.sound.length <- 0.01
 
# we are going to setup a range of tones for each user
# to do that we need to find the total range, how many users, and a min and max for each user
# and then stuff it in a dataframe so we can get those ranges depending on who is "talking"
# note, Hz low values are deep song, and high values are high pitched
min.Hz <- 600
max.Hz <- 8000
Hz.range <- max.Hz - min.Hz
 
# get the users from the original dataframe
users.vector <- sort(unique(tweet.data$user.screen_name))
users.vector.length <- length(users.vector)
num.tone.start.buckets <- floor(Hz.range/users.vector.length)
 
# ok, now make the dataframe
user.voice.df <- data.frame(screen.name=users.vector, min.tone=rep(0, users.vector.length), max.tone=rep(0, users.vector.length))
user.voice.df$min.tone <- seq(from=min.Hz, by=num.tone.start.buckets, length=users.vector.length)
user.voice.df$max.tone <- seq(to=max.Hz, by=num.tone.start.buckets, length=users.vector.length)
user.voice.df[1,] # whats the first row look like?
user.voice.df[users.vector.length,] # whats the last row look like?
 
#ok. Now, we don't want to do all of the tweets, just a sample, for experimenting
# do like 3 to 10.
num.tweets.to.sonify <- 10
num.obs <- dim(tweet.data)[1]
 
if (num.obs < num.tweets.to.sonify) {
  num.tweets.to.sonify <- num.obs
}
tweet.rows.to.sing <- sample(x=1:num.obs, size=num.tweets.to.sonify)
 
# here is the sonify loop: for each user sing thier tweet
for (i in 1:num.tweets.to.sonify) {
 
  if (i == 1) {
    # wait! if this is the first iteration, lets make a wave object: a "coversation" of tweets
    w.conversation <- silence(duration = long.pause, xunit = c("samples", "time")[2], bit=bits, samp.rate=sampeling.rate)
  }
 
  # i-th sample
  df.tweet.row <- tweet.rows.to.sing[i]
 
  # get the user and set their range
  the.user <- tweet.data$user.screen_name[df.tweet.row]
  the.user.index <- which(user.voice.df$screen.name == the.user)
  user.min.Hz <- user.voice.df$min.tone[the.user.index]
  user.max.Hz <- user.voice.df$max.tone[the.user.index]
  user.Hz.range <- user.max.Hz - user.min.Hz
 
  # get the tweet text
  the.tweet <- tweet.data$text[df.tweet.row]
  the.tweet.length <- nchar(the.tweet)
 
  # break into a vector of characters
  # lowercase the letters. stuff it all in a vector
  tweet.text.vec <- unlist(strsplit(tolower(the.tweet), ""))
 
  # For each character in the tweet, find it's index in the chars.to.sonify 
  # (see above for our 'alphabet' of chars we are sounding out)
  tmp.index <- match(tweet.text.vec, chars.to.sonify)
 
  # each 'talker' starts with a pause of silence
  wobj <- silence(duration = long.pause, xunit = c("samples", "time")[2], bit=bits, samp.rate=sampeling.rate)
 
  # ok. for each character in the tweet, make a little wave for it.
  for (j in 1:the.tweet.length) {
    # j <- 1 + j
    if (is.na(tmp.index[j])) {
      w <- silence(duration = short.pause, xunit = c("samples", "time")[2], bit=bits, samp.rate=sampeling.rate)
    } else {
      tweet.char.freq <- (tmp.index[j] * (user.Hz.range/chars.to.sonify.length)) + user.min.Hz
      w <- sine(tweet.char.freq, duration=character.sound.length, xunit = c("samples", "time")[2], bit=bits, samp.rate=sampeling.rate)
    }
 
    # add each part of the wave to the wave object
    wobj <- bind(wobj, w)
  }
 
  # add each talker's tweet to the conversasion
  w.conversation <- bind(w.conversation, wobj)
}
 
play(w.conversation)
# write it all to a wav file.
writeWave(w.conversation, "c:/r/sound/tweet_data_sonification.wav")

To leave a comment for the author, please follow the link and comment on his blog: SoMe Lab » r-project.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.