Today we will use R to extract some interesting summary statistics about the music files stored on my computer. For all my mp3 files I keep certain metadata in the ID3 tag. We will use this information to explore the distribution of music files with respect to year of release. Everything below was done on a desktop computer running Debian stable. On Windows it can probably be accomplished using a GNU/Linux live CD.
To extract the metadata, we will use the wonderful command-line utility lltag. Bash will take care of creating a text file with the useful output, and finally R will read the file, do the calculations and draw the plot.
I keep the music files in the directory /media/data/music/. The command that extracts only the date and saves the output to a text file (date.txt, in the current directory) is:
lltag --show-tags date /media/data/music/ -R > date.txt
The output text consists of pairs of lines: the first line is the file path and the second is something like “ DATE=2011”.
A word of caution here: for some songs you might get a Warning instead of the year. These files either don’t contain this information, or their ID3 tag is problematic. Now we can use the grep command to grab the lines containing dates. The result is then processed by the cut command, which keeps only the 8th to 11th characters (the year). Finally, the result is written to a new file:
grep 'DATE=' date.txt | cut -c8-11 > date2.txt
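To see what the pipeline does without touching a real music collection, here is a minimal sketch that runs it on a made-up sample. The file paths, the years and the two-space indent before DATE= are assumptions (the indent is what cut -c8-11 implies); the sed line is an alternative I am suggesting, not part of the original workflow, for cases where the indent differs.

```shell
#!/bin/sh
# Hypothetical sample mimicking the path/DATE line pairs described above.
# Paths, years and the two-space indent are assumptions for illustration.
workdir=$(mktemp -d)
cat > "$workdir/date.txt" <<'EOF'
/media/data/music/a/one.mp3
  DATE=2011
/media/data/music/b/two.mp3
  DATE=1998
/media/data/music/c/three.mp3
  DATE=2004
EOF

# Same pipeline as above; the pattern is quoted so the shell cannot
# treat it as a glob.
grep 'DATE=' "$workdir/date.txt" | cut -c8-11 > "$workdir/date2.txt"
cat "$workdir/date2.txt"

# Alternative that does not depend on the indent width: keep the first
# four digits that follow DATE=.
years=$(sed -n 's/^[[:space:]]*DATE=\([0-9]\{4\}\).*/\1/p' "$workdir/date.txt")
```

Both variants should yield one four-digit year per line, which is the shape scan() expects in the next step.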
Now we launch R, read the file and create a histogram:
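# Read the years (this path assumes the shell commands were run in the home directory)
dd <- scan("~/date2.txt")
png(filename = "date_dist.png", width = 400, height = 300)
mt <- "Time distribution of music files"
# The breaks must span every year in dd, otherwise hist() stops with an error
hist(dd, breaks = seq(1950, 2015, 5), col = "gray", main = mt, xlab = "year")
dev.off()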
You can see the output graph below.
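For a quick text-only check of the same distribution before (or instead of) plotting, the five-year binning can be approximated in the shell. This is a rough sketch with made-up sample years, not part of the original workflow; it assumes the same 1950 to 2015 range and mimics the right-closed intervals that hist() uses by default, so 1955 falls in the 1950-1955 bin. Unlike hist(), it silently skips years outside the range.

```shell
#!/bin/sh
# Hypothetical sample years standing in for date2.txt.
workdir=$(mktemp -d)
printf '%s\n' 1950 1955 1956 1998 2000 2001 2011 2015 > "$workdir/date2.txt"

bins=$(awk '
  $1 >= 1950 && $1 <= 2015 {
    # Right-closed bins (lo, lo+5], with 1950 itself folded into the first bin.
    k = int(($1 - 1950 + 4) / 5)
    if (k < 1) k = 1
    lo = 1950 + (k - 1) * 5
    count[lo]++
  }
  END { for (lo in count) printf "%d-%d %d\n", lo, lo + 5, count[lo] }
' "$workdir/date2.txt" | sort -n)

# One line per non-empty bin: "start-end count"
printf '%s\n' "$bins"
```

Comparing these counts with the bar heights is a cheap way to confirm that the histogram reflects the data in date2.txt.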
Depending on your R plotting skills and the metadata attribute you are interested in, you can experiment and create custom graphs.
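As one possible starting point for another attribute, the sketch below counts files per genre. It assumes lltag prints a GENRE= line in the same layout as the DATE= line, which is an assumption on my part; the sample file, paths and genres are made up, and on a real collection the input would come from an lltag call analogous to the one used for the date.

```shell
#!/bin/sh
# Hypothetical sample in the same path/tag layout as date.txt.
workdir=$(mktemp -d)
cat > "$workdir/genre.txt" <<'EOF'
/media/data/music/a/one.mp3
  GENRE=Rock
/media/data/music/b/two.mp3
  GENRE=Jazz
/media/data/music/c/three.mp3
  GENRE=Rock
EOF

# Count occurrences of each value after "GENRE=", most frequent first.
genre_counts=$(awk -F= '/GENRE=/ { count[$2]++ }
  END { for (g in count) printf "%d\t%s\n", count[g], g }' \
  "$workdir/genre.txt" | sort -rn)

# Tab-separated "count<TAB>genre" lines
printf '%s\n' "$genre_counts"
```

The resulting two-column text can be loaded in R with read.delim() and passed to barplot() for a categorical counterpart of the histogram.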