A classical analysis (Radio Swiss Classic program)
I am not a classical music expert at all, but I happen to have friends who are, and am even married to someone who plays the cello (and the ukulele!). I enjoy listening to such music from time to time, in particular Baroque music. A friend introduced me to Radio Swiss Classic, an online radio playing classical music all day and all night long, with quite a nice variety, very little speaking between pieces, and no ads (thank you, funders of the radio!). Besides, the voices telling me which piece has just been played are really soothing, so Radio Swiss Classic is a good one in my opinion.
Today, instead of anxiously waiting for the results of the French presidential elections, I decided to download the radio's program for the last few years and have a quick look at it, since after all, the website says that the radio aims at relaxing people.
Scraping the program
My webscraping became a bit more elegant because I followed the advice of EP alias expersso, who by the way should really start blogging. I downloaded programs starting from September 2008 because that’s when I met the friend who told me about Radio Swiss Classic.
```r
dates <- seq(from = lubridate::ymd("2008-09-01"),
             to = lubridate::ymd("2017-04-22"),
             by = "1 day")

base_url <- "http://www.radioswissclassic.ch/en/music-programme/search/"

get_one_day_program <- function(date, base_url){
  # in order to see progress
  message(date)

  # build URL
  date_as_string <- as.character(date)
  date_as_string <- stringr::str_replace_all(date_as_string, "-", "")
  url <- paste0(base_url, date_as_string)

  # read page
  page <- try(xml2::read_html(url),
              silent = TRUE)
  if(is(page, "try-error")){
    message("horribly wrong")
    closeAllConnections()
    return(NULL)
  }else{
    # find all times, artists and pieces
    times <- xml2::xml_text(xml2::xml_find_all(page,
                            xpath="//span[@class='time hidden-xs']//text()"))
    artists <- xml2::xml_text(xml2::xml_find_all(page,
                              xpath="//span[@class='titletag']//text()"))
    pieces <- xml2::xml_text(xml2::xml_find_all(page,
                             xpath="//span[@class='artist']//text()"))

    # the last artist and piece are the current ones
    artists <- artists[1:(length(artists) - 1)]
    pieces <- pieces[1:(length(pieces) - 1)]

    # get a timedate from each time
    timedates <- paste(as.character(date), times)
    timedates <- lubridate::ymd_hm(timedates)
    timedates <- lubridate::force_tz(timedates, tz = "Europe/Zurich")

    # format the output
    program <- tibble::tibble(time = timedates,
                              artist = artists,
                              piece = pieces)
    return(program)
  }
}

programs <- purrr::map(dates, get_one_day_program,
                       base_url = base_url)
programs <- dplyr::bind_rows(programs)
save(programs, file = "data/radioswissclassic_programs.RData")
```
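As a side note, the two building blocks of the scraper (turning a date into a URL, and returning NULL when a page cannot be read) can be sketched in base R only. This is a minimal sketch, not the code used for this post: the helper names `build_program_url` and `safe_read` are hypothetical, the URL pattern is simply the one assumed above, and the `reader` argument stands in for `xml2::read_html` so that the example runs without a network connection.

```r
# Hypothetical helper: builds the search URL for one day.
# Assumes the pattern used above: base URL followed by the date as YYYYMMDD.
build_program_url <- function(date, base_url) {
  paste0(base_url, format(as.Date(date), "%Y%m%d"))
}

# Hypothetical helper: calls `reader` on the URL and returns NULL on error,
# which is the same "skip this day" behaviour as the try() call above.
safe_read <- function(url, reader) {
  tryCatch(reader(url), error = function(e) NULL)
}

base_url <- "http://www.radioswissclassic.ch/en/music-programme/search/"
example_url <- build_program_url(as.Date("2008-09-01"), base_url)
print(example_url)

# A fake reader that always fails, to show the NULL fallback
failing_reader <- function(url) stop("horribly wrong")
failed_page <- safe_read(example_url, failing_reader)
print(is.null(failed_page))
```

Returning NULL for failed days is convenient here because `dplyr::bind_rows()` ignores NULL elements when the daily programs are combined.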
There were some days without any program on the website, for which the website said something was horribly wrong with the server.
```r
load("data/radioswissclassic_programs.RData")
wegot <- length(unique(lubridate::as_date(programs$time)))
wewanted <- length(seq(from = lubridate::ymd("2008-09-01"),
                       to = lubridate::ymd("2017-04-22"),
                       by = "1 day"))
```
Still, I got a program for about 96% of the days (wegot divided by wewanted is approximately 0.96).
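To make the coverage computation concrete, here is a small base R sketch on made-up data (five wanted days, one of them missing). The variable names and timestamps are invented for illustration; only the idea of dividing the number of days obtained by the number of days wanted comes from the code above.

```r
# Made-up example: we wanted 5 days but only got programs for 4 of them
wanted_days <- seq(from = as.Date("2017-04-18"),
                   to = as.Date("2017-04-22"),
                   by = "1 day")

toy_times <- as.POSIXct(c("2017-04-18 00:03", "2017-04-18 00:10",
                          "2017-04-19 00:01", "2017-04-21 00:05",
                          "2017-04-22 00:02"),
                        tz = "Europe/Zurich")

# Days for which at least one piece was scraped
got_days <- unique(as.Date(toy_times, tz = "Europe/Zurich"))

# Share of wanted days that we actually got
coverage <- length(got_days) / length(wanted_days)
print(coverage)

# Which days are missing?
missing_days <- wanted_days[!wanted_days %in% got_days]
print(missing_days)
```

Listing the missing days, and not only their share, would also show whether the gaps are scattered or form a long block.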
Who are the most popular composers?
```r
library("magrittr")
table(programs$artist) %>%
  broom::tidy() %>%
  dplyr::arrange(desc(Freq)) %>%
  head(n = 20) %>%
  knitr::kable()
```
Var1 | Freq |
---|---|
Wolfgang Amadeus Mozart | 37823 |
Ludwig van Beethoven | 20936 |
Joseph Haydn | 18140 |
Franz Schubert | 15596 |
Antonio Vivaldi | 14947 |
Johann Sebastian Bach | 12003 |
Felix Mendelssohn-Bartholdy | 11541 |
Antonin Dvorak | 10265 |
Gioachino Rossini | 9591 |
Frédéric Chopin | 8470 |
Piotr Iljitsch Tchaikowsky | 8092 |
Georg Friedrich Händel | 7935 |
Tomaso Albinoni | 6175 |
Gaetano Donizetti | 5945 |
Giuseppe Verdi | 5639 |
Johannes Brahms | 5526 |
Johann Nepomuk Hummel | 5439 |
Camille Saint-Saëns | 5395 |
Luigi Boccherini | 5130 |
Johann Christian Bach | 4976 |
I’ll have to admit that I don’t even know all the composers in this table, but they’re actually all famous according to my live-in classical music expert. Radio Swiss Classic allows listeners to rate pieces, so the most popular ones are programmed more often, and I guess the person making the programs also tends to program famous composers quite often.
```r
library("ggplot2")
library("hrbrthemes")
table(programs$artist) %>%
  broom::tidy() %>%
  ggplot() +
  geom_histogram(aes(Freq)) +
  scale_x_log10() +
  theme_ipsum(base_size = 14)
```
Interestingly, but not that surprisingly I guess given the popularity of, say, Mozart, the number of occurrences per composer seems to be log-normally distributed.
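One informal way to go beyond eyeballing the histogram would be to look at the logarithm of the counts: if the counts are log-normal, their logs should look normal. The sketch below illustrates this with simulated counts, not with the radio data; the parameters of the simulation are arbitrary, and the correlation of the Q-Q points is only a crude summary, not a formal test.

```r
set.seed(42)

# Simulated play counts per composer (made-up data, not the radio program);
# pmax() keeps every count at 1 or more so that the log is finite
simulated_counts <- pmax(1, round(rlnorm(500, meanlog = 5, sdlog = 1.5)))

log_counts <- log(simulated_counts)

# Theoretical normal quantiles versus sample quantiles of the log counts.
# With plot.it = TRUE (the default) this draws the usual normal Q-Q plot.
qq <- qqnorm(log_counts, plot.it = FALSE)

# Crude summary: a correlation close to 1 means the points lie close to a line
qq_correlation <- cor(qq$x, qq$y)
print(qq_correlation)
```

On the real data one would replace `simulated_counts` with the `Freq` column computed above.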
How long are pieces?
On the website of Radio Swiss Classic it is stated that pieces are longer in the evening than during the day, which I wanted to check. I compute the duration of a piece as the time between its start and the start of the next piece. Because the program of the radio was not corrected for time changes (i.e. on 25-hour days there are only 24 hours of music according to the online program), I shall only look at pieces whose duration is no longer than 60 minutes, which solves the issue of missing days at the same time.
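To see why the 60-minute cutoff takes care of missing days, here is a toy example in base R with made-up start times: three pieces on one evening, then a gap because the following day is missing, then two pieces on the morning after that. It is only a sketch of the idea; the real data are handled with dplyr further down.

```r
# Made-up start times, sorted in time
toy <- data.frame(
  time = as.POSIXct(c("2017-04-20 22:00", "2017-04-20 22:12",
                      "2017-04-20 22:47", "2017-04-22 08:00",
                      "2017-04-22 08:09"),
                    tz = "Europe/Zurich")
)

# Duration in minutes = start of the next piece minus start of this piece;
# the last piece has no successor, so its duration is NA
toy$duration <- c(diff(as.numeric(toy$time)) / 60, NA)

# A "piece" lasting more than 60 minutes is really a gap in the program
toy$duration[which(toy$duration > 60)] <- NA

print(toy)
```

The third piece would otherwise get a duration of more than 33 hours, which is the gap to the next available day and not the length of a piece.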
```r
programs <- dplyr::arrange(programs, time)

# duration of a piece = time until the next piece starts, in minutes
programs <- dplyr::mutate(programs,
                          duration = difftime(dplyr::lead(time, 1),
                                              time,
                                              units = "mins"))

# durations above 60 minutes come from missing days or time changes
programs <- dplyr::mutate(programs,
                          duration = ifelse(duration > 60,
                                            NA, duration))

programs <- dplyr::mutate(programs,
                          hour = as.factor(lubridate::hour(time)))

programs %>%
  ggplot() +
  geom_boxplot(aes(hour, duration)) +
  theme_ipsum(base_size = 14)
```
I don’t find the difference between day and night that striking. Maybe defining day and night explicitly would give a prettier figure (but I won’t do any test, I soon need to go watch TV).
```r
programs %>%
  dplyr::mutate(night = (lubridate::hour(time) <= 4 | lubridate::hour(time) >= 20)) %>%
  ggplot() +
  geom_boxplot(aes(night, duration)) +
  theme_ipsum(base_size = 14)
```
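A numeric companion to this boxplot could be the median duration by period. The sketch below shows the computation in base R on made-up hours and durations, using the same definition of night as above (up to 4 am, or from 8 pm); the numbers are invented and say nothing about the actual radio program.

```r
# Made-up start hours and durations in minutes (not the radio data)
toy <- data.frame(
  hour = c(2, 9, 13, 16, 21, 23, 3, 11),
  duration = c(28, 7, 9, NA, 31, 24, 19, 6)
)

# Same definition of night as in the plot above
toy$night <- toy$hour <= 4 | toy$hour >= 20

# Median duration for day (FALSE) and night (TRUE), ignoring missing values
median_by_period <- tapply(toy$duration, toy$night, median, na.rm = TRUE)
print(median_by_period)
```

On the real data, `toy` would be replaced by `programs` with `night` computed from `lubridate::hour(time)`.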
Conclusion
The website also states that the pieces are more lively in the morning, but I have no data against which to match the titles of the pieces to investigate that claim. Well, I have not even looked for such data. Another extension that I would find interesting would be to match each composer’s name to a style and then see how often each style is played. Now I’ll stop relaxing and go stuff my face with food in front of the election results!