Women in Orchestras

[This article was first published on Ripples, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I believe in the truth of fairy-tales more than I believe in the truth in the newspaper (Lotte Reiniger)

In my opinion, this graph is a visual demonstration that we live in a male very chauvinist world.

Orchestras2

In this experiment I download the members of ten top orchestras of the world with the amazing rvest package. After cleaning texts, I obtain the gender of names with genderizeR package as I did here. Since I only take into account names genderized with high probability, these numbers cannot be exact. Apart of this, the plot speaks by itself.

setwd("YOUR WORKING DIRECTORY HERE")
library(rvest)
library(dplyr)
library(genderizeR)
read_html("http://www.berliner-philharmoniker.de/en/orchestra/") %>%
  html_nodes(".name") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("[\r,\n]"," ", .) %>%
  gsub("\\s+", " ", .) %>%
  paste(collapse=" ") %>%
  findGivenNames() -> berliner
read_html("https://www.concertgebouworkest.nl/en/musicians") %>%
  html_nodes(".u-padding--b2") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("\\s+", " ", .) %>%
  paste(collapse=" ") %>%
  findGivenNames() -> rco
read_html("http://www.philharmonia.spb.ru/en/about/orchestra/zkrasof/contents/") %>%
  html_nodes(".td") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("[\r,\n]"," ", .) %>%
  gsub("\\s+", " ", .) %>%
  .[23] %>%
  findGivenNames() -> spb
read_html("http://ocne.mcu.es/conoce-a-la-ocne/orquesta-nacional-de-espana/componentes/") %>%
  html_nodes(".col-main") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("[\r,\n]"," ", .) %>%
  gsub("\\s+", " ", .) %>%
  gsub("([[:lower:]])([[:upper:]][[:lower:]])", "\\1 \\2", .) %>%
  findGivenNames() -> one
read_html("http://www.gewandhausorchester.de/en/orchester/") %>%
  html_nodes("#content") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("[\r,\n]"," ", .) %>%
  gsub("\\s+", " ", .) %>%
  findGivenNames()  -> leipzig
read_html("http://www.wienerphilharmoniker.at/orchestra/members") %>%
  html_nodes(".ModSuiteMembersC") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("[\r,\n,\t,*]"," ", .) %>%
  gsub("\\s+", " ", .) %>%
  gsub("([[:lower:]])([[:upper:]][[:lower:]])", "\\1 \\2", .) %>%
  paste(collapse=" ") %>%
  .[-18] %>%
  findGivenNames() -> wiener
read_html("http://www.laphil.com/philpedia/orchestra-roster") %>%
  html_nodes(".view-content") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("\\s+", " ", .) %>%
  gsub("(?%
  .[1] %>%
  findGivenNames() -> laphil
read_html("http://nyphil.org/about-us/meet/musicians-of-the-orchestra") %>%
  html_nodes(".resp-tab-content-active") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("[\r,\n]"," ", .) %>%
  gsub("\\s+", " ", .) %>%
  gsub("(?%
  findGivenNames() -> nyphil
urls=c("http://lso.co.uk/orchestra/players/strings.html",
       "http://lso.co.uk/orchestra/players/woodwind.html",
       "http://lso.co.uk/orchestra/players/brass.html",
       "http://lso.co.uk/orchestra/players/percussion-harps-and-keyboards.html")
sapply(urls, function(x)
{
  read_html(x) %>%
    html_nodes(".clearfix") %>%
    html_text(trim=TRUE) %>%
    iconv("UTF-8") %>%
    gsub("[\r,\n,\t,*]"," ", .) %>%
    gsub("\\s+", " ", .)
}) %>% paste(., collapse=" ") %>%
  findGivenNames() -> lso
read_html("http://www.osm.ca/en/discover-osm/orchestra/musicians-osm") %>%
  html_nodes("#content-column") %>%
  html_text(trim=TRUE) %>%
  iconv("UTF-8") %>%
  gsub("[\r,\n]"," ", .) %>%
  gsub("\\s+", " ", .) %>%
  findGivenNames() -> osm
rbind(c("berliner", "Berliner Philharmoniker"),
c("rco", "Royal Concertgebouw Amsterdam"),
c("spb", "St. Petersburg Philharmonic Orchestra"),
c("one", "Orquesta Nacional de España"),
c("leipzig", "Gewandhaus Orchester Leipzig"),
c("wiener", "Wiener Philarmoniker"),
c("laphil", "The Los Angeles Philarmonic"),
c("nyphil", "New York Philarmonic"),
c("lso", "London Symphony Orchestra"),
c("osm", "Orchestre Symphonique de Montreal")) %>% as.data.frame()-> Orchestras
colnames(Orchestras)=c("Id", "Orchestra")
list.files(getwd(),pattern = ".RDS") %>%
  lapply(function(x)
  readRDS(x) %>% as.data.frame(stringsAsFactors = FALSE) %>% cbind(Id=gsub(".RDS", "", x))
  ) %>% do.call("rbind", .) -> all
all %>% mutate(probability=as.numeric(probability)) %>%
  filter(probability > 0.9 & count > 15) %>%
  filter(!name %in% c("viola", "tuba", "harp")) %>%
  group_by(Id, gender) %>%
  summarize(Total=n())->all
all %>% filter(gender=="female") %>% mutate(females=Total) %>% select(Id, females) -> females
all %>% group_by(Id) %>% summarise(Total=sum(Total)) -> total
inner_join(total, females, by = "Id") %>% mutate(po_females=females/Total) %>%
  inner_join(Orchestras, by="Id")-> df
library(ggplot2)
library(scales)
opts=theme(legend.position="none",
           plot.background = element_rect(fill="gray85"),
           panel.background = element_rect(fill="gray85"),
           panel.grid.major.y=element_blank(),
           panel.grid.major.x=element_line(colour="white", size=2),
           panel.grid.minor=element_blank(),
           axis.title = element_blank(),
           axis.line.y = element_line(size = 2, color="black"),
           axis.text = element_text(colour="black", size=18),
           axis.ticks=element_blank(),
           plot.title = element_text(size = 35, face="bold", margin=margin(10,0,10,0), hjust=0))
ggplot(df, aes(reorder(Orchestra, po_females), po_females)) +
  geom_bar(stat="identity", fill="darkviolet", width=.5)+
  scale_y_continuous(labels = percent, expand = c(0, 0), limits=c(0,.52))+
  geom_text(aes(label=sprintf("%1.0f%%", 100*po_females)), hjust=-0.05, size=6)+
  ggtitle(expression(atop(bold("Women in Orchestras"), atop("% of women among members", "")))) +
  coord_flip()+opts

To leave a comment for the author, please follow the link and comment on their blog: Ripples.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)