Well I walk upon the river like it’s easier than land
(Love is All, The Tallest Man on Earth)
The Metropolitan Museum of Art provides here a dataset with information on more than 450.000 artworks in its collection. You can do anything you want with these data: there are no restrictions of use. Each record contains information about the author, title, type of work, dimensions, date, culture and geography of a particular piece.
I can imagine a bunch of things to do with these data but since I am a big fan of highcharter, I have done a treemap, which is an artistic (as well as efficient) way to visualize hierarchical data. A treemap is useful to visualize frequencies. They can handle levels, allowing to navigate to go into detail about any category. Here you can find a good example of treemap.
To read data I use
fread function from
data.table package. I also use this package to do some data wrangling operations on the data set. After them, I filter it looking for the word
SPANISH in the columns
Artist Nationality and
Culture and looking for the word
SPAIN in the column
Country. For me, any piece created by an Spanish artist (like this one), coming from Spanish culture (like this one) or from Spain (like this one) is Spanish (this is my very own definition and may do not match with any academical one). Once it is done, it is easy to extract some interesting figures:
- There are 5.294 Spanish pieces in The Met, which means a 1,16% of the collection
- This percentage varies significantly between departments: it raises to 9,01% in The Cloisters and to 4,83% in The Robert Lehman Collection; on the other hand, it falls to 0.52% in The Libraries and to 0,24% in Photographs.
- The Met is home to 1.895 highlights and 44 of them (2,32%) are Spanish; It means that Spanish art is twice as important as could be expected (remember that represents a 1,16% of the entire collection)
My treemap represents the distribution of Spanish artworks by department (column
Department) and type of work (column
Classification). There are two important things to know before doing a treemap with
- You have to use
treemappackage to create a list with your data frame that will serve as input for
hctreemapfails if some category name is the same as any of its subcategories. To avoid this, make sure that all names are distinct.
This is the treemap:
Here you can see a full size version of it.
There can be seen several things at a glance: most of the pieces are drawings and prints and european sculpture and decorative arts (in concrete, prints and textiles), there is also big number of costumes, arms and armor is a very fragmented department … I think treemap is a good way to see what kind of works owns The Met.
Feel free to use my code to do your own experiments:
library(data.table) library(dplyr) library(stringr) library(highcharter) library(treemap) file="MetObjects.csv" # Download data if (!file.exists(file)) download.file(paste0("https://media.githubusercontent.com/media/metmuseum/openaccess/master/", file), destfile=file, mode='wb') # Read data data=fread(file, sep=",", encoding="UTF-8") # Modify column names to remove blanks colnames(data)=gsub(" ", ".", colnames(data)) # Clean columns to prepare for searching data[,`:=`(Artist.Nationality_aux=toupper(Artist.Nationality) %>% str_replace_all("\\[\\d+\\]", "") %>% iconv(from='UTF-8', to='ASCII//TRANSLIT'), Culture_aux=toupper(Culture) %>% str_replace_all("\\[\\d+\\]", "") %>% iconv(from='UTF-8', to='ASCII//TRANSLIT'), Country_aux=toupper(Country) %>% str_replace_all("\\[\\d+\\]", "") %>% iconv(from='UTF-8', to='ASCII//TRANSLIT'))] # Look for Spanish artworks data[Artist.Nationality_aux %like% "SPANISH" | Culture_aux %like% "SPANISH" | Country_aux %like% "SPAIN"] -> data_spain # Count artworks by Department and Classification data_spain %>% mutate(Classification=ifelse(Classification=='', "miscellaneous", Classification)) %>% mutate(Department=tolower(Department), Classification1=str_match(Classification, "(\\w+)(-|,|\\|)")[,2], Classification=ifelse(!is.na(Classification1), tolower(Classification1), tolower(Classification))) %>% group_by(Department, Classification) %>% summarize(Objects=n()) %>% ungroup %>% mutate(Classification=ifelse(Department==Classification, paste0(Classification, "#"), Classification)) %>% as.data.frame() -> dfspain # Do treemap without drawing tm_dfspain <- treemap(dfspain, index = c("Department", "Classification"), draw=F, vSize = "Objects", vColor = "Objects", type = "index") # Do highcharter treemap hctreemap( tm_dfspain, allowDrillToNode = TRUE, allowPointSelect = T, levelIsConstant = F, levels = list( list( level = 1, dataLabels = list (enabled = T, color = '#f7f5ed', style = list("fontSize" = "1em")), borderWidth = 1 ), list( level = 2, dataLabels = list (enabled = F, align = 'right', verticalAlign = 'top', style = list("textShadow" = F, "fontWeight" = 'light', "fontSize" = "1em")), borderWidth = 0.7 ) )) %>% hc_title(text = "Spanish Artworks in The Met") %>% hc_subtitle(text = "Distribution by Department") -> plot plot