Average dissertation and thesis length, take two

July 15, 2014
By

(This article was first published on R is my friend » R, and kindly contributed to R-bloggers)

About a year ago I wrote a post describing average length of dissertations at the University of Minnesota. I've been meaning to expand that post by adding data from masters theses since the methods for gathering/parsing the records are transferable. This post provides some graphics and links to R code for evaluating dissertation (doctorate) and thesis (masters) data from an online database at the University of Minnesota. In addition to describing data from masters theses, I've collected the most recent data on dissertations to provide an update on my previous post. I've avoided presenting the R code for brevity, but I invite interested readers to have a look at my Github repository where all source code and data are stored. Also, please, please, please note that I've since tried to explain that dissertation length is a pretty pointless metric of quality (also noted here), so interpret the data only in the context that they’re potentially descriptive of the nature of each major.

Feel free to fork/clone the repository to recreate the plots. The parsed data for theses and dissertations are saved as .RData files, 'thes_parse.RData' and 'diss_parse.RData', respectively. Plots were created in 'thes_plot.r' and 'diss_plot.r'. The plots comparing the two were created in 'all_plo.r'. To briefly summarize, the dissertation data includes 3037 records from 2006 to present. This differs from my previous blog by including all majors with at least five records, in addition to the most current data. The masters thesis data contains 930 records from 2009 to present. You can get an idea of the relative page ranges for each by taking a look at the plots. I've truncated all plots to maximum page ranges of 500 and 250 for the dissertation and thesis data, as only a handful of records exceeded these values. I'm not sure if these extremes are actually real data or entered in error, and to be honest, I'm too lazy to verify them myself. Just be cautious that there are some errors in the data and all plots are for informational purposes only, as they say…

-Marcus


Fig: Number of doctoral dissertations in the database by major.



Fig: Number of masters theses in the database by major.



Fig: Summary of page lengths of doctoral dissertations by major, sorted and color-coded by median. Boxes represent the median, 25th and 75th percentiles, 1.5 times the interquartile range as whiskers, and outliers beyond the whiskers. Number of records for each major are in parentheses.



Fig: Summary of page lengths of masters theses by major, sorted and color-coded by median. Boxes represent the median, 25th and 75th percentiles, 1.5 times the interquartile range as whiskers, and outliers beyond the whiskers. Number of records for each major are in parentheses.



Fig: Distributions of page lengths for all records separated as dissertations or theses.



Fig: Comparison of dissertation and thesis page lengths for majors having both degree programs in the database. Boxes represent the median, 25th and 75th percentiles, 1.5 times the interquartile range as whiskers, and outliers beyond the whiskers.



To leave a comment for the author, please follow the link and comment on his blog: R is my friend » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.