Is the Tax Code the longest Title?

August 19, 2013

(This article was first published on Bommarito Consulting » r, and kindly contributed to R-bloggers)

  Last week, I shared that Dan Katz and I had finally published a draft of our paper, Measuring the Complexity of the Law: The U.S. Code.  We’d previewed this research on Computational Legal Studies years ago.  Since then, we’ve received great feedback and a number of questions.

  The most common question, even among legal professionals, is exactly what you’d guess – is the Tax Code (i.e., Title 26, I.R.C.) the longest Title?  The answer, in our opinion, is also what you might guess – it depends.  But first, let’s look at a few measures of Titles in the Code.

[Figure: Element Count Distribution]

[Figure: Section Count Distribution]

[Figure: Tokens per Section Distribution]

[Figure: Token Count Distribution]

(These plots are all based on data from our GitHub repository, and the source code to reproduce them can be found in this R and ggplot2 gist.)

  What do we notice?  Title 26 is not the longest or largest by any of these measures.  It doesn’t have the most words (Title 42), the most elements or sections (again, Title 42), or even the most words per section (Title 23).  So what can we say?
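  To make the "it depends" point concrete, here is a minimal Python sketch of how the answer to "which Title is largest?" can change with the measure. The `TitleStats` class, the `largest_by` helper, and the titles "A", "B", and "C" with their counts are all made up for illustration; they are not taken from our data or from the gist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleStats:
    """Per-Title counts. All values used below are toy numbers, not real data."""
    title: str
    tokens: int
    sections: int

    @property
    def tokens_per_section(self) -> float:
        # Average section length, measured in tokens.
        return self.tokens / self.sections


def largest_by(stats, measure):
    """Return the name of the Title that maximizes the given measure."""
    return max(stats, key=measure).title


# Hypothetical Titles, chosen so that each measure picks a different winner.
toy = [
    TitleStats("A", tokens=9000, sections=300),
    TitleStats("B", tokens=6000, sections=100),
    TitleStats("C", tokens=8000, sections=400),
]

winners = {
    "tokens": largest_by(toy, lambda s: s.tokens),
    "sections": largest_by(toy, lambda s: s.sections),
    "tokens_per_section": largest_by(toy, lambda s: s.tokens_per_section),
}
print(winners)
```

  With these toy numbers, a different Title comes out on top for each of the three measures, which is the same pattern the plots show for the real Code.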

  • Are Titles the right unit of measure?  Titles are the first cut of the hierarchical categorization of the U.S. Code.  It is generally accepted that they do not always represent a cohesive body of law; for example, Title 42 – Public Health and Welfare – is an amalgamation of topics as diverse as commercial space transportation, farm housing, and healthcare.  However, along with Acts, they are the most commonly discussed grouping.
  • Is any division of the Code atomic?  If you’ve read any statutory text, you are familiar with references or citations that incorporate definitions, rules, or other language.  If Title 26 and Title 42 are heavily interdependent through reference, does it make sense to compare them?  We believe the only proper way to do this is by incorporating measures of the network structure of the Code, visualized in part in the figure below.
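
  As a rough illustration of the interdependence question, the sketch below treats citations as (source Title, target Title) pairs and computes, for each Title, the share of its citations that point into a different Title. This is only one simple network-style measure under assumed inputs, not the method from our paper; the `cross_title_share` function and the toy citation list are hypothetical.

```python
from collections import Counter


def cross_title_share(citations):
    """For each source Title, return the fraction of its citations that
    target a different Title.

    `citations` is an iterable of (source_title, target_title) pairs.
    """
    total = Counter()
    external = Counter()
    for source, target in citations:
        total[source] += 1
        if source != target:
            external[source] += 1
    return {title: external[title] / total[title] for title in total}


# Toy citation pairs between hypothetical Titles "A", "B", and "C".
toy_citations = [
    ("A", "A"), ("A", "B"),
    ("B", "A"), ("B", "A"), ("B", "B"), ("B", "C"),
    ("C", "C"),
]

shares = cross_title_share(toy_citations)
print(shares)
```

  A Title with a high share leans heavily on language defined elsewhere, so counting only its own words or sections understates how much text a reader actually has to consult.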

  Hopefully, this discussion has piqued your interest in measuring legal complexity and raised your awareness of some common pitfalls.  If so, please give our paper a read and let us know what you think!
