Why I don’t like Dynamic Typing

February 25, 2012
(This article was first published on Win-Vector Blog » R, and kindly contributed to R-bloggers)

A lot of people consider the static typing found in languages such as C, C++, ML, Java and Scala as needless hairshirtism. They consider the dynamic typing of languages like Lisp, Scheme, Perl, Ruby and Python as a critical advantage (ignoring other features of these languages and other efforts at generic programming such as the STL).

I strongly disagree. I find the pain of having to type or read through extra declarations is small (especially if you know how to copy-paste or use a modern IDE), and certainly much smaller than the pain of the anti-patterns that dynamic languages encourage: lurking bugs, harder debugging and more difficult maintenance. Debugging is one of the most expensive steps in software development, so you want to incur less of it (even if it is at the expense of more typing). To be sure, there is significant cost associated with static typing (I confess: I had to read the book and post a question on Stack Overflow to design the type interfaces in Automatic Differentiation with Scala), but this is up-front design effort that has ongoing benefits, not hidden debugging debt.

There is, of course, no a priori reason anybody should care whether I like dynamic typing. What I mean by saying this is that I have some experience with, and observations about, problems with dynamic typing that I feel can help others.

I will point out a couple of example bugs that just keep giving. Maybe you think you are too careful to ever make one of these mistakes, but somebody in your group surely will. And a type checking compiler finding a possible bug early is the cheapest way to deal with a bug (and static types themselves are only a stepping stone for even deeper static code analysis). For my examples I will pick on the programming language R (which we have used and written about in the past).

One of the supposed advantages of dynamically typed languages is that “everything is a macro.” That is, you write a function and it is really a template that specializes and works over many different data types. For example, suppose we decide to write our own function to compute sample variance in R:

variance <- function(x) {
   n <- length(x)
   sumX <- sum(x)
   sumXX <- sum(x*x)
   (n/(n-1))*(sumXX/n - (sumX/n)*(sumX/n))
}

This works great and even matches the built-in function var():

> variance(c(1000000,2000000,3000000,4000000,5000000))
[1] 2.5e+12
> var(c(1000000,2000000,3000000,4000000,5000000))
[1] 2.5e+12

That is, it works until we (either knowingly or unknowingly) apply the function to data of a different type:

> variance(as.integer(c(1000000,2000000,3000000,4000000,5000000)))
[1] NA
Warning message:
In x * x : NAs produced by integer overflow

Our macro specialized to calculate over the integers when given integer arguments and then failed due to overflow (the squares in x*x exceed the range of R’s 32-bit integers). Here it is obvious, but in a dynamically typed language we don’t always know the type of what we are passing in, as we may have gotten the value from somewhere else. If we define variance() as a function over doubles in a statically typed language, then the language forces either an explicit (programmer supplied) or implicit (language supplied) coercion when we attempt to use the function on a vector of integers. The problem is that it is a bigger responsibility to write a correct macro (as the macro has to work over more possible types than a simple function). The dynamic language pushes this onto us, and sometimes we get burnt and sometimes everything is okay. This sort of consideration is one of the reasons functional programming advocates prefer anonymous functions to declaring classes on the fly: less is possible, so it is easier to safely implement what is implied.
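
To see why the integer case fails, it helps to look at the arithmetic directly. The sketch below assumes a standard R build, where integers are 32-bit; the variable names are only illustrative and do not come from the original code.

```r
# R integers are 32-bit, so the largest representable value is roughly 2.1e9.
maxInt <- .Machine$integer.max

xInt <- as.integer(c(1000000, 2000000))
xDbl <- c(1000000, 2000000)

# The same expression specializes differently depending on the argument type.
typeInt <- typeof(xInt)   # "integer"
typeDbl <- typeof(xDbl)   # "double"

# 1e6 * 1e6 = 1e12 does not fit in a 32-bit integer, so R yields NA
# (with a warning, suppressed here to keep the sketch quiet).
sqInt <- suppressWarnings(xInt * xInt)

# The double computation has plenty of room for the same products.
sqDbl <- xDbl * xDbl
```

Nothing in the call variance(x) tells the reader which of these two behaviors they will get; it depends entirely on where x came from.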

Some of the problem can be dispelled with test driven development. I am a proponent of test driven development, so much so that I don’t want to waste my valuable test budget testing for things that a decent type system can defend against. Also, by starting broad (assuming it is fair to re-use a function on many different types of arguments) you have entered into a bad bargain where you have to either document what subset of arguments the function works properly on (which is essentially declaring types!), add extra defensive code to cast the arguments on the way in (a waste, and needlessly defensive coding brings in its own problems) or write enough tests to document proper function on a whole bunch of types you don’t actually care about (char, byte, short int …). Unexpected properties of real world data will throw you enough testing and debugging challenges (for example: the effect of unexpected constant data in bad quicksort implementations) that you don’t need additional hidden challenges that a static type system could exclude.
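
For concreteness, the second option above (casting arguments on the way in) might look like the following sketch. The name varianceChecked is hypothetical and not part of the original code; the guard and coercion are roughly what a declaration over doubles in a statically typed language would have supplied for us.

```r
# Hypothetical defensive variant of variance(): reject non-numeric input and
# coerce to double up front, so integer vectors cannot overflow in x*x.
varianceChecked <- function(x) {
   stopifnot(is.numeric(x))
   x <- as.double(x)
   n <- length(x)
   sumX <- sum(x)
   sumXX <- sum(x*x)
   (n/(n-1))*(sumXX/n - (sumX/n)*(sumX/n))
}

# The integer input that broke variance() now agrees with the built-in var().
xInt <- as.integer(c(1000000,2000000,3000000,4000000,5000000))
vChecked <- varianceChecked(xInt)
```

This works, but it is exactly the kind of extra code (and extra test surface) that the bargain described above forces on us.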

My second complaint is that most dynamically typed languages go further and force the horrible anti-pattern of automatic (or zero-declaration) variables on us. Since we are not, in a dynamically typed language, required to declare types, it is considered a waste to force the user to declare variables at all (statements like “var colTypeClass”). This argument is seductive because another supposed advantage of dynamically typed languages is conciseness, and variable declarations appear to have little value if you are not declaring types. However, consider the following code:

sqlColType <- function(colTypeName) {
   colTypeClass <- 'unhandled'
   if(colTypeName %in% list('smallint','integer','bigint','decimal','numeric','real','double precision','serial','bigserial','money')) {
      colTypeClass <- 'numeric'
   } else if(colTypeName %in% list('character varying','character','text','boolean')) {
      colTypeClass <- 'categorical'
   } else if(colTypeName %in% list('interval','date')) {
      colTypeGlass <- 'temporal'
   } else if(length(grep('time',colTypeName))>0) {
      colTypeClass <- 'temporal'
   }
   colTypeClass
}

This code (for better or for worse, and at some point we all have to write or use something this ugly) is attempting to map specific SQL column type names into broad classes of types (numeric, categorical and temporal). However, there is a typo-bug in the above code that is only possible in a language with automatic variable declaration. Consider the following two applications of sqlColType():

> sqlColType('integer')
[1] "numeric"
> sqlColType('date')
[1] "unhandled"

The first result is as designed and the second is wrong. What happened is that in the if-block where “date” should have been identified we accidentally spelled “Class” with a “G”, and the result we meant to return was trapped in a shiny new automatic variable that never escapes the function. You may consider this particular bug unlikely, but in a language without automatic variable declaration it is literally impossible. And you don’t even have to actually have this bug in your code to suffer from it: this mistake is something you have to check for when inspecting or debugging faulty code (because you have no prior guarantee that it cannot happen).
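
One way to reduce exposure to this class of bug, even in R, is to avoid the intermediate variable entirely and let each branch yield its value. This is only a sketch under that assumption, and the name sqlColTypeExpr is hypothetical. It does not make misspelled variables impossible in general (only declarations checked before run time do that); it merely leaves no assignment to misspell.

```r
# Hypothetical rewrite: the if/else chain is itself the function's value,
# so there is no result variable whose name could be mistyped.
sqlColTypeExpr <- function(colTypeName) {
   if(colTypeName %in% c('smallint','integer','bigint','decimal','numeric','real','double precision','serial','bigserial','money')) {
      'numeric'
   } else if(colTypeName %in% c('character varying','character','text','boolean')) {
      'categorical'
   } else if(colTypeName %in% c('interval','date')) {
      'temporal'
   } else if(length(grep('time',colTypeName))>0) {
      'temporal'
   } else {
      'unhandled'
   }
}

# The input that previously fell through to "unhandled" is now classified.
dateClass <- sqlColTypeExpr('date')
```

This is the same "less is possible" argument made earlier for anonymous functions: with no named local state, one whole category of mistake has nowhere to hide.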

My third complaint is the common lack of significant refactoring tools for dynamically typed languages. The ability to automatically apply larger scale meaningful code changes (such as when using Eclipse’s Java development environment) is big. Dynamic type advocates would argue that most of the successful refactorings are just the IDE shepherding around type cruft that is not present in a dynamic language. This is not true. In addition to the trivial code motion and package management there are significant code transformations: method extraction, method signature alteration and safe variable renaming, just to name three. It is a real luxury to work with a system that can safely rename a variable (and all of its references) even when there are other strings and variables using the same token. It is also a luxury to work in teams where nobody can say “yeah, we wanted to remove that argument from the method, but nobody has time to update and test all of the consumers.” Most dynamic languages don’t even have the very clever “poor man’s refactoring” (change the method declaration, attempt a re-compile and then insert changes every place the compiler flags an error). When changing a method signature in a dynamically typed language you are typically left with the lurking worry that some bit of code somewhere is still attempting to use the old signature, and will exhibit a runtime error when the exact set of circumstances required to execute the bad path happens in production (i.e. that you won’t be lucky enough to find it in a test).

IDEs have a somewhat dirty reputation as being a crutch (somewhat due to horrible interface builders and large boilerplate systems), but the treatment of code as an object subject to a series of meaningful transformations is game changing. This treatment is most commonly associated with statically typed languages, somewhat by historic accident, but also likely due to the extra declaration blocks that statically typed languages tend to have rather than due to the type system itself.
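
The worry about stale callers after a signature change can be made concrete with a small R sketch. Everything here is hypothetical (the functions scaleValues and summarizeValues and the removed center argument are invented for illustration); the point is only that nothing complains until the bad path actually executes.

```r
# Hypothetical helper: an earlier version took a 'center' argument
# that has since been removed from the signature.
scaleValues <- function(x) {
   x / max(x)
}

# One call site was updated, but a rarely executed branch still
# passes the old argument.
summarizeValues <- function(x, rareCase = FALSE) {
   if(rareCase) {
      scaleValues(x, center = TRUE)
   } else {
      scaleValues(x)
   }
}

# Defining both functions raises no complaint, and the common path runs fine.
okResult <- summarizeValues(c(1, 2, 4))

# Only executing the rare path reveals the stale call, as a run-time error
# (captured here as a condition object so the sketch can continue).
rareResult <- tryCatch(summarizeValues(c(1, 2, 4), rareCase = TRUE),
                       error = function(e) e)
```

In a compiled, statically typed language the stale call would be rejected when the code is built, which is what makes the “poor man’s refactoring” possible.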

To sum up: dynamic typing allows more expressive code and saves space. But we pay a large cost downstream in more expensive debugging and much weaker ability to refactor or analyze. I favor the compromise where most code is statically typed and either only language supplied functions are capable of dynamic typing or there are user escapes out (like templating). While there is some doubt as to whether you can design a language as powerful as Scheme or Python without dynamic typing (some attempts have failed and some attempts are still evolving) I still prefer static typing. Or (more accurately) I prefer to deal with statically typed code (and am willing to put up with some expense to have it). Initial coding is not the only phase of the software lifecycle.

