I have been meaning to write this for a while, but with the
data.table feud rising to new levels on Twitter the last couple of days, it all of a sudden seems more relevant. For those who don’t know what I am talking about, there are different ways of doing data science. There are the two major languages R and python, with their own implementations for analysing data. Then within R there are the different flavours of using the base language or applying the functions of the
tidyverse. Within the
tidyverse there is the
dplyr package to do data wrangling, of which the functionality of the
data.table package greatly overlaps. Each of the choices are wildly contested by users of both of the options. Oftentimes, these debates are first presented as objective comparisons between two options in which the favoured option clearly stands out. This then evokes fierce responses from the other camp and before you know it we are down to the point where we call each other’s choices not elegant or even ugly.
I hope to convince you that these debates are useless by looking into some of the underlying psychological principles that makes us vulnerable for this type of quarreling.
The main reason I think the discussion of the merits of different implementations is fruitless, is that the two camps can never fully understand each other. Sure, objective comparisons can be made in computation speed and to some extent functionality. We can read from the authors and maintainers what the motivation is for implementing something in a certain way. But the true prove of the pudding remains in the eating, that is enabling the user in effectively putting it to use. Before you can effectively and joyously apply a complex systems (which all these implementations are), you need to spend countless hours sweating and swearing, reading documents and googling error messages. Day after day, hour after hour, you will fight yourself into mastery of one of the systems. To be most effective and consistent almost everybody will have a go-to system, in which day-to-day task will be done. You don’t roll a dice in the mourning to decide if this is going to be an R or a python day. You don’t switch from
dplyr in the middle of an analysis, without a very good reason. You stick with what you know, because it will give you the answer you are after the quickest. There is a major path-dependency here. You initially start with one of the systems and with each time you use it your understanding and appreciation grows. Because of this you will keep using the system and so your love affair begins. Before you know it, you have a cognitive lock-in from your weapon-of-choice.
A big part of the appreciation for the system comes from your understanding of it. This understanding reliefs the large cognitive strain of doing data science a little bit. In the phenomenal Thinking Fast and Slow by Daniel Kahneman a full chapter is devoted to the topic of cognitive ease. It is shown that you get a good feeling out of things that are relatively easy to you. There is a very good evolutionary explanation for this; your brain consumes massive amounts of energy so parsimonious use of it is rewarded by feeling great. It is you body telling you, keep doing this not so hard thing, you are going to last a long time this way. Because the time you spent developing skills in your favourite system, looking at code from this system will give you a lot more cognitive ease than looking at code from a system you are far less familiar with. To understand code from the unfamiliar system at the same level as the familiar one, will require spending a lot of cognitive resources. This will be accompanied by negative emotions such as frustration, feeling tired, and even anger. It is then a very understandable but also a very silly mistake that these emotions are an indication that the unfamiliar system is poorly implemented or even ugly. However, it is ignorance, not the software being bad that caused these emotions.
From cognitive psychology we turn to social psychology. In the seventies Henri Tajfel developed the theory of social identity. Part of the way we view ourselves is determined by the groups we are part of. If a group we are part of does do well by some measure, we will personally start to feel better about ourselves even when we did not had any part in the achievement. Just look at the crowds that celebrate at a victory parade of the local sports teams. They did not spent a minute on the field, they might not even been at the stadium supporting the players, and still they experience the victory as theirs as well. Using a software system a good part of your waking hours, day after day, will inevitably lead to that system being part of your social identity. Gradually you are not just using the software, you are becoming the software as well.
On its own this is not a bad thing, as humans we need this sense of belonging. However, it will also change your behaviour, and oftentimea not for the good. In order to boost your self-esteem you want your “team” to be on the winning end. Each year when Stackoverflow shares the rises and falls of use of software languages, users of both R and python jubilantly share the results (both are growing year after year). While this is an objective development in which you had no part other than being one of the users, discussions about the merits of software systems have active user involvement. Probably the easiest way to make your team look better is to make the other team look worse. Just think about the tiring debates between fans of different sports teams to discuss which is the best. Or the endless mud throwing at political races. Flame wars are no difference, by mocking the other system we celebrate our own and we are part of the winning team.
Now, here is the good news. In sports and politics, the different parties are also objectively in a competition. They play a zero-sum game, in which one’s victory must mean the other’s defeat. We as a data science community are not in a zero-sum game and we often seem to forget it. Even when the ‘competing’ system is on the rise you can do your job effectively and with joy in the on you prefer. Instead of mocking each other we should be thankful for the wealth of options we have to do our jobs. When our primary system does not offer the functionality we are looking for, we might find it in the other. The different systems also can influence each other positively, functionality in other systems might inspire authors to make theirs more complete.
I have looked into two psychological mechanisms that I think do stir-up flame wars. I hope it will make you think again before posting a comment on Twitter or starting a heated discussion with a colleague that leads nowhere. We have several options to do our daily jobs and each of them has proven itself in practice. Each is used by at least tens of thousands of analysts and programmers, who use them to bring real value to real organisations. Mocking one of them is not only harmful, it is disrespectful. Only the brightest and most determined of our peers could create systems that are so complex, complete and fault free. They have committed thousands of hours, often unpaid, to serve the community because they care. Mocking their labour because you don’t properly understand the system they designed or because you want to feel better about yourself is ignorant, and you should refrain from it.