In scientific discovery, the first three paradigms were experimental, theoretical and (more recently) computational science. A new book of essays published by Microsoft (and available for free download; kudos, MS!) argues that a fourth paradigm of scientific discovery is at hand: the analysis of massive data sets. The book is dedicated to the late Microsoft researcher Dr Jim Gray, who pioneered the idea with the catchphrase: “It’s the data, stupid”. The basic idea is that our capacity for collecting scientific data has far outstripped our present capacity to analyze it, and so our focus should be on developing technologies that will make sense of this “Deluge of Data” (as this New York Times review of the book, which is well worth a read, calls it).
Dr Gray’s call to arms was not to develop isolated, ever more powerful supercomputers but “to have a world in which all of the science literature is online, all of the science data is online, and they interoperate with each other.” This dream is already close to reality in some scientific domains like astronomy, where advanced instruments routinely generate petabytes of data available for public analysis. And with further developments in distributed and high-performance computing, with freely available large-scale data management tools like Hadoop, and with advanced open-source data-analysis tools like R rapidly adapting to the scale of these data sets, the fourth paradigm is certain to become a mainstream reality in other scientific domains as well.
Microsoft Research: The Fourth Paradigm: Data-Intensive Scientific Discovery