A growing community of scientists from a variety of disciplines is moving the norms of scientific research toward open practices. Supporters of open science hope to increase the quality and efficiency of research by enabling the widespread sharing of datasets, research software source code, publications, and other processes and products of research. The speed at which the open science community seems to be growing mirrors the rapid development of technological capabilities, including robust open source scientific software, new services for data sharing and publication, and novel data science techniques for working with massive datasets. Organizations like rOpenSci harness such capabilities and deploy various combinations of these research tools, or what I refer to here as open science infrastructures, to facilitate open science.
As studies of other digital infrastructures have pointed out, developing and deploying the technological capabilities that support innovative work within a community of practitioners constitutes just one part of making innovation happen. As quickly as the technical solutions to improving scientific research may be developing, a host of organizational and social issues are lagging behind and hampering the open science community’s ability to inscribe open practices in the culture of scientific research. Remedying organizational and social issues requires paying attention to open science infrastructures’ human components, such as designers, administrators, and users, as well as the policies, practices, and organizational structures that contribute to the smooth functioning of these systems [1, 2]. These elements of infrastructure development have, in the past, proven to be as important as, or more important than, technical capabilities in determining the trajectory the infrastructure takes (e.g., whether it “succeeds” or “fails”) [3, 4].
As a postdoc with rOpenSci, I have begun a qualitative, ethnographic project to explore the organizational and social processes involved in making open science the norm in two disciplines: astronomy and ecology. I focus on these two disciplines to narrow, isolate, and compare the contextual factors (e.g., disciplinary histories and research norms) that might influence perceptions of open science. Specifically, I aim to answer the following research questions (RQ):
RQ1a: What are the primary motivations of scientists who actively engage with open science infrastructures?
RQ1b: What are the factors that influence resistance to open science among some scientists?
RQ2: What strategies do open science infrastructure leaders use to encourage participation, govern contributions, and overcome resistance to open science infrastructure use?
a. To what extent do governance strategies balance standardization and flexibility, centralization and decentralization, and voluntary and mandatory contributions?
b. By what mechanisms are open science policies and practices enforced?
c. What are the commonalities and differences in the rationale behind choices of governance strategies?
Below, I describe how I am systematically investigating these questions in two parts. In Part 1, I am identifying the issues raised by scientists who engage with or resist the open science movement. In Part 2, I am studying the governance strategies open science leaders and decision-makers use to elicit engagement with open science infrastructures in these disciplines.
Part 1: Engagement with and Resistance to Open Science
I am firmly rooted in a research tradition that emphasizes that studying the uptake of a new technology or technological approach, in any type of work or profession, begins with capturing how the people charged with changing their activities respond to the change “on the ground.” In this vein, Part 1 of the study aims to empirically support or challenge the arguments for and against open science that are commonly found in opinion pieces, on social media, and in organizational mission statements. A holistic reading of such documents reveals several recurring positions on each side. Supporters of open science often cite increased transparency, reproducibility, and collaboration as the chief benefits of making scientific research processes and products openly available. Detractors highlight concerns over “scooping,” ownership, and the time costs of curating and publishing code and data.
I am seeking to verify and test these claims by interviewing and surveying astronomers and ecologists (or, more broadly, earth and environmental scientists) who fall at various points along the open science engagement-to-resistance spectrum. I am conducting all interviews using the same semi-structured interview protocol [5]. I will then use a qualitative data analysis approach based on the grounded theory method [6] to extract themes from the responses, focusing on the factors that promote engagement (e.g., making data available, spending time developing research software, or making publications openly accessible) or resistance (e.g., unwillingness to share the code used in a study or restricting access to research data). I will also ask similar questions at scale via a survey.
Armed with themes from the responses, I will clarify and refine the claims often made in the public sphere about the benefits and drawbacks of open science. I hope to develop this part of the study into actionable recommendations for promoting open science, governing contributions to open science repositories, and addressing the concerns of scientists who are hesitant about engagement.
Part 2: Focusing on Governance
Even with interviews and surveys of scientists on the ground, it is difficult to systematically trace and analyze the totality of social and political processes that support open science infrastructure development because the processes occur across geographic, disciplinary, and other boundaries.
However, as others have pointed out [7], the organizational and social elements of digital infrastructure development often become visible and amenable to study through infrastructure governance. Governance refers to the combination of “executive and management roles, program oversight functions organized into structures, and policies that define management principles and decision making” [8]. Effective governance provides projects with the direction and oversight necessary to achieve the desired outcomes of infrastructure development while allowing room for creativity and innovation [2, 9, 10]. Studying a project’s governance surfaces the negotiation processes that occur among stakeholders (users, managers, organizations, policymakers, and the like) throughout the development process. Outcomes include agreements about the types of technologies used, standards defining best practices for technology use, and other policies that help a robust, sustainable infrastructure evolve [9, 11].
Despite the scientific research community’s increasing reliance on open science infrastructures, few studies compare different infrastructure governance strategies [2], and even fewer develop new or revised strategies for governing infrastructure development and use [12]. The primary goal of this part of the project is to address this gap in our understanding of the governance strategies used to create, maintain, and grow open science infrastructures.
I am carrying out this part of the study through in-depth, semi-structured interviews with leaders of various open science infrastructure projects that support work in astronomy and ecology. I define “leaders” in this context as individuals or small groups who make decisions about the management of open science infrastructures and their component parts. This set of leaders includes founders and administrators of widely used scientific software packages and collections of packages, of open data repositories, of open access publication and preprint services, and of various combinations of these open science tools. I also intend to interview the leaders of organizations with which the open science community interacts (editors of top journals, for example) to gauge how open science practices and processes are being governed outside of active open science organizations.
I will conduct qualitative coding as described above to develop themes from the responses of open science leaders. I will then ground these themes in the literature on digital infrastructure governance—which emphasizes gradual, decentralized, and voluntary development—and look for avenues to improve governance strategies.
Alongside the interviews and surveys, I am observing and collecting primary documents from the ongoing discourse around open science in high-profile scientific publications (e.g., Nature and Science), at conferences and meetings (e.g., OpenCon and discipline-specific hackweeks), and in popular and social media (e.g., The New York Times and Twitter threads).
I entered this project with a very basic understanding of how open science “works”: the technical and social mechanisms by which scientists make processes and outputs publicly available. In learning about the open science movement, both in general and in its particular instantiations, I’ve begun to see the intricacies involved in efforts to change scientific research and its modes of communication: research data publication, citation, and access; journal publication availability; and research software development and software citation standards. Within the community trying to sustain these changes are participants and leaders who are tackling several important issues head-on. Below, I list some of the most common engagement, resistance, and governance challenges appearing in interview and observation transcripts.
- Overcoming the fear of sharing code and data, specifically the fear of sharing “messy” code and the fear of being shamed for research errors.
- Defending the time and financial costs of participation in open science—particularly open source software development—to supervisors, collaborators, or tenure and promotion panels who are not engaged with open science.
- Finding time to make code and data usable for others (e.g., through good documentation or complete metadata) and, subsequently, finding a home where code and data can easily be searched and found.
- Convincing researchers that software development and data publication/archiving “count” as research products, even though existing funding, publication, and tenure and promotion models may not yet value those contributions.
- Developing guidelines and processes for peer review of publications, software, and data contributions, and navigating the tensions involved in “open review.”
- Deciding whose responsibility it is to enforce code and data publication standards or policies, both within open science organizations and in traditional outlets like academic journals.
The points raised in this post and the questions guiding my project might seem like discussions you’ve had too many times over coffee during a hackathon break or over beers after a conference session. If so, I’d love to hear from you, even if you are not an astronomer, an ecologist, or an active leader of an open science infrastructure. I am always looking for new ideas, both confirming and disconfirming, to refine my approach to this project.
Braa, J., Hanseth, O., Heywood, A., Mohammed, W., Shaw, V. 2007. Developing health information systems in developing countries: The flexible standards strategy. MIS Quarterly, 31(2), 381-402. https://doi.org/10.2307/25148796
Borgman, C. L. 2010. Scholarship in the digital age: Information, infrastructure, and the Internet. MIT Press, Cambridge, MA. ISBN: 9780262250863
Vaast, E., Walsham, G. 2009. Trans-situated learning: Supporting a network of practice with an information infrastructure. Information Systems Research, 20(4), 547-564. https://doi.org/10.1287/isre.1080.0228
Spradley, J. P. 2016. The ethnographic interview. Waveland Press, Long Grove, IL. ISBN: 0030444969
Barrett, M., Davidson, E., Prabhu, J., Vargo, S. L. 2015. Service innovation in the digital age: Key contributions and future directions. MIS Quarterly, 39(1), 135-154. https://doi.org/10.25300/MISQ/2015/39:1.03
Hanford, M. 2005. Defining program governance and structure. IBM developerWorks. Available at: https://www.ibm.com/developerworks/rational/library/apr05/hanford/
Star, S. L., Ruhleder, K. 1996. Steps toward an ecology of infrastructure: Design and access for large information spaces. Information Systems Research, 7(1), 111-134. https://doi.org/10.1287/isre.7.1.111
Edwards, P. N., Jackson, S. J., Bowker, G. C., Knobel, C. P. 2007. Understanding infrastructure: Dynamics, tensions, and design. Final report for Workshop on History and Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures. NSF Report. Available at: https://deepblue.lib.umich.edu/handle/2027.42/49353
Hanseth, O., Lyytinen, K. 2010. Design theory for dynamic complexity in information infrastructures: The case of building internet. Journal of Information Technology, 25(1), 1-19.