
Executive Summary
Ithaka S+R’s Research Support Services program explores current trends and support needs in academic research. Our most recent project in this program, “Supporting Big Data Research,” focused specifically on the rapidly emerging use of big data in research across disciplines and fields. As part of our study, we partnered with librarians from more than 20 colleges and universities, who then conducted over 200 interviews with faculty. These interviews provided insights into the research methodologies and support needs of researchers working across a wide range of disciplines.
This report provides a detailed account of how big data research is pursued in academic contexts, focusing on the typical methodologies, workflows, outputs, and challenges big data researchers face. Full details and actionable recommendations appear in the body of the report, which offers guidance to universities, funders, and others interested in improving institutional capacities and fostering intellectual climates to better support big data research. Our key findings are grouped into the following areas:
- Tension and Interplay between Disciplinary and Interdisciplinary Perspectives. Big data research is an interdisciplinary enterprise conducted by practitioners working in institutional settings that are still organized around disciplines. Divergent incentive structures and cultures, along with unequal access to funding, can affect disciplinary participation in big data research projects. Moreover, widespread use of methodologies from the computer and data sciences (most importantly, a clear trend towards machine learning) has created tension among researchers and raised questions about the relative importance of disciplinary perspectives.
- Managing Complex Data. In an era of relative data abundance, researchers often avoid the expense of generating new data and instead opt to work with existing data whenever possible. The work of acquiring, cleaning, and organizing data is typically the most labor-intensive aspect of big data projects.
- Structures for Collaboration. Big data research is almost always a collective endeavor involving students, faculty, and staff, as well as colleagues, clients, and collaborators from within and beyond higher education. Labs are the core units for research, and within them, students (both undergraduate and graduate) make significant contributions to the research process. Researchers also often favor local, lab-based computing resources over centralized campus storage and computing options, including cloud computing services.
- Sharing Knowledge. Although peer-reviewed articles remain the most highly incentivized form of scholarly communication, researchers are broadly committed to the open sharing of research outputs, including data and code. However, academic sharing practices span a spectrum that extends well beyond formal sharing in open repositories that meet FAIR standards of findability, accessibility, interoperability, and reusability, and that encompasses many types of informal sharing with colleagues.[1] Barriers to formal sharing include widespread perceptions that much data is derivative, low quality, or gathered from sources that are inappropriate for open sharing.
- Ethical Challenges. The ethical dimensions of big data research remain contested, and some researchers are uncertain about best practices for ethical research conduct. Although IRB guidance is valued, some researchers expressed concerns that IRB regulations are not well adapted to new or evolving research methods.
- Support and Training. Researchers tend to favor informal training methods, such as internet tutorials, over formal training in big data methods. While such methods work well for solving immediate problems, they are less well suited to acquiring foundational knowledge, leaving the potential for blind spots in academic research.