
Blog Article: Data sharing and hosting; Why and How?

It is a growing trend in the scientific community today to share one's research data with peers in a public repository. This was not always the case; the move towards a more 'open science' is a recent one. It has come against the backdrop of the increasing commercialization of scientific publishing over the last few decades and the dominance of a few mega publishing groups. This has made 'science' increasingly inaccessible to most scholars and institutions, especially those in non-Western countries. The dominance of closed-access journals and the lack of data sharing practices have limited the reach of science among the general public. Apart from access, much scientific research is also prone to biases and weaknesses: findings that lack credibility because they have not been replicated, disproportionate reporting of significant results only, doubts about the authenticity of data and the insights drawn from it, and a lack of transparency in research work (Carter et al., 2019; Mac Giolla et al., 2022).

Sharing or hosting one's research data (or study protocol) in a public space benefits the researcher, sponsors, the scientific community, and society as a whole (USGS, 2022). Most of the common data repositories today fulfill the basic function of providing secure, long-term storage of research data. They also curate the data by managing, maintaining, validating, and adding value to it (MQD, 2022). More importantly, the repositories give contributors the option to share their data (or parts of their study) with the larger scientific community. Data sharing encourages more collaboration between researchers and institutions, which can result in important new findings in that field of study. It also increases data circulation and use within the scientific community by encouraging better transparency, enabling reproducibility of results, and informing the larger scientific community (BU Data Services, 2022; USGS, 2022).

 

For researchers looking to publish their manuscripts, most journals require the authors to make some declaration regarding the availability of the research data. An increasing number of journals have even made this a mandatory part of the manuscript submission process. However, this does not mean that authors necessarily have to put up their research data for anyone to download, or subscribe to expensive repositories. There are a number of online data repositories and content hosting platforms that allow you flexibility in how you share your data. Almost all repositories issue a DOI (or a relevant URL) for your datasets, which is required during the journal submission process. However, it is up to you to restrict access or set embargoes based on other project limitations. A number of these repositories even allow hosting datasets temporarily (for the journal submission process) or with the option of completely anonymizing author details. A few important details of a number of data repositories in use are given below, along with the relevant links.

1. Harvard Dataverse (https://dataverse.harvard.edu/)

  • A free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data. 

  • FREE; accepts files up to 2.5 GB and provides a unique data citation (including a DOI).

  • Sample: Data for Global refugee work rights

2. Dryad Digital Repository (https://datadryad.org/stash)

  • It is an open, easy-to-use, not-for-profit, community-governed data infrastructure that allows you to make research data discoverable, freely reusable, and citable.

  • USD 120 for the first 20 GB (integrates with a number of major publishers). Also provides a stable DOI for datasets and almost seamless linking with ORCID.

  • Sample: Dataset on risk taking

3. 4TU.ResearchData (https://data.4tu.nl/info/en/)

  • A data repository for technical-scientific research data that stores the data in a permanent and sustainable manner. It supports Open Science and the FAIR data principles, which aim to make research data Findable, Accessible, Interoperable and Reusable.

  • Data will have a persistent identifier and will be citable; they also carry a license that guarantees openness and reusability.

  • Up to 5 GB of data with a FREE account; also offers access to data curation, social sharing, and a community (Slack workspace).

  • Sample: Supplementary data for publication

4. Zenodo (https://zenodo.org/)

  • Zenodo is a general-purpose open repository developed under the European OpenAIRE program and operated by CERN. It allows researchers to deposit research papers, datasets, research software, reports, and any other research-related digital artefacts.

  • FREE and open platform; for each submission, a persistent digital object identifier (DOI) is minted.

  • Sample: Twitter chatter database

5. Open Science Framework (https://osf.io/)

  • OSF promotes open, centralized workflows by enabling capture of different aspects and products of the research lifecycle, including developing a research idea, designing a study, storing and analyzing collected data, and writing and publishing reports or papers.

  • A free, open-source project management tool developed and maintained by the Center for Open Science.

  • Sample: Anonymized study protocol

6. Figshare (https://figshare.com/)

  • A place to store, share, and cite your research outputs. Research outputs can include, but are not limited to, tabular data, images, video, presentations, posters, code, book chapters, and more.

  • FREE to upload (up to 20 GB) and access (in adherence to the principle of open data)

  • Sample: Dataset on sentiment analysis

7. PROSPERO (https://www.crd.york.ac.uk/prospero/)

  • An international database of prospectively registered systematic reviews in health and social care, welfare, public health, education, crime, justice, and international development, where there is a health-related outcome.

  • Key features from the review protocol are recorded and maintained as a permanent record. This helps avoid duplication and reduces the opportunity for reporting bias by enabling comparison of the completed review with what was planned in the protocol.

  • Sample: Umbrella review on depression

8. NeuroMorpho.Org (https://neuromorpho.org/)

  • It is a centrally curated inventory of digitally reconstructed neurons and glia associated with peer-reviewed publications. Its main goal is to provide dense coverage of available reconstruction data for the neuroscience community.

  • To date, NeuroMorpho.Org is the largest collection of publicly accessible 3D neuronal reconstructions and associated metadata.
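
Once a repository has issued a DOI for your dataset, that DOI is what goes into the data availability declaration of your manuscript. The short Python sketch below shows one way to turn a DOI into a resolvable link (through the public resolver at https://doi.org/) and to draft such a declaration. The function names, the statement wording, and the sample DOI are illustrative assumptions, not something prescribed by any journal or repository; always follow the wording your target journal asks for.

```python
from typing import Optional
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI (or one prefixed with 'doi:') into a resolvable URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Keep '/' readable and percent-encode characters that are unsafe in a URL.
    return DOI_RESOLVER + quote(doi, safe="/")


def availability_statement(repository: str, doi: str,
                           embargo_until: Optional[str] = None) -> str:
    """Draft a data availability statement (hypothetical wording)."""
    text = (f"The data supporting this study are available from "
            f"{repository} at {doi_to_url(doi)}.")
    if embargo_until:
        text += f" Access is under embargo until {embargo_until}."
    return text


if __name__ == "__main__":
    # '10.1234/example.5678' is a made-up placeholder, not a real dataset DOI.
    print(availability_statement("Zenodo", "10.1234/example.5678"))
```

If your dataset is under embargo, passing a date as `embargo_until` appends a sentence saying so, which mirrors the access restrictions many repositories let you set.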

Other relevant resources for data hosting and sharing

References

  1. BU Data Services. (2022). What is a data repository? Boston University. https://www.bu.edu/data/share/selecting-a-data-repository/

  2. Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2, 115-144.

  3. Mac Giolla, E., Karlsson, S., Neequaye, D. A., & Bergquist, M. (2022). Evaluating the Replicability of Social Priming Studies. https://psyarxiv.com/dwg9v/download?format=pdf

  4. Managing Qualitative Data. (2022). https://managing-qualitative-data.org/

  5. United States Geological Survey (USGS). (2022). Why Share Your Data. https://www.usgs.gov/data-management/why-share-your-data