Data Sharing

Why share your data?

Almost any kind of research data is of potential value for future scientific work. To serve as a resource, however, data have to be shared and made accessible to others. Sharing has several advantages: it avoids the cost of duplicate data collection, enhances the visibility and transparency of your research, can lead to innovative new uses of the data, and increases the impact of your research. In some cases, data sharing is also required by funding agencies, institutions or scientific societies.

There are many ways to share your data, which differ in formality and effort. Data can be deposited with a data archive, data center, data bank or institutional repository; they can be submitted to a journal as supplemental material to your publication; they can be published online on an institutional website; or they can be shared informally with your peers. Each option has advantages and disadvantages that have to be weighed for each individual project. You will find additional information on this issue in the knowledge base’s section on data archives and repositories. Please note that data sharing also involves numerous legal and documentary aspects, which are outlined in other sections of the knowledge base (e.g. copyright, privacy, informed consent, file formats or codebooks).

The DGPs Framework on Data Sharing

To formalize the practice of data sharing, the DGPs Recommendations on Data Management in Psychological Science (Schönbrodt, Gollwitzer, & Abele-Brehm, 2016) introduce rights and duties for researchers who share their data, referred to as “data sharers” (p. 7f).

Two types of data sharing are introduced:

| | Sharing data that are part of a publication | Sharing data after project completion |
| Time of data sharing | Print publication | After project is finished |
| Scope (data to be shared) | All data that underlie published results (data that were collected but are not analyzed in this publication should be reported) | All data (including data that were not used in print publications) |
| Scope (project type) | All projects regardless of funding | Publicly funded projects |
| Embargo period | Only in exceptional cases (p. 9) | Up to 5 years (longer periods possible but need to be justified) |
| Additional materials | Actionable analysis scripts (reproducing published results) | All materials that make data interpretable: analysis scripts, codebooks, stimuli, data-generating code (for simulation studies), measurement instruments, software etc. |

Furthermore, rights and duties of data sharers are introduced:

  • Right of first use of the data: data sharers may define embargo periods. During an embargo period, the data are stored in a repository but cannot be accessed, although metadata and codebooks generally remain accessible.
  • Right to know who uses the data and for what purpose (in particular, the right to be informed about data reuse before a publication takes place)
  • Duty to enable meaningful secondary use
    • informed consent that allows for secondary data use
    • thorough and comprehensible documentation of the data
  • Duty to document start and end date of embargo periods
  • Restrictions on data access should be avoided if possible. However, for highly sensitive data (see the knowledge base’s section on data privacy), restricting data use is an option. The term used in the German context is “scientific-use files”, indicating that data may only be accessed by researchers (i.e., employees of scientific organizations) and only for scientific (i.e., non-commercial) purposes.
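
The embargo rules above can be made concrete in a small Python sketch. It is only an illustration: the class `EmbargoRecord`, its fields, and the function `accessible` are hypothetical names and are not part of the DGPs recommendations or of any repository's software. The sketch assumes the five-year limit that applies to data shared after project completion, treats the end date as the last day of the embargo, and assumes that metadata and codebooks stay accessible throughout.

```python
from dataclasses import dataclass
from datetime import date

MAX_EMBARGO_YEARS = 5  # limit for sharing after project completion


@dataclass(frozen=True)
class EmbargoRecord:
    """Hypothetical record documenting the start and end date of an embargo."""
    start: date
    end: date
    justification: str = ""  # only needed if the embargo exceeds the limit

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("embargo end date precedes start date")

    def exceeds_limit(self) -> bool:
        # Compare (years, month, day) tuples so leap days need no special case.
        elapsed = (self.end.year - self.start.year, self.end.month, self.end.day)
        limit = (MAX_EMBARGO_YEARS, self.start.month, self.start.day)
        return elapsed > limit

    def needs_justification(self) -> bool:
        # Longer embargoes are possible but have to be justified.
        return self.exceeds_limit() and not self.justification.strip()


def accessible(record: EmbargoRecord, item: str, today: date) -> bool:
    """Return True if `item` may be accessed on `today`."""
    if item in {"metadata", "codebook"}:
        return True  # assumed to remain accessible during the embargo
    return today > record.end  # assumption: the end date itself is still embargoed
```

A repository-side check could call `needs_justification()` when a deposit is registered and `accessible()` whenever a file is requested.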

Reporting of data selection or data processing

If you do not share the complete dataset, you should report the data selection procedures and data processing steps that were applied, including whether only a subset of subjects or variables is provided. For example, you should state if:

  • you had to remove a set of demographic variables because of data protection issues (or recoded values of these variables),
  • you had to draw a subsample of subjects because some subjects did not provide informed consent for sharing (personal) data,
  • missing values were imputed, or
  • anonymization techniques that alter the research data, such as data swapping or perturbation, were used (see the knowledge base’s section on privacy).
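
One way to keep such a report alongside a shared dataset is to record each selection or processing step in a structured form and render it as a short note. The following Python sketch illustrates the idea; the `ProcessingStep` schema, the function `render_report`, and the example entries are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ProcessingStep:
    """One reportable change made to the data before sharing (hypothetical schema)."""
    kind: str         # e.g. "variables removed", "subsample", "imputation"
    description: str  # what was done
    reason: str       # why it was done


def render_report(dataset: str, steps: list[ProcessingStep]) -> str:
    """Return a plain-text note listing every step in the order it was applied."""
    if not steps:
        return f"{dataset}: complete dataset shared, no selection or processing applied."
    lines = [f"{dataset}: the shared file differs from the collected data as follows."]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.kind}: {step.description} (reason: {step.reason})")
    return "\n".join(lines)


# Illustrative entries only; they do not describe a real study.
example_steps = [
    ProcessingStep("variables removed", "demographic variables dropped", "data protection"),
    ProcessingStep("subsample", "subjects without consent to sharing excluded", "informed consent"),
    ProcessingStep("imputation", "missing values imputed", "analysis requirements"),
]
print(render_report("example_dataset", example_steps))
```

The resulting note can be stored next to the codebook so that secondary users see at a glance how the shared file was derived.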