Reusing Datasets

Instead of collecting your own data, reusing existing data (secondary data use) can be a quicker and more economical alternative. Among other benefits, reusing existing data avoids unnecessary data collection and makes fuller use of the potential of research data.

Searching for Data

Searching for data is similar to searching for literature; however, the information and search infrastructure for datasets is less extensive than that for research literature. Nonetheless, there are various starting points for finding existing research data:

  • The Registry of Research Data Repositories (www.re3data.org) allows researchers to search for research data repositories in their field. From there, researchers are redirected to the relevant repository’s website and can search its catalogue for datasets.
  • The registration agency DataCite assigns Digital Object Identifiers (DOIs) to datasets; it is therefore the counterpart of the agency CrossRef, which assigns DOIs to text publications. DataCite’s services include a search function covering all records registered with DataCite. Moreover, DataCite’s subagencies, such as da|ra, provide their own search portals, which allow searching with additional discipline-specific information.
  • PsychLinker (German only) provides a catalogue of psychology-related links, which includes, among others, a research data section.
  • The Consortium of European Social Science Data Archives (CESSDA) is currently developing a Products & Services Catalogue (PaSC). It will not hold any research data itself, only their metadata, and will direct researchers to the source or service where the data can be obtained.
  • An overview of repositories from various disciplines is given by the Open Access Directory.
  • Supplemental material: Datasets are still commonly published as supplemental material to text publications. Such material is hosted directly by the publisher rather than by Dryad or other repositories. Therefore, datasets that are published only as supplemental material of a text publication are not registered as separate entities with DataCite and can only be found via the respective text publication.
  • Personal or institutional homepages: Some datasets are also provided on institutional websites or personal homepages. Andersen, Prause and Silver (2011) provide a good overview of such sites.
  • Another option is to identify potentially interesting datasets through relevant publications and to request the data underlying these publications from their authors (informal data sharing). In this case, the terms of data use should be agreed on beforehand.
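
The DataCite search mentioned above can also be queried programmatically. The following is a minimal sketch, assuming DataCite's public REST API at https://api.datacite.org/dois and its `query`, `resource-type-id` and `page[size]` parameters; these should be checked against DataCite's current API documentation. The function names are hypothetical helpers, not part of any DataCite library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DataCite's public REST API endpoint for searching DOI records.
DATACITE_DOIS_ENDPOINT = "https://api.datacite.org/dois"


def build_datacite_query_url(query, page_size=25):
    """Build a DataCite search URL restricted to records of type 'dataset'.

    The parameter names (query, resource-type-id, page[size]) are assumed
    from DataCite's REST API and should be checked against its documentation.
    """
    params = {
        "query": query,
        "resource-type-id": "dataset",
        "page[size]": str(page_size),
    }
    return DATACITE_DOIS_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_datacite_response(payload):
    """Return (DOI, first title) pairs from a decoded DataCite JSON response."""
    records = []
    for item in payload.get("data", []):
        attributes = item.get("attributes", {})
        titles = attributes.get("titles") or [{}]
        doi = attributes.get("doi", item.get("id"))
        records.append((doi, titles[0].get("title", "")))
    return records


def search_datacite(query, page_size=25):
    """Send the search request and return the parsed (DOI, title) pairs."""
    url = build_datacite_query_url(query, page_size)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return parse_datacite_response(payload)
```

For example, `search_datacite("working memory")` would return a list of DOI and title pairs whose landing pages can then be checked by hand. Separating URL building and parsing from the network call keeps both parts easy to test.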

For further information on repositories, see also our knowledge base’s section on archives and repositories.

The next step is to evaluate the datasets that emerged from your search to determine whether they meet your needs (data quality, constructs collected, etc.). Combining datasets can also make good economic sense, either by combining two existing datasets or by merging a new data collection with existing data, and can make it possible to address new research questions.
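
As an illustration of merging a new data collection with existing data, the following minimal sketch joins two lists of records on a shared participant identifier. The column name `participant_id` and the function name are hypothetical; real datasets usually require harmonizing identifiers and variable names before such a merge.

```python
def merge_on_key(existing, new, key="participant_id"):
    """Inner-join two lists of dict records on a shared identifier column.

    Only participants present in both datasets are kept. Duplicate identifiers
    in `new` and column names that occur in both datasets raise an error rather
    than being silently overwritten.
    """
    new_by_key = {}
    for row in new:
        if row[key] in new_by_key:
            raise ValueError(f"Duplicate identifier {row[key]!r} in new data")
        new_by_key[row[key]] = row

    merged = []
    for row in existing:
        match = new_by_key.get(row[key])
        if match is None:
            continue  # participant not present in the new data collection
        combined = dict(row)
        for column, value in match.items():
            if column != key and column in combined:
                raise ValueError(f"Column {column!r} occurs in both datasets")
            combined[column] = value
        merged.append(combined)
    return merged
```

Raising errors on duplicate identifiers and clashing column names is a deliberate choice: silently overwriting values would make the combined dataset hard to audit.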

In general, data management plans should include a statement indicating whether the reuse of existing data was considered.

Guidance on Secondary Data Use

The DGPs’ Recommendations on Data Management in Psychological Science set out duties for secondary data use (p. 9f). In general, secondary users should contact the data depositors to ensure that the data are used appropriately. Moreover, data depositors must be contacted if a publication is planned. Additionally, the question of co-authorship should be settled between data users and data depositors beforehand.

Secondary users’ duties are (DGPs, p. 9f):

  • to contact data depositors if they plan to publish their secondary research, and to provide information on its aim, its results and where these results will be published.
  • to cite the data they used.
  • to provide a reproducible script that includes all transformations and newly introduced variables of the secondary analysis. The original data themselves should not be published as part of the secondary analysis.
  • to offer a co-authorship to data depositors if secondary analyses complement or expand the research question of the original publication.
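
A reproducible script in the sense described above makes every transformation explicit. The following minimal sketch shows the idea for a hypothetical questionnaire: it reverse-codes selected items on an assumed 1 to 5 scale and introduces a new variable `scale_mean`, logging each step. The item names, the scale range and the function names are illustrative assumptions, not part of the DGPs recommendations.

```python
from statistics import mean


def reverse_code(value, scale_min=1, scale_max=5):
    """Reverse-code a Likert item, e.g. 1 -> 5 and 5 -> 1 on a 1-5 scale."""
    return scale_min + scale_max - value


def prepare_secondary_data(rows, reversed_items, scale_items, log):
    """Apply every transformation explicitly and record each step in `log`.

    `reversed_items` and `scale_items` are hypothetical item names; the
    derived variable `scale_mean` stands in for any newly introduced variable.
    """
    prepared = []
    for row in rows:
        new_row = dict(row)  # copy so the original records stay unchanged
        for item in reversed_items:
            new_row[item] = reverse_code(new_row[item])
        # Newly introduced variable: mean of the (recoded) scale items
        new_row["scale_mean"] = mean(new_row[item] for item in scale_items)
        prepared.append(new_row)
    log.append("Reverse-coded items (1-5 scale): " + ", ".join(reversed_items))
    log.append("New variable scale_mean = mean of " + ", ".join(scale_items))
    return prepared
```

Sharing a script like this, together with the log it produces, documents the secondary analysis without republishing the original data.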

References