Persistent Identifiers and Data Citation

Why use persistent identifiers?

Persistent identifiers (PIDs) are indispensable if you want to make your research data citable, findable and traceable in the long term. A crucial advantage of persistent identifiers over URLs is that persistent identifiers always stay the same. Thus, citations of and references to the persistent identifier remain valid even if the location of the web resource changes.

As Juha Hakala (2010) states in his overview of persistent identifiers:

Persistent identifiers have several tasks, but perhaps the most important ones are that they render the traditional identifiers actionable in the Web, and provide persistent links to the resources. Using a PI, the user can trust that he or she will get the appropriate work, even if the physical location of its manifestation has changed. In practice, the PI has to be mapped to an up-to-date locator or locators which facilitate access to physical manifestation(s) of the resource.

The most frequently used PID for research data in psychology is the digital object identifier (DOI). Hence, the following sections focus on this type of PID.
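
The mapping from identifier to locator described in the quotation can be illustrated for DOIs: a DOI becomes an actionable link when it is appended to the address of the public DOI resolver, https://doi.org/. The following Python sketch shows one way to do this. The function names and the example DOI (10.1234/example.dataset) are hypothetical and serve only as an illustration; the validity check is deliberately simple and does not cover every possible DOI.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver (proxy server)

# Common ways a DOI is written in citations and on web pages
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI name (e.g. 10.1234/abc) without resolver or 'doi:' prefix."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Simplified plausibility check: DOI names start with "10." and contain a "/"
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a resolvable link for a DOI, percent-encoding unsafe characters."""
    return DOI_RESOLVER + quote(normalize_doi(raw), safe="/")


if __name__ == "__main__":
    # Hypothetical DOI, used for illustration only
    print(doi_to_url("doi:10.1234/example.dataset"))
```

Because the link is built from the identifier rather than from the current storage location, it stays the same when the repository moves the data; only the locator registered for the DOI has to be updated.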

How to get a persistent identifier?

The central organization that assigns DOIs to datasets is DataCite (i.e., it fulfills the same role for data that CrossRef fulfills for articles). Both DataCite and CrossRef are registration agencies of the International DOI Foundation (IDF). Because these agencies do not allow individual researchers to register DOIs themselves, researchers can obtain a DOI for their data only by depositing the data with a repository or data archive that assigns DOIs to its data via a registration agency.

Data Citation

Citing data is important because it helps other researchers find the data that were used and thus makes it easier to verify results. Additionally, researchers who use the data should acknowledge the work of data producers by citing the data. Because researchers are evaluated by their output (e.g., the number of peer-reviewed publications and the number of citations those publications receive), data citation is an important incentive for researchers to share high-quality data that other researchers can reuse with reasonable effort. The FORCE11 group (Future of Research Communications and e-Scholarship) has published data citation principles that provide a framework for data citation. Because several data citation schemes are currently in use, we focus on two exemplary schemes, proposed by DataCite and the APA.

DataCite format:

Creator (PublicationYear). Title. Publisher. Identifier
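
As an illustration, the DataCite pattern above can be assembled from its five elements with a few lines of Python. This is a minimal sketch: the function name and all values in the example (creator, title, publisher, DOI) are hypothetical and do not refer to a real dataset.

```python
def datacite_citation(creator: str, publication_year: int, title: str,
                      publisher: str, identifier: str) -> str:
    """Assemble a citation string: Creator (PublicationYear). Title. Publisher. Identifier"""
    return f"{creator} ({publication_year}). {title}. {publisher}. {identifier}"


if __name__ == "__main__":
    # Hypothetical dataset, used for illustration only
    print(datacite_citation(
        creator="Doe, J.",
        publication_year=2015,
        title="Example reaction time data",
        publisher="Example Data Archive",
        identifier="https://doi.org/10.1234/example.dataset",
    ))
```

For this hypothetical dataset, the sketch yields: Doe, J. (2015). Example reaction time data. Example Data Archive. https://doi.org/10.1234/example.dataset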

APA format (proposed on the APA style blog):

  • For datasets with persistent identifier:

Creator. (PublicationYear). Title [Dataset]. Identifier

  • For datasets without persistent identifier:

Creator. (PublicationYear). Title [Dataset]. Retrieved from URL

  • For datasets with associated materials:

Creator. (PublicationYear). Title [Dataset and Type of Associated Materials]. Identifier

  • APA style in-text citations of datasets take the form Creator (PublicationYear) or (Creator, PublicationYear).
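
The APA variants listed above differ only in the bracketed label and in whether the reference ends with an identifier or a URL. The following Python sketch covers the three reference patterns and the two in-text forms. It is an illustration under stated assumptions, not an official APA tool: the function names, parameter names and example values are hypothetical.

```python
from typing import Optional


def apa_reference(creator: str, year: int, title: str,
                  identifier: Optional[str] = None,
                  url: Optional[str] = None,
                  associated_materials: Optional[str] = None) -> str:
    """Build a dataset reference following the patterns listed above."""
    # "[Dataset]" or "[Dataset and <type of associated materials>]"
    label = "Dataset"
    if associated_materials:
        label = f"Dataset and {associated_materials}"
    # Avoid a doubled period when the creator ends with an initial ("Doe, J.")
    head = f"{creator.rstrip('.')}. ({year}). {title} [{label}]."
    if identifier:
        return f"{head} {identifier}"
    if url:
        return f"{head} Retrieved from {url}"
    raise ValueError("either a persistent identifier or a URL is required")


def apa_in_text(creator: str, year: int, parenthetical: bool = False) -> str:
    """Return 'Creator (Year)' or, if parenthetical, '(Creator, Year)'."""
    return f"({creator}, {year})" if parenthetical else f"{creator} ({year})"


if __name__ == "__main__":
    # Hypothetical dataset, used for illustration only
    print(apa_reference("Doe, J.", 2015, "Example reaction time data",
                        identifier="https://doi.org/10.1234/example.dataset"))
    print(apa_in_text("Doe", 2015, parenthetical=True))
```

The sketch prefers the persistent identifier whenever one is given and falls back to "Retrieved from URL" only if none is available, which mirrors the order of the patterns above.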

Furthermore, the Digital Curation Centre provides a comprehensive guide on citing data and linking data citations to publications (Ball & Duke, 2015). The guide presents both the repositories’ and the researchers’ perspectives.


Further Resources

  • Have a look at the DOI Foundation’s website for further information on DOIs, such as factsheets on key issues concerning DOIs, FAQs, information on registration agencies, and much more.