What is metadata?
Metadata are data about data. They provide information that helps to understand the meaning of a digital object, its structure, and the actions that are permitted on it. You are probably already producing a large amount of metadata without knowing exactly what metadata are.
Imagine you receive a data matrix saved as a plain text file, without any further information. What does the 99 in the first row mean? Is it a missing value, the age of a person, an intelligence score, the number of correct words, or an ID? Clearly, nobody can decipher the information represented by research data without sufficient knowledge about the data. Metadata are the information that recipients need in order to make sense of data and to understand how they can be used. Thus, metadata can be regarded as a kind of documentation. Ideally, this documentation should begin with your research project and continue throughout its course. Implementing documentation procedures at an early stage of the research process saves time and effort, since reconstructing information at a later project stage requires considerable resources (or, at worst, may be impossible).
Before you decide which metadata to document, you should define the target audience and the purpose that your metadata will serve. If the documentation is intended solely for researchers who are part of your project (e.g. you want to ensure that you will still understand the data you created 5 years from now), its requirements will differ from those of documentation aimed at external researchers from your field or at external researchers who are not familiar with your field.
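To illustrate how variable-level documentation resolves this kind of ambiguity, here is a minimal Python sketch. The variable names, labels, value ranges, and missing-value codes are hypothetical and only serve the example; in practice they would come from your own study documentation.

```python
# Hypothetical codebook: all names, labels, ranges, and missing codes
# are illustrative assumptions, not part of any standard.
CODEBOOK = {
    "age": {
        "label": "Age in years",
        "valid_range": (18, 98),
        "missing_codes": {99},
    },
    "words_correct": {
        "label": "Number of correctly recalled words",
        "valid_range": (0, 120),
        "missing_codes": {-9},
    },
}


def interpret(variable, raw_value):
    """Describe a raw value using the codebook entry of its variable."""
    entry = CODEBOOK[variable]
    if raw_value in entry["missing_codes"]:
        return f"{entry['label']}: missing"
    low, high = entry["valid_range"]
    if not low <= raw_value <= high:
        return f"{entry['label']}: {raw_value} (outside documented range {low}-{high})"
    return f"{entry['label']}: {raw_value}"


# The same raw value means different things for different variables.
print(interpret("age", 99))
print(interpret("words_correct", 99))
```

Without the codebook, both calls would see nothing more than the number 99; with it, the first is recognised as a missing value and the second as a valid score.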
Metadata can be collected for single variables, the whole data set or the whole data collection/study. For every object to be documented there are different aspects that should be considered (e.g. variable: variable name, label, value range, instrument used; data set: file version, file changes, checksum; study: date of data collection, study background, principal investigator, etc.). Depending on your target audience, multilingual documentation may be required.
In general, we recommend including at least the title and an abstract of your work in your native language and in English. This will give a broad audience access to your work. However, in most cases your data will only be interpretable for the international community if the information in your codebook is also available in English.
NISO, a non-profit association that develops standards for information management, distinguishes three types of metadata in its guide Understanding Metadata (Riley, 2017):
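As a rough illustration of documentation at the data set level, the following Python sketch stores a file version, a SHA-256 checksum, and a bilingual title in one record. The field names, the file name, and the example titles are assumptions made for this sketch, not a prescribed schema.

```python
import hashlib
import json


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def describe_dataset(content: bytes, file_name: str, version: str, titles: dict) -> dict:
    """Build a small data-set-level metadata record (hypothetical field names)."""
    return {
        "file_name": file_name,
        "file_version": version,
        "checksum_sha256": sha256_of_bytes(content),
        "title": titles,  # one title per language code
    }


# In practice the bytes would be read from the data file on disk.
content = b"id,age\n1,34\n2,99\n"
record = describe_dataset(
    content,
    file_name="study_data.csv",
    version="1.0",
    titles={"de": "Beispielstudie", "en": "Example study"},
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Recomputing the checksum later and comparing it with the stored value shows whether the file has changed since it was documented.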
- Administrative Metadata provide information on managing a resource, such as when and how it was created, file type and other technical information, and access rights. There are several subsets of administrative data; the following two are sometimes listed as separate metadata types:
  - Rights management metadata concern intellectual property rights connected to the data.
  - Preservation metadata contain information needed to archive and preserve a resource.
- Structural Metadata indicate how related objects are put together, for example, how pages are ordered to form chapters.
- Descriptive Metadata describe a resource for the purpose of discovery or identification. They can include elements such as title, abstract, author, and keywords.
Psychological researchers generate all these types of metadata.
Several sources can help you decide which metadata to create for your project:
- Data Archive: Some data archives have their own set of mandatory metadata that have to be provided by the data depositor. If you are depositing your data with an archive, this archive may offer support in metadata creation.
- Your own metadata structure: Developing your own metadata structure offers a lot of flexibility but, unfortunately, a structure that you developed for your own study will probably not be interoperable with other groups’ metadata structures. Thus, it will be hard to interconnect your scientific findings with those of other researchers at the data level. In order to guarantee interoperability and to provide searchable information, metadata standards have been developed. Whenever possible, you should map your metadata to one of these standards.
- Metadata reporting standards: Several metadata standards for datasets offer a well-defined syntax that can be used to create metadata. The only established domain-specific metadata standard in the social sciences is the Data Documentation Initiative (DDI) standard. It exists in different versions: DDI-2.X (also referred to as DDI-Codebook or DDI-C) and DDI-3.X (also referred to as DDI-Lifecycle or DDI-L). Currently, DDI-4 is under development. Unfortunately, creating DDI-compliant files requires basic knowledge of XML, which most psychological researchers may perceive as a burden. However, several tools and initiatives ease the creation of DDI-compliant metadata files: Colectica can be used as a plugin for Excel, while Nesstar Publisher is standalone software that can import data from statistical packages and create DDI-compliant documentation. The Dublin Core standard provides 15 core metadata elements that can easily be applied to any kind of resource. Interoperability of metadata refers to the ability to exchange metadata and to convert them to other metadata standards; for further information, see Neiswender & Montgomery (2009). For example, there is a set of rules that defines relationships between Dublin Core metadata and DDI metadata.
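To give an impression of what a standard-based description looks like, the following Python sketch writes a few Dublin Core elements as XML using only the standard library. The 15 element names and the namespace URI are those of the Dublin Core element set; the wrapping `metadata` element, the function name, and the example values are assumptions made for this sketch, and a real archive may expect a different container format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 core elements of Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core(fields: dict) -> str:
    """Serialise a dict of Dublin Core fields to an XML string.

    The root element name "metadata" is a hypothetical wrapper.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Example values are invented for illustration.
xml_text = to_dublin_core({
    "title": "Example study",
    "creator": "Doe, Jane",
    "date": "2017",
    "language": "en",
    "rights": "Scientific use only",
})
print(xml_text)
```

Because the element names are fixed by the standard, a record like this can be exchanged with other systems or mapped to a richer standard such as DDI, which is much harder with a self-defined structure.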
DataWiz assists you in describing your data by providing its own metadata scheme for describing psychological data, based on the PsychData metadata standard.
Which information should be included in your documentation?
Despite the lack of psychological data standards, reporting standards exist that offer guidance on how to document data collection procedures, study design, measurement instruments or interventions. These reporting standards may be helpful when you want to document your study in the context of your data collection:
- Guidance on Reporting (Experimental) Studies
  - The American Psychological Association’s Journal Article Reporting Standards (JARS) are currently the most relevant standard for reporting psychological studies. They include detailed information on how to describe the methods sections of a paper.
  - The APA Publication Manual, which is wider in scope than JARS, can also serve as guidance for your documentation.
  - The CONSORT statement is probably the guideline with the greatest impact in the field of health sciences. CONSORT stands for Consolidated Standards of Reporting Trials, and it gives evidence-based recommendations for reporting randomized trials. It comprises a 25-item checklist (covering the reporting of design, analysis, and interpretation of the trial) as well as a flow diagram (showing the flow of all participants through the trial). Various extensions of the CONSORT statement are already available or under development.
  - Additionally, CONSORT-SPI, an extension for randomized controlled trials of social and psychological interventions, is currently under development.
- Study Design Reporting: The SPIRIT statement, which provides checklists on various aspects of clinical trial protocols, may serve as guidance for reporting your study design.
- Pre-Registration: Van ’t Veer and Giner-Sorolla (2016) published recommendations on the pre-registration of studies in social psychology. Additional materials are available on the corresponding OSF project.
- MRI Data: Guidance by the Organization for Human Brain Mapping (OHBM) Committee on Best Practice in Data Analysis and Sharing (COBIDAS)
- EEG Data: (German) Empfehlungen zur Erzeugung und Dokumentation von EEG Daten [Recommendations on the Generation and Documentation of EEG data] of the Deutsche Gesellschaft für Klinische Neurophysiologie und Funktionelle Bildgebung.
- Measurement Instruments: The RatSWD – German Data Forum published a working paper on Quality Standards for the Development, Application, and Evaluation of Measurement Instruments in Social Science Survey Research
- Meta-analyses: JARS also provides guidance on reporting meta-analyses.
- A JARS adaptation for qualitative data is currently under development; a preprint is available.
- Other guidelines can be retrieved from the EQUATOR Network, which maintains a directory of health sciences guidelines with more than 280 entries.
- biosharing.org curates information on inter-related data standards, databases and policies (in life, environmental and biomedical sciences).
- The RDA Metadata Standards Directory Working Group is composed of individuals and organizations that are involved in developing, implementing and using metadata for scientific data. They are currently developing a collaborative and open directory of metadata standards.
References
- Neiswender, C., & Montgomery, E. (2009). Metadata Interoperability—What Is It, and Why Is It Important?. In Stocks, K. I., Neiswender, C., Isenor, A. W., Graybeal, J., Galbraith, N., Montgomery, E. T., Alexander, P., Watson, S., Bermudez, L., Gale, A., & Hogrefe, K. (Eds.), The MMI Guides: Navigating the World of Marine Metadata (p. 11-15). Retrieved from http://uop.whoi.edu/techdocs/presentations/MMI_Guides.pdf.
- Riley, J. (2017). Understanding Metadata: what is metadata, and what is it for?. Retrieved from
- Van ’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in Social Psychology – A discussion and suggested template. Retrieved from https://osf.io/4frms/