Long-term Archiving and Data Storage

Long-term archiving is the process of preparing data so that they remain accessible and intelligible over the long term. This involves choosing and providing adequate metadata (see also the knowledge base’s section on metadata) and file formats (see also the knowledge base’s section on file formats), while complying with relevant policies (see also the knowledge base’s section on policies).

File Formats

During a project, storing data in proprietary formats may be sensible if the software required to open these formats is available for the whole duration of the project. However, if you want to access your data in 10 years’ time, it is not a good idea to store your data only in a proprietary format, because software that can open the files may no longer be available. You should therefore also store your data in non-proprietary formats (e.g. CSV), which are more likely to remain accessible 10, 15 or 20 years from now. That said, the UK Data Archive also lists some proprietary formats, such as Excel files, as recommended formats if they are widely used. Such files are unlikely to become inaccessible, because free software that can open them already exists (e.g. OpenOffice for Word documents).

Depending on the information your data contain (e.g. highly sensitive personal data), you should consider encrypting them (protecting them with a password or using encryption tools). However, if the password is lost, the data are lost too. Encrypting data for long-term preservation is therefore risky.
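As an illustration of the advice on non-proprietary formats, here is a minimal Python sketch that exports tabular data to a UTF-8 encoded CSV file using only the standard library `csv` module, then reads it back. The variable names, file name and example values are hypothetical; in practice the data would come from your own collection or analysis software.

```python
import csv
from pathlib import Path

# Hypothetical example data: participant IDs, conditions and scores.
rows = [
    {"participant_id": "P001", "condition": "control", "score": 12},
    {"participant_id": "P002", "condition": "treatment", "score": 15},
]


def export_to_csv(records, path):
    """Write a list of dicts to a UTF-8 encoded CSV file with a header row."""
    fieldnames = list(records[0].keys())
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def read_back(path):
    """Read the CSV back as a list of dicts; all values come back as strings."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    out = Path("study_data.csv")  # hypothetical file name
    export_to_csv(rows, out)
    print(read_back(out))
```

Note that a plain CSV file does not record data types or the meaning of codes such as "control", so the file should be archived together with adequate metadata describing each column.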

Media

The choice of media for long-term preservation is not trivial. Keep in mind that 20 years ago researchers stored their data on floppy disks, and that CDs may likewise become unreadable 20 years from now. The UK Data Archive therefore recommends storing at least two separate copies and renewing them every 2 to 5 years (e.g. by migrating them to new media). Fortunately, there are repositories and data archives that specialize in the long-term preservation of data (see also the knowledge base’s section on archives and repositories). For this reason, the DGPs Recommendations on Data Management in Psychological Science encourage researchers to store their data in trustworthy repositories.

Further Resources