
Data Management Guide

An overview for researchers on how to collect, manage, organize, and preserve data at all stages of the research process.

Project, experiment, and description

Common Issues | Best Practices
Lab instruments and software are proprietary, file types are proprietary, data can only be read/analyzed on original instrument | Use most common machines/software for discipline
Lab instruments and software are designed to run on older operating systems | Keep copies of the OS and software, be aware of compatibility issues caused by software updates
Machines and software are not backwards compatible | Avoid upgrading software systems
Exported/converted data cannot be manipulated (data loss) | Save both original, uncompressed data files and exported data
Manufacturer mergers and acquisitions can lead to discontinued/unsupported products |

 

Documentation, organization, and storage

Data documentation and organization
Common Issues | Best Practices
Inconsistently labeled files and folders | Use file naming conventions
Poor version control | Use version control software, or dates in file/folder names
Files saved in multiple locations | Use permanent data identifiers to prevent duplication
Metadata is missing crucial information | Use disciplinary metadata standards
No team protocols or procedures | Document workflow, build a data dictionary, write data documentation procedures
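
A file naming convention that includes a date and a version number can be enforced with a small helper rather than typed by hand. The sketch below is one possible convention, not a standard: the element order (date, project, experiment, description, version), the separators, and the function name build_filename are all assumptions chosen for illustration.

```python
import re
from datetime import date
from typing import Optional


def build_filename(project: str, experiment: str, description: str,
                   version: int, ext: str, on: Optional[date] = None) -> str:
    """Build a name of the form YYYYMMDD_project_experiment_description_vNN.ext.

    The element order and separators are an illustrative convention only.
    """
    def slug(text: str) -> str:
        # Lowercase the text and collapse every run of characters that is
        # not a letter or digit into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # A YYYYMMDD prefix makes alphabetical order match chronological order.
    day = (on or date.today()).strftime("%Y%m%d")
    parts = [day, slug(project), slug(experiment), slug(description),
             f"v{version:02d}"]
    return "_".join(parts) + "." + ext.lower().lstrip(".")


print(build_filename("Soil Study", "Exp 02", "pH readings", 3, ".CSV",
                     on=date(2024, 3, 15)))
# prints: 20240315_soil-study_exp-02_ph-readings_v03.csv
```

Whatever convention a team settles on, recording it in the data documentation procedures lets new lab members apply it consistently.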

 

Data storage
Common Issues | Best Practices
Digital data are stored on local hard drives and not backed up | Keep 3 copies of everything (original + external/local + external/remote)
Data are stored in a cloud owned by the private sector | Use both hard drive and cloud storage
Machines are set to overwrite data to clear space | Change settings or increase storage capacity
Files are kept on common drives | Establish access levels for files and folders
Specimens, samples, etc. are not secure; paper lab notebooks can be damaged or lost | Have a policy on keeping hard copies/specimens physically secure
Data have to be shared with remote collaborators/rotating lab personnel | Train personnel on their roles in creating, storing, and taking responsibility for data security
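
The three-copy practice in the table above (original + external/local + external/remote) can be scripted so that each backup is verified as it is made. This is a minimal sketch, assuming the backup locations are reachable as ordinary folders (for example a mounted external drive and a synced remote folder). The function name back_up and the paths in the usage comment are hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def back_up(original: Path, destinations: list[Path]) -> list[Path]:
    """Copy one file into each destination folder and verify every copy."""
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / original.name
        # copy2 copies the file contents and also tries to keep timestamps.
        shutil.copy2(original, target)
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting file size and modification time alone.
        if not filecmp.cmp(original, target, shallow=False):
            raise IOError(f"Backup at {target} does not match {original}")
        copies.append(target)
    return copies


# Hypothetical usage: one local external drive and one remote mount.
# back_up(Path("results/run01.csv"),
#         [Path("/mnt/external/backups"), Path("/mnt/remote/backups")])
```

Verifying at copy time catches a failed transfer immediately, while the original is still available to copy again.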

 

Access, sharing, and reuse

Common Issues | Best Practices
Sensitive data are not encrypted or anonymized | Anonymize data using a random ID generator (not subject or experiment characteristics)
Patient consent forms do not cover re-use, re-purposing, or sharing | Obtain permission from participants to make data publicly available
Copyrighted material is used without permission to distribute derivatives | Review all relevant federal and state laws; seek copyright permissions
Certain data cannot be deposited without violating patient privacy | Encrypt and store data for verification purposes only
Certain data cannot be shared for national security reasons/trade secrets/patents | Encrypt and store data for verification purposes only
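
The random ID practice in the table above can be sketched with Python's standard secrets module, which produces IDs that carry no information about the subject or the experiment. The function name assign_random_ids and the default ID length are assumptions for illustration. Replacing identifiers does not by itself anonymize a dataset if other fields can still identify a participant.

```python
import secrets


def assign_random_ids(subjects: list[str], n_bytes: int = 8) -> dict[str, str]:
    """Map each distinct subject identifier to a random hexadecimal ID."""
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for subject in subjects:
        if subject in mapping:
            # The same subject always keeps the same random ID.
            continue
        new_id = secrets.token_hex(n_bytes)
        while new_id in used:
            # Regenerate in the unlikely event of a collision.
            new_id = secrets.token_hex(n_bytes)
        used.add(new_id)
        mapping[subject] = new_id
    return mapping


# Hypothetical usage: the keys stand in for real participant identifiers.
key_table = assign_random_ids(["participant A", "participant B"])
```

The resulting key table links random IDs back to real identities, so it should be stored encrypted and kept separate from the shared data.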

 

Archiving

Common Issues | Best Practices
Researcher keeps the only copy of the data | Make copies of the data
Others can access the data only by personal request (researcher gets to vet uses) | Place in an institutional or disciplinary repository so it can be discovered and accessed
Data are shared, but without the metadata or instructions that others need to understand or reuse them | Include workflows and metadata
Process for accessing data is complicated or confusing | Set permissions while depositing data

Placing your work in a public repository comes with added benefits, like fixity checks, metadata assistance, format migration, permissioning, backups, and increased discoverability.
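
A fixity check confirms that a file has not changed since it was deposited, usually by comparing a stored checksum with a freshly computed one. Repositories run these checks with their own tooling. The sketch below only illustrates the idea for a local folder, using SHA-256 from Python's standard library, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file under the folder."""
    return {p.relative_to(folder).as_posix(): sha256_of(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries whose file is now missing or has different content."""
    current = build_manifest(folder)
    return sorted(name for name, checksum in manifest.items()
                  if current.get(name) != checksum)
```

A manifest built at deposit time and rechecked on a schedule reports any file that was altered, corrupted, or deleted in the meantime.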