Data preparation and transfer
Requirements
R14. Directors / PIs shall ensure that measures are in place to protect the confidentiality of study participants and the security of data sets when they are shared with, or analysed on behalf of, new users, and that practice complies with legal and regulatory requirements, MRC policies and relevant best practice.
R15. Studies must ensure that metadata documentation, a metadata catalogue or personnel with relevant knowledge and expertise can support the reasonable understanding and use of study datasets by new and external researchers.
R16. Studies must document data transfers and ensure that the data and accompanying documentation (metadata) are prepared to the agreed standards.
Expectations
1. Studies create and retain a record of how they prepared the data, which data have been transferred, when, via what media, and whether data were encrypted.
2. Studies document any significant resources they devote to data preparation, metadata, other documentation and data transfer.
3. Studies should aim to provide requested data in accessible or standardly used file formats.
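The transfer record described in expectation 1 could be kept as a small structured log entry. The following is a minimal sketch only: the class name, field names and example values are hypothetical and are not prescribed by this guidance. It records how the data were prepared, what was sent, when, via what medium and whether it was encrypted, and adds a SHA-256 checksum of the released file.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class TransferRecord:
    """One entry in a study's data transfer log (field names are illustrative)."""
    recipient: str
    variables: list          # which data were transferred
    preparation_notes: str   # how the extract was prepared
    medium: str              # e.g. "SFTP" or "encrypted USB drive"
    encrypted: bool          # whether the data were encrypted in transit
    sha256: str              # checksum of the exact bytes released
    transferred_at: str      # ISO 8601 timestamp in UTC


def make_record(payload, recipient, variables, preparation_notes, medium, encrypted):
    """Build a log entry for one release; `payload` is the released file as bytes."""
    return TransferRecord(
        recipient=recipient,
        variables=list(variables),
        preparation_notes=preparation_notes,
        medium=medium,
        encrypted=encrypted,
        sha256=hashlib.sha256(payload).hexdigest(),
        transferred_at=datetime.now(timezone.utc).isoformat(),
    )


# Hypothetical example release
payload = b"pseudo_id,age_band\nP001,40-49\n"
record = make_record(
    payload,
    recipient="External Research Group A",
    variables=["pseudo_id", "age_band"],
    preparation_notes="Ages banded into 10-year groups; study IDs replaced.",
    medium="SFTP",
    encrypted=True,
)
print(json.dumps(asdict(record), indent=2))
```

Storing a checksum alongside the record is one design choice among several; it gives a simple way to confirm later that a regenerated release is byte-identical to the one originally sent.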
Further good practice
1. The extracted data set is checked for small cell sizes (counts) and appropriate action is taken (for example, values might be merged, blurred or suppressed).
2. A clear policy specifies when pseudo-identifiers should be used and when study/sample identifiers can be re-used.
3. Where this enhances data security, studies replace study/sample identifiers with unique pseudo-identifiers to limit the risk of re-identification.
4. These pseudo-identifiers are normally re-used for supplementary releases to the same external party, so that recipients can link successive data releases and so that repeat access to samples is easier in the future.
5. Specific data releases can be identically regenerated if necessary, to enable reproduction of particular results or for research governance investigations.
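Two of the practices above, checking for small cell counts and issuing pseudo-identifiers that are re-used for the same recipient, can be sketched as follows. This is an illustration under stated assumptions, not a recommended implementation: the threshold of 5, the function names and the use of a keyed hash (HMAC-SHA256) to derive pseudo-identifiers are all assumptions introduced here, and each study's own policy would determine the real choices.

```python
import hashlib
import hmac
from collections import Counter

SMALL_CELL_THRESHOLD = 5  # hypothetical value; set by the study's own policy


def suppress_small_cells(values, threshold=SMALL_CELL_THRESHOLD):
    """Return a frequency table with counts below `threshold` replaced by None."""
    counts = Counter(values)
    return {category: (n if n >= threshold else None)
            for category, n in counts.items()}


def pseudo_id(study_id, recipient, secret_key):
    """Derive a pseudo-identifier that is stable for one recipient.

    The recipient name is part of the hashed message, so the same participant
    gets the same pseudo-identifier in every release to that recipient, but a
    different one for any other recipient. `secret_key` (bytes) is assumed to
    be held only by the study team.
    """
    message = f"{recipient}:{study_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


# Hypothetical usage
table = suppress_small_cells(["A"] * 7 + ["B"] * 2)
print(table)  # {'A': 7, 'B': None}

key = b"example-key-not-for-real-use"
first_release = pseudo_id("S0001", "Group A", key)
later_release = pseudo_id("S0001", "Group A", key)
other_party = pseudo_id("S0001", "Group B", key)
print(first_release == later_release)  # True
print(first_release == other_party)    # False
```

Because the pseudo-identifiers are derived deterministically from the key, the recipient and the study identifier, a release can be regenerated with the same identifiers as long as the key is retained securely. Suppression is only one of the actions mentioned above; merging categories or blurring values would need different code.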
Resources
The eight Principles of the Data Protection Act.
Secure safe haven architectures can facilitate containment of datasets within a controlled and secure environment, reducing unnecessary replication and enabling detailed audit of data access.
Tools that support deductive disclosure risk assessment and mitigation include:
- Anonymisation - Measuring the disclosure risk, by the International Household Survey Network.
- SUDA – a program for detecting special uniques, by the University of Manchester.
The UK Data Archive has guidance on preparing research data for sharing, in particular: