Guidance for researchers: Developing a data management and sharing plan
These guidelines are intended to assist all applicants in developing effective and proportionate approaches to maximise the value of their research data. They set out the circumstances under which data management and sharing plans are required, and our expectations for the issues that these plans should address.
- What data outputs will your research generate and what data will have value to other researchers?
- When will you share the data?
- Where will you make the data available?
- How will other researchers be able to access the data?
- Are any limits to data sharing required - for example, to either safeguard research participants or to gain appropriate intellectual property protection?
- How will you ensure that key datasets are preserved to ensure their long-term value?
- What resources will you require to deliver your plan?
1. Our position statement on data management and sharing requires that all of our funded researchers maximise the availability of their research data with as few restrictions as possible. When developing research proposals, we expect all researchers to consider their approach for managing the data they will generate.
2. If you are applying via the Trust's biomedical sciences or medical humanities funding streams and your proposed research will result in data holding significant value as a resource for the wider research community, then you will be required to submit a data management and sharing plan prior to an award being made. Data sharing plans are not requested for applications to our Technology Transfer and Public Engagement schemes.
3. Where they are required, data management and sharing plans and any costs associated with them are considered as an integral part of the application during peer review.
4. We recognise that different types of data raise very different considerations and challenges, and that there are significant differences between fields in terms of, for example, the availability of repositories and level of established good practice for data sharing. To assist researchers, we intend to supplement this guidance over time with tailored information relevant to specific research fields.
5. A data management and sharing plan should be submitted where a proposal involves the generation of datasets that have clear scope for wider research use and hold significant long-term value. In short, plans are required in situations where the data outputs form a resource from which researchers and other users would be able to generate additional benefits.
6. This would include all proposals where the primary goal is to create a database resource. It would also include other research generating significant datasets that could be shared for added value - for example, those where the data has clear utility for research questions beyond those that the data generators are seeking to address.
7. In particular, it would cover datasets that might form "community resources" as defined by the Fort Lauderdale Principles and the Toronto statement. As noted in the Toronto statement, community resources will typically have the following attributes:
- large-scale (requiring significant resources over time)
- broad utility
- creating reference datasets
- associated with community buy-in.
8. Illustrative examples of applications that would require a data management and sharing plan would include large-scale genetic association studies of common diseases; genome-wide or large-scale functional genomic studies in a specific organism; and longitudinal studies of patient and population cohorts.
9. For studies generating small-scale and limited data outputs, a data management and sharing plan will not normally be required. Generally, the expected approach for projects of this type would be to make data available to other researchers on publication, and where possible to deposit data in appropriate data repositories in a timely manner. While a formal data management and sharing plan need not be submitted in such cases, applicants may find the guidance below helpful in planning their approaches for managing their data.
10. If you are unsure whether you are required to submit a data management and sharing plan as part of your proposal, please contact the Trust for advice.
11. We do not dictate a set format for data management and sharing plans where they are required: applicants can structure their plan in a manner most appropriate to the proposed research. As a general rule, they should be clear and concise, and avoid repetition of methodological detail included elsewhere in the grant application. They should also be proportionate - both to the scale of the datasets generated and their likely level of value to the research community.
12. The information submitted in plans should focus specifically on how data outputs will be managed and shared. Whilst timely publication of results in peer-reviewed journals and presentations at scientific meetings are key forms of dissemination, they are not equivalent to data sharing and do not in themselves constitute a data management and sharing plan.
13. In considering their approach for data management and sharing, applicants should consider the following seven questions. Each of these questions should be addressed as briefly and unambiguously as possible in data management and sharing plans where these are required. Further guidance on the issues that should be considered under each of these questions is provided below.
i. What data outputs will your research generate and what data will have value to other researchers?
ii. When will you share the data?
iii. Where will you make the data available?
iv. How will other researchers be able to access the data?
v. Are any limits to data sharing required - for example, to either safeguard research participants or to gain appropriate intellectual property protection?
vi. How will you ensure that key datasets are preserved to ensure their long-term value?
vii. What resources will you require to deliver your plan?
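Applicants who keep a draft plan in structured form may find it useful to treat the seven questions as a completeness checklist. The following is a minimal Python sketch under stated assumptions, not a format that the Trust prescribes: the short keys (`outputs`, `timing` and so on) and the function `unanswered_questions` are hypothetical names introduced purely for illustration.

```python
# Hypothetical short keys for the seven questions listed above.
# The keys are illustrative labels only; no set plan format is required.
QUESTIONS = {
    "outputs": "What data outputs will your research generate and what data "
               "will have value to other researchers?",
    "timing": "When will you share the data?",
    "location": "Where will you make the data available?",
    "access": "How will other researchers be able to access the data?",
    "limits": "Are any limits to data sharing required?",
    "preservation": "How will you ensure that key datasets are preserved "
                    "to ensure their long-term value?",
    "resources": "What resources will you require to deliver your plan?",
}


def unanswered_questions(plan):
    """Return the keys of questions whose answer is missing or blank.

    `plan` is assumed to be a dict mapping the keys above to free-text answers.
    """
    return [key for key in QUESTIONS if not plan.get(key, "").strip()]


if __name__ == "__main__":
    # A deliberately incomplete draft, used only to show the check working.
    draft = {
        "outputs": "Genotype data from a longitudinal cohort.",
        "timing": "On publication of the main findings.",
        "location": "   ",  # blank answers count as unanswered
    }
    for key in unanswered_questions(draft):
        print("Still to address:", QUESTIONS[key])
```

A check of this kind only confirms that each question has some answer; it says nothing about whether the answer is clear, concise or proportionate, which remains a matter for the applicant and for peer review.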
14. Researchers should maximise access to research datasets of value to the wider research community in a timely and responsible manner. Any data that is shared should be of a sufficiently high quality that it will have value to other researchers and should be provided in a format that enables it to be used effectively.
15. We recognise that in some cases it may not be appropriate for researchers to share their data. However, if your research meets the criteria for requiring a data management and sharing plan but you are intending not to share your data, the reasons for this must be clearly justified.
16. Data should be shared in accordance with recognised data standards where these exist, and in a way that maximises opportunities for data linkage and interoperability. Sufficient metadata must be provided to enable the dataset to be used by others. Agreed best practice standards for metadata provision should be adopted where these are in place.
17. When developing data management and sharing plans, researchers should therefore consider and briefly describe:
- what types of data the proposed research will generate
- which data will have value to other research users and could be shared
- what data formats and quality standards will be applied to enable the data to be shared effectively.
18. All data management and sharing plans must state clearly the timescales over which datasets of value will be shared. Such timescales should take account of any recognised standards of good practice in the applicant's research field.
19. In considering the timescales that are appropriate, the Trust recognises fully that data generators have the right to a reasonable (but not unlimited) period of exclusive use for the research data that they produce.
20. As set out in our guidelines on good research practice, all grant holders must ensure as an absolute minimum that the data underpinning research papers are made available to other researchers on publication, providing this is consistent with any ethics approvals and consents which cover the data and any intellectual property rights in them.
21. In cases where the creation of a database resource is the primary goal of a Trust-funded activity, we would normally expect the data to be made widely available to user communities at the earliest feasible opportunity.
22. In line with the Fort Lauderdale Principles and subsequent Toronto statement on pre-publication data sharing, the Trust also encourages timely and responsible pre-publication data sharing for research that might constitute a "community resource" (i.e. those that have the characteristics set out in point 7 above).
23. Where appropriate, researchers may use publication moratoria to facilitate pre-publication sharing of data with other researchers, while protecting their right to first publication. Any such restrictions on data use should be reasonable, transparent and in line with established best practice.
24. Illustrative examples of timescales for data sharing are provided to help demonstrate different models that have been adopted and may be considered as examples of good practice in the field of large-scale genetics and genomics studies.
25. Researchers should deposit data in recognised data repositories where these exist for particular data types, unless there is a compelling reason not to do so. Further information on repositories that may be appropriate
26. If the intention is to create a tailored database resource or to store data locally, researchers should ensure that they have the resources and systems in place so that the data are curated, secured and shared in a way that maximises their value and safeguards against any associated risks. This includes consideration of how data held in this way can be effectively linked and integrated with other datasets to enhance their value to users.
27. Data should be made available to other researchers with as few restrictions as possible. Where a managed access process is required - for example, where a study involves potentially identifiable data about research participants - the access mechanisms established should be proportionate to the risks associated with the data, and must not unduly restrict or delay access. Any managed access procedures that are proposed must be described clearly as part of your data management and sharing plan.
28. Depending on the study, it may be appropriate to establish a graded access procedure in which less sensitive data (e.g. anonymised and aggregate data) are made readily available, whereas applications to access more sensitive datasets are subject to a more stringent assessment process.
29. Any managed access procedures should be consistent and transparent. In cases where a Data Access Committee is required to assess applications to access data, the composition of such Committees should include individuals with appropriate expertise who are independent of the project.
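The graded access procedure described in points 28 and 29 can be pictured as a simple mapping from the sensitivity of a dataset to the route by which it is released. The sketch below is illustrative only and rests on assumptions: the three tiers, the route labels and the function `access_route` are hypothetical, and a real study would define its own tiers in line with its consents and ethics approvals.

```python
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical sensitivity tiers; real studies define their own."""
    AGGREGATE = "aggregate"
    ANONYMISED = "anonymised"
    POTENTIALLY_IDENTIFIABLE = "potentially identifiable"


# Tiers treated as "less sensitive" in this sketch, following the
# examples given in point 28 (anonymised and aggregate data).
LESS_SENSITIVE = {Sensitivity.AGGREGATE, Sensitivity.ANONYMISED}


def access_route(sensitivity):
    """Return an illustrative access route for a dataset's sensitivity tier."""
    if sensitivity in LESS_SENSITIVE:
        return "readily available"
    # More sensitive data go through a more stringent assessment,
    # for example review by a Data Access Committee.
    return "managed access review"


if __name__ == "__main__":
    for tier in Sensitivity:
        print(f"{tier.value}: {access_route(tier)}")
```

Writing the tiers down explicitly in this way is one means of keeping the procedure consistent and transparent, since every request for data in the same tier follows the same route.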
30. Where appropriate, the Trust would encourage those generating datasets that are likely to be of significant value to other researchers to publish a 'marker paper' or other form of publication, which enables data users to formally cite their usage of the resource.
31. Where a database resource is being developed as part of a funded activity, researchers should take reasonable steps to ensure that potential users are made aware of its availability. These should be outlined briefly in your plan.
32. For some research, delays or limits on data sharing may be necessary and appropriate to safeguard research participants or to ensure intellectual property protection is gained. Any such restrictions should, however, be minimised as far as feasible and set out clearly in data management and sharing plans where these are required.
Safeguarding research participants
33. For research involving samples or information pertaining to human subjects, data must be managed and shared in a way which is fully consistent with the terms of the consent under which samples and data were provided by the research participants.
34. For prospective studies, consent procedures should include provision for data sharing in a way that maximises the value of the data for wider research use, while providing adequate safeguards for participants. As part of the consent process, proposed procedures for data sharing should be set out clearly and current and potential future risks associated with this explained to research participants.
35. In designing studies, researchers must ensure that they have appropriate systems to protect the confidentiality and security of data pertaining to human subjects, and minimise any risks of identification by data users. This can be achieved through the use of appropriate anonymisation procedures and managed access processes. Such systems should be sufficient to safeguard participants, but proportionate to the level of sensitivity of the data and associated risk. They should not unduly inhibit responsible data sharing for legitimate research uses.
36. In line with our policy on intellectual property and patenting, we expect our funded researchers to ensure that any intellectual property in the outputs of their research is suitably protected and managed in a way that best enables the use of that knowledge for ultimate health benefit.
37. Delays or restrictions on data sharing may be appropriate to gain intellectual property protection or to further development of a technology for public benefit. As noted above, any such limits should be minimised as far as is feasible.
38. Researchers must consider how datasets that have long-term value will be preserved and curated beyond the lifetime of the grant. If the proposal is to create a bespoke data resource or to store data locally rather than to use a recognised data repository, data management plans should state clearly how the applicant expects that the dataset will be preserved and shared when the period of grant funding comes to an end.
39. The Trust is happy to discuss issues relating to longer-term preservation and sustainability with researchers so as to help provide the support required to maximise the long-term value of key research datasets.
40. In preparing data management and sharing plans, researchers should consider carefully any resources they may need to deliver their plan. Where dedicated resources are required, these should be outlined and justified as part of the plan.
41. Issues to consider include:
- People and skills - is there sufficient expertise and resource in the research team to manage, preserve and share the data effectively? Is additional specialist expertise (or training for existing staff) required? If so, how will this be sourced?
- Infrastructure - are there appropriate computational facilities to manage, store and analyse the data generated by the research?
- Tools - will additional computational facilities and resources need to be accessed, and what will be the costs associated with this?
Digital Curation Centre
The UK's leading centre of expertise in data curation. The DCC provides a range of resources and training opportunities for the UK higher education sector, and has developed a checklist of issues that should be considered in developing data management plans.
MRC Guidance and Resources
In support of its Data Preservation and Sharing Policy, the MRC has developed detailed practical guidance for researchers on data sharing. The MRC Data and Tissues Toolkit provides a visual guide through regulatory requirements for use of personal information and human tissue samples in healthcare research.
NCRI Informatics initiative
The NCRI Informatics initiative website provides several valuable resources for researchers, including information on data standards applicable to both cancer research and more broadly.
NCRI - Template for Access Policy Development
The NCRI has created a flexible template for access policy development, which can be tailored to the needs of a specific collection of data or samples. The resource also provides a template Data and Material Transfer Agreement (MTA).
UK Data Archive - Managing and Sharing Data
The UK Data Archive provides an online resource and guidance document for researchers on best practice in managing and sharing data.
Wellcome Trust Sanger Institute Data Sharing guidance
The Wellcome Trust Sanger Institute has a policy setting out the principles that underlie data sharing at the Institute, with an associated guidance document for researchers.