
Data repositories and database resources

Nucleotide databases

EMBL Nucleotide Sequence Database
Europe's primary nucleotide sequence resource. The main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. It is one part of the European Nucleotide Archive (ENA).

Genome databases

Databases of genetic variation

In collaboration with the National Human Genome Research Institute, the National Center for Biotechnology Information has established the dbSNP database to serve as a central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms.

COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.

Database of Genomic Variants (DGV)
Aims to provide a comprehensive summary of structural variation in the human genome, along with a catalog of control data for studies that correlate genomic variation with phenotypic data. The database is continuously updated with new data from peer-reviewed research studies.

Databases of genotype and phenotype data

The European Genome-phenome Archive (EGA) is designed to be a repository for all types of genotype experiments, including case-control, population, and family studies. It includes SNP and CNV genotypes from array-based methods and genotyping done with re-sequencing methods. These data may be either publicly available or limited access, depending on the design of the study.

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that investigate the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing and molecular diagnostic assays, as well as studies of association between genotype and non-clinical traits.

Protein and protein macromolecular structure databases

Protein Data Bank in Europe (PDBe)
The EBI Protein Structure Database in Europe is a project for the collection, management and distribution of data about macromolecular structures, derived from the Protein Data Bank (PDB). It is one of the founding members of the Worldwide Protein Data Bank (wwPDB).

IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.

Microarray databases

The ArrayExpress Archive is a database of functional genomics experiments, including gene expression, where you can query and download data collected to MIAME and MINSEQE standards. The Gene Expression Atlas contains a subset of curated and re-annotated Archive data, which can be queried for individual gene expression under different biological conditions across experiments.

Proteomics databases

The PRIDE (PRoteomics IDEntifications) database at EMBL-EBI is a centralised, standards-compliant, public data repository for proteomics data. It has been developed to provide the proteomics community with a public repository for protein and peptide identifications together with the evidence supporting these identifications. PRIDE is also able to capture details of post-translational modifications.

Social sciences and humanities databases

UK Data Archive
The UK Data Archive (UKDA) is a centre of expertise in data acquisition, preservation, dissemination and promotion and is curator of the largest collection of digital data in the social sciences and humanities in the UK.

Bacteria collections

National Collection of Type Cultures
The National Collection of Type Cultures (NCTC) is a specialised laboratory located in the Central Public Health Laboratory, Colindale. It accesses, preserves and supplies authentic cultures of bacteria and mycoplasmas that are pathogenic to humans or other animals, that may occur in food or water and in hospital or health-related environments, and that can be preserved by freeze-drying.

Virus collections

National Collection of Pathogenic Viruses
A wide-ranging archive of well-characterised, authenticated human pathogens that supplies viruses, and materials derived from them, to the scientific community.

Wellcome Trust, Gibbs Building, 215 Euston Road, London NW1 2BE, UK T:+44 (0)20 7611 8888