Human genome project analysis
Embargoed until 18.00 London time/13.00 US Eastern Time on 20 October 2004
Hinxton, Cambridge, UK, Thursday 21 October 2004. The International Human Genome Sequencing Consortium, of which the Wellcome Trust Sanger Institute is a major partner, today published its scientific analysis of the finished human genome, the Gold Standard sequence that is already acting to prime new biomedical research.
The paper is published on 21 October 2004 in Nature and details the rigorous standards set and surpassed during the 13-year Human Genome Project (HGP). The analysis suggests that there are perhaps only 20,000-25,000 protein-coding genes in our human genome.
The Wellcome Trust Sanger Institute made the largest single contribution to the human genome sequence. The ‘genome browser’ ENSEMBL, run by the Sanger Institute and the EMBL-European Bioinformatics Institute, is a leading resource for researchers around the globe.
Key results of the research are:
• The number of gaps has been reduced 400-fold to only 341
• It covers 99% of the gene-containing parts of the genome and is 99.999% accurate
• The new sequence correctly identifies almost all known genes (99.74%)
• It defines 22,287 ‘gene loci’, consisting of 19,599 protein-coding genes in the human genome and another 2,188 DNA segments that are predicted to be protein-coding genes
• It identifies the ‘birth’ of 1,183 genes in the last 60-100 million years
• It identifies the ‘death’ of 30 or so genes in a similar time period
• The accuracy and completeness allow systematic searches for the causes of disease, for example, to find all key heritable factors predisposing to diabetes or mutations underlying breast cancer – with confidence that little can escape detection
• At a practical level, it eliminates tedious confirmatory work by researchers, who can now rely on highly accurate information
• More generally, the HGP demonstrates the tremendous potential value of coordinated projects to create community resources to propel biomedical research
‘In our analysis we revised some predictions based on the unfinished, draft sequence of the human genome,’ said Dr Jane Rogers, Head of Sequencing at the Wellcome Trust Sanger Institute. ‘The task of identifying genes remains challenging, but the finished human genome sequence, genome sequences from other organisms, better computational models and other improved resources have combined to give a much clearer and more reliable picture of our genomic landscape.’
The sequence produced has an estimated error rate of less than one per 100,000 bases of code – tenfold better than the original goal. This means that gene identification can be more reliable and that studies of our genome and health – for example, of which genetic changes predispose some individuals to disease – can be carried out with greater confidence.
‘Only a decade ago, most scientists thought humans had about 100,000 genes. When we analyzed the working draft of the human genome sequence three years ago, we estimated there were about 30,000 to 35,000 genes, which surprised many. This new analysis reduces that number even further and provides us with the clearest picture yet of our genome,’ said NHGRI Director Francis S. Collins, MD, PhD. ‘The availability of the highly accurate human genome sequence in free public databases enables researchers around the world to conduct even more precise studies of our genetic instruction book and how it influences health and disease.’
Key challenges that lie ahead include: a systematic study of sequence variation among humans and of the association of that variation with disease; systematic identification of non-protein-coding elements in the human genome, especially regulatory controls and structural elements; and systematic identification of all the ‘modules’ in which genes and proteins function together, to place genetic information in a functional context.
Sir John Sulston, former Director of the Wellcome Trust Sanger Institute, said: ‘Collectively we have produced a sequence that is as accurate and complete as possible in the present state of the art. It will be open for continuous improvement over the years to come, and of course open for all to use for any purpose, without restraint or fee. Let us continue to work together to ensure that the enormous benefits from this new knowledge flow to all and not just to the few.’
NOTES TO EDITORS
1. More than 2,800 researchers who took part in the International Human Genome Sequencing Consortium share authorship on today’s Nature paper, which expands upon the group’s initial analysis published in February 2001. Even more detailed annotations and analyses have already been published for chromosomes 5, 6, 7, 9, 10, 13, 14, 19, 20, 21, 22 and Y. Publications describing the remaining 12 chromosomes are forthcoming.
2. The finished human genome sequence and its annotations can be accessed through the following public genome browsers:
• the Ensembl Genome Browser (www.ensembl.org) at the Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute;
• GenBank (www.ncbi.nih.gov/Genbank) at NIH's National Center for Biotechnology Information (NCBI);
• the UCSC Genome Browser (www.genome.ucsc.edu) at the University of California at Santa Cruz;
• EMBL-Bank (www.ebi.ac.uk/embl/index.html) at the EMBL-European Bioinformatics Institute;
• and the DNA Data Bank of Japan (www.ddbj.nih.ac.jp).
3. The International Human Genome Sequencing Consortium includes scientists at 20 institutions located in France, Germany, Japan, China, the United Kingdom and the United States.
4. The Wellcome Trust Sanger Institute, which receives the majority of its funding from the Wellcome Trust, was founded in 1992 as the focus for UK sequencing efforts. The Institute was responsible for the completion of the sequence of approximately one-third of the human genome as well as genomes of model organisms such as mouse and zebrafish, and more than 90 pathogen genomes. In October 2001, funding was awarded by the Wellcome Trust to support a new range of post-genomic programmes designed to understand the biological function of genes and their relevance to our health.
5. The Wellcome Trust is an independent research-funding charity, established under the will of Sir Henry Wellcome in 1936. It is funded from a private endowment which is managed with long-term stability and growth in mind. The Trust's mission is to foster and promote research with the aim of improving human and animal health.
Don Powell, Press Officer
Wellcome Trust Sanger Institute
Hinxton, Cambs, CB10 1SA, UK
Tel +44 (0)1223 494 956
Mobile +44 (0)7753 7753 97
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK