Understanding the genome
The Wellcome Trust Sanger Institute has announced its new five-year £300 million research programme. Under the leadership of Dr Allan Bradley, the Institute will launch a raft of new initiatives to discover the role of genes in human health and disease.
As it nears its tenth anniversary, the Sanger Institute is firmly established on the international stage. One of the largest genome centres in the world, the Sanger has made major contributions to the genome sequencing of a host of organisms - not only of humans, the most difficult task of all, but also yeast, numerous pathogens and the nematode worm, to name just a few.
While genome sequences are clearly an essential bedrock to research on the biology of organisms - whether yeast or human - the challenge facing the Sanger, and indeed researchers worldwide, is to find out how the genetic instruction manuals are read.
To take on this challenge, the Wellcome Trust is funding a five-year £300 million research programme at the Sanger Institute. The Institute will build and expand on its existing strengths, launching new projects to uncover the function of the genes in the human genome.
Present strengths...
The Sanger’s formidable expertise in genome sequencing, robotics and computing, which has driven its success in the Human Genome Project, is the foundation of the new research programme. "We’re responsible for the sequencing of eight of the 24 human chromosomes," says Dr Jane Rogers, Head of Sequencing. "Our speciality is in ‘finished’ sequence, which has an accuracy of 99.99 per cent, an error rate of less than 1 in 10 000 bases. We’re the largest contributor of finished sequence to the international project, ahead of schedule, and we have new robots coming online that will increase our output even further," she adds.
Genome data pour from the Sanger’s DNA sequencers 24 hours a day, seven days a week: every day at the Sanger, about 100 000 sequence reactions are run, generating about 50 million bases of raw sequence data. These data must be stored and analysed by the Sanger’s computing systems, which hold a staggering 20 000 gigabytes of information. "The rate of sequencing output is increasing about fourfold every year, and all these data must be collected, organised and managed," says Dr Richard Durbin, Head of Informatics. "Then we analyse the data, link them to other data, and present them to the end user - through websites such as Ensembl and Pfam - so that researchers can find information as simply as possible."
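To put the figures above in proportion, the short Python sketch below works out the average length of a sequence read implied by the quoted daily totals, and what a fourfold annual increase would mean for daily output. It is a back-of-envelope illustration only: the function names are hypothetical, and the projection assumes the "about fourfold" rate holds steady from year to year, which the article does not claim for any particular period.

```python
# Figures quoted in the article for one day of sequencing at the Sanger.
READS_PER_DAY = 100_000          # "about 100 000 sequence reactions"
RAW_BASES_PER_DAY = 50_000_000   # "about 50 million bases of raw sequence data"
ANNUAL_GROWTH_FACTOR = 4         # "increasing about fourfold every year"


def average_read_length(bases, reads):
    """Average number of raw bases produced per sequence reaction."""
    return bases / reads


def projected_daily_bases(bases_now, growth, years):
    """Daily output after `years` years.

    Assumption (not from the article): the growth factor stays constant.
    """
    return bases_now * growth ** years


if __name__ == "__main__":
    length = average_read_length(RAW_BASES_PER_DAY, READS_PER_DAY)
    print(f"Average read length: {length:.0f} bases")
    for year in range(3):
        daily = projected_daily_bases(RAW_BASES_PER_DAY, ANNUAL_GROWTH_FACTOR, year)
        print(f"Year {year}: about {daily / 1e6:.0f} million raw bases a day")
```

On the quoted figures, each reaction yields roughly 500 bases of raw sequence, which is why so many reactions, and so much computing, are needed to assemble chromosomes that run to many millions of bases.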
...and future challenges
The value of such automation and informatics will become even more apparent over the next five years, as the Sanger broadens and diversifies its research projects, adding new technologies that will provide essential information about what the genome does and how it works. Many of the new projects are ‘big science’, looking at, for example, the expression pattern of every gene in the genome in all tissues of the body.
"For the human genome project, the task of decoding all the As, Cs, Ts and Gs is colossal yet straightforward," says Dr Bradley. "But to understand the functions of genes and proteins, there is no immediately obvious, single bottleneck to be breached, nor is there just one technology that could be used. We need a variety of approaches that all feed in and build on what we know already about the genome. The human genome has about 30 000 genes, so it’s a huge task to find out what they all do."
Completion of the human genome sequence is the primary goal of the Sanger’s 250-strong sequencing team. The International Human Genome Sequencing Consortium has set a completion date of April 2003 (the 50th anniversary of Watson and Crick’s publication of the structure of DNA). The Sanger Institute is responsible for the completion of chromosomes 1, 6, 9, 10, 13, 20 and X (having already completed chromosome 22), and leads the world in output of finished sequence (about 40 Mb a month).
The genomes of other organisms provide resources for those working on the biology of those organisms, and are invaluable for comparative analysis, which helps in the identification of genes and in studies of genome organisation, manipulation and analysis. The Sanger Institute has three major programmes of sequencing:
2. Deciphering the code
3. Where and when?
4. Genetic variation and disease
SNP mapping: Identifying common variant single nucleotide polymorphisms (SNPs; single letter variations) in all exons (coding regions) of the genome. The existing SNP map includes 1.42 million SNPs, distributed randomly through the genome. To ensure that almost all common SNPs in exons are identified, the new project will re-sequence all exons in the DNA of 32 unrelated individuals.
5. Genes in action
6. New investigators
See also
- The Human Genome Project: Information on the project and the key issues
- Super models: Article on key model organisms
- Tackling pathogens: Article on the pathogen sequencing unit
- Cancer quest: Article on the cancer genome project

