Understanding the genome

The Wellcome Trust Sanger Institute has announced its new five-year £300 million research programme. Under the leadership of Dr Allan Bradley, the Institute will launch a raft of new initiatives to discover the role of genes in human health and disease.

As it nears its tenth anniversary, the Sanger Institute is firmly established on the international stage. One of the largest genome centres in the world, the Sanger has made major contributions to the genome sequencing of a host of organisms - not only humans, the most difficult task of all, but also yeast, numerous pathogens and the nematode worm, to name just a few.

While genome sequences are clearly an essential bedrock for research on the biology of organisms - whether yeast or human - the challenge facing the Sanger, and indeed researchers worldwide, is to find out how the genetic instruction manuals are read.

To take on this challenge, the Wellcome Trust is funding a five-year £300 million research programme at the Sanger Institute. The Institute will build and expand on its existing strengths, launching new projects to uncover the function of the genes in the human genome.

Present strengths...

The Sanger’s formidable expertise in genome sequencing, robotics and computing, which has driven its success in the Human Genome Project, is the foundation of the new research programme. "We’re responsible for the sequencing of eight of the 24 human chromosomes," says Dr Jane Rogers, Head of Sequencing. "Our speciality is in ‘finished’ sequence, which has an accuracy of 99.99 per cent, an error rate of less than 1 in 10 000 bases. We’re the largest contributor of finished sequence to the international project, ahead of schedule, and we have new robots coming online that will increase our output even further," she adds.

Genome data pour from the Sanger’s DNA sequencers 24 hours a day, seven days a week: every day at the Sanger, about 100 000 sequence reactions are run, generating about 50 million bases of raw sequence data. These data must be stored and analysed by the Sanger’s computing systems, which hold a staggering 20 000 gigabytes of information. "The rate of sequencing output is increasing about fourfold every year, and all these data must be collected, organised and managed," says Dr Richard Durbin, Head of Informatics. "Then we analyse the data, link them to other data, and present them to the end user - through websites such as Ensembl and Pfam - so that researchers can find information as simply as possible."
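As a rough check on these figures, the arithmetic can be sketched in a few lines of Python. The daily numbers come from the text above; the assumption that output keeps quadrupling every year simply extrapolates Dr Durbin's estimate, and the function names are illustrative only.

```python
# Figures quoted in the article (approximate).
READS_PER_DAY = 100_000          # sequence reactions run each day
RAW_BASES_PER_DAY = 50_000_000   # bases of raw sequence generated each day
YEARLY_GROWTH = 4                # "about fourfold every year" (assumed constant)


def bases_per_read(raw_bases: int, reads: int) -> float:
    """Average raw bases produced by one sequence reaction."""
    return raw_bases / reads


def projected_daily_bases(raw_bases: int, growth: int, years: int) -> int:
    """Daily raw output after `years`, if growth stayed constant (an assumption)."""
    return raw_bases * growth ** years


if __name__ == "__main__":
    print(bases_per_read(RAW_BASES_PER_DAY, READS_PER_DAY))
    for year in range(4):
        print(year, projected_daily_bases(RAW_BASES_PER_DAY, YEARLY_GROWTH, year))
```

On these numbers each reaction yields about 500 bases of raw sequence, and three years of sustained fourfold growth would multiply daily output 64-fold.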

...and future challenges

The value of such automation and informatics will become even more apparent over the next five years, as the Sanger broadens and diversifies its research projects, adding new technologies that will provide essential information about what the genome does and how it works. Many of the new projects are ‘big science’, looking at, for example, the expression pattern of every gene in the genome in all tissues of the body.

"For the human genome project, the task of decoding all the As, Cs, Ts and Gs is colossal yet straightforward," says Dr Bradley. "But to understand the functions of genes and proteins, there is no immediately obvious, single bottleneck to be breached, nor is there just one technology that could be used. We need a variety of approaches that all feed in and build on what we know already about the genome. The human genome has about 30 000 genes, so it’s a huge task to find out what they all do."

1. Genome juggernaut
The human genome
Completion of the human genome sequence is the primary goal of the Sanger’s 250-strong sequencing team. The International Human Genome Sequencing Consortium has set a completion date of April 2003 (the 50th anniversary of Watson and Crick’s publication of the structure of DNA). The Sanger Institute is responsible for the completion of chromosomes 1, 6, 9, 10, 13, 20 and X (having already completed chromosome 22), and leads the world in output of finished sequence (about 40 Mb a month).
Other organisms
The genomes of other organisms provide resources for those working on the biology of those organisms, and are invaluable for comparative analysis, which helps in the identification of genes and in studies of genome organization, manipulation and analysis. The Sanger Institute has three major programmes of sequencing:
Mouse: The Sanger has already participated in the Mouse Sequencing Consortium, which has produced a draft sequence. A new project will now go on to finish the sequence of about 20 per cent of the mouse genome, focusing on regions central to genetic studies at the Institute.
Zebrafish: The zebrafish genome provides a powerful tool to interpret the human genome, and the Sanger began sequencing earlier this year; sequencing of the 1.7 Gb zebrafish genome is expected to be complete by 2005.
Pathogens: The Pathogen Sequencing Unit of the Sanger Institute is decoding the genomes of some of the world’s major killers.

2. Deciphering the code
The sequencing and assembly of the human genome, and its presentation to the world for all to see, has been entirely dependent upon bioinformatics. Sophisticated software and databases keep the systems organized and collect the data, identify genes and other sequence features, and compare the human genome to the genomes of other organisms.
As well as continued enhancement of the core bioinformatics at the Sanger, the next five years will see the further development of:
Ensembl: The genome viewer developed jointly by the Sanger and the European Bioinformatics Institute (EBI) to provide a public view of the annotated human genome via the Internet. Ensembl presents free, high-quality data, as well as excellent software tools for researchers worldwide to use.
Pfam: The pre-eminent protein domain family database.

3. Where and when?
A major new programme of work at the Sanger will define the expression pattern of every gene in the genome, to show when and where every gene is turned on and off in tissues and cells. This ambitious goal is dependent on large-scale, high-throughput systems - the Sanger’s speciality.
Display of the proteome: Discover where all human and mouse genes are expressed as proteins in all normal tissues. This study will use antibodies specific to each protein, requiring an estimated 3 million antibodies.
Molecular atlas of the cell: Discover where, within a cell, all the genome’s proteins reside normally.
Development: Discover when and where genes are turned on and off during development.

4. Genetic variation and disease
Genes and genetic variations underlie most human diseases and traits. The emergence of a complete sequence of the human genome, annotated with all the genes, will provide the basis to examine both the extent of sequence variation in human populations, and how specific variants contribute to disease. Identifying such disease-related variants will enable improved diagnosis and the development of new, more targeted treatments.
Projects include:
SNP mapping: Identifying common variant single nucleotide polymorphisms (SNPs; single letter variations) in all exons (coding regions) of the genome. The existing SNP map includes 1.42 million SNPs, distributed randomly through the genome. To ensure that almost all common SNPs in exons are identified, the new project will re-sequence all exons in the DNA of 32 unrelated individuals.
Haplotype mapping: Haplotypes are ancestral sections of chromosomes that contain multiple SNPs and are inherited together in blocks. Rather than sifting through millions of SNPs to find disease genes, scientists with a knowledge of haplotypes may be able to narrow their search significantly.
Association studies: Studying the association between common genetic variants and haplotypes and common disease.
Cancer Genome Project: Determining the role of mutations in cancer.
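
The idea behind haplotype mapping in the list above can be illustrated with a toy sketch. The data below are invented for illustration, and the grouping rule (SNPs whose alleles always travel together across the sampled chromosomes are placed in one group) is a deliberate simplification, not the method used at the Sanger Institute.

```python
from collections import defaultdict

# Hypothetical data: each row is one sampled chromosome, each column one SNP.
# 0 and 1 stand for the two alleles at that SNP.
CHROMOSOMES = [
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
]


def group_linked_snps(chromosomes):
    """Group SNP columns that carry identical or exactly mirrored patterns.

    SNPs in one group are inherited together in this sample, so typing a
    single 'tag' SNP from the group predicts the others.
    """
    groups = defaultdict(list)
    n_snps = len(chromosomes[0])
    for snp in range(n_snps):
        pattern = tuple(row[snp] for row in chromosomes)
        flipped = tuple(1 - allele for allele in pattern)
        # Treat a pattern and its mirror image as the same group.
        key = min(pattern, flipped)
        groups[key].append(snp)
    return list(groups.values())


if __name__ == "__main__":
    print(group_linked_snps(CHROMOSOMES))
```

In this invented sample the five SNPs collapse into two groups, so two tag SNPs would capture all the variation, which is the sense in which haplotypes narrow the search.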

5. Genes in action
Biological research worldwide is driven by studies of model organisms, whether fruit flies, mice, fish, yeast or frogs. With the arrival of genome sequences, these organisms have become, and will continue to become, even more powerful tools, bringing an understanding of the function of genes as they operate within living systems.
Over the next five years, the Sanger Institute will begin studying mouse models of disease, complementing work in other centres in the UK and elsewhere. The Sanger also has research groups studying the nematode worm Caenorhabditis elegans and the fission yeast Schizosaccharomyces pombe.

6. New investigators
Over the next five years, the Sanger Institute will recruit 20 independent investigators, from junior fellows to principal investigators, who will use the Sanger’s technology platforms in their research.
The researchers will bring new expertise to the Sanger, and their research on more specific areas of biology will provide detailed genome annotation not achievable by high-throughput platforms alone.
Over the last nine years, the Sanger has trained 37 PhDs and, over the last 13 years, 45 postdocs - most of whom are now in excellent positions in many of the top institutions of the world. Over the next few years, the Sanger Institute will expand its career development mission, training the next generation of genomic scientists.
