Know thy enemy

Pathogen genome sequencing

Recent years have seen a prodigious burst of activity in pathogen genome sequencing projects. The challenge now is to use this information to understand the pathogens and to develop new therapies.

Despite vast improvements in treatment since Sir Alexander Fleming’s discovery of penicillin, infectious diseases remain a major cause of illness and death in the developed world – and even more so in the developing world. A range of pathogens cause diseases as diverse as food poisoning (Campylobacter jejuni), meningitis (principally Neisseria meningitidis) and malaria (Plasmodium falciparum).

Over the millennia, pathogens have adapted very successfully to colonising their human hosts, their high rates of evolution enabling them to outpace human attempts at disease treatment and prevention. Bacteria, for example, are adept at acquiring resistance to antibiotics, and parasites are notorious for changing their surface structures to evade host immunity (antigenic variation). To identify chinks in the pathogens’ armour that may be strategically targeted in therapy, knowledge of the ‘enemy’ is clearly key. The technological advances that have facilitated large-scale sequencing of the human genome have also enabled researchers to tackle the genomes of our microbial and other pathogens. One of the world’s leading centres in this area is the Pathogen Sequencing Unit, part of the Wellcome Trust Sanger Centre at Hinxton, near Cambridge. Led by Bart Barrell, the Unit was established to sequence the genomes of important human and animal pathogens, mainly bacteria but also some single-celled parasites. The Pathogen Sequencing Unit receives funding from a variety of sources, including the Wellcome Trust’s Beowulf Genomics Initiative, which was set up in 1998 to provide support for the sequencing of pathogen genomes at the Wellcome Trust Sanger Centre.

Pathogens have been selected for sequencing on the basis of their impact on human and animal health, as well as the potential for subsequent research and exploitation of the sequence information by the relevant scientific community. Each project has been the result of close collaboration between the Pathogen Sequencing Unit and the scientific community (both nationally and internationally), from conception to completion. New sequence data are posted daily on the Sanger Centre’s website, in accordance with the Wellcome Trust’s policy on free and instant access to sequence information, to facilitate rapid translation into research and health benefits.

The Pathogen Sequencing Unit has been remarkably productive. It has been funded to sequence 22 bacterial genomes, and is working in partnership with other sequencing centres on a further two bacterial genomes and those of ten eukaryotic pathogens. Among those completed are the genomes of some of the most serious human pathogens, including Mycobacterium tuberculosis (TB), N. meningitidis, C. jejuni, Yersinia pestis (plague), Salmonella typhi (typhoid) and Mycobacterium leprae (leprosy).

From genes to biology

Already, findings from pathogen sequencing have thrown new light on evolutionary relationships between pathogen species, and shown how each has developed special adaptations suited to its own infectious lifestyle. In the longer term, an understanding of their genomes and biology will enable scientists to design means of disrupting these infectious lifestyles.

Genome sequence information from N. meningitidis, a major cause of epidemics of bacterial meningitis and septicaemia in sub-Saharan Africa, revealed a surprising abundance of repetitive DNA. This characteristic, and the ability of Neisseria to take up DNA from the environment and incorporate it into its genome, may enable it to shuffle the genes coding for its surface antigens. An important effect of this surface variation is the occasional, seemingly random conversion of a harmless bacterium – roughly one-third of the population carries N. meningitidis without suffering any ill effects – into an invasive form that multiplies rapidly in the bloodstream.

By contrast, sequence analysis of C. jejuni, a major cause of food poisoning in the UK, showed few repetitive sequences, but indicated strikingly high rates of sequence variation, which may enable the bacterium to vary its antigenic structures rapidly and hence evade host immunity. Its genome was also found to contain clusters of coordinately regulated genes coding for components of particular metabolic pathways. These are likely to be switched on in response to new environmental stimuli, and probably enable the bacterium to survive in a range of different environments.

Genomic comparisons can also provide useful insights. M. tuberculosis and M. leprae are closely related organisms, yet the former is responsible for TB whereas the latter causes leprosy. M. leprae can survive only in the human host, lodging within the peripheral nervous system. Interestingly, M. leprae has just 1600 functional genes compared with 4000 in M. tuberculosis, alongside a staggering 1100 or more nonfunctional pseudogenes. The conserved genes are, appropriately enough, those essential for the organism’s life in its highly restricted environment.

Exciting findings emerging from sequencing projects are shared at pathogen-specific workshops organised regularly by the Trust. At these residential workshops, scientists with significant research interests and expertise in particular pathogens come together to discuss the new genomic information, share ideas and insights, coordinate activities, and explore the new areas of research made possible by the sequence data.

As the genomes of many of the key microbial pathogens have been completed or are being sequenced, there is no longer a compelling need for a specific initiative such as Beowulf Genomics, which will wind up at the end of this year. Before then, Beowulf will support the sequencing of a number of further genomes, following a call for proposals held in 2000. In the future, the Trust will continue to welcome pathogen sequencing proposals, which will be considered by its Infection and Immunity Panel.

This year will see the completion and annotation of several more pathogen genomes by the Sanger Centre’s Pathogen Sequencing Unit. As with the pathogen genomes highlighted above, the sequence information and analysis by the Unit team will be the springboard for novel avenues of fundamental research into pathogen biology as well as the development of diagnostics, vaccines and drugs in the ongoing battle against human infectious diseases.

Insights and exploits

The free, rapid and unrestricted release of pathogen genome sequence data has enabled scientists all over the world to begin exploring the genetics and biology of these organisms. Many interesting discoveries are being made, as the examples below illustrate.

At the University of Oxford, Professor Chris Newbold has an ongoing interest in the pathogenesis of malaria. When a red blood cell is infected by P. falciparum, the causative agent of malaria, parasite proteins appear on the surface of the cell. These proteins cause the cell to stick to the sides of small blood vessels, particularly in the brain, causing the lethal brain lesions typical of cerebral malaria.

The Plasmodium sequence data generated so far have helped Professor Newbold gain a better understanding of the proteins involved in parasite adhesion. His group has cloned PfEMP1, a variant antigen belonging to the multigene family, var. Var proteins undergo antigenic variation – the form of the protein expressed by a parasite changes during an infection – helping the parasite evade host immune responses. In addition, Professor Newbold has discovered a second family of variant genes adjacent to var in the Plasmodium genome. The rif family also codes for variable proteins, the rifins, which are expressed on the surface of infected red blood cells and may also have an important role in host-parasite interactions. Both var proteins and the rifins are likely to be fruitful subjects for further studies, which should shed light on the pathological basis of malaria and on the development of immunity to the malaria parasite.

At the Pasteur Institute in Paris, Professor Stewart Cole was a key participant in the M. tuberculosis sequencing project carried out at the Wellcome Trust Sanger Centre. The genome of M. tuberculosis was found to contain approximately 4000 functional genes, 16 per cent of which code for proteins of unknown function. Genomic data revealed an unusual abundance of genes for fatty acid metabolism, encoding more than 250 proteins, probably associated with the complex lipid-rich cell envelope surrounding the bacterium. Some of these proteins may play a role in pathogenesis and host inflammatory responses. Remarkably, just under 10 per cent of the genome was found to encode only two novel families of glycine-rich proteins, which may be involved in antigenic variation.

The complete genome sequence enabled Professor Cole’s group to compare several Mycobacterium species, to try to identify genetic variation that could explain differences in virulence between species and strains. For example, the group discovered seven deleted regions in the chromosomes of the M. bovis BCG strain (the non-virulent strain used in vaccination programmes), indicating that genes within these regions may account for host specificity or virulence.

Wellcome Trust, Gibbs Building, 215 Euston Road, London NW1 2BE, UK T:+44 (0)20 7611 8888