Structural genomics
Thinking big in three dimensions
Sequencing the human genome has been compared to momentous achievements such as putting a man on the moon. Producing three-dimensional structures for all human proteins – structural genomics – is arguably an even bigger challenge.
The Human Genome Project has given us a read-out of more or less all the genes in the human genome. A major challenge in the ‘postgenomic’ era is to understand what these genes do. We already know that many genes are associated with disease, and it is likely that a large number will have subtle but significant roles in safeguarding health or in increasing susceptibility to illness. Biologically speaking, though, the genes are exerting their effects through the proteins that they encode, and most therapeutics are targeted at these protein products.
One of the most powerful ways to analyse a protein is to determine its structure. This can provide insight into the protein’s function and may offer an opportunity for rational drug design. So, just as we have the sequence of all human genes, can we produce structures for all human proteins? This approach – structural genomics – is an even greater challenge than sequencing the genome.
For more than 60 years, scientists have been firing X-rays at crystals of proteins in the quest to determine the three-dimensional make-up of large biological molecules. The results of these endeavours are more than 12 000 often beautiful structures – not just of proteins but also of DNA, RNA and other molecules. But fewer than 1000 of these structures are of human proteins, and only a few of the new structures produced each year are of human proteins. For tens of thousands of gene sequences – not only from the human genome but also from the genomes of the fruit fly and of microbes – we have no idea of the structure of the protein products, and too few known structures for computers to model them accurately.
Compared with the manipulation of DNA, almost all aspects of which have been automated, producing just one protein structure can be a laborious and expensive process. Purifying and crystallising the protein can take many months of work, and months more lie ahead as the researcher fires X-rays at the protein crystal and then analyses the results. Structural genomics – in particular producing structures for all of the 30 000 to 40 000 human proteins – presents a daunting challenge for the structural biology community, as daunting as that facing the DNA sequencers when the Human Genome Project was launched in 1990.
Yet the success of the Human Genome Project illustrated the power of industrial-scale methods. Careful preparation and a remarkable convergence in high technology – high-throughput capillary sequencers, robots, and powerful computers – drove a massive acceleration in DNA sequencing output. Similarly, the ingredients for a successful structural genomics programme are falling into place. The human genome sequence provides the raw resource, new synchrotrons have been and are being built to produce and collect structural data, robots are being designed to automate the process, and new computer software has been developed to interpret the data and produce a final structure.
At a meeting held at the Wellcome Trust Genome Campus in April 2000, structural biologists from all over the world agreed key principles for an International Structural Genomics Initiative. Through the large-scale determination and analysis of three-dimensional structures, and the development of methods for structural genomics, the Initiative aims to produce 10 000 structures in ten years, with all structures being released freely into the public domain.
To complement this public-sector initiative, the Wellcome Trust is discussing the possibility of setting up a not-for-profit organisation – the Structural Genomics Consortium – with a group of pharmaceutical and other companies. The model for the new consortium would be the highly successful SNP Consortium, formed in 1999 by the Trust and 12 companies, which has identified more than a million human single nucleotide polymorphisms (SNPs) in two years. Like the SNP Consortium, the Structural Genomics Consortium would be a focused, high-throughput project, releasing all structures freely onto the Internet.
But there are some significant scientific differences between the proposed consortium and the SNP Consortium. Producing a SNP map depends on DNA sequencing, and existing high-throughput sequencing centres, such as the Wellcome Trust Sanger Centre, were able to increase their output relatively easily. By contrast, although the underlying technology exists, there are no high-throughput protein production centres at present, so the consortium would look to fund complementary facilities to produce, purify and crystallise thousands of proteins – the key bottlenecks at present in scaling-up structure determination.
Although the ultimate goal of structural genomics would be to determine accurate three-dimensional structures for the protein products of all genes, choosing which of the 30 000 to 40 000 human proteins to work on first will be tricky. The products of genes implicated in human disease are obvious candidates, but the drive will be to determine representative structures for all the different protein families and for every distinct protein fold (current estimates suggest that there may be only 1000 distinct spatial arrangements of polypeptides – ‘folds’ – found in nature). With a database of such representative structures in hand, it may be feasible for computers to model the product of any gene sequence, and to predict the effects of variation in this sequence on the structure of the protein.
See also
- Structural genomics
- SNP consortium: An overview of the collaboration
- Human Genome Project
External links
- The SNP consortium website
- National Institute of General Medical Sciences: Structural Genomics Initiatives
- Wellcome Trust Genome Campus
- Diamond: The new synchrotron light source to be built at Rutherford Appleton Laboratory



