Structural genomics

Thinking big in three dimensions

Sequencing the human genome has been compared to momentous achievements such as putting a man on the moon. Producing three-dimensional structures for all human proteins – structural genomics – is arguably an even bigger challenge.

The Human Genome Project has given us a read-out of more or less all the genes in the human genome. A major challenge in the ‘postgenomic’ era is to understand what these genes do. We already know that many genes are associated with disease, and it is likely that a large number will have subtle but significant roles in safeguarding health or in increasing susceptibility to illness. Biologically speaking, though, the genes are exerting their effects through the proteins that they encode, and most therapeutics are targeted at these protein products.

One of the most powerful ways to analyse a protein is to determine its structure. This can provide insight into the protein’s function and may offer an opportunity for rational drug design. So, just as we have the sequence of all human genes, can we produce structures for all human proteins? This approach – structural genomics – is an even greater challenge than sequencing the genome.

For more than 60 years, scientists have been firing X-rays at crystals of proteins in the quest to determine the three-dimensional make-up of large biological molecules. The results of these endeavours are more than 12 000 often beautiful structures – not just of proteins but DNA, RNA and other molecules. But fewer than 1000 of these structures are of human proteins, and only a few of the new structures produced each year are of human proteins. For tens of thousands of gene sequences – not only from the human genome but also from the genomes of the fruitfly and of microbes – we have no idea of the structure of the protein products and insufficient examples to use computers to model them accurately.

Compared with the manipulation of DNA, almost all aspects of which have been automated, producing just one protein structure can be a laborious and expensive process. Purifying and crystallising the protein can take many months of work, and months more lie ahead as the researcher fires X-rays at the protein crystal and then analyses the results. Structural genomics – in particular producing structures for all the 30–40 000 human proteins – presents a daunting challenge for the structural biology community, as daunting as that facing the DNA sequencers when the Human Genome Project was launched in 1990.

Yet the success of the Human Genome Project illustrated the power of industrial-scale methods. Careful preparation and a remarkable convergence in high technology – high-throughput capillary sequencers, robots, and powerful computers – drove a massive acceleration in DNA sequencing output. Similarly, the ingredients for a successful structural genomics programme are falling into place. The human genome sequence provides the raw resource, new synchrotrons have been and are being built to produce and collect structural data, robots are being designed to automate the process, and new computer software has been developed to interpret the data and produce a final structure.

At a meeting held at the Wellcome Trust Genome Campus in April 2000, structural biologists from all over the world agreed key principles for an International Structural Genomics Initiative. Through the large-scale determination and analysis of three-dimensional structures, and the development of methods for structural genomics, the Initiative aims to produce 10 000 structures in ten years, with all structures being released freely into the public domain.

To complement this public-sector initiative, the Wellcome Trust is discussing the possibility of setting up a not-for-profit organisation – the Structural Genomics Consortium – with a group of pharmaceutical and other companies. The model for the new consortium would be the highly successful SNP Consortium, formed in 1999 by the Trust and 12 companies, which has identified more than a million human single nucleotide polymorphisms (SNPs) in two years. Like the SNP Consortium, the Structural Genomics Consortium would be a focused, high-throughput project, releasing all structures freely onto the Internet.

But there are some significant scientific differences between the proposed consortium and the SNP Consortium. Producing a SNP map depends on DNA sequencing, and existing high-throughput sequencing centres, such as the Wellcome Trust Sanger Centre, were able to increase their output relatively easily. By contrast, although the underlying technology exists, there are no high-throughput protein production centres at present, so the consortium would look to fund complementary facilities to produce, purify and crystallise thousands of proteins – the key bottlenecks at present in scaling-up structure determination.

Although the ultimate goal of structural genomics would be to determine accurate three-dimensional structures for the protein products of all genes, choosing which of the 30–40 000 human proteins to work on first will be tricky. The products of genes implicated in human disease are obvious candidates, but the drive will be to determine representative structures for all the different protein families and for every distinct protein fold (current estimates suggest that there may be only 1000 distinct spatial arrangements of polypeptides – ‘folds’ – found in nature). With these databases in hand, it may be feasible for computers to model the product of any gene sequence, and to predict the effects of variation in this sequence on the structure of the protein.

The unofficial guide to structural genomics
So what is structural genomics?
The determination of tens of thousands of structures of the proteins encoded by the human genome, or the genome of any organism to be more precise.
Sounds a worthy goal. But why would you want all these protein structures?
Other biological experiments can tell you something about what a particular protein does, but its structure can give you lots of information about how it does it.
If you know the gene sequence, and the amino acid sequence in the protein, why get the structure?
The string of amino acids folds up into a fiendishly complex arrangement, with twists, turns, loops and coils, which is very difficult to predict. The final shape the protein takes – its 3D structure – determines how it works in the cell.
So how do you get a structure?
Well, the first step is to get some very pure protein. After that, you can go a few different paths to get your structure, but the most common route is to grow some crystals.
Crystals? Like sugar crystals?
Yes and no. Salt or sugar crystals are durable and hard, but protein crystals are like fragile cubes of jelly. Growing crystals is something of a black art. Some scientists are well known for having ‘green fingers’ at growing crystals while others can spend years trying to crystallise one protein. Just like some people have a houseful of healthy plants, while mine rarely last a week.
OK. So your plants have died but you’ve grown some crystals. What’s next?
You put the crystal in front of a beam of X-rays. The atoms in the protein crystal scatter the radiation into thousands of rays; the resulting X-ray image is then fed into powerful computers, which work out ‘electron density maps’ that could account for the observed scattering.
And then you have the structure?
Not yet. Now you have to sit in front of computers for days on end to interpret the electron density maps in terms of the amino acid sequence of the protein. Using a lot of brain and computer power, you end up with…wait for it…your protein’s structure.
Is there any way to speed this up?
To an extent. If you take your protein crystals to a synchrotron, which generates very powerful X-rays, you can get vast amounts of data very quickly. But it still takes time to purify and crystallise proteins, so researchers doing structural genomics will be looking to use industrial-scale methods to speed up the process.
What happens if you can’t crystallise your protein?
All is not lost. Although X-ray crystallography is probably the best known method of getting a structure, you could use NMR spectroscopy or cryo-electron microscopy. But that’s another story…
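
For readers who like to see the idea in code, the step from scattered X-rays to an ‘electron density map’ can be sketched as a toy calculation. This is only an illustration under loud assumptions, not how real crystallography software works: the ‘crystal’ is one-dimensional, the three atoms and their positions are invented, and the function names (`structure_factors`, `electron_density`, `find_peaks`) are hypothetical. It also keeps the phase of each scattered wave, which a real experiment does not record directly; recovering the phases is a large part of the work in practice.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Toy 'scattering': one complex number per reflection index h.

    atoms is a list of (fractional position in [0, 1), electron count).
    """
    return {
        h: sum(z * cmath.exp(2j * math.pi * h * x) for x, z in atoms)
        for h in range(-h_max, h_max + 1)
    }


def electron_density(factors, n_points):
    """Fourier synthesis: add the waves back up on a grid across the cell."""
    density = []
    for k in range(n_points):
        x = k / n_points
        rho = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items())
        density.append(rho.real)  # imaginary parts cancel for +h / -h pairs
    return density


def find_peaks(density, fraction=0.5):
    """Grid positions that are local maxima above a fraction of the top peak."""
    n = len(density)
    cutoff = fraction * max(density)
    return [
        k / n
        for k in range(n)
        if density[k] > cutoff
        and density[k] > density[k - 1]
        and density[k] > density[(k + 1) % n]
    ]


# Hypothetical atoms: invented positions, with the electron counts of
# carbon (6), oxygen (8) and nitrogen (7).
atoms = [(0.20, 6), (0.55, 8), (0.80, 7)]

factors = structure_factors(atoms, h_max=20)
density = electron_density(factors, n_points=100)
peaks = find_peaks(density)
print(peaks)
```

The peaks in the reconstructed map sit where the invented atoms were placed, with the tallest peak at the atom carrying the most electrons. Interpreting a real map in three dimensions, for thousands of atoms, is what keeps researchers in front of their computers for days.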
