Protein power
Deciphering the genome with proteomics
Proteins are the new genes and proteomics is the new genomics: after years out of the spotlight, proteins are back in fashion.
The amount of information uncovered by the Human Genome Project is staggering. Yet in many ways this information is little more than the groundwork for further biological exploration. This is in no way to belittle the achievement, which has rightly been compared to other seminal breakthroughs in human history, but it is only a step towards answering the real question – how do the genes in the genome work together in a human being?
With all the excitement generated by genetics and genome sequencing, it is easy to forget that the primary purpose of most genes is to code for proteins, the molecules in the body that do the major work in building and controlling our cells and tissues. Indeed, proteins had been studied for decades before the DNA revolution took off. Fred Sanger, for instance, won his first Nobel Prize for sequencing a protein (insulin) in 1958 and his second for DNA sequencing techniques in 1980.
Since then, DNA-based studies have been in the ascendancy. Nevertheless, protein science never went away, and it is worth remembering that proteins are still the targets of nearly all our drugs. Now, however, proteins are returning to pre-eminence – and in a new postgenomic guise: the 'proteome', the set of all proteins the human genome can produce. The tools of this new trade are proteomics – high-throughput technologies for the large-scale, rapid analysis of proteins.
At the simplest level, proteomics can determine which proteins are present in a cell or tissue. But this only scratches the surface of what proteomics can do – and will be able to do as the technology improves – to answer the questions that biologists ask about proteins: where are the proteins in the cell, how are they modified, which proteins interact with each other, what do the proteins actually do and how can we modulate their actions therapeutically?
A plethora of proteins
DNA is stable and easily isolated, and nature has provided a host of tools that enable us to manipulate and modify it – particularly restriction enzymes to cut it and polymerases to make copies of it. It also has the invaluable property of complementarity – a strand of DNA will bind to a nucleic acid strand of complementary sequence. By contrast, working with proteins can be technically more demanding: there is no complementarity to exploit, and proteins are chemically complex, often fragile and difficult to isolate.
Many lines of research can identify genes implicated in particular biological processes. Unfortunately, the route from an identified gene to a functional protein can be a tortuous one. For a start, a gene may be switched on, but its messenger RNA is not necessarily translated into protein. In addition, genes often produce more than one type of protein – for example, by alternative splicing of messenger RNA – and these proteins may have different functions. Proteins are also commonly modified after they have been constructed: pieces of the protein may be cleaved off, or other molecules, such as lipids and sugars, can be added – in fact, more than 100 different types of modification have been described. With all these possibilities, it has been estimated that the human proteome is at least an order of magnitude more complex than the human genome.
Name that protein
Notwithstanding these problems, protein researchers have made great strides in designing high-throughput methods for separating and identifying molecules of interest – greatly aided by DNA sequence databases, as a small amount of protein sequence information can be used to fish out the complete corresponding DNA sequence. At present, the most widely used proteomic approaches marry two 'classic' techniques – two-dimensional gel electrophoresis, used for many years by biologists to separate proteins, and mass spectrometry, used by chemists to identify molecules according to their mass.
Two-dimensional gel electrophoresis, so called because the proteins are separated first by electric charge and then in a perpendicular direction by mass, produces a gel with spots corresponding to individual proteins. The spots can then be cut out of the gel and the proteins digested into shorter peptides by enzymes such as trypsin. The peptide fragments are fed into a mass spectrometer, and their masses are determined by the inconveniently named technique of matrix-assisted laser desorption/ionisation and time-of-flight mass analysis (more conveniently known by its acronym 'MALDI-tof', a staple of proteomics jargon). The resulting 'peptide-mass fingerprint' is then compared with the masses predicted from genetic or protein sequence information, which with luck will identify the protein being examined.
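The matching step in peptide-mass fingerprinting can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any particular search engine. It assumes the common rule that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P), it compares neutral monoisotopic masses rather than the ion masses an instrument reports, and it ignores missed cleavages and modifications. The two database sequences and all function names are hypothetical.

```python
# Monoisotopic masses (daltons) of amino acid residues, plus one water
# molecule that is added for the free ends of each peptide.
WATER = 18.01056
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tryptic_digest(sequence):
    """Cut after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def fingerprint(sequence, min_length=4):
    """Predicted peptide masses for a protein; very short peptides are skipped."""
    return [peptide_mass(p) for p in tryptic_digest(sequence)
            if len(p) >= min_length]


def count_matches(observed, predicted, tolerance=0.2):
    """Number of observed masses that lie within tolerance of a predicted mass."""
    return sum(any(abs(o - p) <= tolerance for p in predicted)
               for o in observed)


def best_match(observed, database, tolerance=0.2):
    """Rank database entries by how many observed masses they explain."""
    scores = {name: count_matches(observed, fingerprint(seq), tolerance)
              for name, seq in database.items()}
    return max(scores, key=scores.get), scores


# Hypothetical sequences standing in for a real sequence database.
DATABASE = {
    "toy_protein_A": "MKWVTFISLLKAGGDEYRPLLSAVKNQWER",
    "toy_protein_B": "GSHMLEDPRTTYYAAKLLGGFDEWKCCVNNR",
}

# Simulate a measurement of toy_protein_A with a small mass error.
observed_masses = [m + 0.05 for m in fingerprint(DATABASE["toy_protein_A"])]
name, scores = best_match(observed_masses, DATABASE)
print(name, scores)
```

Real search tools score matches statistically rather than by a simple count, but the principle is the same: the protein whose predicted peptides explain the most observed masses is the likeliest identity of the spot.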
If MALDI-tof does not provide the answer, the same peptides can be sprayed into a so-called quadrupole orthogonal time-of-flight (Q-tof) mass spectrometer where the peptide ions are fragmented. The analysis of these new masses provides some partial amino acid sequence from the peptides – normally sufficient to identify a protein unambiguously.
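How fragment masses yield partial sequence can be shown with a small Python sketch. It assumes an idealised 'ladder' of fragment ions of a single type, each one residue longer than the last, so that the gap between neighbouring masses equals the mass of one amino acid residue. Real spectra are noisier, mix several ion types, and contain gaps. The function name, tolerance and ladder values are hypothetical.

```python
# Monoisotopic residue masses in daltons. Leucine and isoleucine have the
# same mass, so they share one entry and cannot be told apart this way.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_sequence_tag(ladder, tolerance=0.015):
    """Translate gaps between consecutive fragment masses into residues.

    Returns one entry per gap; "?" marks a gap that matches no single
    residue within the tolerance.
    """
    ordered = sorted(ladder)
    tag = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        candidates = [(abs(delta - mass), name)
                      for name, mass in RESIDUE_MASS.items()
                      if abs(delta - mass) <= tolerance]
        # Keep the closest residue mass, or flag the gap as unexplained.
        tag.append(min(candidates)[1] if candidates else "?")
    return tag


# Hypothetical ladder: an arbitrary starting mass followed by fragments
# extended by G, A, S and K in turn.
ladder = [100.0, 157.02146, 228.05857, 315.0906, 443.18556]
print(read_sequence_tag(ladder))  # prints ['G', 'A', 'S', 'K']
```

A short tag of this kind, combined with the mass of the intact peptide, is the sort of partial sequence that can then be searched against a sequence database.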
Proteomics has already proved extremely useful in the identification of proteins that work together in large complexes. MALDI, for example, was used to identify very rapidly the components of the exosome, a complex of 11 proteins that trims and processes RNA. Another approach is to use proteomics to look at the proteins produced in cells or tissues under two different conditions, to identify proteins likely to have important functional roles in the cells’ response to those conditions. Or the cells or tissues may be in two different states – such as immature or mature immune cells – and proteomics can be used to identify proteins that drive this process of maturation. These comparative approaches parallel those of microarray experiments, which examine whether genes are turned on or off in different conditions.
Not content with the technologies at hand, academics and industrialists are striving to improve the throughput of proteomics. Robots have been developed that can 'pick' protein spots from a gel and transfer them directly to a mass spectrometer, and capillary and liquid-chromatography methods are being examined to see if they can be used to bypass the gel-based separation stage altogether. Meanwhile, other researchers and companies are adapting microarray-type approaches to develop 'protein chips' – tiny spots of hundreds or thousands of proteins arranged on microscope slides or small multiwell plates. Although still at an early stage of development, these kinds of devices open up exciting new opportunities for exploring the biochemical properties of hundreds or thousands of proteins simultaneously.
Cross-talk
As well as knowing which proteins are actually present in a cell, it is useful to know which proteins interact with one another. Mapping these interactions can help to identify the components of a particular enzymatic pathway, for example, which again tells us more about how the proteins cooperate within a living cell. Moreover, this kind of research can uncover new therapeutic targets – proteins in the same biochemical pathway as an existing pharmacological target.
Much effort is being expended to scale up the so-called yeast two-hybrid system, used for many years by molecular biologists to study protein–protein interactions. This system relies on a yeast protein known as GAL4, a well-characterised activator of transcription. The gene for GAL4 has been split into two, and each half can be fused to a protein-encoding gene. If, once expressed in the yeast cell, the two fusion proteins bind to one another, the two domains of the GAL4 protein will be brought together and the complex will be able to activate transcription of a reporter gene (which can be easily assayed).
A further development of this is 'reverse' two-hybrid, where compounds can be tested to see if they can disrupt protein–protein interactions – a strategy that may be extremely useful for the development of new drugs.
Although proteomics is a new, 'sexy' science, it is worth remembering that thousands of proteins have already been carefully characterised – often in pathways fundamental to the function of a cell. Yet this wealth of information is thought to encompass only a small percentage of the proteins in the body. The flood of information from proteomics enriches our understanding of what proteins are produced by cells and how the proteins interact. In the future, researchers will be able to pull all the information together to make a map of the physiological interconnections of the proteins within a cell – helping us to understand how drugs work and how diseases develop. And with further innovations in computing, we may even be able to gaze into a 'virtual cell' and watch the proteins at work.
See also
- Human Genome Project
- Functional Genomics Development Initiative
- A biological dream team: Article describing multidisciplinary research in functional genomics
- Protein power: Article describing the proteomics work led by Professors David Tollervey, Colin Watts and Mike Ferguson



