Feature: Big is best?
13 December 2006. By Ian Jones.

Well over a thousand genes underlying rare forms of inherited disease have been identified, but those underlying common causes of ill health - such as heart disease, cancer and mental illness - have proved much more elusive. The prediction is that for common disease, many genes will have small effects, and will be influenced by other genes and environmental factors. Small wonder that, despite many years' toil, the number of genes definitely known to be involved in common diseases is small - probably fewer than a dozen.
So why bother identifying these genes? Ultimately, many benefits may accrue. For example, they may give insight into the mechanisms of disease and identify potential targets for new therapeutics. They may also allow more precise diagnosis of disease and hence more targeted treatment. And they may allow early identification of people at risk.
Tracking genes
How can these genes be tracked down? Although a number of different approaches are used, they share a similar principle: identification of a genetic variant that is consistently more common in people with a disease than in those without it.
In fact, it may be misleading to think in terms of a 'disease gene'. Much medically important genetic variation stems from changes in gene regulation – a gene may be active in the wrong place, or at the wrong time, or at the wrong level. In practice, researchers are looking for a biological feature that is reproducibly associated with a disease – a 'biomarker'.
One well-used approach involves twin studies. If one twin is affected and the other is not, genetic differences between the twins could be contributing to disease.
A second method is to use family studies. If a large family group affected by a disease exists, researchers can hunt for genetic markers showing the same inheritance patterns as the disease.
A related approach is to look at genetically isolated populations – those that have arisen from a small number of founders, so any diseases may have the same genetic cause.
By contrast, longitudinal or cohort studies focus not on specific subsections of a population but on a sample of the population as a whole. Unlike families or isolated populations, there is no guarantee that one person's heart disease, for example, has any relation to someone else's, so such studies need large sample sizes.
Finally, a case-control approach can be adopted. Here, the genetic inheritance of people with a particular disease is compared with that of 'controls' – similar people without the disease. Again, if enough comparisons can be made, key genetic factors should begin to emerge from the statistical background noise of chance associations.
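To make the case-control comparison concrete, the sketch below shows one common way such an association might be quantified: an odds ratio and a Pearson chi-square test on a two-by-two table of variant carriers among cases and controls. The function name and the counts are purely illustrative assumptions, not figures from any study mentioned here, and real analyses would also need to correct for the very large number of variants tested.

```python
import math


def case_control_association(cases_with, cases_without,
                             controls_with, controls_without):
    """Odds ratio and Pearson chi-square test (1 degree of freedom)
    for a 2x2 table of variant carriers in cases and controls.

    Hypothetical helper for illustration; assumes no cell count is zero.
    """
    a, b, c, d = cases_with, cases_without, controls_with, controls_without
    n = a + b + c + d

    # Odds of carrying the variant among cases, relative to controls.
    odds_ratio = (a * d) / (b * c)

    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Illustrative counts only: 1000 cases and 1000 controls.
odds_ratio, chi2, p_value = case_control_association(300, 700, 250, 750)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

An odds ratio above 1 suggests the variant is more common in cases; the p-value indicates how readily a difference of this size could arise by chance in a single test.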
These approaches all rely on statistical association, so great care needs to be taken to ensure that an association is not just a chance event. Hence the need for large sample sizes and for replication in a different sample to confirm that the initial association is real (although this is complicated by the fact that some variations may be relevant in some populations but not others).
Why now?
For several reasons, this is a good time to be pursuing these approaches. The availability of human genome sequence data is vital, but so too are dense maps of genetic markers, typically single nucleotide polymorphisms (SNPs), which act as labels for particular parts of the genome. In turn, new haplotype maps greatly reduce the number of SNPs that need to be analysed.
Technological advances have been important too. High-throughput genotyping is getting ever more efficient – so the haplotypes of many individuals can be identified very rapidly. In time, genotyping may actually become unnecessary – it will be possible to sequence partial or whole genomes just as easily.
As well as advances in genetic technologies, ever greater computing power is driving the number-crunching needed to analyse the huge amounts of data generated by genotyping projects.
Fortunately, Europe has several things going for it. Several countries have well-established public health services, which can be used to gather patient information (as in the UK Biobank project). The research base is strong in both genetics and epidemiology across Europe. Many countries already have large-scale studies running that could be linked together to provide additional benefits (e.g. EU-wide studies of twins). And European funding and political structures provide mechanisms to support collaborations across national borders.
What's needed now?
The realisation that 'more is better' has led to an emphasis on collaboration. By pooling data, researchers can be more confident about statistical associations or identify genes with smaller effects.
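Why pooling helps can be illustrated with a rough calculation. The sketch below computes an approximate 95 per cent confidence interval for an odds ratio from a single small study and from ten such studies naively pooled. The counts are invented for illustration, and simply adding tables together assumes the studies are directly comparable, which real collaborations would have to establish rather than take for granted.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval for the odds ratio of a 2x2 table,
    using the standard error of the log odds ratio.

    a, b: cases with / without the variant
    c, d: controls with / without the variant
    Hypothetical helper for illustration; assumes no cell count is zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Illustrative counts only: one study of 200 cases and 200 controls...
single_study = (60, 140, 50, 150)
# ...and ten identical studies added together (a deliberately naive pooling).
pooled_studies = tuple(10 * count for count in single_study)

single_low, single_high = odds_ratio_ci(*single_study)
pooled_low, pooled_high = odds_ratio_ci(*pooled_studies)

print(f"single study: {single_low:.2f} to {single_high:.2f}")
print(f"pooled data:  {pooled_low:.2f} to {pooled_high:.2f}")
```

With these made-up numbers the estimated odds ratio is the same in both cases, but only the pooled interval excludes 1, which is the sense in which larger samples can reveal genes with smaller effects.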
This is leading to greater networking between longitudinal projects, and efforts to standardise procedures and improve data sharing. A good example is the Public Population Project in Genomics (P3G), a not-for-profit international group working to standardise methodologies and improve coordination across biobanks.
Collaboration raises a number of issues, not least that of informed consent. The type of consent gained varies from study to study, and is unlikely to allow routine sharing of data. From a research point of view, more open-ended consent would be desirable. This would be a significant change from current practice – typically each use of an individual's data has to be approved – and would require careful communication of likely long-term benefits. Harmonising ethical guidelines between countries would also be useful.
More generally, how will population genomics impact on healthcare? The endeavour spans several communities. Biomarkers have to be identified (probably by academia), the discovery has to be shown to have potential public health value, and industry has to create practical products.
This raises the thorny issue of intellectual property, and balancing the need to provide commercial incentives with the overall objective of maximising public health benefits. It is essential, the conference heard, for legal agreements about intellectual property and exploitation to be in place before projects begin, and for participants to be aware that commercial exploitation is the likely route by which health benefits arise.
Discussions also emphasised the need for different communities – academia, governments establishing public policy, regulatory authorities and industry – to work together.
It is a sobering thought that, even when a biomarker has been unequivocally linked to a disease, moving to clinical application is far from straightforward.
Drug development, for example, remains a high-risk proposition – fewer than 10 per cent of all new compounds entering phase 1 trials are actually launched as products.
Population genomics research involves the participation of people on an almost unprecedented scale. It is thus only a viable approach when it is supported by the general public. The research community will therefore depend on the goodwill and altruism of its research subjects, as the health benefits are likely to be realised only many years from now.

