The 1000 Genomes Project
Any two humans are more than 99 per cent the same at the genetic level: the small fraction of genetic material that varies among people can help to explain individual differences in susceptibility to disease, response to drugs or reaction to environmental factors.
The 1000 Genomes Project therefore aims to produce an extremely detailed catalogue of human DNA variation that can be used in future studies of people with particular diseases.
Across most of the human genome, the researchers taking part in this international collaboration are looking for variations that are present at a frequency of 1 per cent or more in the population; within genes, the goal is to find rarer variations, down to a frequency of 0.5 per cent or lower.
Producing a map at this resolution, which is unmatched by current resources, is likely to require sequencing of the genomes of at least 1000 people. The project is now on course to complete 'light' sequencing of the genomes of 2500 people from about 20 different populations by the end of 2011.
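A rough calculation helps to show why a sample of this size suits these frequency targets. The sketch below is illustrative only and is not the project's own power calculation: it assumes that each person contributes two independently sampled chromosomes and it ignores sequencing error. The function names are hypothetical.

```python
def expected_copies(freq, n_people):
    """Expected number of sampled chromosomes that carry the variant.

    Assumes two chromosomes per person, each carrying the variant
    independently with probability `freq`.
    """
    return 2 * n_people * freq


def prob_present(freq, n_people):
    """Probability that at least one sampled chromosome carries the variant."""
    return 1.0 - (1.0 - freq) ** (2 * n_people)


if __name__ == "__main__":
    for n_people in (100, 1000, 2500):
        for freq in (0.01, 0.005):
            print(
                f"{n_people} people, variant frequency {freq:.3f}: "
                f"expected copies = {expected_copies(freq, n_people):.1f}, "
                f"P(present at least once) = {prob_present(freq, n_people):.6f}"
            )
```

Under these assumptions, a variant at 1 per cent frequency is expected to appear on about 20 of the 2000 chromosomes sampled from 1000 people, whereas in a sample of only 100 people a variant at 0.5 per cent frequency would be absent altogether more than a third of the time.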
Using catalogues of human genetic variation, such as those from the International HapMap Project and the Wellcome Trust Case Control Consortium, researchers have already discovered more than 100 regions of the genome that contain genetic variants associated with susceptibility to common human diseases such as diabetes, coronary artery disease, prostate and breast cancer, rheumatoid arthritis, inflammatory bowel disease and age-related macular degeneration.
However, researchers often must follow those studies with costly and time-consuming DNA sequencing to help pinpoint the precise genetic variants that are associated with a disease. The new map from the 1000 Genomes Project will enable researchers to zero in quickly on such variants, speeding efforts to use genetic information to develop new strategies for diagnosing, treating and preventing common diseases.
Recent improvements in sequencing technology (‘next-gen’ sequencing platforms) have sharply reduced the cost of sequencing and increased the speed at which it can be performed. However, the cost of 'deeply' sequencing an entire genome (reading it the equivalent of 28 times over) is still too high, so the project will carry out 'light' sequencing of each individual's DNA, to 4x coverage (the equivalent of reading it four times over).
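The trade-off between deep and light sequencing can be illustrated with a simple model. The sketch below assumes that the number of reads covering any one position follows a Poisson distribution with mean equal to the coverage, and that each read at a heterozygous position is equally likely to come from either chromosome. This is a common textbook simplification, not a description of the project's own analysis, and the function names are hypothetical.

```python
import math


def prob_site_uncovered(mean_depth):
    """Probability that a position receives no reads at all.

    Assumes read depth at a position is Poisson-distributed with
    mean `mean_depth`, so P(0 reads) = exp(-mean_depth).
    """
    return math.exp(-mean_depth)


def prob_het_allele_missed(mean_depth):
    """Probability that the variant copy at a heterozygous position is never read.

    Assumes half of the reads, on average, come from the chromosome
    carrying the variant, so those reads are Poisson with mean
    `mean_depth / 2`.
    """
    return math.exp(-mean_depth / 2.0)


if __name__ == "__main__":
    for depth in (4, 28):
        print(
            f"{depth}x coverage: "
            f"P(position unread) = {prob_site_uncovered(depth):.2e}, "
            f"P(variant copy unread) = {prob_het_allele_missed(depth):.2e}"
        )
```

Under this model, at 4x coverage roughly 2 per cent of positions receive no reads at all, and the variant copy at a heterozygous position goes unread about 14 per cent of the time; at 28x both figures are negligible. This is why light sequencing of a single genome cannot reveal every variant directly.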
The improvements in sequencing technology have also enabled the project to increase its initial target of sequencing 1000 genomes to 2500 genomes by the end of 2011, sourced from about 20 different populations around the world. Combining the data from 2500 samples should allow for a highly accurate estimation of the variants and genotypes for each sample that were not seen directly by the light sequencing.
In the short film below, Drs Richard Durbin and Chris Tyler-Smith describe the key findings and significance of the pilot phase of the 1000 Genomes Project.
The genomes of a diverse set of populations are being sequenced for the project. To preserve the anonymity of the people involved, the collection of samples follows a series of ethical guidelines, and the samples are taken on the basis of informed consent.
The populations whose DNA is being sequenced in the 1000 Genomes Project include:
- Chinese in metropolitan Denver
- Gujarati Indians in Houston
- Han Chinese in Beijing
- Japanese in Tokyo
- Kayadtha in Calcutta, India
- Luhya in Webuye, Kenya
- Maasai in Kinyawa, Kenya
- Malawian in Blantyre, Malawi
- Toscani in Italy
- People of Mexican ancestry in Los Angeles
- People of African ancestry in the south-western United States
- Puerto Rican in Puerto Rico
- Punjabi in Lahore, Pakistan
- Utah residents with ancestry from northern and western Europe
- Yoruba in Ibadan, Nigeria
For a comprehensive listing of all the populations involved and for more information about the ethics and policy governing the samples, visit the 1000 Genomes Project website.
A collaborative effort
The sequencing work is being carried out by an international collaboration, with work performed at the Wellcome Trust Sanger Institute, the Beijing Genomics Institute in China, and the National Human Genome Research Institute (NHGRI) Large-Scale Sequencing Network, which includes the Broad Institute of MIT and Harvard; the Washington University Genome Sequencing Center at the Washington University School of Medicine in St. Louis; and the Human Genome Sequencing Center at the Baylor College of Medicine in Houston.