Feature: Highly cited
28 November 2008. By Ian Jones
Founded in 1992, the Sanger Institute has grown into one of the world’s leading centres of genome-based research. There at the beginning was Richard Durbin, whose background in mathematics and computing has proved essential to the Sanger’s processing and analysis of genome data. “I’d worked with John Sulston at the LMB [the MRC Laboratory of Molecular Biology] in Cambridge,” he explains. “When he moved to Hinxton to establish the Sanger Institute he asked me to join him.” Having grown disillusioned with his area of research at the time, neural nets - “I wanted to model nervous systems but it was diverging from science: it wasn’t the brain” - he leapt at the opportunity.
Beginning with the Human Genome Project, Dr Durbin has witnessed at first hand the extraordinary growth of DNA sequencing. In 1992, the Sanger Institute’s output was around 100 000 base pairs a day. During the heyday of the Human Genome Project in 2000, this had jumped to 10 million base pairs a day. And in 2008, thanks to ‘next-generation’ sequencing machines, the Institute has been churning out an astonishing 10 billion base pairs - equivalent to three human genomes - every day.
Dr Durbin has had the challenging role not only of ensuring that the Sanger Institute can cope with this ever-increasing flood of data but also of discerning meaning in its endless streams of As, Cs, Gs and Ts. Where are the genes? Where are the control regions? Where are the conserved sequences - those seen across a range of species? Where is the variation - the bits that differ between people?
The fundamental nature of these questions, and the value of the software tools used to answer them, have involved Dr Durbin in numerous projects - from the analysis of the human genome to the development of widely used software tools such as Ensembl, Pfam (protein families) and WormBase (a nematode worm genome database). All have generated influential scientific papers. Indeed, a recent analysis by ScienceWatch ranked Dr Durbin as Europe’s most highly cited researcher.
“It was a bit of a surprise,” he admits. “I was amused more than anything. It’s somewhat artificial. It hasn’t changed my life!” In reality, he says, he was lucky enough to be a member of several multidisciplinary consortia, each with several key contributors.
Dr Durbin is likely to rack up more citations thanks to his involvement in the 1000 Genomes Project, a US$50 million partnership between the Sanger Institute and centres in the USA, China and Germany, funded by the Wellcome Trust and others, which will provide a more detailed picture of human genetic variation. The Human Genome Project produced a reference sequence while follow-ups such as the International HapMap Project have begun to identify the sites at which human genomes differ. Added to this has been the flow of information from ‘personal genomics’ projects, such as the sequencing of the genomes of James Watson and Craig Venter.
Yet the picture of rare human genetic variation is obscure - even though, as many genome-wide studies have shown recently, it accounts for a significant proportion of the risk factors for common diseases. To tackle this problem, the 1000 Genomes Project will analyse the genomes of a further 1000 people - generating a staggering 6 trillion base pairs of sequence information.
Indeed, suggests Dr Durbin, sequence productivity is following its own version of ‘Moore’s law’ - the observation that computer memory capacity doubles roughly every 18 months. In fact, with next-generation sequencers, the doubling time has actually got shorter.
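As a rough check on that claim, the daily output figures quoted earlier (100 000 base pairs in 1992, 10 million in 2000 and 10 billion in 2008) can be turned into average doubling times. The sketch below is illustrative only: it assumes steady exponential growth between the quoted years and treats each interval as exactly eight years, and the function name is a hypothetical helper, not something taken from the Sanger Institute.

```python
import math

# Daily sequencing output (base pairs per day), as quoted in the article.
output_per_day = {1992: 100_000, 2000: 10_000_000, 2008: 10_000_000_000}

def doubling_time_months(start_year, end_year, data=output_per_day):
    """Average doubling time in months, assuming steady exponential growth."""
    growth = data[end_year] / data[start_year]   # overall fold increase
    doublings = math.log2(growth)                # number of doublings needed
    return (end_year - start_year) * 12 / doublings

early = doubling_time_months(1992, 2000)
late = doubling_time_months(2000, 2008)
print(f"1992-2000: {early:.1f} months")  # about 14.4 months
print(f"2000-2008: {late:.1f} months")   # about 9.6 months
```

On these figures, output doubled roughly every 14 months in the 1990s and roughly every 10 months after 2000, both quicker than the 18-month benchmark.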
This is having a profound influence: “I think we’re witnessing a shift in how science is done,” suggests Dr Durbin. “Sequencing machines are beginning to infiltrate labs. And doing sequence-based studies is becoming affordably cheap.”
Just as recombinant DNA technologies were rapidly picked up in the early 1980s, so other fields will adopt genomic tools. An ecologist studying a population of birds, for example, could soon be integrating genomic analyses of individuals within that population.
Indeed, Dr Durbin sees genome-based approaches putting the genes into population genetics: traditionally, the field has treated them mostly as theoretical concepts. “Genetics in the 20th century was mainly indirect - it was hard to observe genes directly. Since about 2000 you can easily look at whole genomes, you can look directly at the scale of human genes in individuals.” Population-based studies with genome-wide breadth are a real possibility - the beginnings of which can perhaps be seen in metagenomic studies of bacterial populations.
Such studies have the advantage of generating huge amounts of data - an advantage, that is, if you have someone like Richard Durbin to make sense of them.
Image: Dr Richard Durbin at the Wellcome Trust Sanger Institute; Wellcome Images.