Genomics – a new dawn
Mention genomics and, for many people, large, heavily-funded international consortia, such as the Human Genome Project, spring to mind.
Today, the ready availability of second- and third-generation sequencing platforms means that it is possible for individual labs to sequence whole genomes – be they human, plant or microorganism – in a matter of hours and for a fraction of the cost.
These advances are giving us a glimpse at the impact that genomics is set to make across bioscience and related industry sectors, such as in the fields of personalised medicine, disease surveillance and on-farm monitoring.
Earlier this month, the US sequencing technology company Illumina announced that it had "broken the sound barrier" of human genomics with the launch of a system capable of sequencing a full human genome at a cost of $1,000 (ref 1). The HiSeq X Ten system is the latest in the evolution of Illumina's core sequencing technology, which is based on BBSRC-funded research at the University of Cambridge (see section 'The next generation').
This price tag is seen as the threshold at which DNA sequencing technology becomes cheap enough to be used widely, for example in medicine, helping in diagnosis and in matching drugs to an individual's genetic make-up. But how did we get to this point, and what can we expect from genomics in the future?
In the beginning
DNA sequencing has been possible since the 1960s. But it wasn't until 1977, with the development of a new method – Sanger sequencing – that the first DNA-based genome sequence (of the virus φX174) was completed (ref 2).
By the mid-1980s, the small genomes of several microbes had been mapped and partially sequenced and the idea of sequencing the human genome was beginning to take hold. Then, one late night in 1986, in a bar in Bethesda, USA, Thomas H. Roderick of the Jackson Laboratory, Maine, coined the term 'genomics' as a name for a new genome-oriented scientific journal (ref 3). Little did Roderick know that, over the following decade, this name would become a central part of modern bioscience.
Genomics became a new way of thinking about biology. Where researchers were once limited to the study of single gene products, they could now simultaneously analyse the entire complement of gene and protein sequences within an organism. The approach also allowed scientists to obtain insights into how differences in DNA sequence – both between individuals and between species – can have profound influences on traits such as height and appearance as well as on susceptibility to diseases and the origin of drug resistance.
The publication, in 1996, of the complete genome sequence of baker's yeast (Saccharomyces cerevisiae) – the first eukaryote genome to be sequenced – was therefore seen as an important navigational aid for scientists to steer their way through much larger and more complex genomes.
"Almost half of yeast's genes have equivalents in humans," explains Professor Steve Oliver, a BBSRC-funded researcher from The University of Cambridge, who, while working at The University of Manchester Institute of Science and Technology, initiated the sequencing of yeast chromosome III with funding from the Science and Engineering Research Council (a forerunner to BBSRC). This grew into a European project, led by Prof. Oliver, which resulted in the publication of the first entire chromosome sequence from any organism – yeast chromosome III.
"What surprised and, indeed, chastened us at the time was that only 20% of the genes revealed by the chromosome's sequence had been identified by classical genetic approaches. It was immediately clear that, in the future, the traditional route of genetics research would be reversed. One would need to proceed from gene to function, rather than from function to gene."
Eighteen years on, and the models and experimental systems Prof. Oliver and his team have developed with BBSRC funding have sometimes led them in unexpected directions.
"We have been able to predict the impact of gene copy number variation in cancer, construct network models to identify genes important in Alzheimer's disease, or use yeast 'surrogates' to screen for drugs against parasitic diseases," Prof. Oliver says.
The Human Genome Project itself, which began in 1990 with a budget of US$3Bn, was envisaged to take 15 years because of the size of the human genome, which is more than 250 times that of the yeast genome. But, as a result of advances in the technology and novel computational methods, the project was completed one year early.
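The scale comparisons above can be checked with some rough arithmetic. The sketch below is illustrative only: the genome sizes are rounded figures assumed for the example (they are not given in this article), and the budget-to-price comparison is loose because the project budget covered far more than the sequencing of a single genome.

```python
# Rounded figures. The genome sizes are assumptions added for illustration;
# the budget and price are the ones quoted in the article.
HUMAN_GENOME_BP = 3.1e9      # assumed: about 3.1 billion base pairs
YEAST_GENOME_BP = 12.1e6     # assumed: about 12.1 million base pairs
HGP_BUDGET_USD = 3e9         # Human Genome Project budget (US$3Bn)
GENOME_PRICE_2014_USD = 1e3  # the $1,000 genome


def fold_difference(larger, smaller):
    """Return how many times larger the first quantity is than the second."""
    return larger / smaller


if __name__ == "__main__":
    size = fold_difference(HUMAN_GENOME_BP, YEAST_GENOME_BP)
    cost = fold_difference(HGP_BUDGET_USD, GENOME_PRICE_2014_USD)
    print(f"Human vs yeast genome size: about {size:.0f}-fold")
    print(f"Project budget vs $1,000 genome: about {cost:,.0f}-fold")
```

With these assumed sizes the ratio comes out a little above 250, in line with the figure quoted above.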
Looking back, the Human Genome Project can also be seen as a watershed in the open access publishing movement (ref 4). The decision to place the sequence in the public domain as soon as it was generated helped to ensure that this key research resource could be used by scientists the world over. The move to an era of fully open access, both to data and research publications, is one to which BBSRC, along with the other UK Research Councils, is fully committed. BBSRC encourages our research community to make their research outputs readily available through open access publications and repositories, such as Europe PMC (formerly UK PubMed Central) – a digital archive of full-text, peer-reviewed research publications, which is part funded by BBSRC (ref 5).
The next generation
"The cost and speed of decoding DNA, base-by-base, has improved by more than a million fold, to the point where today a high-quality human genome can be sequenced for under $1,000," says Professor Shankar Balasubramanian. "The main technology that has delivered this is Solexa-Illumina sequencing which is based on BBSRC-funded ideas and inventions from my lab and David Klenerman's lab in Cambridge."
In the late 1990s, while investigating the action of a key DNA replication enzyme, DNA polymerase, using single-molecule fluorescence, Balasubramanian and Klenerman conceived of an approach to decode DNA by colour-coding each of the four DNA bases (known by the letters A, T, G and C) to give a readout of a DNA sequence. The technique uses reversible dye terminator chemistry to add a single nucleotide to the DNA template.
By building this system into a chip, Balasubramanian and Klenerman enabled the simultaneous sequencing of hundreds of millions of DNA fragments. Solexa, the company they founded to commercialise the technology, announced its first product – the Genome Analyser – in 2005. It was immediately successful and, in 2007, Solexa was sold to the US company Illumina for $600M. Illumina has gone on to corner two-thirds of the global sequencing market (ref 6).
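The colour-coded, one-base-per-cycle readout described above can be illustrated with a toy simulation. This is a minimal sketch, not the Solexa-Illumina chemistry or software: the dye colours are arbitrary placeholders, all function names are hypothetical, strand orientation is ignored, and the "parallel" reading of many fragments is simply a loop.

```python
# Toy model of colour-coded, one-base-per-cycle sequencing.
# Dye colours are hypothetical placeholders, not the real chemistry.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE_COLOUR = {"A": "green", "T": "red", "G": "blue", "C": "yellow"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}


def run_cycles(template):
    """Return the colour observed in each cycle for one template strand."""
    colours = []
    for template_base in template:
        # One dye-labelled, reversibly terminated nucleotide pairs with
        # the current template base; the terminator stops further extension.
        incorporated = COMPLEMENT[template_base]
        # The dye is imaged; dye and terminator are then (conceptually)
        # removed so the next cycle can add the next nucleotide.
        colours.append(DYE_COLOUR[incorporated])
    return colours


def call_bases(colours):
    """Translate the recorded colours back into the synthesised bases."""
    return "".join(COLOUR_TO_BASE[colour] for colour in colours)


def sequence_fragments(templates):
    """Read many fragments; a real chip images them all simultaneously."""
    return [call_bases(run_cycles(template)) for template in templates]


if __name__ == "__main__":
    print(sequence_fragments(["ATGC", "GGTA"]))
```

Each call returns the bases of the newly synthesised strand, which are the complements of the template bases in the order they were incorporated.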
Today, sequencing has become an essential tool in the biologist's toolkit and, with a number of new technologies on the horizon, including portable sequencing machines (see box "Opening a 'pore' to the future"), the volume of sequencing experiments is only going to increase.
Mick Watson, Director of ARK Genomics, a BBSRC National Capability based at The Roslin Institute and embedded within Edinburgh Genomics, explains, "Once it becomes possible to sequence everything so quickly and cheaply we will do exactly that. Specialist software will be needed and this may end up costing more than the data generation.
Opening a 'pore' to the future
In 2012, the British company Oxford Nanopore Technologies (ONT) presented a tiny DNA sequencer that would plug into the USB port of your computer and sequence DNA directly from blood using an engineered nanopore. BBSRC-funded training grants played an important part in demonstrating that biological nanopores could differentiate between the individual DNA bases in a single DNA molecule – a key aspect of the sequencing methods developed by ONT.
Single molecule DNA sensors such as these promise real-time read-outs of DNA. Not only will this bring the cost of sequencing down further, but it will enable the development of bedside (and pen-side, and field-side) diagnostics.
"It is truly amazing," says ONT founder Hagan Bayley, Professor of Chemical Biology at the University of Oxford. "It’s amazing in terms of sequencing technology, but also in terms of single molecule detection – many other molecules can be detected this way."
Bayley and his company have set their sights beyond the realms of DNA to sense metal ions, drugs, biochemical markers – almost anything in an aqueous solution.
"Dealing with mountains of data is not a new problem. In the early '90s people were worried about the size of sequence trace files and we needed bioinformatics tools to organise and analyse them. Then came microarrays, data sets got bigger and people panicked and we needed new tools for microarray data. Now we have next generation sequencing and data sets are even bigger. It's cyclical, and dealing with big data is nothing new. But what's clear is that we need expertise and skills to be able to ask the right questions."
This is the ethos behind The Genome Analysis Centre (TGAC), which was established in 2009 and is strategically funded by BBSRC. TGAC gives the UK scientific community access to world-class data processing and analysis capability. Together, TGAC and Edinburgh Genomics provide UK bioscientists working on plant, livestock and microbial genomics with ready access to the latest sequencing technologies and bioinformatics capabilities.
Right now, these approaches are being applied to characterise gene expression activity, for example to understand embryo development; to model genetic diversity between populations of crop plants; to trace fast-evolving viruses; to understand the genetic basis of rare diseases; to discover novel cancer biomarkers; to study complex microbial communities (such as the human microbiome); and many other applications.
Following the crowd
In 2011, a global 'crowd-sourcing' effort, kick-started by Dr Nick Loman at the University of Birmingham, raced to find the source of an outbreak of E. coli O104:H4, which struck more than 4,000 people across Europe, claiming 50 lives, and had a serious impact on fresh vegetable sales throughout the EU. Through their analysis of publicly amassed data, TGAC scientists identified several genes that may have contributed to making this particular strain so deadly (ref 7).
More recently, some of the same team at TGAC, together with other BBSRC-funded scientists at the John Innes Centre and led by researchers at the Sainsbury Laboratory, all on Norwich Research Park, set up an 'open source' platform for scientists to share data and publish results relating to the tree-killing ash dieback disease. Speed is of the essence in characterising the problem and developing a solution. Novel research technologies have also been developed, such as a puzzle game hosted on Facebook called Fraxinus (ref 8) that utilises the game-playing skills of the public to help analyse some of the huge amounts of genetic data.
"Inexpensive high-throughput sequencing technologies have revolutionised biology. Looking into the future, the application of computational methods to model complex biological processes will be an area to watch. Novel data analysis methods and ingenious approaches to generate and validate large numbers of hypotheses will be at the heart of biology," says TGAC's Director, Dr Mario Caccamo.
"Computers are now an essential component of modern biological research," says Watson. "Biologists are being asked to adopt new skills in computer science and statistics. It's important that we help them achieve this. Through bioinformatics training programmes being developed at TGAC and Edinburgh Genomics, as well as through access to some of the largest high performance computing systems in Europe (Hector and Eddie), we hope to do so." (ref 9)
Dr Caccamo adds, "Computer scientists have worked for many years on approaches to capture the mathematics governing dynamic systems. We will see more work at the interface between computer science and biology to help tackle some of the key questions, for example in the field of epigenetics."
The fifth 'element'
While the genetic code is a set of instructions for making proteins, there is more to the story than just the sequence of DNA bases. DNA is altered by enzymatic modifications that affect gene expression in organisms – picking up these molecular nuances is critical if the information in DNA is to be utilised in a way that is clinically useful on an individual basis.
As well as the four canonical bases A, T, G and C, scientists have discovered a 'fifth' epigenetic base, 5-methylcytosine (5mC), and a 'sixth', 5-hydroxymethylcytosine (5hmC), which are present in the DNA of humans and many other organisms. Both of these modified bases have distinct functions in a wide variety of biological processes, including cell differentiation, stem cell dynamics, neurodegenerative diseases and cancer.
Prof. Balasubramanian has developed new chemistries to decode these modified bases, with funding from BBSRC. This has led to the creation of a start-up company, Cambridge Epigenetix Ltd, to develop and commercialise the decoding of modified bases. The company is based on Babraham Research Campus, where scientists at the Babraham Institute, which is strategically funded by BBSRC, have a strong focus on epigenetics research.
Understanding more about where and when epigenetic changes are taking place, for example in response to environmental stresses such as drought or famine, will help us relate this to practical situations in agriculture and healthcare and, hopefully, present new opportunities to intervene.
Developments in DNA synthesis technologies and 'genome editing', for example, could allow us to make direct use of our biological knowledge by targeting specific genes in order to generate novel antibiotics, produce secondary metabolites in plants or obtain high-value compounds from microbes.
But why stop at editing single genes?
The rise of the synthetic genome
In 2013, Minister for Universities and Science David Willetts announced nearly £1M funding for the UK arm of an international consortium attempting to build a synthetic version of the entire yeast genome.
"A synthetic genome will allow us to re-program yeast and our goal is to use it to produce new antibiotics as resistance arises to existing ones," says Dr Tom Ellis from Imperial College London, one of the leaders of the UK team. The team also includes his colleague Professor Paul Freemont, Prof. Steve Oliver from the University of Cambridge and Professor Andrew Elfick from The University of Edinburgh, and is supported by BBSRC with co-funding from the Engineering and Physical Sciences Research Council.
The synthetic yeast genome, known as Sc2.0, will, once completed, provide unparalleled opportunities for asking profound questions about biology in new and interesting ways, such as: where do new species come from? What are the bare essentials of life – and what happens if an essential is lost? Harnessing this new understanding will have many new uses, for example in the development of yeast that can tolerate higher ethanol levels.
Similarly, once we start using genomics to analyse complex ecosystems such as farms, who knows what we may find? Bacteria and viruses will exist in these locations, undetectable with current technologies, which may never cause any problem for the farmer or animal. But, once discovered, will farmers be put under pressure to remove them anyway? There may well be trade-offs between safeguarding production and yields and protecting ecosystem biodiversity.
One thing is certain: the future impact of genomics will only be fully realised through close collaboration between doctors, farmers and scientists, and with the support of an engaged public.
Timeline
- 1953: The double-helical structure of DNA is described
- 1964: First complete gene sequence, alanine transfer RNA from yeast
- 1966: The genetic code, comprised of different arrangements of the four DNA bases (A, T, G and C), is cracked for each of the 20 types of amino acid
- 1977: First DNA-based genome sequence, of the virus φX174
- 1983: Polymerase chain reaction invented, a method for rapidly and dramatically increasing the amount of specific DNA sequences
- 1990: The Human Genome Project begins
- 1992: First complete chromosome sequence, chromosome III from yeast
- 1996: First eukaryote genome sequence, Saccharomyces cerevisiae
- 2000: First plant genome published, Arabidopsis thaliana – the Arabidopsis Genome Initiative was part-funded by BBSRC
- 2000: ARK Genomics was set up by BBSRC as part of its ‘investigating gene function’ initiative to provide access to functional genomics technology and resources relevant to the animal health community
- 2002: Streptomyces coelicolor genome published, led by researchers at the BBSRC-supported John Innes Centre
- 2002: Mouse genome published
- 2004: Human Genome Project completed, unveiling more than 20,000 genes
- 2004: Chicken genome published. Researchers funded by BBSRC contributed to the sequencing and analysis of the chicken genome, the establishment of a library of short DNA sequences that span the entire genome and the development of tools to help probe the function of individual genes in the future
- 2005: Solexa Sequencing announces its first next-generation sequencing product – the Genome Analyser
- 2009: New UK national research institute launched, The Genome Analysis Centre
- 2009: Cattle genome published. UK scientists, supported by BBSRC, played a key role in the annotation and analysis of the genome as part of a 300-scientist international collaboration
- 2010: The first sequence coverage of the wheat genome is publicly released by UK researchers, funded by BBSRC
- 2011: Potato genome published. The UK component – led by the James Hutton Institute – of the potato genome sequencing consortium was part-funded by BBSRC
- 2012: Pig genome published. The UK component of the international swine genome sequencing consortium was part-funded by BBSRC
- 2012: Tomato genome published. UK researchers, focussing on chromosome 4, set the standard for the quality of sequence produced internationally. The UK team were part-funded by BBSRC
- 2012: High resolution draft barley genome sequence published. The UK component of the international barley genome sequencing consortium was part-funded by BBSRC
- 2012: Oxford Nanopore Technologies announces new hand-held sequencing technology
- 2012: An integrated encyclopedia of all the functional DNA elements in the human genome (ENCODE) is published in Nature
- 2013: ARK Genomics receives a £1.1M funding boost from BBSRC to support its new status as a National Capability, providing access to next-generation sequencing, high-throughput genotyping, microarrays and bioinformatics
- 2014: First whole-genome draft of the annotated wheat genome sequence released by the International Wheat Genome Sequencing Consortium (ref 10)
- 2014: Price tag of sequencing an entire human genome falls to $1,000
References
- Science enters $1,000 genome era
- Nucleotide sequence of bacteriophage φX174 DNA
- Beer, Bethesda, and biology: how “genomics” came into being
- Realities of data sharing using the genome wars as a case study – an historical perspective and commentary
- Genomics, prizes, DFID and open access
- In Sequence survey: Illumina holds two-thirds of sequencing market, splits desktop share with Ion PGM
- TGAC helps in crowd-sourcing analysis of E. coli strain
- Gamers to join ash dieback fight-back
- Read more about bioinformatics training on Mick Watson’s blog along with some advice for biologists in his co-authored paper, "So you want to be a computational biologist?"
- Annotated bread wheat genome sequence on EnsemblPlants