Feature: Plenty to sing about in genomics research
Sequencing the human genome, heralded by some as mankind's greatest endeavour, has provided the launch pad for the growing application of genomics in science.
Analysis and comparisons of the growing swathes of genome data promise to impact greatly on some of the biggest challenges we face, opening up possibilities for new drugs, drought-resistant crops, disease-resistant livestock and new varieties of energy crops.
Dr Jane Rogers, Director of The Genome Analysis Centre. Copyright: BBSRC
In 2003, exactly 50 years after Watson and Crick unveiled the structure of DNA, an international consortium of scientists announced they had sequenced the entire human genome with 99.99% accuracy. Since then more than 1,200 genome sequences have been published, more than 60 of them from vertebrates, including the mouse, pig, chicken, giant panda and the woolly mammoth. Two of the latest additions to this 'genome club' are the pea aphid and the zebra finch.
Dr Jane Rogers, Director of The Genome Analysis Centre (TGAC), a BBSRC institute, explains why this is such an exciting time for science, "The technology developments that have emerged since the human genome was completed have already taken human genomics to a new level of investigation of the underlying causes of human genetic diseases. It originally took hundreds of scientists from around 20 labs all over the world 10 years to sequence the human genome. Things have moved on so much, it is now possible to sequence an individual's genome in a single lab in a matter of days."
Indeed, a number of individuals have had their genomes sequenced, including Archbishop Desmond Tutu. The 1,000 Genomes Project, launched in 2008, is an international consortium of scientists aiming to sequence the genomes of at least 1,000 people from around the world and catalogue the variation between them. And last year, another international consortium announced the Genome 10k project, in which researchers plan to create a 'genome zoo' of 10,000 vertebrate species.
And for every headline-grabbing sequence, there are many more equally significant projects, such as the 1001 Genomes Project looking at whole-genome sequence variation in 1001 strains of the model plant Arabidopsis thaliana.
Dr Rogers explains that, in the wake of faster, cheaper technology, genomics is becoming available to all researchers, "It is now becoming feasible to use genomics as another tool on the bench - another assay," she says.
Next generation sequencing
When the Human Genome Project started, DNA sequencing was slow and laborious with a lot of manual input. During the lifespan of the project, sample handling and automated sequencing improved but, even at the end of the project, it took one year to read one billion bases and cost around $0.10 per 1000 bases.
In contrast, today's sequencers can read a gigabase of DNA in less than a day and at a 10-1,000-fold lower cost (note 1). Professor Neil Hall from the University of Liverpool, who is analysing variation in the wheat genome, explains, "Next generation sequencing covers a number of new technologies that take advantage of in vitro cloning of DNA and high-performance imaging to sequence massive amounts of DNA. A single next generation instrument today would produce more DNA sequence than every DNA sequencing machine on earth 10 years ago in the same period of time."
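The figures quoted above can be put side by side with a little arithmetic. The sketch below is illustrative only: it uses the numbers given in the text, and treats "less than a day" as exactly one day, which is an assumption that makes the speed-up a lower bound.

```python
# Figures quoted in the text for the end of the Human Genome Project (HGP).
BASES_PER_GIGABASE = 1_000_000_000
HGP_COST_PER_1000_BASES_USD = 0.10
HGP_DAYS_PER_GIGABASE = 365  # "one year to read one billion bases"

# Assumption: "less than a day" is taken as exactly one day (a lower bound on speed-up).
NGS_DAYS_PER_GIGABASE = 1


def cost_per_gigabase(cost_per_1000_bases):
    """Scale a price per 1,000 bases up to a price per gigabase."""
    return cost_per_1000_bases * BASES_PER_GIGABASE / 1000


hgp_cost_per_gigabase = cost_per_gigabase(HGP_COST_PER_1000_BASES_USD)

# "10-1,000-fold lower cost" gives a range rather than a single price.
ngs_cost_upper = hgp_cost_per_gigabase / 10
ngs_cost_lower = hgp_cost_per_gigabase / 1000

minimum_speedup = HGP_DAYS_PER_GIGABASE / NGS_DAYS_PER_GIGABASE

print(f"HGP-era cost per gigabase: ${hgp_cost_per_gigabase:,.0f}")
print(f"Next generation cost per gigabase: ${ngs_cost_lower:,.0f} to ${ngs_cost_upper:,.0f}")
print(f"Speed-up: at least {minimum_speedup:.0f}-fold")
```

On these figures, a gigabase that cost around $100,000 at the end of the project would now cost somewhere between $100 and $10,000, and would be read at least 365 times faster.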
The genomic era
With these advances in sequencing has come an explosion in data. Analysing and storing such massive amounts of data has required rapid development of bioinformatics for computational manipulation and analysis.
Dr Mario Caccamo, Head of Bioinformatics at TGAC, explains, "The challenge for bioinformatics is to come up with new solutions that can take full advantage of the data generated by sequencing instruments."
The first issue is how and where to store the datasets produced by sequencing. Several international databases have been developed, which are pioneering both for their size and for the fact they are open access, something Rogers feels very proud about, "Genomics has opened up science in a way it has not been opened up before. There are a plethora of useful databases and as more people use them and contribute to them, the more they are being refined and the more useful they become."
Zebra finch. Copyright: ThinkStock 2010
One such database, GenBank, receives sequences produced by labs across the world and is growing at an exponential rate, doubling every 18 months. In 2000, 10 million sequences were available; by 2008 this had increased to over 98 million. EMBL-Bank is the European equivalent of the US-based GenBank. In the past year, EMBL-EBI (the European Molecular Biology Laboratory's European Bioinformatics Institute) has launched several new portals, including ones for plants, bacteria, protists, fungi and metazoa. Many smaller, organism-specific online repositories exist too.
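Exponential growth of this kind is easy to explore numerically. The sketch below is a rough illustration, not an official GenBank statistic: the function names are made up for this example, and the only inputs are the figures quoted above.

```python
import math


def doubling_time_years(start_count, end_count, years):
    """Doubling time implied by growth from start_count to end_count over `years`."""
    return years * math.log(2) / math.log(end_count / start_count)


def projected_count(start_count, years, doubling_months):
    """Count after `years` of steady doubling every `doubling_months` months."""
    return start_count * 2 ** (years * 12 / doubling_months)


# Sequence counts quoted in the text: 10 million (2000) and 98 million (2008).
implied_months = 12 * doubling_time_years(10e6, 98e6, 2008 - 2000)

# What a strict 18-month doubling would give over the same eight years.
projected_2008 = projected_count(10e6, 2008 - 2000, 18)

print(f"Doubling time implied by the sequence counts: {implied_months:.0f} months")
print(f"Count in 2008 under strict 18-month doubling: {projected_2008 / 1e6:.0f} million")
```

The quoted sequence counts imply a doubling time of roughly 29 months, whereas doubling every 18 months from 10 million would give about 400 million by 2008. The text does not say which measure the 18-month figure refers to; it may describe something other than the number of sequence entries, such as the total number of bases stored.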
But data storage is only one part of bioinformatics. "The interpretation of the information and how this could be integrated is where the real value of bioinformatics resides," explains Caccamo. "A challenge we all face in bioinformatics is that the sequencing technologies are changing quickly, and with these changes come new opportunities. These new opportunities mean that we need to think of new approaches to analyse ever increasing datasets."
One such approach has been developed by Mick Watson, Head of Bioinformatics at the Institute for Animal Health (IAH), an institute of BBSRC. He explains, "To understand a genome, we need more than just the DNA sequence itself. Many scientists must work together to define which parts of the DNA are functional and what they do. Post-genomics experiments tell us about the function of specific pieces of the genome, and the datasets from these experiments require sophisticated software for their analysis."
The Bioinformatics team at IAH have developed software specifically to analyse microRNA data. MicroRNAs are small non-coding RNAs that alter the function of other genes. However, for many of these microRNAs it is not known exactly which genes they target. So the team developed CORNA - a piece of software specifically designed to look for patterns in these data sets. When given a list of genes from an experiment, CORNA looks for microRNA targets that are enriched within that list; or, given the predicted targets of a microRNA, CORNA will tell you if any particular functions, such as immune response, are enriched. The statistics produced help scientists interpret the data from a variety of experiments.
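To give a flavour of what 'enriched' means here, one standard way to test whether a gene list contains more of a microRNA's predicted targets than chance would suggest is a one-sided hypergeometric test. The sketch below is a generic illustration of that test, not CORNA's own code; the function name and all the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, targets_in_population, sample, targets_in_sample):
    """One-sided hypergeometric p-value.

    Probability of seeing at least `targets_in_sample` targets when `sample`
    genes are drawn at random, without replacement, from `population` genes
    of which `targets_in_population` are predicted targets.
    """
    most_possible = min(sample, targets_in_population)
    all_draws = comb(population, sample)
    draws_at_least_as_extreme = sum(
        comb(targets_in_population, k)
        * comb(population - targets_in_population, sample - k)
        for k in range(targets_in_sample, most_possible + 1)
    )
    return draws_at_least_as_extreme / all_draws


# Hypothetical experiment: 20,000 genes in the genome, 400 of them predicted
# targets of one microRNA, and a list of 100 genes of which 8 are targets.
# By chance alone, about 2 targets would be expected in a list of this size.
p_value = hypergeom_enrichment_p(20_000, 400, 100, 8)
print(f"p-value for enrichment: {p_value:.2e}")
```

A small p-value suggests the microRNA's targets are over-represented in the list. In practice the test would be repeated for every microRNA, so the p-values would also need correcting for multiple testing.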
CORNA was developed initially for a project looking at Marek's disease virus in chickens. "However," Watson explains, "it was clearly of generic use, so we released it under an open source licence and published it in the journal Bioinformatics."
The software has been integral to the work of the Songbird Consortium, which is looking at the zebra finch genome.
Genomics has moved beyond data cataloguing. "Generating sequence data is no longer the primary challenge - interpreting the data is where the new challenge lies. And this is one of TGAC's primary roles - to become a national centre of excellence in bioinformatics."
One of the major drivers in setting up TGAC was to increase UK sequencing and bioinformatics capacity, particularly in plant, animal and microbial genomes. TGAC, which celebrates its first birthday in July, is currently in its start-up phase and is focussing on the sequencing and analysis of genomes from economically important organisms with applications in areas such as bioenergy, nutrition and agriculture. The first call for projects is currently underway and the Centre already has four different sequencing platforms.
Rogers concludes, "Whilst other types of work are needed alongside data generation, the more high quality, accessible data sets there are, the better science can be enabled. Having high quality data stored and freely available to researchers across the globe is an exciting reality with endless possibilities."
Developing a virtual nucleus
Three actively transcribed genes from different chromosomes clustering at a specialised transcription factory. Courtesy of Drs Chakalova and Schoenfelder, Babraham Institute
Having a fully sequenced genome is only the start of the process; the new challenge is to gain a greater understanding of how genomes are organised and the relevance of this order.
Researchers from the Babraham Institute, an institute of BBSRC, are doing just that. In a paper published in Nature Genetics in January, they revealed for the first time that genes work together by huddling in clusters inside the nucleus. These findings represent a paradigm shift in our understanding of how the genome is spatially organised in relation to gene expression. They mark the first step towards a 'virtual nucleus', a dynamic tool simulating interactions in the nucleus, which could revolutionise computer-based drug design.
Dr Peter Fraser, Head of Babraham's Laboratory of Chromatin and Gene Expression, explains, "The specific three-dimensional arrangements of the genome in different cell types represent a missing link in our understanding of how our genome works. To understand how cells can change from one type to another, which is a critical question for stem cell therapies, we will need to understand how the spatial reorganisation of the genome is controlled."
Identifying the transcription factories and developing a virtual nucleus potentially opens the door for more effective drug design. These are the first steps in a very long journey that will lead to computer models that simulate genome behaviour and function in development, differentiation, health and disease.
'Preferential associations between co-regulated genes reveal a transcriptional interactome in erythroid cells' was published in Nature Genetics 42, 53 - 61 (2009).
First member of the wheat, barley and forage grass subfamily sequenced
Grasses play a crucial role in our food chain and, increasingly, in energy production too. But there are significant barriers limiting crop improvement, namely a lack of knowledge about gene function and the difficulty of analysing their large, complex genomes.
The publication of the first wild grass genome, Brachypodium distachyon - a member of the wheat, barley and forage grass subfamily - in February is an important step in addressing this.
Professor Michael Bevan from the John Innes Centre, an institute of BBSRC, who led the project with the US Department of Energy Joint Genome Institute and other American researchers, explains, "Our analysis of the Brachypodium genome is a key resource for securing sustainable supplies of food, feed and fuel from established crops such as wheat, barley and forage grasses, and for the development of crops for bioenergy and renewable resource production.
"It is already being widely used by crop scientists to identify genes in wheat and barley, and it is defining new approaches to large scale genome analysis of these crops, because of the high degree of conserved gene structure and organisation we have identified."
Analysis of the Brachypodium genome has provided new insights into how grass genomes evolve and expand, as well as demonstrating how the plant can be used to navigate the closely related, yet larger and more complex, genomes of wheat and barley.
The team also found Brachypodium to have other important features, including a rapid life cycle and a very compact growth habit, making it ideal for study in the lab.
The findings from this project will enable scientists to determine the functions of genes involved in grass productivity, which could accelerate research in sustainable food production and new sources of energy.
'Genome sequencing and analysis of the model grass Brachypodium distachyon' was published in Nature 463, 11 February 2010, doi:10.1038/nature08747
Dr Jane Rogers, TGAC
tel: 01793 414695
fax: 01793 413382