Visualising complex networks
Specialised computer programs help scientists interpret genomics data
Publication of the human genome heralded a new era of research in biology and, along with other social and technological developments such as wider use of the Internet, social networking sites and smart phones, was a sign that the industrial age had made way for the information age.
But how to interpret such volumes of information? Genomics has both divided and diversified into proteomics, transcriptomics, metabolomics and nutrigenomics (to name a few) as the power and speed of the technology have advanced more quickly than anyone could have imagined when the multi-billion dollar Human Genome Project was announced: a human genome can now be sequenced for under US$10,000 and in a tiny fraction of the time it took just a decade ago.
'Omics' sciences will certainly give us 'big data', but with big data come big problems. One of them is interpretation. Omics data contain millions, if not billions, of data points on the activities of genes, proteins and cellular metabolites. Visualising information on such a scale presents unique problems that must be addressed if we are to find the patterns in these huge data sets that could eventually be used to cure human and animal diseases, monitor resistance to deadly pathogens, and understand the growth and development of agricultural crops. Achieving this goal requires biologists, mathematicians, computer programmers and others to come together to solve the problems of handling data on this scale.
To tackle the problem, scientists are using new computer programs to visualise, interact with and analyse huge data sets. One such program is BioLayout Express3D, which, with funding from BBSRC, has been developed as an open source program (meaning anyone can access the code to modify or improve it) by Dr Tom Freeman at The Roslin Institute, an institute of BBSRC embedded in the University of Edinburgh.
'Big' biology requires massive computing power.
Image: BioLayout Express3D
Working initially with Dr Anton Enright at the European Bioinformatics Institute, Cambridge, who developed a forerunner of this program, Dr Freeman adapted the tool to handle microarray data and has led the program's development for the last five years.
There are many advantages to using this program in systems biology ventures. "Many data types can best be visualised as networks and this tool now supports the analysis of very large datasets by constructing very large networks from it," says Freeman. "Using the power of visualisation also allows far better appreciation of the data we generate."
The program gathers together data on genes, transcripts or proteins and depicts them as nodes, or points, in a network graph. The lines between them, called edges, represent the similarities or functional linkages between any two nodes, or even a cluster of nodes. This means that wider connections can be explored with more statistical rigour than simple pair-wise interactions allow (refs 1 and 2).
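The node-and-edge idea described above can be sketched in a few lines of Python. This is a minimal illustration, not BioLayout Express3D's own code: the gene names and expression values are made up, and the choice of Pearson correlation as the similarity measure, the 0.9 threshold, and the helper names `pearson` and `build_network` are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles: one list of measurements per gene,
# taken across the same five samples (values invented for illustration).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 8.1, 9.9],
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def build_network(profiles, threshold=0.9):
    """Return (nodes, edges). An edge joins two genes whose correlation
    meets the threshold; the threshold value is an assumed example."""
    nodes = sorted(profiles)
    edges = []
    for g1, g2 in combinations(nodes, 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return nodes, edges

nodes, edges = build_network(expression)
for g1, g2, r in edges:
    print(f"{g1} -- {g2} (r = {r})")
```

With these invented values only geneA and geneB rise and fall together closely enough to be linked. Raising or lowering the threshold changes how dense the resulting network is.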
And because BioLayout Express3D can handle tens of thousands of nodes and millions of edges on a standard PC with a 3D graphics card, researchers can see families or cliques of, say, compounds related to gene expression that might otherwise be missed if the data were analysed and represented in a different way.
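Once a network exists, families of related nodes can be picked out automatically. The sketch below groups nodes into connected components, which is a deliberately simple stand-in: the article does not say which clustering method BioLayout Express3D uses, and the edge list and the `connected_groups` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list: each pair is two nodes judged similar enough to link.
edges = [
    ("geneA", "geneB"), ("geneB", "geneC"), ("geneA", "geneC"),
    ("geneD", "geneE"),
    ("geneF", "geneG"), ("geneG", "geneH"),
]

def connected_groups(edge_list):
    """Group nodes into connected components using an iterative
    depth-first search over an undirected adjacency map."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups

groups = connected_groups(edges)
for i, group in enumerate(groups, start=1):
    print(f"group {i}: {', '.join(group)}")
```

Connected components only separate nodes that share no path at all. Real expression networks are usually denser, so a dedicated graph-clustering algorithm would be needed to split large components into tighter families.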
Dr Freeman has used the tool to make a number of original and groundbreaking analyses of a variety of data and is currently exploring its use in the analysis of cancer data. (See 'publications' on Dr Freeman’s Roslin webpage.)
With a little help from my friends
Dr Freeman and his co-developers are not the only ones to use the power of this program. "If we use website stats when the program is opened as a guide, we currently have about 1,000 to 2,000 users globally," says Freeman, adding that numbers are growing as others become aware of the tool.
Visual representation of plant genomic repeat sequences.
Image: BioLayout Express3D
In fact, visualisation of data may be an emerging field in itself. "We are hearing from a number of sources that visualisation is now seen as one of the new areas in data analysis in general and in this respect we are ahead of the game on a number of fronts," says Freeman.
And even though the program is open source, Freeman's extensive BBSRC-funded developments mean that potential exists to commercialise aspects of the program. "We are hoping that one economical outcome will be that we license the program to a software house," he says. "We are currently in early stage discussions with a couple of companies."
1. Network visualization and analysis of gene expression data using BioLayout Express3D
2. Construction, Visualisation, and Clustering of Transcription Networks from Microarray Expression Data