From loyalty cards to proteomics and the birth of the super experiment
22 December 2011
Like tracking customer buying preferences with a loyalty card, BBSRC-funded researchers at the University of Dundee are using business intelligence software to discover patterns and trends in proteomics data which were not visible previously.
It's a novel solution to a common problem, and one that has allowed the team to overcome a major bottleneck in proteomics research: how to compare and integrate data from many independent experiments. Now, with a suite of customised software under their belt, the team is developing a super experiment for functional proteomics analysis, which in the long term could help to deliver on the human genome project's promise of tailor-made medicines.
A few years ago it was usually only possible to study one or two proteins at a time. Today researchers can generate data for thousands of proteins and their associated genes in a single experiment. While this is great in principle, the reality is that colossal amounts of untapped data are generated that are usually kept in isolation, with little or no chance of being used at a future date.
Professor Angus Lamond, Director of the Wellcome Trust Centre for Gene Regulation and Expression at the University of Dundee, recognised early on that the collection and analysis of proteomics data was limiting progress. "We needed a more sophisticated approach," he said.
"Working with colleagues in Dundee at the School of Computing, we quickly realised that business intelligence techniques could be the key to creating what I've termed, 'super experiments'."
By creating a multi-dimensional database to manage all the consistently annotated data from the many hundreds of proteomics experiments in Lamond's lab, he realised that it would also be possible to integrate data from multiple experiments and extract information that simply isn't available when individual experiments are considered in isolation.
But instead of using business intelligence software to track consumer spending patterns in order to maximise profits (for example, purchases of own-brand versus high-value products, broken down by volume, season or location), Lamond and his team would track multiple variables such as cell type, whether proteins are switched on or off, the location of proteins within a cell, and post-translational modifications.
"It's the same concept, just the names are different," he says.
According to Lamond, such an approach could open the door to large-scale, functional proteomics experiments to find out exactly what multiple proteins are doing in cells and the relationships between them. Because most medicines affect proteins rather than DNA, there is the potential to build on the findings of the human genome project and use the integrated information from proteomics experiments for the development of safer, more tailor-made drugs.
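The business intelligence idea of tracking several variables across many experiments can be illustrated with a small roll-up. What follows is only a minimal sketch and not PepTracker's actual code: the field names, the function `roll_up` and every value in `MEASUREMENTS` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical measurements pooled from several experiments.
# Every field name and value below is invented for illustration.
MEASUREMENTS = [
    {"experiment": "exp1", "protein": "P1", "cell_type": "cell_A",
     "location": "nucleus", "abundance": 2.0},
    {"experiment": "exp2", "protein": "P1", "cell_type": "cell_A",
     "location": "nucleus", "abundance": 4.0},
    {"experiment": "exp2", "protein": "P1", "cell_type": "cell_A",
     "location": "cytoplasm", "abundance": 1.0},
    {"experiment": "exp3", "protein": "P1", "cell_type": "cell_B",
     "location": "nucleus", "abundance": 6.0},
]


def roll_up(rows, dimensions, measure="abundance"):
    """Average `measure` over every combination of the chosen dimensions."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    # Same data, two different "views", much as a retailer might
    # group sales by product and store, or by season alone.
    print(roll_up(MEASUREMENTS, ["protein", "location"]))
    print(roll_up(MEASUREMENTS, ["cell_type"]))
```

Choosing a different list of dimensions gives a different view of the same pooled data, which is the sense in which the concept carries over from retail to proteomics.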
The first step was to create a customised suite of software called PepTracker. The development of PepTracker was supported in part through the Radical Solutions for Researching the Proteome (RASOR) programme, funded by BBSRC, the Engineering and Physical Sciences Research Council and the Scottish Funding Council, as well as by a BBSRC-funded PhD studentship and additional support from the Wellcome Trust.
BBSRC-funded PhD student Yasmeen Ahmad, a computer science graduate who built PepTracker, explains how the system works: "Users start by designing and performing a proteomics experiment in the laboratory. As well as the data output from the mass spectrometer, we also collect and record a great deal of metadata about each experiment. Among other things, this includes information about the specific mass spectrometer that was used, the cell line, genotype, extract analysed etc. as well as the time, date and the researcher. The measured data and associated metadata are entered into PepTracker and then stored on a dedicated database server - the data warehouse."
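To make the workflow Ahmad describes more concrete, here is a minimal sketch of recording experiment metadata in a database. It is an assumption-laden illustration rather than PepTracker's real schema: the `ExperimentMetadata` class, the table layout, the helper functions and the use of an in-memory SQLite database are all hypothetical stand-ins for the dedicated data warehouse.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class ExperimentMetadata:
    """Hypothetical record of the metadata fields named in the article."""
    instrument: str   # the specific mass spectrometer used
    cell_line: str
    genotype: str
    extract: str      # the extract analysed
    run_at: str       # date and time, as an ISO 8601 string
    researcher: str


def open_warehouse():
    """Create an in-memory stand-in for the data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiments ("
        "id INTEGER PRIMARY KEY, instrument TEXT, cell_line TEXT, "
        "genotype TEXT, extract TEXT, run_at TEXT, researcher TEXT)"
    )
    return conn


def store(conn, meta):
    """Insert one metadata record and return its generated id."""
    cur = conn.execute(
        "INSERT INTO experiments "
        "(instrument, cell_line, genotype, extract, run_at, researcher) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        astuple(meta),
    )
    conn.commit()
    return cur.lastrowid


def find_by_cell_line(conn, cell_line):
    """Return the ids of all experiments recorded for one cell line."""
    rows = conn.execute(
        "SELECT id FROM experiments WHERE cell_line = ? ORDER BY id",
        (cell_line,),
    )
    return [row[0] for row in rows]
```

Because every experiment is annotated with the same fields, a later query such as `find_by_cell_line(conn, "cell_A")` can pull together runs made by different researchers at different times.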
PepTracker provides researchers with a set of very powerful, bespoke tools for analysis of proteomics data, based in part on Microsoft® business intelligence software.
"We couldn't use off-the-shelf software, and that meant assembling a team of people who understand both worlds - the experimental design, the instrumentation, and the informatics challenge," says Lamond. "When Yasmeen started work in my lab, she didn't know what a protein was. But I think she was fascinated by the opportunity and has learned quickly the background biology."
"It was a steep learning curve, I was and still am constantly learning," says Ahmad. "I was excited and intrigued to see where it could go.
"The team comprises a diverse group of people, from lots of different countries and scientific backgrounds, which makes things interesting, richer. We all have different experiences, some have a life science PhD, other PhDs in protein chemistry, mass spectrometry or computing science, so our skill set is very diverse and I know I can call on anyone if I have questions."
Although PepTracker was initially conceived as a focused software solution for experiments in Lamond's laboratory, the team reached a turning point when they realised that they could do so much more beyond that one project.
"These tools will provide insightful analysis through interactions with individual datasets, as well as allow for comparisons of data produced by different researchers, using both similar and different experimental methods," explains Lamond. "They will thereby help to promote new collaborations and to cross-fertilise projects."
The software is already used by other researchers in Dundee. And, having proved the principle behind his super experiment approach, Lamond is now seeking to lead a major expansion of the proteomics facility to provide the scale of resources needed to move the project forward.
In addition, the team are continuing to develop novel software, working with both academic and commercial collaborators to enhance the use of very fast, parallel computing solutions and business intelligence. Their aim is to continue to innovate, making software tailored to the specific needs of the new types of experiments and building better, faster tools to analyse proteomics data.
"This project has grown lots of arms and legs," says Ahmad. "There are so many exciting branches for this work over the next few years, and I'm so pleased to be a part of it."
Systematic analysis of protein pools, isoforms and modifications affecting turnover and subcellular localisation. Molecular and Cellular Proteomics. DOI: 10.1074/mcp.M111.013680