A short correspondence featured in the October issue of Nature Methods has highlighted the benefits of a community approach to gathering data that can help improve our understanding of gene function.
The article highlights BBSRC-funded research that uses publicly available data as a powerful, cost-effective approach to producing comprehensive parts lists of functional elements of genomes.
The research, conducted by Professor Bertie Göttgens at the University of Cambridge, created HAEMCODE, a repository for transcription factor binding maps in mouse blood cells.
Transcription factors are proteins that bind to specific DNA sequences, thereby controlling the flow of genetic information from DNA to messenger RNA. Mapping these transcription factors helps to build annotated genomes that relate structure to function.
The researchers gathered these data using a community approach. Professor Göttgens explains: "Instead of making advanced decisions about the datasets as large scale projects do, we have compiled datasets produced by the community, as soon as they are deposited in public databases."
"We manually curated more than 300 studies from a wide range of mouse cell line models to create a compendium that covered 84 transcription factors. Currently available data from large consortium projects covers less than half of this."
The HAEMCODE repository demonstrates that community annotation and real-time curation of datasets generated by individual research labs across the world are not only a powerful but also a very cost-effective approach to expanding the datasets available to researchers.
The researchers also developed a web interface for HAEMCODE that provides data access as well as a range of online analysis tools, designed to be useful to both experimental and computational biologists.
The project was funded by a BBSRC Strategic Longer and Larger Award (sLoLa).
Notes to editors
Nature Methods, October 2013: Building an ENCODE-style data compendium on a shoestring.