From University of Pennsylvania
Spurred by the worldwide fight against malaria, scientists release parasite genome database
PHILADELPHIA -- An international team of scientists today unveiled an Internet-based database allowing genomic analysis of Plasmodium falciparum, the parasite responsible for the vast majority of malaria deaths worldwide. Developed as a collaboration between two research teams at the University of Pennsylvania, the Plasmodium genome database breaks new ground in bioinformatics by permitting detailed analysis of a genome even before its sequencing is complete.
"The release of this database is, in some respects, more important for malaria researchers than the sequencing of the human genome has been for researchers who study human disease," said David S. Roos, the Penn biology professor who has spearheaded the Plasmodium database project. "The difficulty of working with these parasites has frustrated research on new drugs and vaccines to combat malaria. The Plasmodium genome sequencing project greatly expands available information on the malaria parasite, and the PlasmoDB database provides researchers all over the world with the means to access that information."
The Plasmodium database, available as of today at http://PlasmoDB.org and on compact disc, builds upon sequencing efforts at the Institute for Genomic Research in Rockville, Md., the Naval Medical Research Center, Stanford University and Britain's Sanger Centre. That sequencing, conducted by dozens of researchers, is now at essentially the same stage of completion as the Human Genome Project.
Once thought to be on the wane, malaria has returned to the forefront of human disease through a combination of factors. More than 1 in 20 people worldwide contracted malaria last year, and the disease kills more than a million people each year, mostly children in Africa.
Malarial drug resistance has become a concern for public health officials in recent years; the ready availability of the Plasmodium genome sequence should speed the search for new drugs and vaccines to combat malaria. The database is primarily web-based, but a compact disc version is also available for researchers without reliable Internet access, including scientists working in the field in endemic countries.
The battle against malaria, ranked by the World Health Organization among the most pressing public health concerns worldwide, has driven the genome project -- an extensive collaboration between academic researchers, government scientists, nonprofit corporations, the Department of Defense, international health organizations, private foundations and public funding agencies. Plasmodium falciparum is by far the most complex pathogen genome sequenced to date, requiring extraordinary effort and ingenuity on the part of the sequencing centers.
The sequencing of Plasmodium's roughly 30 million nucleotides -- 1 percent as many as in the human genome -- began in 1996. The genome has been sequenced several times over in a series of random pieces, which have been fitted together to recreate much of the genome. Several of the organism's 14 chromosomes have been completely sequenced and annotated, and scientists are now filling in the few remaining gaps and annotating the entire genome sequence. The sequencing effort has already identified virtually all of the parasite's genes.
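For readers curious how random pieces can be fitted back together, the idea can be illustrated with a toy sketch. This is not the method used by the sequencing centers, whose assembly software is far more sophisticated. It is a minimal greedy merge of overlapping fragments, and the function names, the `min_overlap` parameter and the short sequences are all invented for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap.

    Stops when no remaining pair overlaps by at least `min_overlap`
    (a hypothetical threshold chosen for this toy example).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k < min_overlap:
            return contigs
        # Join the pair, writing the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Three invented fragments that overlap end to end.
fragments = ["ATGGCGT", "GCGTACC", "TACCTTA"]
print(greedy_assemble(fragments))
```

Sequencing the genome several times over matters here: the more often each region is covered by random pieces, the more likely it is that neighboring pieces overlap enough to be joined, which is why gaps remain where coverage is thin.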
"Like the sequencing of the human genome, that of Plasmodium falciparum has generated huge amounts of data," said Roos, also the director of Penn's Genome Institute. "While it may take years before all the I's are dotted and T's are crossed, it is important to provide researchers with access to the raw data as soon as possible and to equip them with tools to transform this data into a useful form."
The database created by the Penn team is a milestone for bioinformatics, which seeks to bring order to the flood of data generated by various genome projects. Scientists can use built-in data-mining tools to examine chromosome organization, to scan the genome for probable genes, to predict the structure of these genes and function of the proteins they encode, to look for patterns of nucleotides or amino acids and to search for gene functions analogous to those found in other organisms. The database will be updated constantly as Plasmodium sequencing progresses.
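As a rough illustration of what looking for a pattern of nucleotides involves, the sketch below scans a sequence for a short motif. It does not reproduce the PlasmoDB tools, whose implementation is not described here. The function name `find_motif`, the subset of standard IUPAC ambiguity codes it supports and the example sequences are assumptions made for illustration.

```python
import re

# A subset of the standard IUPAC nucleotide codes, mapped to regex fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "W": "[AT]",    # weak pairing
    "S": "[CG]",    # strong pairing
    "N": "[ACGT]",  # any base
}


def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start positions of every match of `motif`.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = "".join(IUPAC[ch] for ch in motif.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


# Invented example sequences, not real Plasmodium data.
print(find_motif("ATATATGCAT", "ATAT"))
print(find_motif("TTGACATTAACA", "TTRACA"))
```

A search of this kind only locates candidate sites; judging whether a match is biologically meaningful still requires the other analyses the database offers, such as gene prediction and comparison with other organisms.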
By utilizing a database architecture that is not tied to peculiarities of the malaria parasite itself, this project also points the way to development of similar databases for other organisms, permitting direct comparison of one organism's genome with another.
Four species cause malaria in humans, but Plasmodium falciparum accounts for more than 90 percent of fatalities. The PlasmoDB database also includes available data on other species of Plasmodium parasites.
The Plasmodium database project is a multifaceted effort, building on pioneering work in genome database development carried out by the Computational Biology and Informatics Laboratory at Penn under the direction of Christian J. Stoeckert and the late G. Christian Overton. Other key members of this team include Brian Brunk, Jonathan Crabtree and Jonathan Schug.
New data-mining tools for the Plasmodium project were developed by Martin Fraunholz and Jessica Kissinger, and the stand-alone CD version is the brainchild of Jules Milgram, all of whom are in Penn's Department of Biology.
Additional contributions to this international collaboration come from Ross Coppel and Robert Huestis of Monash University in Australia, Dinesh Gupta of the International Centre for Genetic Engineering and Biotechnology in India and Daniel Lawson of the Sanger Centre.
Financial support for the Plasmodium database comes from the Burroughs Wellcome Fund and the World Health Organization.