New Algorithm Finds Stories in Biomedical Literature

Algorithm joins related publications in a chain from start to finish

A good story ties up all the loose ends. A new data-mining tool takes a stab at doing the same. Dubbed storytelling, the algorithm may make it easier to unearth unexpected connections in the avalanche of freshly published research or among high-throughput datasets. For example, storytelling can sift through tens of thousands of PubMed abstracts to discover scientific links between two apparently unrelated topics, or draw connections across a knowledge structure such as the Gene Ontology.


As shown here, one might use storytelling to understand the pathways into and out of a quiescent state. Datasets on desiccation-tolerant cyanobacteria (left: coccoid cells stained red and encased in an extracellular matrix, the exterior of which is stained green) could potentially be linked to studies of the metabolic arrest and recovery of primary human fibroblasts (right: Live/Dead stain; image taken 72 hours after a two-week metabolic arrest). Storytelling lets the biologist link disparate datasets and develop new hypotheses that can be tested at the bench and re-evaluated within the algorithm, ultimately yielding new insights into the process of interest. Courtesy of Richard Helm.

“What we are trying to do is link data sets very far apart,” says Richard Helm, PhD, associate professor of biochemistry at Virginia Polytechnic Institute. “In the end, we link data set A with data set Z in the form of a story.” The work was presented at the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006) in August 2006.


Researchers might not have time to find the complex relationships waiting to be discovered in the literature, but storytelling does. It finds key documents that bridge one research publication (the starting point) and another (the end point). Using System X, a Virginia Tech supercomputer, the algorithm first classifies each PubMed abstract into an organized, branched set of terms. It can then make thousands of comparisons and join related publications into a chain connecting start to finish.
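The published storytelling algorithm is more sophisticated than this, but the basic idea of chaining publications can be illustrated with a toy sketch. It assumes each abstract has already been reduced to a plain set of terms and treats two abstracts as linked when their term sets overlap enough (Jaccard similarity at or above a threshold). A breadth-first search then returns the shortest chain of abstracts from the starting paper to the end paper. The function names, the 0.3 threshold, and the term sets are hypothetical and are not taken from the storytelling work.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of terms two abstracts share (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_story(abstracts: Dict[str, Set[str]], start: str, end: str,
               threshold: float = 0.3) -> Optional[List[str]]:
    """Hypothetical helper: shortest chain of abstracts from start to end,
    where each consecutive pair must have Jaccard similarity >= threshold.
    Returns None if no such chain exists."""
    if start not in abstracts or end not in abstracts:
        return None
    previous: Dict[str, Optional[str]] = {start: None}  # back-pointers for the path
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == end:
            # Walk the back-pointers from end to start, then reverse.
            chain = []
            node: Optional[str] = current
            while node is not None:
                chain.append(node)
                node = previous[node]
            return chain[::-1]
        for other, terms in abstracts.items():
            if other not in previous and jaccard(abstracts[current], terms) >= threshold:
                previous[other] = current
                queue.append(other)
    return None


# Toy corpus: hypothetical abstract IDs mapped to hypothetical term sets.
corpus = {
    "tomato_genes_in_yeast": {"tomato", "gene", "yeast", "expression"},
    "yeast_metal_proteins": {"yeast", "expression", "metal", "protein"},
    "cadmium_induced_protein": {"yeast", "metal", "cadmium", "protein", "stress"},
    "chemical_stress_yeast": {"yeast", "chemical", "stress", "cadmium", "response"},
}

if __name__ == "__main__":
    print(find_story(corpus, "tomato_genes_in_yeast", "chemical_stress_yeast"))
    # ['tomato_genes_in_yeast', 'yeast_metal_proteins', 'cadmium_induced_protein', 'chemical_stress_yeast']
```

In this toy corpus the start and end abstracts share only the term "yeast", so they are never linked directly; the chain has to pass through intermediate abstracts, much as the article describes. Raising the threshold makes links stricter and can leave no story at all.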


For example, Helm and his colleagues used storytelling to dig through the literature for ties between two remotely related papers: one on tomato genes expressed in yeast and another on how chemical stress affects yeast gene expression. The supercomputer boiled down 140,000 yeast publications to nine abstracts, stepping stones from paper one to paper two. The results included a paper that identified a novel protein, expressed only when yeast cells are exposed to cadmium, which researchers might not have immediately connected with the first two papers. Although that paper might have surfaced in an ordinary PubMed search, finding it would have required much sifting. Not every search will yield treasures, but Helm hopes that most storytelling results will provide new insights and hypotheses that researchers can test at the bench.


Bud Mishra, PhD, professor of computer science and cell biology at New York University, thinks storytelling can help biologists make new connections. “In some sense it closely resembles what biologists do, and it works in the same way that biologists think,” says Mishra.


