Exploration Using Signatures in Annotation Graph Datasets

The widespread development and adoption of ontologies to capture semantic domain knowledge and the growth of annotation graph datasets have created many opportunities for [...]

Ontologies are developed by domain experts to capture knowledge specific to some domain. The biomedical community has taken the lead in these activities. Every model organism database has genes and proteins that are widely annotated, e.g., with controlled vocabulary (CV) terms from the Gene Ontology (GO). The NCI Thesaurus (NCIt) version 12.05d has 93,788 terms, and the LinkedCT dataset of clinical trial results circa September 2011 includes 142,207 drugs or interventions, 167,012 conditions or diseases, and 166,890 links to DBPedia, DrugBank and Diseasome. At the opposite end of the domain spectrum, the Financial Industry Business Ontology (FIBO) captures knowledge about the structure, properties and behavior of financial contracts.

LinkedData and the Linked Open Data cloud (LOD cloud) have also made available many annotation graph datasets where scientific concepts such as genes, drugs and diseases are marked up (annotated) with controlled vocabulary terms (CV terms) from ontologies. The challenge is to explore these rich and complex annotated datasets, together with the domain semantics captured within ontologies, to discover patterns of annotations across multiple concepts that may lead to potential discoveries. For genes, these patterns may involve cross-genome functional annotation, e.g., combining the GO functional annotations of two model organisms such as Arabidopsis thaliana (a plant) and C. elegans (a nematode or worm), to predict new gene functions or interactions.

Drug target prediction, with a goal of finding new targets for existing drugs, has received widespread media attention and has resulted in some notable successes, e.g., Viagra. Beyond drug target prediction, there are many applications where one may need to provide a comprehensive report of all known evidence about a pair or family of drugs, e.g., to make clinical policy recommendations. The New York Times reported on November 3, 2010 that Genentech began offering secret rebates to about 300 ophthalmologists in an apparent inducement to get them to use more Ranibizumab rather than the less expensive Bevacizumab. Both drugs are monoclonal antibodies (mab) that target specific molecules or organisms. In 2008, Bevacizumab cost Medicare $20 million for about 480,000 injections, while Ranibizumab cost $537 million for 337,000 injections. Several studies have shown no superior effect of Ranibizumab over Bevacizumab for the treatment of macular degeneration.

Our research addresses the challenge of large-scale Linked Data analytics of annotation graph datasets, using semantic knowledge from ontologies. We define an Annotation Signature between a pair of scientific concepts, e.g., a pair of drugs or a pair of genes. The annotation signature builds upon the shared annotations or shared CV terms between the pair of concepts. The signature further makes use of knowledge in the ontology to determine the ontological relatedness of the shared CV terms. The annotation signature is represented by N groups (clusters) of ontologically related shared CV terms. For example, the annotation signature for a (drug, drug) pair will be a set of N clusters, where each cluster includes a group of ontologically related disease terms.

We define the Annotation Signature problem of creating a many-to-many partitioning of the edges of a bipartite graph between two sets of annotations (Palma et al. 2013a; 2013b). We then summarize the challenges in exploiting domain-specific semantic knowledge, including the ontology structure and relationship types between concepts. We show how we can tune the (ontologically related) similarity score between node pairs, and produce clusters of more closely related terms that are more useful to the domain scientist.

This research was partially supported by NSF award DBI1147144. We thank our collaborators: Eric Haag and Heven Sze, University of Maryland; Gilberto Fragoso and Sherri De Coronado, National Cancer Institute; Guoqian [...]

Palma, G.; Vidal, M.-E.; Raschid, L.; and Thor, A. 2013a. Annsigclustering: A semantic-driven clustering technique for annotated linked data. In Proceedings of the LIS Work[...]

Palma, G.; Vidal, M.-E.; Haag, E.; Raschid, L.; and Thor, A. 2013b. Measuring relatedness between scientific entities in annotation datasets. In Proceedings of the ACM BCB [...]

Copyright © 2013, Association for the Advancement of Artificial Intelligence. All rights reserved.
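
As an illustration of the annotation signature described in the text (shared CV terms of a concept pair, grouped into clusters of ontologically related terms), the following is a minimal sketch under stated assumptions, not the authors' method. The toy ontology `PARENT`, the annotations of `drug_A` and `drug_B`, the depth-based similarity, and the threshold-based single-linkage grouping are all hypothetical choices made for the example. The text does not specify the similarity measure, and the cited work frames the problem as a many-to-many partitioning of bipartite graph edges, which this sketch simplifies.

```python
from itertools import combinations

# Hypothetical toy ontology: child term -> parent term (single-rooted tree).
PARENT = {
    "neoplasm": "disease",
    "eye disease": "disease",
    "carcinoma": "neoplasm",
    "colorectal cancer": "carcinoma",
    "lung cancer": "carcinoma",
    "retinal disease": "eye disease",
    "macular degeneration": "retinal disease",
    "macular edema": "retinal disease",
}

# Hypothetical annotations: concept (drug) -> set of disease CV terms.
ANNOTATIONS = {
    "drug_A": {"colorectal cancer", "lung cancer",
               "macular degeneration", "macular edema", "eye disease"},
    "drug_B": {"colorectal cancer", "lung cancer",
               "macular degeneration", "macular edema", "neoplasm"},
}


def path_to_root(term):
    """Return [term, parent, ..., root] by following PARENT links."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(term):
    """Depth of a term, counting the root as depth 1."""
    return len(path_to_root(term))


def similarity(a, b):
    """Assumed taxonomic similarity: 2*depth(lca) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # First ancestor of a (starting at a itself) that is also an ancestor of b.
    lca = next(t for t in path_to_root(a) if t in ancestors_b)
    return 2 * depth(lca) / (depth(a) + depth(b))


def annotation_signature(c1, c2, threshold):
    """Group the CV terms shared by c1 and c2 into clusters of related terms.

    Two shared terms land in the same cluster when they are linked by a chain
    of pairs whose similarity is at least `threshold` (single linkage).
    """
    shared = sorted(ANNOTATIONS[c1] & ANNOTATIONS[c2])
    rep = {t: t for t in shared}  # simple union-find representative map

    def find(t):
        while rep[t] != t:
            t = rep[t]
        return t

    for a, b in combinations(shared, 2):
        if similarity(a, b) >= threshold:
            rep[find(a)] = find(b)

    clusters = {}
    for t in shared:
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


if __name__ == "__main__":
    for cluster in annotation_signature("drug_A", "drug_B", threshold=0.5):
        print(cluster)
```

In this sketch, raising `threshold` yields more and tighter clusters, while lowering it merges terms into fewer, broader clusters. This loosely mirrors the idea of tuning the similarity score between node pairs to obtain clusters of more closely related terms.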


