www3.unine.ch

Www3.unine.ch

SEMANTIC INDEXING FOR COMPLEX PATIENT GROUPING
Kilian STOFFEL†‡ Joel SALTZ†‡, Jim HENDLER†, Jim DICK‡, William MERZ‡ and we are working to provide medical specialists with In this paper we describe indexing techniques based the ability to express more complex queries without on domain knowledge made available in the form of becoming experts on the underlying data model. In ontologies. In high level interfaces like those used in this paper we show how we can support this efficient indexing, facilitate complex data access, and support advantageous to use terminology familiar to the end- high-level querying for users untrained in the details user of the system. This terminology is very often different from the one incorporated in the underlyingdatabase. Much of the terminology (i.e. concepts) in an MOTIVATING EXAMPLES
end-user ontology will map to complex collections of As stated above, our examples are motivated by a data warehouse attribute value pairs. We create a group of applications that involve the analysis of mapping between the end-user terminology and information in a medical data warehouse we are attribute-value pairs in the data warehouse. We constructing at Johns Hopkins Hospital. The data optimize performance by indexing the data warehouse warehouse includes in-patient and outpatient data using an ontology. We optimize performance by from the microbiology, blood bank, clinical chemistry indexing the whole database using an ontology that and hematology laboratories, along with pharmacy represents the end-user’s terminology. data and some clinical data. The microbiology data isobtained through a large and varied group of INTRODUCTION
identification techniques, which are generally carried A joint effort between computer scientists at the out in a stepwise manner. Protocols are then followed University of Maryland and clinicians at Johns to produce a more precise identification.
Hopkins Hospital focuses on using high performance The scope of this paper precludes a detailed computing technology in support of medical discussion of microbiological taxonomy 1. A bacterial applications. One focus is on providing tools that support medical data warehousing. One particular 1. structural attributes of size, shape, Gram stain project focuses on providing database support to microbiologists and infectious disease specialists.
The taxonomy used to describe microbiological data is complex; this terminology also changes with time. A problem we have encountered is the difficulty 4. biochemical information on cell composition and clinicians and researchers face in formulating queries that capture the kinds of questions they wish to pose.
As a means for overcoming this sort of difficulty, we are looking at how to efficiently integrate Microbiological identification requires a process ‘‘semantic’ knowledge, stored in the form of thesauri of iterative refinement; there is not a single battery of (or more technically, ontologies) in ways that support tests that can be applied to all specimens to obtain a efficient indexing of large databases. In particular, The set of protocols required to precisely identify 80% of the specimens belonging to any sub-category an organism can be represented as a directed acyclic graph. Most organisms are not precisely identified as Note that we need to define what is meant by an the iterative process of organism identification effective antibiotic. In the context of a clinical continues only as long as clinically useful information laboratory, MIC values are used to evaluate antibiotic is obtained. A highly imprecise identification is effectiveness. No single MIC threshold defines permissible as long as no clinical decision hinges on a effectiveness for all antibiotics; each antibiotic has its more complete identification. Identical organisms may be identified to different degrees of precision as Our tools support queries that characterize the clinical requirements may vary by anatomic site or effectiveness of antibiotics with respect to many possible categories of organisms. The two categorical terms (antibiotics and Enterobacteriaceae) allow us to microorganisms and the non-uniform degree of discuss two types of categories. The first term microorganism identification create difficulties for "Antibiotics" is replaced by a list of all antibiotics those who wish to pose queries to a clinical data warehouse. We will present some realistic examples "Enterobacteriaceae” is replaced by all its sub- of queries that are of clinical interest at Johns categories. Sub-categories of Enterobacteriaceae are Hopkins Hospital. The clinical context associated not disjoint. For instance, Citrobacter constitutes one with these examples is an ongoing effort to subcategory and Citrobacter diversus and Citrobacter characterize microorganism antibiotic susceptibility.
freundii constitute two additional sub-categories.
This issue is of continuing concern because of thecontinuing Which groups of patients have been infected by microorganisms. In the examples below, note that the organisms that have been proven to be increasingly minimal inhibitory concentration (MIC) of an antibiotic is the lowest concentration of antibiotic that For a given group, we want to identify classes of organisms that exhibit an increase in resistance, overtime, to specific antibiotics. This scenario assumes Produce a histogram of the minimal inhibitory that we have an ontology that specifies a set of groups concentrations to the antibiotic Ciprofloxacin for all to which each patient belongs. Examples of such specimens that grew out gram negative non-lactose groups could include a neighborhood, school and workplace (in an outpatient setting) and might More precisely, the task is to: (1) report the include hospital unit and service in an in-patient Ciprofloxacin minimal inhibitory concentrations of setting. Other groupings based on demographic data all specimens that grew out organisms that belong to (e.g. age) would also be possible. Furthermore each category C (sub-concept) of gram negative non- groups would also include potentially relevant lactose fermenting bacteria, and (2) produce a diagnostic categories or current treatment with a histogram for each category C grouped by MIC value.
therapy that involves immunosuppression.
Note that in this query, the user doesn’t have to know which bacteria are supposed to ferment lactose.
SEMANTIC INDEXING
Furthermore, as long as the ontology defines a species In this section we describe a sophisticated as being one that ferments lactose, it does not matter indexing schema which allows us to support the kinds whether or not the protocol used to identify the of queries described here in an efficient way. Our bacteria actually involved evaluating whether or not a indexing scheme is based on ontologies: taxonomic information with additional links that representassociated properties.
Which antibiotics are effective against 80% of organisms recovered from cultures that grew out any Ontologies
employed to represent ontologies are DAGs (directedacyclic graphs). A node in the DAG is called a concept and represents a specific object or action.
antibiotics are effective against 80% of specimens The directed links pointing from one concept to belonging to the concept "Enterobacteriaceae" or another define the concept/sub-concept relationships.
Microorganism
Gram negative
Bacteria
Enterobacteriaceae
Escherichia coli
Hafnia alvei
Microorganism
Figure 1: Relation between an Ontology and a data base.
to concepts in the ontology. This mapping requires knowledge (WordNet) 2 ,other ontologies specifically the use of a data dictionary to translate database target medicine (e.g. SnoMed 3 and UMLS). 4 We attribute value pairs to ontology concepts. An define a mapping between the attributes used in the example is given in Figure 1. The solid links are data warehouse and the terminology represented in an ontological links and the dashed links are indexing ontology. Users employ terms defined in an ontology when generating data warehouse queries. The use of Thus, for example, we have indexing links from ontologies to help users select terms needed in their the ontology concepts Escherichia coli and Hafnia
queries has been described by several researchers, see alvei to tuples 1 and 3. We also have an indexing link
from the ontology concept Enterobacteriaceae to
Ontologies can also provide a way to group tuple 2. Tuples 1 and 3 each record a specific records of a database in a semantically meaningful microorganism genus and species. Tuple 2 records a way. This type of semantic grouping can be used to microorganism that is identified only as a member of optimize query performance. We anticipate that the family Enterobacteriaceae. Note that Escherichia experts will frequently access data using groupings coli and Hafnia alvei are also members of the family defined in an ontology. The ontology can be used to Enterobacteriaceae, but we only index the concept create indices which allow us to retrieve data grouped that maps directly to an attribute value pair.
by ontological concepts. Just as b-trees permit theretrieval of a range of data, we are able to retrieve aset of records which are semantically associated with Data Structures and Algorithms
a concept in an ontology. These optimizations make Data structures: Each concept consists of three
it practical to develop a tool for formulating complex components: (1) The concept ID, (2) a list of sub- queries such as the queries depicted in Section 2.
concepts, and (3) a list of pointers to database tuples.
The sub-concept list associated with concept C is a USING ONTOLOGIES FOR INDEXING
list of pointers into the ontology file that identifies allsub-concepts of C. The list of pointers to database Indexing
tuples identify instances of concept C. These basicdata structures are used by the following algorithms: In order to use the ontologies for indexing we have to establish links between the data in the data Instances: This operation retrieves all the
warehouse and the concepts in the ontology. All pairs direct instances of a concept.
of attributes and values in the database† are mapped universal relation. This is not a necessary condition, † Throughout this paper we will make the but simplifies the discussion of the basic ideas.
simplifying assumption that data is stored in a 1: A := TRANSITIVE_SUBCONCEPT(Enterobacteriaceae)2: B := TRANSITIVE_SUBCONCEPT(ANTIBIOTIC)3: C := {susceptible, very susceptible }4: for each c in C D := D union TRANSITIVE_INSTANCE(c)5: for each a in A6: for each b in B E[a,b] := TRANSITIVE_INSTANCE(b)∩ TRANSITIVE_INSTANCE(a) ∩D report (a,b) if (count(E[a,b])/count(TRANSITIVE_INSTANCE(a)))>0.8 Figure 2: Pseudocode for Example 2.
Sub-Concept: This operation retrieves the sub-
Enterobacteriaceae sub-concept a proved to be Transitive Sub-Concept: This operation returns
susceptible or very susceptible to antibiotic b.
all sub-categories reachable from a concept by In our database, we maintained results for 27 following all possible directed links.
antibiotics and we store 65 sub-concepts of Transitive Instance: Transitive instance is a
Enterobacteriaceae. Thus Example 2 allows us to combination of the transitive sub-concept operation evaluate antibiotic susceptibilities for 1755 different and the instance operation. In the first step, all combinations of organism and antibiotic. This query reachable sub-concepts are collected. In the next step, required roughly 40 minutes on a 150 MHz Pentium all instances of these concepts are gathered.
PC using a database with 20521 records oforganisms. The scope of this paper precludes a DISCUSSION OF THE EXAMPLES
detailed discussion of results obtained. We found, forinstance, that over 80% of our Enterobacteriaceae In this section we present a more detailed were susceptible or very susceptible to amikacin, description of Example 2. Space limitations prevent ciprofloxacin, ceftazidime, cefuroxime, gentamicin, us from presenting detailed discussions of the other piperacillin, trimethoprim/sulfa, ticarcillin, and tobramycin. For all species of Citrobacter takentogether, only amikacin, ciprofloxacin, gentamicin, Example 2
and tobramycin were effective against 80% of Which antibiotics are effective against 80% of specimens. However, for Citrobacter diversus three organisms recovered from cultures that grew out any additional antibiotics were effective in 80% of Figure 2 depicts the steps that must be carried out In order to execute this query without using the to implement Example 2. In Step 1 we calculate all indexing scheme, most of the 1755 different transitive subconcepts of “Enterobacteriaceae” and combinations of organism and antibiotic would have assign this to Set A. Set A will then contain all to be accounted for through large disjunctive queries.
microorganism names that are subconcepts of“Enterobacteriaceae”. In Step 2 we assign to set B the RELATED WORK
transitive sub-concepts of the concept “Antibiotic”.
In Step 3 we create a simple set with the two concepts The integration of ontologies into data base “susceptible” and “very susceptible”. The union of all systems is of growing interest. Due to the space the transitive instances of “susceptible” and “very restrictions we are not able to provide here an susceptible” is then assigned to set D in Step 4. Set D overview of all the different approaches. Ullman thus contains all patient records where susceptible presented a selection of these ideas in 6 .
bacteria were found. We then intersect the set D with: There are a wide variety of projects that address (1) all transitive instances of each sub-concept of issues associated with making it easier for users to “Antibiotic” and (2) all transitive instances of all formulate medical database queries (e.g. . 5, 7, 8). Our sub-concepts of “Enterobacteriaceae”. This yields a focus is somewhat different as we provide a powerful collection of sets E(a,b). Each set E(a,b) contains (but not simple) user interface for automatic generation of very large sets of related queries.
Enterobacteriaceae sub-concept a and each antibiotic sub-concept b. Finally, in step 6, we report (a,b) only presented in this paper can be found in some of the more powerful search engines that target text data Journal of the American Medical Informatics bases; these search engines are mainly used in WEB Association, Volume 3, Number4, Jul/Aug 1996.
browsers. 9 10 In our approach, we apply ontology Jeffry D. Ullman. The database approach
indexing schemes to relational databases rather than to knowledge representation. Proceedings of the
to text databases (although in future work we will extend our work to a combination of relational and Association for Artificial Intelligence. AAAI Press, text databases). Another important difference is that we support transitive closure operations on concepts C. Safran, D. Porter, J. Lightfoot et al.
and instances. This functionality increases flexibility CliQuery: a system for online searching of data in
and makes it possible to carry out sets of closely a teaching hospital. Ann Intern Med. 1989;111;751-
related queries in an optimized fashion.
Complex indexing schemes are also known in the B. Leao, E. Reategui. HYCONES: a
deductive data base community. A good overview of Hybrid Connectionist Expert System. SCAMC
these systems is given in 11. These systems offer more functionality than the system we propose but do altavista-web.
not scale well with increasing database and ontology http://altavista.digital.com/av/lt/help.html, Altavista.
[10] medline-web
CONCLUSION AND FUTURE WORK
[11] Jose Alberto Fernandez, Jarek Gryz, and
Jack Minker, Disjunctive deductive databases:
We presented an efficient semantic indexing Semantics,
architecture,
scheme for complex grouping operations. We showed Proceedings of the 4th Bar -Ilan Symposium on this scheme could be used for integrating ontological Foundations of AI, pages 256--274. AAAI Press, and relational data. A prototype of the system is currently being tested in the Johns Hopkins Hospital, [12] J. Saltz, G. Agrawal, C. Chang, R. Das, G.
the prototype is being used to track the spread of Edjlali, P. Havlak, Y.S. Hwang, B. Moon, R.
antibiotic resistant bacteria, evaluate patterns of Ponnusamy, S. Sharma, A. Sussman and M. Uysal,
antibiotic use, and to screen for nosocomial Programming Irregular Applications: Runtime
infections. In the near future, we plan to parallelize support, Compilation and Tools, In Advances in
the prototype to allow it to function in an interactive Computers,Academic Press, Vol 45, 1997.
manner. In past work, we have parallelized many ofthe computational components employed in thisprototype12 and we anticipate that parallelizing ourprototype should be relatively straighforward. We arealso currently exploring the use of these indexingtechniques in other domains in which complex dataretrieval is used and where ontologies can begenerated.
REFERENCES
I.Bergey.
“Systematic
Bacteriology”.
George A. Miller. “Human language
Department, Green Hall, Princeton University, 1996.
Roger Code. “Systemized Nomenclature
of Human and Veterinary Medicine: SNOMED”.
College
Veterinary Medical Association, 1993.
[4] UMLS.
“Unified
Language
System”. National Library of Medicine, 1994.
George Hripcsak, Barry Allen Janes J.
Cimino and Rober Lee. Access to Data:
Comparing AccessMed with Query by Review.

Source: http://www3.unine.ch/files/content/sites/imi/files/shared/documents/papers/Semantic_indexing.pdf

b-wegen.ch

Erfolgreich und sicher ans Ziel Rauchstopp bei kardiovaskulären Patienten ORLANDO – Spätestens nach dem ersten Ereignis muss der Rauchstopp zum erklärten Ziel erhoben werden. Medikamente . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . zur Rauche

mnap.tn

CURRICULUM VITAE IDENTIFICATION : ________________________________________________________ Nom : MAHFOUDHI Prénom : Sana Etat civile : Célibataire Adresse : 44 rue el ferdaws ,el bhira Bizerte 7000 Téléphone :(+216)22 451 944 E-mail : beauzart-girl@live.fr LANGUES :________________________________________________________________ Rédaction Français