Genkin A.A.
Russia, St. Petersburg, Research Firm "Intelligent Systems",
e-mail: genkin@atlant.ru
TOWARD CONSTRUCTING INTELLIGENT MEDICAL SYSTEMS THAT ALGORITHMICALLY FORM THE INFORMATIONAL IMAGE OF A DISEASE
Abstract
This report describes the methodology for constructing the intelligent system OMIS and the experience of operating it. The system automatically extracts knowledge from empirical data and uses it to solve complex expert problems in various fields of medicine. The methodology rests on introducing a probabilistic measure by means of interval and binary (matrix) structures and on a consilium (panel) of decision rules.
The principal difficulties in developing medical expert systems stem not so much from the complexity of formalizing medical knowledge as from the absence of that knowledge [1]. Present-day medical science has limited capabilities for the early diagnosis of tumour diseases and an insufficient understanding of homeostasis in vital pathological processes (such as atherosclerosis, infection, inflammation, intoxication, hypoxia, and so on). Modern science is still far from understanding the space-time organization of physiological processes, and the very valuable information contained in the EEG, ECG, pneumogram, etc. is not yet used to study the fundamental mechanisms of vital activity.
In recent years there has been a tendency, when constructing expert-system knowledge bases, to use not only the knowledge of domain experts but also information extracted directly from data [2, 3].
When designing medical expert systems that automatically form the knowledge base from empirical data, most of the domain expert's work is devoted to developing computer shells for case records. The present level of medical informatics and computer technology makes it possible, through a dialog with the user, to map the data of a computer case record into informative mathematical structures that become objects of the knowledge base of an intelligent system. Below we describe a successful experience of constructing such a software complex [4].
Let a vector x = (x^{1}, x^{2}, ..., x^{n}), a collection of n numerical and qualitative attributes, be an element of some n-dimensional space. The vectors x_{1}, x_{2}, ..., x_{N} contain information on a time cross-section of the states of N patients (or on the states of one patient at N instants of time). A set of vectors {x} = {x_{1}, x_{2}, ..., x_{N}} defined by a clinical situation D (a referent condition) is called an image in the space of attributes and is denoted by {x}_{D}. According to the systems approach [5, 6], {x}_{D} = {x_{1}, x_{2}, ..., x_{N}} may be considered as a subset of the direct product Q(x^{1}) × Q(x^{2}) × ... × Q(x^{n}), where Q(x^{i}), i = 1, 2, ..., n, is the range of possible values of the attribute x^{i} compatible with the life of the organism [4]. If, instead of estimating averages and correlation coefficients (which for medical data leads to large losses of information), we use probabilistic measures on every Q(x^{i}) and on the binary relations Q(x^{i}) × Q(x^{j}), then the possibilities for describing the information on the image {x}_{D} become substantially wider than with a multidimensional normal model.
In other words, it is supposed that the basic information on {x}_{D} is contained in a subset S of the set
{Q(x^{1}), ..., Q(x^{n}), Q(x^{1}) × Q(x^{2}), ..., Q(x^{1}) × Q(x^{n}), Q(x^{2}) × Q(x^{3}), ..., Q(x^{2}) × Q(x^{n}), ..., Q(x^{n-1}) × Q(x^{n})}.
This means that we have a mapping
{x}_{D} → S   (*)
Mapping (*) makes it possible:
1) to reduce the main information on an n-dimensional manifold of arbitrary complexity to the information contained in one-dimensional and two-dimensional objects, for which a probabilistic measure can be introduced;
2) to use adequate methods of frequency analysis for the different types of clinical, laboratory and instrumental data (numbers, enumerations and so on).
Introducing a probabilistic measure in the n-dimensional space requires a significant increase in the number of attributes, since instead of one n-dimensional image we must analyze n one-dimensional and n(n-1)/2 two-dimensional sets. At the same time, this introduction itself makes it possible to estimate effectively the significance of the information contained in the elements of the set S in (*) and, consequently, to reduce the number of these elements [7, 8]. Therefore, the most important information on the clinical situation can be represented economically as a small number of frequency distributions, that is, interval and binary (matrix) structures.
Let {x}_{D1}, {x}_{D2}, ..., {x}_{Dm} be a set of images induced by clinical situations D_{1}, ..., D_{m} that reflect the objectives facing the investigator, and let [a, b] be the interval over which an attribute x varies. Consider a partition d of the interval [a, b] into several (no more than 4) disjoint intervals whose lengths are not fixed in advance. Denote by p_{s}(x/D_{k}) the frequency with which the value of the attribute x falls into the s-th interval under D_{k}. For two clinical situations D_{k} and D_{l}, we take as the best partition of [a, b] one that yields the maximal, or nearly maximal, value of the Kullback functional J(D_{k} : D_{l}, x) [7]. Such a partition d makes the best use of the differential-diagnostic power of the attribute x for the pair of referent conditions D_{k} and D_{l}. The intervals that realize the partition, together with the probabilities that a value of the attribute falls into each of them, we call an interval structure. The boundaries of the intervals forming an interval structure are determined by the goal facing the user: each time a specific problem is solved, they highlight the most valuable differential-diagnostic information.
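The search for an interval structure can be sketched as follows. This is a minimal illustration, not the OMIS implementation: it assumes that J is Kullback's symmetric divergence, J = sum over s of (p_{s}(x/D_{k}) - p_{s}(x/D_{l})) ln(p_{s}(x/D_{k}) / p_{s}(x/D_{l})), that candidate boundaries are midpoints between neighbouring observed values, and that frequencies are smoothed by a small additive constant to avoid empty intervals. The function names and the parameter eps are hypothetical.

```python
import math
from itertools import combinations


def interval_frequencies(values, cuts, eps=0.5):
    """Smoothed frequencies of values falling into the intervals defined by cuts."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        # interval index = number of cut points lying at or below v
        counts[sum(1 for c in cuts if v >= c)] += 1
    total = len(values) + eps * len(counts)
    # additive smoothing (an assumption) keeps every frequency positive
    return [(c + eps) / total for c in counts]


def kullback_j(p, q):
    """Symmetric Kullback divergence between two discrete distributions."""
    return sum((ps - qs) * math.log(ps / qs) for ps, qs in zip(p, q))


def best_partition(sample_k, sample_l, max_intervals=4):
    """Exhaustive search for the cut points that maximise J for two samples."""
    points = sorted(set(sample_k) | set(sample_l))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_j, best_cuts = 0.0, ()
    for n_cuts in range(1, max_intervals):  # at most max_intervals intervals
        for cuts in combinations(candidates, n_cuts):
            j = kullback_j(interval_frequencies(sample_k, cuts),
                           interval_frequencies(sample_l, cuts))
            if j > best_j:
                best_j, best_cuts = j, cuts
    return best_j, best_cuts
```

The exhaustive search is practical only for small samples; the text asks only for a "maximal or nearly maximal" value, so a heuristic search over boundaries would serve equally well.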
For two attributes x^{i} and x^{j}, if we know the partitions d_{i} and d_{j} for each of them, we can construct in a natural way the estimates p_{l}(x^{i}, x^{j} / D_{k}), the frequencies with which a pair of values of the attributes x^{i} and x^{j} falls into the corresponding rectangles. The set of these rectangles together with the estimates p_{l}(x^{i}, x^{j} / D_{k}) we propose to call a binary (matrix) structure, and the corresponding pair of attributes a two-dimensional attribute. Interval and binary (matrix) structures are new objects in medical informatics. They characterize effectively the variability of medical-biological attributes and bring out differential-diagnostic information in cases where other methods cannot [4]. When constructing interval and binary structures, one can select informative one-dimensional and two-dimensional attributes. On the basis of Kullback's functional and the number of observations that take part in forming the interval and binary structures, attributes can be ordered by increasing level of significance of the differences P [7, 8].
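A binary (matrix) structure for one clinical situation can be sketched as a table of rectangle frequencies. The sketch assumes that the two partitions are already known and given as sorted lists of interior boundaries; the function names are hypothetical and no smoothing is applied.

```python
def interval_index(value, cuts):
    """Index of the interval containing value, given sorted interior boundaries."""
    return sum(1 for c in cuts if value >= c)


def binary_structure(pairs, cuts_i, cuts_j):
    """Frequencies of (x_i, x_j) pairs falling into each rectangle of the grid."""
    rows, cols = len(cuts_i) + 1, len(cuts_j) + 1
    counts = [[0] * cols for _ in range(rows)]
    for xi, xj in pairs:
        counts[interval_index(xi, cuts_i)][interval_index(xj, cuts_j)] += 1
    n = len(pairs)
    # each cell estimates p_l(x_i, x_j / D_k) for one rectangle l
    return [[c / n for c in row] for row in counts]
```

For example, four observations with one boundary on the first attribute and two on the second give a 2 × 3 matrix whose cells sum to one.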
Let us select one of the clinical situations, D. For it and for a collection of attributes x^{1}, x^{2}, ..., x^{n}, the Kullback measure J(D : D_{j}, x^{k}), j = 1, 2, ..., m; k = 1, 2, ..., n, identifies a subset of interval and binary structures that best characterize the difference between D and the other clinical situations under consideration.
The set of interval and binary structures found in this way we call the informational image of the clinical situation D (the informational image of the disease or state). It can be constructed easily if D_{j} (j = 1, 2, ..., m) and the set of attributes x^{1}, x^{2}, ..., x^{n} are given. In developing a decision rule it is not necessary to use all informative attributes (i.e. all attributes entering the informational image of a disease). There always exist subsets of informatively valuable attributes (different for different recognition strategies) that guarantee good results.
The search for informatively valuable attributes can be carried out in two variants. The first and simplest is as follows.
Step 1. From the set of informative attributes, the one is chosen that gives, on the training group, the minimal sum of the probabilities of classification errors.
Step 2. Among the remaining informative attributes, the one is chosen that, together with the first, gives the minimal sum of classification errors on the same group; any attribute that, combined with the first one, did not improve on the result of the first one alone is excluded from further consideration. This process continues until the sum of classification errors ceases to decrease.
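The steps above can be sketched as a greedy search with pruning. The sketch is generic: error_fn is a hypothetical callback that returns the sum of classification-error probabilities on the training group for a given list of attributes, and reading "did not improve" as a strict comparison is an assumption.

```python
def select_attributes(candidates, error_fn):
    """Greedy forward selection that discards attributes failing to improve the result."""
    selected = []
    remaining = list(candidates)
    best_error = float("inf")
    while remaining:
        # error obtained when each remaining attribute joins the current subset
        scored = [(error_fn(selected + [a]), a) for a in remaining]
        err, best_attr = min(scored, key=lambda t: t[0])
        if err >= best_error:
            break  # the sum of classification errors has ceased to decrease
        # keep only attributes that improved on the previous result
        remaining = [a for e, a in scored if a != best_attr and e < best_error]
        selected.append(best_attr)
        best_error = err
    return selected, best_error
```

Pruning keeps the number of evaluations small at the cost of possibly missing attributes that become useful only in larger combinations.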
The second, more laborious variant is such that at each step, when the current attribute is chosen, all remaining attributes are inspected, not only those that improved the result at the previous step. This variant of the search leads to a subset that gives better results. The number of informatively valuable attributes is much smaller than the number of all informative attributes.
In the intelligent system OMIS, the objects of the knowledge base are algorithmically formed informatively valuable interval and binary structures together with statistical strategies of pattern recognition (Neyman-Pearson, Wald, Bayes, sequential Bayes). The first three strategies are well known. The last one (a multistep Bayes algorithm) orders the attributes by decreasing informativeness and applies the Bayes formula sequentially. At each step except the first, the a posteriori probability computed at the previous step is taken as the a priori probability; the decision is made at the last step in favor of the hypothesis with the maximal a posteriori probability [4].
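The multistep Bayes strategy can be sketched as follows. The sketch assumes that, for each attribute (already ordered by decreasing informativeness), the probability of the observed interval under each hypothesis is read from the corresponding interval or binary structure and is nonzero for at least one hypothesis; the function name and data layout are hypothetical.

```python
def sequential_bayes(priors, likelihoods):
    """Apply the Bayes formula attribute by attribute.

    priors:      dict mapping hypothesis -> a priori probability
    likelihoods: list of dicts, one per attribute, mapping
                 hypothesis -> probability of the observed interval
    """
    posterior = dict(priors)
    for step in likelihoods:
        # the previous posterior plays the role of the prior at this step
        unnormalised = {h: posterior[h] * step[h] for h in posterior}
        total = sum(unnormalised.values())
        posterior = {h: v / total for h, v in unnormalised.items()}
    decision = max(posterior, key=posterior.get)
    return decision, posterior
```

With equal priors for two hypotheses and likelihoods (0.8, 0.2) and then (0.3, 0.6), the posterior for the first hypothesis is 0.8 after the first step and 2/3 after the second.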
Formalizing knowledge by means of interval and binary structures makes it possible to develop, for one and the same differential-diagnostic problem, algorithms that differ both in their collections of informatively valuable attributes and in their recognition strategies. The decision made by a consilium (panel) of such decision rules [9], which is supported by the special organization of the knowledge base and the expert module of the system OMIS, leads to results exceeding the possibilities of present-day clinical experience [4].
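One simple way to combine such rules is a majority vote. The source does not specify how the consilium reaches its decision, so the voting scheme below, including the treatment of ties, is an assumption made purely for illustration.

```python
from collections import Counter


def consilium_decision(votes):
    """Majority vote over the decisions of individual rules; None on a tie."""
    if not votes:
        raise ValueError("at least one decision rule must vote")
    tally = Counter(votes).most_common()
    # a tie between the two leading hypotheses leaves the case undecided
    if len(tally) > 1 and tally[0][1] == tally[1][1]:
        return None
    return tally[0][0]
```

Here each vote would come from one rule, that is, one combination of an attribute subset and a recognition strategy.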
The methods considered are especially effective for developing intelligent packages for the analysis of physiological processes. The systems that currently exist for analyzing the EEG and other processes are incapable of gathering useful physiological and clinical information. The peculiarity of physiological waves is the connection between the different phases of unit cycles of activity, which correspond to different functional states. A cyclical process is not a simple alternation of increase and decrease of some factor but a sequential change of qualitatively different states. Therefore the most important medical-biological information on the time dynamics of a process is not contained in the amplitude-frequency spectrum or in the slow components of a time series; it lies in the knowledge of how the antecedent phase of the process conditions the next one and how that in turn conditions the phase that follows it. Our methods for analyzing temporal structure are based on breaking physiological waves down into discrete sequences of characteristics of separate (unit) oscillations. The time series obtained are considered as elements of a whole, unfolding in time, that represents the initial physiological process [10].
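The decomposition into unit oscillations can be illustrated with a minimal sketch. It assumes that cycles are delimited by local minima of a sampled signal and that each cycle is described by its period, amplitude and rise time; the source does not state which characteristics are used, so these choices and the function name are hypothetical.

```python
def unit_oscillations(signal, dt=1.0):
    """Split a sampled wave at its local minima and describe each cycle."""
    minima = [i for i in range(1, len(signal) - 1)
              if signal[i - 1] > signal[i] <= signal[i + 1]]
    cycles = []
    for start, end in zip(minima, minima[1:]):
        segment = signal[start:end + 1]
        peak = max(segment)
        cycles.append({
            "period": (end - start) * dt,            # duration of the cycle
            "amplitude": peak - min(segment),        # peak-to-trough height
            "rise_time": segment.index(peak) * dt,   # time from minimum to peak
        })
    return cycles
```

Pairs of characteristics taken from consecutive cycles could then be tabulated as frequencies over a grid of intervals, which captures how one phase conditions the next.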
Such an approach leads to a description of informative data on different physiological processes, and on the relations between them, in the common form of matrices that conform with the matrices of binary relations of clinical and laboratory data (binary structures). Such a uniform description opens a wide range of possibilities for the study of physiological processes. Connecting the research module to devices that obtain information on the living organism (rheocardiomonitors, monitors of arterial pressure and pulse, electroencephalographs, biochemical analyzers, etc.) makes it possible to automate the process of empirical medical-biological cognition [4].
References
[1]. Van Bemmel J.H. Medical Informatics, Art or Science? // Meth. Inform. Med. 1996. V. 35. P. 157-172.
[2]. Pereverzev-Orlov V.S. Problems and concepts of constructing intelligent partner systems // Computers and Cognition. Moscow, 1990. P. 52-57. (In Russian.)
[3]. Osipov G.S. Knowledge Acquisition by Intelligent Systems. Moscow, Nauka, 1997. (In Russian.)
[4]. Genkin A.A. A New Information Technology for the Analysis of Medical Data. St. Petersburg, Politekhnika, 1999. (In Russian.)
[5]. Mesarovic M.D. General systems theory and its mathematical foundations // Studies in General Systems Theory. Collection of translations / Ed. by V.N. Sadovsky and E.G. Yudin. Moscow, Progress, 1969. (In Russian.)
[6]. Klir G.J. Architecture of Systems Problem Solving. N.Y., Plenum Press, 1985. Russian transl.: Moscow, Radio i Svyaz, 1990.
[7]. Kullback S. Information Theory and Statistics. N.Y., John Wiley & Sons, 1958. Russian transl.: Moscow, Nauka, 1967.
[8]. Kolmogorov A.N. Editor's preface to the Russian translation of [7].
[9]. Rastrigin L.A., Erenshtein R.A. The Method of Collective Recognition. Moscow, Nauka, 1981. (In Russian.)
[10]. Genkin A.A., Medvedev V.I. Forecasting of Psychophysiological States. Problems of Methodology and Algorithmization. St. Petersburg, Nauka, 1973. (In Russian.)
