ABSTRACT:
Information entropy measures are widely applied to evaluate the degree of
probabilistic dependence between random variables. At the level of data
analysis, entropy can be applied efficiently to the selection, extraction,
and reduction of features, providing optimal data models. We discuss two
applications of information entropy to modeling data dependencies:
The first application is related to the rough set approach to the
construction of classification models. It is based on the paradigm of
reducing attributes that are irrelevant with respect to determining a
distinguished decision attribute. The degree of such irrelevance can be
expressed in probabilistic terms by using entropy. Hence, one can consider
a kind of approximate reduction principle, stating that attributes should
be removed during the reduction process if and only if the entropy of the
model remains approximately at the same level. We discuss various
specifications of this principle, as well as the computational complexity
of the optimization problems concerning the search for entropy-based
approximate decision reducts.
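To make the principle concrete, the following is a minimal Python sketch,
not an algorithm from the paper: it estimates the conditional entropy
H(d | B) from data and greedily drops attributes as long as that entropy
stays within a tolerance of its original level. The record format (a list
of dictionaries), the names cond_entropy and approximate_reduct, and the
tolerance eps are illustrative assumptions.

    from collections import Counter
    from math import log2

    def cond_entropy(rows, attrs, d):
        # Empirical conditional entropy H(d | attrs); rows is a list of dicts.
        n = len(rows)
        joint = Counter((tuple(r[a] for a in attrs), r[d]) for r in rows)
        marg = Counter(tuple(r[a] for a in attrs) for r in rows)
        # H(d | attrs) = -sum over (x, v) of p(x, v) * log2 p(v | x)
        return -sum(c / n * log2(c / marg[x]) for (x, _), c in joint.items())

    def approximate_reduct(rows, attrs, d, eps=0.01):
        # Greedily remove attributes while H(d | B) <= H(d | attrs) + eps,
        # i.e. while the entropy of the model stays approximately unchanged.
        base = cond_entropy(rows, attrs, d)
        reduct = list(attrs)
        for a in list(attrs):
            candidate = [b for b in reduct if b != a]
            if cond_entropy(rows, candidate, d) <= base + eps:
                reduct = candidate
        return reduct

This greedy pass is only one heuristic; as noted above, the search for
optimal entropy-based approximate decision reducts raises nontrivial
computational complexity questions.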
The second application is related to the notion of an approximate
Bayesian network, capable of encoding statements about approximate
conditional independence between random variables. The use of information
entropy to approximate the notion of probabilistic independence is a
natural consequence of its fundamental properties. The notion of an
entropy-based approximate decision reduct then corresponds to the notion
of an approximate Markov boundary, that is, an irreducible subset of
random variables making a distinguished variable approximately independent
of the rest. We discuss the advantages of dealing with approximate
conditional independence statements and approximate Bayesian networks when
analyzing real-life data. We also present the mathematical foundations for
generalizing results concerning classical Bayesian networks to the
entropy-based approximate case.
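Continuing the illustrative sketch above (and reusing its hypothetical
cond_entropy helper), an approximate conditional independence statement can
be tested by thresholding the conditional mutual information
I(x; ys | zs) = H(x | zs) - H(x | zs ∪ ys); the function name and tolerance
below are again assumptions, not notation from the paper.

    def approx_cond_independent(rows, x, ys, zs, eps=0.01):
        # x is approximately independent of ys given zs whenever the
        # conditional mutual information I(x; ys | zs) is at most eps.
        cmi = cond_entropy(rows, zs, x) - cond_entropy(rows, list(zs) + list(ys), x)
        return cmi <= eps

Under this reading, the output of approximate_reduct above is a candidate
approximate Markov boundary: given the reduct, the decision attribute is
approximately independent of the attributes that were removed.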