Monday, November 21, 2011

Defeasible Inference -- Inheritance



     Recall that formal logic uses a single uniform "language" within which to represent knowledge. A representation that has this property is often referred to as a domain independent representation. One of the advantages of a domain independent representation language is that the procedures that manipulate the expressions in the language can themselves be written to be domain independent. Deduction in first order logic is the same procedure whether we are reasoning about knowledge of cooking or knowledge of Euclidean geometry. But it turns out that the algorithms that implement the deductive procedures of first order logic are generally not very efficient. As the number of facts that must be considered increases, the number of possible inferences increases very rapidly. Consequently, the time and space required to compute a deduction can become very large.
     It seems generally true that we can design a domain dependent representation language that is more efficient in reasoning about certain aspects of its domain than a domain independent representation language. What is lost, of course, is generality, and in the case of logic we usually also lose the guarantee that if a proof for some statement exists then it will be found. (Newell reflected on this tradeoff between what he referred to as weak methods and strong methods back in the 60's. Weak methods are very general but usually rather inefficient, whereas strong methods are usually quite efficient but are limited in generality because of their domain dependence. We will see this distinction again when we look at the question of what constitutes expertise.)
     One suggestion to account for human "deductive-like" reasoning is that we utilize not a domain independent representation language but domain dependent languages. And, AI researchers have developed domain dependent reasoning strategies and carefully investigated the advantages and disadvantages of their use.
     You may recall the figure on the right from our discussion of search. In this case, I adopted a very simple and restricted language for use in representing the block problem. And, it would be quite easy to write special procedures for this language that could carry out certain types of deductively valid inferences. For example, it would be trivial to write a procedure to deduce that block C is to the "right of" block A. Now note that I have used this as an example of a domain dependent language. But the "data-type" of this language is that of a string (i.e., a sequence of characters).
     In what sense are strings a domain dependent language for the "Blocks domain"? Well, what people really mean by domain dependent is this: if the syntax of the language rather directly reflects a property of the domain that we wish to reason about, then we tend to refer to that representation language as domain dependent. Here the string, a left to right structure, mirrors the left to right structure in the domain. And I used the '#' to delimit stacks and the order in which I wrote a stack down to mirror the "on top of" relation. These syntactic properties of the "data-type" can then be exploited to yield efficient, but limited, procedures that yield deductively valid inferences.
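To make this concrete, here is a minimal sketch (my own illustration, not code from the text; the function names and the exact string syntax beyond the '#' delimiter are assumptions) of how such a string state could be parsed and queried for the "right of" relation:

```python
# Sketch of the string-based blocks representation described above:
# '#' delimits stacks, and the left-to-right order of the stacks in the
# string mirrors the left-to-right order of the stacks on the table.

def parse_state(state):
    """Split a state string such as 'A#B#C' into a list of stacks."""
    return [list(stack) for stack in state.split("#")]

def right_of(state, x, y):
    """True if block x sits in a stack strictly to the right of block y's stack."""
    stacks = parse_state(state)
    def column(block):
        for i, stack in enumerate(stacks):
            if block in stack:
                return i
        raise ValueError(f"block {block!r} not found in state")
    return column(x) > column(y)

# Three one-block stacks laid out left to right:
print(right_of("A#B#C", "C", "A"))  # True
```

Because the deduction reduces to a comparison of positions in the string, the procedure is fast, but it can only answer the narrow family of questions the syntax was designed to mirror.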

 Trees, Hierarchies, and Inheritance
     One of the earliest and most thoroughly investigated special representational languages can be thought of as a kind of graph known as a tree. This type of graph was used to represent information about classes, instances of those classes, and properties of the classes. You will probably remember that we used trees to represent search. Trees are much beloved graphs because they have some properties that can be exploited when writing algorithms to search them.
     The figure to the right illustrates the way in which a variety of class information, in this case information about animals, could be organized. The organization is shown as a tree where Animal is the root node.
     Various classes of animals are shown in blue and these classes are organized in a hierarchy that reflects the class inclusion relations among these classes. For example, both Birds and Mammals are subclasses of Animal; Mammal is a superclass of Elephant; and so on. In white, at the terminal nodes of this tree, are shown instances or individuals that are members of the class. For example, both jumbo and clyde are members of the class Elephant. Finally, in yellow are shown various properties that are characteristic of a class. For example, Birds fly; and so on.

     Contrast this representation with the figure to the left. This figure contains the same information but expresses it as a set of logical implications. Note that more information is explicitly stated in this figure. There would be even more except that I got lazy and simply entered '...' to indicate that much more would have to be listed to make it complete. Note also that on the left are some of the logical implications that involve negation. Again, many more would have to be listed to complete the list. Finally, although I did impose some organization on the presentation of the information in this figure, this organization is neither required nor depended on when we carry out deduction. The statements could be written down in any order. Note that this isn't true for the tree representation.
Comparing these two figures should help you see that the tree structure is a succinct way in which to organize this type of information. And, because it is organized in this way, an inference procedure can take advantage of the fact that lower entities in the tree inherit the properties of their ancestors in the tree. For example, "jumbo is an elephant", "jumbo is grey", "jumbo is a mammal", "jumbo bears live young", and so on.
     The picture to the right marks in red the nodes along the path that must be identified in this structure in order to determine that "jumbo is grey." The procedure simply searches for jumbo in the terminal elements of the tree. If jumbo is not found then jumbo would be sought in the set of parents of each of these terminal elements. In this case, jumbo is a terminal element. Consequently, the procedure focuses only on this element and begins tracing the paths that move upward from jumbo. The first node encountered is Elephant which does not equal "grey", so the procedure continues to the next connected nodes which are {Mammal, Grey}. Consequently, at this point the procedure has derived from this structure and its associated procedure that "jumbo is grey;" and the process terminates.
     Note that this pattern is reminiscent of the repeated application of modus ponens. If we had tried to derive that "jumbo is white," then the procedure would have continued until it had reached the top of the structure. At this point there would be no new places to look. Consequently, the procedure has failed to derive that "jumbo is white," and based on this failure derives that "jumbo is not white." This type of inference is known as Negation by Failure.
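The upward-tracing procedure and the Negation by Failure step can be sketched as follows (a hedged illustration assuming a simple parent-link dictionary for the tree; the node names follow the figure, but the code itself is not from the text):

```python
# Each node maps to its set of parents; a "parent" may be a class node
# (Elephant, Mammal, ...) or a property node (Grey, Flies, ...).
parents = {
    "jumbo":    {"Elephant"},
    "clyde":    {"Elephant"},
    "Elephant": {"Mammal", "Grey"},
    "Mammal":   {"Animal", "Bears-live-young"},
    "Bird":     {"Animal", "Flies"},
    "Animal":   set(),
}

def derivable(start, goal):
    """Trace upward from start, level by level; succeed if goal is reached."""
    frontier, seen = {start}, set()
    while frontier:
        if goal in frontier:
            return True
        seen |= frontier
        frontier = {p for n in frontier for p in parents.get(n, set())} - seen
    return False

print(derivable("jumbo", "Grey"))   # True
# Negation by Failure: exhausting the tree without reaching "White"
# licenses the (non-deductive) conclusion "jumbo is not white".
print(derivable("jumbo", "White"))  # False
```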
     This is really quite encouraging. A simple data structure allows us to define a highly efficient procedure for making inferences. But are these inferences "deductively valid"? The answer is... it depends. The procedure followed is most definitely not a deductive procedure in the technical sense. Negation by Failure is certainly not a deductive rule of inference. In order to do deductive inference we would have to list all the information as illustrated in the second figure and then employ some deductive procedure such as resolution theorem proving.
Whether all of the inferences that a domain dependent procedure allows us to make are deductively valid will simply depend on the procedure. There is no way to know in advance, and it may be quite hard to answer the question firmly even when the procedure is known. What will generally be true is that inference procedures that depend on the map between properties of the data structure and properties of the domain will not provide all deductively valid inferences. This is because the map focuses the procedure on a subset of the deductively valid inferences. For example, the procedure outlined above would not yield the deductively valid statement "jumbo or clyde implies not-flies".

 Defeasible Inference
An entirely different issue arises with the type of knowledge that we are considering here. Notice the nodes circled in red. This tree representation has been used to represent that both penguins and robins are birds, and that birds fly.
Notice that our procedure could derive that tweety is a penguin and also that tweety flies. The figure to the left shows the problem. In general, Birds fly and of course Penguins are Birds. But penguins are an exception: they swim very nicely, but they just don't fly.
We need some way in which to represent these facts which yields the appropriate conclusion and blocks the inappropriate conclusions about tweety.
This turns out to be a rather serious problem because it raises the question of the exact nature of this type of knowledge. In standard logics either a statement holds universally or it doesn't. But here we have knowledge that is quite general, but not without exception. And, if you think about it a bit, this seems to be a characteristic of much of our knowledge. But how can we reason with knowledge that is not universal but must be qualified with terms like 'normally p' or 'typically p'? Or is there a simple way in which to handle these exceptions within first order logic and/or within the graph language of a tree structure as depicted above?
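One standard repair, sketched here only to make the blocking requirement concrete (this specificity-based scheme is my own illustration, not a proposal from the text), is to attach property values to classes and let the nearest ancestor's value override more distant ones:

```python
# Single-parent links upward through the class hierarchy.
parents = {
    "tweety":  "Penguin",
    "Penguin": "Bird",
    "Robin":   "Bird",
    "Bird":    "Animal",
}
# Property values attached to classes; Penguin records the exception.
flies = {"Bird": True, "Penguin": False}

def can_fly(node):
    """Walk upward; the first class that says anything about flying wins."""
    while node is not None:
        if node in flies:
            return flies[node]
        node = parents.get(node)
    return None  # no information either way

print(can_fly("tweety"))  # False: Penguin's value defeats Bird's "flies"
print(can_fly("Robin"))   # True: inherited from Bird, with no exception between
```

Because the lookup stops at the first class carrying a value, the general rule "Birds fly" still serves every bird that lacks an exception, while Penguin's entry blocks the inappropriate conclusion about tweety.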

 Exceptions: General Axioms are often inadequate.
 Because of exceptions, the general axiom that states that if x is a bird then x flies leads to an inconsistency as illustrated in the axioms on the figure to the right below.
 An alternative is to throw out the general axiom and explicitly list the exceptions as illustrated above. However, other exceptions come to mind...what if a wing is broken, what if an oil spill has covered the bird, what if the bird is still too young to fly, etc. The fact that we can come up with so many exceptional conditions makes us suspect that there isn't a closed set of exceptions. Perhaps first order axioms are simply not the way to capture this type of knowledge. But how can exceptions be handled while retaining the ability to represent this type of knowledge?

 Default Theory
 One suggestion for representing and reasoning with common sense knowledge was developed by Ray Reiter and is known as default theory.
     The idea is to reason in first order logic but to have available a set of default rules which are used only if an inference cannot be obtained within the first order formulation. The general framework of this proposal is depicted on the left.
     Probably the simplest place to start is with the example rule shown at the bottom. One kind of knowledge that we seem to possess and use is knowledge about what is typically the case. Here, we have the "generalization" that "Typically an American adult owns a car." Clearly this is not the same as the logical statement, "For all x if x is an American and x is an adult, then x owns a car."
     Consequently, such knowledge has to be used with care or we will constantly be introducing contradictions into our beliefs. The default rule consists of three components. First, there is the prerequisite. Think of this as some kind of foundation or evidence for the rule. And this foundation must be logically derived from your beliefs. You must be able to prove that "John Doe" is both an American and an adult to employ this rule. Then, the second component requires that you not be able to prove that "John Doe" does not own something which is a car. This is the consistency test. If the prerequisite and the consistency component have been satisfied, then the third component of the rule, the consequent, may be asserted.
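Schematically, applying such a default rule might look like the following sketch (the sentence strings and the stand-in "prover" are assumptions of mine; a genuine default theory would sit on top of a full first order theorem prover):

```python
beliefs = {"American(JohnDoe)", "Adult(JohnDoe)"}

def provable(sentence):
    # Stand-in for a real theorem prover: simple set membership.
    return sentence in beliefs

def apply_default(prerequisites, justification, consequent):
    """Fire the rule only if every prerequisite is provable (the foundation)
    and the negation of the justification is NOT provable (the consistency test)."""
    if all(provable(p) for p in prerequisites) and not provable("not " + justification):
        beliefs.add(consequent)
        return True
    return False

# "Typically an American adult owns a car."
fired = apply_default(
    prerequisites=["American(JohnDoe)", "Adult(JohnDoe)"],
    justification="OwnsCar(JohnDoe)",
    consequent="OwnsCar(JohnDoe)",
)
print(fired)  # True: the default's consequent is now among our beliefs
```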

It turns out that only under very special circumstances can this way of formulating this type of knowledge be used efficiently. Notice that checking whether some belief is consistent with our current beliefs amounts to trying to prove its negation. And, deduction is generally an intractable problem.
Also, note that a default conclusion isn't the same as a deduction. It may be defeated by subsequent information, and consequently the use of default theories results in a logic that is non-monotonic. For example, we may run into John Doe and he may tell us that he doesn't own a car. Rather than arguing with John by pointing out to him that he is an American and an adult and therefore should, by our default theory, own a car, we should probably just throw out the statement that "John Doe owns a car" and substitute the statement "John Doe does not own a car".
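That retraction step can be sketched by keeping default conclusions separate from hard facts, so a contradicting fact withdraws the defeated conclusion rather than producing an inconsistency (again an illustrative sketch of mine, assuming a simple "not "-prefix convention for negation):

```python
facts = {"American(JohnDoe)", "Adult(JohnDoe)"}
default_conclusions = {"OwnsCar(JohnDoe)"}  # added earlier by the default rule

def learn(fact):
    """Record a hard fact and retract any default conclusion it contradicts."""
    facts.add(fact)
    if fact.startswith("not "):
        # The non-monotonic step: withdraw the defeated default conclusion.
        default_conclusions.discard(fact[len("not "):])

learn("not OwnsCar(JohnDoe)")  # John tells us he owns no car
print("OwnsCar(JohnDoe)" in default_conclusions)  # False: conclusion withdrawn
```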


© Charles F. Schmidt
