Abstraction is an important tool for scientists in general, and AI
practitioners in particular. One form of abstraction that is very
useful is to disregard particulars of a problem. For instance, when
designing an agent to find a route from one town to another, you
ignore most of the details you may know about each town (population,
weather, food, etc). As your search agent becomes more sophisticated,
you may give it more information to produce better routes. For
example, the population of a town may affect the volume of traffic,
which will affect the time taken to drive through the town.
Another form of abstraction is to generalise a particular problem
solving technique. Often AI research progresses through a series of
abstractions of this kind. That is, a problem is stated and an AI
agent programmed to solve that problem. We then ask: how can we
change that program to solve a wider range of problems? A new
implementation follows, and so on. The famous mathematician Euler laid
down the foundations for graph theory by abstracting a particular
problem (how to cross each of the bridges in Königsberg exactly once)
into a diagrammatic representation of all such problems (graphs).
3.2 General Considerations for Search
If we can specify the initial state, the operators and the goal check
for a search problem, then we know where to start, how to move and
when to stop in our search. This leaves the important question of how
to choose which operator to apply to which state at any stage during
the search. We call an answer to this question a search
strategy. Before we worry about exactly what strategy to use, the
following need to be taken into consideration:
- Path or Artefact
Broadly speaking, there are two different reasons to undertake a
search: to find an artefact (a particular state), or to find a path
from one given state to another given state. Whether you are
searching for a path or an artefact will affect many aspects of your
agent's search, including its goal test, what it records along the way
and the search strategies available to you.
For example, in the maze below, the game involves finding a route from
the top left hand corner to the bottom right hand corner. We all know
what the exit looks like (a gap in the outer wall), so we do not
search for an artefact. Rather, the point of the search is to find a
path, so the agent must remember where it has been.
However, in other searches, the point of the search is to find
something, and it may be immaterial how you found it. For instance,
suppose we play a different game: to find an anagram of the phrase:
ELECTING NEIL
The answer is, of course: (FILL IN THIS GAP AS AN EXERCISE). In this
case, the point of the search is to find an artefact - a word which is
an anagram of "electing neil". No-one really cares about the order in
which the letters were re-arranged, so we are not searching for a path.
- Completeness
It's also worth trying to estimate the number of solutions to a
problem, and the density of those solutions amongst the non-solutions.
In a search problem, there may be any number of solutions, and the
problem specification may involve finding just one, finding some, or
finding all the solutions. For example, suppose a military application
searches for routes that an enemy might take. The question: "Can the
enemy get from A to B" requires finding only one solution, whereas
the question: "How many ways can the enemy get from A to B" will
require the agent to find all the solutions.
When an agent is asked to find just one solution, we can often program
it to prune its search space quite heavily, i.e., rule out
particular operators at particular times to be more
efficient. However, this may also prune some of the solutions, so if
our agent is asked to find all of them, the pruning has to be
controlled so that we know that pruned areas of the search space
either contain no solutions, or contain solutions which are repeated
in another (non-pruned) part of the space.
If our search strategy is guaranteed to find all the solutions
eventually, then we say that it is complete. Often, it is
obvious that all the solutions are in the search space, but in other
cases, we need to prove this fact mathematically to be sure that our
space is complete. A problem with complete searches is that - while
the solution is certainly there - it can take a very long time to find
the solution, sometimes so long that the strategy is effectively
useless. Some people use the word exhaustive when they describe
complete searches, because the strategy exhausts all possibilities in
the search space.
- Time and Space Tradeoffs
In practice, you are going to have to stop your agent at some stage if
it has not found a solution by then. Hence, if we can choose the
fastest search strategy, then this will explore more of the search
space and increase the likelihood of finding a solution. There is a
problem with this, however. It may be that the fastest strategy is the
one which uses most memory. To perform a search, an agent needs at
least to know where it is in a search space, but lots of other things
can also be recorded. For instance, a search strategy may involve
going over old ground, and it would save time if the agent knew it had
already tried a particular path. Even though RAM capacities in
computers are steadily increasing, agents still often run out of
memory on some of the searches they are employed to undertake. Hence,
as in computer science in general, AI practitioners often have to
devise clever ways to trade memory and time in order to achieve an
effective balance.
- Soundness
You may hear in some application domains - for example automated
theorem proving - that a search is "sound and complete". Soundness in
theorem proving means that the search to find a proof will not succeed
if you give it a false theorem to prove. This extends to searching in
general, where a search is unsound if it finds a solution to a
problem with no solution. This kind of unsound search may not be the
end of the world if you are only interested in using it for problems
where you know there is a solution (and it performs well in finding
such solutions). Another kind of unsound search is when a search finds
the wrong solution to a problem. This is more worrying and the problem
will probably lie with the goal testing mechanism.
- Additional Knowledge in Search
The amount of extra knowledge available to your agent will affect how
it performs. In the following sections of this lecture, we will look
at uninformed search strategies, where no additional knowledge
is given, and heuristic searches, where any information about
the goal, intermediate states and operators can be used to improve the
efficiency of the search strategy.
The purpose of this lecture is to give you some understanding of how
to turn a general intelligent task into a problem to be solved with
search, how to choose the best search strategy to start with and how
to improve upon this. It is worth remembering that AI is an empirical
science, and experimentation is an important part of building an
intelligent agent.
You may choose a search strategy for a problem based on the
suggestions here, but in practice it may turn out that the chosen
strategy is not good enough. Often much experimentation is required to
fine-tune the strategy to work efficiently. This could be because the
data you are working with is special in some way, and identifying
exactly what is special is also part of the AI research process.
3.3 Uninformed Search Strategies
To be able to undertake an uninformed search, all our agent needs to
know is the initial state, the possible operators and how to check
whether the goal has been reached. Once these have been described, we
must then choose a search strategy for the agent: a pre-determined way
in which the operators will be applied.
The example we will use is the case of a genetics professor searching
for a name for her newborn baby boy - of course, it must only contain
the letters D, N and A. The states in this search are strings of
letters (but only Ds, Ns and As), and the initial state is an empty
string. Also, the operators available are: (i) add a 'D' to an
existing string (ii) add an 'N' to an existing string and (iii) add an
'A' to an existing string. The goal check is possible using a book of
boys names against which the professor can check a string of letters.
To help us think about the different search strategies, we use two
analogies. Firstly, we suppose that the professor keeps an
agenda of actions to undertake, such as: add an 'A' to the
string 'AN'. So, the agenda consists of pairs (S,O) of
states and operators, whereby the operator is to be applied to the
state. The action at the top of the agenda is the one which is carried
out, then that action is removed. How actions are added to the agenda
differs for each search strategy. Secondly, we think of a search
graphically: by making each state a node in a graph and each operator
an edge, we can think of the search progressing as movement from node
to node along edges in the graph. We then allow ourselves to talk
about nodes in a search space (rather than the graph) and we say that
a node in a search space has been expanded if the state that
node represents has been visited and searched from. Note that graphs
which have no cycles in them are called trees, and many AI
searches can be represented as trees.
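To make the agenda analogy concrete, the sketch below (in Python) shows one way an agent might represent it. The GOAL_NAMES set is a hypothetical stand-in for the professor's book of boys' names, and the function names are illustrative; the search strategy is determined entirely by how new (state, operator) pairs are added to the agenda, which is exactly the difference between the strategies described next.

    # A minimal sketch of the agenda analogy for the DNA naming example.
    # GOAL_NAMES stands in for the professor's book of boys' names (assumed).
    GOAL_NAMES = {"DAN", "ANNA", "NADA"}
    OPERATORS = ["D", "N", "A"]                   # operator = "add this letter"

    def search(add_to_agenda):
        """Generic agenda-driven search; the strategy is fixed entirely by
        how new (state, operator) pairs are added to the agenda."""
        agenda = [("", op) for op in OPERATORS]   # start from the empty string
        while agenda:
            state, op = agenda.pop(0)             # carry out the top action
            new_state = state + op                # apply the operator
            if new_state in GOAL_NAMES:           # goal check
                return new_state
            add_to_agenda(agenda, [(new_state, o) for o in OPERATORS])
        return None

    # Breadth first: add new actions to the bottom of the agenda.
    # (Depth first would instead insert at the top, and needs a depth limit.)
    print(search(lambda agenda, items: agenda.extend(items)))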
- Breadth First Search
Given a set of operators o1, ..., on in a
breadth first search, every time a new state s is reached, an
action for each operator on s is added to the bottom of the
agenda, i.e., the pairs (s,o1), ...,
(s,on) are added to the end of the agenda in that
order.
In our example, the first three things on the agenda would be:
1. (empty,add 'D')
2. (empty,add 'N')
3. (empty,add 'A')
Then, after the first action on the agenda had been carried out, this
would add three more actions to the agenda:
4. ('D',add 'D')
5. ('D',add 'N')
6. ('D',add 'A')
However, we can remove the first agenda item as this action has been
undertaken. Hence there are actually 5 actions on the agenda after the
first step in the search space. Indeed, after every step, one action
will be removed (the action just carried out), and three will be
added, making a total addition of two actions to the agenda. It turns
out that this kind of breadth first search leads to the name 'DAN'
after 20 steps. Also, after the 20th step, there are 43 tasks still on
the agenda to do.
It's useful to think of this search as the evolution of a tree, and
the diagram below shows how each string of letters is found via the
search in a breadth first manner. The numbers above the boxes indicate
at which step in the search the string was found.
We see that each node leads to three others, which corresponds to the
fact that after every step, three more actions are put on the
agenda. This is called the branching rate of a search, and
seriously affects both how long a search is going to take and how much
memory it will use up.
Breadth first search is a complete strategy: given enough time and
memory, it will find a solution if one exists. Unfortunately, memory
is a big problem for breadth first search. We can think about how big
the agenda grows, but in effect we are just counting the number of
states which are still 'alive', i.e., there are still steps in the
agenda involving them. In the above diagram, the states which are
still alive are those with fewer than three arrows coming from them:
there are 14 in all.
It's fairly easy to show that in a search with a branching rate of
b, if we want to search all the way to a depth of d,
then the largest number of states the agent will have to store at any
one time is b^(d-1). For example, if our
professor wanted to search for all names up to length 8, she
would have to remember (or write down) 2187 different strings to
complete a breadth first search. This is because she would need to
remember 3^7 = 2187 strings of length 7 in order to be able to
build all the strings of length 8 from them. In searches with a higher
branching rate, the memory requirement can often become too large for
an agent to handle.
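As a quick sanity check of these figures, the following sketch generates the strings level by level and confirms that the frontier the professor must hold before building the length-8 strings contains 3^7 = 2187 strings.

    # Checking the memory claim: with branching rate b = 3, to build all
    # strings of length d = 8 breadth first, every string of length d-1 = 7
    # must be held at once, i.e. b^(d-1) of them.
    b, d = 3, 8
    frontier = [""]
    for length in range(1, d):
        frontier = [s + letter for s in frontier for letter in "DNA"]
    print(len(frontier), b ** (d - 1))      # both print 2187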
- Depth First Search
Depth first search is very similar to breadth first, except that
things are added to the top of the agenda rather than the
bottom. In our example, the first three things on the agenda would
still be:
1. (empty,add 'D')
2. (empty,add 'N')
3. (empty,add 'A')
However, once the 'D' state had been found, the actions:
('D',add 'D')
('D',add 'N')
('D',add 'A')
would be added to the top of the agenda, so it would look like this:
1. ('D',add 'D')
2. ('D',add 'N')
3. ('D',add 'A')
4. (empty,add 'N')
5. (empty,add 'A')
Of course, carrying out the action at the top of the agenda would
introduce the string 'DD', but then this would cause the action:
('DD',add 'D')
to be added to the top, and the next string found would be
'DDD'. Clearly, this can't go on indefinitely, and in practice, we
must specify a depth limit to stop it going down a particular path
forever. That is, our agent will need to record how far down a
particular path it has gone, and avoid putting actions on the agenda
if the state in the agenda item is past a certain depth.
Note that our search for names is special: no matter what state we
reach, there will always be three actions to add to the agenda. In
other searches, the number of actions available to undertake on a
particular state may be zero, which effectively stops that branch of
the search. Hence, a depth limit is not always required.
Returning to our example, if the professor stipulated that she wanted
very short names (of three or fewer letters), then the search tree
would look like this:
We see that 'DAN' has been reached after the 12th step, so there is an
improvement on the breadth first search. However, it was lucky in this
case that the first letter explored is 'D' and that there is a
solution at depth three. If the depth limit had been set at 4 instead,
the tree would have looked very different:
It looks like it will be a long time until it finds 'DAN'. This
highlights an important drawback to depth first search. It can often
go deep down paths which have no solutions, when there is a solution
much higher up the tree, but on a different branch. Also, depth first
search is not, in general, complete.
Rather than simply adding the next agenda item directly to the top of
the agenda, it might be a better idea to make sure that every node in
the tree is fully expanded before moving on to the next depth in the
search. This is the kind of depth first search which Russell and
Norvig explain. For our DNA example, if we did this, the search tree
would look like this:
The big advantage to depth first search is that it requires much less
memory to operate than breadth first search. If we count the number of
'alive' nodes in the diagram above, it amounts to only 4, because the
ones on the bottom row are not to be expanded due to the depth
limit. In fact, it can be shown that if an agent wants to search for
all solutions up to a depth of d in a space with branching
factor b, then in a depth first search it only needs to
remember up to a maximum of b * d states at any one time.
To put this in perspective, if our professor wanted to search for
all names up to length 8, she would only have to remember 3 * 8
= 24 different strings to complete a depth first search (rather than
2187 in a breadth first search).
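A sketch of a depth-limited depth first search for this example is given below; again GOAL_NAMES is just an assumed stand-in for the book of names, and the depth limit of 3 matches the professor's stipulation above.

    # A sketch of depth first search with a depth limit for the DNA example.
    GOAL_NAMES = {"DAN"}                   # hypothetical book of names

    def dfs(state="", depth_limit=3):
        """Return a goal string found depth first, or None within the limit."""
        if state in GOAL_NAMES:
            return state
        if len(state) >= depth_limit:      # the depth limit stops infinite descent
            return None
        for letter in "DNA":               # 'D' is explored first, as in the notes
            found = dfs(state + letter, depth_limit)
            if found is not None:
                return found
        return None

    print(dfs())                           # finds 'DAN' with the limit set to 3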
- Iterative Deepening Search
So, breadth first search is guaranteed to find a solution (if one
exists), but it eats all the memory. Depth first search, however, is
much less memory hungry, but not guaranteed to find a solution. Is
there any other way to search the space which combines the good parts
of both?
Well, yes, but it sounds silly. Iterative Deepening Search (IDS) is
just a series of depth first searches where the depth limit is
increased by one every time. That is, an IDS will do a depth first
search (DFS) to depth 1, followed by a DFS to depth 2, and so on, each
time starting completely from scratch. This has the advantage of being
complete, as it covers all depths of the search tree. Also, it only
requires the same memory as depth first search (obviously).
However, you will have noticed that this means that it completely
re-searches the entire space searched in the previous iteration. This
kind of redundancy will surely make the search strategy too slow to
contemplate using in practice? Actually, it isn't as bad as you might
think. This is because, in a depth first search, most of the effort is
spent expanding the last row of the tree, so the repetition over the
top part of the tree is not a major factor. In fact, the effect of the
repetition reduces as the branching rate increases. In a search with
branching rate 10 and depth 5, the number of states searched is
111,111 with a single depth first search. With an iterative deepening
search, this number goes up to 123,456. So, there is only a repetition
of around 11%.
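The repetition figures quoted above can be checked with a few lines of arithmetic: each iteration of IDS regenerates the levels above its depth limit, so level i is generated once for every limit from i up to d.

    # Reproducing the repetition figures for branching rate 10 and depth 5.
    b, d = 10, 5
    single = sum(b ** i for i in range(d + 1))           # one search: 111111
    ids = sum(sum(b ** i for i in range(limit + 1))      # all iterations: 123456
              for limit in range(d + 1))
    print(single, ids, round((ids - single) / single, 2))  # repetition ~ 0.11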
- Bidirectional Search
We've concentrated so far on searches where the point of the search is
to find a solution, not the path to the solution. In other searches,
we know the solution, and we know the initial state, but we don't know
how to get from one to the other, and the point of the search is to
find a path. In these cases, in addition to searching forward from the
initial state, we can sometimes also search backwards from the
solution. This is called a bidirectional search.
For example, consider the 8-puzzle game in the diagram below, where
the point of the game is to move the pieces around so that they are
arranged in the right hand diagram. It's likely that in the search for
the solution to this puzzle (given an arbitrary starting state), you
might start off by moving some of the pieces around to get some of
them in their end positions. Then, as you got closer to the solution
state, you might work backwards: asking yourself, how can I get from
the solution to where I am at the moment, then reversing the search
path. In this case, you've used a bidirectional search.
Bidirectional search has the advantage that search in both directions
is only required to go to a depth half that of normal searches, and
this can often lead to a drastic reduction in the number of paths
looked at. For instance, if we were looking for a path from one town
to another through at most six other towns, we only have to look for a
journey through three towns from both directions, which is fairly easy
to do, compared to searching all paths through six towns in a normal
search.
Unfortunately, it is often difficult to apply a bidirectional search
because (a) we don't really know the solution, only a description of
it (b) there may be many solutions, and we have to choose some to work
backwards from (c) we may not be able to reverse our operators to work backwards
from the solution and (d) we have to record all the paths from both
sides to see if any two meet at the same point - this may take up a
lot of memory, and checking through both sets repeatedly could take
up too much computing time.
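The sketch below shows the idea on a made-up road map: two breadth first searches, one from each end, which stop as soon as their explored sets meet. The towns and roads are invented purely for illustration.

    from collections import deque

    # A sketch of bidirectional search on a hypothetical road map: breadth
    # first from both ends, stopping when the two frontiers meet.
    ROADS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
             "D": ["B", "C", "E"], "E": ["D"]}

    def bidirectional(start, goal):
        """Return a town reached from both ends, or None if the searches fail."""
        seen = {start: {start}, goal: {goal}}
        frontier = {start: deque([start]), goal: deque([goal])}
        while frontier[start] and frontier[goal]:
            for end, other in ((start, goal), (goal, start)):
                town = frontier[end].popleft()     # expand this side's frontier
                for nxt in ROADS[town]:
                    if nxt in seen[other]:         # the two searches have met
                        return nxt
                    if nxt not in seen[end]:
                        seen[end].add(nxt)
                        frontier[end].append(nxt)
        return None

    print(bidirectional("A", "E"))                 # prints a meeting town, 'D'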
3.4 Using Values in Search
We have already seen that for some search strategies, we must employ a
depth limit to improve the chances of the search being
successful. Hence we have already used some additional information to
improve our search. In fact, by stipulating a depth limit, we shall
see that we imposed a heuristic measure on the search saying that any
state past the depth limit should score zero, and hence be ignored in
any further search. We will expand on this notion by looking at how
numerical values can be used to produce more intelligent searches.
We will look at path costs and heuristic functions. It is important to
bear in mind that these values will be used for two purposes: (a) to
help find a solution quicker, and (b) to help find a better solution.
- Action and Path Costs
In some searches, it is possible to assign a cost associated with
performing a particular action, known as the action cost. For
instance, suppose we have a robot arm constructing a circuit board on
an assembly line and we have programmed an agent to search for the
best way to construct the circuit board (in terms of when to solder
the components onto the board). There are many different ways it can
end up at the goal (a completed circuit board) through actions which
involve picking up parts and soldering them into the board. However,
some of these actions use more electricity than others and the overall
usage may be an economic factor to be taken into account. The
electricity consumption of an individual action is the action cost,
and the overall cost of a solution found is called the path
cost. The path cost is calculated by summing the costs of the
actions in the path.
Often, when finding a path is the point of the search, a value of 1 is
set as each action cost, so that solutions with shorter paths cost
less. However, a path cost can be negative: if you're planning a drive
through the countryside, you might want to see as much of the road as
possible, so you might write an agent to search for the longest path
while planning your journey.
- Heuristic Functions
A heuristic search is one which uses a rule of thumb to increase the
chances of success of the search. A heuristic search may use a
heuristic function, which is a calculation to estimate
how costly (in terms of the path cost) a path from a state will be to
a goal state. That is, a heuristic function takes a state as input and
calculates an estimate of how much it will cost to get to the solution
from that state. This information can be used in order to choose which
state to perform the next action on. Obviously, choosing the state
which scores lowest at any one instance is a good (rational) start,
but there are variations on this, which we discuss below.
Heuristic functions can be derived in many ways. Firstly, they can be
derived mathematically, as is sometimes the case with machine learning
algorithms: given the nature of the search and solution, you can work
out exactly how to choose heuristic functions to guarantee finding the
best solution most efficiently. Secondly, you can think them up as
good ideas. Note however, that introspection (as described in the box
below) can sometimes go wrong. Thirdly, you could look at the search
the agent is doing and the solutions it finds to various problems, and
identify some commonalities in the solutions. Often, these
observations can be turned into effective heuristic
functions. Fourthly, you can use a computer program to derive
heuristic functions for you, for instance using the ABSOLVER program
mentioned in Russell and Norvig.
Introspection is described by Russell and Norvig as "catching thoughts
as they go by", and has been used to give us some major insights into
how to get an agent to act intelligently. However, for some problems
you may work on, introspection may be a bad idea. Introspection is how
we rationalise about thinking in order to make some sense of our
thoughts. It does not necessarily tell us how those thoughts came
about. If we program an agent to act like we think we act, not only
are we likely to have made mistakes in our introspection, we are also
not playing to the strengths of the computer: it may be possible that
our way of doing things is computationally more expensive on a
computer than in our brains.
Some common problems with introspection are: (i) we don't test our
thought processes in a scientific way, in particular, we do not range
over a set of problems, but rather generalise from a few small ones
(ii) we turn what is often a messy process into some neat
rationalisation of that process (iii) we remember only the successful
thought processes, not the failures (which may have contributed to the
overall success in some way) and (iv) we use "common sense", i.e.,
knowledge about the world which wouldn't ordinarily be available to a
computer.
Another problem is that, when humans get particularly good at
something, it often becomes less easy for them to verbalise what they
are doing. Naturally, this makes building an agent based on their
introspection pretty difficult. For a discussion of how this
highlights a drawback to the Turing test, see the editorial and some
of the chapters in this book:
Machines and Thought
- The Legacy of Alan Turing, Vol 1
As an example of a heuristic search, suppose we have an agent
programmed to take as input a set of letters, e.g., {D,N,A} and search
for a boy's name having only those letters. As we look at the search
space, we notice that strings of three identical letters often appear
in intermediate states, e.g., DANNNA. However, none of the solutions
we've seen have three consecutive letters the same. Hence, it would be
a good idea not to bother building new strings from strings with three
consecutive letters the same. This can be turned into a heuristic
function easily by saying that the chance of a string leading to the
solution if it has three consecutive letters the same is
zero. Mathematically, this could be written as:
h(string) = 0 if string contains three consecutive letters the same
h(string) = 1 otherwise.
With this heuristic in a search, bad strings like DANNNA would never
be built if there were any good strings left. It's not difficult to
imagine ways to make this function more sophisticated by looking at
relative frequencies of substrings, etc.
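Written as a small piece of Python, this heuristic might look like the sketch below (the function name h is simply the one used in the formula above).

    # The rule-of-thumb heuristic from above: a string with three consecutive
    # identical letters is assumed never to lead to a name.
    def h(string):
        """Return 0 for strings judged hopeless, 1 otherwise."""
        for i in range(len(string) - 2):
            if string[i] == string[i + 1] == string[i + 2]:
                return 0
        return 1

    print(h("DANNNA"), h("DAN"))     # prints 0 1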
3.5 Heuristic Search Strategies
Generally speaking, a heuristic search is one which uses a rule of
thumb to improve an agent's performance in solving problems via
search. A heuristic search is not to be confused with a heuristic
measure. If you can specify a heuristic measure, then this opens up a
range of generic heuristic searches which you can try to improve your
agent's performance, as discussed below. It is worth remembering,
however, that any rule of thumb, for instance, choosing the order of
operators when applied in a simple breadth first search, is a
heuristic.
In terms of our agenda analogy, a heuristic search chooses where to
put a (state, operator) pair on the agenda when it is proposed as a
move in the state space. This choice could be fairly complicated and
based on many factors. In terms of the graph analogy, a heuristic
search chooses which node to expand at any point in the search. By
definition, a heuristic search is not guaranteed to improve
performance for a particular problem or set of problems, but such
searches are implemented in the hope of improving the speed with which
a solution is found and/or the quality of the solution found. In fact,
we may be able to find optimal solutions, which are as good as
possible with respect to some measure.
- Optimality
The path cost of a solution is calculated as the sum of the costs of
the actions which led to that solution. This is just one example of a
measure of value on the solution of a search problem, and there are
many others. These measures may or may not be related to the heuristic
functions which estimate the likelihood of a particular state being in
the path to a solution. We say that - given a measure of value on the
possible solutions to a search problem - one particular solution is
optimal if it scores higher than all the others with respect to this
measure (or costs less, in the case of path cost). For example, in the
maze example given in section 3.2, there are many paths from the start
to the finish of the maze, but only one which crosses the fewest
squares. This is the optimal solution in terms of the distance
travelled.
Optimality can be guaranteed through a particular choice of search
strategy (for instance the uniform path cost search described
below). Alternatively, an agent can choose to prove that a solution is
optimal by appealing to some mathematical argument. As a last resort,
if optimality is necessary, then an agent must exhaust a complete
search strategy to find all solutions, then choose the one scoring the
highest (alternatively costing the lowest).
William of Ockham (1285-1349) was an English theologian and
philosopher. He stated the heuristic that, with all other things being
equal, a simpler solution is inherently better than a more complicated
one. This is known as Ockham's Razor or alternatively, the "law of
economy" or "law of parsimony". It is used particularly in machine
learning (discussed later in the course), where the point of searches
is to find hypotheses explaining empirical patterns. In this case, as
Ockham suggested, simpler hypotheses are usually preferred to more
complicated ones.
- Uniform Path Cost Search
A breadth first search will find the solution with the shortest path
length from the initial state to the goal state. However, this may not
be the least expensive solution in terms of the path cost. A uniform
path cost search chooses which node to expand by looking at the path cost
for each node: the node which has cost least to get to is expanded
first. Hence, if, as is usually the case, the path cost of a node
increases with the path length, then this search is guaranteed to find
the least expensive solution. It is therefore an optimal search
strategy. Unfortunately, this search strategy can be very inefficient.
- Greedy Search
If we have a heuristic function for states, defined as above, then we
can simply measure each state with respect to this measure and order
the agenda items in terms of the score of the state in the item. So,
at each stage, the agent determines which state scores lowest and puts
agenda items on the top of the agenda which contain operators acting
on that state. In this way, the most promising nodes in a search space
are expanded before the less promising ones. This is a type of best
first search known specifically as a greedy search.
In some situations, a greedy search can lead to a solution very
quickly. However, a greedy search can often go down blind alleys,
which look promising to start with, but ultimately don't lead to a
solution. Often the best states at the start of a search are in fact
really quite poor in comparison to those further in the search
space. One way to counteract this blind-alley effect is to turn off
the heuristic until a proportion of the search space has been covered,
so that the truly high scoring states can be identified. Another
problem with a greedy search is that the agent will have to keep a
record of which states have been explored in order to avoid
repetitions (and ultimately ending up in a cycle), so a greedy search
must keep all the agenda items it has undertaken in its memory. Also,
this search strategy is not optimal, because the optimal solution may
have nodes on the path which score badly for the heuristic function,
and hence a non-optimal solution will be found before an optimal
one. (Remember that the heuristic function only estimates the
path cost from a node to a solution).
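A small sketch of greedy search is given below. The graph and the heuristic values are invented for illustration; the essential point is that the agenda is ordered purely by the heuristic score h(n), and that visited states must be remembered to avoid cycles.

    import heapq

    # A sketch of greedy (best first) search: expand the state with the
    # lowest heuristic estimate h(n). Graph and h values are made up.
    GRAPH = {"S": ["A", "B"], "A": ["G"], "B": ["A"], "G": []}
    H = {"S": 3, "A": 1, "B": 2, "G": 0}       # hypothetical estimates to goal G

    def greedy(start, goal):
        agenda = [(H[start], start)]           # (h value, state)
        visited = set()                        # remember explored states
        while agenda:
            _, state = heapq.heappop(agenda)   # lowest h expanded first
            if state == goal:
                return state
            if state in visited:
                continue
            visited.add(state)
            for nxt in GRAPH[state]:
                heapq.heappush(agenda, (H[nxt], nxt))
        return None

    print(greedy("S", "G"))                    # prints 'G'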
- A* Search
A* search combines the best parts of uniform cost search, namely the
fact that it's optimal and complete, and the best parts of greedy
search, namely its speed. This search strategy simply combines the
path cost function
g(n) and the heuristic function h(n) by summing them to form a new heuristic measure f(n):
f(n) = g(n) + h(n)
Remembering that g(n) gives the path cost from the start state
to state n and h(n) estimates the path cost from
n to a goal state, we see that f(n) estimates the cost
of the cheapest solution which passes through n.
The most important aspect of A* search is that, given one restriction
on h(n), it is possible to prove that the search strategy is
complete and optimal. The restriction on h(n) is that it must
never overestimate the cost to reach a goal state from
n. Such heuristic measures are called admissible. See
Russell and Norvig for proof that A* search with an admissible
heuristic is complete and optimal.
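The following sketch of A* search uses a small invented weighted graph and an h that, by construction, never overestimates the remaining cost (so it is admissible); agenda items are ordered by f(n) = g(n) + h(n).

    import heapq

    # A sketch of A* search: agenda items are ordered by f(n) = g(n) + h(n).
    # The weighted graph and the admissible h values are invented for
    # illustration; in a real agent they come from the problem itself.
    GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)],
             "B": [("G", 1)], "G": []}
    H = {"S": 3, "A": 2, "B": 1, "G": 0}           # never overestimates here

    def a_star(start, goal):
        agenda = [(H[start], 0, start, [start])]   # (f, g, state, path)
        best_g = {}
        while agenda:
            f, g, state, path = heapq.heappop(agenda)
            if state == goal:
                return path, g
            if state in best_g and best_g[state] <= g:
                continue
            best_g[state] = g
            for nxt, cost in GRAPH[state]:
                heapq.heappush(agenda,
                               (g + cost + H[nxt], g + cost, nxt, path + [nxt]))
        return None, None

    print(a_star("S", "G"))                        # (['S', 'A', 'B', 'G'], 4)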
- IDA* Search
A* search is a sophisticated and successful search strategy. However,
a problem with A* search is that it must keep all states in its
memory, so memory is often a much bigger consideration than time in
designing agents to undertake A* searches. We overcame the same
problem with breadth first search by using an iterative deepening
search (IDS), and we do something similar with A*.
Like IDS, an IDA* search is a series of depth first searches where the
depth is increased after each iteration. However, the depth is not
measured in terms of the path length, as it is in IDS, but rather in
terms of the A* combined function f(n) as described above. To
do this, we need to define contours as regions of the search
space containing states where f is below some limit for all the
states, as shown pictorially here:
Each node in a contour scores less than a particular value and IDA*
search agents are told how much to increase the contour boundary by on
each iteration. This defines the depth for successive searches. When
using contours, it is useful for the function f(n) to be
monotonic, i.e., f is monotonic if whenever an operator
takes a state s1 to a state s2,
then f(s2) >= f(s1). In other words,
if the value of f never decreases along a path, then f
is monotonic. As an exercise, why do we need monotonicity to ensure
optimality in IDA* search?
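A sketch of IDA* on the same invented graph used for the A* sketch is given below: each iteration is a depth first search that prunes any node whose f value exceeds the current contour limit, and the limit for the next iteration is the smallest f value that was pruned.

    # A sketch of IDA*: repeated depth first searches bounded by a limit on
    # f(n) = g(n) + h(n) rather than on path length. Graph and h values are
    # the same invented ones used for the A* sketch above.
    GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)],
             "B": [("G", 1)], "G": []}
    H = {"S": 3, "A": 2, "B": 1, "G": 0}

    def ida_star(start, goal):
        def bounded_dfs(state, g, bound, path):
            f = g + H[state]
            if f > bound:
                return None, f                 # f of the cheapest pruned node
            if state == goal:
                return path, f
            next_bound = float("inf")
            for nxt, cost in GRAPH[state]:
                found, b = bounded_dfs(nxt, g + cost, bound, path + [nxt])
                if found is not None:
                    return found, b
                next_bound = min(next_bound, b)
            return None, next_bound

        bound = H[start]                       # the first contour
        while True:
            found, bound = bounded_dfs(start, 0, bound, [start])
            if found is not None:
                return found
            if bound == float("inf"):
                return None

    print(ida_star("S", "G"))                  # prints ['S', 'A', 'B', 'G']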
- SMA* Search
IDA* search is very good from a memory point of view. In fact, it can
be criticised for not using enough memory - using more memory can
increase the efficiency, so really our search strategies should use
all the available memory. Simplified Memory-Bounded A* search (SMA*)
is a search which does just that. This is a complicated search
strategy, with details given in Russell and Norvig.
- Hill Climbing
As we've seen, in some problems, finding the search path from initial
to goal state is the point of the exercise. In other problems, the
path and the artefact at the end of the path are both important, and
we often try to find optimal solutions. For a certain set of problems,
the path is immaterial, and finding a suitable artefact is the sole
purpose of the search. In these cases, it doesn't matter whether our
agent searches down a path for 10 or 1000 steps, as long as it finds a
solution in the end.
For example, consider the 8-queens problem, where the task is to find
an arrangement of 8 queens on a chess board such that no one can
"take" another (one queen can take another if its on the same
horizontal, vertical or diagonal line). A solution to this problem is:
One way to specify this problem is with states where there are a
number of queens (1 to 8) on the board, and an action is to add a
queen in such a way that it can't take another. Depending on your
strategy, you may find that this search requires much
back-tracking, i.e., towards the end, you find that you simply
can't put the last queens on anywhere, so you have to move one of the
queens you put down earlier (you go back up the search tree).
An alternative way of specifying the problem is that the states are
boards with 8 queens already on them, and an action is a movement of
one of the queens. In this case, our agent can use an
evaluation function and do hill climbing. That is, it counts
the number of pairs of queens where one can take the other, and only
moves a queen if that movement reduces the number of pairs. When
several movements result in the same decrease, the agent can choose
between them randomly. In the 8-queens
problem, there are only 56 * 8 = 448 possible ways to move one queen,
so our agent only has to calculate the evaluation function 448 times
at each stage. If it only chooses moves where the situation with
respect to the evaluation function improves, it is doing hill climbing
(or gradient descent if it's better to think of the agent going
downhill rather than uphill).
A common problem with this search strategy is local maxima: the
search has not yet reached a solution, but it can only go downhill in
terms of the evaluation function. For example, we might get to the
stage where only two queens can take each other, but moving any queen
increases this number to at least three. In cases like this, the agent
can do a random re-start whereby it randomly chooses a state
to start the whole process from again. This search strategy has the
appeal of never requiring to store more than one state at any one time
(the part of the hill the agent is on). Russell and Norvig make the
analogy that this kind of search is like trying to climb Mount Everest
in the fog with amnesia, but they do concede that it is often the
search strategy of choice for some industrial problems. Local/Global
Maxima/Minima are represented in the diagram below:
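A sketch of this hill climbing agent, with random restarts, is given below. It follows the formulation above: a state is a full board of 8 queens, the evaluation function counts attacking pairs, and a move places one queen on any empty square; the restart limit is an arbitrary choice.

    import random

    # A sketch of hill climbing with random restarts for the 8-queens problem.
    def attacking_pairs(queens):
        """queens is a list of (row, col) positions; count pairs that attack."""
        count = 0
        for i in range(len(queens)):
            for j in range(i + 1, len(queens)):
                r1, c1 = queens[i]
                r2, c2 = queens[j]
                if r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2):
                    count += 1
        return count

    def hill_climb(max_restarts=50):
        squares = [(r, c) for r in range(8) for c in range(8)]
        for _ in range(max_restarts):
            queens = random.sample(squares, 8)      # a random full board
            while True:
                current = attacking_pairs(queens)
                if current == 0:
                    return queens                   # a solution
                best, best_board = current, None
                for i in range(8):                  # try the 448 single-queen moves
                    for square in squares:
                        if square in queens:
                            continue
                        candidate = queens[:i] + [square] + queens[i + 1:]
                        score = attacking_pairs(candidate)
                        if score < best:
                            best, best_board = score, candidate
                if best_board is None:              # local maximum: restart
                    break
                queens = best_board
        return None

    print(hill_climb())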
- Simulated Annealing
One way to get around the problem of local maxima, and related
problems such as ridges and plateaux in hill climbing is to allow the
agent to go downhill to some extent. In simulated annealing - named
because of an analogy with cooling a liquid until it freezes - the
agent chooses to consider a random move. If the move improves the
evaluation function, then it is always carried out. If the move
doesn't improve the evaluation function, then the agent will carry out
the move with some probability between 0 and 1. The probability
decreases as the move gets worse in terms of the evaluation function,
so really bad moves are rarely carried out. This strategy can often
nudge a search out of a local maximum and the search can continue
towards the global maximum.
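The acceptance rule at the heart of simulated annealing can be sketched as below; the exponential form and the cooling schedule are standard choices, but the exact numbers here are arbitrary.

    import math, random

    # A sketch of the simulated annealing acceptance rule: improving moves
    # are always taken; worsening moves are taken with a probability that
    # shrinks as the move gets worse and as the 'temperature' T cools.
    def accept(delta, T):
        """delta = change in the evaluation function (positive = better)."""
        if delta > 0:
            return True
        return random.random() < math.exp(delta / T)

    T = 10.0
    for step in range(100):
        delta = random.uniform(-5, 5)      # stand-in for a proposed random move
        if accept(delta, T):
            pass                           # the move would be carried out here
        T *= 0.95                          # simple geometric cooling schedule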
- Random Search
Some problems to be solved by a search agent are more creative in
nature, for example, writing poetry. In this case, it is often
difficult to project the word 'creative' on to a program because it is
possible to completely understand why it produced an artefact, by
looking at its search path. In these cases, it is often a good idea to
try some randomness in the search strategy, for example randomly
choosing an item from the agenda to carry out, or assigning values
from a heuristic measure randomly. This may add to the creative appeal
of the agent, because it makes it much more difficult to predict what
the agent will do.
Herbert Simon (1916 - 2001) was one of the great polymaths of our
time. He worked in administration theory, economics and psychology and
was highly esteemed in all of these disciplines, winning the Nobel
prize for economics in 1978. He found that in all these disciplines,
he was describing and using heuristics, and this led him to become one
of the founding fathers of Artificial Intelligence research. He used
computers for over 40 years to study human and computer problem
solving using heuristic techniques.
In 1975 he was awarded the A.M. Turing Award of the Association for
Computing Machinery and in 1995, the Research Excellence Award of the
International Joint Conference on Artificial Intelligence.
3.6 Assessing Heuristic Searches
Given a particular problem you want to build an agent to solve, there
may be more than one way of specifying it as a search problem, more
than one choice for the search strategy and different possibilities
for heuristic measures. To a large extent, it is difficult to predict
what the best choices will be, and it will require some
experimentation to determine them. In some cases - if we calculate
the effective branching rate, as described below - we can tell
for sure if one heuristic measure is always being out-performed by
another.
- The Effective Branching Rate
Assessing heuristic functions is an important part of AI research: a
particular heuristic function may sound like a good idea, but in
practice give no discernible increase in the quality of the
search. Search quality can be determined experimentally in terms of
the output from the search, and by using various measures such as the
effective branching rate. Suppose a particular problem P
has been solved by search strategy S by expanding N
nodes, and the solution lay at depth D in the space. Then the
effective branching rate of S for P is calculated by
comparing S to a uniform search U. An example of a
uniform search is a breadth first search where the number of branches
from any node is always the same (as in our baby naming example). We
then suppose the (uniform) branching rate of U is such that, on
exhausting its search to depth D, it too would have expanded
exactly N nodes. This imagined branching rate, written
b*, is the effective branching rate of S and is
calculated thus:
N = 1 + b* + (b*)^2 + ... + (b*)^D.
Rearranging this equation will provide a value for b*. For
example (taken from Russell and Norvig), suppose S finds a
solution at depth 5 having expanded 52 nodes. In this case:
52 = 1 + b* + (b*)^2 + ... + (b*)^5.
and it turns out that b* = 1.91. To calculate this, we use the
well known identity for geometric series:
1 + x + x^2 + ... + x^D = (x^(D+1) - 1) / (x - 1)
This enables us to write a polynomial for which b* is a zero,
and we can solve this using numerical techniques such as Newton's
method.
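Since the sum is monotonic in b*, a simple bisection works just as well as Newton's method; the sketch below recovers the value quoted above.

    # Computing the effective branching rate b* numerically: find b* with
    # 1 + b* + (b*)^2 + ... + (b*)^D = N, here by bisection.
    def effective_branching_rate(N, D, tolerance=1e-6):
        def total(b):
            return sum(b ** i for i in range(D + 1))
        low, high = 1.0, float(N)          # b* lies somewhere in between
        while high - low > tolerance:
            mid = (low + high) / 2
            if total(mid) < N:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    print(round(effective_branching_rate(52, 5), 2))   # prints 1.91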
It is usually the case that the effective branching rate of a search
strategy is similar over all the problems it is used for, so that it
is acceptable to average b* over a small set of problems to
give a valid account. If a heuristic search has a branching rate near
to 1, then this is a good sign. We say that one heuristic function
h1 dominates another h2 if
the search using h1 always has a lower effective
branching rate than h2. Having a lower effective
branching rate is clearly desirable because it means a quicker search.
3.7 Summary - Designing Search Agents
The following is a not-too-implausible scenario which we can use to
summarise designing agents which use search to solve problems. You are
asked to write a program to solve a set of problems and you're given
some example problems and their solutions. You know that these
examples are taken from a much larger set, and your program will have
to solve problems in the general case. You decide to program an agent
to solve the problems using search.
You first think about what the point of the search is: do you know the
initial state and goal state, in which case your agent will search for
a path from one to the other? In this case, you might be able to use a
bi-directional search, which should cut down search times. If the
solutions to the problems are not known in advance, then your agent
will be searching for an artefact. You then design a search space
which contains all artefacts which solve the problem (and a lot more
which don't). You then worry about whether you want the optimal
solution or not, think about a path cost and try to work out whether
the search space is dense in solutions for general problems.
After implementing a breadth first search through the space, you
realise that this is only working for the really easy problems, as the
solutions to the harder problems are deeper in the search
space. Unfortunately, a depth first search also doesn't help, because
you're never quite sure what depth limit to set, and sometimes it goes
down a blind alley because there aren't many solutions to each
problem. You then think about an iterative deepening search, but this
still might take too long to find a solution, especially given the
overhead from repetition.
Instead, you start to think about the paths to the solutions: is there
a way of telling whether a particular state in the search space is
going to lead to the solution? You manage to write a heuristic
function that estimates the cost of finding a solution through a
particular node. The next question is how to fruitfully employ
this. Fortunately, you're pretty sure that the heuristic measure
underestimates the path cost, so you can use A* search, and this
manages to solve lots of the problems.
However, for some problems, your agent runs out of memory before it
finds a solution, which is really annoying. Your A* measure is
monotonic, so you decide to upgrade to an IDA* search, which takes
longer in general, but manages to find solutions without running out
of memory. Eventually, you fine-tune your heuristic measures and your
agent is pretty successful at solving the problems you give it.