Abstraction is an important tool for scientists in general, and AI
practitioners in particular. One form of abstraction that is very
useful is to disregard particulars of a problem. For instance, when
designing an agent to find a route from one town to another, you
ignore most of the details you may know about each town (population,
weather, food, etc). As your search agent becomes more sophisticated,
you may give it more information to produce better routes. For
example, the population of a town may affect the volume of traffic,
which will affect the time taken to drive through the town.
Another form of abstraction is to generalise a particular problem
solving technique. Often AI research progresses through a series of
abstractions of this kind. That is, a problem is stated and an AI
agent programmed to solve that problem. We then ask: how can we
change that program to solve a wider range of problems? A new
implementation follows, and so on. The famous mathematician Euler laid
down the foundations for graph theory by abstracting a particular
problem (how to cross each of the bridges in Königsberg exactly once)
into a diagrammatic representation of all such problems (graphs).
3.2 General Considerations for Search
If we can specify the initial state, the operators and the goal check
for a search problem, then we know where to start, how to move and
when to stop in our search. This leaves the important question of how
to choose which operator to apply to which state at any stage during
the search. We call an answer to this question a search
strategy. Before we worry about exactly what strategy to use, the
following need to be taken into consideration:
- Path or Artefact
Broadly speaking, there are two different reasons to undertake a
search: to find an artefact (a particular state), or to find a path
from one given state to another given state. Whether you are
searching for a path or an artefact will affect many aspects of your
agent's search, including its goal test, what it records along the way
and the search strategies available to you.
For example, in the maze below, the game involves finding a route from
the top left hand corner to the bottom right hand corner. We all know
what the exit looks like (a gap in the outer wall), so we do not
search for an artefact. Rather, the point of the search is to find a
path, so the agent must remember where it has been.
However, in other searches, the point of the search is to find
something, and it may be immaterial how you found it. For instance,
suppose we play a different game: to find an anagram of the phrase:
ELECTING NEIL
The answer is, of course: (FILL IN THIS GAP AS AN EXERCISE). In this
case, the point of the search is to find an artefact - a word which is
an anagram of "electing neil". No-one really cares about the order in
which the letters were re-arranged, so we are not searching for a path.
- Completeness
It's also worth trying to estimate the number of solutions to a
problem, and the density of those solutions amongst the non-solutions.
In a search problem, there may be any number of solutions, and the
problem specification may involve finding just one, finding some, or
finding all the solutions. For example, suppose a military application
searches for routes that an enemy might take. The question: "Can the
enemy get from A to B" requires finding only one solution, whereas
the question: "How many ways can the enemy get from A to B" will
require the agent to find all the solutions.
When an agent is asked to find just one solution, we can often program
it to prune its search space quite heavily, i.e., rule out
particular operators at particular times to be more
efficient. However, this may also prune some of the solutions, so if
our agent is asked to find all of them, the pruning has to be
controlled so that we know that pruned areas of the search space
either contain no solutions, or contain solutions which are repeated
in another (non-pruned) part of the space.
If our search strategy is guaranteed to find all the solutions
eventually, then we say that it is complete. Often, it is
obvious that all the solutions are in the search space, but in other
cases, we need to prove this fact mathematically to be sure that our
space is complete. A problem with complete searches is that - while
the solution is certainly there - it can take a very long time to find
the solution, sometimes so long that the strategy is effectively
useless. Some people use the word exhaustive when they describe
complete searches, because the strategy exhausts all possibilities in
the search space.
- Time and Space Tradeoffs
In practice, you are going to have to stop your agent at some stage if
it has not found a solution by then. Hence, if we can choose the
fastest search strategy, then this will explore more of the search
space and increase the likelihood of finding a solution. There is a
problem with this, however. It may be that the fastest strategy is the
one which uses most memory. To perform a search, an agent needs at
least to know where it is in a search space, but lots of other things
can also be recorded. For instance, a search strategy may involve
going over old ground, and it would save time if the agent knew it had
already tried a particular path. Even though RAM capacities in
computers are steadily increasing, agents still often run out of
memory on some of the searches they are employed to undertake. Hence,
as in computer science in general, AI practitioners often have to
devise clever ways to trade memory and time in order to achieve an
effective balance.
- Soundness
You may hear in some application domains - for example automated
theorem proving - that a search is "sound and complete". Soundness in
theorem proving means that the search to find a proof will not succeed
if you give it a false theorem to prove. This extends to searching in
general, where a search is unsound if it finds a solution to a
problem with no solution. This kind of unsound search may not be the
end of the world if you are only interested in using it for problems
where you know there is a solution (and it performs well in finding
such solutions). Another kind of unsound search is when a search finds
the wrong solution to a problem. This is more worrying and the problem
will probably lie with the goal testing mechanism.
- Additional Knowledge in Search
The amount of extra knowledge available to your agent will affect how
it performs. In the following sections of this lecture, we will look
at uninformed search strategies, where no additional knowledge
is given, and heuristic searches, where any information about
the goal, intermediate states and operators can be used to improve the
efficiency of the search strategy.
The purpose of this lecture is to give you some understanding of how
to turn a general intelligent task into a problem to be solved with
search, how to choose the best search strategy to start with and how
to improve upon this. It is worth remembering that AI is an empirical
science, and experimentation is an important part of building an
intelligent agent.
You may choose a search strategy for a problem based on the
suggestions here, but in practice it may turn out that the chosen
strategy is not good enough. Often much experimentation is required to
fine-tune the strategy to work efficiently. This could be because the
data you are working with is special in some way, and identifying
exactly what is special is also part of the AI research process.
3.3 Uninformed Search Strategies
To be able to undertake an uninformed search, all our agent needs to
know is the initial state, the possible operators and how to check
whether the goal has been reached. Once these have been described, we
must then choose a search strategy for the agent: a pre-determined way
in which the operators will be applied.
The example we will use is the case of a genetics professor searching
for a name for her newborn baby boy - of course, it must only contain
the letters D, N and A. The states in this search are strings of
letters (but only Ds, Ns and As), and the initial state is an empty
string. Also, the operators available are: (i) add a 'D' to an
existing string (ii) add an 'N' to an existing string and (iii) add an
'A' to an existing string. The goal check is possible using a book of
boys names against which the professor can check a string of letters.
To help us think about the different search strategies, we use two
analogies. Firstly, we suppose that the professor keeps an
agenda of actions to undertake, such as: add an 'A' to the
string 'AN'. So, the agenda consists of pairs (S,O) of
states and operators, whereby the operator is to be applied to the
state. The action at the top of the agenda is the one which is carried
out, then that action is removed. How actions are added to the agenda
differs for each search strategy. Secondly, we think of a search
graphically: by making each state a node in a graph and each operator
an edge, we can think of the search progressing as movement from node
to node along edges in the graph. We then allow ourselves to talk
about nodes in a search space (rather than the graph) and we say that
a node in a search space has been expanded if the state that
node represents has been visited and searched from. Note that graphs
which have no cycles in them are called trees, and many AI
searches can be represented as trees.
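To make the agenda analogy concrete, the sketch below (in Python) shows one way an agent might represent it. The GOAL_NAMES set is a hypothetical stand-in for the professor's book of boys' names, and the function names are illustrative; the search strategy is determined entirely by how new (state, operator) pairs are added to the agenda, which is exactly the difference between the strategies described next.

    # A minimal sketch of the agenda analogy for the DNA naming example.
    # GOAL_NAMES stands in for the professor's book of boys' names (assumed).
    GOAL_NAMES = {"DAN", "ANNA", "NADA"}
    OPERATORS = ["D", "N", "A"]                   # operator = "add this letter"

    def search(add_to_agenda):
        """Generic agenda-driven search; the strategy is fixed entirely by
        how new (state, operator) pairs are added to the agenda."""
        agenda = [("", op) for op in OPERATORS]   # start from the empty string
        while agenda:
            state, op = agenda.pop(0)             # carry out the top action
            new_state = state + op                # apply the operator
            if new_state in GOAL_NAMES:           # goal check
                return new_state
            add_to_agenda(agenda, [(new_state, o) for o in OPERATORS])
        return None

    # Breadth first: add new actions to the bottom of the agenda.
    # (Depth first would instead insert at the top, and needs a depth limit.)
    print(search(lambda agenda, items: agenda.extend(items)))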
- Breadth First Search
Given a set of operators o1, ..., on in a
breadth first search, every time a new state s is reached, an
action for each operator on s is added to the bottom of the
agenda, i.e., the pairs (s,o1), ...,
(s,on) are added to the end of the agenda in that
order.
In our example, the first three things on the agenda would be:
1. (empty,add 'D')
2. (empty,add 'N')
3. (empty,add 'A')
Then, after the first action on the agenda had been carried out, this
would add three more actions to the agenda:
4. ('D',add 'D')
5. ('D',add 'N')
6. ('D',add 'A')
However, we can remove the first agenda item as this action has been
undertaken. Hence there are actually 5 actions on the agenda after the
first step in the search space. Indeed, after every step, one action
will be removed (the action just carried out), and three will be
added, making a total addition of two actions to the agenda. It turns
out that this kind of breadth first search leads to the name 'DAN'
after 20 steps. Also, after the 20th step, there are 43 tasks still on
the agenda to do.
It's useful to think of this search as the evolution of a tree, and
the diagram below shows how each string of letters is found via the
search in a breadth first manner. The numbers above the boxes indicate
at which step in the search the string was found.
We see that each node leads to three others, which corresponds to the
fact that after every step, three more actions are put on the
agenda. This is called the branching rate of a search, and
seriously affects both how long a search is going to take and how much
memory it will use up.
Breadth first search is a complete strategy: given enough time and
memory, it will find a solution if one exists. Unfortunately, memory
is a big problem for breadth first search. We can think about how big
the agenda grows, but in effect we are just counting the number of
states which are still 'alive', i.e., there are still steps in the
agenda involving them. In the above diagram, the states which are
still alive are those with fewer than three arrows coming from them:
there are 14 in all.
It's fairly easy to show that in a search with a branching rate of
b, if we want to search all the way to a depth of d,
then the largest number of states the agent will have to store at any
one time is b^(d-1). For example, if our
professor wanted to search for all names up to length 8, she
would have to remember (or write down) 2187 different strings to
complete a breadth first search. This is because she would need to
remember 3^7 = 2187 strings of length 7 in order to be able to
build all the strings of length 8 from them. In searches with a higher
branching rate, the memory requirement can often become too large for
an agent to handle.
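As a quick sanity check of these figures, the following sketch generates the strings level by level and confirms that the frontier the professor must hold before building the length-8 strings contains 3^7 = 2187 strings.

    # Checking the memory claim: with branching rate b = 3, to build all
    # strings of length d = 8 breadth first, every string of length d-1 = 7
    # must be held at once, i.e. b^(d-1) of them.
    b, d = 3, 8
    frontier = [""]
    for length in range(1, d):
        frontier = [s + letter for s in frontier for letter in "DNA"]
    print(len(frontier), b ** (d - 1))      # both print 2187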
- Depth First Search
Depth first search is very similar to breadth first, except that
things are added to the top of the agenda rather than the
bottom. In our example, the first three things on the agenda would
still be:
1. (empty,add 'D')
2. (empty,add 'N')
3. (empty,add 'A')
However, once the 'D' state had been found, the actions:
('D',add 'D')
('D',add 'N')
('D',add 'A')
would be added to the top of the agenda, so it would look like this:
1. ('D',add 'D')
2. ('D',add 'N')
3. ('D',add 'A')
4. (empty,add 'N')
5. (empty,add 'A')
Of course, carrying out the action at the top of the agenda would
introduce the string 'DD', but then this would cause the action:
('DD',add 'D')
to be added to the top, and the next string found would be
'DDD'. Clearly, this can't go on indefinitely, and in practice, we
must specify a depth limit to stop it going down a particular path
forever. That is, our agent will need to record how far down a
particular path it has gone, and avoid putting actions on the agenda
if the state in the agenda item is past a certain depth.
Note that our search for names is special: no matter what state we
reach, there will always be three actions to add to the agenda. In
other searches, the number of actions available to undertake on a
particular state may be zero, which effectively stops that branch of
the search. Hence, a depth limit is not always required.
Returning to our example, if the professor stipulated that she wanted
very short names (of three or fewer letters), then the search tree
would look like this:
We see that 'DAN' has been reached after the 12th step, so there is an
improvement on the breadth first search. However, it was lucky in this
case that the first letter explored is 'D' and that there is a
solution at depth three. If the depth limit had been set at 4 instead,
the tree would have looked very different:
It looks like it will be a long time until it finds 'DAN'. This
highlights an important drawback to depth first search. It can often
go deep down paths which have no solutions, when there is a solution
much higher up the tree, but on a different branch. Also, depth first
search is not, in general, complete.
Rather than simply adding the next agenda item directly to the top of
the agenda, it might be a better idea to make sure that every node in
the tree is fully expanded before moving on to the next depth in the
search. This is the kind of depth first search which Russell and
Norvig explain. For our DNA example, if we did this, the search tree
would look like this:
The big advantage to depth first search is that it requires much less
memory to operate than breadth first search. If we count the number of
'alive' nodes in the diagram above, it amounts to only 4, because the
ones on the bottom row are not to be expanded due to the depth
limit. In fact, it can be shown that if an agent wants to search for
all solutions up to a depth of d in a space with branching
factor b, then in a depth first search it only needs to
remember up to a maximum of b * d states at any one time.
To put this in perspective, if our professor wanted to search for
all names up to length 8, she would only have to remember 3 * 8
= 24 different strings to complete a depth first search (rather than
2187 in a breadth first search).
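A sketch of a depth-limited depth first search for this example is given below; again GOAL_NAMES is just an assumed stand-in for the book of names, and the depth limit of 3 matches the professor's stipulation above.

    # A sketch of depth first search with a depth limit for the DNA example.
    GOAL_NAMES = {"DAN"}                   # hypothetical book of names

    def dfs(state="", depth_limit=3):
        """Return a goal string found depth first, or None within the limit."""
        if state in GOAL_NAMES:
            return state
        if len(state) >= depth_limit:      # the depth limit stops infinite descent
            return None
        for letter in "DNA":               # 'D' is explored first, as in the notes
            found = dfs(state + letter, depth_limit)
            if found is not None:
                return found
        return None

    print(dfs())                           # finds 'DAN' with the limit set to 3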
- Iterative Deepening Search
So, breadth first search is guaranteed to find a solution (if one
exists), but it eats all the memory. Depth first search, however, is
much less memory hungry, but not guaranteed to find a solution. Is
there any other way to search the space which combines the good parts
of both?
Well, yes, but it sounds silly. Iterative Deepening Search (IDS) is
just a series of depth first searches where the depth limit is
increased by one every time. That is, an IDS will do a depth first
search (DFS) to depth 1, followed by a DFS to depth 2, and so on, each
time starting completely from scratch. This has the advantage of being
complete, as it covers all depths of the search tree. Also, it only
requires the same memory as depth first search (obviously).
However, you will have noticed that this means that it completely
re-searches the entire space searched in the previous iteration. This
kind of redundancy will surely make the search strategy too slow to
contemplate using in practice? Actually, it isn't as bad as you might
think. This is because, in a depth first search, most of the effort is
spent expanding the last row of the tree, so the repetition over the
top part of the tree is not a major factor. In fact, the effect of the
repetition reduces as the branching rate increases. In a search with
branching rate 10 and depth 5, the number of states searched is
111,111 with a single depth first search. With an iterative deepening
search, this number goes up to 123,456. So, there is only a repetition
of around 11%.
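The repetition figures quoted above can be checked with a few lines of arithmetic: each iteration of IDS regenerates the levels above its depth limit, so level i is generated once for every limit from i up to d.

    # Reproducing the repetition figures for branching rate 10 and depth 5.
    b, d = 10, 5
    single = sum(b ** i for i in range(d + 1))           # one search: 111111
    ids = sum(sum(b ** i for i in range(limit + 1))      # all iterations: 123456
              for limit in range(d + 1))
    print(single, ids, round((ids - single) / single, 2))  # repetition ~ 0.11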
- Bidirectional Search
We've concentrated so far on searches where the point of the search is
to find a solution, not the path to the solution. In other searches,
we know the solution, and we know the initial state, but we don't know
how to get from one to the other, and the point of the search is to
find a path. In these cases, in addition to searching forward from the
initial state, we can sometimes also search backwards from the
solution. This is called a bidirectional search.
For example, consider the 8-puzzle game in the diagram below, where
the point of the game is to move the pieces around so that they are
arranged in the right hand diagram. It's likely that in the search for
the solution to this puzzle (given an arbitrary starting state), you
might start off by moving some of the pieces around to get some of
them in their end positions. Then, as you got closer to the solution
state, you might work backwards: asking yourself, how can I get from
the solution to where I am at the moment, then reversing the search
path. In this case, you've used a bidirectional search.
Bidirectional search has the advantage that search in both directions
is only required to go to a depth half that of normal searches, and
this can often lead to a drastic reduction in the number of paths
looked at. For instance, if we were looking for a path from one town
to another through at most six other towns, we only have to look for a
journey through three towns from both directions, which is fairly easy
to do, compared to searching all paths through six towns in a normal
search.
Unfortunately, it is often difficult to apply a bidirectional search
because (a) we don't really know the solution, only a description of
it (b) there may be many solutions, and we have to choose some to work
backwards from (c) we may not be able to reverse our operators to work backwards
from the solution and (d) we have to record all the paths from both
sides to see if any two meet at the same point - this may take up a
lot of memory, and checking through both sets repeatedly could take
up too much computing time.
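The sketch below shows the idea on a made-up road map: two breadth first searches, one from each end, which stop as soon as their explored sets meet. The towns and roads are invented purely for illustration.

    from collections import deque

    # A sketch of bidirectional search on a hypothetical road map: breadth
    # first from both ends, stopping when the two frontiers meet.
    ROADS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
             "D": ["B", "C", "E"], "E": ["D"]}

    def bidirectional(start, goal):
        """Return a town reached from both ends, or None if the searches fail."""
        seen = {start: {start}, goal: {goal}}
        frontier = {start: deque([start]), goal: deque([goal])}
        while frontier[start] and frontier[goal]:
            for end, other in ((start, goal), (goal, start)):
                town = frontier[end].popleft()     # expand this side's frontier
                for nxt in ROADS[town]:
                    if nxt in seen[other]:         # the two searches have met
                        return nxt
                    if nxt not in seen[end]:
                        seen[end].add(nxt)
                        frontier[end].append(nxt)
        return None

    print(bidirectional("A", "E"))                 # prints a meeting town, 'D'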
3.4 Using Values in Search
We have already seen that for some search strategies, we must employ a
depth limit to improve the chances of the search being
successful. Hence we have already used some additional information to
improve our search. In fact, by stipulating a depth limit, we shall
see that we imposed a heuristic measure on the search saying that any
state past the depth limit should score zero, and hence be ignored in
any further search. We will expand on this notion by looking at how
numerical values can be used to produce more intelligent searches.
We will look at path costs and heuristic functions. It is important to
bear in mind that these values will be used for two purposes: (a) to
help find a solution quicker, and (b) to help find a better solution.
- Action and Path Costs
In some searches, it is possible to assign a cost associated with
performing a particular action, known as the action cost. For
instance, suppose we have a robot arm constructing a circuit board on
an assembly line and we have programmed an agent to search for the
best way to construct the circuit board (in terms of when to solder
the components onto the board). There are many different ways it can
end up at the goal (a completed circuit board) through actions which
involve picking up parts and soldering them into the board. However,
some of these actions use more electricity than others and the overall
usage may be an economic factor to be taken into account. The
electricity consumption of an individual action is the action cost,
and the overall cost of a solution found is called the path
cost. The path cost is calculated by summing the costs of the
actions in the path.
Often, when finding a path is the point of the search, a value of 1 is
set as each action cost, so that solutions with shorter paths cost
less. However, a path cost can be negative: if you're planning a drive
through the countryside, you might want to see as much of the road as
possible, so you might write an agent to search for the longest path
while planning your journey.
- Heuristic Functions
A heuristic search is one which uses a rule of thumb to increase the
chances of success of the search. A heuristic search may use a
heuristic function, which is a calculation to estimate
how costly (in terms of the path cost) a path from a state will be to
a goal state. That is, a heuristic function takes a state as input and
calculates an estimate of how much it will cost to get to the solution
from that state. This information can be used in order to choose which
state to perform the next action on. Obviously, choosing the state
which scores lowest at any one instance is a good (rational) start,
but there are variations on this, which we discuss below.
Heuristic functions can be derived in many ways. Firstly, they can be
derived mathematically, as is sometimes the case with machine learning
algorithms: given the nature of the search and solution, you can work
out exactly how to choose heuristic functions to guarantee finding the
best solution most efficiently. Secondly, you can think them up as
good ideas. Note however, that introspection (as described in the box
below) can sometimes go wrong. Thirdly, you could look at the search
the agent is doing and the solutions it finds to various problems, and
identify some commonalities in the solutions. Often, these
observations can be turned into effective heuristic
functions. Fourthly, you can use a computer program to derive
heuristic functions for you, for instance using the ABSOLVER program
mentioned in Russell and Norvig.
Introspection is described by Russell and Norvig as "catching thoughts
as they go by", and has been used to give us some major insights into
how to get an agent to act intelligently. However, for some problems
you may work on, introspection may be a bad idea. Introspection is how
we rationalise about thinking in order to make some sense of our
thoughts. It does not necessarily tell us how those thoughts came
about. If we program an agent to act like we think we act, not only
are we likely to have made mistakes in our introspection, we are also
not playing to the strengths of the computer: it may be possible that
our way of doing things is computationally more expensive on a
computer than in our brains.
Some common problems with introspection are: (i) we don't test our
thought processes in a scientific way, in particular, we do not range
over a set of problems, but rather generalise from a few small ones
(ii) we turn what is often a messy process into some neat
rationalisation of that process (iii) we remember only the successful
thought processes, not the failures (which may have contributed to the
overall success in some way) and (iv) we use "common sense", i.e.,
knowledge about the world which wouldn't ordinarily be available to a
computer.
Another problem is that, when humans get particularly good at
something, it often becomes less easy for them to verbalise what they
are doing. Naturally, this makes building an agent based on their
introspection pretty difficult. For a discussion of how this
highlights a drawback to the Turing test, see the editorial and some
of the chapters in this book:
Machines and Thought
- The Legacy of Alan Turing, Vol 1
As an example of a heuristic search, suppose we have an agent
programmed to take as input a set of letters, e.g., {D,N,A} and search
for a boy's name having only those letters. As we look at the search
space, we notice that strings of three identical letters often appear
in intermediate states, e.g., DANNNA. However, none of the solutions
we've seen have three consecutive letters the same. Hence, it would be
a good idea not to bother building new strings from strings with three
consecutive letters the same. This can be turned into a heuristic
function easily by saying that the chance of a string leading to the
solution if it has three consecutive letters the same is
zero. Mathematically, this could be written as:
h(string) = 0 if string contains three consecutive letters the same
h(string) = 1 otherwise.
With this heuristic in a search, bad strings like DANNNA would never
be built if there were any good strings left. It's not difficult to
imagine ways to make this function more sophisticated by looking at
relative frequencies of substrings, etc.
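Written as a small piece of Python, this heuristic might look like the sketch below (the function name h is simply the one used in the formula above).

    # The rule-of-thumb heuristic from above: a string with three consecutive
    # identical letters is assumed never to lead to a name.
    def h(string):
        """Return 0 for strings judged hopeless, 1 otherwise."""
        for i in range(len(string) - 2):
            if string[i] == string[i + 1] == string[i + 2]:
                return 0
        return 1

    print(h("DANNNA"), h("DAN"))     # prints 0 1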
3.5 Heuristic Search Strategies
Generally speaking, a heuristic search is one which uses a rule of
thumb to improve an agent's performance in solving problems via
search. A heuristic search is not to be confused with a heuristic
measure. If you can specify a heuristic measure, then this opens up a
range of generic heuristic searches which you can try to improve your
agent's performance, as discussed below. It is worth remembering,
however, that any rule of thumb, for instance, choosing the order of
operators when applied in a simple breadth first search, is a
heuristic.
In terms of our agenda analogy, a heuristic search chooses where to
put a (state, operator) pair on the agenda when it is proposed as a
move in the state space. This choice could be fairly complicated and
based on many factors. In terms of the graph analogy, a heuristic
search chooses which node to expand at any point in the search. By
definition, a heuristic search is not guaranteed to improve
performance for a particular problem or set of problems, but such
searches are implemented in the hope of improving the speed with which
a solution is found and/or the quality of the solution found. In fact,
we may be able to find optimal solutions, which are as good as
possible with respect to some measure.
- Optimality
The path cost of a solution is calculated as the sum of the costs of
the actions which led to that solution. This is just one example of a
measure of value on the solution of a search problem, and there are
many others. These measures may or may not be related to the heuristic
functions which estimate the likelihood of a particular state being in
the path to a solution. We say that - given a measure of value on the
possible solutions to a search problem - one particular solution is
optimal if it scores higher than all the others with respect to this
measure (or costs less, in the case of path cost). For example, in the
maze example given in section 3.2, there are many paths from the start
to the finish of the maze, but only one which crosses the fewest
squares. This is the optimal solution in terms of the distance
travelled.
Optimality can be guaranteed through a particular choice of search
strategy (for instance the uniform path cost search described
below). Alternatively, an agent can choose to prove that a solution is
optimal by appealing to some mathematical argument. As a last resort,
if optimality is necessary, then an agent must exhaust a complete
search strategy to find all solutions, then choose the one scoring the
highest (alternatively costing the lowest).
William of Ockham (1285-1349) was an English theologian and
philosopher. He stated the heuristic that, with all other things being
equal, a simpler solution is inherently better than a more complicated
one. This is known as Ockham's Razor or alternatively, the "law of
economy" or "law of parsimony". It is used particularly in machine
learning (discussed later in the course), where the point of searches
is to find hypotheses explaining empirical patterns. In this case, as
Ockham suggested, simpler hypotheses are usually preferred to more
complicated ones.
- Uniform Path Cost Search
A breadth first search will find the solution with the shortest path
length from the initial state to the goal state. However, this may not
be the least expensive solution in terms of the path cost. A uniform
path cost search chooses which node to expand by looking at the path cost
for each node: the node which has cost least to get to is expanded
first. Hence, if, as is usually the case, the path cost of a node
increases with the path length, then this search is guaranteed to find
the least expensive solution. It is therefore an optimal search
strategy. Unfortunately, this search strategy can be very inefficient.
- Greedy Search
If we have a heuristic function for states, defined as above, then we
can simply measure each state with respect to this measure and order
the agenda items in terms of the score of the state in the item. So,
at each stage, the agent determines which state scores lowest and puts
agenda items on the top of the agenda which contain operators acting
on that state. In this way, the most promising nodes in a search space
are expanded before the less promising ones. This is a type of best
first search known specifically as a greedy search.
In some situations, a greedy search can lead to a solution very
quickly. However, a greedy search can often go down blind alleys,
which look promising to start with, but ultimately don't lead to a
solution. Often the best states at the start of a search are in fact
really quite poor in comparison to those further in the search
space. One way to counteract this blind-alley effect is to turn off
the heuristic until a proportion of the search space has been covered,
so that the truly high scoring states can be identified. Another
problem with a greedy search is that the agent will have to keep a
record of which states have been explored in order to avoid
repetitions (and ultimately ending up in a cycle), so a greedy search
must keep all the agenda items it has undertaken in its memory. Also,
this search strategy is not optimal, because the optimal solution may
have nodes on the path which score badly for the heuristic function,
and hence a non-optimal solution will be found before an optimal
one. (Remember that the heuristic function only estimates the
path cost from a node to a solution).
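A small sketch of greedy search is given below. The graph and the heuristic values are invented for illustration; the essential point is that the agenda is ordered purely by the heuristic score h(n), and that visited states must be remembered to avoid cycles.

    import heapq

    # A sketch of greedy (best first) search: expand the state with the
    # lowest heuristic estimate h(n). Graph and h values are made up.
    GRAPH = {"S": ["A", "B"], "A": ["G"], "B": ["A"], "G": []}
    H = {"S": 3, "A": 1, "B": 2, "G": 0}       # hypothetical estimates to goal G

    def greedy(start, goal):
        agenda = [(H[start], start)]           # (h value, state)
        visited = set()                        # remember explored states
        while agenda:
            _, state = heapq.heappop(agenda)   # lowest h expanded first
            if state == goal:
                return state
            if state in visited:
                continue
            visited.add(state)
            for nxt in GRAPH[state]:
                heapq.heappush(agenda, (H[nxt], nxt))
        return None

    print(greedy("S", "G"))                    # prints 'G'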
- A* Search
A* search combines the best parts of uniform cost search, namely the
fact that it's optimal and complete, and the best parts of greedy
search, namely its speed. This search strategy simply combines the
path cost function
g(n) and the heuristic function h(n) by summing them to form a new heuristic measure f(n):
f(n) = g(n) + h(n)
Remembering that g(n) gives the path cost from the start state
to state n and h(n) estimates the path cost from
n to a goal state, we see that f(n) estimates the cost
of the cheapest solution which passes through n.
The most important aspect of A* search is that, given one restriction
on h(n), it is possible to prove that the search strategy is
complete and optimal. The restriction on h(n) is that it must
never overestimate the cost to reach a goal state from
n. Such heuristic measures are called admissible. See
Russell and Norvig for proof that A* search with an admissible
heuristic is complete and optimal.
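The following sketch of A* search uses a small invented weighted graph and an h that, by construction, never overestimates the remaining cost (so it is admissible); agenda items are ordered by f(n) = g(n) + h(n).

    import heapq

    # A sketch of A* search: agenda items are ordered by f(n) = g(n) + h(n).
    # The weighted graph and the admissible h values are invented for
    # illustration; in a real agent they come from the problem itself.
    GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)],
             "B": [("G", 1)], "G": []}
    H = {"S": 3, "A": 2, "B": 1, "G": 0}           # never overestimates here

    def a_star(start, goal):
        agenda = [(H[start], 0, start, [start])]   # (f, g, state, path)
        best_g = {}
        while agenda:
            f, g, state, path = heapq.heappop(agenda)
            if state == goal:
                return path, g
            if state in best_g and best_g[state] <= g:
                continue
            best_g[state] = g
            for nxt, cost in GRAPH[state]:
                heapq.heappush(agenda,
                               (g + cost + H[nxt], g + cost, nxt, path + [nxt]))
        return None, None

    print(a_star("S", "G"))                        # (['S', 'A', 'B', 'G'], 4)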
- IDA* Search
A* search is a sophisticated and successful search strategy. However,
a problem with A* search is that it must keep all states in its
memory, so memory is often a much bigger consideration than time in
designing agents to undertake A* searches. We overcame the same
problem with breadth first search by using an iterative deepening
search (IDS), and we do something similar with A*.
Like IDS, an IDA* search is a series of depth first searches where the
depth is increased after each iteration. However, the depth is not
measured in terms of the path length, as it is in IDS, but rather in
terms of the A* combined function f(n) as described above. To
do this, we need to define contours as regions of the search
space containing states where f is below some limit for all the
states, as shown pictorially here:
Each node in a contour scores less than a particular value and IDA*
search agents are told how much to increase the contour boundary by on
each iteration. This defines the depth for successive searches. When
using contours, it is useful for the function f(n) to be
monotonic, i.e., f is monotonic if whenever an operator
takes a state s1 to a state s2,
then f(s2) >= f(s1). In other words,
if the value of f never decreases along a path, then f
is monotonic. As an exercise, why do we need monotonicity to ensure
optimality in IDA* search?
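A sketch of IDA* on the same invented graph used for the A* sketch is given below: each iteration is a depth first search that prunes any node whose f value exceeds the current contour limit, and the limit for the next iteration is the smallest f value that was pruned.

    # A sketch of IDA*: repeated depth first searches bounded by a limit on
    # f(n) = g(n) + h(n) rather than on path length. Graph and h values are
    # the same invented ones used for the A* sketch above.
    GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)],
             "B": [("G", 1)], "G": []}
    H = {"S": 3, "A": 2, "B": 1, "G": 0}

    def ida_star(start, goal):
        def bounded_dfs(state, g, bound, path):
            f = g + H[state]
            if f > bound:
                return None, f                 # f of the cheapest pruned node
            if state == goal:
                return path, f
            next_bound = float("inf")
            for nxt, cost in GRAPH[state]:
                found, b = bounded_dfs(nxt, g + cost, bound, path + [nxt])
                if found is not None:
                    return found, b
                next_bound = min(next_bound, b)
            return None, next_bound

        bound = H[start]                       # the first contour
        while True:
            found, bound = bounded_dfs(start, 0, bound, [start])
            if found is not None:
                return found
            if bound == float("inf"):
                return None

    print(ida_star("S", "G"))                  # prints ['S', 'A', 'B', 'G']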
- SMA* Search
IDA* search is very good from a memory point of view. In fact, it can
be criticised for not using enough memory - using more memory can
increase the efficiency, so really our search strategies should use
all the available memory. Simplified Memory-Bounded A* search (SMA*)
is a search which does just that. This is a complicated search
strategy, with details given in Russell and Norvig.
- Hill Climbing
As we've seen, in some problems, finding the search path from initial
to goal state is the point of the exercise. In other problems, the
path and the artefact at the end of the path are both important, and
we often try to find optimal solutions. For a certain set of problems,
the path is immaterial, and finding a suitable artefact is the sole
purpose of the search. In these cases, it doesn't matter whether our
agent searches down a path for 10 or 1000 steps, as long as it finds a
solution in the end.
For example, consider the 8-queens problem, where the task is to find
an arrangement of 8 queens on a chess board such that no one can
"take" another (one queen can take another if its on the same
horizontal, vertical or diagonal line). A solution to this problem is:
One way to specify this problem is with states where there are a
number of queens (1 to 8) on the board, and an action is to add a
queen in such a way that it can't take another. Depending on your
strategy, you may find that this search requires much
back-tracking, i.e., towards the end, you find that you simply
can't put the last queens on anywhere, so you have to move one of the
queens you put down earlier (you go back up the search tree).
An alternative way of specifying the problem is that the states are
boards with 8 queens already on them, and an action is a movement of
one of the queens. In this case, our agent can use an
evaluation function and do hill climbing. That is, it counts
the number of pairs of queens where one can take the other, and only
moves a queen if that movement reduces the number of pairs. When
several movements result in the same decrease, the agent can choose
between them randomly. In the 8-queens
problem, there are only 56 * 8 = 448 possible ways to move one queen,
so our agent only has to calculate the evaluation function 448 times
at each stage. If it only chooses moves where the situation with
respect to the evaluation function improves, it is doing hill climbing
(or gradient descent if it's better to think of the agent going
downhill rather than uphill).
A common problem with this search strategy is local maxima: the
search has not yet reached a solution, but it can only go downhill in
terms of the evaluation function. For example, we might get to the
stage where only two queens can take each other, but moving any queen
increases this number to at least three. In cases like this, the agent
can do a random re-start whereby it randomly chooses a state
to start the whole process from again. This search strategy has the
appeal of never requiring to store more than one state at any one time
(the part of the hill the agent is on). Russell and Norvig make the
analogy that this kind of search is like trying to climb Mount Everest
in the fog with amnesia, but they do concede that it is often the
search strategy of choice for some industrial problems. Local/Global
Maxima/Minima are represented in the diagram below:
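A sketch of this hill climbing agent, with random restarts, is given below. It follows the formulation above: a state is a full board of 8 queens, the evaluation function counts attacking pairs, and a move places one queen on any empty square; the restart limit is an arbitrary choice.

    import random

    # A sketch of hill climbing with random restarts for the 8-queens problem.
    def attacking_pairs(queens):
        """queens is a list of (row, col) positions; count pairs that attack."""
        count = 0
        for i in range(len(queens)):
            for j in range(i + 1, len(queens)):
                r1, c1 = queens[i]
                r2, c2 = queens[j]
                if r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2):
                    count += 1
        return count

    def hill_climb(max_restarts=50):
        squares = [(r, c) for r in range(8) for c in range(8)]
        for _ in range(max_restarts):
            queens = random.sample(squares, 8)      # a random full board
            while True:
                current = attacking_pairs(queens)
                if current == 0:
                    return queens                   # a solution
                best, best_board = current, None
                for i in range(8):                  # try the 448 single-queen moves
                    for square in squares:
                        if square in queens:
                            continue
                        candidate = queens[:i] + [square] + queens[i + 1:]
                        score = attacking_pairs(candidate)
                        if score < best:
                            best, best_board = score, candidate
                if best_board is None:              # local maximum: restart
                    break
                queens = best_board
        return None

    print(hill_climb())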
- Simulated Annealing
One way to get around the problem of local maxima, and related
problems such as ridges and plateaux in hill climbing is to allow the
agent to go downhill to some extent. In simulated annealing - named
because of an analogy with cooling a liquid until it freezes - the
agent chooses to consider a random move. If the move improves the
evaluation function, then it is always carried out. If the move
doesn't improve the evaluation function, then the agent will carry out
the move with some probability between 0 and 1. The probability
decreases as the move gets worse in terms of the evaluation function,
so really bad moves are rarely carried out. This strategy can often
nudge a search out of a local maximum and the search can continue
towards the global maximum.
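The acceptance rule at the heart of simulated annealing can be sketched as below; the exponential form and the cooling schedule are standard choices, but the exact numbers here are arbitrary.

    import math, random

    # A sketch of the simulated annealing acceptance rule: improving moves
    # are always taken; worsening moves are taken with a probability that
    # shrinks as the move gets worse and as the 'temperature' T cools.
    def accept(delta, T):
        """delta = change in the evaluation function (positive = better)."""
        if delta > 0:
            return True
        return random.random() < math.exp(delta / T)

    T = 10.0
    for step in range(100):
        delta = random.uniform(-5, 5)      # stand-in for a proposed random move
        if accept(delta, T):
            pass                           # the move would be carried out here
        T *= 0.95                          # simple geometric cooling schedule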
- Random Search
Some problems to be solved by a search agent are more creative in
nature, for example, writing poetry. In this case, it is often
difficult to project the word 'creative' on to a program because it is
possible to completely understand why it produced an artefact, by
looking at its search path. In these cases, it is often a good idea to
try some randomness in the search strategy, for example randomly
choosing an item from the agenda to carry out, or assigning values
from a heuristic measure randomly. This may add to the creative appeal
of the agent, because it makes it much more difficult to predict what
the agent will do.
Herbert Simon (1916 - 2001) was one of the great polymaths of our
time. He worked in administration theory, economics and psychology and
was highly esteemed in all of these disciplines, winning the Nobel
prize for economics in 1978. He found that in all these disciplines,
he was describing and using heuristics, and this led him to become one
of the founding fathers of Artificial Intelligence research. He used
computers for over 40 years to study human and computer problem
solving using heuristic techniques.
In 1975 he was awarded the A.M. Turing Award of the Association for
Computing Machinery and in 1995, the Research Excellence Award of the
International Joint Conference on Artificial Intelligence.
3.6 Assessing Heuristic Searches
Given a particular problem you want to build an agent to solve, there
may be more than one way of specifying it as a search problem, more
than one choice for the search strategy and different possibilities
for heuristic measures. To a large extent, it is difficult to predict
what the best choices will be, and it will require some
experimentation to determine them. In some cases - if we calculate
the effective branching rate, as described below - we can tell
for sure if one heuristic measure is always being out-performed by
another.
- The Effective Branching Rate
Assessing heuristic functions is an important part of AI research: a
particular heuristic function may sound like a good idea, but in
practice give no discernible increase in the quality of the
search. Search quality can be determined experimentally in terms of
the output from the search, and by using various measures such as the
effective branching rate. Suppose a particular problem P
has been solved by search strategy S by expanding N
nodes, and the solution lay at depth D in the space. Then the
effective branching rate of S for P is calculated by
comparing S to a uniform search U. An example of a
uniform search is a breadth first search where the number of branches
from any node is always the same (as in our baby naming example). We
then suppose the (uniform) branching rate of U is such that, on
exhausting its search to depth D, it too would have expanded
exactly N nodes. This imagined branching rate, written
b*, is the effective branching rate of S and is
calculated thus:
N = 1 + b* + (b*)^2 + ... + (b*)^D.
Rearranging this equation will provide a value for b*. For
example (taken from Russell and Norvig), suppose S finds a
solution at depth 5 having expanded 52 nodes. In this case:
52 = 1 + b* + (b*)^2 + ... + (b*)^5.
and it turns out that b* = 1.91. To calculate this, we use the
well known identity for geometric series:
1 + x + x^2 + ... + x^D = (x^(D+1) - 1) / (x - 1)
This enables us to write a polynomial for which b* is a zero,
and we can solve this using numerical techniques such as Newton's
method.
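Since the sum is monotonic in b*, a simple bisection works just as well as Newton's method; the sketch below recovers the value quoted above.

    # Computing the effective branching rate b* numerically: find b* with
    # 1 + b* + (b*)^2 + ... + (b*)^D = N, here by bisection.
    def effective_branching_rate(N, D, tolerance=1e-6):
        def total(b):
            return sum(b ** i for i in range(D + 1))
        low, high = 1.0, float(N)          # b* lies somewhere in between
        while high - low > tolerance:
            mid = (low + high) / 2
            if total(mid) < N:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    print(round(effective_branching_rate(52, 5), 2))   # prints 1.91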
It is usually the case that the effective branching rate of a search
strategy is similar over all the problems it is used for, so that it
is acceptable to average b* over a small set of problems to
give a valid account. If a heuristic search has a branching rate near
to 1, then this is a good sign. We say that one heuristic function
h1 dominates another h2 if
the search using h1 always has a lower effective
branching rate than h2. Having a lower effective
branching rate is clearly desirable because it means a quicker search.
3.7 Summary - Designing Search Agents
The following is a not-too-implausible scenario which we can use to
summarise designing agents which use search to solve problems. You are
asked to write a program to solve a set of problems and you're given
some example problems and their solutions. You know that these
examples are taken from a much larger set, and your program will have
to solve problems in the general case. You decide to program an agent
to solve the problems using search.
You first think about what the point of the search is: do you know the
initial state and goal state, in which case your agent will search for
a path from one to the other? In this case, you might be able to use a
bi-directional search, which should cut down search times. If the
solutions to the problems are not known in advance, then your agent
will be searching for an artefact. You then design a search space
which contains all artefacts which solve the problem (and a lot more
which don't). You then worry about whether you want the optimal
solution or not, think about a path cost and try to work out whether
the search space is dense in solutions for general problems.
After implementing a breadth first search through the space, you
realise that this is only working for the really easy problems, as the
solutions to the harder problems are deeper in the search
space. Unfortunately, a depth first search also doesn't help, because
you're never quite sure what depth limit to set, and sometimes it goes
down a blind alley because there aren't many solutions to each
problem. You then think about an iterative deepening search, but this
still might take too long to find a solution, especially given the
overhead from repetition.
Instead, you start to think about the paths to the solutions: is there
a way of telling whether a particular state in the search space is
going to lead to the solution? You manage to write a heuristic
function that estimates the cost of finding a solution through a
particular node. The next question is how to fruitfully employ
this. Fortunately, you're pretty sure that the heuristic measure
underestimates the path cost, so you can use A* search, and this
manages to solve lots of the problems.
However, for some problems, your agent runs out of memory before it
finds a solution, which is really annoying. Your A* measure is
monotonic, so you decide to upgrade to an IDA* search, which takes
longer in general, but manages to find solutions without running out
of memory. Eventually, you fine-tune your heuristic measures and your
agent is pretty successful at solving the problems you give it.