Sunday, November 13, 2011

genetic algorithm



Genetic Algorithms

The evolutionary approach to Artificial Intelligence is one of the neatest ideas of all. We have tried to mimic the functioning of the brain through neural networks, because - even though we don't know exactly how it works - we know that the brain does work. Similarly, we know that mother nature, through the process of evolution, has solved many problems, for instance the problem of getting animals to walk around on two feet (try getting a robot to do that - it's very difficult). So, it seems like a good idea to mimic the processes of reproduction and survival of the fittest to try to evolve answers to problems, and maybe in the long run reach the holy grail of computers which program themselves by evolving programs. Evolutionary approaches are simple in conception:
  • generate a population of possible answers to the problem at hand
  • choose the best individuals from the population (using methods inspired by survival of the fittest)
  • produce a new generation by combining these best ones (using techniques inspired by reproduction)
  • stop when the best individual of a generation is good enough (or you run out of time)
Perhaps the first landmark in the history of the evolutionary approach to computing was John Holland's book "Adaptation in Natural and Artificial Systems", where he developed the idea of the genetic algorithm as searching via sampling hyperplane partitions of the space. It's important to remember that genetic algorithms (GAs), which we look at in this lecture, and genetic programming (GP), which we look at in the next lecture, are just fancy search mechanisms which are inspired by evolution. In fact, using Tom Mitchell's definition of a machine learning system being one which improves its performance through experience, we can see that evolutionary approaches can be classed as machine learning efforts. Historically, however, it has been more common to categorise evolutionary approaches together because of their inspiration rather than their applications (to learning and discovery problems).
As we will see, evolutionary approaches boil down to (i) specifying how to represent possible problem solutions and (ii) determining how to choose which partial solutions are doing the best with respect to solving the problem. The main difference between genetic algorithms and genetic programming is the choice of representation for problem solutions. In particular, with genetic algorithms, the format of the solution is fixed, e.g., a fixed set of parameters to find, and the evolution occurs in order to find good values for those parameters. With genetic programming, however, the individuals in the population of possible solutions are actually individual programs which can increase in complexity, so are not as constrained as in the genetic algorithm approach.

16.1 The Canonical Genetic Algorithm

As with all search techniques, one of the first questions to ask with GAs is how to define a search space which potentially contains good solutions to the problem at hand. This means answering the question of how to represent possible solutions to the problem. The classical approach to GAs is to represent the solutions as strings of ones and zeros, i.e., bit strings. This is not such a bad idea, given that computers store everything as bit strings, so any solution would eventually boil down to a string of ones and zeros. However, there have been many modifications to the original approach to genetic algorithms, and GA approaches now come in many different shapes and sizes, with higher level representations. Indeed, it's possible to see genetic programming, where the individuals in the population are programs, as just a GA approach with a more complicated representation scheme.
Returning to the classical approach, as an example, if solving a particular problem involved finding a set of five integers between 1 and 100, then the search space for a GA would be bit strings of length 40, where the first eight bits are decoded as the first integer, the next eight bits become the second integer and so on. Representing the solutions is one of the tricky parts to using genetic algorithms, a problem we come back to later. However, suppose that the solutions are represented as strings of length L. Then, in the standard approach to GAs, known as the canonical genetic algorithm, the first stage is to generate an initial random population of bit strings of length L. By random, we mean that the ones and zeros in the strings are chosen at random. Sometimes, rarely, the initialisation procedure is done with a little more intelligence, e.g., using some additional knowledge about the domain to choose the initial population.
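As a minimal sketch of this kind of representation in Python (the eight-bits-per-integer encoding and the population size of 100 are just assumptions for illustration; note that eight bits decode to values between 0 and 255, so keeping decoded values within the required 1 to 100 range would itself be part of the representation problem):

    import random

    BITS_PER_INTEGER = 8                   # an assumption for illustration
    NUM_INTEGERS = 5
    L = NUM_INTEGERS * BITS_PER_INTEGER    # string length, as in the text

    def random_individual():
        """A random bit string of length L."""
        return [random.randint(0, 1) for _ in range(L)]

    def decode(bits):
        """Decode a bit string into five integers, eight bits per integer."""
        return [int("".join(str(b) for b in bits[i:i + BITS_PER_INTEGER]), 2)
                for i in range(0, L, BITS_PER_INTEGER)]

    # initial random population of 100 bit strings
    population = [random_individual() for _ in range(100)]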
After the initialisation step, the canonical genetic algorithm proceeds iteratively: selection, mating and recombination processes produce each new generation, which is then checked against the termination conditions.
In the next section, we look in detail at how individuals are selected, mated, recombined (and mutated for good measure). Termination of the algorithm may occur if one or more of the best individuals in the current generation performs well enough with respect to the problem, with the required level of performance specified by the user. Note that this termination check may be related to, or the same as, the evaluation function (discussed later), but it may also be something entirely different.
There may not be a definitive answer to the problem you're looking at, and it may only be possible to evolve solutions which are as good as possible. In this case, it may not be obvious when to stop, and it may be a good idea to produce as many generations as the available computing/time resources allow, so the termination condition may be a specific time limit or a specific number of generations. It is very important to note that the best individual in your final population may not be as good as the best individual in a previous generation (GAs do not perform hill-climbing searches, so it is perfectly possible for generations to degrade). Hence GAs should record the best individuals from every generation and, as the final solution presented to the user, output the best solution found over all the generations.
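A rough Python skeleton of this outer loop is given below. The function for producing each new generation is passed in as a parameter (a sketch of such a function, produce_next_generation, appears at the end of the next section), and the names and default values are illustrative only:

    import random

    def run_ga(produce_next_generation, evaluate, string_length=40,
               pop_size=100, max_generations=200, good_enough=None):
        """Canonical GA skeleton: random initialisation, then iterate,
        recording the best individual seen over all generations."""
        population = [[random.randint(0, 1) for _ in range(string_length)]
                      for _ in range(pop_size)]
        best, best_score = None, float("-inf")
        for generation in range(max_generations):
            for individual in population:
                score = evaluate(individual)
                if score > best_score:
                    best, best_score = individual, score
            if good_enough is not None and best_score >= good_enough:
                break   # a good enough individual has been found
            population = produce_next_generation(population, evaluate)
        return best, best_score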

16.2 Selection, Mating, Recombination and Mutation

So, the point of GAs is to generate population after population of individuals which represent possible solutions to the problem at hand in the hope that one individual in one generation will be a good solution. We look here at how to produce the next generation from the current generation. Note that there are various models for whether to kill off the previous generation, or allow some of the fittest individuals to stay alive for a while - we'll assume a culling of the old generation once the new one has been generated.
  • Selection
The first step is to choose the individuals which will have a shot at becoming the parents of the next generation. This is called the selection procedure, and its purpose is to choose those individuals from the current population which will go into an intermediate population (IP). Only individuals in this intermediate population will be chosen to mate with each other (and there's still no guarantee that they'll be chosen to mate, or that if they do mate, they will be successful - see later).
To perform the selection, the GA agent will require a fitness function. This will assign a real number to each individual in the current generation. From this value, the GA calculates the number of copies of the individual which are guaranteed to go into the intermediate population and a probability which will be used to determine whether an additional copy goes into the IP. To be more specific, the value calculated by the fitness function has an integer part and a fractional part: the integer part dictates the number of copies of the individual which are guaranteed to go into the IP, and the fractional part is used as the probability that one further copy is added. For example, if the fractional part was 1/6, then a random number between 1 and 6 would be generated, and another copy would be added only if it was a six.
The fitness function will use an evaluation function to calculate a value of worth for the individual so that they can be compared against each other. Often the evaluation function is written g(c) for a particular individual c. Correctly specifying such evaluation functions is a tricky job, which we look at later. The fitness of an individual is calculated by dividing the value it gets for g by the average value for g over the entire population:
fitness(c) = g(c)/(average of g over the entire population)
We see that every individual has at least a chance of going into the intermediate population unless they score zero for the evaluation function.
As an example of a fitness function using an evaluation function, suppose our GA agent has calculated the evaluation function for every member of the population, and the average is 17. Then, for a particular individual c0, the value of the evaluation function is 25. The fitness of c0 would be calculated as 25/17 = 1.47. This means that one copy of c0 will definitely be added to the IP, and another copy will be added with a probability of 0.47 (e.g., a 100-sided die is thrown, and another copy of c0 is added to the IP only if it returns 47 or less).
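A minimal Python sketch of this selection procedure, assuming individuals are bit strings, that evaluate implements the evaluation function g, and that the average of g over the population is positive:

    import random

    def select_intermediate(population, evaluate):
        """Build the intermediate population (IP): the integer part of an
        individual's fitness gives guaranteed copies, and the fractional
        part is the probability of one further copy."""
        g_values = [evaluate(ind) for ind in population]
        avg_g = sum(g_values) / len(g_values)
        ip = []
        for ind, g in zip(population, g_values):
            f = g / avg_g                        # fitness(c) = g(c) / average of g
            guaranteed = int(f)                  # e.g. 1.47 -> one guaranteed copy
            ip.extend([ind] * guaranteed)
            if random.random() < f - guaranteed: # e.g. a 0.47 chance of one more copy
                ip.append(ind)
        return ip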
  • Mating
Once our GA agent has chosen the individuals lucky enough (actually, fit enough) to produce offspring, we next determine how they are going to mate with each other. To do this, pairs are simply chosen randomly from the set of potential parents. That is, one individual is chosen randomly, then another - which may be the same as the first - is chosen, and that pair is lined up for the reproduction of one or more offspring (dependent on the recombination techniques used). Whether or not they actually reproduce is then probabilistic, occurring with a probability pc. If they do reproduce, then their offspring are generated using a recombination and mutation procedure as described below, and these offspring are added to the next generation. This continues until the required number of offspring has been produced. Often this required number is the same as the current population size, to keep the population size constant. Note that there are repeated individuals in the IP, so some individuals may become the proud parent of multiple children.
This mating process has some analogy with natural evolution, because sometimes the fittest organisms may not have the opportunity to find a mate, and even if they do find a mate, it's not guaranteed that they will be able to reproduce. However, the analogy with natural evolution also breaks down here, because individuals can mate with themselves and there is no notion of sexes.
  • Recombination
During the selection and mating process, the GA repeatedly lines up pairs of individuals for reproduction. The next question is how to generate offspring from these parent individuals. This is called the recombination process and how this is done is largely dependent on the representation scheme being used. We will look at three possibilities for recombination of individuals represented as bit strings.
The population will only evolve to be better if the best parts of the best individuals are combined, hence recombination procedures usually take parts from both parents and place them into the offspring. In the One-Point Crossover recombination process, a point is chosen at random on the first individual, and the same point is chosen on the second individual. This splits both individuals into a left hand and a right hand side. Two offspring individuals are then produced by (i) taking the LHS of the first and adding it to the RHS of the second and (ii) by taking the LHS of the second and adding it to the RHS of the first. For instance, with the crossover point after the fifth letter in the bit string (strings of length 11 shown for illustration):

    Parent 1: aaaaa bbbbbb    Parent 2: XXXXX YYYYYY
    Child 1:  aaaaa YYYYYY    Child 2:  XXXXX bbbbbb

Note that all the a's, b's, X's and Y's are actually ones or zeros. We see that the length of the two children is the same as that of the parents because GAs use a fixed representation (remember that the bit strings only make sense as solutions if they are of a particular length).
In Two-Point Crossover, as you would expect, two points are chosen in exactly the same place in both individuals. Then the bits falling in-between the two points are swapped to give two new offspring. For instance, with the two points after the 5th and 11th letters (strings of length 15 shown for illustration):

    Parent 1: aaaaa bbbbbb aaaa    Parent 2: XXXXX YYYYYY XXXX
    Child 1:  aaaaa YYYYYY aaaa    Child 2:  XXXXX bbbbbb XXXX

Again, the a's, b's, X's and Y's are ones or zeros, and we see that this recombination technique doesn't alter the string length either. As a third recombination operator, the inversion process simply takes a segment of a single individual and produces a single offspring by reversing the letters in-between two chosen points. For example, with the chosen points after the 3rd and 8th bits, the string 110 10011 0110 becomes 110 11001 0110 (the middle segment reversed).
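Below are hedged sketches of these three operators for bit-string individuals, assuming the parents are Python lists (or strings) of equal length:

    import random

    def one_point_crossover(p1, p2):
        """Choose one point, and swap the right-hand sides of the two parents."""
        point = random.randint(1, len(p1) - 1)
        return p1[:point] + p2[point:], p2[:point] + p1[point:]

    def two_point_crossover(p1, p2):
        """Choose two points, and swap the segments lying between them."""
        i, j = sorted(random.sample(range(1, len(p1)), 2))
        return (p1[:i] + p2[i:j] + p1[j:],
                p2[:i] + p1[i:j] + p2[j:])

    def inversion(parent):
        """Reverse the segment between two chosen points of a single parent."""
        i, j = sorted(random.sample(range(len(parent) + 1), 2))
        return parent[:i] + parent[i:j][::-1] + parent[j:]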
  • Mutation
It may appear that the above recombinations are a little arbitrary, especially as the points defining where crossover and inversion occur are chosen randomly. However, it is important to note that large parts of the string are kept intact, which means that if the string contained a region which scored very well with the evaluation function, these operators have a good chance of passing that region on to the offspring (especially if such regions are fairly small and, as in most GA problems, the overall string length is quite long).
The recombination process produces a large range of possible solutions. However, it is still possible for it to guide the search into a local rather than the global maximum with respect to the evaluation function. For this reason, GAs usually perform random mutations. In this process, the offspring are taken and each bit in their bit string is flipped from a one to a zero or vice versa with a given probability. This probability is usually taken to be very small, say 0.01 or less, so that on average at most around one bit in a hundred is flipped.
In natural evolution, random mutations are often highly deleterious (harmful) to the organism, because the change in the DNA leads to big changes to the way the body works. It may seem sensible to protect the children of the fittest individuals in the population from the mutation process, using special alterations to the flipping probability distribution. However, it may be that it is actually the fittest individuals that are causing the population to stay in a local maximum. After all, they get to reproduce with higher frequency. Hence, protecting their offspring is not a good idea, especially as the GA will record the best from each generation, so we won't lose their good abilities totally. Random mutation has been shown to be effective at getting GA searches out of local maxima, which is why it is an important part of the process.
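A small sketch of this mutation step, assuming individuals are lists of ones and zeros and taking an illustrative flip probability of 0.01:

    import random

    def mutate(bits, p=0.01):
        """Flip each bit independently with small probability p."""
        return [1 - b if random.random() < p else b for b in bits]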
To summarise the production of one generation from the previous: firstly, an intermediate population is produced by selecting copies of the fittest individuals using probability so that every individual has at least a chance of going into the intermediate population. Secondly, pairs from this intermediate population are chosen at random for reproduction (a pair might consist of the same individual twice), and the pair reproduce with a given fixed probability. Thirdly, offspring are generated through recombination procedures such as 1-point crossover, 2-point crossover and inversion. Finally, the offspring are randomly mutated to produce the next generation of individuals. Individuals from the old generation may be entirely killed off, but some may be allowed into the next generation (alternatively, the recombination procedure might be tuned to leave some individuals unchanged).
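The following rough sketch gives an indication of how the new generation is produced in code, reusing the select_intermediate, one_point_crossover and mutate functions sketched above (the reproduction probability pc and the mutation rate pm are illustrative values rather than prescribed ones):

    import random

    def produce_next_generation(population, evaluate, pc=0.7, pm=0.01):
        """Selection, random pairing, probabilistic reproduction via crossover,
        then mutation, until a full new generation has been produced."""
        ip = select_intermediate(population, evaluate)
        offspring = []
        while len(offspring) < len(population):
            p1, p2 = random.choice(ip), random.choice(ip)  # may be the same individual twice
            if random.random() < pc:                       # the pair reproduces with probability pc
                for child in one_point_crossover(p1, p2):
                    offspring.append(mutate(child, pm))
        return offspring[:len(population)]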

16.3 Two Difficulties

The first big problem we face when designing an AI agent to perform a GA-style search is how to represent the solutions. If the solutions are textual by nature, then ASCII strings require eight bits per letter, so the size of individuals can get very large. This will mean that evolution may take thousands of generations to converge onto a solution. Also, there will be much redundancy in the bit string representations: in general many bit strings produced by the recombination process will not represent solutions at all, e.g., they may represent ASCII characters which shouldn't appear in the solution. In the case of individuals which don't represent solutions, how do we score these with the evaluation function? It doesn't necessarily follow that they are entirely unfit, because the tweaking of a single zero to a one might make them good solutions. The situation is better when the solution space is continuous, or the solutions represent real-valued numbers or integers; it is worse when there are only a finite number of valid solutions.
The second big problem we face is how to specify the evaluation function. This is crucial to the success of the GA experiment. The evaluation function should, if possible:
  • Return a real-valued number scoring higher for individuals which perform better with respect to the problem
  • Be quick to calculate, as this calculation will be done many thousands of times
  • Distinguish well between different individuals, i.e., give a good range of values
Even with a well-specified evaluation function, when populations have evolved to a certain stage, it is possible that the individuals will all score highly with respect to the evaluation function, so all have equal chances of reproducing. In this case, evolution will effectively have stopped, and it may be necessary to take some action to spread them out (possibly by dynamically making the evaluation function more sophisticated).
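Purely for illustration, here is a toy evaluation function for a made-up version of the earlier five-integer problem: individuals score more highly the closer the sum of their decoded integers is to an arbitrary target. It is quick to compute and gives a reasonable spread of values (it reuses the decode sketch from section 16.1):

    def evaluate(bits, target=280):
        """Toy evaluation: the closer the sum of the decoded integers is to
        the target, the higher the score (1.0 at a perfect match)."""
        return 1.0 / (1.0 + abs(sum(decode(bits)) - target))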

16.4 An Example Application

There are many fantastic applications of genetic algorithms. Perhaps my favourite is their use in evaluating Jazz melodies, done as part of a PhD project in Edinburgh. The one we look at here is chosen because it demonstrates how a fairly lightweight effort using GAs can often be highly effective. In their paper "The Application of Artificial Intelligence to Transportation System Design", Ricardo Hoar and Joanne Penner describe their undergraduate project, which involved representing vehicles on a road system as autonomous agents, and using a GA approach to evolve solutions to the timing of traffic lights to increase the traffic flow in the system. The optimum settings for when lights come on and go off are known only for very simple situations, so an AI-style search can be used to try to find good solutions. Hoar and Penner chose to do this in an evolutionary fashion. They don't give details of the representation scheme they used, but traffic light times are real-valued numbers, so they could have used a bit-string representation.
The evaluation function they used was based on the total waiting time and the total driving time for each car in the system.
The results they produced were good (worthy of writing a paper): their graphs show the decrease in overall waiting time for a simple road system and for a more complicated one (albeit not amazingly complicated).
We see that in both cases, the waiting time has roughly halved, which is a good result. In the first case, for the simple road system, the GA evolved a solution very similar to the ones worked out to be optimal by humans. We see that GAs can be used to find good near-optimal solutions to problems where a more cognitive approach might have failed (i.e., humans still can't work out how best to tune traffic light times, but a computer can evolve a good solution).

16.5 Using Genetic Algorithms

If you go on to read more about evolutionary approaches, then you may encounter terminology which makes the analogy to natural evolutionary processes at the species level and at the genetics level (remember that it is the evolution of our genetic material that has shaped our evolution as a species). This analogy is as follows:
GA terminology      | Genetics terminology | Species terminology
--------------------|----------------------|-------------------------
Population          | DNA???               | Population
Bit string          | Chromosome           | Organism
Bit (one or zero)   | Gene                 | Genome???
Selection           | ???                  | Survival of the fittest
Recombination       | Crossover            | Inheritance
Mutation            | Mutation             | Mutation???

It is worth remembering, however, that, as with artificial neural networks, this analogy is very loose, because people use different analogies, the analogies don't always fit (hence the ???s in the table above) and real evolutionary processes are extremely complicated affairs.
Russell and Norvig say that, if you can answer the following four questions, then you should consider using a GA approach to your problem solving task:
  • What is the fitness function?
  • How is an individual represented?
  • How are the individuals selected?
  • How do individuals reproduce?
GAs are similar to ANNs and CSPs, inasmuch as they might not be exactly the best (most efficient, etc.) way to proceed, but they do provide you with a quick and easy way of tackling the problem. Russell and Norvig put this explicitly: "Try a quick implementation before investing more time thinking about another approach". As we saw with the transport application described above (carried out by undergraduates), such initial efforts can often produce surprisingly good results, and it's not fair to say that GAs should only be used as a first try. Indeed, as we shall see in the next lecture, evolutionary approaches such as genetic programming can sometimes rival human performance and be the most suitable AI technique to use.


© Simon Colton 2004
