Genetic Algorithm
Mateusz Dorobek
This article describes a simple Python implementation of a genetic algorithm. This heuristic algorithm searches a space of candidate solutions to the problem to find the best one.
Problem
The knapsack problem is one of the most frequently discussed optimization problems. The name of this task comes from the maximization problem of selecting objects so that their total value is as high as possible while they still fit in the knapsack.
Photo: Dean Pugh
In order to solve the problem, we need to generate data that describes a specific task instance.
import numpy as np

class Task:
    def __init__(self, n=None):
        # number of items available to pick from
        self.n = np.random.randint(n, 2 * n)
        # capacity limits for total weight and total size
        self.Weight_limit = np.random.randint(10 * self.n, 20 * self.n)
        self.Size_limit = np.random.randint(10 * self.n, 20 * self.n)
        # per-item weights, sizes and values (costs)
        self.weights = np.random.random(self.n) * 10 * self.Weight_limit / self.n
        self.sizes = np.random.random(self.n) * 10 * self.Size_limit / self.n
        self.costs = np.random.random(self.n) * self.n
Example generated task:
{'n': 6,
 'Weight_limit': 74,
 'Size_limit': 51,
 'weights': [78.24, 116.79, 27.22, 16.37, 18.49, 105.31],
 'sizes': [14.01, 78.47, 34.49, 20.80, 2.29, 41.89],
 'costs': [1.72, 0.45, 4.49, 4.85, 4.96, 3.69]}
Individual representation
Genetic algorithms take their inspiration from nature, where each animal has its own DNA code. The genetic algorithm tries to create an "animal" that will be the best at a given task. To store an individual I've created the Individual class.
class Individual:
    def __init__(self, genome):
        # genome is a binary vector: genome[i] == 1 means item i is taken
        self.genome = genome

The mutate() method, covered in the Mutation section below, also belongs to this class.
The genome is a binary vector, e.g. [1, 0, 1], which tells us that we take items 0 and 2.
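As a minimal sketch (with hypothetical item values), a genome can be decoded like this:

import numpy as np

genome = np.array([1, 0, 1])
values = np.array([10.0, 20.0, 30.0])  # hypothetical item values

taken = np.flatnonzero(genome)   # indices of taken items: [0 2]
total = (genome * values).sum()  # value of the selection: 40.0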
Population
A set of Individuals is nothing other than a Population.
class Population:
    def __init__(self, genome_size=None, pop_size=None):
        self.population = []
        if genome_size is not None and pop_size is not None:
            # draw a random binary genome for every individual
            population_array = np.random.choice([0, 1], size=(pop_size, genome_size))
            for genome in population_array:
                individual = Individual(genome)
                self.population.append(individual)
        self.size = len(self.population)

    def add_child(self, individual):
        # used by fit() later on to grow the next generation
        self.population.append(individual)
        self.size = len(self.population)
The population is initialized with Individuals with random genomes.
Algorithm
The genetic algorithm is very intuitive due to its straightforward inspiration from nature.
- INITIALIZATION - A certain initial population is drawn.
- SELECTION - The population is evaluated (selection). Best-adapted individuals take part in the reproduction process.
- EVOLUTION - Genotypes of selected individuals are subjected to evolutionary operators:
- CROSSOVER - genotypes of new individuals are created by combining parental genotypes.
- MUTATION - genotypes are subjected to minor random changes.
This algorithm, like many other heuristic algorithms, is iterative, which means that its steps are repeated until we achieve satisfying results.
Initialization
We have discussed it earlier, but one small detail that may be important in the implementation is that initializing with equal probability of taking or leaving each item may take too many items, or sometimes too few.
To gain control over initialization we can simply set these probabilities, e.g. a 90% chance of leaving an item and a 10% chance of taking it:
np.random.choice([0, 1], size=(pop_size, genome_size), p=[0.9, 0.1])
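A quick sanity check of this initialization (a minimal sketch with arbitrary dimensions):

import numpy as np

pop = np.random.choice([0, 1], size=(100, 50), p=[0.9, 0.1])
print(pop.mean())  # close to 0.1: about 10% of items taken at the start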
Selection - Tournament
The selection of the individual to reproduce can be done in multiple ways, and choosing the best isn't always the most effective method.
Photo: Fas Khan
One of the simplest and at the same time most effective methods is Tournament Selection. This method means choosing the best individual from a smaller, random subset of the population. This guarantees choosing a strong candidate but doesn't mean always choosing the best one.
import random

class Population:
    def tournament(self, tournament_size, task):
        # draw a random subset of the population (with replacement)
        selected = random.choices(self.population, k=tournament_size)
        # evaluate everyone in the subset and return the best
        evaluation = [elem.evaluate(task) for elem in selected]
        idx_best_individual = evaluation.index(max(evaluation))
        return selected[idx_best_individual]
tournament_size is a parameter that tells the size of the subset of the population from which we choose the best individual. You can also spot an evaluate() function; I'll explain it in the following section.
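Once the evaluate() method from the next section is in place, selection can be used like this (a hypothetical sketch; parameter values are arbitrary):

task = Task(n=6)
population = Population(genome_size=task.n, pop_size=50)
parent = population.tournament(tournament_size=5, task=task)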
Evaluation
To see how our individual performs, we need to check the total value it has gathered and whether the limits are met. If our individual is too greedy and exceeds one of the limits (total weight or size), its result is penalized by zeroing its score.
class Individual:
    def evaluate(self, task):
        # element-wise product keeps only the items with genome == 1
        weight_sum = (self.genome * task.weights).sum()
        sizes_sum = (self.genome * task.sizes).sum()
        costs_sum = (self.genome * task.costs).sum()
        # zero score if either capacity limit is exceeded
        if weight_sum <= task.Weight_limit and sizes_sum <= task.Size_limit:
            return costs_sum
        else:
            return 0
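A tiny hand-made example of the penalty in action (all values hypothetical, not a generated Task):

import numpy as np

class TinyTask:
    Weight_limit, Size_limit = 10, 10
    weights = np.array([4.0, 7.0])
    sizes = np.array([3.0, 5.0])
    costs = np.array([2.0, 9.0])

print(Individual(np.array([1, 1])).evaluate(TinyTask()))  # 0 - total weight 11 exceeds the limit of 10
print(Individual(np.array([0, 1])).evaluate(TinyTask()))  # 9.0 - both limits respected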
Crossover
After we have selected two parents we can perform a crossover. In this process, a new individual is created whose genome is a mixture of the parents' genomes. Our algorithm doesn't always need to perform crossover to create a new individual. The parameter crossover_rate tells us how often a new individual should be the result of crossover, and how often it should simply be a copy of one of its parents.
class Population:
    def crossover(self, crossover_rate, parent_1, parent_2, task):
        if np.random.random() < crossover_rate:
            # single-point crossover: split both genomes at a random index
            splitting_point = np.random.randint(1, len(parent_1.genome))
            result_genome = np.concatenate(
                [parent_1.genome[:splitting_point], parent_2.genome[splitting_point:]]
            )
            result = Individual(result_genome)
        else:
            # no crossover: the child is a copy of the first parent
            # (copy the genome so a later mutation doesn't alter the parent)
            result = Individual(parent_1.genome.copy())
        return result
In this implementation, I've selected a splitting_point and used it to choose the part of the genome that will be inherited from one parent; the rest comes from the other.
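For illustration, a split at index 2 with hypothetical genomes:

import numpy as np

parent_1 = np.array([1, 1, 1, 1])
parent_2 = np.array([0, 0, 0, 0])
child = np.concatenate([parent_1[:2], parent_2[2:]])  # -> [1 1 0 0]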
Mutation
This process just adds a random change to some of the bits of the genome. As in nature, the genome of an individual changes through life because of mutations. This randomness allows our algorithm not to get stuck in local optima for too long and to find a better solution. The mutation_rate parameter tells how much of the genome should be changed.
class Individual:
    def mutate(self, mutation_rate):
        genome_size = self.genome.shape[0]
        no_of_genes_to_change = int(np.ceil(genome_size * mutation_rate))
        # indices are drawn with replacement, so fewer genes may actually flip
        genes_to_change = random.choices(range(genome_size), k=no_of_genes_to_change)
        # flip the selected bits: 0 -> 1 and 1 -> 0
        self.genome[genes_to_change] = -self.genome[genes_to_change] + 1
In the last line, I'm just changing all the selected 0's to 1's and vice versa.
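A quick demonstration (hypothetical genome):

import numpy as np

ind = Individual(np.array([0, 1, 0, 1]))
ind.mutate(mutation_rate=0.5)  # flips ceil(4 * 0.5) = 2 positions (drawn with replacement)
print(ind.genome)              # e.g. [1 0 0 1] or [0 0 0 1], depending on the draw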
Photo: museumweek2015
Genetic algorithm
Now that we have all the initialization, selection, and evolution operators, we can construct the code for the whole algorithm. Before we run the algorithm we need to set all the important parameters; in our case, we have four of them.
class GeneticAlgorithm:
    def __init__(self, populations_size, tournament_size, crossover_rate, mutation_rate):
        self.populations_size = populations_size
        self.tournament_size = tournament_size
        self.crossover_rate = crossover_rate
        self.mutation_rate = mutation_rate
The algorithm is implemented in the fit() method. Apart from the task object, we pass the number of iterations that our algorithm will run for.
class GeneticAlgorithm:
    def fit(self, iterations, task):
        population = Population(genome_size=task.n, pop_size=self.populations_size)
        for _ in range(iterations):
            new_population = Population()
            for _ in range(population.size):
                # select two parents, cross them over, then mutate the child
                parent_1 = population.tournament(self.tournament_size, task)
                parent_2 = population.tournament(self.tournament_size, task)
                child = population.crossover(self.crossover_rate, parent_1, parent_2, task)
                child.mutate(self.mutation_rate)
                new_population.add_child(child)
            population = new_population
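Putting it all together, a hypothetical end-to-end run (all parameter values arbitrary):

task = Task(n=20)
ga = GeneticAlgorithm(populations_size=100, tournament_size=5,
                      crossover_rate=0.9, mutation_rate=0.01)
ga.fit(iterations=100, task=task)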
Testing parameters
Great, we have our algorithm implemented; let's test how the values of the parameters affect its performance. To test the performance of the algorithm I've collected the evaluation of the best individual in the population for each iteration. I've made 5 different runs of each configuration to measure the mean and standard deviation for these tests, and plotted them as error bars for each value of the tested parameter.
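A sketch of such an experiment, assuming fit() is extended to return the best evaluation from every iteration (the version above returns nothing, so that extension is an assumption):

import numpy as np

# assumption: fit() returns a list with the best score per iteration
runs = np.array([GeneticAlgorithm(100, 5, 0.9, 0.01).fit(100, task)
                 for _ in range(5)])
mean = runs.mean(axis=0)  # curve to plot
std = runs.std(axis=0)    # error bars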
Crossover rate
This parameter tells how often crossover happens. A low crossover rate means that we rely mostly on mutation and rarely mix the parents to obtain a new genome. A high value means that we want to create most new individuals as a result of crossover.
The chart above shows that high values of crossover allow the algorithm to converge and run more stably. In this example, the algorithm achieved a better result with a crossover rate of 0.9 than with 0.5.
Mutation rate
This parameter describes how many genome elements should be flipped to the opposite value.
As we can see, a high mutation rate drives our algorithm to 0 very fast. A mutation rate of 0.01 gives similar results to 0.0001 but is noisier and led us to a smaller local maximum. Here, the smaller the mutation rate the better, but there might be a middle point where mutation is more useful.
Population size
A larger population means greater diversity between individuals in each iteration, but high values of this parameter increase the computation time.
A small population died early and didn't perform much better than random initialization. A population size of 10 works better, but it's still too small and converges slowly with a lot of anomalies. The larger the population the better, but there is a limit beyond which it only increases the computation time, not the score.
Tournament size
This parameter tells how large a subset of the population is taken into consideration while selecting parents for reproduction.
A small value is equivalent to a random choice, while a high value amounts to almost always choosing the best individual. This parameter should be considered relative to the population size, since the tournament covers tournament_size / populations_size * 100% of the population.
Conclusions
Now we all understand why all these parameters matter, but there is still so much randomness in a GA; why is that? The reason is simple. Algorithms that focus on finding a minimum or maximum, as in our example, tend to get stuck in local optima, and the role of a good algorithm is to converge to the global optimum by escaping local optima along the way.
Genetic algorithms have multiple mechanisms to save us from digging into local optima, and we can see that in the chart comparing the mean population evaluation with the best evaluation. You can see that the best score grows continuously, while the mean oscillates strongly, which means that our algorithm keeps trying new solutions, and in almost every iteration a better solution is found.
What next?
- If you are interested in how I have implemented all the other details of the project check out the notebook on my GitHub.
- If you are interested in other similar methods check simulated annealing, divide and conquer and other heuristic algorithms.
- If you are interested in my other projects, check out my blog and LinkedIn.
22 Mar 2020 Mateusz Dorobek