Topic 7A - Reinforcement Learning

When you cannot check every possibility, you climb toward the best answer you can find — or let evolution find it for you.

Learning Objectives

By the end of this topic, you should be able to:

Learning Activities

To help you meet the learning objectives, we have prepared three readings.

Readings

These readings intentionally build on each other, so please complete them in order.

Checking for Understanding

Review the Learning Objectives at the top of this page. The questions below will help you check your understanding before moving on to Topic 7B.

Optimization and Fitness Landscapes

  1. A school district wants to assign 40 teachers to 40 classrooms to minimize travel time between rooms for team-teaching pairs. There are more than 8 × 10⁴⁷ possible assignments. Why is exhaustive search impossible here? What does this tell us about the kind of algorithm needed?
  2. Describe a fitness landscape for the classroom assignment problem. What does a "peak" represent? What does a "valley" represent? What would it mean for two solutions to be "neighbors"?
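To make the scale of the problem concrete, here is a minimal sketch of the assignment problem at toy size. The travel-time matrix, the team-teaching pairs, and the cost function are all hypothetical stand-ins, but they show why exhaustive search works for 4 teachers and collapses for 40:

```python
import itertools

# Hypothetical miniature version: 4 teachers, 4 rooms, and a travel-time
# matrix between rooms. TRAVEL[i][j] = minutes to walk from room i to room j.
TRAVEL = [
    [0, 2, 5, 9],
    [2, 0, 3, 7],
    [5, 3, 0, 4],
    [9, 7, 4, 0],
]
PAIRS = [(0, 1), (2, 3)]  # indices of teachers who team-teach together

def cost(assignment):
    """Total walking time for all team-teaching pairs,
    where assignment[t] is the room given to teacher t."""
    return sum(TRAVEL[assignment[a]][assignment[b]] for a, b in PAIRS)

# With 4 teachers, exhaustive search is trivial: only 4! = 24 assignments.
# With 40 teachers there are 40! > 8 * 10**47 -- far too many to enumerate.
best = min(itertools.permutations(range(4)), key=cost)
print(best, cost(best))  # -> (0, 1, 2, 3) 6
```

In fitness-landscape terms, each permutation is a point on the landscape, low cost is a peak, and two assignments are neighbors if one swap of two teachers' rooms turns one into the other.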

Hill Climbing

  1. Trace hill climbing on the following simplified scenario: a teacher is trying to find the best order for five topics in a unit, scored by student engagement ratings from a pilot. She starts with order [A, B, C, D, E] which scores 62. Swapping B and C produces a score of 71. Swapping A and B from the new arrangement produces 68. Swapping D and E produces 74. Swapping C and D produces 73. What does hill climbing do at each step? Where does it stop?
  2. A hill climbing agent searching for a good weekly staff meeting schedule keeps trying small changes but every change it considers makes the score worse. It stops and reports the current schedule as optimal. A staff member points out that a completely different schedule structure — moving the meeting to a different day entirely — would be dramatically better. Which hill climbing pathology does this illustrate? Why couldn't the agent find the better solution?
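The trace in question 1 can be sketched as code. This is a minimal greedy hill climber over orderings, using a toy scoring function and adjacent-pair swaps as the neighbor move; the scoring function is a hypothetical stand-in for the engagement ratings in the question:

```python
def hill_climb(start, score, neighbors, max_steps=1000):
    """Greedy hill climbing: repeatedly move to the best-scoring
    neighbor, and stop when no neighbor improves the score."""
    current, current_score = start, score(start)
    for _ in range(max_steps):
        best, best_score = current, current_score
        for n in neighbors(current):
            s = score(n)
            if s > best_score:
                best, best_score = n, s
        if best == current:  # no neighbor is better: a (local) optimum
            return current, current_score
        current, current_score = best, best_score
    return current, current_score

# Toy scoring function (hypothetical): 10 points for each topic
# already in its position in some ideal ordering.
TARGET = list("ABCDE")

def score(order):
    return sum(10 for got, want in zip(order, TARGET) if got == want)

def neighbors(order):
    """All orderings reachable by swapping one adjacent pair of topics."""
    for i in range(len(order) - 1):
        n = list(order)
        n[i], n[i + 1] = n[i + 1], n[i]
        yield n

best, s = hill_climb(list("BACDE"), score, neighbors)
print(best, s)  # -> ['A', 'B', 'C', 'D', 'E'] 50
```

Note that the stopping condition is exactly the pathology in question 2: the climber halts as soon as no *neighbor* improves, even if a distant restructuring would score far higher.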

Genetic Algorithms

  1. A genetic algorithm is being used to optimize a school bus route. The population consists of ten candidate routes. After evaluating fitness, the two best-performing routes are selected as parents. Their routes are split at the midpoint and recombined to create two offspring routes. In one offspring, two stops are then randomly swapped.
    • Which step is selection? Which is crossover? Which is mutation?
    • Why might the offspring routes be better than either parent?
    • Why might they be worse? What happens in the next generation if they are?
  2. Explain in your own words why genetic algorithms are better than hill climbing at escaping local optima. What specific mechanism allows a genetic algorithm to explore parts of the solution space that hill climbing would never reach?
  3. Both hill climbing and genetic algorithms improve solutions through a reward signal rather than labeled training data. In one or two sentences, explain how this connects them to the reinforcement learning category from Topic 6D.

It is completely fine to revisit the readings as you work through these questions.

Extend Your Learning

These optional topics go beyond the core learning goals but are rich avenues for deeper understanding.

  • Simulated annealing
    • A variant of hill climbing that sometimes accepts a worse solution deliberately — like cooling metal slowly to reach a more stable structure. This controlled randomness helps it escape local optima while still converging on a good solution.
  • Evolutionary programming
    • When genetic algorithms are applied specifically to evolving programs rather than solutions — letting code itself evolve toward a goal rather than being explicitly written. The textbook's BACON and AUTOCLASS systems are early examples of this direction.
  • Neuroevolution
    • Using genetic algorithms to evolve the structure and weights of neural networks, rather than training them by gradient descent. An active research area that combines Topics 7A and 7B in interesting ways.
  • The traveling salesman problem
    • One of the classic benchmarks for optimization algorithms. Both hill climbing and genetic algorithms have been applied to it extensively, and comparing their performance on this problem illustrates their respective strengths and limits concretely.
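For readers curious about the simulated annealing idea above, here is a minimal sketch. The one-dimensional landscape, temperatures, and cooling rate are all hypothetical choices for illustration; the key line is the one that sometimes accepts a *worse* neighbor, with a probability that shrinks as the temperature cools:

```python
import math
import random

random.seed(1)

def simulated_annealing(start, score, neighbor, t0=10.0, cooling=0.95, steps=200):
    """Like hill climbing, but a worse neighbor is accepted with
    probability exp(delta / t), which falls as temperature t cools."""
    current, current_score = start, score(start)
    best, best_score = current, current_score
    t = t0
    for _ in range(steps):
        cand = neighbor(current)
        cand_score = score(cand)
        delta = cand_score - current_score
        if delta > 0 or random.random() < math.exp(delta / t):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        t *= cooling  # cool down: downhill moves become rarer over time
    return best, best_score

# Toy 1-D landscape (hypothetical): a local peak of height 5 at x = 2
# and a taller global peak of height 9 at x = 8.
def score(x):
    return max(5 - (x - 2) ** 2, 9 - (x - 8) ** 2)

best, s = simulated_annealing(0, score, lambda x: x + random.choice([-1, 1]))
print(best, s)
```

Plain hill climbing started at x = 0 would stop at the local peak at x = 2; the early high-temperature phase gives the annealer a chance to cross the valley toward x = 8 before the downhill moves are frozen out.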