This article describes an example of Reinforcement Learning being used to solve scheduling problems. See the model and python files in the attached zip file.
This model represents a generic sheet metal processing plant. There are four machines in series. Each job requires time on all four machines. Jobs come in batches of 10. A poor sequence of jobs will cause blocking between items, lowering throughput.
If the time between batches is long, such as a shift or a day, you could use the optimizer to determine the best sequence. If the time between batches is short, however, the optimizer may not be feasible. For real sequencing problems, the time to find a good sequence can be anywhere from 5 minutes to an hour, or even longer. This makes it impractical for high-velocity situations.
The attached model requests a decision every time the first machine in the series is available. The only action is an index for the Nth available job. So the decision can be interpreted as "which job should I do next?"
The general solution is to use reinforcement learning. However, this problem required customized python scripts:
pip install sb3-contribto install it.
After training for 500k time-steps, the agent learns to choose jobs moderately well. If you run the inference script, you can use the experimenter to compare a random policy to a trained agent: