Using reinforcement learning (RL) for scheduling is tricky. A while back, I talked with some RL folks at Bonsai (since discontinued). They said that there are generally better tools for scheduling than training an agent, and they mentioned Gurobi as one possibility:
https://www.gurobi.com/
That said, there may still be a way forward with RL, which is worth a look since Gurobi isn't free.
As far as I can tell, the general idea would be to use a single action: which job to start next. For that, I probably wouldn't use a sequence parameter, but instead a discrete parameter from 1 to N, where N is the number of jobs. Note also that if you train an agent on a certain number of jobs, you'll always need to supply that same number of jobs.
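One possible way to live with the fixed job count is to pad shorter job lists with empty placeholder slots. The sketch below is only an illustration of that idea: `Job`, `pad_jobs`, and `N_SLOTS` are hypothetical names, and the indices are 0-based here (0 to N-1) rather than 1 to N.

```python
from dataclasses import dataclass
from typing import List, Optional

N_SLOTS = 5  # hypothetical: the fixed number of job slots the agent was trained on


@dataclass
class Job:
    name: str
    duration: float


def pad_jobs(jobs: List[Job], n_slots: int = N_SLOTS) -> List[Optional[Job]]:
    """Return exactly n_slots entries, using None for unused slots."""
    if len(jobs) > n_slots:
        raise ValueError(f"got {len(jobs)} jobs, but the agent only has {n_slots} slots")
    return list(jobs) + [None] * (n_slots - len(jobs))


# Three real jobs padded out to the five slots the agent expects.
slots = pad_jobs([Job("cut", 3.0), Job("weld", 5.0), Job("paint", 2.0)])

# The single discrete action is then just an index into `slots`.
action = 1
chosen = slots[action]
```

The empty slots must never be selected, which is one more reason to use the action mask described next.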
Once a job has been chosen, you'll need some way to tell the agent that the job is no longer available. For that, you'll need something called an action mask. It looks like you can do that with the Maskable PPO algorithm:
https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html
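To make the masking idea concrete, here is a minimal sketch in plain Python with no RL library involved. The names `action_mask` and `pick_masked` are hypothetical, and the fixed `scores` list stands in for whatever preferences a trained policy would produce.

```python
from typing import List


def action_mask(started: List[bool]) -> List[bool]:
    """True means the job can still be chosen; False means it is already started."""
    return [not s for s in started]


def pick_masked(scores: List[float], mask: List[bool]) -> int:
    """Pick the highest-scoring job among those the mask still allows."""
    valid = [i for i, ok in enumerate(mask) if ok]
    if not valid:
        raise ValueError("no jobs left to schedule")
    return max(valid, key=lambda i: scores[i])


# Stand-in for a policy's preferences over four jobs (assumed values).
scores = [0.1, 0.9, 0.4, 0.7]
started = [False] * len(scores)

order = []
while not all(started):
    job = pick_masked(scores, action_mask(started))
    started[job] = True  # this job is masked out from now on
    order.append(job)
```

With sb3-contrib, a function like `action_mask` would typically be attached to the environment through the `ActionMasker` wrapper, or exposed as an `action_masks()` method on the environment, before training with `MaskablePPO`. The linked page has the exact usage.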
In addition, your observation would probably need to include some state information about the current process so the agent can learn to make good scheduling decisions.
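What goes into the observation depends entirely on your process, so the following is only a sketch under assumed features: each job's duration, whether it is still available, and how long until the machine is free. The function name `build_observation` and all of its inputs are hypothetical.

```python
from typing import List


def build_observation(
    durations: List[float],
    started: List[bool],
    time_until_free: float,
) -> List[float]:
    """Flatten the scheduling state into a fixed-length list of floats."""
    # Scale by the longest job so that durations land in the 0 to 1 range.
    scale = max(max(durations), 1e-9)
    obs: List[float] = []
    for duration, is_started in zip(durations, started):
        obs.append(duration / scale)           # relative job length
        obs.append(0.0 if is_started else 1.0)  # 1.0 = still available
    obs.append(time_until_free / scale)        # when the machine frees up
    return obs


# Three jobs, the second one already started, machine free in 2.5 time units.
obs = build_observation([3.0, 5.0, 2.0], [False, True, False], 2.5)
```

The length of the vector is fixed at 2 * N + 1, which matches the earlier point that the number of jobs has to stay the same between training and use.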
Jordan Johnson
Principal Software Engineer