topic Removing actions by reinforcement learning in FlexSim Forum

Removing actions by reinforcement learning

katerina_fratczak — Fri, 22 Nov 2024 09:22:06 GMT

[ FlexSim 23.0.15 ]

Hello,

I need to pull items 1 to 50, but each is available only once. How should I set the Action Parameters? Should I use an Integer from 1 to 50, Options, or something else?

How can I remove an already chosen action from the Action Parameters, so that in the next round the RL algorithm can choose only from the remaining item numbers?

Thank you, Katerina

Re: Removing actions by reinforcement learning

ralf_gruber — Thu, 28 Nov 2024 08:37:14 GMT

Hi Katerina,

The parameter type "Sequence" is designed to do what you are asking:

You choose the sequence length, and it creates an array of that length and fills it with consecutive integers.

@Jordan Johnson Can you please chip in about how this will work in an RL environment?

Thx

Ralf

Re: Removing actions by reinforcement learning

katerina_fratczak — Fri, 29 Nov 2024 14:39:52 GMT

Hi Ralf,

Thank you for your answer. We have already tried Sequence, but there is no way to connect it with the RL Tools Parameters; see my answer from yesterday: Sequence in reinforcement learning - FlexSim Community.

Since I wrote the question, we have tried using Options 1-50 and removing each chosen option after every round using GlobalVariables. This way a random run in FlexSim works fine: each number is selected only once.

Here is a random run in FlexSim, where chosen numbers are removed from GlobalVariables and the Options are updated accordingly:

But when we run RL, the Python script probably reads the Action Parameters only at the beginning of training and chooses the same numbers repeatedly. As they are no longer available in Options, FlexSim uses the last row of the Options instead, so the RL agent cannot learn properly. The rows and the numbers on them also get confused (the third row holds number 6, which is then used in the model...).
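The row/number confusion described above can be reproduced outside FlexSim. The following is a minimal sketch in plain Python, assuming the agent keeps sending a fixed 1-based row index while the Options list shrinks. The function name, the six-row table, and the fallback-to-last-row rule are illustrative assumptions based on the behaviour described in this post, not FlexSim API calls.

```python
def pick_from_options(options, row):
    """Return the value on a 1-based row.

    Assumption: if the row no longer exists, the last row is used instead,
    mirroring the behaviour described in the post.
    """
    if row > len(options):
        row = len(options)
    return options[row - 1]


options = [1, 2, 3, 4, 5, 6]           # hypothetical Options table, one value per row

first = pick_from_options(options, 3)  # row 3 still holds value 3
options.remove(first)                  # item 3 is used up
options.remove(4)                      # suppose items 4 and 5 were used in other rounds
options.remove(5)

second = pick_from_options(options, 3)  # row 3 now holds value 6
third = pick_from_options(options, 5)   # row 5 is gone, so the last row is used
```

After the removals the table is `[1, 2, 6]`, so the same action (row 3) now means a different item, and an out-of-range row silently maps to the last one. From the agent's side, the meaning of each action changes during the episode.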

Is there any way to pass the updated set of available Action Parameters to Python after each action?

Thank you, Katerina

Re: Removing actions by reinforcement learning

JordanLJohnson — Mon, 02 Dec 2024 20:43:45 GMT

Using Reinforcement Learning for scheduling purposes is tricky. A while back, I talked with some RL folks (Bonsai, since discontinued). They said that there are generally better tools available for scheduling than training an agent. They mentioned Gurobi as one possibility:
https://www.gurobi.com/

But that being said, maybe there is a way forward, especially because Gurobi isn't free.

As far as I can tell, the general idea would be to use a single action: which job should be started next. For that, I probably wouldn't use a sequence parameter, but instead a discrete parameter from 1 to N. Note also that if you train an AI on a certain number of jobs, you'll always need to supply that number of jobs.

But then, when a job is chosen, you'll need some way to specify that the job isn't available anymore. For that, you'll need something called an action mask. It looks like you can do that with a Maskable PPO algorithm:
https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html
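To illustrate the idea of an action mask, here is a minimal sketch in plain Python. It assumes a fixed action space of 50 jobs and a set of already-started jobs; `action_mask`, `masked_random_action`, and `done_jobs` are hypothetical names, and the random choice is only a stand-in for a trained policy.

```python
import random

N_JOBS = 50  # fixed size of the action space (jobs 1..50)


def action_mask(done_jobs, n_jobs=N_JOBS):
    """One boolean per action; index i stands for job i + 1.

    True means the job may still be chosen.
    """
    return [job not in done_jobs for job in range(1, n_jobs + 1)]


def masked_random_action(mask, rng):
    """Pick uniformly among the allowed action indices (stand-in for a policy)."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    return rng.choice(allowed)


rng = random.Random(0)
done_jobs = set()
order = []
while len(done_jobs) < N_JOBS:
    mask = action_mask(done_jobs)
    action = masked_random_action(mask, rng)
    job = action + 1          # map the 0-based action index back to a job number
    done_jobs.add(job)
    order.append(job)
```

The action space never shrinks; only the mask changes, so action index `i` always means job `i + 1`. With sb3-contrib, a function like `action_mask` is what would be exposed to `MaskablePPO`, for example through an `action_masks()` method on the environment or the `ActionMasker` wrapper. How the set of finished jobs gets from FlexSim to Python is not shown here and would depend on your environment code.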

In addition, you'd probably need to send some kind of state information about the current process so the agent can learn to make good scheduling decisions, as part of your observation.
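As a rough illustration of what such state information could look like, the sketch below builds a flat observation vector in plain Python. The choice of features (an availability flag per job, a scaled processing time per job, and the scaled simulation time) and all names are assumptions for illustration only; the right features depend on your model.

```python
def build_observation(processing_times, done_jobs, current_time, horizon):
    """Build a flat observation list.

    Layout (all values in [0, 1]):
      - one availability flag per job (1.0 = still available)
      - one processing time per job, scaled by the longest time and
        zeroed once the job is done
      - the current simulation time, scaled by a hypothetical horizon
    """
    n = len(processing_times)
    longest = max(processing_times)
    available = [0.0 if job in done_jobs else 1.0 for job in range(1, n + 1)]
    scaled_times = [flag * t / longest
                    for flag, t in zip(available, processing_times)]
    return available + scaled_times + [min(current_time / horizon, 1.0)]


# Three hypothetical jobs; job 2 has already been started.
obs = build_observation([4.0, 2.0, 8.0], done_jobs={2},
                        current_time=5.0, horizon=20.0)
```

Keeping the vector length fixed (here 2N + 1 for N jobs) matches the note above that a trained agent always expects the same number of jobs.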

Re: Removing actions by reinforcement learning

katerina_fratczak — Tue, 03 Dec 2024 09:05:43 GMT

Hello Jordan, thank you very much for your answer. We will try to use the action mask, as you mentioned.

Re: Removing actions by reinforcement learning

JordanLJohnson — Thu, 05 Dec 2024 19:57:08 GMT

One option is to see the article I wrote on this topic, complete with an example:

https://answers.flexsim.com/articles/173513/using-reinforcement-learning-for-job-sequencing.html