Removing actions by reinforcement learning

Message 1 of 6


katerina_fratczak
Participant

[ FlexSim 23.0.15 ]

Hello,

I need to pull items 1 to 50, but each item is available only once. How should I set the Action Parameters? Should I use an Integer from 1 to 50, Options, or something else?

1732266805649.png

How can I remove an already chosen action from the action parameters, so that in the next round the RL algorithm can choose only from the remaining item numbers?

Thank you, Katerina

Message 2 of 6

ralf_gruber
Collaborator
Accepted solution

Hi Katerina,

The parameter type "Sequence" is designed to do what you are asking:

1732782855370.png

You choose a sequence length, and it creates an array of that length filled with consecutive integers.
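To illustrate the idea (this is just a conceptual sketch in plain Python, not FlexSim's API): a Sequence parameter of length N represents a permutation of the integers 1..N, so a single "action" is a whole ordering in which every item appears exactly once.

```python
import numpy as np

# Illustration only: a Sequence parameter of length N is conceptually a
# permutation of the integers 1..N, so one action decides a whole ordering.
rng = np.random.default_rng(0)
n = 5
seq = rng.permutation(np.arange(1, n + 1))  # one possible ordering of 1..5

# Every value 1..N appears exactly once, so no item can be picked twice.
assert sorted(seq.tolist()) == list(range(1, n + 1))
print(seq.tolist())
```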

@Jordan Johnson Can you please chip in about how this will work in an RL environment?

Thx

Ralf

Message 3 of 6

katerina_fratczak
Participant

Hi Ralf,

thank you for your answer. We have already tried Sequence, but there is no way to connect it with the RL Tool's Parameters - see my answer from yesterday: Sequence in reinforcement learning - FlexSim Community.

Since I wrote the question, we have tried using Options 1-50 and removing the chosen option after each round using Global Variables. This way, a random run in FlexSim works fine: each number is selected only once.

Here is a random run in FlexSim - chosen numbers are removed from the Global Variables and the Options are updated accordingly:

1732890228338.png

But when we run RL, the Python script apparently reads the Action Parameters only at the beginning of training and keeps choosing the same numbers repeatedly. Since those numbers are no longer available in the Options, FlexSim uses the last row of the Options instead, so the RL agent cannot learn properly. There is also confusion between row indices and the numbers stored in them (the third row holds the number 6, which is then used in the model...).

1732890553301.png


1732890841360.png


Is there any way to update the available Action Parameters in Python after each action?

Thank you, Katerina



Message 4 of 6

JordanLJohnson
Autodesk

Using reinforcement learning for scheduling purposes is tricky. A while back, I talked with some RL folks (Bonsai, since discontinued). They said that there are generally better tools for scheduling than training an agent, and mentioned Gurobi as one possibility:
https://www.gurobi.com/

That being said, maybe there is a way forward with RL, especially since Gurobi isn't free.

As far as I can tell, the general idea would be to use a single action: which job should be started next. For that, I probably wouldn't use a Sequence parameter, but instead a discrete parameter from 1 to N. Note also that if you train an agent on a certain number of jobs, you'll always need to supply that number of jobs.

But then, when a job is chosen, you'll need some way to specify that the job isn't available anymore. For that, you'll need something called an action mask. It looks like you can do that with a Maskable PPO algorithm:
https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html

In addition, you'd probably need to include some state information about the current process in your observation, so the agent can learn to make good scheduling decisions.
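To make the action-mask idea concrete, here is a minimal, hypothetical sketch in plain Python/NumPy (not tied to FlexSim or its connector). The class `JobSequencingEnv` and its method names are illustrative assumptions; with sb3-contrib's MaskablePPO, an environment exposes an `action_masks()` method that is queried before each action so already-started jobs get zero probability. Below, a random masked sampler stands in for the trained policy just to show that each job can be chosen only once.

```python
import numpy as np

class JobSequencingEnv:
    """Toy sketch of a job-sequencing environment with an action mask.

    Hypothetical stand-in for a FlexSim-backed gym environment; with
    sb3-contrib's MaskablePPO, the agent would query action_masks() each
    step so that already-started jobs can never be chosen again.
    """

    def __init__(self, n_jobs=50):
        self.n_jobs = n_jobs
        self.reset()

    def reset(self):
        # True = job is still available to start
        self.available = np.ones(self.n_jobs, dtype=bool)
        return self._obs()

    def _obs(self):
        # Observation: which jobs remain (a real model would also include
        # process state, as discussed above)
        return self.available.astype(np.float32)

    def action_masks(self):
        # Queried before sampling: invalid actions get probability zero,
        # so the policy only ever picks from the remaining jobs.
        return self.available.copy()

    def step(self, action):
        assert self.available[action], "masked action was chosen"
        self.available[action] = False   # a job can't be started twice
        done = not self.available.any()
        reward = 0.0                     # placeholder; reward makespan etc.
        return self._obs(), reward, done, {}

# Random masked "agent": sample only among valid actions each step
rng = np.random.default_rng(0)
env = JobSequencingEnv(n_jobs=50)
env.reset()
order, done = [], False
while not done:
    mask = env.action_masks()
    action = rng.choice(np.flatnonzero(mask))
    _, _, done, _ = env.step(action)
    order.append(int(action))

print(sorted(order) == list(range(50)))  # -> True: each job chosen exactly once
```

In an actual training script you would replace the random sampler with `sb3_contrib.MaskablePPO` and wire the observation, reward, and mask through the FlexSim RL connection.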



Jordan Johnson
Principal Software Engineer

Message 5 of 6

katerina_fratczak
Participant
Hello Jordan, thank you very much for your answer. We will try to use the action mask, as you mentioned.


Message 6 of 6

JordanLJohnson
Autodesk

See the article I wrote on this topic, which includes a complete example:

https://answers.flexsim.com/articles/173513/using-reinforcement-learning-for-job-sequencing.html



Jordan Johnson
Principal Software Engineer
