How do I do this Reinforcement learning training

Message 1 of 9

ting_wei_h
Not applicable

[ FlexSim 23.0.0 ]

I have done the tutorial. Now I'm trying to extend this example.

This is my model:

1678539112553.png

This is my SetupTime Table:

1678539154674.png

If I want to extend the model given in the tutorial from one production line to three, should I change the Observation Space and Action Space in the Reinforcement Learning tool to MultiDiscrete and set three Decision Events?

Like this:

1678539216517.png
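For context, a MultiDiscrete space is simply a vector of independent discrete values, one per production line, which (as far as I understand) corresponds to the Gym-style spaces used on the Python side. Below is a minimal sketch in Gymnasium terms only, not the tutorial's FlexSim connection script; the 5 item types and 3 lines are assumed placeholders, and the dummy reward and dynamics stand in for what FlexSim actually supplies.

# Minimal sketch: a dummy Gymnasium environment whose observation and action
# spaces are MultiDiscrete (one discrete component per production line).
# Placeholder sizes: 5 item types, 3 lines. Not the tutorial's FlexSim script.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ThreeLineDummyEnv(gym.Env):
    def __init__(self):
        super().__init__()
        # One "last item type" value (stored 0-4 for item types 1-5) per line.
        self.observation_space = spaces.MultiDiscrete([5, 5, 5])
        # One "next item type" decision per line.
        self.action_space = spaces.MultiDiscrete([5, 5, 5])

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        # Placeholder dynamics and reward; in the real setup FlexSim drives these.
        obs = self.observation_space.sample()
        reward = float(-np.abs(action - obs).sum())
        return obs, reward, False, False, {}

env = ThreeLineDummyEnv()
obs, _ = env.reset(seed=0)
print(obs)                        # e.g. [4 0 3]
print(env.action_space.sample())  # e.g. [1 2 0]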

This is my model file:

practice2023.3.16.fsm


Accepted solutions (1)
Replies (8)
Message 2 of 9

moehlmann_fe
Observer
Accepted solution

Since all three processors operate in the same way and are thus equivalent from the Reinforcement Learning agent's point of view, you shouldn't need to train a new agent if you have already trained one for a single processor.

The easiest way to use more processors would probably be to copy the Reinforcement Learning tool for each one and adjust only which event triggers it, whose "lastitemtype" value is written to the parameter in the On Observation code, and which reward value is returned (if they differ).

1678693496275.png

1678693542809.png

If you come up with a good way to know which processor triggered the observation (possibly by keeping track of which one will finish, and thus pull, next), you could also do this with multiple events in a single Reinforcement Learning tool.
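A rough sketch of that bookkeeping idea in plain Python (not FlexScript; the processor names and times below are made up): record each processor's expected finish time when it starts an item, and the smallest recorded time tells you which processor will pull, and therefore trigger the next observation.

# Sketch of the "who finishes next" bookkeeping; illustrative only.
expected_finish = {}  # processor name -> simulation time it is expected to finish

def on_process_start(processor, now, process_time):
    expected_finish[processor] = now + process_time

def next_to_pull():
    # The processor with the earliest expected finish is the one the next
    # decision/observation belongs to.
    return min(expected_finish, key=expected_finish.get)

on_process_start("Processor1", now=100.0, process_time=12.0)
on_process_start("Processor2", now=100.0, process_time=5.0)
on_process_start("Processor3", now=100.0, process_time=9.0)
print(next_to_pull())  # Processor2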

Message 3 of 9

ting_wei_h
Not applicable

So what I should do is use three Reinforcement Learning tools, with a Parameter for each Observation and Action?

Like this:

1678780011896.png

1678780045066.png1678780059364.png

Thank you for your answer, and sorry to trouble you again.

Message 4 of 9

moehlmann_fe
Observer
If all three processors are equivalent in your model, yes.

If there is a difference (for example, the items are not distributed equally, or the processors use different setup time tables), then you should add a second observation parameter that tells the RL algorithm which processor it is currently making a decision for.
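As a sketch of what that second parameter amounts to (Gymnasium notation; the 5 item types and 3 processors are assumptions, not values read from the model): the observation becomes a pair, the last item type on the deciding processor plus which processor is asking for the decision, while the action stays a single choice of the next item type.

# Sketch only: a two-component observation instead of one.
from gymnasium import spaces

# [last item type on the deciding processor (assumed 5 types),
#  index of the processor currently asking for a decision (3 processors)]
observation_space = spaces.MultiDiscrete([5, 3])
# The action stays a single choice of the next item type for that processor.
action_space = spaces.Discrete(5)

obs = observation_space.sample()
print(obs)  # e.g. [2 1] -> last item type 3 (0-based 2), deciding for Processor2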

Message 5 of 9

ting_wei_h
Not applicable

But when I do my training this way, the result looks like this:

1678782474983.png

1678782503207.png

1678782530755.png

1678782559489.png

"ep_len_mean"and "ep_rew_mean" are all smaller than before training. I would like to know which part is wrong.

Message 6 of 9

moehlmann_fe
Observer

It could be that only the first Reinforcement Learning tool in the toolbox is used during learning; I am sorry if I gave you wrong information. Multiple defined tools do work when running an already trained agent.

As I said originally, in your case it might even be more efficient to train the agent with a single processor and then have the other processors use that same agent, since the decision making wouldn't really differ between them.

I have attached an example in which all processors use a single RL tool. Be mindful, however, that the way you currently have your reward set up (a single shared sink and thus a single shared reward) will likely result in slower learning. Due to how the reward is calculated, it is better if two processors finish with little time between them than if the finishes are paced evenly.

practice2023316_2.fsm
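To put rough numbers on the point about the shared reward (assuming a tutorial-style reward that gets larger the less time has passed since the previous item entered the sink, roughly reward ∝ 1/Δt; the exact formula in your model may differ), two arrivals bunched together earn more total reward than two arrivals spread evenly over the same window, which is what can mislead or slow down the learning.

# Illustrative numbers only, assuming reward ~ 1 / (time since last sink arrival).
def reward(dt):
    return 1.0 / dt

# Two processors finishing 5 time units apart (evenly paced over 10 units)...
evenly_paced = reward(5.0) + reward(5.0)  # 0.4
# ...versus finishing 1 and 9 time units apart (bunched) over the same 10 units.
bunched = reward(1.0) + reward(9.0)       # ~1.11
print(evenly_paced, bunched)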

Message 7 of 9

ting_wei_h
Not applicable

Please don't say that; you've really helped me a lot.

I want to ask about the logic of your example.

In your example, because the "On Observation" and "Reward Function" in your Reinforcement Learning tool only relate to Processor1, training the model only trains my first production line, and then through the Decision Events I can apply the strategy I already trained to the remaining two production lines?

So when I am training through Python, will "ep_len_mean" and "ep_rew_mean" only receive input from the first production line?

I want to know if my understanding is correct.

Message 8 of 9

moehlmann_fe
Observer
Yes, though when training the agent on a single processor, the other two should not run at all (disconnect them from the initial queue). Otherwise, items entering the sink from the other two processors will influence the reward value without the RL algorithm having any information as to "why" this happens, so to speak.

In my understanding, the algorithm will still learn despite this "noise" in the data, but it will take quite a bit longer.
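With the same assumed 1/Δt-style reward as in the sketch above (again, not the model's actual reward code), made-up numbers show what that "noise" looks like: the agent can make the exact same decision and still see very different rewards, depending only on when the other processors' items happen to reach the sink.

# Made-up numbers; same assumed reward form as above.
def reward(time_since_last_sink_arrival):
    return 1.0 / time_since_last_sink_arrival

# Run A: Processor1's item reaches the sink 8 time units after the previous one.
reward_a = reward(8.0)  # 0.125
# Run B: identical decision on Processor1, but an item from Processor3 happened
# to enter the sink 1 time unit before it.
reward_b = reward(1.0)  # 1.0
print(reward_a, reward_b)  # same decision, very different reward -> "noise"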

Message 9 of 9

Jeanette_Fullmer
Community Manager

Hi @Ryan_Wei, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always unaccept and comment back to reopen your question.
