Reinforcement learning tutorial: state values and mismatch between messages


sz_
Message 1 of 3


[ FlexSim 24.1.0 ]

Hello,

I’ve been following this tutorial: https://docs.flexsim.com/en/22.1/ModelLogic/ReinforcementLearning/Training/Training.html and have successfully implemented and run the two Python scripts, flexsim_env.py and flexsim_training.py. However, I have trouble understanding parts of the output. I've attached a screenshot for reference.

1) In the FlexSim model, the "action" and "observation" parameters ("LastItemType" and "ItemType") are defined to have values between 1 and 5. However, in the output, the state values range from 0 to 4. Why is there a discrepancy between the value range defined in the model and the state values shown in the output?

2) At the beginning of each iteration, the "state" values from the Action and Observation messages don’t match. After a few simulation steps, the values do align, but why are they initially inconsistent?

Thank you!

1728925547594.png

Model.fsm

Replies (2)
Message 2 of 3

moehlmann_fe
Observer
Accepted solution

1) I believe a discrete parameter with N possible values is always mapped to the index range [0, N-1]. For example, if the possible values were 3, 6, 9 and 12, the RL agent would "see" the values 0, 1, 2, 3. In your model, the item types 1 to 5 would therefore show up as 0 to 4.

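2) Not all item types will be available to pull at the start of a run. When the requested type is not available, the demo model pulls the first item in the queue instead, so the item type reported in the observation can differ from the one requested by the action. That would explain why the two values only line up after a few simulation steps.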
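
To make both points concrete, here is a minimal Python sketch. It is not FlexSim's actual code: the helper names `to_index`, `from_index` and `pull_item` are hypothetical, and the queue is just a plain list of item types standing in for the demo model's queue. It shows how values could be mapped to zero-based indices, and how a fallback pull makes the observed type differ from the requested one.

```python
def to_index(value, possible_values):
    """Map a model value to the zero-based index the agent would see (assumed behavior)."""
    return possible_values.index(value)


def from_index(index, possible_values):
    """Map an agent-side index back to the model value."""
    return possible_values[index]


def pull_item(queue, requested_type):
    """Hypothetical stand-in for the demo model's pull logic.

    Pulls the requested type if it is in the queue, otherwise falls back
    to the first item. Assumes the queue is not empty.
    """
    if requested_type in queue:
        queue.remove(requested_type)
        return requested_type
    return queue.pop(0)


# Point 1: values 3, 6, 9, 12 are seen as indices 0, 1, 2, 3.
possible_values = [3, 6, 9, 12]
indices = [to_index(v, possible_values) for v in possible_values]

# Item types 1..5 are seen as 0..4 in the same way.
item_types = [1, 2, 3, 4, 5]
seen_types = [to_index(t, item_types) for t in item_types]

# Point 2: early in a run the queue only holds a few types.
queue = [2, 5]
requested = 4                         # the agent asks for type 4
pulled = pull_item(queue, requested)  # type 4 is missing, so the first item is pulled
mismatch = pulled != requested        # action and observation disagree for this step
```

Once the queue contains the requested type, `pull_item` returns exactly that type and the two values agree again.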

Message 3 of 3

Jeanette_Fullmer
Community Manager

Hi @sz, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.
