Reinforcement Learning: Received action is not used during inference

arthur_ml · ‎09-20-2022

[ FlexSim 22.2.1 ]

Hi everyone,

I am following your tutorial for setting up a Reinforcement Learning pipeline with the "ChangeoverTimesRL" model. Everything works fine except that the action sent from the Python model and received by FlexSim does not control the process. The "ItemType" parameter has a constant value of 2. At first, I assumed that my model was poorly trained and simply output 2 all the time.

Therefore, I added a print in the Python code as well as in the "On Request Action - Query a server for a predicted action from a trained model" script to check the action values. The values match and change over time, i.e. communication between the model (Python) and the FlexSim simulation model is not the problem; the actions are received correctly. Somehow the received action is not passed to ItemType, and therefore the RL agent is not in control of the process.

Can someone please help?

Thanks in advance!
Arthur

P.S.
Since I am not able to upload my ChangeoverTimesRL.fsm model here, I put it on Google Drive:
https://drive.google.com/file/d/1QK0hiBy-tF4BssXmvWCRlhVAWeTuKBzs/view?usp=sharing

kavika_faleumu · ‎09-21-2022

Hey @Arthur Ml, I just did this tutorial last week and didn't have this problem. I looked through all of your code and it looks fine until I run flexsim_env.py: no matter what action or state you're in, you get the same reward. Have you tried remaking the model and going through the tutorial again? Is the Python version declared in FlexSim the same as the version you're using to run the scripts?

Jeanette_Fullmer · ‎09-22-2022

Hello @Arthur Ml,

I did the tutorial in FlexSim 22.0 and 22.2.

I am running into the same problem as you in 22.2, but it works correctly in 22.0. I am sending this to the Development team as a bug.

arthur_ml · ‎09-22-2022

Thx @Kavika F for your answer. I went through the tutorial twice. What do you mean by Python version? Is there any constraint on which Python versions can be used? I used 3.9, but I don't believe that the error is coming from Python, since the actions are received correctly in FlexSim. But somehow the received actions are not passed to ItemType.

arthur_ml · ‎09-22-2022

Thx @Jeanette F for your answer :-) Is there any workaround to be able to run it in 22.2? And would it be possible to be informed by the development team when they find the bug?

Jeanette_Fullmer · ‎09-23-2022

I do not have a workaround but I will let the Dev team know you are interested in being notified. Currently the only suggestion is to use a previous version of FlexSim for RL.

Here is the 22.0 version so you don't have to rebuild the model. ChangeoverTimesRL.fsm

JordanLJohnson · ‎09-23-2022

This is a bug introduced in version 22.2. It is a JSON parsing issue, where we incorrectly parse single values. You can work around the issue by modifying flexsim_env.py, in the _take_action method:

Original, around line 110:

        return state, reward, done
    
    def _take_action(self, action):
        actionStr = json.dumps(action, cls=NumpyEncoder)
        if self.verbose:
            print("Sending Action message: " + actionStr)
        actionMessage = "TakeAction:" + actionStr + "?"
        self._socket_send(actionMessage.encode())


    def _socket_init(self, host, port):
        if self.verbose:

The fix is to add two lines of code, after the def _take_action line, and before the actionStr = json.dumps() line.

Code to insert:

        if not hasattr(action, "__len__"):
            action = [action]

Fixed code:

        return state, reward, done
    
    def _take_action(self, action):
        if not hasattr(action, "__len__"):
            action = [action]
        actionStr = json.dumps(action, cls=NumpyEncoder)
        if self.verbose:
            print("Sending Action message: " + actionStr)
        actionMessage = "TakeAction:" + actionStr + "?"
        self._socket_send(actionMessage.encode())


    def _socket_init(self, host, port):
        if self.verbose:

NOTE: In Python, indentation is critical. The code must be indented as above. Furthermore, be sure to use the same kind of whitespace to indent. If the other lines use spaces to indent, use spaces on the new lines. If the other lines use tabs, then use tabs.
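If you are unsure which kind of whitespace a file uses, a small check like the one below can help. This is only a sketch: the function name indentation_kinds and the sample string are made up for illustration and are not part of flexsim_env.py.

```python
def indentation_kinds(source):
    """Return the kinds of whitespace ("spaces", "tabs") used to indent lines."""
    kinds = set()
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines carry no indentation information
        indent = line[: len(line) - len(stripped)]
        if " " in indent:
            kinds.add("spaces")
        if "\t" in indent:
            kinds.add("tabs")
    return kinds


# Hypothetical sample: one line indented with spaces, the next with tabs.
sample = (
    "def _take_action(self, action):\n"
    "        if not hasattr(action, \"__len__\"):\n"
    "\t\t\taction = [action]\n"
)
mixed = indentation_kinds(sample)
print(mixed)
```

To check a real file, pass its contents instead of the sample, for example the text read from flexsim_env.py. If the result contains both kinds, the new lines probably do not match the rest of the file.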

The basic idea of this fix is that, since single numbers don't parse correctly, you can check whether the value is a single value and, if it is, put it in an array. Arrays do parse correctly.
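To see what the check changes in the message that is sent, here is a standalone sketch using plain Python values. The helper names wrap_single_action and build_take_action_message are invented for this example; only the hasattr check and the "TakeAction:" + JSON + "?" layout come from the code above, and the NumpyEncoder is left out to keep the example self-contained.

```python
import json


def wrap_single_action(action):
    """Wrap a value that has no length (a single number) in a list."""
    if not hasattr(action, "__len__"):
        action = [action]
    return action


def build_take_action_message(action):
    # Same layout as in _take_action: "TakeAction:" + JSON text + "?"
    return "TakeAction:" + json.dumps(wrap_single_action(action)) + "?"


single = build_take_action_message(2)         # single value gets wrapped
multiple = build_take_action_message([2, 1])  # a list is sent unchanged
print(single)
print(multiple)
```

With the check in place, a single action is sent as a one-element JSON array instead of a bare number, while actions that already are lists are left alone.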

Jordan Johnson
Principal Software Engineer

arthur_ml · ‎09-24-2022

Thx @Jordan Johnson . That did the trick!

Just one addition to your solution. The code snippet

        if not hasattr(action, "__len__"):
            action = [action]

also has to be added in flexsim_inference.py, in the method

    def _handle_reply(self, params):

so that the agent is also in control of the actions during inference.
