reward function


mhosseini457NG
Advocate

[ FlexSim 24.1.0 ]

For a reward function in the RL tool, how can I set up the reward so that it refers to a row in the Performance Measures table?

For example:

Reward = Quantity_i - StayTime_i

where Quantity_i is the number of items in a queue for each item type i (set up in the Performance Measures table) and StayTime_i is the stay time of item type i in a rack (also set up as a performance measure). StayTime_i enters with a negative sign because I want to penalize items for the time they remain in the racks.
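As a rough illustration of that reward shape, here is a Python sketch with made-up numbers; in the actual model, Quantity_i and StayTime_i would come from the Performance Measures table rather than from dictionaries:

```python
def reward(quantities, stay_times):
    # Sum (Quantity_i - StayTime_i) over all item types i;
    # the stay time enters negatively to penalize time spent in the racks
    return sum(quantities[i] - stay_times[i] for i in quantities)

# Example with hypothetical values for two item types
r = reward({"A": 3, "B": 5}, {"A": 1.0, "B": 2.0})  # (3 - 1) + (5 - 2) = 5.0
```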



Accepted solutions (1)
Replies (7)

moehlmann_fe
Observer
Accepted solution

Performance measure values are FlexScript nodes. You need to evaluate the node (passing in the stored reference if needed) to get the actual value.

capture1.png

// The performance measure's Value node; the "1" in the path is the rank of the PFM
treenode pfmValue = Model.find("/Tools/PerformanceMeasureTables/PerformanceMeasures>variables/performanceMeasures/1/Value");
// Evaluate the FlexScript node, passing in the stored reference
Variant value = pfmValue.subnodes[1].evaluate(pfmValue.subnodes[2].value);

Jeanette_Fullmer
Community Manager

Hi @Maryam H2, was Felix Möhlmann's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.


mhosseini457NG
Advocate

Hi @Felix Möhlmann @Jeanette F

The code above does not return any value. I have the target inventory levels in a Parameters table and the current inventory levels in a Performance Measures table. If I want to penalize deviation from the target inventory levels and encourage the agent to take actions that minimize this penalty, how should I structure the reward function? Do you have an example I could reference?

I was thinking about how I could define a reward function like the one below as a starting point:

def reward_function(current_inventory, target_inventory):
    # Calculate the difference between current and target inventory levels
    deviation = abs(current_inventory - target_inventory)
    # Penalize the deviation (negative reward), scaled by an adjustable factor
    penalty_factor = 0.1
    reward = -penalty_factor * deviation
    return reward

Also, is there a way to instruct the agent to minimize the frequency of actions (such as placing orders for item types and receiving them in queues) in order to reduce ordering costs and extend the time interval between orders as much as possible? If so, how can I do this?



mhosseini457NG
Advocate
Hi @Felix Möhlmann, any ideas about my question?

moehlmann_fe
Observer

The code from my original answer does return a value; I just tested it again in version 24.0.2. You might have to adjust the path to get the value of the correct PFM, though. The "1" in the path is the rank of the performance measure.

The fundamental logic of your code makes sense. It's just not FlexScript. I have heard and read that clamping the reward to lie between -1 and 1 works best for many RL algorithms, so that might be worth trying.

If you want to define a function in FlexSim, have a look at user commands. Your logic in a user command (plus clamping the value to the [-1, 1] interval) would look something like this:

double current_inventory = param(1);
double target_inventory = param(2);
double reward = -Math.fabs(current_inventory - target_inventory);
reward *= 0.1;
reward = Math.min(1, Math.max(reward, -1));
return reward;
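For comparison, the same logic as a Python sketch, with the 0.1 penalty factor and the clamp to the [-1, 1] interval spelled out:

```python
def inventory_reward(current_inventory, target_inventory):
    # Penalize the absolute deviation from the target inventory level
    reward = -abs(current_inventory - target_inventory)
    # Scale the penalty, then clamp the result to [-1, 1]
    reward *= 0.1
    return min(1.0, max(reward, -1.0))
```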

You determine when a decision is made by setting up the decision events. If the agent can influence this (for example, by ordering a larger quantity when a decision is only made once stock falls below a certain level), then it should learn to do so, provided the reward function takes into account the amount of time since the last decision.
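One way to sketch that idea in Python (the time_bonus weight is an illustrative value I made up, not a FlexSim parameter): add a term that grows with the time elapsed since the last order, so longer intervals between decisions earn a higher reward:

```python
def order_reward(inventory_penalty, time_since_last_order, time_bonus=0.01):
    # inventory_penalty: negative value from the deviation-based reward above
    # time_bonus * elapsed time rewards longer intervals between orders
    reward = inventory_penalty + time_bonus * time_since_last_order
    # Keep the final reward in the [-1, 1] range
    return min(1.0, max(reward, -1.0))
```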


mhosseini457NG
Advocate

@Felix Möhlmann I have two questions:

First, whatever performance measure (rank) I try, it does not return any value. Also, to get the path to a specific performance measure, shouldn't this tell us the correct path:

treenode performanceMeasureNode = model().find("StayTime_PR_1");
string path = performanceMeasureNode.getPath();

Here is a picture showing that the script you sent does not return any value and instead returns the error below (I tried it with different ranks):

exception: FlexScript exception: invalid index at MAIN:/project/exec/consolescript c: <no path> i: <no path>

1725984442734.png

When defining these parameters for the reward function in the user command, how are current_inventory and target_inventory linked to the function written in the user command and kept updated? Also, should I create a reward function in a user command for each of the four items I want the agent to learn about and act on?


moehlmann_fe
Observer
No, getting the path that way does not work. ".find()" only searches the direct subnodes of the node it is called on (here, the model). Even if you used the function to search the entire tree recursively, there isn't actually a node with that name. The PFM name is the value of the table cell/node.

The path you are using has an error in it.

...variables/PerformanceMeasures/1...

should be

...variables/performanceMeasures/1...

The parameters of a user command are just that: Parameters you pass into the function. So you would first get the current and target value and then run the function and pass those in.
treenode pfmValue = ...;
Variant currentInventory = ...;
double targetInventory = ...;
double reward = inventoryRewardCommand(currentInventory, targetInventory);

You can of course also get those values directly in the code of the user command and not use any parameters, but that somewhat defeats the purpose of using a user command.

You could create four different reward functions, each taking different model statistics into account, and then add their results together to get the final reward value that the RL agent will receive.

This might be a good approach because you could apply different factors to each value, influencing how much "weight" they have.
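A minimal Python sketch of that weighted sum (the component names and weights are placeholders, not anything from the model):

```python
def combined_reward(components, weights):
    # components: per-objective reward values; weights: their relative importance
    total = sum(weights[name] * value for name, value in components.items())
    # Keep the final reward in the [-1, 1] range
    return min(1.0, max(total, -1.0))

# Example: two hypothetical components with different weights
r = combined_reward({"inventory": 0.5, "order_cost": -0.25},
                    {"inventory": 1.0, "order_cost": 2.0})  # 0.5 - 0.5 = 0.0
```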
