FlexSim reinforcement learning

Message 1 of 13

a9080109
Observer

[ FlexSim 23.0.0 ]

I'd like to understand how reinforcement learning terms map onto a FlexSim model.

For example, what does an episode correspond to in the model?

And what does a timestep correspond to in the model? I'm asking because I want to understand why there was a spike in rewards at the beginning of training.

1697448272978.png

Replies (12)
Message 2 of 13

joerg_vogel_HsH
Mentor
Probably a division by zero or near zero.
Message 3 of 13

a9080109
Observer

So how do I solve this problem in the model?

Message 4 of 13

joerg_vogel_HsH
Mentor
Accepted solution

A quick and dirty way would be to use a warmup time, or to start transmitting rewards a bit later in the model run.
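
To illustrate the second option, here is a minimal Python sketch of withholding a ratio-based reward until a warmup time has passed. It is not FlexSim code: `WARMUP_TIME`, `throughput_reward`, and `delayed_reward` are hypothetical names, and the throughput ratio is only an assumed example of a reward that is extreme while very little model time has elapsed.

```python
WARMUP_TIME = 100.0  # hypothetical warmup length, in model time units


def throughput_reward(items_out, model_time):
    """Ratio-based reward: items per unit of model time (illustrative only)."""
    return items_out / model_time


def delayed_reward(items_out, model_time, warmup_time=WARMUP_TIME):
    """Same reward, but withheld (0.0) until the warmup time has passed."""
    if model_time < warmup_time:
        return 0.0
    return throughput_reward(items_out, model_time)


# (items_out, model_time) samples: the first one divides by a very small
# elapsed time, so the raw reward is far larger than the later values.
samples = [(1, 0.5), (3, 20.0), (20, 200.0)]
raw = [throughput_reward(n, t) for n, t in samples]
delayed = [delayed_reward(n, t) for n, t in samples]
```

In this toy data the raw reward starts at 2.0 and later settles near 0.1, while the delayed version reports 0.0 until the warmup has passed.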

Message 5 of 13

a9080109
Observer

I don't quite understand what warmup means or what impact it will have on the model.

1697540691717.png

0926.fsm

Message 6 of 13

jason_lightfootVL7B4
Autodesk

You can find the warmup description in the online documentation. It could influence the timesteps in question if your rewards are based on model statistics, but I'm not sure it would explain the spike in your graph.

Message 7 of 13

jason_lightfootVL7B4
Autodesk
I believe the term 'time-step' comes from the action/reward step in a Markov Decision Process, and the number of time-steps matches the number of action->simulate->observe->reward cycles within your episode.
Message 8 of 13

a9080109
Observer

Thanks, but I have other questions.

For example, what in the FlexSim model corresponds to each timestep?

What do epoch and episode each mean?

Message 9 of 13

jason_lightfootVL7B4
Autodesk

As I said above, I believe the term 'time-step' comes from the action/reward step in a Markov Decision Process, and the number of time-steps matches the number of action->simulate->observe->reward cycles within your episode.

An episode seems to be a series of timesteps resulting in a terminal state. I would expect that's often equivalent to a model run.
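
The two terms above can be sketched with a toy loop in Python. This is only an illustration under stated assumptions: `ToyModelEnv` and `run_episode` are hypothetical stand-ins, not the FlexSim or any RL library API, and the reward rule is arbitrary. Each pass through the `while` loop is one timestep, and one call to `run_episode` is one episode, here equivalent to one model run.

```python
import random


class ToyModelEnv:
    """Stand-in for a simulation model; NOT the FlexSim API."""

    def __init__(self, decisions_per_run=5):
        self.decisions_per_run = decisions_per_run
        self.decisions_made = 0

    def reset(self):
        """Start a new model run and return the first observation."""
        self.decisions_made = 0
        return self.decisions_made

    def step(self, action):
        """Apply an action, 'simulate' to the next decision point,
        then return the observation, the reward, and the terminal flag."""
        self.decisions_made += 1
        observation = self.decisions_made
        reward = 1.0 if action == observation % 2 else 0.0  # arbitrary rule
        done = self.decisions_made >= self.decisions_per_run
        return observation, reward, done


def run_episode(env, rng):
    """One episode: a series of timesteps from reset() to the terminal state."""
    env.reset()
    timesteps, total_reward, done = 0, 0.0, False
    while not done:
        action = rng.choice([0, 1])             # action
        _obs, reward, done = env.step(action)   # simulate -> observe -> reward
        timesteps += 1
        total_reward += reward
    return timesteps, total_reward


timesteps, total_reward = run_episode(ToyModelEnv(decisions_per_run=5), random.Random(0))
```

With `decisions_per_run=5`, the episode always contains five timesteps, regardless of the actions chosen.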

In the ML world it seems this doesn't have to be the case: you could define an episode as 11 hands of cards, where the terminal state is having played them all, since the target outcome is to win the majority. The same could be true when applied to simulations, in that you might define an episode as n replications and be interested in achieving a certain KPI in 95% of them. I'm not yet sure if FlexSim supports this.

The definition of epochs seems to vary a little but most talk about one pass through a training dataset.

Note: I'm new to ML/RL and trying to get up to speed, so may need correcting on this. You can probably search the internet and find all the definitions.


Message 10 of 13

a9080109
Observer

My new problem is that I used a warmup, but now there is a problem with my sink data.

1697610959985.png

1697611037195.png

I guess it's because I have a warmup, but my sink calculation still covers the complete run, and the lateness is needed for the calculation:

current.avgTardiness = (current.lateness/current.stats.input.value);
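
One way to reason about this is sketched below in Python rather than FlexScript. It assumes, which the thread does not confirm, that the warmup resets the sink's input count but not the accumulated lateness, so the two halves of the ratio cover different time windows and the count can be zero right after the reset. `TardinessTracker` is a hypothetical helper, not part of FlexSim.

```python
class TardinessTracker:
    """Keeps total lateness and item count over the same time window."""

    def __init__(self):
        self.lateness = 0.0
        self.count = 0

    def reset(self):
        """Call when the warmup ends so both values restart together."""
        self.lateness = 0.0
        self.count = 0

    def record(self, lateness):
        """Call once per item that enters the sink."""
        self.lateness += lateness
        self.count += 1

    def average(self):
        """Average tardiness, guarded against a zero item count."""
        if self.count == 0:
            return 0.0
        return self.lateness / self.count


tracker = TardinessTracker()
tracker.record(4.0)
tracker.record(6.0)
before_reset = tracker.average()
tracker.reset()
after_reset = tracker.average()
```

The same idea in the model would be to reset the lateness value at the moment the input statistic is reset, and to skip the division while the input count is still zero.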

0926.fsm

Message 11 of 13

kavika_faleumu
Autodesk
@mark zhen, where did you get this graph? Did you plot it from Python or from FlexSim?
Message 12 of 13

a9080109
Observer
@Kavika F @Jason Lightfoot so do you know how to solve these problems?
Message 13 of 13

Jeanette_Fullmer
Community Manager

Hi @mark zhen,

Were you able to solve your problem? If so, please add and accept an answer to let others know the solution. Or please respond to the previous comment so that we can continue to help you.

If we don't hear back in the next 3 business days, we'll assume you were able to solve your problem and we'll close this case in our tracker. You can always comment back at any time to reopen your question, or you can contact your local FlexSim distributor for phone or email help.
