Machine random action

a9080109
Participant
178 Views
19 Replies
Message 1 of 20

[ FlexSim 22.0.16 ]

What I want to do is make my machine pull items using a randomly chosen strategy. For example, I have four options (SPT, LPT, FIFO, LIFO); each time the machine finishes processing, it should randomly apply one of these four actions. onerandom action.fsm

Ultimately, I want to use reinforcement learning to find the optimal schedule, but I don't know how to set up my observation space and action space.

1692723650962.png

Accepted solutions (1)
Replies (19)
Message 2 of 20

natalie_white
Not applicable

Hi @mark zhen,

The tutorial on our site is similar and should be helpful.

Can you clarify your goal for this model? Are you saying that you can change methods every single time the processor pulls a flow item from the queue? This would be similar to our tutorial. Our tutorial, however, pulls an item based on type, and there is a clear pattern as to which type is best based on the observation of which type was last pulled. For reinforcement learning to work well, there needs to be a learnable pattern between the observation and the best action.

If you simply want to know which of the four sequencing methods is best for your model, perhaps using the experimenter would be better.

Message 3 of 20

a9080109
Participant

When the machine needs to pull the next item, it can apply one of four scheduling rules (such as SPT, LPT, and so on), and which rule is applied is random. In a paper I read, the reward function is set up so that the agent tries all four actions and learns which gives the best result (for example, the shortest completion time). What I want is for my agent's actions to be the scheduling rules I give it, and then, at each time step, for it to learn which rule would be best.

Message 4 of 20

a9080109
Participant

I have read and understood the tutorial, but what I want to do now is have my agent learn these traditional scheduling methods in order to explore new possibilities, or to integrate the four scheduling methods for a better scheduling result.

Message 5 of 20

natalie_white
Not applicable
Accepted solution

This application of reinforcement learning is probably not realistic. Let me explain:

First, how will your model learn which rule is best? It needs some metric for "best," so you'd need to determine what that means (likely you'll want to minimize time or maximize throughput) and design a rewards system that will promote your objective.

Additionally, "best" is going to depend on the current state of your model. What exactly is it, in your model, that determines which rule is optimal? You need to be able to identify what that is and have your model observe it. This is your main problem. I don't know if there is a clear answer to this question, and if you aren't able to answer this question, then you can't successfully use reinforcement learning for your model.

In the tutorial, the best action to take (which type of item to pull next) is directly tied to the observation (which type of item was last pulled). Reinforcement learning requires a connection between the observation and the best action to take.

Message 6 of 20

a9080109
Participant

I think I can use a completion rate. For example, if an order is completed within a certain time, the agent gets a +1 reward; if not, it gets a -1 reward. That is my reward function. Then, if all three orders finish within the time limit, I compare their total completion times and give a reward of 1 to the one with the smallest total.
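As a rough illustration only, the +1 / -1 idea above could be sketched in a FlexScript reward function. The label names FinishTime and DueDate are assumptions for illustration, not names from the attached model, and this assumes a reference to the finished item is available:

```
// Hedged sketch of the +1 / -1 reward described above.
// Assumes the finished item carries labels "FinishTime" and "DueDate" (hypothetical names).
double reward = -1;                     // default: order finished late
if (item.FinishTime <= item.DueDate)    // finished within the allowed time
    reward = 1;
return [reward, 0];                     // reward functions return [reward, done]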

Message 7 of 20

natalie_white
Not applicable
The problem is that you cannot know which "rule" optimizes your completion rate. Your completion rate is affected by these two things: the item's type, and the type of the previous item. That's the point of the example in the tutorial.

You can't know which rule is best at each time step. You CAN know which item type is best to pull, but you don't know which rule will have you pull that item. At various points in your model run, a certain rule will pull different items.

Message 8 of 20

a9080109
Participant

But the literature I have read mentions similar methods. For now, though, I want to finish the first step: how should I write the random actions?

https://www.sciencedirect.com/science/article/pii/S0921889000000877

Message 9 of 20

joerg_vogel_HsH
Mentor
The paper mentions breakdowns and priority jobs as random events. A schedule depends on intervals of known input and demanded output over time. Even with a static sequence of products, you can define a time window, whose width varies for each simulation setup; this window defines what your control mechanism knows when deciding the order of production steps in your model.
Message 10 of 20

a9080109
Participant

Sorry, can you explain that more clearly? And how would I implement the approach from that paper?

Message 11 of 20

a9080109
Participant

Regarding random execution, my current idea is roughly as shown in the attached model, but it doesn't work.

random-action.fsm

Message 12 of 20

moehlmann_fe
Participant
You cast a continuous value between 1 and 2 to an integer, which removes the decimal places. As a result, randomAction is always 1. Use the discrete uniform distribution instead.
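To illustrate the difference in a short FlexScript sketch (randomAction is the label name from the attached model):

```
// uniform(1, 2) returns a continuous value; casting it to int truncates
// the decimals, so the result is (essentially) always 1:
int alwaysOne = uniform(1, 2);

// duniform(1, 2) returns the integers 1 and 2 with equal probability:
int randomAction = duniform(1, 2);
```

With four rules you would draw duniform(1, 4) instead.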
Message 13 of 20

a9080109
Participant

@Felix Möhlmann

I have a new idea now: I write the methods I need as several numbered branches (such as 1 = SPT, 2 = LPT, and so on), so that my machine works like looking up a dictionary. It randomly draws a number and executes the content stored under that number.

Also, is that what you're talking about? 1692865706857.png
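That "dictionary lookup" idea could be sketched in FlexScript with a switch statement. The query strings and label names (ProcTime, ArrivalTime) below are assumptions for illustration, not taken from your model:

```
// Draw a rule index, then "look it up" like a dictionary entry.
int rule = duniform(1, 4);
string query = "";
switch (rule) {
    case 1: query = "ORDER BY ProcTime ASC";     break; // 1 = SPT
    case 2: query = "ORDER BY ProcTime DESC";    break; // 2 = LPT
    case 3: query = "ORDER BY ArrivalTime ASC";  break; // 3 = FIFO
    case 4: query = "ORDER BY ArrivalTime DESC"; break; // 4 = LIFO
}
```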

Message 14 of 20

moehlmann_fe
Participant
If you set the type of randomAction to a double, it will never be exactly equal to 1 or 2. Leave it as an integer, but generate a discrete random number (duniform(1, 2)).
Message 15 of 20

a9080109
Participant

Now I want to use a parameter to select a method. What should I do?

Message 16 of 20

moehlmann_fe
Participant

You already have code that chooses a logic depending on a numeric value. Instead of randomly generating that value, read it from a parameter.

https://docs.flexsim.com/en/21.1/ModelLogic/ModelParameters/ModelParameters.html
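A minimal FlexScript sketch, assuming you create a model parameter named "PullRule" (a hypothetical name) that holds the rule index:

```
// Read the rule index from a model parameter instead of generating it randomly.
int rule = Model.parameters["PullRule"].value;  // "PullRule" is a hypothetical parameter name
switch (rule) {
    case 1: /* apply SPT rule here */  break;
    case 2: /* apply LPT rule here */  break;
    case 3: /* apply FIFO rule here */ break;
    case 4: /* apply LIFO rule here */ break;
}
```

Defining the rule choice as a parameter also makes it usable later as the action in the Reinforcement Learning tool or as a variable in the Experimenter.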

Message 17 of 20

a9080109
Participant

I don't quite understand. random action.fsm

Message 18 of 20

moehlmann_fe
Participant

Define a parameter and use it to control how the processor pulls the next part.

1693048321240.png

Message 19 of 20

Jeanette_Fullmer
Community Manager

Hi @mark zhen , was Natalie White's answer helpful? If so, please click the "Accept" button at the bottom of their answer. Or if you still have questions, add a comment and we'll continue the conversation.

If we haven't heard back from you within 3 business days we'll auto-accept an answer, but you can always comment back to reopen your question.

Message 20 of 20

a9080109
Participant

@Natalie White @Felix Möhlmann @Kavika F

I think I'm almost done. Regarding the current state of my model:

I want to define the observation as the number of deferred orders, but I'm a bit confused about how to do that.

For the actions, I take six different actions.

For the reward, it may be to minimize tardiness, or to calculate the average of the overall tardiness (but I don't know how to calculate an average in FlexSim).

As for the labels, I have defined four labels on the source:

ArrivalTime - the arrival time of the goods
date - the delivery (due) time
total arrival - the total arrival time
mark - the order in which goods enter
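One way to compute an average tardiness, sketched in FlexScript. This assumes the item's "date" label holds the due date (as described above) and uses two hypothetical accumulator labels, "totalTardiness" and "numFinished", on the model; run this when each item finishes:

```
// On each item's completion, accumulate its tardiness, then divide by the count.
double tardiness = Math.max(0, Model.time - item.date);      // lateness; 0 if on time
Model.labels.assert("totalTardiness", 0).value += tardiness; // running sum
Model.labels.assert("numFinished", 0).value += 1;            // running count
double avgTardiness = Model.labels["totalTardiness"].value
                      / Model.labels["numFinished"].value;
```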

random-action_autosave.fsm
