I don't think I understand what you're trying to do. The process flow is set up so that items always block. Whenever an item arrives at the DP, the process flow creates a token, and that token stops the item. But previously, you described a system where items only block sometimes, rather than always.
Whether the item is stopped or not, the photo eye fires its OnBlock event, because the block time is set to zero. The process flow creates a token when this happens, and that token moves the item to the queue. This is incorrect; you should not use a Move Object activity to remove an item from a conveyor. To get an item off a conveyor, you need to send the item to an exit transfer. This is why the second item doesn't flow past the decision point.
There are a couple of smaller issues as well. Here is a fix for your code in the RL tool:
// == compares the two values; as a standalone statement it does nothing
Model.parameters["count"].value == getlabelnum(Model.find("Queue2"), "count");
// = assigns the value, which is what you want here
Model.parameters["count"].value = getlabelnum(Model.find("Queue2"), "count");
Also, the queue's OnExit trigger is incrementing a label on the item, when I think it should be incrementing the label on current.
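If you do keep the label approach, the OnExit trigger could look something like this. This is just a sketch; it assumes the label is named "count", as in your model:

```
// OnExit trigger on the queue: current is the queue, item is the exiting item.
// Increment the label on the queue itself, not on the item.
current.labels.assert("count", 0).value += 1;
```

The labels.assert() call creates the label with a default of 0 if it doesn't exist yet, so you don't need to pre-define it on the queue.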
Okay, so all that being said, I made this model:
testrl_1.fsm
It creates a token at the Decision Point. That token flows through a Decide, which randomly chooses whether to stop the item or not.
If the Decide stops the item, then after 3 seconds (the block time on the Photo Eye), the model creates a token for the On Block event of the Photo Eye. That token resumes the item and sends it to an output queue, based on the Destination parameter.
The RL tool listens to the On Block of the photo eye, and creates an observation. That observation gets the content of the three buffer queues. Note that you don't need to increment/decrement a label; you can just get the object statistics. The RL tool also randomly picks a value for the Destination parameter, so the stopped items go to one of the three queues.
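For example, an observation value in the RL tool can read a queue's content directly instead of tracking a label. A sketch, assuming the queue is named Queue2 as in your model:

```
// Read the current content of the queue directly
Object queue = Model.find("Queue2");
return queue.subnodes.length; // number of items currently in the queue
```

Because this reads the object's state at observation time, there's no label to keep in sync with entries and exits.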
But this is unfinished. The actions taken by the RL tool don't currently affect the reward in any way. It's up to you to figure out what decision you want the AI to make, and when you want the AI to make that decision.