Think about how important it actually is to keep a distance between the transporters. Any operation that involves a human in reality will never be reproduced with 100% accuracy in a simulation. If the minimum distance is relatively small, the error introduced by not implementing it is not going to be large. Compare this to the simplifications you already made by using the travel network (a real forklift doesn't change its travel direction by 90° in an instant without slowing down).
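To get a feeling for the size of that error, a rough back-of-the-envelope calculation can help. The Python sketch below is only an illustration: it assumes that each time one forklift follows another, keeping the gap costs at most the time needed to drive the gap distance. The function name and all numbers are hypothetical placeholders, not values from your model.

```python
def spacing_delay_share(min_gap_m: float, speed_m_per_s: float,
                        encounters_per_hour: float) -> float:
    """Rough share of one hour lost to keeping the minimum gap.

    Assumption: every encounter delays the following forklift by at most
    the time it takes to drive the gap distance (min_gap / speed).
    """
    delay_per_encounter_s = min_gap_m / speed_m_per_s
    return delay_per_encounter_s * encounters_per_hour / 3600.0


# Hypothetical inputs: 2 m gap, 2 m/s travel speed, 30 encounters per hour.
share = spacing_delay_share(2.0, 2.0, 30)
print(f"{share:.2%}")  # prints 0.83%
```

If the result is well below the uncertainty of your other inputs (pick times, human variability), leaving the rule out is probably defensible.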
If afterwards you still want to integrate the distance rule, define exactly what the behaviour should be. For example, you said that the second forklift wouldn't wait at the entrance of the aisle. This only makes sense if there is enough space in the aisle that the forklifts can pass each other or if one-way traffic is enforced.
Once you have a clear picture of the intended behaviour, you can think about how custom logic that enforces it could work. It would probably involve sending the forklifts to an exact network node instead of relying on the offset travel logic to move them close to the pickup location. This would enable you to log the destination of each forklift and decide whether, and how far, another forklift can be allowed to enter the same aisle.
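The idea of logging destinations and deciding how far a following forklift may drive can be sketched independently of the simulation software. The Python example below is a minimal illustration, not the software's own API: `AisleLog`, `enter`, `leave` and `min_gap` are hypothetical names, positions are metres measured from the aisle entrance, and it assumes one-way traffic with no overtaking. How forklifts leave the aisle is not modelled beyond removing their log entry.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AisleLog:
    """Logs where each forklift in one aisle is headed (metres from entrance)."""
    min_gap: float
    destinations: Dict[str, float] = field(default_factory=dict)

    def allowed_position(self, target: float) -> Optional[float]:
        """Return how far a new forklift may drive, or None if it must wait outside."""
        if not self.destinations:
            return target
        # The logged destination closest to the entrance limits everyone behind it.
        nearest = min(self.destinations.values())
        limit = nearest - self.min_gap
        if limit < 0:
            return None  # not even the entrance respects the minimum gap
        return min(target, limit)

    def enter(self, forklift: str, target: float) -> Optional[float]:
        """Log the forklift's (possibly shortened) destination if it may enter."""
        position = self.allowed_position(target)
        if position is not None:
            self.destinations[forklift] = position
        return position

    def leave(self, forklift: str) -> None:
        """Remove the forklift's entry once it has left the aisle."""
        self.destinations.pop(forklift, None)


aisle = AisleLog(min_gap=3.0)
print(aisle.enter("A", 10.0))  # 10.0: aisle is empty, drive to the target
print(aisle.enter("B", 12.0))  # 7.0: must stop 3 m before A's destination
print(aisle.enter("C", 2.0))   # 2.0: target is already clear of B
print(aisle.enter("D", 5.0))   # None: C is too close to the entrance
```

In a real model the returned position would have to be mapped to the nearest network node, and a forklift that was stopped short would need to be released again once the one ahead moves on.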
Or you might come to the conclusion that a different travel method is actually preferable over the network nodes. For example, if the traffic through the aisles is one-way only, an AGV network will make controlling the distance easier.
Then estimate the amount of time it might take to implement this logic. Again, compare the amount of necessary work to the projected gain. If you deem it 'worth it', start building the logic into your model.