Performance enhancement for DelayNewJobExecutionInSeconds


Because of the new replication behavior between publisher and subscriber, Autodesk added the setting DelayNewJobExecutionInSeconds, which delays a job for a defined number of seconds to make sure the subscriber already has the latest data from the publisher.

 

Current Situation:

Each job is delayed. With a larger number of jobs, the total time until all jobs are done grows considerably, which means a longer wait for the business.

 

Change request:

The setting DelayNewJobExecutionInSeconds should only be applied if the time difference between the job's submitted date and the job processor's start date is less than the defined delay.

 

In other words:

If the queue is empty and a new job is created and starts immediately, the delay is applied.

If the queue already holds 10 jobs and a new one is created, it runs immediately once the 10 earlier jobs are done, with no extra delay (see the sketch below).
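As a rough illustration of the requested behavior, here is a minimal Python sketch. The names DELAY_SECONDS and remaining_delay are illustrative placeholders, not the actual Vault job processor implementation:

```python
from datetime import datetime

# Hypothetical sketch of the requested behavior; these names are
# illustrative, not the real Vault job processor internals.
DELAY_SECONDS = 10  # value of DelayNewJobExecutionInSeconds

def remaining_delay(submitted_at: datetime, now: datetime) -> float:
    """Apply only the part of the delay that has not already elapsed
    while the job waited in the queue."""
    waited = (now - submitted_at).total_seconds()
    return max(0.0, DELAY_SECONDS - waited)
```

A job picked up immediately after submission waits the full delay; a job that already sat in the queue longer than the delay starts at once.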

 

I believe this change request would be a big benefit for everyone.

Thank you

11 Comments
gilsdorf_e
Collaborator

Since the migration from merge replication to transactional replication, job server maintenance has become a full-time job.

Although we have set the Delay property to 10 seconds, which works well most of the time, it still happens several times a day that jobs are replicated faster than their corresponding files.

That results in all kinds of "File not existent" and "File could not be read" errors, and you have to re-submit the job to make it work.

Our dear users usually do not care, and the SAP transfer is triggered by releasing Vault items. So once the job finally succeeds, you have to manually take the item through a Quick-Change, update it, and release it again. This is cumbersome.

As far as I understand, jobs reside in the Knowledge Vault Master, while the files are maintained in the Vault database.

Is it possible to work on that mechanism to make sure jobs are only started if the referenced file is ready?

I cannot set the Delay to even higher values; it already blocks our job processors at rush hour, because every job takes 10 seconds longer than before.

 

 

ihayesjr
Community Manager
Status changed to: Archived

Thank you for posting the Idea. We recently implemented an Automatic Job Retry feature which should address it. If not, we can reopen the idea.

thomas_jamnik
Contributor

Hello @ihayesjr 

 

The automatic job retry is a great idea, but it also has a downside: if a job fails, the automatic retry restarts it, but at the end of the queue. Our queue is very often long, so a job that failed only because of replication lag and missing data will be executed very late, namely once the job queue is empty.
So from a business point of view, the currently provided solutions have two downsides:

 

  1. Every job is delayed by the full delay time. (A solution could be, as Erhard mentioned: the setting DelayNewJobExecutionInSeconds should only be applied if the time difference between the job's submitted date and the job processor's start date is less than the defined delay.)
  2. The automatic job retry is a good feature but does not really help us: on the one hand it is meant to recover jobs that failed because of the replication delay, but on the other hand it makes no sense to retry jobs that need to be repaired manually.

The suggested solution in point 1 would really help us execute jobs much faster and avoid unnecessary waiting time for our business. We discuss this topic on a regular basis.

 

BR, Thomas

 

ihayesjr
Community Manager

@thomas_jamnik 

Thank you for the feedback.

However, there could still be some delay in the files being replicated to the location they need to be in. So even with a delay for new jobs, a job could still fail. You would therefore be in a race between choosing the correct delay time and replication.

thomas_jamnik
Contributor

Hello @ihayesjr 

We don't want a delay for every job, only for newly created ones. The delay should not be applied if the job's start time is later than the job's submitted time plus the delay.

 

Just for example:

Delay time: 10 seconds

10 jobs, each needing 20 seconds, all submitted at the same time.

 

So that means:

- The first job starts with a delay of 10 seconds, so its total time is 30 seconds.

- The following 9 jobs are already outside the delay window and can be executed immediately; the total time of each is 20 seconds.

 

The full execution time for all jobs would be 210 seconds instead of 300 seconds with the current solution. Considering that we have roughly 1,000 jobs and more every day, this makes a huge difference in performance.
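Spelling out that arithmetic, under the stated assumptions of one job processor, a fixed 20-second job duration, and simultaneous submission:

```python
DELAY = 10         # DelayNewJobExecutionInSeconds
JOB_DURATION = 20  # seconds per job
JOBS = 10          # all submitted at t = 0, one job processor

# Current behavior: every job waits out the full delay.
current = JOBS * (DELAY + JOB_DURATION)                        # 300 s

# Proposed behavior: only the first job is still inside the delay
# window when the processor picks it up; the other nine have already
# waited longer than the delay and start immediately.
proposed = (DELAY + JOB_DURATION) + (JOBS - 1) * JOB_DURATION  # 210 s

print(current, proposed)  # 300 210
```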

 

BR, Thomas

 

ihayesjr
Community Manager

@thomas_jamnik 

What if we only delayed the retries, but put the retries back at the top of the queue? Then, if a new job can be executed immediately, there is no delay; if a job fails, we put it at the top of the queue with a ## second delay.
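One way to read that proposal, sketched under the assumption of a simple FIFO queue (none of these names come from the actual product):

```python
import collections
import time

class Job:
    """Stand-in for a queued job; real job records hold more state."""
    not_before = 0.0  # earliest allowed start, as a Unix timestamp

queue = collections.deque()

def submit(job: Job) -> None:
    # New jobs carry no delay and run as soon as a processor is free.
    queue.append(job)

def retry(job: Job, delay_seconds: float) -> None:
    # A failed job goes back to the *front* of the queue, but with an
    # earliest-start time so the processor waits out the delay once.
    job.not_before = time.time() + delay_seconds
    queue.appendleft(job)

def next_job():
    # Simplification: while the head job is inside its delay window,
    # the processor idles rather than skipping ahead in the queue.
    if queue and time.time() >= queue[0].not_before:
        return queue.popleft()
    return None
```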

thomas_jamnik
Contributor

@ihayesjr 

Thank you for the suggestion, but I am not sure it meets our requirements, because it means we will always have failed and retried jobs with a delay. So I come back to my original suggestion and wish.

 

BR, Thomas

ihayesjr
Community Manager

@thomas_jamnik 

You are assuming you will find a delay time that always works. If you have to increase the delay to 30 seconds, jobs that may not need to wait get delayed, and jobs are no longer executed as fast as they could be.

The automatic retry will not show as a failure; it will show the job waiting to be processed.

gilsdorf_e
Collaborator

Just to come back to my original post:

Could you think of solving all replication-caused problems with jobs by comparing the timestamp of the last replication cycle of the Vault database to the job creation timestamp?

If the Vault database has not been replicated since the job was created in the KVM database, then the job should be postponed.
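A minimal sketch of that check, assuming the last replication timestamp of the Vault database were available to the job processor (the names here are hypothetical):

```python
from datetime import datetime

def job_is_ready(job_created_at: datetime,
                 vault_db_replicated_at: datetime) -> bool:
    """Postpone the job until the Vault database has completed at
    least one replication cycle after the job was created in the KVM."""
    return vault_db_replicated_at >= job_created_at
```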

 

PS: I support the idea of auto-retry for another reason: sometimes we have failing jobs that miraculously succeed once they are restarted (not related to replication).

ihayesjr
Community Manager

If we made that comparison, we would be assuming that the job data was in the replicated packet of information. Replication sends more than just jobs in the queue; the job data could still be waiting for the next set of replicated data.

ihayesjr
Community Manager
Status changed to: Future Consideration

I am reopening the idea based on the recent conversation.

Thanks a lot for the feedback. 
