Community
Arnold GPU Forum
General discussions about GPU rendering with Arnold.

Arnold 7.0.0.3 crash with gpu rendering

Message 1 of 2
Fuzzle_Snuz
1797 Views, 1 Reply


Hi,

I've been experiencing a strange Arnold crash recently. The log gives only one error message:

[gpu] an error happened during rendering. OptiX error is: Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuEventSynchronize( m_event ) returned (702): Launch timeout, file: <internal>, line: 0)

It may be related to, or the same as, what was reported here: https://answers.arnoldrenderer.com/questions/34635/arnold-6211-gpu-error-maya-2020-and-maya-2022.htm.... I found that thread while searching for the issue I'm experiencing, but in it Stephen tells us this error message is just a generic crash message, so I'm creating this new thread instead of adding to that one.


Anyways, I have a scene which reliably crashes with this error message when rendering. I am using MtoA 5.0.0.4 with Maya 2020.


What goes wrong

Partway through rendering, the graphics driver (472.12) abruptly crashes, so my screen goes black for a few seconds until the driver recovers. Once I can see my desktop again, I see that Maya has also crashed. The log file indicates Arnold crashed, per the aforementioned error message. It is not entirely clear if the graphics driver induced the Arnold crash or vice versa.


Controlling the crash

My scene includes some dense, layered fluid meshes with a translucent aiStandardSurface material. The crash is very reliably controlled by the material parameters of these meshes and the render resolution.
- When the fluid material uses the default aiStandardSurface parameters, the scene renders successfully every time.
- When the material has transmission and subsurface enabled, the render crashes at higher resolutions (>= 720), but succeeds at lower resolutions (<= 540).

I have attached two logs, one from a successful render attempt (aiStandardSurface defaults) and one from a crashed render attempt (intended material params). Some file paths and node names are redacted with 0000. I'm afraid I cannot share the scene file itself. I have also attached presets for the two aforementioned materials.


Workaround

I can get a successful render at higher resolutions by using the Arnold RenderView's crop tool to cut up the camera frame into a few vertical strips, then stitching them all together manually. Each of these crop chunks reveals only a portion of the fluid meshes and does not crash while rendering. It works, but this kind of manual work is unviable for animations.

I have attached a third log, from one of these successful cropped renders.
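
A minimal sketch of how the strip split described above could be computed, so the crop bounds do not have to be dragged out by hand for every frame. The function name `vertical_strips` and the `(min_x, min_y, max_x, max_y)` tuple layout are hypothetical choices for illustration. Feeding these bounds to the renderer (for example through a crop or region setting) is assumed and not shown here.

```python
from typing import List, Tuple

# (min_x, min_y, max_x, max_y) in pixels, inclusive on both ends.
# This layout is an assumption made for the sketch.
Region = Tuple[int, int, int, int]


def vertical_strips(width: int, height: int, count: int) -> List[Region]:
    """Split a width x height frame into `count` side-by-side vertical strips.

    The strips cover every pixel column exactly once. When the width does not
    divide evenly, the leftmost strips are one pixel wider.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if not 1 <= count <= width:
        raise ValueError("count must be between 1 and width")

    base, extra = divmod(width, count)
    regions: List[Region] = []
    x = 0
    for i in range(count):
        strip_width = base + (1 if i < extra else 0)
        regions.append((x, 0, x + strip_width - 1, height - 1))
        x += strip_width
    return regions


if __name__ == "__main__":
    for region in vertical_strips(1280, 720, 4):
        print(region)
```

Each strip would then be rendered on its own and the resulting images pasted side by side at their `min_x` offsets.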


Other notes

- The fluid meshes have 1 level of catmull-clark subdivision enabled on the Arnold side (not with Maya).
- The crash occurs both with and without the OptiX denoiser imager.
- The crash occurs regardless of whether I render in the Arnold RenderView, the Maya Render window, or with Maya's batch render function.
- The crash occurs regardless of the bucket size and pattern. I have tested combinations of size 16, 64, 256, and patterns spiral, top, and random.
- The crash occurs all the same when Autodetect Threads is enabled, when Threads is manually set to 1, and when Threads is manually set to -1.
- The crash occurs both with and without progressive refinement enabled.
- When progressive refinement is on, negative AA samples render without incident and are displayed in the Arnold RenderView. Once it starts on the positive AA samples, the RenderView stops updating and it eventually crashes.


Final Thoughts

The crash's dependency on the more complex material and render resolution is interesting. The logs do not indicate a lack of memory. The language of the generic error message, with phrases like "CuEventSynchronize" and "Launch timeout", is also interesting. Perhaps, as the ray calculations become more complex and numerous, Arnold blocks some important part of the graphics driver for too long and the driver crashes as a result.

I would imagine that bucket rendering exists in part to mitigate this, by partitioning and staggering the work to be more responsive. However, as already noted, changing the bucket size and threads options does not help. I should also note that with these complex gpu renders, the buckets in the RenderView don't update smoothly, one at a time, like with cpu rendering. Instead, the majority of the camera frame updates all at once, as if 90% of the buckets all came back simultaneously. Then, the RenderView hangs until the next batch comes in. This hang interval is always excessively long immediately before the crash occurs.

As noted in the Workaround section, cropped rendering with the RenderView seems to behave more like how I would expect buckets to work. If there is no diagnosis or fix for this crash, then it would be nice to have some kind of "superbucket" feature, where Arnold uses its cropped rendering ability to render the camera frame in chunks, thus only working on a few buckets at a time.

I am no Arnold developer, so please excuse me. I feel there is merit, however, to the theory that Arnold is overwhelming the gpu with too much work. Maybe the crash I am experiencing would not happen if the cuda side didn't keep its head in the sand until every bucket at the current AA level finished, and instead responded to "CuEventSynchronize" regularly.

Message 2 of 2
thiago.ize
in reply to: Fuzzle_Snuz

The error message states that this is a "Launch timeout". That means that your GPU took more than a certain amount of time to compute the pass and Windows thought that your GPU had hung and so it restarted the GPU. That's why the whole screen went black. It's also why simplifying your scene into strips lets it pass because now it takes less time to render.
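
For batch or animation renders it can help to tell this timeout apart from other GPU failures automatically. The following is a rough sketch that pulls the CUDA error code and text out of a log line shaped like the one quoted in the first post. The regular expression and the function names are assumptions based only on that single line, so other Arnold or OptiX versions may format the message differently.

```python
import re
from typing import Optional, Tuple

# Matches the "returned (702): Launch timeout" part of the quoted log line.
# The pattern is inferred from one example and is not an official format.
_CUDA_ERROR = re.compile(r"returned \((\d+)\):\s*([^,]+)")

# Error code reported alongside "Launch timeout" in the quoted log line.
LAUNCH_TIMEOUT_CODE = 702


def parse_cuda_error(log_line: str) -> Optional[Tuple[int, str]]:
    """Return (code, message) if the line carries a CUDA error, else None."""
    match = _CUDA_ERROR.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def is_launch_timeout(log_line: str) -> bool:
    """True when the line reports the launch timeout discussed in this thread."""
    parsed = parse_cuda_error(log_line)
    return parsed is not None and parsed[0] == LAUNCH_TIMEOUT_CODE
```

A render wrapper could run `is_launch_timeout` over each log line and, on a hit, flag the frame as a watchdog timeout rather than a scene or memory problem.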

Unfortunately, neither Arnold nor your GPU had actually hung, and if this timeout value were increased or completely removed you probably would not have had any issues. https://computergarage.org/video-tdr-failure.html (it was the first Google hit for me) has suggestions for how to fix this. I'm not on a Windows machine at the moment, so I can't confirm these steps. My suggestion is to first try just raising the TDR delay from the default 2s to 10s or maybe even 30s. If that isn't enough, you could completely disable it. I'll copy the relevant parts below:

 

How Do I Turn off TDR in Windows 10?

The TDR process should prevent the Video TDR Failure blue screen, but on rare occasions it can itself cause the crashes.

To turn off TDR in Windows 10, follow the steps below

  • Click start and type in regedit and hit enter
  • Browse to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
  • Double click on TdrDelay
  • Change the option from 2 to 10
  • Browse to HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
  • Double click on TdrLevel
  • Change the option from 1 to 0
  • Restart your machine

If you have an Nvidia or AMD display adapter there are a few more steps you need to perform

Nvidia

To disable TDR in Nvidia you need to do the following

  • Right-click the Nsight Monitor icon in the system tray
  • Click Options
  • Click general tab
  • Under WDDM TDR enabled, select False
  • Click OK
  • Restart your machine

How Can I Increase TDR?

You can increase the amount of time TDR waits before resetting the display driver when it appears hung, which could stop the video TDR failure error.

To increase the TDR follow these steps

  • Click start and type in regedit and hit enter
  • Browse to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
  • Double click on TdrDelay
  • Change the option from 2 to 8
  • Restart your machine
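
As an alternative to clicking through regedit, the same registry values can be written to a .reg file and imported. This is an untested sketch: the helper name `build_tdr_reg` is made up for illustration, and it only generates the file text, using the key and value names from the steps above. Importing the file needs administrator rights and, as above, a restart afterwards.

```python
from typing import Optional

# Key path taken from the steps quoted above.
GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)


def build_tdr_reg(tdr_delay_seconds: int, tdr_level: Optional[int] = None) -> str:
    """Build the text of a .reg file that sets TdrDelay (and optionally TdrLevel).

    tdr_delay_seconds: timeout in seconds, e.g. 10 or 30 as suggested above.
    tdr_level: pass 0 to turn TDR off as in the steps above; None leaves it alone.
    """
    if tdr_delay_seconds < 0:
        raise ValueError("tdr_delay_seconds must not be negative")
    if tdr_level is not None and tdr_level < 0:
        raise ValueError("tdr_level must not be negative")

    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        # DWORD values in .reg files are written as 8 hex digits.
        f'"TdrDelay"=dword:{tdr_delay_seconds:08x}',
    ]
    if tdr_level is not None:
        lines.append(f'"TdrLevel"=dword:{tdr_level:08x}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Raise the delay to 10 seconds and leave TDR itself enabled.
    with open("tdr_delay_10s.reg", "w") as handle:
        handle.write(build_tdr_reg(10))
```

Raising only the delay keeps the watchdog in place for real hangs, which is why `tdr_level` is left out unless explicitly requested.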
