Hi,
I've been doing some RND to evaluate Arnold GPU on an RTX4090, however rendering locally in Houdini 20 within the viewport/hydra... I'm getting a massive slow-down with Arnold taking over 1min to "updating gpu scene data" on every single progressive IPR pass. Same result on a RTX A4000.
As far as I can see: no system RAM, VRAM is getting loaded during this time.
Network and CPU/GPU usage is very low. No rendering is happening.
In this screen-grab it took close to 5mins just to finish the first 2 progressive passes, while most of that time is spent updating scene data without any rendering happening.
From some additional testing, I noticed that when I remove some geo grass blades (lowpoly) that are instanced to points... Arnold GPU drastically speeds up.
Example setup and poly count of grass.
This doesn't happen with Karma XPU, it's lightning quick as well as Redshift.
Unfortunately, I'm just using the default "Viewport Settings" and Arnold doesn't log to the Houdini Output Console. So I can't provide a quick log from that. But if I do get a log from a progressive render in the near future, I'll pass that along.
amwilkins
--
Houdini Core Version 20.0.590
Arnold Core: 7.2.5.2
HotA: 6.2.5.2
CPU: AMD Ryzen TR 3990x
RAM: 128GB
GPU: NVIDIA RTX 4090
OS: Windows
@am_wilkins wrote:
Unfortunately, I'm just using the default "Viewport Settings" and Arnold doesn't log to the Houdini Output Console. So I can't provide a quick log from that. But if I do get a log from a progressive render in the near future, I'll pass that along.
amwilkins
You can set the HDARNOLD_log_verbosity environment variable to get logs when you use the Viewport Settings as the Render Settings prim.
set HDARNOLD_log_verbosity=5
or
{"HDARNOLD_log_verbosity": [
{
"value": "4",
"method": "append"
}
]
},
5 would be Debug-level verbosity
Hi Stephen,
I added that variable here in Houdini:
While it appears to "work", I'm still not seeing the usual arnold logs I'm expecting to see:
###################################################################
# HDARNOLD_log_verbosity is overridden to '4'. Default is '2'. #
###################################################################
WARNING | ignoring parameter far_clip on /_UsdImaging_HdArnoldRendererPlugin_000000544D4973C0/camera (found bad float: -inf)
00:00:04 23478MB WARNING | [disp] /path/wall_geo_2/wall_geo_2Shape: padding is at least 10x smaller than it should be! given disp_padding: 0, recommended: 0.014384985
00:00:06 27254MB WARNING | [disp] /path/wall_geo_3/wall_geo_3Shape: padding is at least 10x smaller than it should be! given disp_padding: 0, recommended: 0.0132106543
00:00:11 29427MB WARNING | [disp] /path/wall_geo_1/wall_geo_1Shape: padding is at least 10x smaller than it should be! given disp_padding: 0, recommended: 0.013476491
ie. no arnold render report, scene creation times, texture infos, % completion, etc.
thanks,
amwilkins
Hi again,
I think I Houdini restart got it working. 👍🏻
Here is some logging (from just the last progressive render pass) that shows time spent on GPU::update_scene
00:05:00 18592MB | -----------------------------------------------------------------------------------
00:05:00 18592MB | scene creation time 0:01.26 machine utilization (35.44%)
00:05:00 18592MB | unaccounted 0:01.26
00:05:00 18592MB | -----------------------------------------------------------------------------------
00:05:00 18592MB | frame time 4:59.47 machine utilization (0.29%)
00:05:00 18592MB | node init 0:00.69
00:05:00 18592MB | gpu update 0:43.41
00:05:00 18592MB | rendering 0:05.93
00:05:00 18592MB | output driver 0:00.12
00:05:00 18592MB | pixel rendering 0:05.81 (23 sample iterations @ min: 0:00.012, avg: 0:00.255, max: 0:00.898)
00:05:00 18592MB | unaccounted 4:09.43
00:05:00 18592MB | -----------------------------------------------------------------------------------
00:05:00 18592MB | top session self-times by category
00:05:00 18592MB | AiMsgDebug 0:35.80 (81.28%)
00:05:00 18592MB | GPU::update_scene 0:01.87 ( 4.25%)
00:05:00 18592MB | thread blocked 0:01.77 ( 4.02%)
00:05:00 18592MB | UpdateNodes 0:00.87 ( 1.98%)
00:05:00 18592MB | node_update 0:00.84 ( 1.92%)
00:05:00 18592MB | subdivision 0:00.76 ( 1.74%)
00:05:00 18592MB | -----------------------------------------------------------------------------------
00:05:00 18592MB | top session self-times by node
00:05:00 18594MB | GPU::update_scene 0:01.87 ( 4.25%)
00:05:00 18594MB | UpdateNodes 0:00.87 ( 1.98%)
00:05:00 18594MB | InitializeNodes 0:00.69 ( 1.57%)
00:05:00 18594MB | PreInitializeProcs 0:00.36 ( 0.83%)
00:05:00 18594MB | polymesh:/Set/Name/Scatters/instancer_background_scatters.proto1_bush_geo_1Shape_id2 0:00.34 ( 0.78%)
00:05:00 18594MB | thread blocked 0:00.31 ( 0.71%)
00:05:00 18594MB | polymesh:/Set/Name/Scatters/instancer_background_scatters.proto3_bush_geo_1Shape_id2 0:00.34 ( 0.77%)
00:05:00 18594MB | thread blocked 0:00.30 ( 0.70%)
00:05:00 18594MB | -----------------------------------------------------------------------------------
amwilkins
Updated Arnold and Houdini to:
Houdini Core Version 20.0.653
Arnold Core: 7.3.1.0
HotA: 6.3.1.0
This log shows a "gpu jit compile" taking up some time, but generally IPR rendering is still incredibly slow.
Around 2mins per IPR pass to "update gpu scene data".
00:05:11 15299MB | -----------------------------------------------------------------------------------
00:05:11 15299MB | frame time 5:01.85 machine utilization (0.22%)
00:05:11 15299MB | license checkout time 0:00.03
00:05:11 15299MB | node init 0:00.63
00:05:11 15299MB | gpu update 0:43.94
00:05:11 15299MB | rendering 0:00.86
00:05:11 15299MB | output driver 0:00.07
00:05:11 15299MB | pixel rendering 0:00.79 (8 sample iterations @ min: 0:00.031, avg: 0:00.104, max: 0:00.324)
00:05:11 15299MB | unaccounted 4:16.37
00:05:11 15299MB | -----------------------------------------------------------------------------------
00:05:11 15299MB | top session self-times by category
00:05:11 15299MB | AiMsgDebug 1:04.55 (85.95%)
00:05:11 15299MB | thread blocked 0:03.10 ( 4.14%)
00:05:11 15299MB | GPU::update_scene 0:01.52 ( 2.04%)
00:05:11 15299MB | gpu jit compile 0:01.10 ( 1.47%)
00:05:11 15299MB | subdivision 0:00.67 ( 0.90%)
00:05:11 15299MB | node_update 0:00.67 ( 0.90%)
00:05:11 15299MB | -----------------------------------------------------------------------------------
00:05:11 15299MB | top session self-times by node
00:05:11 15299MB | GPU::update_scene 0:01.52 ( 2.04%)
00:05:11 15299MB | gpu jit compile 0:01.10 ( 1.47%)
00:05:11 15299MB | UpdateNodes 0:00.65 ( 0.87%)
00:05:11 15299MB | InitializeNodes 0:00.53 ( 0.71%)
00:05:11 15299MB | accumulateBucketSamples 0:00.42 ( 0.57%)
00:05:11 15299MB | polymesh:/Set/Name/Scatters/instancer_background_scatters.proto1_bush_geo_1Shape_id2 0:00.40 ( 0.53%)
00:05:11 15299MB | -----------------------------------------------------------------------------------
Here we get another log from the last IPR pass, showing a AOV driver taking some time?
00:08:02 15535MB | -----------------------------------------------------------------------------------
00:08:02 15535MB | top session self-times by category
00:08:02 15510MB | AiMsgDebug 1:04.55 (66.56%)
00:08:02 15510MB | driver_process_bucket 0:09.07 ( 9.36%)
00:08:02 15510MB | /HdArnoldRenderDelegate_00000045CC2DC100/HdArnoldRenderPass_aov_driver_1 0:06.20 ( 6.40%)
00:08:02 15510MB | accumulateBucketSamples 0:05.42 ( 5.59%)
00:08:02 15510MB | processGPUSample 0:04.79 ( 4.95%)
00:08:02 15510MB | thread blocked 0:03.55 ( 3.66%)
00:08:02 15510MB | GPU::update_scene 0:01.87 ( 1.94%)
00:08:02 15510MB | -----------------------------------------------------------------------------------
00:08:02 15510MB | top session self-times by node
00:08:02 15510MB | HdArnoldDriverAOV:/HdArnoldRenderDelegate_00000045CC2DC100/HdArnoldRenderPass_aov_driver_1 (driver_process_bucket) 0:06.20 ( 6.40%)
00:08:02 15510MB | accumulateBucketSamples 0:05.42 ( 5.59%)
00:08:02 15510MB | processGPUSample 0:04.79 ( 4.95%)
00:08:02 15510MB | GPU::update_scene 0:01.87 ( 1.94%)
00:08:02 15510MB | gpu jit compile 0:01.10 ( 1.14%)
00:08:02 15510MB | launch 0:00.94 ( 0.98%)
00:08:02 15510MB | -----------------------------------------------------------------------------------
Other logging, nothing looks wrong:
00:08:02 15510MB | -----------------------------------------------------------------------------------
00:08:02 15510MB | peak GPU memory consumed 9665.51MB
00:08:02 15510MB | CUDA context 159.97MB
00:08:02 15510MB | OptiX context 24.00MB
00:08:02 15510MB | framebuffers 187.02MB
00:08:02 15510MB | node overhead 925.13MB
00:08:02 15510MB | geometry 1998.81MB
00:08:02 15510MB | subdivs 1998.81MB
00:08:02 15510MB | accel structs 3067.58MB
00:08:02 15510MB | skydome importance map 7.80MB
00:08:02 15510MB | texture cache 322.06MB
00:08:02 15510MB | unaccounted 6035.74MB
00:08:02 15510MB | -----------------------------------------------------------------------------------
00:08:02 15510MB | ray counts ( /pixel, /sample) (% total) (avg. hits) (max hits)
00:08:02 15510MB | camera 273470933 ( 0.87, 1.00) ( 12.87%) ( 1.00) ( 1)
00:08:02 15510MB | shadow 1552431299 ( 4.94, 5.68) ( 73.07%) ( 0.70) ( 1)
00:08:02 15510MB | diffuse_reflect 230548798 ( 0.73, 0.84) ( 10.85%) ( 0.75) ( 1)
00:08:02 15510MB | specular_reflect 59092723 ( 0.19, 0.22) ( 2.78%) ( 0.88) ( 1)
00:08:02 15510MB | bssrdf 9173208 ( 0.03, 0.03) ( 0.43%) ( 0.23) ( 1)
00:08:02 15510MB | total 2124716961 ( 6.76, 7.77) (100.00%) ( 0.74) ( 1)
00:08:02 15510MB | by ray depth: 0 1 2 3
00:08:02 15510MB | total 59.2% 36.8% 4.0% 0.0%
00:08:02 15510MB | -----------------------------------------------------------------------------------
00:08:02 15510MB | geometry (% hit ) (instances) ( init mem, final mem)
00:08:02 15510MB | lists 1 ( 0.0%) ( 0) ( 0.00, 0.00)
00:08:02 15510MB | procs 27 ( 0.0%) ( 0) ( 2.17, 2.17)
00:08:02 15510MB | subdivs 1627 (100.0%) ( 284291) ( 569.08, 2682.32)
00:08:02 15510MB | -----------------------------------------------------------------------------------
00:08:02 15510MB | geometric elements ( min) ( avg.) ( max)
00:08:02 15510MB | objects (procs) 284291 ( 9) ( 10529.3) ( 17924)
00:08:02 15510MB | subdiv patches 14230441 ( 40) ( 8746.4) ( 608844)
00:08:02 15510MB | -----------------------------------------------------------------------------------
00:08:02 15510MB | triangle tessellation ( min) ( avg.) ( max) (/ element) (% total)
00:08:02 15510MB | subdivs 104298289 ( 148) ( 64104.7) ( 4730272) ( 7.33) (100.00%)
00:08:02 15510MB | iterations 1 66263776 ( 320) ( 2007993.2) ( 4730272) ( 7.77) ( 63.53%)
00:08:02 15510MB | iterations 2 33472 ( 1728) ( 2789.3) ( 5792) ( 31.82) ( 0.03%)
00:08:02 15510MB | adaptive 38001041 ( 148) ( 24020.9) ( 2862756) ( 6.67) ( 36.43%)
00:08:02 15510MB | unique triangles 104298289
00:08:02 15510MB | visible triangles 327592289
00:08:02 15510MB | CPU memory use 2682.32MB
Is it just GPU or is it CPU too?
What are you doing to force an IPR update? Just moving the camera?
CPU is totally fine. Doesn't really pause or wait on any scene/node updates.
No "Update, etc." in the logs at all.
This issue occurs just starting a new viewport/hydra render on Arnold GPU.
And yes, if it does eventually manage to finish then moving the camera or do anything that causing a scene change will start it again and we'd wait for the updating on every IPR pass again.
Seems somehow in relation to some lowres grass which is instanced to points.
With Grass:
Without Grass: (from scratch, starting new scene)
point cloud for instancer:
geo instanced to points:
amwilkins
Yeah this is frustrating. I ran into this same situation. Every time I updated any object within my scene it was taking forever on "updating GPU scene data". It looks to be something with instancing in general, as I am also using instanced grass meshes in my scene (from MASH), and when I turn them off it doesn't take that long. It's annoying that it does this even though I'm not changing anything to do with the grass, but on any other change at all in the scene.
Hi, could you try again with the recently released Arnold core 7.3.4? It contains this optimization (from release notes https://help.autodesk.com/view/ARNOL/ENU/?guid=arnold_core_7340_html)
Faster startup times on GPU: We have made several optimizations to improve startup and scene update time on GPU. These improvements will be most noticeable on scenes with many objects. Startup times on Windows are also now on par with startup times on Linux. On a scene with 1 million instances, we see the following improvements when compared to the previous release. (ARNOLD-15271, ARNOLD-15279, ARNOLD-15310)
7.3.3.1 | 7.3.4.0 | Improvement | |
---|---|---|---|
GPU update time (Windows) | 204s | 17s | 12X |
GPU update time (Linux) | 37s | 18s | 2X |
Try to instance the grass with Bifrost and USD. Arnold is a lot faster with USD data.
Can't find what you're looking for? Ask the community or share your knowledge.