Community
Arnold GPU Forum
General discussions about GPU rendering with Arnold.
cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Arnold GPU - Taking long to "updating gpu scene data" on every progressive pass

9 REPLIES 9
Reply
Message 1 of 10
am_wilkins
778 Views, 9 Replies

Arnold GPU - Taking long to "updating gpu scene data" on every progressive pass

Hi,

 

I've been doing some RND to evaluate Arnold GPU on an RTX4090, however rendering locally in Houdini 20 within the viewport/hydra... I'm getting a massive slow-down with Arnold taking over 1min to "updating gpu scene data" on every single progressive IPR pass. Same result on a RTX A4000.

 

As far as I can see: no system RAM, VRAM is getting loaded during this time.

Network and CPU/GPU usage is very low. No rendering is happening.

4090_arnold_gpu_01.png

4090_arnold_gpu_02.png

 

In this screen-grab it took close to 5mins just to finish the first 2 progressive passes, while most of that time is spent updating scene data without any rendering happening.

4090_arnold_gpu_03.png

 

From some additional testing, I noticed that when I remove some geo grass blades (lowpoly) that are instanced to points... Arnold GPU drastically speeds up.

Example setup and poly count of grass.

instancer_grass.pnginstancer_grass_02.png

 

This doesn't happen with Karma XPU, it's lightning quick as well as Redshift.

 

Unfortunately, I'm just using the default "Viewport Settings" and Arnold doesn't log to the Houdini Output Console. So I can't provide a quick log from that. But if I do get a log from a progressive render in the near future, I'll pass that along.

 

 

amwilkins

--

Houdini Core Version 20.0.590
Arnold Core: 7.2.5.2

HotA: 6.2.5.2

CPU: AMD Ryzen TR 3990x

RAM: 128GB
GPU: NVIDIA RTX 4090
OS: Windows

9 REPLIES 9
Message 2 of 10
Stephen.Blair
in reply to: am_wilkins


@am_wilkins wrote:

 

Unfortunately, I'm just using the default "Viewport Settings" and Arnold doesn't log to the Houdini Output Console. So I can't provide a quick log from that. But if I do get a log from a progressive render in the near future, I'll pass that along.

 

 

amwilkins

 


You can set the HDARNOLD_log_verbosity environment variable to get logs when you use the Viewport Settings as the Render Settings prim.

 

set HDARNOLD_log_verbosity=5

or

        {"HDARNOLD_log_verbosity": [
		                {
		                    "value": "4",
		                    "method": "append"
		                }
		            ]
        },

 

 

5 would be Debug-level verbosity



// Stephen Blair
// Arnold Renderer Support
Message 3 of 10
am_wilkins
in reply to: Stephen.Blair

Hi Stephen,

 

I added that variable here in Houdini:

variables_01.png

 

While it appears to "work", I'm still not seeing the usual arnold logs I'm expecting to see:

###################################################################
#  HDARNOLD_log_verbosity is overridden to '4'.  Default is '2'.  #
###################################################################
WARNING |  ignoring parameter far_clip on /_UsdImaging_HdArnoldRendererPlugin_000000544D4973C0/camera (found bad float: -inf)
00:00:04 23478MB WARNING |   [disp] /path/wall_geo_2/wall_geo_2Shape: padding is at least 10x smaller than it should be! given disp_padding: 0, recommended: 0.014384985
00:00:06 27254MB WARNING |   [disp] /path/wall_geo_3/wall_geo_3Shape: padding is at least 10x smaller than it should be! given disp_padding: 0, recommended: 0.0132106543
00:00:11 29427MB WARNING |   [disp] /path/wall_geo_1/wall_geo_1Shape: padding is at least 10x smaller than it should be! given disp_padding: 0, recommended: 0.013476491

ie. no arnold render report, scene creation times, texture infos, % completion, etc.

 

 

thanks,

amwilkins

Message 4 of 10
am_wilkins
in reply to: Stephen.Blair

Hi again,

 

I think I Houdini restart got it working. 👍🏻

 

Here is some logging (from just the last progressive render pass) that shows time spent on GPU::update_scene

 

00:05:00 18592MB         | -----------------------------------------------------------------------------------
00:05:00 18592MB         | scene creation time          0:01.26    machine utilization (35.44%) 
00:05:00 18592MB         |  unaccounted                 0:01.26 
00:05:00 18592MB         | -----------------------------------------------------------------------------------
00:05:00 18592MB         | frame time                   4:59.47    machine utilization (0.29%) 
00:05:00 18592MB         |  node init                   0:00.69 
00:05:00 18592MB         |  gpu update                  0:43.41 
00:05:00 18592MB         |  rendering                   0:05.93 
00:05:00 18592MB         |   output driver              0:00.12 
00:05:00 18592MB         |   pixel rendering            0:05.81 (23 sample iterations @ min: 0:00.012, avg: 0:00.255, max: 0:00.898)
00:05:00 18592MB         |  unaccounted                 4:09.43 
00:05:00 18592MB         | -----------------------------------------------------------------------------------
00:05:00 18592MB         | top session self-times by category 
00:05:00 18592MB         |  AiMsgDebug                                                         0:35.80 (81.28%) 
00:05:00 18592MB         |  GPU::update_scene                                                  0:01.87 ( 4.25%) 
00:05:00 18592MB         |  thread blocked                                                     0:01.77 ( 4.02%) 
00:05:00 18592MB         |  UpdateNodes                                                        0:00.87 ( 1.98%) 
00:05:00 18592MB         |  node_update                                                        0:00.84 ( 1.92%) 
00:05:00 18592MB         |  subdivision                                                        0:00.76 ( 1.74%) 
00:05:00 18592MB         | -----------------------------------------------------------------------------------
00:05:00 18592MB         | top session self-times by node 
00:05:00 18594MB         |  GPU::update_scene                                                  0:01.87 ( 4.25%) 
00:05:00 18594MB         |  UpdateNodes                                                        0:00.87 ( 1.98%) 
00:05:00 18594MB         |  InitializeNodes                                                    0:00.69 ( 1.57%) 
00:05:00 18594MB         |  PreInitializeProcs                                                 0:00.36 ( 0.83%) 
00:05:00 18594MB         |  polymesh:/Set/Name/Scatters/instancer_background_scatters.proto1_bush_geo_1Shape_id2     0:00.34 ( 0.78%) 
00:05:00 18594MB         |   thread blocked                                                    0:00.31 ( 0.71%) 
00:05:00 18594MB         |  polymesh:/Set/Name/Scatters/instancer_background_scatters.proto3_bush_geo_1Shape_id2     0:00.34 ( 0.77%) 
00:05:00 18594MB         |   thread blocked                                                    0:00.30 ( 0.70%) 
00:05:00 18594MB         | -----------------------------------------------------------------------------------

 

 

 

amwilkins

 

Message 5 of 10
am_wilkins
in reply to: am_wilkins

Updated Arnold and Houdini to:

 

Houdini Core Version 20.0.653
Arnold Core: 7.3.1.0

HotA: 6.3.1.0

 

This log shows a "gpu jit compile" taking up some time, but generally IPR rendering is still incredibly slow.
Around 2mins per IPR pass to "update gpu scene data".

00:05:11 15299MB         | -----------------------------------------------------------------------------------
00:05:11 15299MB         | frame time                   5:01.85    machine utilization (0.22%) 
00:05:11 15299MB         |  license checkout time       0:00.03 
00:05:11 15299MB         |  node init                   0:00.63 
00:05:11 15299MB         |  gpu update                  0:43.94 
00:05:11 15299MB         |  rendering                   0:00.86 
00:05:11 15299MB         |   output driver              0:00.07 
00:05:11 15299MB         |   pixel rendering            0:00.79 (8 sample iterations @ min: 0:00.031, avg: 0:00.104, max: 0:00.324)
00:05:11 15299MB         |  unaccounted                 4:16.37 
00:05:11 15299MB         | -----------------------------------------------------------------------------------
00:05:11 15299MB         | top session self-times by category 
00:05:11 15299MB         |  AiMsgDebug                                                         1:04.55 (85.95%) 
00:05:11 15299MB         |  thread blocked                                                     0:03.10 ( 4.14%) 
00:05:11 15299MB         |  GPU::update_scene                                                  0:01.52 ( 2.04%) 
00:05:11 15299MB         |  gpu jit compile                                                    0:01.10 ( 1.47%) 
00:05:11 15299MB         |  subdivision                                                        0:00.67 ( 0.90%) 
00:05:11 15299MB         |  node_update                                                        0:00.67 ( 0.90%) 
00:05:11 15299MB         | -----------------------------------------------------------------------------------
00:05:11 15299MB         | top session self-times by node 
00:05:11 15299MB         |  GPU::update_scene                                                  0:01.52 ( 2.04%) 
00:05:11 15299MB         |  gpu jit compile                                                    0:01.10 ( 1.47%) 
00:05:11 15299MB         |  UpdateNodes                                                        0:00.65 ( 0.87%) 
00:05:11 15299MB         |  InitializeNodes                                                    0:00.53 ( 0.71%) 
00:05:11 15299MB         |  accumulateBucketSamples                                            0:00.42 ( 0.57%) 
00:05:11 15299MB         |  polymesh:/Set/Name/Scatters/instancer_background_scatters.proto1_bush_geo_1Shape_id2     0:00.40 ( 0.53%) 
00:05:11 15299MB         | -----------------------------------------------------------------------------------

 

Here we get another log from the last IPR pass, showing a AOV driver taking some time?

00:08:02 15535MB         | -----------------------------------------------------------------------------------
00:08:02 15535MB         | top session self-times by category 
00:08:02 15510MB         |  AiMsgDebug                                                         1:04.55 (66.56%) 
00:08:02 15510MB         |  driver_process_bucket                                              0:09.07 ( 9.36%) 
00:08:02 15510MB         |   /HdArnoldRenderDelegate_00000045CC2DC100/HdArnoldRenderPass_aov_driver_1     0:06.20 ( 6.40%) 
00:08:02 15510MB         |  accumulateBucketSamples                                            0:05.42 ( 5.59%) 
00:08:02 15510MB         |  processGPUSample                                                   0:04.79 ( 4.95%) 
00:08:02 15510MB         |  thread blocked                                                     0:03.55 ( 3.66%) 
00:08:02 15510MB         |  GPU::update_scene                                                  0:01.87 ( 1.94%) 
00:08:02 15510MB         | -----------------------------------------------------------------------------------
00:08:02 15510MB         | top session self-times by node 
00:08:02 15510MB         |  HdArnoldDriverAOV:/HdArnoldRenderDelegate_00000045CC2DC100/HdArnoldRenderPass_aov_driver_1 (driver_process_bucket)     0:06.20 ( 6.40%) 
00:08:02 15510MB         |  accumulateBucketSamples                                            0:05.42 ( 5.59%) 
00:08:02 15510MB         |  processGPUSample                                                   0:04.79 ( 4.95%) 
00:08:02 15510MB         |  GPU::update_scene                                                  0:01.87 ( 1.94%) 
00:08:02 15510MB         |  gpu jit compile                                                    0:01.10 ( 1.14%) 
00:08:02 15510MB         |  launch                                                             0:00.94 ( 0.98%) 
00:08:02 15510MB         | -----------------------------------------------------------------------------------

 

Other logging, nothing looks wrong:

00:08:02 15510MB         | -----------------------------------------------------------------------------------
00:08:02 15510MB         | peak GPU memory consumed 9665.51MB 
00:08:02 15510MB         |  CUDA context             159.97MB 
00:08:02 15510MB         |  OptiX context             24.00MB 
00:08:02 15510MB         |  framebuffers             187.02MB 
00:08:02 15510MB         |  node overhead            925.13MB 
00:08:02 15510MB         |  geometry                1998.81MB 
00:08:02 15510MB         |   subdivs                1998.81MB 
00:08:02 15510MB         |  accel structs           3067.58MB 
00:08:02 15510MB         |  skydome importance map     7.80MB 
00:08:02 15510MB         |  texture cache            322.06MB 
00:08:02 15510MB         |  unaccounted             6035.74MB 
00:08:02 15510MB         | -----------------------------------------------------------------------------------
00:08:02 15510MB         | ray counts                       ( /pixel, /sample) (% total) (avg. hits) (max hits)
00:08:02 15510MB         |  camera                273470933 (   0.87,    1.00) ( 12.87%) (     1.00) (       1)
00:08:02 15510MB         |  shadow               1552431299 (   4.94,    5.68) ( 73.07%) (     0.70) (       1)
00:08:02 15510MB         |  diffuse_reflect       230548798 (   0.73,    0.84) ( 10.85%) (     0.75) (       1)
00:08:02 15510MB         |  specular_reflect       59092723 (   0.19,    0.22) (  2.78%) (     0.88) (       1)
00:08:02 15510MB         |  bssrdf                  9173208 (   0.03,    0.03) (  0.43%) (     0.23) (       1)
00:08:02 15510MB         |  total                2124716961 (   6.76,    7.77) (100.00%) (     0.74) (       1)
00:08:02 15510MB         | by ray depth:              0     1     2     3
00:08:02 15510MB         |  total                   59.2% 36.8%  4.0%  0.0% 
00:08:02 15510MB         | -----------------------------------------------------------------------------------
00:08:02 15510MB         | geometry                         (% hit ) (instances) (  init mem,  final mem)
00:08:02 15510MB         |  lists                         1 (  0.0%) (        0) (      0.00,       0.00)
00:08:02 15510MB         |  procs                        27 (  0.0%) (        0) (      2.17,       2.17)
00:08:02 15510MB         |  subdivs                    1627 (100.0%) (   284291) (    569.08,    2682.32)
00:08:02 15510MB         | -----------------------------------------------------------------------------------
00:08:02 15510MB         | geometric elements               (      min) (       avg.) (      max)
00:08:02 15510MB         |  objects (procs)          284291 (        9) (    10529.3) (    17924)
00:08:02 15510MB         |  subdiv patches         14230441 (       40) (     8746.4) (   608844)
00:08:02 15510MB         | -----------------------------------------------------------------------------------
00:08:02 15510MB         | triangle tessellation            (      min) (       avg.) (      max) (/ element) (% total)
00:08:02 15510MB         |  subdivs               104298289 (      148) (    64104.7) (  4730272) (     7.33) (100.00%)
00:08:02 15510MB         |   iterations 1          66263776 (      320) (  2007993.2) (  4730272) (     7.77) ( 63.53%)
00:08:02 15510MB         |   iterations 2             33472 (     1728) (     2789.3) (     5792) (    31.82) (  0.03%)
00:08:02 15510MB         |   adaptive              38001041 (      148) (    24020.9) (  2862756) (     6.67) ( 36.43%)
00:08:02 15510MB         |  unique triangles      104298289 
00:08:02 15510MB         |  visible triangles     327592289 
00:08:02 15510MB         |  CPU memory use          2682.32MB 

 

 

Message 6 of 10
Stephen.Blair
in reply to: am_wilkins

Is it just GPU or is it CPU too?

What are you doing to force an IPR update? Just moving the camera?



// Stephen Blair
// Arnold Renderer Support
Message 7 of 10
am_wilkins
in reply to: Stephen.Blair

CPU is totally fine. Doesn't really pause or wait on any scene/node updates.

No "Update, etc." in the logs at all.

 

This issue occurs just starting a new viewport/hydra render on Arnold GPU.

And yes, if it does eventually manage to finish then moving the camera or do anything that causing a scene change will start it again and we'd wait for the updating on every IPR pass again.

 

Seems somehow in relation to some lowres grass which is instanced to points.

With Grass:

slow_gpu_update.gif

 

Without Grass: (from scratch, starting new scene)

slow_gpu_update_no_grass.gif

 

 

point cloud for instancer:

points.png

geo instanced to points:

points_grass.png

 

amwilkins

Message 8 of 10
paul
in reply to: am_wilkins

Yeah this is frustrating. I ran into this same situation. Every time I updated any object within my scene it was taking forever on "updating GPU scene data". It looks to be something with instancing in general, as I am also using instanced grass meshes in my scene (from MASH), and when I turn them off it doesn't take that long. It's annoying that it does this even though I'm not changing anything to do with the grass, but on any other change at all in the scene.

Message 9 of 10
thiago.ize
in reply to: am_wilkins

Hi, could you try again with the recently released Arnold core 7.3.4? It contains this optimization (from release notes https://help.autodesk.com/view/ARNOL/ENU/?guid=arnold_core_7340_html)

 

Faster startup times on GPU: We have made several optimizations to improve startup and scene update time on GPU. These improvements will be most noticeable on scenes with many objects. Startup times on Windows are also now on par with startup times on Linux. On a scene with 1 million instances, we see the following improvements when compared to the previous release. (ARNOLD-15271, ARNOLD-15279, ARNOLD-15310)

  7.3.3.1 7.3.4.0 Improvement
GPU update time (Windows) 204s 17s 12X
GPU update time (Linux) 37s 18s 2X
Message 10 of 10
Christoph_Schaedl
in reply to: paul

Try to instance the grass with Bifrost and USD. Arnold is a lot faster with USD data.

----------------------------------------------------------------
https://linktr.ee/cg_oglu

Can't find what you're looking for? Ask the community or share your knowledge.

Post to forums  

Autodesk Design & Make Report