EDIT: attached a demo file + performance readings for said file
I have a reasonably complex Bifrost setup that builds geometry procedurally. A single graph in an empty scene runs at max fps (my system is capped at 60). With 3 copies of the same graph in the scene, this drops to ~48fps. With 10 it goes all the way down to ~10fps.
The profiler tells me the bif graph evaluation takes 14ms with a single graph and 26ms on average with 3 graphs in the scene. That evaluation happens in parallel, so I had assumed the bottleneck was coming from the BifVP_Translation, as that appears to evaluate serially. However, with 10 graphs, each graph's execution went up to ~67ms on average. The translation went from taking roughly 15% to 30% of the total, which I can attribute to the serial evaluation.
But I don't understand why the evaluation time of a single graph increases (significantly). Plotting these values gives me a linear increase in evaluation time per Bifrost graph for each additional duplicate of that graph in the scene, even though the graphs evaluate in parallel.
I ran more tests after writing this first part, with similar results. The individual times change slightly, but every batch of tests gave me a linear plot in the end. Referencing the compound on the inside did not make a difference. I tried with 2.3.0.0 and 2.7.1.0. I didn't observe this before when using copies of the same graph several times in a rig, although those generally output floats and matrices, not geometry.
Can you shed some light on what's going on here? Has anyone observed something similar?
---
Again I can confirm that all evaluations happen in parallel on different threads (according to the profiler).
Below are my performance readings for the demo file. Plotting them gives a linear increase in execution time for each additional visible graph.
| bifrostGraphs visible | fps | graph execution | Backend::execute | Container::compute_impl |
|---|---|---|---|---|
| 1 | ~60 (capped) | 14.39ms | 14.39ms | 14.47ms |
| 2 | ~45 | 18.50ms | 18.50ms | 18.58ms |
| 4 | ~31 | 25.30ms | 25.30ms | 25.42ms |
| 8 | ~18.7 | 40.25ms | 40.25ms | 40.44ms |
| 13 | ~12 | 60.35ms | 60.35ms | 60.75ms |
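To sanity-check the linearity claim, here is a quick least-squares fit over the `Backend::execute` timings above (my own throwaway script, not part of the original post):

```python
# Fit per-graph evaluation time (ms) vs. number of visible bifrost graphs,
# using the Backend::execute timings quoted in the measurements above.
n = [1, 2, 4, 8, 13]                      # visible bifrost graphs
t = [14.39, 18.50, 25.30, 40.25, 60.35]   # ms per graph evaluation

mean_n = sum(n) / len(n)
mean_t = sum(t) / len(t)
slope = (sum((ni - mean_n) * (ti - mean_t) for ni, ti in zip(n, t))
         / sum((ni - mean_n) ** 2 for ni in n))
intercept = mean_t - slope * mean_n

# Coefficient of determination, to judge how linear the trend really is.
ss_res = sum((ti - (intercept + slope * ni)) ** 2 for ni, ti in zip(n, t))
ss_tot = sum((ti - mean_t) ** 2 for ti in t)
r2 = 1 - ss_res / ss_tot

print(f"t ≈ {intercept:.2f}ms + {slope:.2f}ms per extra graph (R² = {r2:.3f})")
```

The fit comes out almost perfectly linear (R² ≈ 0.999), at roughly 3.8ms of extra per-graph evaluation time for every additional visible graph.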
Solved by morten.bojsen-hansen.
A developer will have a better answer here, but it's going to be tough to get the right answer without seeing what you have in that graph. Maybe try to build a repro graph, see if the same thing happens, and share it here.
@dominik6UUXE I haven't looked at your scene file, but if you are doing any parallel work in each Bifrost graph I would expect each graph to take linearly longer time to evaluate if one graph is already using up all your cores?
@morten.bojsen-hansen That would make sense and brings up a good point; I can try to optimize my nodes by using for-eachs instead of iterates.
In my example, I have an iterate that transforms and merges the geometry inside. In terms of CPU usage, I can only go by what the profiler tells me. Is there any documentation or debug mode to see Bifrost's resource usage other than the profiler? (Or a nice write-up like there is for parallel evaluation mode 🙂)
Unless the transform_points is expensive, I fail to see what could be the cause of the performance hit... (In this example, I cranked up the number of points to transform quite a bit to get results faster)
(Adding to that thought: I just replaced the transform_points with an auto-loop matrix-multiply solution to see if there is some non-essential overhead, and I got a 10% relative fps boost between transform_points and the custom version; the execution time still increases linearly, though.)
Below are two screenshots of my graph
(Edit: updated screenshot)
Lots of the nodes in your graph, including `transform_points`, already process your geometry in parallel, so I highly doubt adding more Bifrost graphs is going to improve performance; it might even hurt performance a bit since there is more overhead. Can I ask why you think this setup will improve performance?
@dominik6UUXE No, I mean, why are you splitting up the processing into separate bifrost graphs rather than doing it all in one graph.
That would be interesting to hear about. I suspect just using Bifrost's parallelism would be the fastest.
@morten.bojsen-hansen You suspect right! I wrapped my example into a for-each and compared the results of having n-dag nodes to n iterations in the for-each (results below).
Is there any chance we can get Bifrost resources more accurately represented in the Profiler?
Also, is the time for multiple graphs longer because:
- Bifrost's internal parallelization takes up more resources, so other graphs won't have those available?
- there is a not-insignificant, static, per-Bifrost-DAG-node overhead (e.g. converting Maya data into Bifrost data and back again) that now has to happen n times more?
Or maybe both?
| n-graphs | dags | for-each |
|---|---|---|
| 1 | 60fps (capped) | 60fps (capped) |
| 2 | 60fps (capped) | 60fps (capped) |
| 4 | 48fps | 53fps |
| 8 | 31fps | 40fps |
| 16 | 16fps | 26fps |
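To quantify the gap, here's a quick calculation of the relative for-each speedup from the fps readings above (my own helper script, just for illustration):

```python
# fps readings from the comparison above:
# separate bifrost DAG nodes vs. one graph with an n-iteration for-each.
results = {1: (60, 60), 2: (60, 60), 4: (48, 53), 8: (31, 40), 16: (16, 26)}

for n, (dags_fps, foreach_fps) in sorted(results.items()):
    speedup = foreach_fps / dags_fps
    print(f"n={n:2d}: for-each runs at {speedup:.2f}x the fps of separate graphs")
```

The advantage grows with n: roughly 1.1x at 4 copies, 1.3x at 8, and over 1.6x at 16.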
> bifrosts internal parallelization takes up more resources and therefore other graphs wont have those available?
I suspect that's it. Bifrost should already scale across all your CPUs/CPU cores. Adding Maya's parallelism on top of that can only clash with this and make things slightly slower. Maya also does caching and, as you said, conversion back-and-forth. I am not an expert on how Maya's parallelism works, however (I mainly work with Bifrost only).
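The core-saturation effect described above can be sketched with a toy model (my own assumption about the mechanism, not actual Bifrost internals): if each graph has a fixed amount of parallel work and n graphs share the same cores, each graph effectively gets 1/n of the machine, so its wall time grows roughly linearly with n.

```python
# Toy model: each graph carries `work_ms` of perfectly parallel work.
# Alone, it spreads that work over all `cores`. With `n_graphs` running
# concurrently, each graph effectively gets cores / n_graphs of the machine.
def per_graph_time_ms(work_ms, cores, n_graphs):
    effective_cores = cores / n_graphs
    return work_ms / effective_cores

C = 16       # hypothetical core count
W = 230.0    # hypothetical parallel work per graph, so one graph takes ~14ms

print(per_graph_time_ms(W, C, 1))   # one graph alone
print(per_graph_time_ms(W, C, 4))   # four graphs: each takes ~4x longer
```

This is idealized (no scheduling overhead, no Maya-side translation cost), but it reproduces the linear per-graph slowdown seen in the measurements.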
I'm sure we'll get better tools for debugging Bifrost performance in the future, but probably not in the near future. It's definitely something we want to improve though.