NUMA nodes scalability


Due to the limitations of my workstation CPU, I have been testing Bifrost MPM on a cloud VM.

As a test it is working fine so far, but to substantially improve speed I will need machines with 32, 48 or 64 cores, which are designed as 1, 2 or 4 NUMA nodes.

I am wondering whether using more than 1 NUMA node will substantially reduce how well Bifrost simulation speed scales with core count.

Looking more closely at the specs, with AMD CPUs the expected node counts are:

32 cores - 4 NUMA nodes

48 cores - 6 NUMA nodes

64 cores - 8 NUMA nodes

96 cores - 12 NUMA nodes


With Intel CPUs we have:

32 cores - 1 NUMA node

48 cores - 1 or 2 NUMA nodes

64 cores - 2 NUMA nodes

72 cores - 2 NUMA nodes
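
The counts above come from spec sheets, and a cloud VM may expose a different layout to the guest. As a minimal sketch, assuming a Linux guest where the kernel publishes topology under /sys/devices/system/node (the function names here are my own, not part of any Bifrost or Azure tool), the actual node layout can be listed like this:

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # a memory-only node has an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_layout(root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]}; empty dict if the path is missing (e.g. not Linux)."""
    layout = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        node_name = os.path.basename(os.path.dirname(path))  # e.g. 'node0'
        with open(path) as fh:
            layout[int(node_name[4:])] = parse_cpulist(fh.read())
    return layout


if __name__ == "__main__":
    for node, cpus in sorted(numa_layout().items()):
        print(f"node {node}: {len(cpus)} logical CPUs")
```

On a Windows guest this prints nothing, since the sysfs path does not exist there; the same information would have to come from the OS tools instead.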


There are official CoreMark benchmarks, but I am not sure they are relevant for a Bifrost MPM sim; the benchmark results do not seem to be affected by the NUMA node count.

I have done some tests on Azure standard (not dedicated) instances: F64, D64, D32, D16 and HB120.

It seems that having 16 or 120 vCPUs (threads) available makes little difference.

On my 6-core Ryzen CPU I need about 30 sec to generate each frame.

On all these instances I need between 55 sec and 1 min to do the same.

CPU usage ranges from 85% (16 vCPU) to 60% (64 vCPU), but in all cases each 1 min of work has only 10-15 sec of max CPU usage, and the rest of the time is spent on single-threaded activity.

Attached is the average usage profile.

This is very different from my Ryzen CPU, where the full CPU load lasts much longer.
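
The usage pattern on the cloud instances (10-15 sec at full load per minute, the rest single-threaded) can be put into a rough Amdahl's-law style estimate. This is only a sketch under assumptions of mine: I take the midpoints (12.5 sec parallel, 47.5 sec serial) as measured on the 16 vCPU instance, assume the parallel phase scales perfectly with core count, and assume per-core speed stays the same. None of this comes from Bifrost itself.

```python
def frame_time(serial_s, parallel_core_s, cores):
    """Estimated wall time per frame: the serial phase does not shrink,
    the parallel work is divided evenly across the cores."""
    return serial_s + parallel_core_s / cores


# Assumed split of a ~60 s frame observed on the 16 vCPU instance.
measured_cores = 16
parallel_wall_s = 12.5   # midpoint of the 10-15 s spent at full CPU load
serial_s = 47.5          # remaining time spent on a single thread

# Total parallel work in core-seconds, under the perfect-scaling assumption.
parallel_core_s = parallel_wall_s * measured_cores

for cores in (16, 32, 64, 120):
    estimate = frame_time(serial_s, parallel_core_s, cores)
    print(f"{cores:>3} cores: ~{estimate:.1f} s per frame")
```

Under these assumptions, going from 16 to 64 cores only takes the frame from about 60 sec to about 51 sec, because the single-threaded part dominates. That is consistent with the small differences I see between instances, and it suggests that single-thread speed matters more here than core count or NUMA layout.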

I wanted to ask what, in your experience, is the right balance for maximum performance: do I need a dedicated server, or would a simple local CPU upgrade from 6 to 12 cores (possibly of a different generation) in fact be the best option?


