Threadripper CPU 32 cores with bifrost?

abercaine
Advocate
3,627 Views
26 Replies
Message 1 of 27

Hi,
I was wondering whether Bifrost can fully leverage the power of a 32-core Threadripper CPU.
I know, for instance, that Houdini solvers do not play that well with the 32-core TR, and the devs over there do not really recommend it.

Does anybody have insight on this with the current Bifrost? I would also be interested to know, technically, why solvers have this kind of trouble with so many cores. Is it an architecture limitation, or is it more on the software side and might improve in the future?
(I am also aware that there was a Windows issue capping the performance.)

Here is a YouTube video of a guy running a Bifrost test on a 32-core TR 2990WX. At the time, I did exactly the same setup on my i7-4930K (overclocked to 4.5 GHz)
and I had exactly the same simulation time!

 

thanks

 

0 Likes
Replies (26)
Message 2 of 27

Christoph_Schaedl
Mentor

If it's exactly the same time, I assume that storing the data is the slowdown, not the calculation on the CPU.

----------------------------------------------------------------
https://linktr.ee/cg_oglu
0 Likes
Message 3 of 27

joostkonemann
Advocate

In general, the speedup you get from running in parallel may not scale well with the number of threads you're using. Depending on the problem and the kind of solver, the threads may need to communicate a lot with each other to solve the task at hand. This creates overhead, which increases with the number of threads, so it's quite common to see these kinds of processes not scale linearly with thread count. And as mentioned before, if a lot of disk I/O is happening, that may be the bottleneck. Or, if the amount of RAM is limited and the OS has to fall back on virtual memory, it needs to swap to disk constantly, slowing down the entire process.
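To make the scaling argument concrete, here is a minimal Python sketch of Amdahl's law with an extra per-thread overhead term. The 90% parallel fraction and the 0.2% per-thread communication cost are made-up illustration values, not measurements of Bifrost or any other solver, and `overhead_per_thread` is a hypothetical modelling knob rather than part of the classical formula.

```python
def amdahl_speedup(parallel_fraction: float, threads: int,
                   overhead_per_thread: float = 0.0) -> float:
    """Estimated speedup over a single thread.

    parallel_fraction:   share of the single-thread runtime that can run in parallel.
    overhead_per_thread: hypothetical communication cost added for every extra
                         thread, expressed as a fraction of the single-thread runtime.
    """
    serial_time = 1.0 - parallel_fraction
    parallel_time = parallel_fraction / threads
    overhead_time = overhead_per_thread * (threads - 1)
    return 1.0 / (serial_time + parallel_time + overhead_time)


if __name__ == "__main__":
    # Assumed values for illustration only.
    p = 0.90
    cost = 0.002
    print("threads  ideal  with_overhead")
    for n in (1, 6, 16, 24, 32, 64):
        print(f"{n:7d}  {amdahl_speedup(p, n):5.2f}  {amdahl_speedup(p, n, cost):13.2f}")
```

Without overhead the speedup keeps rising but flattens below the 1 / (1 - p) ceiling. With even a small per-thread cost it can peak and then fall, so with these illustrative numbers a 32-thread run comes out slower than a 16-thread one.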

 

Maybe one of the Bifrost developers can comment on the parallel performance of the Bifrost solvers and the expected speedups.

I guess we may see some improvements in this area in the future, as this is just the first release of the Bifrost extension!

--
MacBook Pro 13,3 - 2.7GHz - 16GB - Radeon Pro 460 - macOS Catalina 10.15
0 Likes
Message 4 of 27

mspeer
Consultant

Hi!

I would say the same as @Christoph_Schaedl.

The problem is very likely data transfer and not the calculation itself.

As far as I remember, data transfer is also the weak spot of the 32-core Threadripper, owing to its CPU design.

A test with the Threadripper 2950X (16 cores) as a comparison would be interesting here.

0 Likes
Message 5 of 27

abercaine
Advocate

Yeah, I read something about how the way the TR communicates with the RAM might cause this slowdown.
But this CPU does really well with CPU renderers; in that case it runs at full speed.
It might also be interesting to see how TR gen 3 performs.

But yeah, we definitely need insights from the devs.

0 Likes
Message 6 of 27

jan
Enthusiast

I have the 3990X running sims now. Here's a typical example of CPU utilization:

3990x.PNG

I would say the bottlenecks are certainly in data writing. When it has something meaty to process, it will hit prolonged 100% utilization.

Massive fan of the TRs; I use them for rendering. Pound for pound, they knock Xeon out of the park.

Message 7 of 27

Christoph_Schaedl
Mentor

Thanks, interesting. Thanks for sharing.

How fast is your SSD?

----------------------------------------------------------------
https://linktr.ee/cg_oglu
0 Likes
Message 8 of 27

jan
Enthusiast

I'm caching data to a Samsung 860 EVO.

0 Likes
Message 9 of 27

Christoph_Schaedl
Mentor

Hmm, that's not the fastest. If possible, upgrade to an M.2 drive.

----------------------------------------------------------------
https://linktr.ee/cg_oglu
0 Likes
Message 10 of 27

jan
Enthusiast

The system disk is a smaller M.2. I'll do a like-for-like test and get back to you.

0 Likes
Message 11 of 27

jan
Enthusiast

For context, here is an example test frame:

jan_0-1594214532127.png

In each example I'm starting from the same liquid cache end frame (340) and simulating the fluid and a clipped mesh (for animation).

On the slower SATA SSD: 14-16 mins

jan_1-1594214874203.png

On the faster M.2 SSD: also 14 mins

jan_2-1594215117059.png

 

This surprised me, but I wonder if the results are skewed because of the meshing. That does seem to take an extraordinary length of time.

Or perhaps it's because I'm writing to the system disk. I don't know enough about hardware to comment on that.

I will rerun the same test later on the fluid sim only.
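One way to sanity-check whether the cache disk matters at all, independent of Bifrost, is to time a plain sequential write to each drive. The following is a minimal Python sketch using only the standard library; the function name, the 256 MB default size, and the 4 MB chunk size are arbitrary choices for illustration, and it says nothing about how Bifrost itself writes its caches.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, size_mb: int = 256, chunk_mb: int = 4) -> float:
    """Write roughly size_mb of random data into `directory` and return MB/s.

    The data is flushed and fsync'ed before the timer stops, so the OS write
    cache does not hide the cost of actually reaching the disk.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = max(1, size_mb // chunk_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(chunks):
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temporary file
    return (chunks * chunk_mb) / max(elapsed, 1e-9)
```

Calling `write_throughput_mb_s` once with a folder on the SATA SSD and once with a folder on the M.2 drive gives two numbers to compare. If they differ a lot while the sim time stays at 14 minutes, the disk is probably not what limits the simulation.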

0 Likes
Message 12 of 27

Christoph_Schaedl
Mentor

Puget Systems also did some testing.

 

pic_disp.jpg

 

https://www.pugetsystems.com/labs/articles/3ds-Max-2021-CPU-Roundup-Intel-vs-AMD-1812/#SimulationRes...

 

----------------------------------------------------------------
https://linktr.ee/cg_oglu
0 Likes
Message 13 of 27

jan
Enthusiast

That's really interesting and fantastically useful, thanks for sharing.

 

I should actually pull this fantastically expensive 3990X out and stick the 3960X in at less than half the price. I might order one today and do a comparison of my own.

 

So if multithreading isn't the key I guess overclocking is? Perhaps a 3960X and a substantial cooling solution is the way to go.

0 Likes
Message 14 of 27

Christoph_Schaedl
Mentor

If simulation is the main workload, I agree.

But if you have to render the stuff, your current CPU is much faster.

----------------------------------------------------------------
https://linktr.ee/cg_oglu
Message 15 of 27

jan
Enthusiast

It certainly appears that way for FLIP. Yes, for rendering it's jaw-droppingly fast.

Do you think Bifrost might become more multi-threaded in the future, or is it a limitation of the process that it cannot simply be compartmentalized in the same way a render can be tiled? I don't have any experience of Houdini but after a (very) brief bit of google-fu it seems it doesn't suffer the same bottleneck.

I have to do another round of workstation purchases soon, and I must admit I'd pretty much disregarded the Intel offering of late. But the 10900K does seem to be a strong, cost-effective option.

0 Likes
Message 16 of 27

Christoph_Schaedl
Mentor

I'm not sure that the CPU is the bottleneck.

I'm also not sure what those Houdini numbers are: sim time, or the final data on the hard drive?

It also depends on the scene you are simming. Bifrost's MPM solver is very different from a FLIP solver.
And there are experimental nodes (orange bottle icon) in Bifrost that are still not optimized.

There are too many factors.

----------------------------------------------------------------
https://linktr.ee/cg_oglu
0 Likes
Message 17 of 27

abercaine
Advocate

@jan wrote:

I don't have any experience of Houdini but after a (very) brief bit of google-fu it seems it doesn't suffer the same bottleneck.


Well, you can see on the Houdini side that the FLIP sim doesn't scale well between the 3960X and the 3970X. The 32-core part is actually even slower!
3960X (24 cores): 25m11s

3970X (32 cores): 25m21s

But overall the 3rd-gen TRs are way better, with far fewer problems than the previous gen, especially on Windows.
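To put numbers on the two FLIP timings quoted above (25m11s on 24 cores, 25m21s on 32 cores), here is a small Python sketch of the arithmetic. The helper `to_seconds` and the "efficiency" figure are illustrative conventions of this sketch, not something taken from the benchmark article.

```python
def to_seconds(stamp: str) -> int:
    """Convert a duration written like '25m11s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


t_24 = to_seconds("25m11s")  # 3960X, 24 cores
t_32 = to_seconds("25m21s")  # 3970X, 32 cores

speedup = t_24 / t_32            # below 1.0 means the 32-core run was slower
core_ratio = 32 / 24             # how many times more cores were available
efficiency = speedup / core_ratio  # share of the extra cores that paid off

if __name__ == "__main__":
    print(f"24 cores: {t_24} s, 32 cores: {t_32} s")
    print(f"speedup going from 24 to 32 cores: {speedup:.3f}x")
    print(f"scaling efficiency relative to core count: {efficiency:.0%}")
```

The gap is only ten seconds, which is plausibly within run-to-run noise, so the safer reading is that the eight extra cores brought no measurable benefit for this sim rather than that they made it slower.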

0 Likes
Message 18 of 27

jan
Enthusiast

You're right, I hadn't noticed that. Also, what's going on with the 25% performance drop from Linux to Windows on the 3970X?! That's a big, big difference.

0 Likes
Message 19 of 27

abercaine
Advocate

Linux deals better with high core counts.
On Windows there are things to pay attention to with high core counts; not all editions of Windows (Home, Pro, etc.) behave the same.
You can find some tests about that on the internet.
That being said, in general you get better performance on Linux.

Message 20 of 27

abercaine
Advocate
0 Likes