Community
Bifrost Forum
Welcome to the Bifrost Forum. This is the place for artists using Bifrost to ask and answer questions, browse popular topics, and share knowledge about creating effects procedurally using Bifrost. You can also visit the Bifrost Community on AREA to download an array of ready-to-use graphs, read Bifrost news and updates, and find the latest tutorials.

Threadripper CPU 32 cores with bifrost?

26 REPLIES 26
Message 1 of 27
abercaine
2428 Views, 26 Replies


Hi,
I was wondering whether Bifrost can fully leverage the power of a 32-core Threadripper CPU.
I know, for instance, that Houdini solvers do not play that well with the 32-core TR, and the devs over there do not really recommend it.

Does anybody have insight on this with the current Bifrost? I would also be interested to know, technically, why solvers have trouble with so many cores. Is it an architecture limitation, or is it more on the software side and might evolve in the future?
(I'm also aware that there was a Windows issue capping the performance.)

Here is a YouTube video of someone testing Bifrost on a TR 2990WX (32 cores). At the time I ran exactly the same setup on my i7 4930K (overclocked to 4.5 GHz)
and I got exactly the same simulation time!

 

thanks

 

Message 2 of 27

If it's exactly the same time, I assume the slowdown is in storing the data, not in the calculation on the CPU.

----------------------------------------------------------------
https://linktr.ee/cg_oglu
Message 3 of 27
joostkonemann
in reply to: abercaine

In general, the speedup you get from running in parallel may not scale well with the number of threads you're using. Depending on the problem and the kind of solver, the threads may need to communicate a lot with each other to solve the task at hand. This creates overhead that increases with the number of threads, so it's quite common to see these kinds of processes not scale linearly with the thread count. And as mentioned before, if a lot of disk IO is happening, that may be the bottleneck. Or, if for instance the amount of RAM is limited and the OS has to use virtual memory, it needs to swap to disk constantly, which slows down the entire process.
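To put rough numbers on the scaling argument, here is a minimal sketch based on Amdahl's law, a generic model that is not specific to Bifrost. The parallel fraction (0.90) and the per-thread overhead (0.001) are hypothetical values picked for illustration, not measurements of any solver.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def speedup_with_overhead(parallel_fraction: float, threads: int,
                          overhead_per_thread: float) -> float:
    """Same model plus a hypothetical communication cost that grows with thread count."""
    serial_fraction = 1.0 - parallel_fraction
    runtime = (serial_fraction
               + parallel_fraction / threads
               + overhead_per_thread * (threads - 1))
    return 1.0 / runtime


if __name__ == "__main__":
    # Assumed values: 90% of the work is parallel, 0.1% overhead per extra thread.
    for n in (6, 16, 24, 32, 64):
        ideal = amdahl_speedup(0.90, n)
        real = speedup_with_overhead(0.90, n, overhead_per_thread=0.001)
        print(f"{n:3d} threads: ideal {ideal:5.2f}x, with overhead {real:5.2f}x")
```

With these assumed numbers the ideal speedup can never exceed 10x, and once the overhead term is included, going from 32 to 64 threads makes the run slower rather than faster.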

 

Maybe one of the Bifrost developers can comment about the parallel performance of the Bifrost solvers and expected speedups.

I guess we may see some improvements in this area in the future, as this is just the first release of the Bifrost extension!

--
MacBook Pro 13,3 - 2.7GHz - 16GB - Radeon Pro 460 - macOS Catalina 10.15
Message 4 of 27
mspeer
in reply to: abercaine

Hi!

 

I would say the same as @Christoph_Schaedl.

The problem is very likely data transfer and not the calculation itself.

As far as I remember, data transfer is also the weak spot of the 32-core Threadripper, because of the CPU design.

A test with the Threadripper 2950X (16 cores) as a comparison would be interesting here.

Message 5 of 27
abercaine
in reply to: mspeer

Yeah, I read something about how the way the TR communicates with the RAM might cause this slowdown.
But this CPU does really well with CPU renderers; in that case it runs at full speed.
It might also be interesting to see how TR gen 3 performs.

But yeah, we definitely need insights from the devs.

Message 6 of 27
jan
Enthusiast
in reply to: abercaine

I have the 3990X running sims now. Here's a typical example of CPU utilization:

3990x.PNG

I would say the bottlenecks are certainly in data writing. When it has something meaty to process, it will hit prolonged 100% utilization.

 

Massive fan of the TRs, I use them for rendering. Pound for pound they knock Xeon out of the park.

Message 7 of 27
Christoph_Schaedl
in reply to: jan

Thanks, interesting. Thanks for sharing.

 

How fast is your SSD? 

----------------------------------------------------------------
https://linktr.ee/cg_oglu
Message 8 of 27
jan
Enthusiast
in reply to: Christoph_Schaedl

Data caching to a Samsung 860 EVO

Message 9 of 27
Christoph_Schaedl
in reply to: jan

Hmm, that's not the fastest. If possible, upgrade to an M.2 drive.

----------------------------------------------------------------
https://linktr.ee/cg_oglu
Message 10 of 27
jan
Enthusiast
in reply to: Christoph_Schaedl

The system disk is a smaller M.2. I'll do a like-for-like test and get back to you.

Message 11 of 27
jan
Enthusiast
in reply to: jan

For context, here is an example test frame:

jan_0-1594214532127.png

In each example I'm starting from the same liquid cache end frame (340) and simulating fluid and a clipped mesh (for animation).

 

On the slower SATA SSD: 14-16 mins

jan_1-1594214874203.png

On the faster M.2 SSD: also 14 mins

jan_2-1594215117059.png

 

This surprised me but I wonder if the results are skewed because of the meshing. This does seem to take an extraordinary length of time.

 

Or perhaps because I'm writing to the system disk. I don't know enough about hardware to comment on that.

 

I will later rerun the same test only on the fluid sim.
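One rough way to check whether the cache disk itself is the limit is to time a plain sequential write to the cache location, independently of the sim. The sketch below is only an illustration of that idea: the function name and the default sizes are made up for this example, it uses nothing but the Python standard library, and it says nothing about how Bifrost actually writes its caches.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, size_mb: int = 256, chunk_mb: int = 4) -> float:
    """Write roughly `size_mb` of random data into `directory`; return the rate in MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = max(1, size_mb // chunk_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device, not just the OS cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary test file
    return (n_chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Replace the temp directory with the folder the sim cache is written to.
    rate = write_throughput_mb_s(tempfile.gettempdir(), size_mb=64)
    print(f"sequential write: {rate:.0f} MB/s")
```

If both drives report far more throughput than the sim ever produces per frame, the identical 14-minute results would point away from the disk and toward the solve or the meshing.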

Message 12 of 27
Christoph_Schaedl
in reply to: jan

Puget Systems also did some testing.

 

pic_disp.jpg

 

https://www.pugetsystems.com/labs/articles/3ds-Max-2021-CPU-Roundup-Intel-vs-AMD-1812/#SimulationRes...

 

----------------------------------------------------------------
https://linktr.ee/cg_oglu
Message 13 of 27
jan
Enthusiast
in reply to: Christoph_Schaedl

That's really interesting and fantastically useful, thanks for sharing.

 

I should actually pull this fantastically expensive 3990X out and stick the 3960X in at less than half the price. I might order one today and do a comparison of my own.

 

So if multithreading isn't the key I guess overclocking is? Perhaps a 3960X and a substantial cooling solution is the way to go.

Message 14 of 27
Christoph_Schaedl
in reply to: jan

If simulation is the main workload, I agree.

But if you have to render the stuff, your current CPU is much faster.

----------------------------------------------------------------
https://linktr.ee/cg_oglu
Message 15 of 27
jan
Enthusiast
in reply to: Christoph_Schaedl

It certainly appears that way for FLIP. Yes, for rendering it's jaw-droppingly fast.

Do you think BF might become more multi-threaded in the future, or is it perhaps a limitation of the process, in that it cannot simply be compartmentalized the way a render can be tiled? I don't have any experience of Houdini, but after a (very) brief bit of google-fu it seems it doesn't suffer the same bottleneck.

 

I have to do another round of workstation purchases soon and I must admit I'd pretty much disregarded the Intel offering of late. But the 10900K does seem to be a strong, cost-effective option.

Message 16 of 27
Christoph_Schaedl
in reply to: jan

I'm not sure that the CPU is the bottleneck.

I'm also not sure what those Houdini numbers are: sim time, or the final data on the hard drive?

And it also depends on the scene you are simming. Bifrost's MPM solver is very different from a FLIP solver.
And there are experimental nodes (orange bottle icon) in Bifrost that are still not optimized.

There are too many factors.

----------------------------------------------------------------
https://linktr.ee/cg_oglu
Message 17 of 27
abercaine
in reply to: jan


@jan wrote:

I don't have any experience of Houdini but after a (very) brief bit of google-fu it seems it doesn't suffer the same bottleneck.


Well, on the Houdini side you can see that the FLIP sim doesn't scale well between the 3960X and the 3970X; the 32-core part is actually even slower!!
3960X (24 cores): 25m11s

3970X (32 cores): 25m21s
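As a quick check on those two timings, the snippet below converts them to seconds and compares the measured ratio with what perfect scaling by core count would give. It uses only the two numbers quoted above; the "ideal" figure is just 32/24 and ignores clock speed differences between the two chips.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds timing to whole seconds."""
    return minutes * 60 + seconds


t_24 = to_seconds(25, 11)  # 3960X, 24 cores
t_32 = to_seconds(25, 21)  # 3970X, 32 cores

measured_speedup = t_24 / t_32  # above 1.0 would mean the 32-core part is faster
ideal_speedup = 32 / 24         # if the sim scaled perfectly with core count

print(f"measured: {measured_speedup:.3f}x, ideal: {ideal_speedup:.3f}x")
# prints "measured: 0.993x, ideal: 1.333x"
```

So the 8 extra cores bring nothing here: the 32-core run is 10 seconds (about 0.7%) slower instead of up to 33% faster.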

 

But overall the 3rd-gen TRs are way better, with way fewer problems than the previous gen, especially on Windows.

Message 18 of 27
jan
Enthusiast
in reply to: abercaine

You're right, I hadn't noticed that. Also, what's going on with the 25% performance drop from Linux to Windows on the 3970X?! That's a big, big difference.

Message 19 of 27
abercaine
in reply to: jan

Linux deals better with high core counts.
On Windows there are things to pay attention to with high core counts: not all versions of Windows (fam, prio etc) react the same way.
You can find some tests about that on the internet.
That being said, in general you get better performance on Linux.

Message 20 of 27
abercaine
in reply to: jan
