Performance question - Intel versus Apple Silicon

davidpbest · ‎05-16-2024

My main computer is a 2020 iMac with Intel 8-core i9 CPU and 56GB main memory, and Radeon Pro 580X graphics with 8GB memory. I have a Fusion 360 assembly file that takes almost 2 minutes to compute and display a cross section view.

In comparison, my 2022 MacBook Air (not Pro) has an Apple 8-core M2 CPU and 24GB of "unified memory", and uses the on-chip graphics processor. On that machine, the same cross section compute/display of the same file is instantaneous - so quick it's not measurable.

I'm starting to scratch my head about this. I used to run the uP group at Intel, so I can certainly understand the technical differences in these two machines and the differences in Apple silicon advancements, etc. But the difference in this performance is so staggering (120 seconds versus <1 second) that I'm not at all sure it is due to the hardware alone. Is there something else going on here, other than the different hardware, that's causing the performance difference on the Intel-based iMac? I ask because I'm wondering if I should retire the Intel-based iMac and move to an Apple silicon based Studio for my desktop machine (I would miss having the large old iMac display). Is there something unique about the Fusion 360 code base on an Intel-based iMac? I don't have Windows hardware to test this question.

Any comments?

TrippyLighting · ‎05-16-2024

@davidpbest wrote:

... I don't have Windows hardware to test this question.

I have done extensive performance tests on my 2017 MacBook Pro between Fusion on macOS and Fusion running on Windows in Boot Camp. There is no appreciable performance difference between Fusion on Windows and macOS.

Drewpan · ‎05-16-2024

Hi,

I would have to agree with Trippy that the basic differences between silicon should be negligible. There may be some minor differences in how certain CPU instructions are processed that mean a difference of a clock cycle or two here and there, but not a difference of two minutes. This is obviously something else and could be a setting, a driver not up to date or some configuration in the software or hardware.

My first thought is that the 2020 machine may still have a mechanical secondary drive that is slowing things way down with transfers, whereas the later machine may have an SSD. That would cause some delay but again, two minutes is an awfully long time. Is there anything running in the background that may be causing the slowdown? Virus software is notorious for "checking on the fly" when data is loaded. Again, a delay that long would be very uncommon.

What are your application settings for the environment of each App? You might have 56GB of RAM but you may only be using, say, a maximum of 4GB per App on the old machine and more on the new one.

These are just a few suggestions from the top of my head. It could be anything, but I doubt it is the actual silicon that is causing this issue.

Cheers

Andrew

davidpbest · ‎05-16-2024

Thanks for that perspective. I'm still left wondering about the 200X performance difference between Intel and Apple silicon for CPUs only 2 years apart and with the same number of cores. Hard for me to attribute that difference to just the CPU differences.

davidpbest · ‎05-16-2024

Both machines have full 2TB SSD drives (no rotating memory on either). Both tests were done after a fresh boot of OS X, with no other apps fired up after boot, the latest Fusion release, and no background tasks running other than the usual stuff the OS requires (which is essentially the same on both machines). No virus software on either machine. Memory limits per application are a Windows convention that doesn't apply on OS X. A Mac OS X program has essentially unlimited memory to work with within the constraints of the physical memory on the machine.

I'm really unclear how much Fusion 360 uses or relies on the graphics engine to do basic computational tasks like you'd find when turning on a section analysis. Rendering I understand, but that does not appear to be the bottleneck in this test. Besides, my Intel machine has a dedicated graphics memory unit, whereas the Apple silicon machine has "unified memory", which means the graphics work steals memory from what would otherwise be available to applications, so the faster machine has essentially less than half the memory available to Fusion as the slower Intel machine.

I know the Fusion team did a lot to move to native Apple silicon, so I can understand how that would help. So the only thing left that would account for this level of performance difference is the efficiency of the Apple M2 versus the Intel i9 uP, or that the Intel code base for Fusion is hobbled by some kind of really inefficient compiler/emulator that sits between whatever F360 is written in and the base instruction set of the uP.

Drewpan · ‎05-16-2024

Hi,

I can't see how non-optimised compiling would have such an effect. When the differences between silicon were changing so rapidly in the past this could make a difference, but changes these days are more of an increment, not a multiple. I do recall that at one time optimising a compiler from a 2 cycle 32 bit instruction to a one cycle 64 bit instruction made a noticeable difference, but it wasn't double because it took extra time to load all the registers. We are so spoiled for RAM and bandwidth these days that worrying about saving a byte here and there is laughable.

I still think there will be some underlying, not obvious setting somewhere. I would expect some difference between the machines you are using but 2 minutes is excessive. Maybe 10-12 seconds - maybe.

Cheers

Andrew

MichaelT_123 · ‎05-16-2024

Hi M DavidPBest,

One profound difference in the setups you presented is … CPU/GPU/memory configuration.

Since the calculations paradigm moved from fingers and fists to more advanced methods, the number of factors influencing the speed and the precision of devising the results also increased.

Hence, perhaps, your example is the evidence of the above.

How might CPU/GPU/memory configuration/access inhibit CAD reaction rates? … in your case, cross-section views.

First, we must note that CPU - external_GPU processing involves memory transfer over a PCI bus.

In the case when only small chunks of data are transferred, the communication can be pretty smooth. One PCI data transfer burst should do the job. However, when data volume is significant and/or additionally is delivered asynchronously in fragments, it might cause a PCI bus traffic congestion problem. The bus runs on prioritized interrupts; thus, such processes like network access/data pulling, … baking apple strudels (on Mac) … etc. will add to the burden.

The software might also play its role. The section analysis algorithms are very GPU-friendly; hence, the CPU-GPU data transfer will be in play if they are implemented on GPU.

As mentioned above, for an extensive data set, the process will be slowed down by standard PCI bus access/interrupt strategies … in the case of the external GPU … but will not be affected when there is no need for moving data over the bus … accessing it directly via memory channels.
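As a rough sanity check on the bus-transfer hypothesis above, the ideal time to move a block of data is its size divided by the link bandwidth. The sketch below assumes a PCIe 3.0 x16 link (about 15.75 GB/s theoretical peak per direction). The thread does not state which link the iMac's GPU actually uses, and real throughput is lower, so the numbers are illustrative only. `transfer_seconds` and the 8 GB sample size are hypothetical.

```python
# Back-of-envelope estimate: seconds needed to push data across a bus.
# ASSUMPTION: PCIe 3.0 x16, ~15.75 GB/s theoretical peak per direction.
# The thread does not say which link the iMac's GPU uses.
PCIE3_X16_BYTES_PER_S = 15.75e9


def transfer_seconds(num_bytes: float,
                     bytes_per_s: float = PCIE3_X16_BYTES_PER_S) -> float:
    """Ideal transfer time, ignoring latency, interrupts and contention."""
    if bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / bytes_per_s


if __name__ == "__main__":
    # Hypothetical worst case: all 8 GB of graphics memory is refilled once.
    vram_bytes = 8e9
    print(f"{transfer_seconds(vram_bytes):.2f} s to move 8 GB at theoretical peak")
```

Under those assumptions a single bulk transfer takes well under two minutes, so bus congestion would have to involve many repeated or stalled transfers to account for the delay. Measuring the actual bus activity would still be needed to confirm or rule this out.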

The above divagations from the sky … are only my hypothesis … so take them with a spoon of mustard. By le chapeau … you are French, aren't you?

How to check … what is going on?

I am not an Apple expert (even a user), but I hope that some tools are available to check the PCI bus's activities (interrupts, data transfer rates, devices involved) and GPU dynamics.

Regards

MichaelT


HughesTooling · ‎05-17-2024

I guess the best bet is to share the design and see if others see the same problem. If you don't want to share here, maybe share with someone from Autodesk; I'm not sure who to tag in for that. @TrippyLighting, any ideas who would be best?

Just to clarify, this is just a section analysis not a cross section in a 2d drawing?

Mark Hughes
Owner, Hughes Tooling

TrippyLighting · ‎05-17-2024

I think @lance.carocci might be the right person.

lance.carocci · ‎05-17-2024

It could very well be a platform optimization issue.

@davidpbest wrote:

I'm really unclear how much Fusion 360 uses or relies on the graphics engine to do basic computational tasks like you'd find when turning on a section analysis. Rendering I understand, but that does not appear to be the bottleneck in this test.

The GPU largely drives the local canvas graphics during modeling operations, but that's about it. Even In-Canvas Render is CPU-based ray tracing at this moment. This is why our graphics requirements hit a bit of a ceiling compared to CPU - once you can drive a high resolution and framerate comfortably, there isn't much more for the GPU to do in Fusion. This will change in the future, but for now it should explain some of the activity (or lack thereof) you're seeing.

@davidpbest wrote:

Besides, my Intel machine has a dedicated graphics memory unit, whereas the Apple silicon machine has "unified memory" which means the graphics work steals memory from what would otherwise be available to applications, so the faster machine has essentially less than half the memory available to Fusion as the slower Intel machine.

Unified tends to be favored because the data does not have to travel as far nor cross different memory buffers to be accessed by both CPU and GPU, which is great for latency, power consumption, and thermals. It's not uncommon for, say, a graphics intensive game on Windows to offer an option to modify async compute and CPU/GPU scheduling to balance processing load or tip it towards the superior processor of the two. It's less about quantity of memory, more about throughput and how efficiently it can be used to its fullest.

As to the topic at hand - please include a sample file or steps to help us reproduce the performance discrepancy. We are always on the lookout for processing efficiency gains in commands and workflows, especially if it's not a consistent experience across device types.

Lance Carocci
Fusion QA for UI Framework/Cloud Workflows, and fervent cat enthusiast

davidpbest · ‎05-17-2024

Thanks Lance. OK, attached is a test case. When you open the f3z file you will see a model with one inserted component, which is another Fusion model of a timing belt. Let it fully open and let Fusion settle to a quiescent state. Then toggle the "Analysis" button in the browser. On my Intel i9 8-core iMac, Fusion becomes unresponsive for 2 minutes 9 seconds as it grinds away. I've attached two screenshots of Fusion grinding away that show Activity Monitor, with Fusion taking 800+ percent of the CPU and very little of the GPU. In the same test on my MacBook Air with an 8-core M2, the calculation finishes in under a second - too short to measure - and then Fusion is totally responsive again immediately. To repeat the test, reopen the original f3z file without saving it beforehand. Full specs on both machines were noted earlier in this thread. I provide this for performance analysis only - lectures on how to better model a timing belt are not sought here. Let me know what you find - I'm trying to decide whether to retire about 15 Intel iMacs and move to Apple silicon machines throughout the organization, so this is not a trivial decision at my end.
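For readers unfamiliar with Activity Monitor's convention: 100 percent corresponds to one fully busy logical core, so "800+ percent" means roughly eight cores' worth of work. Below is a minimal sketch of the same arithmetic, comparing CPU time with wall-clock time for any callable. `busy_cores` and `measure` are hypothetical helper names, and the sketch measures the current Python process, not Fusion.

```python
import time


def busy_cores(cpu_seconds: float, wall_seconds: float) -> float:
    """Average number of cores kept busy: CPU time divided by elapsed time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def measure(func):
    """Run func() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    wall, cpu = measure(lambda: sum(i * i for i in range(2_000_000)))
    # Activity Monitor style percentage: 100 % per fully busy core.
    print(f"{100 * busy_cores(cpu, wall):.0f} % CPU over {wall:.2f} s")
```

A sustained reading near 800 percent for the whole 129 seconds would point to a multi-threaded computation that is CPU-bound rather than waiting on the GPU or disk.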

HughesTooling · ‎05-18-2024

Just tested on a PC and the section is almost instant! This is on a fairly old PC with an AMD CPU and a GeForce graphics card.

Mark Hughes
Owner, Hughes Tooling

TrippyLighting · ‎05-18-2024

I can confirm @davidpbest's findings on my 2017 Intel MacBook Pro.

In fullscreen mode, it takes Fusion 2:45 to perform the section analysis with the fans blasting away.
Subsequent toggling of the Analysis folder yields instant results.
After closing the document and re-opening it, it will take another 2:45 for the section analysis.
When NOT in full-screen mode, the Section Analysis happens instantly.

So the outlier here is Fullscreen Mode on Intel!

@davidpbest can you please confirm (or not) that behavior?

davidpbest · ‎05-18-2024

@TrippyLighting - thank you for taking some time to investigate this. I'm heartened you are able to duplicate the performance lag, and for a moment I thought you may have found the crux. Alas, it isn't full screen mode that's causing the issue at my end.

On my iMac, I do not run any application in full screen mode - with a 27" Retina I don't need to and often have other screens open to the side of Fusion for quick-click access to download, etc. So the performance lag I'm seeing (over 2 minutes for a fresh section analysis on this model) is with the Fusion screen sized to about 2/3rds (64.9244% actually) of the total display size. My Fusion window is 3898 x 2456 out of an available full screen size of 5120 × 2880 pixels, so definitely not in full screen. And this is a Retina display running in default pixel size (so no scaling going on), and it's driven by a Radeon Pro 580X 8GB graphics accelerator, which was the Apple norm at the time for the higher end iMacs.
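The 64.9244% figure follows directly from the two pixel dimensions quoted above. A minimal sketch of the arithmetic (`area_fraction` is a hypothetical helper name):

```python
def area_fraction(window: tuple[int, int], display: tuple[int, int]) -> float:
    """Fraction of the display area covered by a window, both as (width, height)."""
    win_w, win_h = window
    disp_w, disp_h = display
    return (win_w * win_h) / (disp_w * disp_h)


if __name__ == "__main__":
    # Dimensions taken from the post: Fusion window vs. the 5K iMac panel.
    fraction = area_fraction((3898, 2456), (5120, 2880))
    print(f"{fraction * 100:.4f}%")  # prints 64.9244%
```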

I did retest this on the iMac in full screen mode, and also in the smallest screen size possible (1200 x 630) and saw no difference - fans blazing for over 2 minutes. I also confirmed that I was working in offline-mode with Fusion and I disconnected my internet connection just to be sure some background update chores weren't going on (Fusion is the only main App running).

I reran the test on my Macbook Air M2 in full screen and tiny screen and in all cases the response is instant.

I do suspect you are on to something here - relating to graphics, but it doesn't appear to be as simple as full screen mode. But it also doesn't seem to be related to the iMac specifically, or the Intel i9 chip or the Radeon graphics controller I have, given you ran your tests on an older machine with different uP and graphics hardware.

lance.carocci · ‎05-20-2024

@davidpbest thanks for all the details so far - this will help us investigate further.

In the meantime, could you try toggling off Anti-Aliasing in the Navigation Bar under Display Settings > Effects? Ambient Occlusion may also be worth toggling off. AA is extremely resource intensive with DPI/Retina scaling enabled on top of it. 4k and 5k graphics are a lot to push through a card of that power tier and generation, and AA doesn't add a lot on top of it at that resolution.

Lance Carocci
Fusion QA for UI Framework/Cloud Workflows, and fervent cat enthusiast

davidpbest · ‎05-20-2024

@lance.carocci

Thanks for the response and interest in figuring this out. Anti-Aliasing was never toggled on on my systems. I did turn off Ambient Occlusion to repeat the test and no change - still 2+minutes for the section analysis.

I turned off everything in "Effects" and "Object Visibility" and re-ran the test, and still the same result.

As someone who has written GUIs for operating systems, this does not feel to me to be a graphical rendering issue, but rather some other computational task, or bug, or even silicon flaw, that's churning away: 800+ percent CPU usage, and very little in the graphics accelerator, as the previous screenshots illustrate.

I'm in Portland BTW (F360 HQ), so if you want to buzz on over with your debugging kit, I'm available any time. But my guess is that you should be able to recreate this at your end if you have an Intel-based Mac. FWIW, I'm running the latest version of OS X (14.4.1) on standard issue Apple hardware circa mid-2020. Let me know what else I can do to assist in figuring this out - it's really hampering productivity here.

lance.carocci · ‎05-21-2024

One way you might be able to compare on the same device is by forcing Rosetta 2 translation for Fusion. You can follow the instructions here to do so, and see if the workflow is just as long as it is on true Intel silicon.
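When running such a comparison on an Apple silicon Mac, it helps to confirm which mode a process is actually in. macOS exposes a `sysctl.proc_translated` key that reports 1 for a process running under Rosetta 2 translation and 0 for a native one. The key is absent on Intel Macs. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it reports on the Python process that runs it (and the `sysctl` child it spawns), not on Fusion itself.

```python
import subprocess


def interpret_proc_translated(returncode: int, stdout: str) -> str:
    """Map the result of `sysctl -n sysctl.proc_translated` to a label."""
    if returncode != 0:
        # The key is absent on Intel Macs and on non-macOS systems.
        return "unknown (key not available)"
    value = stdout.strip()
    if value == "1":
        return "translated (Rosetta 2)"
    if value == "0":
        return "native"
    return "unknown (unexpected output)"


def current_process_mode() -> str:
    """Ask sysctl whether this process tree is running translated."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return "unknown (sysctl not found)"
    return interpret_proc_translated(result.returncode, result.stdout)


if __name__ == "__main__":
    print(current_process_mode())
```

For Fusion itself, the "Kind" column in Activity Monitor should show whether the running process is Apple or Intel code.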

Lance Carocci
Fusion QA for UI Framework/Cloud Workflows, and fervent cat enthusiast

davidpbest · ‎05-21-2024

@lance.carocci

Thank you for your reply. I'm not sure I understand how this might be helpful. If you would take the time to elaborate a bit, I would be appreciative.

I do understand how to turn Rosetta on and off, but I'm not sure this gets to my issue. It seems like you're suggesting that I should turn on Rosetta on my Apple silicon MacBook and see if that slows down to match my Intel-based iMac. Inserting some kind of interpreter-like oil-slick between the Fusion code and the base hardware would only serve to decrease performance – at least as I understand it.

What I'm really trying to get at here is whether the Apple silicon machines represent such a huge advance in performance that I should consider re-tooling my entire department with new hardware. The two minute wait time for a simple section analysis on a timing belt is just unacceptable. How do we push forward and get some kind of resolution here? I'm happy to do additional testing at your suggestion, but I would like to ensure that spending more time on this is actually advancing the ball with respect to my quandary about improving performance on operations like section analysis, or the far more demanding sweep-with-twist operations that we do daily for cable bundles.

lance.carocci · ‎05-21-2024

@davidpbest wrote:

Inserting some kind of interpreter-like oil-slick between the Fusion code and the base hardware would only serve to decrease performance – at least as I understand it.

Possibly. But it would help confirm if the issue is indeed with the x86 side of the code, even with translation.

@davidpbest wrote:

What I'm really trying to get at here is whether the Apple silicon machines represent such a huge advance in performance that I should consider re-tooling my entire department with new hardware.

Not yet, no. I would not expect the difference to be so large.

Lance Carocci
Fusion QA for UI Framework/Cloud Workflows, and fervent cat enthusiast

davidpbest · ‎05-21-2024

I'm still not sure what you are suggesting specifically. Are you asking that I test this on Intel silicon with and without Rosetta? Or are you asking me to do that test on the Apple silicon version? Please clarify.
