Future of the CUDA parallel computing support?

p.wozniacki · ‎10-26-2010

I'm about to purchase one of the new Fermi line graphics cards from nVidia, and of course would like the investment to be future proof. The entry level card is the Quadro 4000 (2GB buffer, 256 GPU cores); its price/performance ratio is good, but it's not designed to work with a second or third card of the same type to form the scalable CUDA platform. For that, the much more expensive Quadro 5000 or 6000 is required (see here: http://www.nvidia.pl/object/quadro_sli_pl.html).

Now, can Moldflow reveal some details about the planned scalability of CUDA support in the future MPI releases? Will the solution described in the above link be mandatory to expand the GPU computing power with consecutive cards, or will it also be scalable with the Quadro 4000 model, by simply adding another card of the same type?

TIA

Piotr

raalteh · ‎10-27-2010

Hi Piotr,

The current version of Autodesk Moldflow does not support analyses on the Fermi line cards. Unfortunately this is not reflected specifically in the Installation guides of the Moldflow 2011 products, as it was not anticipated that newer CUDA versions would break the use of Moldflow for the analysis. This is inherent to early adoption of leading edge technology.

However, we are aware of the current limitation now, and we will make sure the documentation is explicit about the version of CUDA that is supported.

Unfortunately I cannot be explicit about what enhancements will be made in the area of CUDA support. However, we are very interested in exploiting this technology as much as possible, as we would for other technology that can help shorten analysis times.

Best regards,

Hanno

Hanno van Raalte,

Product Manager - Injection Molding & Moldflow products

p.wozniacki · ‎10-27-2010

Hi Hanno,

Thanks so much for letting me know the Fermi line is not supported - I was just about to invest...

Piotr

yannick.moret · ‎10-28-2010

Well, if you plan for a long-term investment, it might be worth thinking about. As Hanno said, we cannot say what the future holds, but GPU computing is certainly an appealing technology.

We just know that the current version, 2011, cannot exploit the GPU on a Fermi card.


raalteh · ‎10-29-2010

Thank you for reinforcing my point Yannick.

Hanno van Raalte,

Product Manager - Injection Molding & Moldflow products

p.wozniacki · ‎10-29-2010

Yeah... We all fall prey to the "early adoption" of new technologies; I've had my share of it too.

I was going to give some more power to my rather old, QX6700-based machine before I replace it with something faster in another year or two. I was hoping to use the Quadro 4000 Fermi card in the current machine, and then move it to the new one - but that does not seem viable now.

Piotr

p.wozniacki · ‎10-30-2010

And since we have the attention of Moldflow insiders in this thread, let me ask another question:

- how much longer will the Dual Domain solvers be unable to use multiple threads/cores/CPUs?

Even with 3D technology evolving rapidly, there is still a lot of work that should, or must, be done in 2.5D - it's so frustrating to wait for an analysis to complete while no more than 25% of a typical quad-core processor is used!

p.wozniacki · ‎11-11-2010

Dear Hanno,

Can you please provide us with a list of currently supported nVidia cards? AFAIK, all their new products (including the Tesla 2050/70) are based on the Fermi architecture....

Do I need to buy the older Tesla C1060 (or an older Quadro FX card - the newer 480/580 are also said to be of Fermi type)?

TIA

Piotr

yannick.moret · ‎11-15-2010

Hi,

CUDA hardware compatible with AMI 2011 is hardware with a compute capability of 1.3:

http://www.nvidia.co.uk/object/cuda_gpus_uk.html

The cards need to be double precision.

Note that Moldflow beta 2012 is compatible with Fermi hardware. The GPU will be recognized and used.

Remember also that it is a beta, not a final version, so we cannot guarantee this feature in the final release.

For an exact list of compatible CUDA cards, I would suggest asking through support.

Regards

Yannick

p.wozniacki · ‎11-15-2010

Hi Yannick,

Thanks for the answer. You say: "CUDA hardware compatible with AMI 2011 is hardware with a compute capability of 1.3" - I hope you mean a compute capability of 1.3 or higher, right? All the higher-end nVidia cards have this indicator at 2...

Thanks and Regards

Piotr

yannick.moret · ‎11-15-2010

Unfortunately no... AMI 2011 is limited to a compute capability of 1.3.

Anything above that is Fermi-based hardware.

See the previous reply in this thread:

The current version of Autodesk Moldflow does not support analyses on the Fermi line cards. Unfortunately this is not reflected specifically in the Installation guides of the Moldflow 2011 products, as it was not anticipated that newer CUDA versions would break the use of Moldflow for the analysis. This is inherent to early adoption of leading edge technology.

Yannick

p.wozniacki · ‎11-15-2010

Ah, OK - I misread your post, sorry.

The index of 1.3 is the current limitation of the 2011 version (thus, no Fermi support).

The 2012 Beta is compatible with Fermi, which implies a higher CUDA computing index of 2.0 - do I understand you correctly now?

Thanks

Piotr

yannick.moret · ‎11-15-2010

Exactly!

p.wozniacki · ‎11-16-2010

Thanks for the confirmation.

Actually, in the Hardware section of the Release Notes for AMI 2012, it's stated explicitly:

"The minimum required hardware is a CUDA-enabled card capable of double-precision (64-bit floating point precision) computations. Cards which meet these requirements will have a CUDA Compute Capability of 1.3 or higher".
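The two rules stated in this thread (AMI 2011 works only with compute capability 1.3, while the AMI 2012 beta accepts 1.3 or higher) can be sketched as a small check. This is only an illustration of the rules as quoted here, not anything shipped with Moldflow; the function names and the release labels are made up for the example.

```python
from typing import Tuple

# (major, minor), e.g. (1, 3) for compute capability "1.3"
ComputeCapability = Tuple[int, int]


def parse_compute_capability(text: str) -> ComputeCapability:
    """Turn a string such as "1.3" or "2.0" into a (major, minor) tuple."""
    major, _, minor = text.partition(".")
    return int(major), int(minor or 0)


def gpu_usable(release: str, capability: str) -> bool:
    """Apply the rules quoted in this thread (hypothetical helper).

    "AMI 2011":      exactly compute capability 1.3, so no Fermi support.
    "AMI 2012 beta": compute capability 1.3 or higher (beta, not guaranteed).
    """
    cc = parse_compute_capability(capability)
    if release == "AMI 2011":
        return cc == (1, 3)
    if release == "AMI 2012 beta":
        return cc >= (1, 3)
    raise ValueError(f"unknown release: {release!r}")


if __name__ == "__main__":
    for label, cc in [("capability 1.2", "1.2"),
                      ("capability 1.3", "1.3"),
                      ("Fermi, capability 2.0", "2.0")]:
        print(label, gpu_usable("AMI 2011", cc), gpu_usable("AMI 2012 beta", cc))
```

Comparing tuples rather than floats avoids surprises with values such as "1.10". The check says nothing about double-precision support on a specific card, which still has to be confirmed separately.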

I guess I'm going to get one of these Fermi cards - especially since AMA 2012 also uses them in 3D Flow!

Piotr

yannick.moret · ‎11-16-2010

And 2012 beta 1 also uses the GPU in 3D Warp...

p.wozniacki · ‎11-24-2010

Dear Hanno and Yannick

A couple of observations so far, from running AMI 2012 Beta with the Quadro 4000 card, on a QX6700 quad CPU @3.0 GHz.

The best gains can be achieved when running several analyses at the same time. A good scenario is running 3 analyses:

- the 1st: max number of CPU cores

- the 2nd: single CPU core + CUDA

- the 3rd: max number of CPU cores + CUDA

It turns out that the highest gains can be obtained in 3D warp; with a fairly small model of just some 200,000 elements, I got:

- the 1st: 24 mins

- the 2nd: 15 mins

- the 3rd: 18 mins

The 3D Flow times for the same model were not so impressive: 9, 9 and 8 mins respectively.
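To put the reported times in proportion, here is a quick calculation of the speedup of each configuration relative to the CPU-only run. It uses only the minute figures given in this post; the dictionary keys and the function name are just labels chosen for the example.

```python
# Run times in minutes, as reported in the post for the same ~200,000 element model.
WARP_MINUTES = {
    "max CPU cores": 24,
    "single CPU core + CUDA": 15,
    "max CPU cores + CUDA": 18,
}
FLOW_MINUTES = {
    "max CPU cores": 9,
    "single CPU core + CUDA": 9,
    "max CPU cores + CUDA": 8,
}


def speedups(minutes, baseline="max CPU cores"):
    """Return baseline_time / time for every configuration."""
    base = minutes[baseline]
    return {name: base / t for name, t in minutes.items()}


if __name__ == "__main__":
    for solver, minutes in (("3D Warp", WARP_MINUTES), ("3D Flow", FLOW_MINUTES)):
        for name, factor in speedups(minutes).items():
            print(f"{solver}  {name:24s} {factor:.2f}x")
```

On these numbers 3D Warp runs about 1.6 times faster with a single core plus CUDA, while 3D Flow gains little. This is a single small model on a beta build, so the ratios are indicative at best.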

The problem with the 4000 card is the limited amount of memory (just 2GB), so it's not always possible to benefit from CUDA with larger 3D models. The rule of thumb says that for each 1 million elements, 1 GB of memory is allocated - however, this seems to apply only in the 3D Flow solver. Since the most beneficial CUDA usage seems to be the 3D warp, my question is:

- what would a similar rule of thumb be for graphics memory requirements vs. the number of 3D elements?
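For reference, the rule of thumb mentioned above works out as follows. This is a rough sketch that assumes the 1 GB per 1 million elements figure carries over to GPU memory, which is exactly what is not confirmed here; the constant and function names are made up for the example.

```python
# Assumption: 1 GB per 1 million elements, the rule of thumb quoted in the post.
# It is reported to hold for the 3D Flow solver only; treat results as a rough guess.
GB_PER_MILLION_ELEMENTS = 1.0


def estimated_memory_gb(elements: int) -> float:
    """Estimate memory use in GB for a 3D model with the given element count."""
    return elements / 1_000_000 * GB_PER_MILLION_ELEMENTS


def fits_on_card(elements: int, card_memory_gb: float) -> bool:
    """True if the estimate does not exceed the card's memory (ignores any overhead)."""
    return estimated_memory_gb(elements) <= card_memory_gb


if __name__ == "__main__":
    for elements in (200_000, 1_500_000, 3_000_000):
        print(f"{elements:>9,d} elements: ~{estimated_memory_gb(elements):.1f} GB, "
              f"2 GB card: {fits_on_card(elements, 2.0)}, "
              f"6 GB card: {fits_on_card(elements, 6.0)}")
```

Under that assumption a 2 GB card would top out at around 2 million elements, before counting whatever memory the display itself takes.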

My other question is related to a much more distant future. Do you think that Moldflow will ever be able to use several GPUs, for scalable CUDA computing? A quick answer to this question is crucial to me, as I can still return my Quadro 4000 card and buy the 6000 (which is 6GB) - but of course, should scalable parallel computing be coming, I could keep the current card and only add another one in future, when necessary...

Thanks,

Piotr

yannick.moret · ‎11-26-2010

Hi,

Note that the beta has not been optimized for the Fermi hardware. I will not and cannot guarantee that you will get better performance once this happens (I don't know when), but it is something to consider.

One test you didn't do is 1 thread only, without the GPU.

Note also that we use the GPU for as long as we can when running a model. So if you only occasionally run big models, it might be worth keeping the 4000, as the solver will use it as long as there is enough memory on the card. It will switch off during the filling when it can no longer use the GPU RAM.

The Quadro 6000 is quite an investment; maybe a PC with the latest Xeon and 12 or 24 cores with plenty of RAM could also do a good job, no?

And I wouldn't think the amount of RAM taken in the GPU by an open model depends only on the size of the model.

It might depend on your resolution, and probably on whether you are displaying results or just a mesh, etc.

For future development, I can only reply as Hanno did. It is seen as an interesting technology, but we cannot commit to any specific development.

Regards

Yannick

p.wozniacki · ‎12-03-2010

Still in relation to CUDA computing, but this time on a mobile workstation. It'd be nice to have a laptop that can take advantage of those newest nVidia cards... The "professional" line (like the Quadro FX1800M) has a Compute Capability of just 1.2 - will it work?

Even if it will, the problem is that those cards have a very limited amount of memory (usually just 1GB). Therefore, those new gaming GeForce cards of Fermi architecture sound more like it (e.g. the GT 445M can be equipped with 3 GB; others, like the GTX 460M, have lots of cores and super-fast GDDR5 memory). But nowhere can I find the answer to the basic question: are they capable of 64-bit FP computing?

Has anyone at Moldflow actually tried using CUDA on a laptop? I'll appreciate a quick answer, as I'm shopping for a good notebook at the moment!

TIA

Piotr

p.wozniacki · ‎12-04-2010

More generally speaking, are the CUDA 64-bit floating point computations at all possible with the highest GeForce cards, or is a Quadro model essential for that?

Does anyone know (and actually tried)?

Piotr

yannick.moret · ‎12-06-2010

The compute capability you need is 1.3 minimum. So I don't think there is a Quadro for laptops that can achieve that (maybe the 5000M, as it has a compute capability of 2.0).

As for GeForce, yes, we support GeForce too.

The ones we used to support are, for example:

GeForce GTX 295

GeForce GTX 285

GeForce GTX 280

GeForce GTX 260

But I'm not so sure about the latest generation, as Nvidia doesn't give clear 64-bit characteristics for GeForce.

Anyway, in the documentation we say we require:

The minimum required hardware is a CUDA-enabled card capable of double-precision (64-bit floating point precision) computations.

So any card with this capability should do. Ask your supplier to confirm the matter.
