
16,000 times faster explicit nonlinear solver


16000 times faster explicit solver.

There is a thesis from 1980 that shows the equivalence between strain and stress envelopes. It proves mathematically that every stress envelope can be expressed as a strain envelope (P.J. Yoder, 1980, "Derivation and Implementation of Strain Space Plasticity").
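In rough notation (my shorthand, not Yoder's exact symbols): a stress-space yield condition f(sigma) <= 0, combined with the elastic law sigma = D*(eps - eps_p), can always be rewritten as a condition on strains, g(eps - eps_p) = f(D*(eps - eps_p)) <= 0. It is the same surface, just mapped through the elastic stiffness D into strain space.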

 

One peculiar feature that caught no one's attention in the 80s is that the strain envelope makes the simulation behave like a parallel spring system. At the time there were no GPU cards, so the conclusion was that a 1% decrease in computation time was not important... not important enough to rewrite, or rethink, the conventional formulation.

 

The thing is, strain (not stress) envelopes allow simulations to run in pure parallel mode, in one giant, fully closed GPU loop. And convergence can be controlled using PID principles, rather than by the extremely computationally costly rebuilding of inverse stiffness matrices. That way it becomes plausible to run explicit nonlinear dynamic FEM simulations in fully parallel computation mode.
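To illustrate what I mean by "PID principles" (a toy example of my own, not Yoder's formulation; the cubic spring, the gains and all the numbers are made up and problem dependent), the correction each iteration can be driven by proportional / integral / derivative terms of the force residual itself, so there is nothing to assemble or invert:

n    = 1000;
k1   = 1e3;  k3 = 1e5;                   % linear and cubic spring constants (made up)
fext = 50 * ones(n,1);                   % external load
fint = @(u) k1*u + k3*u.^3;              % internal force, purely element-wise

u     = zeros(n,1);
rInt  = zeros(n,1);                      % running integral of the residual
rPrev = zeros(n,1);
Kp = 5e-4;  Ki = 1e-5;  Kd = 1e-4;       % PID gains - problem dependent

for it = 1:5000
    r    = fext - fint(u);               % residual, fully vectorized
    rInt = rInt + r;
    u    = u + Kp*r + Ki*rInt + Kd*(r - rPrev);   % PID correction, no inverse K
    rPrev = r;
    if norm(r) < 1e-8 * norm(fext), break; end
end

Every line is element-wise, so wrapping the vectors in gpuArray(...) keeps the whole iteration on the card.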

 

I got it to run in 1D, with 1 million bar elements, in a fully closed GPU loop using MATLAB. But it was a spare-time project during a Ph.D... which makes me wonder what a professional or an expert could achieve. If a student with 2 hours of spare time per day can get that far, how far could someone properly qualified take it?
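For reference, the structure of that loop was roughly this (heavily simplified sketch, not my actual thesis code: linear springs instead of the real material law, made-up numbers, and the gpuArray part needs the Parallel Computing Toolbox):

n      = 1e6;                       % nodes in the 1D chain
dt     = 1e-7;  k = 1e6;  m = 1e-3; % time step, spring stiffness, lumped mass (made up)
useGPU = true;                      % set false to run the same loop on the CPU

u = zeros(n,1);  v = zeros(n,1);
v(1) = 1.0;                         % initial velocity pulse at the left end
if useGPU, u = gpuArray(u); v = gpuArray(v); end

for step = 1:20000
    strain = diff(u);               % element elongations
    fel    = k * strain;            % element forces (linear here, nonlinear in the real thing)
    fnod   = [fel; 0] - [0; fel];   % assembled nodal forces
    v = v + dt * (fnod / m);        % explicit time integration
    u = u + dt * v;
end
u = gather(u);                      % data comes back to the CPU once, at the end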

3 Comments
david.wartaJA24W
Advocate

Other solvers out there implement this or something similar. However, the results tend to be less accurate. You can look up Ansys Discovery vs Ansys Mechanical.

tsaJBF5V
Contributor

@david.wartaJA24W 

I wish I could see the architecture of the solver they're using.

 

However... I highly doubt they use "strain space" (deformation based) yield limits rather than stress space (force based) ones. And without those, there is simply no mathematical way to avoid local, element-wise matrix inversions.

 

I expect they could be using GPU computing for things like contact detection. That can be done with things like branchless programming (using mathematical expressions instead of "if" statements). But... even then...
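A trivial example of what I mean by branchless (my own toy numbers): a penalty contact force against a rigid wall, where max() replaces the per-node "if", so the whole thing vectorizes and can stay on the GPU:

kc    = 1e7;                          % penalty stiffness (made up)
xWall = 0.0;                          % wall position
x     = [-0.02; 0.01; -0.005];        % a few nodal positions
gap   = x - xWall;                    % signed gap, negative = penetration
fc    = kc * max(0, -gap);            % zero when open, pushes back when penetrating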

 

The key to "fully embedded" GPU cycles is the "abandoned" work of P.J. Yoder, 1980, "Derivation and Implementation of Strain Space Plasticity". I've tested it and confirmed his experience: precision improved, and the usable increment size increased. But my focus was empirical testing - observing cyclically loaded specimens during irregular loading cycles (hundreds of hours "playing" with real specimens, manually applying various stress and strain amplitudes to them, using a custom-built frictionless triaxial apparatus with an Xbox joystick attached to it).

 

I reached the conclusion that stiffness hysteresis loops move in proportion to deformation amplitude - not stress amplitude. I did a deep search of the literature and found that Ricardo Dobry had observed something very similar, but was not able to get funding to develop tools that build models from physical observations. My observations are a bit more nuanced: I am able, on a real specimen, to "disturb the specimen back into its initial condition" - to control the shape, size and position of individual strain hysteresis loops. I decoded the dependencies of the hysteresis loops with respect to the full equation of motion - the position-, velocity- and acceleration-proportional components were isolated. At that point I had full control of the individual "stiffness paths generated during infinite irregular loading cycles" in real specimens, allowing me to "disturb the specimens back into their initial state" after testing peak strength. All by using deformation-dependent material properties (and mostly ignoring stress space completely while interacting with the real specimens).

 

Some years later, I managed to find P.J. Yoder's thesis. At that time, around 2016, it was preserved only as a picture scan... I stumbled across it while googling names I would give to the model I could see was needed from my test results. "Strain space plasticity" was one of the many names I came up with for a model that, to my knowledge, did not exist yet. There are a couple of "strain space" models out there, but only the one derived by P.J. Yoder is fully compatible with everything I observed in my ad-hoc empirical testing. So... long before the first GPU was invented, P.J. Yoder had noticed the "parallel spring system"... and in his thesis he asked for empirical tests. By accident, I did those tests 40 years after his publication, without being aware of it, and then spent 2 years looking for him... it is a funny thing... my empirical tests were asking for a theory... which had been asking for empirical tests 40 years earlier. The empirical proof pointed to the theoretical solution...

 

Nobody cares, but... that's cool. IMHO.

 

All the GPU use in FEM today that I've seen so far looks like a "Band-Aid" at best, not addressing the fundamental problem - the flow rule equation. Yoder shows how to solve nonlinear FEM with the flow rule removed (pre-calculating it, flipping it upside down, solving the iterations "backwards", with all matrices transposed). Just... using stress relaxation directly, instead of converting residual stress into strain and back into stress again, as convention does.
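To show the flavor of it in 1D (a toy elastic-perfectly-plastic update of my own, not Yoder's exact derivation - in 1D the stress-space and strain-space versions coincide anyway; the point is just that the whole update is a handful of element-wise operations with nothing to invert):

E      = 200e3;                        % Young's modulus (made up)
epsY   = 1e-3;                         % yield strain = sigY / E
epsTot = [0.5e-3; 2.0e-3; -3.0e-3];    % total strains in a few elements
epsP   = zeros(3,1);                   % plastic strains carried over from the last step

epsE  = epsTot - epsP;                 % trial elastic strain
over  = max(0, abs(epsE) - epsY);      % how far outside the strain envelope
epsP  = epsP + sign(epsE) .* over;     % relax the excess directly into plastic strain
sigma = E * (epsTot - epsP);           % stress follows from strain at the end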

 

I tested his approach in 1D. That's where my coding skills leveled out - I had only so much spare time while working and studying at the same time. I was able to run 1D simulations of up to 1 million springs in dynamic mode, combining nonlinear elasto-plasticity with the equation of motion. I observed things like "nonlinear waves": waves traveling through a medium with nonlinear stiffness, which causes waves not to superimpose but to literally "crash" into each other and form various wave shapes; waves stuck within waves due to impedance; waves of varying speed traveling through the same medium (due to different stiffness at different amplitudes). Beams doing things like "self damping" due to hysteresis loops. It is pretty cool in that sense... and even on a normal laptop from 2020, the GPU was able to calculate things with the same precision, but 1000x faster.
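One concrete reason the waves behave that way: in a 1D medium the local wave speed goes roughly as c = sqrt(E_t/rho), where E_t is the tangent stiffness at the current strain amplitude. With a nonlinear material, E_t changes along the wave, so large-amplitude parts travel at a different speed than small-amplitude parts, and the pulses distort, pile up and "crash" instead of superimposing the way linear waves do.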

 

It never got funded for further development. But... it works. It works very well. And, if anything, it is actually more precise. That has been shown in three public defenses, mathematically, and through simulations by P.J. Yoder himself.

 

My tests showed the same thing that P.J. Yoder was saying... I took the time to test his theory.

 

And the same thing all three public defenses of his idea concluded. All the opponents, year after year, who took the time to look into his proposal - strain space (not stress space) plasticity - concluded the same thing: strain space plasticity has a higher tolerance to increment size, and computes slightly faster per loop (each loop is one equation shorter, because the flow rule is flipped upside down and the local matrix inversion is replaced with a simple dot product).

 

The only reason anyone, in the last 40 years, could give for ignoring P.J. Yoder was that it runs a little faster but produces the exact same answer... and the "slight speed improvement" does not justify reformulating the convention.

 

So their reasoning was: why bother getting "identical" outputs using a "just a bit" faster method?... like... "why bother"?

 

Today... the answer is... the "identical answer" can be computed at least 1000 times faster. Because it is inherently, down to the last Jacobian matrix, compatible with a fully closed GPU loop - a nonlinear elasto-plastic solution that will run smoothly.

 

And that... is extremely different from all the "GPU computation attempts" which solve only one equation on the GPU... upload the data, run a few times, download the data... do the CPU part... upload, download... upload, download... that is not the same thing as a "closed loop".
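Schematically, the difference looks like this (toy example, made-up update function; the first pattern crosses the PCIe bus every iteration, the second one touches it twice in total):

n      = 1e6;   nSteps = 200;
step1  = @(x) 0.999*x + 1e-6;        % stand-in for one solver update (made up)

% "GPU enhanced": upload / one kernel / download, every single iteration
d = zeros(n,1);
for k = 1:nSteps
    dG = gpuArray(d);                % upload
    dG = step1(dG);                  % one kernel launch
    d  = gather(dG);                 % download, rest of the loop stays on the CPU
end

% Closed loop: the data never leaves the card until the very end
dG = gpuArray(zeros(n,1));
for k = 1:nSteps
    dG = step1(dG);                  % everything element-wise, no transfers
end
d = gather(dG);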

 

The way most GPU computation engines compute FEM reminds me a lot of how much of the A.I. development scene does things... it's just "GPU enhanced", not "reformulated to be fundamentally GPU compatible". They put one equation on the GPU... and continue to run the rest of the code on the CPU. Which is just... incredibly stupid. But it seems to sell well. Same as "A.I. enhanced" median lines... put a sticker on it and sell it... it's just sad, man...

david.wartaJA24W
Advocate

Thanks for your explanation. It looks like the likes of Ansys are just not aware, or don't want to be. Interesting topic.
