I have just started using Sim 360 Pro and while I like the flexibility, one thing that irked me is the download speed. For a moderately dense mesh of 4-5 million elements, it took nearly 40 minutes from pressing the solve button to seeing the final mesh. For the last 10 minutes of this period, the output pane showed "Analysis completed successfully" but the results were still being downloaded!
We have an internet line of 70 Mbps (checked on speedtest), so bandwidth doesn't seem to be the reason for this. The "download speed" in the job manager varied rapidly, sometimes dipping to as low as 0.02 MB/s. Am I missing some settings here? I have increased the size of "chunk data" to 25 MB from 5 MB and kept the default value (0) for the limit on the number of uploads, which implies no limit. I have yet to run a full-fledged simulation and don't yet know how this will affect the gathering of results, which are larger than the mesh data anyway.
If you are just meshing a model (and not running the solver) you are probably going to get to the mesh faster by doing it locally. Meshing a model in Simulation CFD is equivalent to running zero iterations so the amount of data you are uploading and downloading from the cloud is similar to a mesh + solve.
The advantages of solving in the cloud really come to light when you are solving models that would normally take a fair bit of time to complete because you are free to solve multiple at once and can use your machine's resources for other work during the computation. The size of the download at the end of a run should not be that much different from your initial test. For a 4-5 million element model I would anticipate a data set on the order of several hundred MB which could be expected to take 10 minutes or so to download.
Just out of curiosity, are you using the 2014 version or the new Simulation Flex release?
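As a rough check on the 10-minute estimate above, here is a minimal sketch of the arithmetic. The 500 MB data set size is an assumption (the post only says "several hundred MB"); the 70 Mbps line rate and the 0.02 MB/s dip are the figures quoted in the original question.

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbps / 8.0


def transfer_minutes(size_mb: float, rate_mb_per_s: float) -> float:
    """Minutes needed to move size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_per_s / 60.0


size_mb = 500.0  # assumed size of the result set, not a measured value

# Sustained rate needed to finish in 10 minutes.
required_mb_per_s = size_mb / (10 * 60)

# Best case: the full 70 Mbps line is available end to end.
best_case_min = transfer_minutes(size_mb, mbps_to_mb_per_s(70.0))

# Worst case: the job manager stays at the observed 0.02 MB/s dip.
worst_case_min = transfer_minutes(size_mb, 0.02)

print(f"needed for 10 min: {required_mb_per_s:.2f} MB/s")
print(f"at 70 Mbps:        {best_case_min:.1f} min")
print(f"at 0.02 MB/s:      {worst_case_min / 60:.1f} h")
```

So a 10-minute download only needs a sustained rate under 1 MB/s, well below the line rate, while a transfer stuck at the lowest observed rate would run for hours.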
A 4-5 million element model downloading in 10 minutes? Is this in a parallel universe? Please give me directions how to get there.
At least in the 2014 version, a 4-5 million element model with, for example, a rotating region and a few thousand time steps would take 1+ days to download after the solve is completed, not hours. Solve time is usually a day. However, I always run more than 1 simulation at a time (otherwise why cloud solve?), so that will definitely affect results as SJM is a happy/sad type program. Once it gets 'behind' on result downloads, it takes 1+ days to recover.
IMO, the SJM code and architecture is 5+ years old. It will never do better unless the networking code is thoroughly rewritten. My post on SPDY (SPeeDY) was an attempt to at least get the discussion going at Autodesk (personally I think the code should be almost finished but that's just my shiny penny).
I'm not trying to bash this product. Cloud CFD is a great concept, probably one of the best to come along in my career. However, expectations must be managed or users will be jumping from the clouds after writing long 'Dear John' emails (Jon has received them from me already).
I have had a year of ever-tempered expectations. To the point that I now expect that I can only run concurrently <10 simulations of 500k-1 million elements, each having a few thousand time steps and ~100-500 MBytes of data. My own experience has involved downloading results from 4-5 different business connections (100+ Mbps) in 2 different states and SJM always regresses to ~100 Kbps speeds 99% of the time with peaks to 10+ Mbps.
In my online and offline talks with others, they have had similar experiences. Running small simulations works, but once SJM gets 'overloaded' and falls behind, performance 'crashes'.
Why the 'crash'? My understanding of the Autodesk Sim360 Cloud is this:
CFD/Mech 'solver' server solves and forwards results to a batch download 'cache' server which then transfers to the client SJM over the net. I'm not sure if this solver->cache server transfer is (a.) always undertaken or only if (b.) the client SJM can't receive the results fast enough from the solver server or (c.) client SJM is offline. SPDY would help with the (b) scenario and perhaps avoid solver-cache transfer bottleneck (if that in fact is a bottleneck).
Autodesk isn't talking about architecture or performance publicly, and my only insight is from watching TCP traffic with Process Hacker, examining TCP packets with Wireshark, and examining the server logs (which seem to be useless as they have lotsa messages that date from 2009 and 2012 server versions).
In any event, SJM has a ways to go.
I used cloud for meshing because my workstation and our cluster were already busy churning out the queues with our standalone licences. Now I realise that though cloud meshing is a nice flexible option when all Solve licences are used, it probably makes sense not to use it for very heavy meshes. Also, we are using 2014 version for consistency instead of Flex. We have already done some benchmarking between 2013 and 2014 and found them at par for our applications. We won't move to 2015 before we do this benchmarking again.
An insightful post
I agree (and wish) more can be done on the communication front. I have seen numerous glitches in standalone version as well when it comes to solving with in-office remote clusters. The cluster integration has been patchy.
I have got a feeling I need to change some of the habits I had picked up while running the standalone SimCFD.
I typically save results after 500 iterations so that in the event of divergence I can resume from the last saved iteration. Do you think that will make it even more cumbersome for the SJM to retrieve the results?
Also, have you observed that the updating of iterations/plots in Sim 360 is laggy? When I right click on the job in SJM and select "Show Convergence Plot Thumbnail" I see more iterations than what I see in the Sim360 output pane. Do you know of any setting to mitigate this?
Thankfully, the results seem to be mapped on the mesh with the progression of iterations (with some lag) and hence I can always use that to get the information (pressure drops, velocities etc) "on the go" quickly when I observe sufficient convergence.
Very quick reply, more later.
I've been using 2015 for 3 days now. I have yet to hit it with more than 3 concurrent simulations but so far things have been smoother than 2014. I would encourage you to try 2015.
As to your mention of saving at 500 it ... I first did the same with SJM (for the same reasons) and found that SJM did not like this practice as download suffered. However, over the last 6 months after figuring out CFD meshing/settings issues, I think that cloud solve is much more stable than my desktop solve. This is most likely due to Autodesk Cloud solve running on better hardware. Meshing and Solver stability is now amongst the least of my concerns, so no real need for intermittent saves.
If Autodesk gets SJM figured out, and the Unlimited pricing is stable (will Autodesk offer a 3 year subscription?), this will be an incredible tool.
P.S. Would be very interesting to know more about your cluster hardware. Can you PM me?
Doug & Omkar:
A little behind the wall discussion here.
One aspect of installing Flex/SimPro 2015 is that it will upgrade your SJM client. Many of the SJM improvements will be noticed in 2014. So, upgrade to 2015 to at least get a new SJM.
You make the comment that the code for SJM is 5+ years old. Not going to agree or disagree here, but we did have a customer that tried to run 800 jobs concurrently in 2014 which, as you might expect, didn't work. What they did do was contact us, and we used their models and approach as more of a standard of what SJM/client/worker needs to handle. So, we can now say that customer can successfully solve those jobs concurrently. What were the big changes you might ask?
- Polling to SJM now works as a queue - Previously it was parallel. As you might expect, if you have 800 jobs ping SJM at once it will probably crash.
- The control for concurrent downloads is better now - Don't keep this at zero, keep it at 2 - 5.
- SJM is now multi-threaded. Remember those stuck downloads at 100%? Those should be quicker!
- There are other improvements to SJM that 2014 will not take advantage of, but the above will improve 2014.
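To illustrate why capping concurrent downloads at 2 - 5 behaves differently from "no limit", here is a minimal sketch of a bounded download pool. This is not how SJM is implemented; `fake_download`, `download_all` and `MAX_CONCURRENT_DOWNLOADS` are hypothetical names, and the sleep is only a stand-in for a real transfer.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_DOWNLOADS = 3  # hypothetical value inside the suggested 2 - 5 range

_lock = threading.Lock()
_active = 0       # downloads currently in flight
peak_active = 0   # highest number seen in flight at once


def fake_download(job_id: int) -> int:
    """Stand-in for one result download; tracks how many run at once."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # placeholder for the network transfer
    with _lock:
        _active -= 1
    return job_id


def download_all(job_ids, limit=MAX_CONCURRENT_DOWNLOADS):
    """Run downloads through a pool so at most `limit` are active together."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # Remaining jobs wait in the pool's internal queue until a worker frees up.
        return list(pool.map(fake_download, job_ids))


finished = download_all(range(20))
print(f"finished {len(finished)} jobs, peak concurrency {peak_active}")
```

With a cap, the remaining jobs simply wait their turn instead of all competing for the same connection at once.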
Just as Doug said, we have high hopes for Flex performance in 2015. We really think we made significant improvements to the back end.
Now, bandwidth! What a tricky subject. Keep this in mind Omkar, when testing your bandwidth you need to be careful. The Amazon server that we use is actually located on the East coast in USA. So, I would do a bandwidth test to a server near Washington DC. That is your effective bandwidth with the 'cloud'. What does a speedtest say when you do this??
@ Doug: I like the positive comments! I have read many of your frustrating posts (we -support, product managers, QA- read them all) so let us know how more than 3 days work out for you moving forward. We showed the comments to the developers and I know they appreciated hearing the positive outcome of the improvements.
Yes you are correct. Bandwidth takes a hit and dips to 21 Mbps/6 Mbps with DC server.
But I would have thought this is still decent for a reasonable download/upload from AWS servers?
Anyway I am still getting used to 360 ecosystem.
I have installed Flex but am currently using 2014 CFD 360 for consistency. Hence I believe I have the new SJM, as you mention.
As I mentioned earlier, I observe that the progress of iterations in thumbnail in SJM is different than Sim 360 (360 lags).
Do you have any tips to sync them up? My thinking was that if I could fetch the latest solution data then I can see the results "on the go" and don't have to depend on the download of all data.
Maybe this info will help:
- The convergence plot in cfd is from cfd data
- Convergence plot for SJM is from a picture taken every 10 iterations or whatever the user has set in notifications
So we send back 3 packets at different intervals:
- Messages -> Progress/Output bar information/etc
- Convergence plot picture
- Actual cfd results as displayed in the graphics window
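A minimal sketch of how three channels refreshing at different intervals can show different iteration counts at the same moment. Only the 10-iteration default for the convergence plot picture comes from the list above; the other two intervals are made-up values for illustration, and the sketch ignores transfer delay entirely.

```python
def last_update(current_iteration: int, interval: int) -> int:
    """Most recent iteration at which a channel with this interval refreshed."""
    return (current_iteration // interval) * interval


# Refresh interval per channel, in iterations.
channel_intervals = {
    "messages": 1,           # hypothetical
    "convergence_plot": 10,  # default picture interval mentioned above
    "results": 100,          # hypothetical
}


def snapshot(current_iteration: int, intervals: dict) -> dict:
    """Iteration each channel would be showing at a given solver iteration."""
    return {
        name: last_update(current_iteration, step)
        for name, step in intervals.items()
    }


view = snapshot(1637, channel_intervals)
print(view)
```

Any extra gap beyond these intervals would come from how long each packet takes to reach the client, which this sketch does not model.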
I understood that, but I was looking for a way to adjust the interval of updating the solution data in the local 360 window so that it is at least close to the one that is shown on the convergence plot from the cloud. I have seen an instance where the 360 window was showing 900 iterations while the convergence thumbnail showed 1600 iterations! However, I observe that the updating of plots in the 360 window is spot-on during the initial few hundred iterations, indicating rather prompt transfer. I hope I am not missing any settings.
This is our cluster configuration. Slightly outdated but does help us expedite:
Four nodes, each with i7-2600 (quad-core, 3.4GHz) processor and 16 GB RAM, connected to each other using Infiniband SDR 4X with RDMA, 10 Gbits/s, latency ~5 microseconds. Thus total of 16 cores and 64 GB RAM
And btw, it's Omkar, not Omar
PS: I just finished one simulation with a mesh of over 4 million elements, with results saved every 500 iterations (each res file ~500 MB). There were 3000 iterations in total (with one other concurrent simulation running). When I stopped it, it took around 14 hours to download all the data and finish the analysis. I am now running a similar case without saving the intermediate results every 500 iterations, and will update if I see any improvement.
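For scale, a back-of-the-envelope estimate of the effective throughput in that run. It assumes six res files were transferred (3000 / 500) and that the whole 14 hours was spent downloading, neither of which is confirmed; the 21 Mbps figure is the download speed measured against the DC server earlier in the thread.

```python
iterations_total = 3000
save_interval = 500
file_size_mb = 500.0       # approximate size of each res file
download_hours = 14.0      # time from stopping the run to completion
measured_down_mbps = 21.0  # speedtest result against a Washington DC server

# Assumed number of saved result files and total data volume.
n_files = iterations_total // save_interval
total_mb = n_files * file_size_mb

# Effective rate actually achieved over the 14 hours.
effective_mb_per_s = total_mb / (download_hours * 3600)
effective_mbps = effective_mb_per_s * 8

# Time the same data would need at the measured line rate.
ideal_minutes = total_mb / (measured_down_mbps / 8) / 60

print(f"{n_files} files, {total_mb:.0f} MB total")
print(f"effective rate: {effective_mbps:.2f} Mbps")
print(f"at {measured_down_mbps:.0f} Mbps: {ideal_minutes:.0f} min")
```

Under those assumptions the run averaged roughly half a megabit per second, against about 19 minutes if the measured 21 Mbps had been sustained.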
To my knowledge I don't think there are any options to increase the frequency of the results displayed in the graphic window while it is solving. There are probably infrastructure reasons why an end user cannot increase how often they download results.