*Expert Elite*
OmkarJ
Posts: 422
Registered: ‎10-02-2012

Cluster communication for SimCFD 2014

287 Views, 18 Replies
05-22-2013 05:26 AM

I have faced a lot of issues with cluster integration in SimCFD 2013. I was wondering if 2014 brings any improvement in this area. I looked through the "What's new" section in the wiki but couldn't find any mention of it there.

 

OJ

Product Support
apolo.vanderberg
Posts: 296
Registered: ‎08-31-2011

Re: Cluster communication for SimCFD 2014

05-29-2013 09:45 AM in reply to: OmkarJ

Omkar,

   Are you referring to the setup procedure for Clusters, or use of the cluster as a remote solver?

 

While we did not highlight anything in the What's New, the setup process should be a little easier for 2014 than it was for 2013.

In addition, with the evolution of CFD 360 for 2014, there were also some back-end improvements to communications in the desktop product when dealing with remote solvers (cluster or not).

 

 

*Expert Elite*
OmkarJ
Posts: 422
Registered: ‎10-02-2012

Re: Cluster communication for SimCFD 2014

05-30-2013 01:34 AM in reply to: apolo.vanderberg

I use the cluster in our office as a remote solver, while the simulation files stay on my local machine. I am referring to the following specific expectations:

 

  1. If I run a queue of, say, 5 scenarios, from either the same cfdst or more than one, the Solver should solve them all sequentially and then update the local folders of the cfdst scenarios. I have seen cases where the Solver has run the simulation but the local files are not updated. When I navigate to the HPCAnalyze folder on the cluster, the scenario's folder inside the jobxxx folder has res files for the latest iterations, but they are not updated in the local folders on my machine.
  2. Subsequently, when I open the scenario in SimCFD, I should see not only the results of the last iteration mapped to the mesh, but also the result history in the form of plots. I have sometimes seen that, although the results are mapped to the mesh, the plots are not present.

I believe these are reasonable expectations.
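
The first symptom above (newer result files in the cluster's job folder than in the local scenario folder) could be checked with a short script. This is only a minimal sketch, not anything SimCFD provides: it assumes the cluster's job folder is reachable from the local machine as an ordinary path (for example through a mapped share), and the function names and the tolerance value are hypothetical. It compares file timestamps only, so clock differences between the two machines can skew the result.

```python
import os


def newest_mtime(root):
    """Return the latest modification time (seconds since the epoch) of any
    file under root, or None if the folder contains no readable files."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                # File disappeared or is unreadable (e.g. still being written).
                continue
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def local_copy_is_stale(remote_job_dir, local_scenario_dir, tolerance_s=2.0):
    """Return True if the remote job folder holds files that are newer than
    anything in the local scenario folder by more than tolerance_s seconds.

    Both folder arguments are hypothetical paths chosen by the user; the
    tolerance is an assumed allowance for coarse file-system timestamps.
    """
    remote = newest_mtime(remote_job_dir)
    local = newest_mtime(local_scenario_dir)
    if remote is None:
        return False  # nothing on the cluster side to compare against
    if local is None:
        return True   # results exist remotely but nothing exists locally
    return (remote - local) > tolerance_s
```

Calling `local_copy_is_stale` on each finished job and its matching local scenario folder would give a list of the scenarios whose results did not come back, which is easier to report than a general impression.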

 

Regards

OJ

Product Support
apolo.vanderberg
Posts: 296
Registered: ‎08-31-2011

Re: Cluster communication for SimCFD 2014

05-30-2013 06:06 AM in reply to: OmkarJ

Omkar,

 

I can agree with your sentiment, as most would expect that if you solve remotely you should be able to retrieve the results. 2014 has had some work done on communication between the Interface and the Solver, so I would be curious to see whether you experience similar issues with the latest version.

 

I'd like to dig in a little further with some of the specifics on this:

 

1) Where are the files on your local machine? Stored on a C: or a mapped / network drive?

2) When you submit the first analysis, when do you switch to the next to submit (what stage is the analysis at)? Or are you using the Solver Manager to submit all the jobs at once?

3) During the run, all files will be stored on the remote machine; nothing gets copied back until the end of the analysis.

4) How many intermediate results are you saving? Do you know the total size of the Jobxxx folder on the remote machine? This is the amount of data that would have to be copied back.

5) When you do open the analysis again, what happens? Does it begin to load the data and then stop? What's the progression?

6) Is there anything else you're doing with the local machine while the run is happening remotely (like shutting down the local machine)?

7) If you do not close the interface do the jobs more consistently come back?

8) How frequently do you see this issue (roughly 1 in 5 jobs? 1 in 10? )?

 

*If you do see this frequently, is there a common/repeatable set of steps that you can do to replicate this?

While I have had the occasional job not come back, it has not been frequent enough or repeatable enough such that I could work with our development team to point it out.

*Expert Elite*
OmkarJ
Posts: 422
Registered: ‎10-02-2012

Re: Cluster communication for SimCFD 2014

05-30-2013 06:56 AM in reply to: apolo.vanderberg

To be honest, I have seen the same sentiment in many posts and SimCFD is believed to be having chronic problems with cluster communication. 

 

Here are the clarifications to your questions.

 

1) Where are the files on your local machine? Stored on a C: or a mapped / network drive?

Typically, the D: drive. However, after we lodged a case on this, and as suggested by Autodesk experts, we started storing the files on a mapped cluster drive and opening SimCFD on the cluster using Remote Desktop. This has been a bit more successful, but it is inconvenient, because this way only one user can work with the cluster through Remote Desktop. I would prefer to use the cluster as a remote solver, as it is meant to be!

 

2) When you submit the first analysis, when do you switch to the next to submit (what stage is the analysis at)? Or are you using the Solver Manager to submit all the jobs at once?

This depends on the pipeline of jobs I am working with. The ones that have finished meshing are set to solve using "Solve". Sometimes, when we need licences for meshing, we stop the cluster jobs for the day, and at the end of the day they are submitted again from the individual scenarios using "Solve". I rarely use Solver Manager.

 

3) During the run, all files will be stored on the remote machine; nothing gets copied back until the end of the analysis.

This is straightforward.

 

4) How many intermediate results are you saving? Do you know the total size of the Jobxxx folder on the remote machine as this is the amount of data that would have to get copied back.

Typically after every 500 iterations. How would it be beneficial to know the size of jobxxx folder? I would expect the data it has should be copied to local folders. 

 

5) When you do open the analysis again, what happens, does it begin to load the data and then stop, whats the progression?

Either of this:

  • The mouse pointer goes to "busy" mode, and after a few seconds I see that neither the plots are updated nor the results mapped on the mesh.
  • Sometimes the results are mapped but I don't see the plots, so I can't judge the convergence.

6) Is there anything else you're doing with the local machine while the run is happening remotely (like shutting down the local machine)?

No, we don't shut down the local machine. But we also can't keep all the cfdst files open because of the limited number of interface licences, so we close cfdst files if there are too many. Also, in a single design study with multiple scenarios running, only one can be open anyway. During the day, of course, the local machine is used for other purposes.

 

7) If you do not close the interface do the jobs more consistently come back?

It is rare that the scenario that is open and running will have a problem. But even if the interface is open, only one scenario can be open, so the rest of the scenarios would presumably have the same problems as those in closed cfdst files.

 

8) How frequently do you see this issue (roughly 1 in 5 jobs? 1 in 10? )?

It is as unpredictable as rain in UK! But quantitatively, 20-30% sounds about right.

 

*If you do see this frequently, is there a common/repeatable set of steps that you can do to replicate this?

I haven't seen any pattern or causality in this and hence, it is difficult to say if it can be definitively replicated. The best bet can be to generate a queue and wait for it to happen. 

 

OJ

Product Support
apolo.vanderberg
Posts: 296
Registered: ‎08-31-2011

Re: Cluster communication for SimCFD 2014

05-30-2013 07:50 AM in reply to: OmkarJ

Omkar,

 

Thank you for those answers.

Let me be a bit more specific on some of these questions:

 

If you are logging on to the cluster to run locally on the cluster, we want the files on the cluster's headnode.

If you are using the cluster as a remote solver, we want the files on the local hard drive of the local machine, not on the cluster.

Is D: a network path?

 

When sending a job from local machine to Cluster, have you been storing your files on the local machine or the cluster?

 

 

If you are using the cluster as a remote solver, we send the files to the cluster, the analysis solves there in the HPCanalyze\JobXXX folder, and when it is done the results get copied back to the local machine.

 

If someone has to mesh you stop all jobs on the Cluster?

How are you doing this?

Can the user not mesh locally vs sending to the cluster?

 

How many Jobs are typically queued up at any given time?

 

*Expert Elite*
OmkarJ
Posts: 422
Registered: ‎10-02-2012

Re: Cluster communication for SimCFD 2014

05-30-2013 08:18 AM in reply to: apolo.vanderberg

Thanks for the interest. The clarifications are:

 

If you are logging on to the cluster to run locally on the cluster, we want the files on the cluster's headnode

Yes, when we log on to the cluster's headnode to run using the cluster as remote solver, we keep the cfdst files in a shared location on the cluster's headnode that is accessible from everywhere. When we do this, we also open the cfdst from this shared network location, not from a local location.

 

If you are using the cluster as a remote solver we want the files on the local harddrive of the local machine not on the cluster.

 

Yes, when we submit to cluster as a remote solver through local machine, we have files stored on local hard drive, and we open the files directly from the local hard drive location.

 

Is D: a network path?

D: is local path, that is not shared.

 

If you are using the cluster as a remote solver, we send the files to the cluster, the analysis solves there in the HPCanalyze\JobXXX folder, and when it is done the results get copied back to the local machine.

Yes, it is straightforward. But the problem lies in its inconsistency, hence this thread.

 

If someone has to mesh you stop all jobs on the Cluster?

Since we have only two Solver licences, if two engineers want a licence for meshing, we can't run the cluster jobs - since mesher also requires the Solver licence. Hence we have to stop jobs on cluster.

 

Can the user not mesh locally vs sending to the cluster?

Yes, we mesh locally, using MyComputer as Solver. The cluster jobs are stopped only to free up a licence, not to use cluster as a solver for meshing.

 

How many Jobs are typically queued up at any given time?

Typical values are:

Minimum: 3

Maximum: 8

 

OJ

Product Support
apolo.vanderberg
Posts: 296
Registered: ‎08-31-2011

Re: Cluster communication for SimCFD 2014

06-03-2013 11:40 AM in reply to: OmkarJ

Omkar,

  A few more questions with some of this.

 

Do you see these issues more when you have to stop the cluster so that others can mesh?


     If so, is there any reason you do not leave the cluster running with its 1 license and let the 2 engineers take turns meshing (as meshing shouldn't take that long and would be less troublesome than stopping the whole queue and then restarting it)?

 

It might be useful to keep a mental note of the typical sizes of the JobXXX folders, so that we can see if there is a threshold where this appears (does it happen more often as the folder size increases? Is there a specific size that starts being problematic?)
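
Rather than relying on a mental note, the folder size could be recorded with a few lines of script. The following is a minimal sketch under stated assumptions: it is not part of SimCFD, the function names are hypothetical, and it assumes the JobXXX folder can be read as a normal directory from wherever the script runs.

```python
import os


def folder_size_bytes(root):
    """Return the total size in bytes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # File vanished or is unreadable (e.g. the solver is still
                # writing it); skip it rather than abort the whole count.
                pass
    return total


def format_size(num_bytes):
    """Format a byte count as a readable string using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024.0:
            return f"{size:.1f} {unit}"
        size /= 1024.0
    return f"{size:.1f} TiB"
```

Logging `format_size(folder_size_bytes(job_dir))` next to whether each job's results came back would show fairly quickly whether a size threshold exists.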

 

 

*Expert Elite*
OmkarJ
Posts: 422
Registered: ‎10-02-2012

Re: Cluster communication for SimCFD 2014

06-04-2013 01:40 AM in reply to: apolo.vanderberg

Thanks, here are the clarifications

 

Do you see these issues more when you have to stop the cluster so that others can mesh?

No. We typically don't observe this while we manually stop. The problem is in cluster communication and coordination when it is operating on its own, ie, updating the folders with results after simulation is complete etc. I only raised this issue to 

 

If so, is there any reason you do not leave the cluster running with its 1 license and let the 2 engineers take turns meshing (as meshing shouldn't take that long and would be less troublesome than stopping the whole queue and then restarting it)?

The nature of CFD work dictates that the majority of a CFD engineer's time is spent on geometry cleanup, meshing and model setup. Since we use parametric models for geometry and templates for model setup, these are relatively quick, so meshing is what occupies most of an engineer's time. It is not feasible for engineers to sit idle in turns just to keep the queue unaltered. In fact, I strongly object to meshing occupying a solver licence, considering that meshing and solving the NS equations are two completely separate processes. I do not see this in any other software, as all come with separate licences for meshing and solving. I am trying to find the right platform to communicate this to Autodesk. I am sorry if I sound rude, but is it just me, or is it true that Autodesk has employed this unfair and unjust tactic, even if it comes from BRN as legacy?

It might be useful to keep a mental note of the typical sizes of the JobXXX folders, so that we can see if there is a threshold where this appears (does it happen more often as the folder size increases? Is there a specific size that starts being problematic?)

I have observed that small and large meshes behave equally randomly.

 

OJ

Product Support
apolo.vanderberg
Posts: 296
Registered: ‎08-31-2011

Re: Cluster communication for SimCFD 2014

06-04-2013 05:58 AM in reply to: OmkarJ

Omkar,

 The fact that meshing takes a solver license has been part of CFdesign for many years.
This isn't something that Autodesk employed when we were acquired.

 

If this is something you'd like to see changed, I would recommend logging an Enhancement Request on IdeaStation (the sticky post in this forum has the link). This lets you log what you'd like and lets other users vote on existing ideas to help raise their priority.

 

So from what you've stated the bulk of the issues comes from when you do not touch any of the jobs and they finish on their own?

Do some of those jobs sit in a Finished state for a while before the analysis is opened and the data is then copied back to the local machine?

 

 
