
Coordinating data transfer in model with parallel runs

Question asked by pmlwd@leeds.ac.uk on Jan 28, 2016
Latest reply on Feb 8, 2016 by pmlwd@leeds.ac.uk

Hello there

 

To make better use of system resources, I converted some sequentially structured code, which runs many independent optimisations, into a parallel structure. Unfortunately, it does not run as I had expected, and I would be very grateful for any help that can be offered. The general approach I have used is described in the white paper ‘Multiple models and parallel solving with Mosel’, specifically the example ‘2.6.3: Job queue for managing parallel submodels’.

 

The problem seems to be one of synchronisation. Before optimisation, the master model transfers a series of arrays to the submodel using the shmem shared-memory driver of the mmjobs module; after optimisation, the submodel transfers results arrays back to the master model. Previously, in sequential mode, a ‘wait’ line held the master model until the submodel had completed its operations, so the master did not try to access the returned files too early. A simple ‘wait’ does not seem possible in the parallel structure, because of the risk of crossed wires between parallel runs. Instead, I have tried the ‘waitfor’ function in both the main section of the master model and its procedure. This, I hoped, would allow a procedure-specific signal to be sent from the submodel to indicate that the relevant results are ready. However, I get the following error message:

 

Mosel: E-36: Read error (unexpected end of file).

Mosel: E-33: Initialization from file `bin:shmem:fromoperational2' failed for: `X',`Y',`Z'.

 

My assessment is that the ‘waitfor’ function is probably not waiting for a message of the relevant class (see the code given below) but is instead proceeding on receipt of any message. It looks to me as though the procedure for the second parallel run advances past the ‘waitfor’ stage on receipt of a message from the first parallel submodel.
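
One way to check this assessment might be to log every event that reaches the master model. The fragment below is a minimal sketch of such a diagnostic, not a fix: it reuses the names from the master model shown further down, replaces ‘waitfor(0)’ with a plain ‘wait’, and assumes that the ‘waitfor’ line and the results read in the procedure are commented out for the test, so that no event is consumed elsewhere.

```mosel
! Diagnostic sketch only: print the class, value and sender of each event
while (JobsRun.size<JobSize) do
     wait                               ! Wake on any event, whatever its class
     Msg:=getnextevent
     writeln("Event class ", getclass(Msg), ", value ", getvalue(Msg), ", from model ID ", getfromid(Msg))
     if getclass(Msg)=EVENT_END then
          retrieve_id:=getfromid(Msg)
          JobsRun+={jobid(retrieve_id)}
          if JobList<>[] then start_next_job(retrieve_id);end-if
     end-if
end-do
```

If the log shows events of class m+1 from one submodel arriving while another job is still being set up, that would be consistent with the crossed-wires explanation.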

 

I have tried to strip the code to its essentials and present it below. I think I have included the relevant parts, but I apologise if I have missed out anything important. The code runs without a problem when no files are returned from the submodel to the master model; in that version, the main section of the master code uses ‘wait’ instead of ‘waitfor’. It also returns data successfully when the file transfer uses mempipe, but the code is then very much slower and the jobs finish in the order in which they were started. By contrast, when the model runs with no return array transfer, the parallel submodels take a range of times to complete and so do not finish in the order in which they were started.

 

Do you have any ideas on how I might go about resolving this problem? One way might be to add another master–submodel layer, by turning what is currently the procedure into a separate model, but this might be needlessly complicated. Thanks for reading.
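
A lighter alternative to an extra layer might be to read the results in the main loop of the master model rather than inside the procedure, so that only one place in the model consumes events. The fragment below is a sketch under assumptions, not a tested fix: EVENT_RESULT and read_results are hypothetical names, the submodel is assumed to call send(EVENT_RESULT,0) after writing its results, and the ‘waitfor’ line and the results read are assumed to be removed from start_next_job.

```mosel
declarations
     EVENT_RESULT=2                     ! Hypothetical user event class (class 1 is EVENT_END)
end-declarations

while (JobsRun.size<JobSize) do
     wait                               ! Wake on any event
     Msg:=getnextevent
     retrieve_id:=getfromid(Msg)        ! ID of the submodel that sent the event
     if getclass(Msg)=EVENT_RESULT then
          ! Hypothetical procedure (declared before this loop) wrapping the
          ! 'initialisations from "bin:shmem:fromoperational"+m' block
          read_results(retrieve_id)
     elif getclass(Msg)=EVENT_END then
          JobsRun+={jobid(retrieve_id)}
          if JobList<>[] then start_next_job(retrieve_id);end-if
     end-if
end-do
```

Because getfromid identifies the sending submodel, a single user event class would suffice here and a per-job class such as m+1 would not be needed.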

 

-----

Master model

-----

while (JobsRun.size<JobSize) do

     waitfor(0)         ! Previously, this was simply ‘wait’

     Msg:=getnextevent

     if getclass(Msg)=EVENT_END then

          retrieve_id:=getfromid(Msg)

          JobsRun+={jobid(retrieve_id)}

          if JobList<>[] then start_next_job(retrieve_id);end-if

     end-if

end-do

 

-----

Procedure in master model

-----

procedure start_next_job(m:integer)

     jobid(getid(array_subModel(array_id_MODEL(m)))):=getfirst(JobList)

     cuthead(JobList,1)


     initialisations to "bin:shmem:tooperational"+m

          A

          B

          C

     end-initialisations

 

     run(array_subModel(array_id_MODEL(m)),"identifier="+m)

     waitfor(m+1)                 ! I avoid using class 1, which denotes EVENT_END


     initialisations from "bin:shmem:fromoperational"+m

          X

          Y

          Z

     end-initialisations

end-procedure

 

-----

Submodel run from procedure

-----

initialisations from "bin:shmem:tooperational"+m

     A

     B

     C

end-initialisations

 

minimise(some function)

 

initialisations to "bin:shmem:fromoperational"+m

     X

     Y

     Z

end-initialisations

 

send(m+1,0)      ! The format is as follows: class, value

 

-----     

mempipe alternative: submodel; this would have corresponding changes in the submodel, as detailed above

-----

initialisations from "bin:mempipe:tooperational"+m

     X

     Y

     Z

end-initialisations
