TOP-C stands for Task-Oriented Parallel  C Coo96.   The  ``TOP-C
model'' is the specific master slave model implemented here.  That  model
has  been  adapted  for  use  in  ParGAP.  The  implementation  is  in
masslave.g in ParGAP's lib directory. Note that the functions  and
variables with names TOPC... are intended as internal  functions  only,
and should not be used by the GAP programmer.
For the impatient, you may type MSexample();  in  a  ParGAP  session
now. If you prefer further hands-on learning in a tutorial style, you may
wish to next read Chapter MasterSlave Tutorial. Eventually, if you wish
a deeper understanding of the TOP-C model, you will  need  to  read  this
current section and those that follow.
The initial GAP process is the master process, and all others are slave processes. It allows most of the CPU-intensive computations to be carried out on slave processes, which typically reside on remote processors. A well-developed TOP-C application should find that the master process is almost never busy when a slave process is idle, waiting for a new computation to carry out. This provides a natural way of maximizing utilization and load balancing.
The TOP-C model depends on three concepts:
().
The task input is defined to be the argument of the task (considered as a function), and the task output is the return value of the task.
There is only one core TOP-C command, a utility function, and several constants. A TOP-C command must be evaluated on the master and on all slaves. We shall describe the commands in detail in the following sections, but a short list of the essentials and a small example will be helpful to set the context.
MasterSlave( SubmitTaskInput,DoTask[,CheckTaskResult[,UpdateSharedData[,taskAgglomCount]]] ) F
See Section Other TOP-C Commands for a description of MasterSlave.
NOTASK V
NO_ACTION V
UPDATE_ACTION V
REDO_ACTION V
CONTINUATION_ACTION( taskContinuation ) F
CONTINUATION_ACTION() is described in Section The  GOTO  statement  of the TOP-C model.
IsUpToDate() F
ParInstallTOPCGlobalFunction( string, function ) F
ParInstallTOPCGlobalFunction( gvar, function ) F
A short example shows one possible implementation of ParList().
gap> ParInstallTOPCGlobalFunction( "MyParListWithAgglom", > function( list, fnc ) > local result, i; > result := []; i := 0; > MasterSlave( function() if i >= Length(list) then return NOTASK; > else i := i+1; return i; fi; end, > fnc, > function(input,output) result[input] := output; > return NO_ACTION; end, > Error > ); > return result; > end );
(Of course rather than  type  such  code  in  a  ParGAP  session  it's
generally more convenient to have it in a file and Read it in.)
A master-slave computation is invoked when a GAP  program  issues  the
command MasterSlave(). As given earlier, the typical form is:
MasterSlave( SubmitTaskInput, DoTask [, CheckTaskResult[, UpdateSharedData[, taskAgglom]]] )
where the first four arguments of MasterSlave() are also functions, but
they must be defined by the application writer. Their calling  syntax  is
defined by the following GAP code, which also  provides  a  simplified
description of how  a  sequential  (non-parallel)  MasterSlave()  would
invoke these functions if there were  only  a  single  process.  (A  more
sophisticated version of this routine is provided in ParGAP  to  allow
one to debug within a  single  process  first.)  The  use  of  the  fifth
argument, taskAgglom, is deferred  until  section Agglomerating  tasks for efficiency (ParSemiEchelonMat revisited again).
In this section, we define MasterSlave() and describe the  use  of  its
four  arguments  in  a  purely  sequential  environment.  The  issues  of
parallelism and passing of messages between processes is covered  in  the
next section. The call to MasterSlave() in ParGAP, above, will  have
the same result  as  if  MasterSlave()  were  defined  equivalently  to
SeqMasterSlave() below, and then run in a standard,  sequential  GAP
(a  single  process).  The  next  section  describes  the   multi-process
implementation of MasterSlave() in ParGAP, in which  taskInput  is
computed on the master process and sent as a message to a slave  process,
while taskOutput is computed on a slave process and sent as  a  message
to the master process.
SeqMasterSlave :=
  function(SubmitTaskInput, DoTask, CheckTaskResult, UpdateSharedData)
  local taskInput, taskOutput, action;
  while true do
    taskInput := SubmitTaskInput();
    if taskInput = NOTASK then break; fi;
    repeat
      taskOutput := DoTask( taskInput );
      action := CheckTaskResult( taskOutput, taskInput );
    until action <> REDO_ACTION;
    if action = UPDATE_ACTION then
      # Modify the shared data (global data structures) here
      # Called on all processes, master and slaves
      UpdateSharedData( taskOutput, taskInput );
    fi;
  od;
end;
One can also follow the life of a single task in a multi-processing environment through the diagram below.
               MASTER               |               SLAVE
_________________________________________________________________________
                                    |  
      +---------------------+       |
      | GenerateTaskInput() |       |
      +---------------------+       |
                             \input |
                              \______________
                                    |        |
                                    |        v
                                    |       +---------------+
                                    |       | DoTask(input) |
                                    |       +---------------+
                                    |output/  ^
                                 _________/   |
                                |   |         |
                                v   |         |
 +--------------------------------+ |         |
 | CheckTaskResult(input, output) |___________|
 +--------------------------------+ (if action == REDO)
                         |          |
                         | (if action == UPDATE)
                         v          |
                    +---------------------------------+
                    | UpdateSharedData(input, output) |
                    +---------------------------------+
                                    |
                         TOP-C Programmers' Model
                          (Life Cycle of a Task)
 
Although not explicit in the code,  the  application  writer  should  add
comments to define the shared data. The shared data  is  defined  as  a
global  data   structure   that   is   treated   as   ``read-write''   by
UpdateSharedData(),  while  being   treated   as   ``read-only''   by
SubmitTaskInput(), DoTask(), and CheckTaskResult(). Note also
that an application writer may use different names for the four functions
SubmitTaskInput(), etc. It is only a convention within this manual to
give  those  functions  the   names,   above.   Similarly,   taskInput,
taskOutput and action are the conventional names used in this manual,
and a given application may use different names.
In a correct ParGAP application, the shared data should be initialized
to  the  same  value  on  all  processes  before  the  application  calls
MasterSlave(). MasterSlave() is then called on all  processes.  After
that,  the  shared data   can   be   modified   only   by   a   call   to
UpdateSharedData(), and MasterSlave() arranges  for  each  call  to
UpdateSharedData()  to  be  executed  on  all   processes.   Further,
UpdateSharedData() has access only to taskInput, taskOutput,  and
the previous value of the shared data.  Thus,  MasterSlave()  maintains
the same shared data uniformly on all processes.
This section is  concerned  with  formal  definitions  for  the  routines
associated  with  ParGAP.  It  is  important  to  keep  in  mind   the
pseudo-code   of   Chapter Basic   Concepts   for   the   TOP-C    model (MasterSlave). Since MasterSlave() uses all the  ParGAP  processes,
the user must invoke it on all processes. This is typically done  through
some function provided by the slave listener layer, such  as  ParEval()
(see ParEval). It may be instructive for the reader  to  run  ParGAP
and type MSexample();  now,  or  else  to  look  at  some  examples  of
ParGAP  applications  in  the  section MasterSlave  Tutorial.   This
demonstrates the use of MasterSlave() in a typical session.
The   four   functions   written   by   the   application   writer   are:
SubmitTaskInput(),    DoTask(),    CheckTaskResult(),     and
UpdateSharedData().   DoTask()   is   executed   on   a    slave.
SubmitTaskInput()  and  CheckTaskResult()  are  executed  on  the
master, where a taskInput is generated and a corresponding taskOutput
is  received.  Finally,  UpdateSharedData()  is   executed   on   all
processes. ParGAP  arranges  to  automatically  pass  taskInput  and
taskOutput between the master and a slave.
Since the  single  master  process  is  responsible  for  generating  all
taskInputs  and  receiving  all  taskOutputs,  it  is  critical  that
computation on the master process should not become a  bottleneck  for  a
well-designed ParGAP application. Accordingly, the application  writer
should arrange for  SubmitTaskInput()  and  CheckTaskResult()  to
execute  quickly,  even  if  this   means   additional   computation   by
DoTask() or UpdateSharedData().
As seen in the examples, SubmitTaskInput() may use global  variables  on
the  master  to  ``remember''  the  last  taskInput  or   other   state
information. Note that such  global  variables  cannot  be  part  of  the
shared data, since they are modified outside of UpdateSharedData().
It is instructive to review the logic for the  lifetime  of  a  task,  as
described by the pseudo-code for SeqMasterSlave in Section Other TOP-C commands. Initially, MasterSlave() calls SubmitTaskInput() on  the
master, which returns an application-defined GAP object,  taskInput.
MasterSlave() then copies taskInput to an  arbitrary  slave  process,
and MasterSlave() then calls DoTask( taskInput )  on  the  slave.
This returns an application-defined GAP  object,  taskOutput,  which
MasterSlave()  copies  to  the   master   process.   On   the   master,
MasterSlave() then calls CheckTaskResult( taskInput,  taskOutput
), which returns an action. (Recall that taskInput,  taskOutput  and
CheckTaskResult() are defined by the application writer,  and  so  an
application program may give them different names.)
There are four possible  actions  (ParGAP  constants):  NO_ACTION,
UPDATE_ACTION, REDO_ACTION, CONTINUATION_ACTION(  taskContinuation
).   A   standard   language   idiom   in   ParGAP   is   to   define
CheckTaskResult()       as       the        ParGAP        function
DefaultCheckTaskResult(), whose code is as follows:
DefaultCheckTaskResult := function( taskOutput, taskInput ) if taskOutput = false then return NO_ACTION; elif IsUpToDate() then return UPDATE_ACTION; else return REDO_ACTION; fi; end;
In the simplest case, CheckTaskResult() returns NO_ACTION, in which
case there is no further computation related to the original taskInput.
CheckTaskResult()  may  record  global  information  on  the   master
process, based on the taskOutput, but the shared data,  and  hence  the
state of the slave processes, will not be modified.
In  the  second   most   common   case,   CheckTaskResult()   returns
UPDATE_ACTION.   This   action   causes   MasterSlave()    to    call
UpdateSharedData(  taskOutput,  taskInput  )  on  all   processes
(master and slaves). This is the only way in which the shared data  can
be modified by a correct ParGAP program.
In the third most common case, CheckTaskResult() returns REDO_ACTION.
When a REDO_ACTION action is generated, the  value  of  taskInput  is
re-sent to the same slave that executed DoTask( taskInput ) for the
current task. An application will typically invoke REDO_ACTION  if  the
shared data has changed, and this changed shared data will produce a  new
taskOutput. As  before,  DoTask()  then  returns  a  new  value  of
taskOutput. Then, taskInput and the new taskOutput are again passed
to CheckTaskResult().
Note that MasterSlave() guarantees that REDO_ACTION causes  the  task
to be re-sent to the same slave process. This allows the  application  to
cache in a  global  variable  some  information  computed  by  the  first
invocation of DoTask(). A second invocation of DoTask() caused by
the REDO_ACTION allows the task to test if the taskInput is the  same
as  the  last  invocation.  In   that   case,   the   application-defined
DoTask() routine can recognize that this is a REDO_ACTION,  and  it
can take advantage of the cached global variable  to  avoid  re-computing
certain quantities that would not be changed by the altered shared  data.
In order to make this strategy possible, MasterSlave() also  guarantees
that in the case of REDO_ACTION, the slave process will not  have  seen
any intervening calls to DoTask() with values  of  taskInput  other
than the current value.
In typical usage, the application-defined routine, CheckTaskResult(),
will first call IsUpToDate(). IsUpToDate() tests if the  shared  data
has  been  modified  since  the  current  taskInput  corresponding   to
CheckTaskResult() was originally generated by  SubmitTaskInput().
The times of the relevant events are recorded as when seen on the  master
process. It is an error to call  IsUpToDate()  outside  of  a  call  to
CheckTaskResult()  by  MasterSlave().  IsUpToDate()   returns   a
boolean value, true or false.
The last possible action, CONTINUATION_ACTION( taskContinuation ), is
provided for unusual cases. As with advice about the use of ``goto'',  it
is recommended to avoid CONTINUATION_ACTION() where possible.
A favorite aphorism of this author is, ``The source code is the  ultimate
documentation''. With this in mind, the reader  may  also  wish  to  read
lib/masslave.g, for which readability of the code was one of the design
criteria.
At this point, it should be noted that  it  explicitly  is  allowed  to
modify the input or output of a task from  within  CheckTaskResult().
This is  not  recommended  in  general,  but  there  may  be  times  when
CheckTaskResult() returns an UPDATE_ACTION and must also be used to
pass additional information to UpdateSharedData(). In order to modify
a previous input or output, it is  important  that  the  application  has
chosen a representation of the input or output as a list or record, which
can be modified in place, such that the  code  excerpt  succeeds  without
error.
oldOutput := taskOutput; # Modify taskOutput here if ( IsIdenticalObj( oldOutput, taskOutput ) ) = false then Error( "MasterSlave() will see only oldOutput, not current taskOutput" ); fi; return UPDATE_ACTION;
In principle, a dirty trick like this would also work in  the  case  of
returning a REDO_ACTION. However, this is  not  recommended.  For  that
functionality,   the   code   will   be   clearer    if    an    explicit
CONTINUATION_ACTION(   modifiedTaskOutput   )   is   returned.    See
Section The GOTO statement of the TOP-C model for further discussion on
the use of CONTINUATION_ACTION().
CONTINUATION_ACTION( taskContinuation )
The  CONTINUATION_ACTION(),  like  the   goto   statement,   is   not
recommended for ordinary programs,  but  it  may  be  useful  in  unusual
circumstances. This  is  a  parametrized  action.  When  the  application
routine  CheckTaskResult()   returns   this   action,   MasterSlave()
guarantees to invoke DoTask() on the same slave process  as  for  the
original task. There will have been no intervening calls to  DoTask()
on that slave, although there  may  have  been  an  intervening  call  to
UpdateSharedData() on that slave.
This action allows arbitrary, repeated communication between  the  master
and a  single  slave  process.  The  slave  process  executes  DoTask(
taskInput  )  and  communicates  with  the  master  by   returning   a
taskOutput.   The   master   process    executes    CheckTaskResult(
taskInput,  taskOutput  )  and  returns  a  taskContinuation.  The
original  slave  process  then  receives  another  call   to   DoTask(
taskInput ), this time with taskInput bound to taskContinuation.
When you are running a long job on a network of workstations, you will often be sharing it with others. Making your parallel job as unintrusive as possible will leave you with a warmer welcome the next time that you want to use that network of workstations. Accordingly, three useful functions are provided.
UNIX_Nice( priority ) F
This is  similar  to  the  nice  command  of  many  UNIX  shells.  UNIX
priorities are in a range from −20 to 20 with −20 being the  highest.
Users typically start at priority  0.  You  can  give  yourself  a  lower
priority by specifying a priority of 5, for example. Usually,  priorities
19 and 20 are absolute priorities. Any process with a  priority  higher
than 19 that wishes to run will always have precedence. Other  priorities
are relative priorities. Your process will still receive some CPU  time
even if other processes with higher priorities are running. You  can  set
your priority lower, but you cannot raise it back to its  original  value
after that. The return value is the previous priority of your process.
UNIX_Alarm( seconds ) F
This causes the process to kill itself after that many seconds. This is a
useful safety measure, since it is unfortunately too easy for  a  runaway
slave process to continue if the master process  is  killed  without  the
normal quit;. You might consider  adding  something  like  UNIX_Alarm(
25000 ); (about 6 hours) to your .gaprc file. Executing UNIX_Alarm( 0
); cancels any previous alarm. The return value is the number of seconds
remaining under the previous setting of the alarm.
UNIX_LimitRss( size ) [ = setrlimit(RLIMIT_RSS, ...) ] F
Many dialects of UNIX (and their shells)  offer  a  limit  or  ulimit
command to limit the resources  available  to  the  shell.  This  command
limits the size of the RSS (resident set size), or the amount of physical
RAM used by your process. The size limit is in bytes. Unfortunately, some
UNIX dialects may not allow or even silently ignore this request to limit
the RSS. A UNIX command such as top can show you if your process RSS is
staying below your requested limit.
The (tutorial contains a section Raw MasterSlave (ParMultMat revisited),
about raw version of MasterSlave()  that  is  useful  for  converting
legacy sequential code to the TOP-C model. However,  that  model  is  not
recommended for writing new code, for stylistic reasons.
[Up] [Previous] [Next] [Index]
ParGAP manual