Section Trivial Parallelism covers trivial parallelism (the simplest and most common form of parallel application). This is followed in Section Using ParGAP interactively by a description of how to use ParGAP interactively, and Section Streaming illustrates these principles with a short implementation of parallel streaming. Streaming refers to simultaneously running different algorithms for the same problem in different ``streams'' or processes, and accepting the first answer to be returned. The remaining processes, running the other algorithms, are then terminated. ParGAP allows those processes to reside on a single CPU or on different CPUs. Hence, in the case of streaming, ParGAP is potentially useful even if one has only a single computer available, since multiple streams can still reside locally in separate processes.
For more sophisticated (non-trivial) parallel applications, the TOP-C model of parallelism is recommended, and it is described in Section TOP-C model for non-trivial parallelism.
In parallel computation, perhaps 80% of applications fall under the heading of ``trivial parallelism''. These are situations in which one must compute many unrelated cases: for example, perhaps 200 different cases must be computed, results must be returned for each case, and no case interacts with any other. In the absence of shared data, it would be common to start 20 GAP jobs on 20 distinct workstations in a student computer laboratory. With 200 cases, one then writes 20 different ``batch jobs'', each batch job handling 10 distinct cases.
ParGAP provides strong support for this common situation. Conceptually, one is always talking to ParGAP on a master process (the local workstation), and the master process talks to each of various slave processes. In its simplest form, one merely generates a list of inputs for the cases, and writes a suitable function that computes the information for each case. Effectively, trivial parallelism can be expressed by a parallel version of GAP's List() function (see List (for a list (and a function)) in the Reference Manual), which is called ParList() (see ParList) in ParGAP.
The following is an example of a ParGAP session that uses the ParList() function. Observe the use of BroadcastMsg() (see BroadcastMsg) to ensure that the function AnalyzePrimitivePermGroupsOfOrder() is known to the slaves.
gap> AnalyzePrimitivePermGroupsOfOrder := function(ord)
>      #returns a list of data records of all primitive 
>      #permutation groups of order ord that are represented 
>      #as permutation groups in GAP's primitive group database
>      
>      local PrimitiveGroupsOfOrder, AnalyzePrimitivePermGroup;
>    
>      PrimitiveGroupsOfOrder :=
>        ord -> List( [1..NrPrimitiveGroups(ord)],
>                     i -> PrimitiveGroup(ord,i) );
>    
>      AnalyzePrimitivePermGroup :=
>        grp -> rec( order := Size(grp),
>                    degree := NrMovedPoints(grp),
>                    baseSize := Size(BaseOfGroup(grp)) );
>    
>      return List( Filtered( PrimitiveGroupsOfOrder(ord),
>                             IsPermGroup ),
>                   AnalyzePrimitivePermGroup );
>    end;;
gap> #Define AnalyzePrimitivePermGroupsOfOrder on slaves
gap> BroadcastMsg( 
>      PrintToString("AnalyzePrimitivePermGroupsOfOrder := ",
>                    AnalyzePrimitivePermGroupsOfOrder ) );
gap> ParList( [2..256], AnalyzePrimitivePermGroupsOfOrder );
[ [ rec( order := 2, degree := 2, baseSize := 1 ) ], 
  [ rec( order := 3, degree := 3, baseSize := 1 ), 
  <...many lines of output omitted...>
As one expects, the answer is computed in parallel in a time that decreases roughly in inverse proportion to the number of slave processes. Hence a Beowulf cluster of 16 processors (1 ParGAP master and 15 ParGAP slaves) will return this information approximately 15 times as fast as a single processor, greatly facilitating interactive exploration of the properties of various groups.
If each single case can be executed quickly, then the execution time for ParList() can be dominated by the network overhead imposed by message passing between the master and slaves. In this case, an optional third parameter to ParList() specifies the number of cases to be submitted to a slave in a single message (modulo how many cases are left, of course). As an example, suppose we are interested in finding those of the first 100,000 integers that have many distinct prime factors, but to reduce the network overhead we wish to batch 1000 cases at a time to the slaves. Then one is able to execute the following.
gap> ParList([1..100000],
>            i -> Length(PrimePowersInt(i))/2, 1000);
Having seen an example where ParGAP can be useful, one next needs to know what is involved in setting up and using ParGAP. In the following, recall that ParGAP employs a master-slave architecture, with the local process being the ``master'' process, and all others being ``slaves''.
ParGAP is installed in the same way as other GAP packages. After extracting the ParGAP files, one invokes configure and make. (See Section Installing ParGAP for the details.) Instead of invoking gap, one then invokes the supplied shell script, pargap.sh (possibly modified and/or renamed; again see Section Running ParGAP for the details). Observe that, unlike most other GAP packages, one should not call the LoadPackage() function from within a GAP session to start ParGAP. As usual, master and slave processes read a .gaprc file in the home directory, if present.
The method by which slave processes are started depends on the MPI implementation used (see Section Running ParGAP). When using a system MPI library, the system MPI launcher should be used to start ParGAP, and this also starts up the slaves. One commonly-supported MPI launcher syntax is
 mpiexec -n num pargap.sh
where num is the total number of copies of pargap.sh that you wish to 
run. If, instead, ParGAP is built to use MPINU, the subset MPI implementation
included with ParGAP, then you should invoke pargap.sh directly,
and ParGAP will start up the slave processes itself. In this case it looks
for a file named procgroup in the current directory, but it can be directed
to use a different file through the -p4pg switch. Hence, if one always uses
ParGAP with the slave processes specified in myprocgroup.big, one may wish to
set up an alias or a symbolic link to a script containing the equivalent of:
pargap.sh -p4pg PATH/myprocgroup.big
Observe also that heterogeneous computing is supported, in that each slave machine may run a different operating system on different hardware.
Within ParGAP, the standard message-passing commands (send, receive, probe, etc.) are available. The slaves are numbered consecutively starting at 1. If no slave parameter is supplied for SendMsg() or SendRecvMsg(), then the default slave is 1. Most other commands take a default slave parameter, MPI_ANY_SOURCE (meaning ``ANY_SLAVE''). PingSlave() sends a ``null message'', and waits for an acknowledgement that the slave is alive.
The command SendMsg(string, i); sends the (string) message string to slave i. Slave i then evaluates string as a GAP command, and returns an answer. If a command has no return value (such as "x:=3;"), then the string "no_return_val" is returned. If multiple commands are sent, such as "x:=3; y:=4; x+y;", then only the last return value is returned. Note that the final semicolon, normally required by GAP, is optional in a command string for a message.
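A short session of this kind might look as follows; this is only a sketch, and it assumes at least one slave is running.
gap> SendMsg( "x := 3; y := 4; x + y", 1 );  # evaluated on slave 1
gap> RecvMsg( 1 );                           # only the last value is returned
7
gap> SendMsg( "z := 10", 1 );                # an assignment has no return value
gap> RecvMsg( 1 );
"no_return_val"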
Most of the other commands are clear from context. Every SendMsg() to a slave must be balanced by a RecvMsg(). The exception to this rule is that, when using MPINU, the commands FlushAllMsgs() and ParReset() are available and may be used to flush any pending messages from a slave. A SendRecvMsg(string, i) command is equivalent to the consecutive commands SendMsg(string, i) and RecvMsg(i). The command BroadcastMsg() sends its string argument to each slave, and there is no reply message. ParEval() is like BroadcastMsg(), but it also evaluates on the master process. If one is unsure whether there is a pending message, one can invoke ProbeMsgNonBlocking() or ProbeMsg(); if no slave parameter is provided, the default is MPI_ANY_SOURCE. The command ParRead() is a parallel version of GAP's Read() command; it is useful for ensuring that the slaves and master share a similar environment. Other examples of commands that are analogous to GAP's built-in commands include ParReread, ParEval and ParInstallGlobalFunction. See Section Slave Listener Commands for more complete descriptions of the above commands.
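As a brief sketch of a few of these commands (again assuming a running slave; the output shown is only indicative):
gap> SendRecvMsg( "Factorial(7)", 1 );  # same effect as SendMsg() then RecvMsg()
5040
gap> BroadcastMsg( "myGlobal := 42" );  # define myGlobal on every slave
gap> ParEval( "myGlobal := 42" );;      # define it on the slaves and on the master
gap> PingSlave( 1 );                    # check that slave 1 is still alive
true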
The first example in Section Extended Example illustrates the use of the commands mentioned above in a sample ParGAP session for which a procgroup file has defined two slaves. It is recommended that you review that example at this point.
Next, we demonstrate a simple implementation of ``streaming'' as an example of how easy it is to use the ParGAP tools to provide new functionality. To the best of this author's knowledge, the term ``streaming'' originated with Charles Leedham-Green at the conference on which this proceedings is based, where he made a general request for streaming functionality to be implemented in some software system.
The idea of streaming is that one may have two algorithms for solving a problem. One would like to solve the problem simultaneously on two processors, each using a distinct algorithm. Whichever processor finds the solution first should report back to the master process. The master process should then interrupt the other slave, so as to make it available for further computation. (NB Resetting of slaves is only possible with the MPINU library.) This is useful when one has a heuristic available that may finish early with the correct answer, or may continue for a very long time. The heuristic can then be ``streamed'' alongside the standard algorithm, in full confidence that the standard algorithm will provide a reasonable upper bound on the time to compute an answer.
One way to provide ``streaming'' functionality is by implementing a function, described below, that is inspired by GAP's First() function (see First in the GAP Reference Manual). The function First() takes a list and a boolean function as arguments, and returns the first element of the list for which the boolean function returns true. In contrast, we define a corresponding ParGAP function, ParFirstResult(), which takes a list and a (general) function as arguments. Messages are sent to the slaves causing the ith slave to evaluate the function on the ith list element. The value returned is the value obtained from whichever slave finishes first. Note that, consistent with the goal of streaming, the function signals an error if asked to evaluate a list with more entries than the number of slaves.
gap> ParFirstResult := function( list, fnc )
>   local i, result;
>   if Length(list) > TOPCnumSlaves then
>     Error("too few slaves");
>   fi;
>   for i in [1..Length(list)] do
>     SendMsg( PrintToString(
>                   "fnc :=", fnc, "; fnc(", list[i], ");"),
>              i );
>   od;
>   result := RecvMsg(); # default is MPI_ANY_SOURCE
>   ParReset(); # Interrupt all other slaves - only if using MPINU
>   return result;
> end;;
We now demonstrate the use of ParFirstResult() in ``streaming''. For our example we need the FactInt package by Stefan Kohl. To get this package if you don't have it, visit
http://www.cip.mathematik.uni-stuttgart.de/~kohlsn/factint.html
or the equivalent at one of GAP's mirror sites, and follow the easy installation instructions.
If one hasn't already included a  LoadPackage("factint");  statement
in one's .gaprc file then it is necessary to do:
gap> ParEval("LoadPackage(\"factint\")");
Loading FactInt 1.5.2 (Routines for Integer Factorization )
by Stefan Kohl, kohl@mathematik.uni-stuttgart.de
true
so that each slave (not just the master) is aware of the FactInt functions.
Above we defined ParFirstResult() on the master process. We will assume
that we have two slaves.
gap> StreamingFactInt := function(i, x)
>      local alg;
>      alg :=
>        [ x -> [ "MPQS ALGORITHM", FactorsMPQS(Factorial(x)+1) ],
>          x -> [ "CFRAC ALGORITHM", FactorsCFRAC(Factorial(x)+1) ]
>        ];
>      return alg[i](x);
>    end;;
Both StreamingFactInt(1, x); and StreamingFactInt(2, x); factor an integer, but one uses the multiple polynomial quadratic sieve algorithm (MPQS), and the other uses the continued fraction algorithm (CFRAC); the functions FactorsMPQS and FactorsCFRAC that perform these algorithms are defined by the FactInt package. We demonstrate the ``streaming'' of these algorithms in determining the factorization of 35!+1.
gap> # Now, define StreamingFactInt() on each of the slaves.
gap> BroadcastMsg( PrintToString( "StreamingFactInt :=",
>                                 StreamingFactInt ) );
gap> ParFirstResult([1,2], i->StreamingFactInt(i, 35) );
 ... resetting ...
[ "MPQS ALGORITHM",
  [ 137, 379, 17839, 340825649, 32731815563800396289317 ] ]
The ParReset() function to reset the other slaves is only available if 
ParGAP is built to use MPINU. If you are using a system MPI library then 
the other slaves will continue with their algorithm, and you will need to 
wait for them to complete and return a result before reusing them. 
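If ParReset() is unavailable, one way to make the remaining slaves reusable is simply to receive and discard their eventual replies. The sketch below is only illustrative (DrainRemainingReplies is a hypothetical helper, not part of ParGAP), and note that it blocks until the slower algorithms actually finish.
gap> # Hypothetical helper: ParFirstResult() sent one task per slave and
gap> # received one reply, so nTasks-1 replies are still pending.
gap> DrainRemainingReplies := function( nTasks )
>      local i;
>      for i in [1..nTasks-1] do
>        RecvMsg();   # default source is MPI_ANY_SOURCE
>      od;
>    end;;
gap> DrainRemainingReplies( 2 );  # here, two streams were started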
There are many examples where trivial parallelism does not suffice. Typically, this happens either when there are global variables that must be ``read'' by each slave process, and possibly updated, or else when the input for the next slave process depends on the result that arrived from the last slave process. Here we provide only the basics of the parallel model. The parallel model was described in more detail in Coo97, and a still more detailed description is contained in Chapter Basic Concepts for the TOP-C model (MasterSlave).
For non-trivial parallelism, ParGAP uses the TOP-C model Coo96. ParGAP typically invokes the TOP-C model via a command such as
MasterSlave( GenerateTaskInput, DoTask, CheckTaskResult,
             UpdateSharedData );
The four arguments, GenerateTaskInput, DoTask, CheckTaskResult, and UpdateSharedData, are ``callback'' functions written by the application programmer, and the names of those callback functions are arbitrary. (The manual for ParGAP/MPI, the earlier incarnation of ParGAP, used slightly different names for its examples, but the purpose of the arguments of MasterSlave() has not changed -- only the naming convention has.)
The only ParGAP command above is MasterSlave(). A task is an arbitrary function (here called DoTask()) executed on a slave process, that takes input from the master process and returns its output to the master process. The diagram in Section Other TOP-C Commands gives some idea of the flow of control as a task is processed. The diagram there is meant to represent the three main abstractions of the TOP-C model: (1) the task, (2) the shared data, and (3) the action. The shared data consists of any globally shared data (which should be readable by all processes). The task is a procedure executed on a slave process that takes a task input, and the shared data, and produces a task output, which depends only on the task input and shared data. Finally, the task input and task output are sent to the master process, which must then decide upon an action.
Typical actions are NO_ACTION (and the master process might save the task output in a private list of results), UPDATE_SHARED_DATA (send a message to all slaves to update the local copy of the globally shared data), and REDO (re-do the computation in case the shared data was changed by another slave process). (Earlier incarnations of ParGAP used UPDATE. Now the alternative UPDATE_SHARED_DATA is offered and we standardize on this here.)
An example invocation of MasterSlave() is shown below, where we pass the four application functions as direct arguments of MasterSlave(). The routine below implements a simplified version of the ParList() function described in Section Trivial Parallelism.
ParInstallTOPCGlobalFunction( "MyParList",
function( list, fnc )
  local result, iter;
  result := [];
  # Each invocation of GAP's iterator
  # returns next element of list.
  iter := Iterator(list);
  MasterSlave(
      # GenerateTaskInput():
      function() if IsDoneIterator(iter) then return NOTASK;
                 else return NextIterator(iter); fi; end,
      # DoTask():
      fnc,
      # CheckTaskResult():
      function(input,output) result[input] := output; 
                             return NO_ACTION; end,
      # UpdateSharedData():
      Error # We never see action:  UPDATE_SHARED_DATA
      );
  return result;
end );
The function ParInstallTOPCGlobalFunction() installs MyParList on all ParGAP processes. It also defines the version of MyParList() on the master differently from on a slave, so that a call, MyParList([1..100], x->x^2);, on the master automatically causes MyParList([1..100], x->x^2); to be invoked on the slaves with the same arguments. This is required for the TOP-C model. Note that the distinct function, ParInstallGlobalFunction(), exists for the equivalent of BroadcastMsg( "InstallGlobalFunction( MyParList, function ... end)");. Both functions should be called only from the master (possibly from inside a Read() command).
The shared data consists only of the variable fnc, which is read by all processes, but in this case is never ``updated''. Note that one need not explicitly declare variables that are in the shared data. A TOP-C shared data variable is defined as a variable whose value is read by more than one process, and which is modified only through a call to the application routine UpdateSharedData(). If we would like to see the messages as they are passed back and forth between master and slaves, we can optionally set the variable ParTrace (see ParTrace). We are then ready to execute our new parallel function.
gap> ParTrace := true;;
gap> MyParList([2..256], AnalyzePrimitivePermGroupsOfOrder);
The example above in fact requires only trivial parallelism. Hence the CheckTaskResult() parameter to MasterSlave() is especially simple. Since the action is never REDO, we achieve the maximum concurrency among slave processes, resulting in a speedup that is linear in the number of slaves.
In general, the parallel efficiency of an application, with its concomitant decisions about concurrency of slave tasks, is typically determined by the actions chosen by CheckTaskResult(). The sample below is a standard ParGAP ``idiom'' that allows one to easily set up reasonable concurrency in a parallel application.
CheckTaskResult := function( taskInput, taskOutput )
  if taskOutput = fail then return NO_ACTION;
  elif not IsUpToDate() then return REDO_ACTION;
  else return UPDATE_ACTION;
  fi;
end;
The function IsUpToDate() is a boolean function provided by ParGAP that returns true if there has been no UPDATE_SHARED_DATA action since the time that the ``corresponding'' task input was generated by a call to the user-provided function GenerateTaskInput(). Otherwise, IsUpToDate() returns false. The ``corresponding'' task input is the task input associated with that task output which was most recently seen on the master process. In the context of the user-provided function CheckTaskResult(), it is the first argument to that function.