The ParGAP (Parallel GAP) package provides a way of writing parallel programs using the GAP language. Former names of the package were ParGAP/MPI and GAP/MPI; the word MPI refers to Message Passing Interface, a well-known standard for parallelism. ParGAP is based on the MPI standard, and this distribution includes a subset implementation of MPI, to provide a portable layer with a high level interface to BSD sockets. Since knowledge of MPI is not required for use of this software, we now refer to the package as simply ParGAP. For more information visit the author's ParGAP home page at: http://www.ccs.neu.edu/home/gene/pargap.html
For some background reading, see Coo95 and Coo97.
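This first chapter is intended to help a new user set up ParGAP and run through some quick examples: see Section Installing ParGAP for installation, and Section Running ParGAP for how to start ParGAP (it cannot be started simply by calling LoadPackage).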
The later chapters present detailed explanations of the facilities of
ParGAP. Because parallel programming is sufficiently different from
sequential programming, this author recommends printing out at least
Chapters 1 through MasterSlave Tutorial, and skimming through those
chapters for areas of interest, before returning to the terminal to try
out some of the ideas. This document can be found in
.../pkg/pargap/doc/manual.dvi of the software distribution. You may
also want to print the index at the end of manual.dvi. In particular,
the heading example in the index, or ??example from within GAP,
should be useful. If you prefer postscript, the UNIX command dvips will
convert that file to postscript form.
The development of ParGAP was partially supported by National Science Foundation grants CCR-9509783 and CCR-9732330.
ParGAP is installed on top of an existing GAP installation. It comes with its own subset MPI implementation (currently functional only on UNIX installations), or it can use your system MPI libraries, if present. See Section Installing ParGAP for instructions on installation of ParGAP. At the time that ParGAP is invoked, a special file or command line parameter must be used to tell ParGAP how many local processes or which remote machines to use for slave processors. See section Running ParGAP for instructions on invoking ParGAP. If there are questions or bugs concerning ParGAP, please write to: gene@ccs.neu.edu
If one wishes only to try out the parallel features, the first five pages of this manual (through the section on the slave listener) will suffice for installing and using ParGAP. For the more advanced user who wishes to design new parallel algorithms or port old sequential code to a parallel environment, it is strongly recommended to also read the sections from Section Basic Concepts for the TOP-C model (MasterSlave) onwards.
ParGAP should be invoked via the script bin/pargap.sh, created by the
installation process, which invokes GAP_ROOT_DIR/bin/ARCH/pargapmpi,
where ARCH depends on your system but is the same directory in which
the gap binary is found. MPI and the higher layers will not be
available if the binary is invoked in the standard way as gap. This is
a feature, since a single binary and source distribution serves both for
the standard GAP and for ParGAP.
ParGAP is implemented in three layers: 1) MPI, 2) Slave Listener, and 3) Master Slave (TOP-C abstraction). Most users will find that the two highest layers (Slave Listener and Master Slave) meet all their needs.
1) MPI:Error break in the presence of
    errors. MPI_Init() (see MPI_Init) and MPI_Finalize()
    (see MPI_Finalize) are invoked automatically by ParGAP.
MPI_tabMPI_Send; ) will cause it to display the  calling
    syntax. The same information is displayed after  an  incorrect  call.
    The  return  value  is  typically  obvious.  MPI  is  implemented  in
    src/pargap.c. ParGAP will use a system MPI implementation if one is
    present. The distribution also includes two versions of a simple subset
    implementation of MPI, in pkg/gapmpi/mpinu/ and pkg/gapmpi/mpinu2/,
    built on top of a standard sockets interface, which can be used
    instead.
2) Slave Listener:
*Msg*
    e.g.  SendMsg()   (see SendMsg),   RecvMsg()   (see RecvMsg),
    ProbeMsg()   (see ProbeMsg).   Since   the   slave   is   in    a
    receive-eval-send loop, every SendMsg(cmd) on the master must  be
    balanced by a later RecvMsg(). SendRecvMsg()  (see SendRecvMsg)
    is provided to combine these steps. A few parallel utilities are also
    included, such as ParRead() (ParRead),  ParList()  (ParList),
    ParEval() (ParEval), etc.
    Arguments given as ordinary GAP expressions to SendMsg() or ParEval()
    would be evaluated locally before being sent across the network. For
    this reason, arguments can also be given as strings, to delay
    evaluation until reaching the destination process. Hence, real strings
    must be quoted: ParEval("x:=\"abc\";"); Additionally, multiple commands
    are valid, and the final ";" of the string is optional. So, one can write:
BroadcastMsg("x:=\"abc\"; Print(Length(x), \"\\n\")");;
3) Master Slave:
1)
2)
If you are using Linux and wish to try out ParGAP quickly, you can skip this section and let the ParGAP build process choose an MPI library for you. If you have a little more time, or are running on a different system, please read on.
ParGAP uses MPI, a standard Message Passing Interface for communicating between processes. Since the details of inter-process communication are system-specific, ParGAP relies on an external library to provide its MPI functions. An implementation of a sufficient subset of MPI, which runs on Linux and OS X, is included with ParGAP. Alternatively, an MPI library can be installed on your system before building ParGAP. Two popular MPI implementations are Open MPI and MPICH.
The MPINU library included with ParGAP provides the MPI functionality that
ParGAP needs by using Unix sockets. This implementation is sufficient for
basic ParGAP usage, but does not scale to larger systems as well as the 
alternative system libraries. It is better suited to interactive ParGAP 
sessions, since system MPI implementations can result in problems with line 
editing in ParGAP. When built with MPINU, ParGAP also enables two 
commands ParReset() and FlushAllMsgs() which can be useful when developing 
parallel programs. See 
Section Problems Running ParGAP with a System MPI Implementation for details
of these known issues with system MPI implementations. Two versions of MPINU
are included with ParGAP: the original MPINU and a newer version, called 
MPINU2. 
On Linux machines, we recommend that you use ParGAP with a system MPI implementation instead of MPINU, if possible. These implementations provide better performance and fault tolerance, and are compatible with a wider range of operating systems and hardware, including high-speed networks and proprietary high-end computing systems.
On Macs, we recommend using the original MPINU since there are currently some problems running ParGAP with both a system MPI implementation and MPINU2. Both these issues will hopefully be resolved in a future release.
By default, the ParGAP build process (see Section Installing ParGAP) tries to use a system MPI implementation if it can find one. If not, it will use MPINU. Two versions of MPINU are included with this release of ParGAP. The recommended choice is MPINU2, but the original MPINU is included as a backup in case there are problems building or running MPINU2.
Installing ParGAP should be relatively simple. However, since there are many interactions both with the GAP kernel and with the UNIX operating system, in a minority of cases, manual intervention will be necessary. If you are part of this minority, please see the section Problems Installing or Invoking ParGAP. The most common problem is the local security policy; ParGAP is more pleasant to use when you don't have to manually provide the password for each slave. See section Problems with Passwords (Getting Around Security) for suggestions in this respect.
To install the ParGAP package, move the file pargap-XXX.zoo or
pargap-XXX.tar.gz (for some version number XXX of ParGAP) into
the pkg directory in which you plan to install ParGAP. Usually, this
will be the directory pkg in the hierarchy of your version of GAP.
(In fact, currently it is not possible to have the pkg directory
separate from GAP's pkg directory; we hope to remedy this in future
versions of ParGAP so that it will also be possible to keep an additional
pkg directory in your private directories. Section Installing a GAP Package
of the GAP reference manual gives details on how to do this,
when it's possible.)
Now change into  the  pkg  directory  in  which  you  plan  to  install
ParGAP. If you got a .zoo file, unpack it with:
unzoo -x pargap-XXX
If you got a .tar.gz file and  your  tar  command  supports  the  z
option, unpack it with:
tar zxf pargap-XXX.tar.gz
or otherwise unpack in two steps with:
gunzip pargap-XXX.tar.gz
tar xvf pargap-XXX.tar
Whether you got the .zoo or .tar.gz archive you should now have a new
directory pargap. As for a generic GAP package, do:
cd pargap
./configure
make
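The unpack and build steps above can be collected into one helper. The following is only a sketch: the function name plan_install is hypothetical, XXX stands for the version number as in the text, and the function merely prints the commands rather than running them, so it can be tried without an archive present.

```shell
#!/bin/sh
# plan_install ARCHIVE: print the commands described above for unpacking
# a ParGAP archive and building it. Nothing is executed (dry run).
# plan_install is a hypothetical helper, not part of ParGAP.
plan_install() {
    archive=$1
    case "$archive" in
        *.zoo)    echo "unzoo -x ${archive%.zoo}" ;;
        *.tar.gz) echo "tar zxf $archive" ;;
        *)        echo "unknown archive type: $archive" >&2; return 1 ;;
    esac
    echo "cd pargap"
    echo "./configure"
    echo "make"
}

# Show the plan for a .tar.gz archive.
plan_install pargap-XXX.tar.gz
```

Once the printed plan looks right for your archive, the same commands can be typed at the shell prompt from inside the pkg directory.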
This builds the ParGAP files. ParGAP also needs to rebuild parts of 
GAP to enable the MPI hooks. It may also need to re-run the GAP 
configure if you have a dedicated MPI compiler. By default, the ParGAP
configure will prompt you to do this by hand if necessary, and then to 
restart the ParGAP build. If you are happy for the ParGAP build process
to run the GAP configure for you if needed, with no arguments, then run 
ParGAP's configure with
./configure --with-basic-gap-configure
The configure script will attempt to find a system MPI implementation that
it can use. If it cannot find one, it will use MPINU2, the more recent of the
two MPINU subset implementations included with the ParGAP package. You can use
the --with-mpi= configure option to specify a different behaviour, and you
can also set your own MPI compiler and options if you wish. See the help text
provided by ./configure -h for full details.
After doing the configure and make steps of ParGAP's  installation
process (see Section Installing ParGAP), you should find in ParGAP's
bin subdirectory a script
pargap.sh
which you should use to start ParGAP. (ParGAP can not be  started
by starting GAP 4 in the usual way, and using LoadPackage;  doing
so will result in Info-ed  advice  to  read  this  section.)  Edit  the
pargap.sh script if necessary, copy it to a standard path and rename it
according to how you intend to call ParGAP (e.g. rename it: pargap).
Note:
The script  pargap.sh  defines  the  program  that  runs  ParGAP  as
pargapmpi. In fact, after installation pargapmpi is a  symbolic  link
to the GAP binary named gap. The same binary runs  both  GAP  and
ParGAP; when the binary is invoked as gap GAP runs in  the  usual
way without any parallel features; only when the  binary  is  invoked  as
pargapmpi    are    the    parallel    features    incorporated.    See
Section Modifying the GAP kernel for more details.
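The idea in the note above, one program that behaves differently depending on the name under which it is invoked, can be illustrated with a small shell script. This is an illustration of the general technique only, not ParGAP's kernel code: the temporary directory, the stand-in script and the messages it prints are all made up for the demonstration.

```shell
#!/bin/sh
# Illustration only: one program file, two names, two behaviours.
# The directory and the stand-in "gap" script are made up for this demo.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

cat > "$demo_dir/gap" <<'EOF'
#!/bin/sh
# Decide what to do from the name this program was invoked under.
case "$(basename "$0")" in
    pargapmpi) echo "parallel features enabled" ;;
    *)         echo "standard mode" ;;
esac
EOF

# pargapmpi is only a symbolic link to the same file.
ln -s gap "$demo_dir/pargapmpi"

sh "$demo_dir/gap"        # invoked as gap
sh "$demo_dir/pargapmpi"  # invoked as pargapmpi
```

Both invocations read the same file; only the invoked name differs, which is why a single binary can serve both GAP and ParGAP.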
Your ParGAP should now be ready to use. Now read the next section,
which describes how to run ParGAP (if you are reading this from
GAP's on-line help, type: ?>).
After a successful build, you will see a message saying that ParGAP is
ready to use, and confirmation of whether a system MPI library or MPINU will 
be used. The method of running ParGAP depends on this MPI choice, and the 
MPI library is auto-detected, or can be specified, in configure, as 
described in Section Installing ParGAP. The pros and cons of the two 
different library variants are discussed in Section Choosing an MPI Library.
We will assume that you have copied the pargap.sh script to a location
on your search path and renamed it as pargap, as suggested in 
Section Installing ParGAP.
If you are using a system MPI library:
ParGAP should be started using an MPI launcher script. The name and syntax
of the command to start MPI processes can vary, and you should check your 
system MPI documentation for details. However, one common launcher is 
mpiexec, and the following command should work with both Open MPI and MPICH,
and most other MPI-2 implementations:
mpiexec -n 3 pargap
This will start three copies of ParGAP: one master and two slaves. These processes will all run on your local machine. See Section Invoking ParGAP with Remote Slaves (when using a system MPI library) for how to configure and run processes on remote slaves.
If you are using MPINU:
In ParGAP's bin subdirectory you should find a procgroup file which
defines the master and slave processes that will be used  by  ParGAP.
When ParGAP is started, the MPINU library looks for a file called procgroup  
in the current directory, unless the -p4pg option is used. Thus if you renamed
your shell script pargap, the following  are  valid  ways  of  starting
ParGAP:
pargap
(if current directory contains the file: procgroup), or
pargap -p4pg myprocgroupfile
(where myprocgroupfile is the complete path of your  procgroup  file --
there is no restriction on how you name it). The default procgroup file
defines one master and two slaves on the local machine. For instructions on 
how to run remote slaves, see 
Section Invoking ParGAP with Remote Slaves (when using MPINU).
If you had trouble installing or starting ParGAP, see the 
section Problems Installing or Invoking ParGAP. Otherwise you are ready 
to test your installation. Try the example in the following section (if you 
are reading this from GAP's on-line help, type: ?>).
After  installation,  try  it  out.  Invoke  ParGAP  as  described  in
Section Running ParGAP and try the example below (but  substitute  your
own program where you see "/home/gene/myprogram.g").  The  commands  in
this first example are also found in the README file. So, you may  wish
to copy text from the README file and paste it into a ParGAP session.
If you have not specified any additional machines to the MPI launcher, or you
are using the unmodified procgroup file, then your remote slaves
will be other processes on your local machine. It is a good idea  to  run
only on your local machine for your first experiments and while  you  are
debugging parallel programs. When  you  wish  to  experiment  with  using
remote machines, you can then proceed to 
section Invoking ParGAP with Remote Slaves (when using a system MPI library) 
or section Invoking ParGAP with Remote Slaves (when using MPINU) depending
on which MPI library ParGAP has been built to use.
gap> # This assumes your procgroup file includes two slave processes.
gap> PingSlave(1); #a `true' response indicates Slave 1 is alive
true
gap> # Print() on slave appears on standard output 
gap> # i.e. after the master's prompt.
gap> SendMsg( "Print(3+4)" );
gap> 7
gap> # A <return> was input above to get a fresh prompt.
gap> #
gap> # To get special characters (including newline: `\n')
gap> # into a string, escape them with a `\'.
gap> SendMsg( "Print(3+4,\"\\n\")" );
gap> 7
gap> # Again, a <return> was input above after the 7 and new-line
gap> # were printed to get a fresh prompt.
gap> #
gap> # Each SendMsg() is normally balanced by a RecvMsg().
gap> SendMsg( "3+4", 2);
gap> RecvMsg( 2 );
7
gap> # The following is equivalent to the two previous commands.
gap> SendRecvMsg( "3+4", 2);
7
gap> # The two SendMsg() commands that were sent to Slave 1 earlier have
gap> # responses that are waiting in the message queue from that slave.
gap> # Check that there is a message waiting. With some MPI implementations
gap> # the message is not immediately available, but when ProbeMsg() does
gap> # return true then RecvMsg() is guaranteed to succeed. 
gap> ProbeMsgNonBlocking( 1 );
false
gap> ProbeMsgNonBlocking( 1 );
true
gap> # Print() is a `no-value' function, and so the result of a RecvMsg() 
gap> # in both these cases is "<no_return_val>".
gap> RecvMsg( 1 );
"<no_return_val>"
gap> RecvMsg( 1 );
"<no_return_val>"
gap> # As with Print() the result of Exec() appears on standard
gap> # output, and the result is "<no_return_val>".
gap> SendRecvMsg( "Exec(\"pwd\")" ); # Your pwd will differ :-)
/home/gene
"<no_return_val>"
gap> # Define a variable on a slave
gap> SendRecvMsg( "a:=45; 3+4", 1 );
7
gap> # Note "a" is defined on slave 1, not slave 2.
gap> SendMsg( "a", 2 ); # Slave prints error, output on master
gap>  Variable: 'a' must have a value
gap> # <return> entered to get fresh prompt.
gap> RecvMsg( 2 ); # No value for last SendMsg() command
"<no_return_val>"
gap> RecvMsg( 1 );
45
gap> # Execute analogue of GAP's List() in parallel on slaves.
gap> squares := ParList( [1..100], x->x^2 );
[ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 
  289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 
  900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 
  1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 
  2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 
  3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 
  5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 
  7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 
  9216, 9409, 9604, 9801, 10000 ]
gap> # Send a large, local (non-remote) data structure to a slave
gap> Concatenation("x := ", PrintToString([1..10]*2));
"x := [ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 ]\n\000"
gap> SendMsg( Concatenation("x := ", PrintToString([1..10]*2)) ); 
gap> RecvMsg();
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 ]
gap> # Send a local (non-remote) function to a slave
gap> myfnc := function() return 42; end;;
gap> # Use PrintToString() to define myfnc on all slave processes
gap> BroadcastMsg( PrintToString( "myfnc := ", myfnc ) );
gap> SendRecvMsg( "myfnc()", 1 );
42
gap> # Ensure problem shared data is read into master and slaves.
gap> # Try one of your GAP program files instead.
gap> ParRead( "/home/gene/myprogram.g");
Now that you have done a fairly rudimentary test of ParGAP you should be ready to do something a little bit more interesting:
gap> ParInstallTOPCGlobalFunction( "MyParList",
> function( list, fnc )
>   local result, iter;
>   result := [];
>   iter := Iterator(list);
>   MasterSlave( function() if IsDoneIterator(iter) then return NOTASK;
>                           else return NextIterator(iter); fi; end,
>                fnc,
>                function(input,output) result[input] := output;
>                                       return NO_ACTION; end,
>                Error
>              );
>   return result;
> end );
gap> MyParList( [1..25], x->x^3 );
master -> 1: 1
master -> 2: 2
2 -> master: 8
1 -> master: 1
master -> 1: 3
master -> 2: 4
2 -> master: 64
1 -> master: 27
master -> 1: 5
master -> 2: 6
2 -> master: 216
1 -> master: 125
master -> 1: 7
master -> 2: 8
2 -> master: 512
1 -> master: 343
master -> 1: 9
master -> 2: 10
2 -> master: 1000
1 -> master: 729
master -> 1: 11
master -> 2: 12
2 -> master: 1728
1 -> master: 1331
master -> 1: 13
master -> 2: 14
2 -> master: 2744
1 -> master: 2197
master -> 1: 15
master -> 2: 16
2 -> master: 4096
1 -> master: 3375
master -> 1: 17
master -> 2: 18
2 -> master: 5832
1 -> master: 4913
master -> 1: 19
master -> 2: 20
2 -> master: 8000
1 -> master: 6859
master -> 1: 21
master -> 2: 22
2 -> master: 10648
1 -> master: 9261
master -> 1: 23
master -> 2: 24
2 -> master: 13824
1 -> master: 12167
master -> 1: 25
1 -> master: 15625
[ 1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728, 2197, 2744, 
  3375, 4096, 4913, 5832, 6859, 8000, 9261, 10648, 12167, 13824, 15625 ]
gap> ParInstallTOPCGlobalFunction( "MyParListWithAglom",
> function( list, fnc, aglomCount )
>   local result, iter;
>   result := [];
>   iter := Iterator(list);
>   MasterSlave( function() if IsDoneIterator(iter) then return NOTASK;
>                           else return NextIterator(iter); fi; end,
>                fnc,
>                function(input,output)
>                  local i;
>                  for i in [1..Length(input)] do
>                    result[input[i]] := output[i];
>                  od;
>                  return NO_ACTION;
>                end,
>                Error,  # Never called, can specify anything
>                aglomCount
>              );
>   return result;
> end );
gap> MyParListWithAglom( [1..25], x->x^3, 4 );
master -> 1: (AGGLOM_TASK): [ 1, 2, 3, 4 ]
master -> 2: (AGGLOM_TASK): [ 5, 6, 7, 8 ]
1 -> master: [ 1, 8, 27, 64 ]
2 -> master: [ 125, 216, 343, 512 ]
master -> 1: (AGGLOM_TASK): [ 9, 10, 11, 12 ]
master -> 2: (AGGLOM_TASK): [ 13, 14, 15, 16 ]
1 -> master: [ 729, 1000, 1331, 1728 ]
2 -> master: [ 2197, 2744, 3375, 4096 ]
master -> 1: (AGGLOM_TASK): [ 17, 18, 19, 20 ]
master -> 2: (AGGLOM_TASK): [ 21, 22, 23, 24 ]
1 -> master: [ 4913, 5832, 6859, 8000 ]
2 -> master: [ 9261, 10648, 12167, 13824 ]
master -> 1: (AGGLOM_TASK): [ 25 ]
1 -> master: [ 15625 ]
[ 1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728, 2197, 2744, 
  3375, 4096, 4913, 5832, 6859, 8000, 9261, 10648, 12167, 13824, 15625 ]
If you wish an accelerated introduction to the models of parallel programming provided here, you might wish to read the beginning of Chapter Slave Listener through section Slave Listener Commands, and then proceed immediately to Chapter Basic Concepts for the TOP-C model (MasterSlave).
The ParGAP package was designed and written by Gene Cooperman, College of Computer Science, Northeastern University, Boston, MA, U.S.A.
If you use ParGAP to solve a problem then please send a short email to gene@ccs.neu.edu about it, and cite the ParGAP package as follows:
\bibitem[Coo99]{Coo99}
      Cooperman, Gene,
      {\sl Parallel GAP/MPI (ParGAP/MPI)}, Version 1,
      College of Computer Science, Northeastern University, 1999,
      \verb+http://www.ccs.neu.edu/home/gene/pargap.html+.
ParGAP can be built to use either a system MPI library or the included MPINU library. The command to run ParGAP is different in the two cases. If ParGAP has been built using MPINU then you should skip this section and proceed to section Invoking ParGAP with Remote Slaves (when using MPINU). Otherwise, please read on.
After ParGAP has been installed, a script bin/pargap.sh will have been
created  which   (after   any   changes   you   needed   to   make;   see
Section Installing ParGAP) you should use to invoke ParGAP. Installers 
are encouraged to treat pargap.sh in analogy to gap.sh. For example, if 
your site  has  copied  gap.sh  to /usr/local/bin/gap, then you should
also look for the pargap.sh script as /usr/local/bin/pargap. It simplifies
the remote slave configuration if ParGAP can be found on the standard
path on each machine, and we'll assume that in this section ParGAP can
be invoked simply as pargap.
When built with a system MPI installation, ParGAP must be invoked using
the system's MPI launcher. This may go under several names, but the command
name mpiexec is suggested in the MPI-2 specification, and is supported by 
both Open MPI and MPICH, two common implementations of that specification.
The basic usage is
mpiexec -n num pargap
to launch num copies of ParGAP (i.e. one master and (num−1) slaves). 
With no other parameters, these will all be launched on the host machine.
A configuration file can be used to specify hosts for remote slaves. The syntax of this file differs between Open MPI and MPICH, but in both cases the configuration file is a text file listing the host names and the number of processes to run on each host, one host per line. By default, one process is run per host.
When using Open MPI, an example hostfile is
# Example Open MPI hostfile. Comments begin with #
#
# The following node is a single processor machine:
foo.example.com
# The following two nodes are dual-processor machines:
bar.example.com slots=2
yow.example.com slots=2
This hostfile is passed to mpiexec using
mpiexec -n num -hostfile hostfile pargap
Processes are allocated round-robin style. For example, if we choose num to
be seven then the first process (the master) will run on foo. The
slaves will run two on bar, two on yow and a further one each on foo and
bar. 
When using MPICH, the equivalent machinefile is
# Example MPICH machinefile. Comments begin with #
#
# The following node is a single processor machine:
foo.example.com
# The following two nodes are dual-processor machines:
bar.example.com:2
yow.example.com:2
and the command to start ParGAP using these hosts will be
mpiexec -n num -machinefile machinefile pargap
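Since the two file formats differ only in how the process count is written, both can be produced from one list of hosts. The sketch below assumes an input format of our own choosing (one "HOST SLOTS" pair per line); the function names are hypothetical, and the output follows the two example files shown above rather than any complete description of either format.

```shell
#!/bin/sh
# Each input line is "HOST SLOTS". The two functions print the same host
# list in the Open MPI hostfile syntax and in the MPICH machinefile syntax
# shown above. A host with one slot is written without a count.
# Both function names are hypothetical helpers.
to_openmpi_hostfile() {
    while read -r host slots; do
        [ -n "$host" ] || continue
        if [ "${slots:-1}" -gt 1 ]; then
            echo "$host slots=$slots"
        else
            echo "$host"
        fi
    done
}

to_mpich_machinefile() {
    while read -r host slots; do
        [ -n "$host" ] || continue
        if [ "${slots:-1}" -gt 1 ]; then
            echo "$host:$slots"
        else
            echo "$host"
        fi
    done
}

hosts='foo.example.com 1
bar.example.com 2
yow.example.com 2'

echo "$hosts" | to_openmpi_hostfile
echo "$hosts" | to_mpich_machinefile
```

Redirecting the output of either function to a file gives something that can be passed to mpiexec with -hostfile or -machinefile respectively.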
For further information, such as specifying hosts on the command line, or finer control of how processes are distributed between hosts, or if you have a different MPI implementation, then please see your MPI documentation.
Unless you have any problems with the installation or running ParGAP, you can skip the rest of this chapter and move on to Chapter Slave Listener.
If ParGAP has been built to use the supplied MPINU library then ParGAP includes the facility (on Linux) to start up and manage remote slaves without needing an external MPI launcher. If ParGAP is built using a system MPI library then please read section Invoking ParGAP with Remote Slaves (when using a system MPI library) instead.
We'll assume that when ParGAP was built the script bin/pargap.sh was
copied to /usr/local/bin/pargap (see Section Installing ParGAP).
ParGAP can then be run by calling pargap. In addition, there  must  be  
a file, procgroup, in the current directory, or alternatively, if you wish
to use a single procgroup file for all jobs, and that procgroup file is in  
/home/joe, then you can alias pargap to pargap -p4pg /home/joe/procgroup.
The  procgroup  file  has  a  simple  syntax,  taken from the MPICH 
(not MPICH2) implementation of MPI. A # in column 1  introduces
a comment line. The first non-comment line should be local 0, verbatim.
This line declares the master process as the local process.  Other  lines
are of the form:
host-machine 1 pargap-script
e.g.
regulus.ccs.neu.edu 1 /usr/local/bin/pargap
The first field is the hostname for a remote process.  The  second  field
specifies one thread per process. (ParGAP recognizes only the  value 1
for the second field.) The  third  field  is  an  absolute  pathname  for
ParGAP, as it would be called on the remote process. Note that you can
repeat the same line twice if you want two remote ParGAP processes  on
the same processor. The default procgroup provided in the  distribution
will have lines of form:
localhost 1 path-of-provided-pargap.sh
If you change path-of-provided-pargap.sh to just, say,  pargap,  this
will work only if pargap is in your path on the  remote  machine  shell
(localhost in this case), using your default shell. On  most  machines,
localhost is an alias for the local processor. This is a  good  default
for debugging, so that you don't disturb users on other machines.
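A procgroup file in the syntax just described is easy to generate. The following sketch assumes a hypothetical helper, make_procgroup, which takes the path of the ParGAP script followed by one host name per slave; it writes the file to standard output.

```shell
#!/bin/sh
# make_procgroup SCRIPT HOST...: print a procgroup file with the master
# declared as "local 0" and one slave line per HOST argument.
# Repeating a host name starts several slaves on that host.
# make_procgroup is a hypothetical helper, not part of ParGAP.
make_procgroup() {
    script=$1
    shift
    echo "# procgroup generated by make_procgroup"
    echo "local 0"
    for host in "$@"; do
        echo "$host 1 $script"
    done
}

# One master and two slaves, all on the local machine.
make_procgroup /usr/local/bin/pargap localhost localhost
```

Redirect the output to a file named procgroup in the current directory, or to any other file that is then named with the -p4pg option.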
MPI will use a line
host-machine 1 pargap-script
to create a UNIX subprocess executing:
ssh host-machine pargap-script
Suppose host-machine is regulus.ccs.neu.edu  and  pargap-script  is
/usr/local/bin/pargap as in the above example,  and  we  were  to  have
trouble invoking ParGAP, then it would be a good idea to try  invoking
ssh regulus.ccs.neu.edu from a UNIX prompt and  if  that  succeeds,  to
then try executing the full ssh command.
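When debugging, it can help to list the ssh commands that correspond to each slave line of a procgroup file, so that each can be tried by hand. The sketch below assumes the file format described above; procgroup_ssh_cmds is a hypothetical helper that only prints the commands and never contacts a remote machine.

```shell
#!/bin/sh
# procgroup_ssh_cmds FILE: for each slave line "HOST 1 SCRIPT" in FILE,
# print the command "ssh HOST SCRIPT". Comment lines (# in column 1),
# blank lines and the "local 0" line are skipped. Nothing is executed.
# procgroup_ssh_cmds is a hypothetical helper, not part of ParGAP.
procgroup_ssh_cmds() {
    while read -r host threads script; do
        case "$host" in
            ''|'#'*|local) continue ;;
        esac
        echo "ssh $host $script"
    done < "$1"
}

# Demonstration on a temporary procgroup file.
pg=$(mktemp)
trap 'rm -f "$pg"' EXIT
cat > "$pg" <<'EOF'
# example procgroup
local 0
regulus.ccs.neu.edu 1 /usr/local/bin/pargap
EOF

procgroup_ssh_cmds "$pg"
```

Each printed line can then be pasted at a UNIX prompt to check that the remote login and the script path both work.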
A typical problem is that the remote processor  requires  a  password  to
login.   MPI   requires   a   login   without    passwords.    This   can
be set up for ssh.  See man ssh. Sometimes, PAM is also used for user
authentication (see /etc/pam.conf).  Consult  your  system  staff   for
further analysis. If your site uses an alternative to ssh, there  is  a
solution here: add the lines
#############################################################################
##
##  SSH . . . . . . . . . . . . . . . . . . . . remote shell used by ParGAP
##
##
SSH=myssh
export SSH
before the GAP block with the exec line. (Of course, the  #  lines
are not needed; they are comments.)
Note that the remote ParGAP process will not read from standard input,
although signals such as SIGINT (^C) may be received by  the  remote
process. However, the remote ParGAP process  will  write  to  standard
output, which is relayed to the local process. So,
gap> SendMsg("Exec(\"hostname\")", 2);
will execute and print from the remote process.
If you still have problems, here is a list of things to check. This section considers general problems when installing or running ParGAP. The two sections after this one consider problems specific to using MPINU or a system MPI library respectively.
./configure --with-mpi=MPINU
top.  The  Linux
    version of top sorts by memory usage if you type M.
make tries to automatically create:
pkg/pargap/bin/pargap.sh
/bin/gap.sh. GAP_ROOT  was
    specified when  you  executed  ./configure  GAP_ROOT/bin/gap.sh, does gap come up? If so, compare
    it   with   pargap.sh   and   check   for   correct   settings   in
    .../pkg/pargap/bin/pargap.sh?
ssh remote-hostnamessh instead of ssh, then there  is  a
    security issue. Read Section Problems with Passwords (Getting Around Security), and possibly man sshd.
man ssh tells  you
    the security model at your site. Then read Section Problems with Passwords (Getting Around Security).
pargap listed in .../pkg/ALLPKG?
    [It's needed to autostart slaves.]
gap> MPI_Initialized();
cd  to  a  directory  of  the  same  name  as  your  local
    directory. Check your assumptions about the remote machine. Try:
gap> SendRecvMsg("Exec(\"pwd\")"); SendRecvMsg("UNIX_Hostname()");
gap> SendRecvMsg("UNIX_Getpid()");
-b and/or -q 
    switches to ParGAP when it starts, to disable the banner or all messages
    respectively. See Section Ref:Command Line Options of the GAP Reference
    Manual for further details.
If you have problems running ParGAP, and ParGAP is built to use the supplied MPINU library, then this section lists some things to check, in addition to the general issues listed in the previous section. If you are using a system MPI implementation instead of MPINU, this section can be ignored, but you should read the next section instead.
procgroup file?
    [It looks in the current directory for procgroup, or for:
... -p4pg PATH/procgroup
procgroup file in your current directory 
    set correctly? Test it. If you are calling it on a remote host, manually 
    type:
ssh HOSTNAME ParGAP
procgroup, e.g.
ssh denali.ccs.neu.edu /usr/local/gap4r3/bin/pargap.sh
exec is used to save process overhead. Also try:
ssh HOSTNAME exec ParGAP
/tmp/pargapmpi-ssh.xx
xx is replaced by the process id of the ParGAP
    process.
SO_KEEPALIVE and variants.  
    (See man setsockopt.)
    This periodically sends null messages so the  remote  machine  does
    not think that the originating  machine  is  dead.  However,  if  the
    remote machine fails to reply, the  local  process  sends  a  SIGPIPE
    signal to notify current processes of a broken  socket,  even  though
    there might have been only a temporary lapse in connectivity.
    ssh specifies KeepAlive yes by default, but setting KeepAlive no
    might get you through some transient lapses in  connectivity  due  to
    high congestion. 
    You may also want to experiment with: setenv SSH "ssh -n"
CALLBACK_HOST, as in the example below.
# [ in sh/bash/... ]
CALLBACK_HOST=denali.ccs.neu.edu; export CALLBACK_HOST
# [ in csh/tcsh/... ]
setenv CALLBACK_HOST denali.ccs.neu.edu
sh) somewhere between  the  first
    and last line of .../pkg/pargap/bin/pargap.sh.
./configure --with-mpi=MPINU
Here is a list of known issues when using a system MPI library with ParGAP, and some solutions or workarounds. Not all of these issues will manifest themselves on all architectures and all MPI implementations. If you are having problems building or running ParGAP, you should check this section as well as Section Problems Installing or Invoking ParGAP.
rlwrap utility, if available. For example, if 
  ParGAP is run using mpiexec, then try
rlwrap mpiexec -n 3 pargap
rlwrap has already seen you use. For more 
  information, try man rlwrap.
FlushAllMsgs() (see FlushAllMsgs) is not available 
  when using a system MPI implementation, since tests show that 
  ProbeMsgNonBlocking(), which it uses (see ProbeMsgNonBlocking) cannot be 
  relied upon to always return true the first time that it is called after a
  message has been sent. If your system MPI implementation does exhibit this 
  desired behaviour for ProbeMsgNonBlocking() then you can install your own
  local copy of FlushAllMsgs() by copying the code for this function from 
  lib/slavelist.g, removing the if statement and renaming the function.
ParReset() (see ParReset) is not available when using a
  system MPI implementation. When using an MPINU library, the slaves are launched
  by ParGAP itself and so can be contacted and restarted, but with a system
  MPI library the slaves are launched by mpiexec (or whichever MPI launcher
  you use) and so cannot be reset from within ParGAP. There is no known
  workaround for this.
malloc to allocate their own
  memory. MPINU avoids the use of malloc as much as possible, but
  system MPI implementations may not be as careful. This can be resolved by
  starting ParGAP with the -s command-line switch, which asks
  ParGAP to pre-allocate memory before it starts. You can safely 
  pre-allocate more memory than you will actually need since physical memory
  will only be mapped when it is actually used, so for example you could 
  allocate 3Gb:
mpiexec -n 3 pargap -s 3g
The -a and -m switches can also be used to control memory usage. See 
  Section Ref:Command Line Options of the GAP Reference Manual for 
  further information.
News of any other issues or solutions would be gratefully accepted.
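The rlwrap and memory pre-allocation workarounds above can be combined in one launch command. The following is a minimal sketch, not part of the ParGAP distribution: build_launch_cmd is a hypothetical helper name, and the function only prints the command it would run, so it can be inspected before use. It assumes mpiexec and pargap are the launcher and executable names on your system, as in the examples above.

```shell
#!/bin/sh
# Sketch only: build (but do not run) a ParGAP launch command for a
# system MPI library. build_launch_cmd is a hypothetical helper name.
build_launch_cmd() {
    nprocs=$1   # number of MPI processes, as in "mpiexec -n 3"
    mem=$2      # value passed to the -s switch, for example 3g

    cmd="mpiexec -n $nprocs pargap -s $mem"

    # Prefix with rlwrap only when the utility is installed.
    if command -v rlwrap >/dev/null 2>&1; then
        cmd="rlwrap $cmd"
    fi

    printf '%s\n' "$cmd"
}

launch_cmd=$(build_launch_cmd 3 3g)
echo "$launch_cmd"
```

Once the printed command looks right, it can be pasted into the shell or run with eval "$launch_cmd".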
There is a simple test to see if you need to read this section. Pick a
remote machine, HOSTNAME, that you wish to execute on, and type: ssh
HOSTNAME. If this did not work, also try any alternative remote login
command available on your system (for example, rsh HOSTNAME). If you were
asked for your password, then you and your system administrator may need
to talk about security policy. If you were successful with an alternative
to ssh, then set the environment variable, SSH, to the alternative
value, as described in item 3 below.
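The test described above can also be scripted. The sketch below is an illustration under stated assumptions, not part of ParGAP: check_remote_login is a hypothetical helper name, and it relies on the OpenSSH option BatchMode=yes, which makes ssh fail rather than prompt for a password. If SSH is set to a command that does not accept that option, the check will not be meaningful.

```shell
#!/bin/sh
# Sketch only: report whether a remote login works without a password.
# check_remote_login is a hypothetical helper, not a ParGAP command.
check_remote_login() {
    host=$1
    # Use the command named in $SSH if it is set, otherwise plain ssh.
    # $SSH is left unquoted on purpose so that values such as "ssh -n"
    # are split into a command and its arguments.
    if ${SSH:-ssh} -o BatchMode=yes "$host" true >/dev/null 2>&1; then
        echo "ok: $host accepts logins without a password"
    else
        echo "failed: $host needs a password or is unreachable"
        return 1
    fi
}
```

Usage: check_remote_login HOSTNAME, where HOSTNAME is the remote machine you intend to use for a slave process.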
.shosts file to your home directory (for ssh).
ssh to
    start remote processes. However, if the  environment  variable  SSH
    was set, the script  uses  the  value  of  the  environment  variable
    instead of ssh. This may be useful, if you have  your  own  script,
    myssh, that automatically gets around  the  security  issues.  Then
    just type:
SSH=myssh; export SSH   # [ in sh/bash/... ]
setenv SSH myssh        # [ in csh/tcsh/... ]
sh) somewhere between  the
    first and last line of .../pkg/pargap/bin/pargap.sh.  (The  example
    for ssh was given earlier.)
ssh: man ssh mentions some possibilities for giving the  password
    the first time, and then having ssh remember that  future  logins  to
    that machine are authorized for the duration of  the  session.  Don't
    overlook the use of $HOME/.ssh/config to set  special  parameters,
    such as specifying a different login name on the remote machine. Some
    parameters of interest  might  be  KeepAlive,  RSAAuthentication,
    UseRsh. You may also find useful information in man sshd.
/tmp/pargapmpi-ssh.$$
Note that this package modifies the GAP src  and  bin  files,  and
creates a new GAP kernel. This new GAP  kernel  can  be  shared  by
traditional users of the old, sequential  GAP  kernel,  and  by  those
doing parallel processing.
The GAP kernel will have identical behavior to the old  GAP  kernel
when invoked through  the  gap.sh  script  or  the  bin/@GAParch@/gap
binary. The new ParGAP variables will appear to the end user ONLY if
the GAP binary was invoked as pargapmpi:  a  symbolic  link  to  the
actual GAP binary. The script, pargap.sh, does this.
So, in a multi-user environment, traditional users can  continue  to  use
gap.sh  without  noticing  any  difference.  Only  an   invocation   of
pargap.sh will add the new features.
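The idea of one binary behaving differently depending on the name it was invoked under can be illustrated with a toy shell script. This is only an analogy for the pargapmpi symbolic link described above, not the actual kernel code: the files created here live in a temporary directory and are hypothetical stand-ins, and the messages are invented for the demonstration.

```shell
#!/bin/sh
# Illustration only: a toy program that inspects the name it was invoked
# under, mimicking the gap / pargapmpi symbolic-link arrangement.
demo=$(mktemp -d)

# The stand-in "binary": its behaviour depends on basename of $0.
cat > "$demo/gap" <<'EOF'
case "$(basename "$0")" in
    pargapmpi) echo "parallel features enabled" ;;
    *)         echo "sequential behaviour" ;;
esac
EOF

# A symbolic link with the special name, pointing at the same file.
ln -s gap "$demo/pargapmpi"

# Run the same file under both names (via sh, so no exec bit is needed).
as_gap=$(sh "$demo/gap")
as_pargap=$(sh "$demo/pargapmpi")

echo "invoked as gap:       $as_gap"
echo "invoked as pargapmpi: $as_pargap"

rm -rf "$demo"
```

Both invocations read the same file; only the invocation name differs, which is why users of gap.sh see no change.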
In a future version of GAP, it is hoped that the  GAP  kernel  will
have enough "hooks" so that no modification of the  GAP  kernel  is
required. At that time, it will also be possible to speed up the  startup
time for ParGAP. Much of the startup time is  caused  by  waiting  for
GAP to read its library files. It will be possible to use  the  GAP
function, SaveWorkspace() to save a version  with  the  GAP  library
pre-loaded. That saved version can then be used to  start  up  ParGAP.
This is not currently possible, because ParGAP needs  to  get  at  the
command line of GAP before the GAP kernel sees it.
Comments and contributions to a ParGAP user library, or any other type of assistance, are gratefully accepted.
Gene Cooperman gene@ccs.neu.edu
ParGAP manual