	     MPINU (MPI from Northeastern University)
		Gene Cooperman (gene@ccs.neu.edu)
		Markos Kyzas (markos@ccs.neu.edu)

This is intended to be a minimalist subset of MPI.  The goal is to allow
people to run simple MPI subset programs on a network of workstations using a
small MPI library.  Most point-to-point (standard mode, only) and
environmental functions are included.  MPI_Barrier and MPI_Bcast (which are
collective communications) are also implemented.  Buffered, synchronous and
ready mode are not handled, nor are asynchronous messages using requests.
Most functions from the remaining MPI layers (collective communication,
derived datatypes, groups, multiple communicators) are also not supported.
Finally, secure sockets are NOT used, and so this software is subject
to malicious attack or in unusual cases to random events connecting to
the ports used by this implementation.  For additional restrictions on the
functionality, see below.

To use it, call:
make libmpi.a
Then link libmpi.a and include mpi.h in any user files.
Be sure to modify the procgroup file according to the instructions inside.
If you wish, your application can add a flag "-p4pg" followed by
a procgroup name of your choice.

Although this is a minimalist implementation, attention has been paid
to portability across several UNIX dialects using sockets over TCP/IP.
No architecture-specific conditionalizations are used.  Process startup
is handled through the procgroup file as done in MPICH/p4.  The example
procgroup file includes a description of the format.  The application
can use the -p4pg flag to specify a procgroup file other than the
default one, "./procgroup".  Hence, the "mpirun" utility from MPICH
should work here, too.  (Don't forget to modify the procgroup file for
your site when you use it.  You may also have to change localhost to a
specific hostname if localhost is not hosts.equiv at your site.)  This
code is placed in the public domain, with with hopes that others will
extend it along the same philosophical lines.  The authors would
appreciate receiving any extensions to this code, so as to possibly
integrate them into future versions.

The benefits of this distribution include easy customization, and
instructional use for socket programming and internal considerations of
an MPI implementation.  An INTERNALS file is provided with this
in mind.  The current version is about 1,200 lines of C code.  Most
functions are reasonably compact, and the call tree is for the most part
no more than three deep, making the code particularly easy to
follow.  We have also tried not to invoke malloc(), brk(), or error
handlers, so as to avoid possible conflicts with user programs.
However, the system is forced to use malloc() for receiving
out-of-order messages, such as when a receiver requests a message with
a specific tag and source and there is a pending message from that
source with a different tag.  This is the only time malloc() is used,
currently.   (See INTERNALS for the issue of deadlock and malloc().)

This work was motivated in part by our need to integrate MPI with
interpreted languages, such as LISP and GAP.  Since such languages may choose
to use malloc() (or even sbrk()), and to install their own error
handlers, it was important to affect those mechanisms as little as
possible.  Attention was also paid to properly handling very large
messages, which often occur in our applications.  Finally, a minimalist
MPI greatly aids in debugging, where it is often difficult to assign
responsibility for a phenomenon between MPI and the interpreted
language.  In a full MPI implementation, the MPI code can be as large
as some of the interpreted languages.

The subset implemented lies entirely within the point-to-point and
environment layer of MPI.  Not all of even those routines have been
updated to date.

This is still a work in progress.

0.  In particular, this preliminary version does not work for messages sent
from a process to itself.  This will soon be remedied.

1.  While the code always works with MPI_ANY_TAG, if one specifies
a specific tag to receive, then that tag must match the tag of the next
incoming message.  (i.e.:  It cannot currently take messages from the same
sender out of order.)

2.  The socket buffers are used to limit the use of malloc().  So, MPI_Send()
can block on large messages, creating the possibility of deadlock if several
processes send before doing a receive.  See INTERNALS for a description
of these issues.

3.  In particular, it's possible that a large message sent by a process to
itself may block.

4.  In heterogeneous networks, the user is responsible for converting the user
data into new byte orderings.  XDR or other mechanisms are not invoked.

5.  The software uses public ports, which are inherently insecure.
Very little checking is done that the remote process is a correct one.
This software is subject to malicious attack, and so should not be
used where security (e.g. secure sockets) is a requirement.
