	     MPINU (MPI from Northeastern University)
		Gene Cooperman (gene@ccs.neu.edu)

This is intended to be a minimalist subset of MPI.  The goal is to allow
people to run simple MPI subset programs on a cluster using a
small MPI library.  Most point-to-point (standard mode, only) and
environmental functions are included.  MPI_Barrier and MPI_Bcast (which are
collective communications) are also implemented.  Buffered, synchronous and
ready mode are not handled, nor are asynchronous messages using requests.
Most functions from the remaining MPI layers (collective communication,
derived datatypes, groups, multiple communicators) are also not supported.
Finally, secure sockets are NOT used, so this software is vulnerable
to malicious attack and, in unusual cases, to stray connections from
unrelated programs to the ports used by this implementation.  For
additional restrictions on the functionality, see the end of this file.

In addition, note that MPI does not guarantee that a computation can
continue past an error.  To see why, consider the following issues.
    If we use MPI_Bsend or MPI_Isend (a nonblocking send), then a separate
"send thread" must send asynchronously.  (Otherwise the calling thread
may block if the kernel's network buffer is full.  An alternative is to
ask the kernel for a larger network buffer, and this is being considered
for a future version, but the kernel is not required to honor the
request.)  If the "send thread" fails, it is too late for MPI_Bsend or
MPI_Isend to return an error code so that the caller can recover.
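
As a rough illustration of the remark about asking the kernel for a larger
network buffer, the C sketch below requests a send buffer size with
setsockopt(SO_SNDBUF) and reads back the size actually in effect with
getsockopt().  The helper name request_send_buffer is hypothetical and is
not part of MPINU; the point is only that the caller must check the result
rather than assume the request was honored.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of MPINU.
 * Ask the kernel for a send buffer of `wanted` bytes on socket `fd`, then
 * read back the size that is actually in effect.
 * Returns that size in bytes, or -1 on error. */
int request_send_buffer(int fd, int wanted)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &wanted, sizeof(wanted)) < 0)
        return -1;

    /* The kernel may clamp or round the request, so ask what we got. */
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) < 0)
        return -1;

    return granted;
}
```

A caller would compare the returned size with the size of the message it
intends to send, and fall back to another strategy if the buffer is too
small.
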
  In MPI_Bsend, MPI is entitled to use the buffer that the user attached
with MPI_Buffer_attach, and it is then up to the user to make sure that
the buffered data has been sent before calling MPI_Bsend again.
  In MPI_Isend, the user must check for completion using MPI_Test or a
similar call, and the send is not guaranteed to have completed until such
a call reports completion.  However, MPI_Isend should be asynchronous,
typically with a separate thread completing the send on its own.
  MPINU currently provides only MPI_Send, and always creates a separate
"send thread" so that the caller never blocks.  Hence, if the send thread
incurs an error, it is too late to return an error from MPI_Send.  In the
future, we may provide MPI_Send for potentially blocking sends (although
short sends will typically be buffered by the network), and MPI_Bsend for
nonblocking sends that use the "send thread".  MPI_Bsend would copy to an
MPINU buffer, regardless of what user buffers are provided through
MPI_Buffer_attach.
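
The timing problem described above can be sketched in plain C with POSIX
threads.  This is not MPINU's code: the names sketch_send,
sketch_wait_for_send and struct send_job are hypothetical, one thread per
message is a simplification, and the private copy is made with malloc()
only to keep the example short.  The sketch shows that the "send" call has
already returned success by the time the background thread discovers the
failure.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* All names below are hypothetical; this is not MPINU's implementation. */
struct send_job {
    int fd;        /* connected socket (or any file descriptor) */
    size_t len;    /* number of bytes to send */
    char *data;    /* private copy of the caller's message */
};

static pthread_t send_thread;      /* the background "send thread" */
static int send_thread_errno = 0;  /* error seen by the send thread, if any */

/* Runs in the send thread: write the whole message, handling short writes. */
static void *send_worker(void *arg)
{
    struct send_job *job = arg;
    size_t done = 0;

    while (done < job->len) {
        ssize_t n = write(job->fd, job->data + done, job->len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the write */
            send_thread_errno = errno;  /* too late to tell the caller */
            break;
        }
        done += (size_t)n;
    }
    free(job->data);
    free(job);
    return NULL;
}

/* Returns 0 as soon as the message has been handed to the send thread.
 * A nonzero return only reports failures that occur before the hand-off. */
int sketch_send(int fd, const void *buf, size_t len)
{
    struct send_job *job = malloc(sizeof(*job));

    if (job == NULL)
        return -1;
    job->fd = fd;
    job->len = len;
    job->data = malloc(len ? len : 1);
    if (job->data == NULL) {
        free(job);
        return -1;
    }
    memcpy(job->data, buf, len);
    if (pthread_create(&send_thread, NULL, send_worker, job) != 0) {
        free(job->data);
        free(job);
        return -1;
    }
    return 0;
}

/* Wait for the send thread and report the error it saw (0 if none). */
int sketch_wait_for_send(void)
{
    pthread_join(send_thread, NULL);
    return send_thread_errno;
}
```

Calling sketch_send() on an invalid descriptor still returns 0; the error
only becomes visible later through sketch_wait_for_send(), which has no
counterpart in a plain blocking MPI_Send interface.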

To build the library, run:
make libmpi.a
Then include mpi.h in your source files and link them against libmpi.a.
Be sure to modify the procgroup file according to the instructions inside.
If you wish, your application can add a flag "-p4pg" followed by
a procgroup name of your choice.
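
As a minimal sketch of the "-p4pg" convention, the C function below scans
the command line for the flag and falls back to the default "./procgroup"
when it is absent.  The function name find_procgroup is hypothetical, and
MPINU's own argument handling may differ in detail.

```c
#include <string.h>

/* Hypothetical helper illustrating the convention; not MPINU's code.
 * Return the procgroup file named after a "-p4pg" flag in argv, or the
 * default "./procgroup" if the flag is absent or has no argument. */
const char *find_procgroup(int argc, char **argv)
{
    int i;

    /* Stop one short of the end so that argv[i + 1] is always valid. */
    for (i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-p4pg") == 0)
            return argv[i + 1];
    }
    return "./procgroup";
}
```

For a hypothetical application a.out, the command "a.out -p4pg my.pg"
would select my.pg, while "a.out" alone would select ./procgroup.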

Although this is a minimalist implementation, attention has been paid
to portability across several UNIX dialects using sockets over TCP/IP.
No architecture-specific conditionalizations are used.  Process startup
is handled through the procgroup file as done in MPICH/p4.  The example
procgroup file includes a description of the format.  The application
can use the -p4pg flag to specify a procgroup file other than the
default one, "./procgroup".  Hence, the "mpirun" utility from MPICH
should work here, too.  (Don't forget to modify the procgroup file for
your site when you use it.  You may also have to change localhost to a
specific hostname if localhost is not listed in hosts.equiv at your
site.)  This code is placed in the public domain, with hopes that others
will extend it along the same philosophical lines.  The authors would
appreciate receiving any extensions to this code, so as to possibly
integrate them into future versions.

The benefits of this distribution include easy customization and
instructional use, both for socket programming and for understanding the
internals of an MPI implementation.  An INTERNALS file is provided with this
in mind.  The current version is about 1,200 lines of C code.  Most
functions are reasonably compact, and the call tree is for the most part
no more than three deep, making the code particularly easy to
follow.  We have also tried not to invoke malloc(), brk(), or error
handlers, so as to avoid possible conflicts with user programs.
However, the system is forced to use malloc() for receiving
out-of-order messages, such as when a receiver requests a message with
a specific tag and source and there is a pending message from that
source with a different tag.  Currently, this is the only place where
malloc() is used.  (See INTERNALS for the issue of deadlock and malloc().)
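
The out-of-order case can be pictured with the following C sketch of a
queue of pending messages.  The structure and the names pending_add and
pending_take are hypothetical and are not taken from MPINU (the INTERNALS
file describes the real design), and wildcards such as MPI_ANY_TAG are
left out.  It shows why some dynamic allocation is hard to avoid: a
message that arrives before it is requested has to be stored somewhere.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structures; not MPINU's actual data layout. */
struct pending_msg {
    int source;                 /* rank of the sender */
    int tag;                    /* message tag */
    size_t len;                 /* payload length in bytes */
    struct pending_msg *next;   /* next pending message, in arrival order */
    char data[];                /* C99 flexible array member: the payload */
};

static struct pending_msg *pending_head = NULL;
static struct pending_msg *pending_tail = NULL;

/* Store a message that arrived before any receiver asked for it.
 * Returns 0 on success, -1 if malloc() fails. */
int pending_add(int source, int tag, const void *buf, size_t len)
{
    struct pending_msg *m = malloc(sizeof(*m) + len);

    if (m == NULL)
        return -1;
    m->source = source;
    m->tag = tag;
    m->len = len;
    m->next = NULL;
    memcpy(m->data, buf, len);
    if (pending_tail != NULL)
        pending_tail->next = m;
    else
        pending_head = m;
    pending_tail = m;
    return 0;
}

/* Remove the oldest pending message matching (source, tag) and copy at
 * most `cap` bytes of it into buf.
 * Returns the number of bytes copied, or -1 if no message matches. */
long pending_take(int source, int tag, void *buf, size_t cap)
{
    struct pending_msg *prev = NULL, *m;

    for (m = pending_head; m != NULL; prev = m, m = m->next) {
        if (m->source == source && m->tag == tag) {
            size_t n = m->len < cap ? m->len : cap;

            memcpy(buf, m->data, n);
            if (prev != NULL)
                prev->next = m->next;
            else
                pending_head = m->next;
            if (m == pending_tail)
                pending_tail = prev;
            free(m);
            return (long)n;
        }
    }
    return -1;
}
```

A receive for tag 7 would first call pending_take(); only if that fails
would it read from the socket, calling pending_add() for every message
with a different tag that it has to skip over.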

This work was motivated in part by our need to integrate MPI with
interpreted languages, such as LISP and GAP.  Since such languages may choose
to use malloc() (or even sbrk()), and to install their own error
handlers, it was important to affect those mechanisms as little as
possible.  Attention was also paid to properly handling very large
messages, which often occur in our applications.  Finally, a minimalist
MPI greatly aids in debugging, where it is often difficult to assign
responsibility for a phenomenon between MPI and the interpreted
language.  In a full MPI implementation, the MPI code can be as large
as some of the interpreted languages.

The subset implemented lies within the point-to-point and
environment layers of MPI.  Not even all of those routines have been
implemented to date.

This is still a work in progress.  Additional restrictions:

1.  In heterogeneous networks, the user is responsible for converting the user
data into new byte orderings.  XDR or other mechanisms are not invoked.

2.  The software uses public ports, which are inherently insecure.
Very little checking is done that the remote process is a correct one.
This software is subject to malicious attack, and so should not be
used where security (e.g. secure sockets) is a requirement.
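
For restriction 1 (byte ordering), one common approach is to convert
integer data to network byte order before sending and back to host order
after receiving.  The C sketch below does this for arrays of 32-bit
integers with the standard htonl() and ntohl() functions.  The helper
names ints_to_network and ints_to_host are hypothetical and not part of
MPINU, and other data types (for example floating point) would need
their own treatment.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; not part of MPINU. */

/* Convert n 32-bit integers from host order to network (big-endian)
 * order, in place.  Call this before handing the buffer to MPI_Send. */
void ints_to_network(uint32_t *v, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        v[i] = htonl(v[i]);
}

/* Convert n 32-bit integers from network order back to host order,
 * in place.  Call this on the buffer filled in by MPI_Recv. */
void ints_to_host(uint32_t *v, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        v[i] = ntohl(v[i]);
}
```

On a big-endian host both functions leave the data unchanged, so the same
user code can run on every machine in a heterogeneous cluster.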
