Network Woking Group                                         Danny Cohen
Internet Draft                                                   Myricom
Expire in six months                                          Craig Lund
                                                       Mercury Computers
                                                           Tony Skjellum
                                            Mississippi State University
                                                            Thom McMahon
                                            Mississippi State University
                                                           Robert George
                                            Mississippi State University
                                                            October 1997

             The End-to-End (EEP) PacketWay Protocol for
          High-Performance Interconnection of Computer Clusters
              <draft-ietf-pktway-protocol-eep-spec-02.txt>

Status of this Memo

     This document is an Internet-Draft.  Internet-Drafts are working
     documents of the Internet Engineering Task Force (IETF), its
     areas, and its working groups.  Note that other groups may also
     distribute working documents as Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-
     Drafts as reference material or to cite them other than as
     "work in progress."

     To view the entire list of current Internet-Drafts, please check
     the "1id-abstracts.txt" listing contained in the Internet-Drafts
     Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
     (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
     Coast), or ftp.isi.edu (US West Coast).


Table of Content:

    1.  Introduction...................................................2
    1a.  PktWay and IP.................................................2
    1b.  General.......................................................4
    1c.  The Level-2 Operation of PktWay...............................5
    2.  A note about the PktWay documents..............................6
    3.  Notations......................................................7
    4.  PktWay EEP Messages............................................7
    4a.  The PktWay Message Structure..................................7
    4b.  The Optional Fields...........................................8
    5.  Optional Sequence of L2RHs and Symbols.........................9
    5a.  L2 Routing Headers (L2RHs)....................................9
    5b.  Symbols......................................................11
    6.  EEP Header....................................................12
    6a.  Version......................................................12
    6b.  Priority.....................................................12
    6c.  Destination-Type.............................................13
    6d.  Packet Type Extension........................................14
    6e.  Packet Type..................................................14
    6f.  Endianness...................................................15
    6g.  Padding Length...............................................15
    6h.  Data Length..................................................15
    6i.  Options flag.................................................15
    6j.  Reserved.....................................................15
    6k.  Source Address...............................................16
    7.  Optional Header Fields........................................16
    8.  Optional Data Block...........................................17
    9.  Optional Trailer Fields.......................................17
   10.  EEP Trailer...................................................18
   11.  Appendix-A: Recommendation for PktWay Address Assignment......18
   12.  Appendix-B: Glossary..........................................19
   13.  Appendix-C: Acronyms and Abbreviations........................20
   14.  Appendix-D: PktWay at a Glance ("cheat-sheet")................22
   15.  Security Considerations.......................................23
   16.  Editor's Address..............................................23

Cohen et al                                                    [Page  1]

Internet-Draft         PktWay End-to-End Protocol           October 1997


1. Introduction

   PktWay is an open family of specifications for inter-networking high
   performance SANs (System Area Networks) and high performance LANs
   (Local Area Networks) into computing clusters.

   Most modern SANs have much in common, such as high data rates, low
   message latency and low bit error rates.  Such SANs are often packet
   networks made of point-to-point links with flow control and source
   routing.  Yet these SANs do not provide heterogeneous networking
   support, and are subsequently incapable of direct inter-
   communications with other SANs.  PktWay's goal is to provide high
   performance "internetting" of such SANs and of high performance LANs.

   The core PktWay protocol comprises the End-to-End Protocol (EEP) and
   the Router-to-Router-Protocol (RRP).  This document specifies the EEP
   (End-to-End) protocol of PktWay.  A companion document
   ("Specification for the Router-to-Router (RRP) PktWay Protocol")
   specifies the Router-to-Router protocol of PktWay.

   Computing clusters and modern MPPs (Massively Parallel Processing
   systems) are sets of processors interconnected by high performance
   SANs.  Examples are Intel's Paragon and ASCI-red, CRAY's T3D and T3E,
   and IBM's SP2 and SP3.  Most modern SANs have much in common, such as
   high data rates, low message latency and low bit error rates.  Such
   SANs are often packet networks made of point-to-point links with flow
   control and source routing.

   Unfortunately, there is no efficient way to "internet" these SANs -
   to allow each computing node to have high performance communication
   directly with any other computing node, in any other interconnected
   SAN.  Hence, there is no way to interconnect such high performance
   SANs to form as efficient computing cluster as possible.

   The objective of PktWay is to provide high performance communication
   among all the processors in a cluster of tightly coupled
   heterogeneous SANs.  PktWay borrows heavily from the experience and
   wisdom of IP, with a few modifications needed for high performance.
   PktWay sacrifices generality and scalability to improve performance.

   1a. PktWay and IP

      IP is the general solution for "internetting" heterogeneous
      diverse networks, proven for over 25 years.  However, IP was
      designed for the generality required for Wide Area Networks,
      without regard to the high performance requirements of tightly
      coupled systems.  In addition, IP was designed to addresses
      "systems" rather than individual processors in MPPs (as PktWay
      does).  For example, a 9,000 processor system is not expected to
      be assigned 9,000 IP addresses.


Cohen et al                                                    [Page  2]

Internet-Draft         PktWay End-to-End Protocol           October 1997


      PktWay is slightly below IP in the OSI Reference Model.  It has
      many Level-3 features, like IP, but also can support IP as if
      PktWay was a Level-2 protocol.  Hence, it is below IP.  In
      addition, PktWay supports Level-2 optimizations (such as source
      routing).

      Like IP, as a heterogeneous network layer, PktWay packets are
      transported by the native data-link layer of each SAN.  As a
      result, PktWay packets are encapsulated with any native routing
      headers and trailers as required by the local network fabric.

      Like IP, PktWay uses routers between its SANs.  When an HR
      (half-router) receives a packet for a destination on its own
      SAN it forwards that packet directly to its destination.  If
      the packet is for a destination out of this SAN, the HR forwards
      it to another HR which is en route to that destination.

      Unlike IP, RRP defines the communication among the HRs both
      intra-router and inter-router.  In the IP environment only the
      intra-router communication is not defined, only the inter-router
      communication.

      Unlike IP, the PktWay routers do not have to pop each packet back
      to Level-3, and are capable of operating entirely at Level-2, if
      this operation is requested by the communicating hosts.  This
      Level-2 operation is discussed later in this introduction.

      Like IP, the PktWay protocol utilizes the native capabilities of
      its constituent SANs and routers.  PktWay defines neither how each
      HR maps the network in the SAN to which it is attached, nor how
      each half-router constructs SAN-headers for each of its hosts.
      The PktWay protocol also does not define how error-checking is
      conducted by each SAN (e.g., CRC8, CRC32, CRC64, or anything
      else).  Instead, PktWay assumes that these capabilities are native
      to each SAN, and defines only how these maps are exchanged, and
      how these error indications are carried from where they were
      detected, to the destination node.

      Like IP, the PktWay protocol defines neither how routes are
      selected, nor what corrective actions should be taken in case of
      faults.  Instead, PktWay provides the information needed by the
      host nodes for devising routes and detecting and circumventing
      faults.

      Like in IP environments, when hosts are powered up they may
      contact their default half-routers to register themselves and to
      inquire about other hosts (by name or node capabilities).  This
      registration could be used in support of dynamic discovery
      procedures.  The half-routers may help nodes discover each other
      (like IP's DNS) and may provide routing alternatives, possibly
      with different characteristics (e.g., MTU, length, and cost).
      The PktWay protocol does not specify how to choose among them.

Cohen et al                                                    [Page  3]

Internet-Draft         PktWay End-to-End Protocol           October 1997


      To sum it all up, PacketWay has learned many lessons from IP, but
      has been heavily optimized for high performance SANs, while IP is
      the protocol of choice for WANs.


   1b. General

      PktWay supports resource discovery, by name or capabilities.

      PktWay's unit of data is 64-bit long (8 bytes).  Hence, a PktWay
      packet is always a multiple of 8B quantities.  PktWay provides
      hosts with padding as required.

      PktWay iself is big-Endian 8B-word based.  Hence, the terms "first
      bit" and "first byte" are equivalent to MSbit and MSByte.

      PktWay handles the Little vs. Big-Endian issue for its payload by
      providing a field in the EEP header which defines the endianness
      and "the chunk-size" of the data in the payload (Data Block).  The
      intent is that byte-swapping hardware, if any, could be used to
      invert the endianness of payloads with uniform data elements
      (e.g., all the data being 32-bit floating point).  Although this
      approach does not address the problems of transporting general
      structures (e.g., a "struct" of C), it does allows the
      participation of smart memory cards as PktWay nodes, as well as
      supporting direct memory access (DMA) operations.

      The PktWay protocol is designed to allow wormhole (or
      "cut-through") forwarding, in which a router can start forwarding
      packets after receiving the first four bytes only (that include
      the PktWay-protocol version, priority, and the destination-type)
      without waiting for information that may not be needed for the
      packet forwarding task.  This is unlike IP routers that receive
      the sender address before receiving the destination address, even
      though the former is not always needed whereas the latter is.

      PktWay's addresses are short (23 bits) because, unlike IP, PktWay
      is not designed for global operation.  The amount of state that is
      stored in the half-routers per node (type, name, paths,
      capabilities, etc.)  makes it impractical for scalability beyond a
      few tens (hundreds?) of thousands of nodes, over a (relatively)
      small number of SANs.

      PktWay does not support SAR (Segmentation And Reassembly).
      Instead, it provides means for hosts to discover the minimum
      transmission unit (MTU) over several alternative paths to any
      other node.  A PktWay packet must never exceed the minimum MTU
      along all the network hops from the source node to the destination
      node.



Cohen et al                                                    [Page  4]

Internet-Draft         PktWay End-to-End Protocol           October 1997


      Several protocol extensions, which are layered on the core PktWay
      protocol, have been defined.  These include dynamic resource and
      routing discovery, secure PktWay, and multicast PktWay.  These
      protocol extensions will be described in documents to be provided
      later.

      1c. The Level-2 Operation of PktWay

      PktWay's goal is to move data from a source node, (on some
      arbitrary SAN) to a destination node, (either on the same SAN, or
      on another SAN).  Sources and destinations can be physical
      entities, such as a processor or a smart memory board, or logical
      entities, such as a group of cooperating processes or a collection
      of threads.  Sources, destinations, and routers are such nodes.

      Within each PktWay configuration all nodes have unique 23-bit
      physical PktWay addresses.  A system designer can assign these
      PktWay addresses manually.  Alternatively, the optional PktWay
      Server Layer may provide a way to assign and discover addresses
      dynamically.  Throughout this document "address" always means the
      23-bit physical PktWay address.

      To optimize for performance, PktWay has a data transfer mode that
      directly leverages the native message routing schemes used within
      each SAN.  This mode uses a "Planned Transfer" paradigm.  During
      the planning phase, a source node collects information on optimal
      routes to a destination, expressed in the various native formats
      of all the intervening SANs.  A source node later uses this
      information for low latency transfers to that destination.  In
      PktWay, the transfer phase of a Planned Transfer is called
      "L2-forwarding".  The RRP document demonstrates the use of
      L2-forwarding.

      PktWay also supports a more traditional data transfer mode that
      requires no planning.  Such transfers specify the destinations by
      their addresses only.  In PktWay, this more traditional approach
      is called "L3-forwarding".

      PktWay packets may be routed by Level-2 (L2) forwarding, Level-3
      (L3) forwarding, or a combination thereof.

      In L3-forwarding (similar to IP forwarding), the L2-routing
      through each SAN is determined by an inter-SAN router upon
      entering that SAN.  The router prefixes the packet with an L2
      routing header (such as a source route) corresponding to the
      destination address specified in the packet directing the packet
      either to its destination or to an intermediate router.  It is a
      task for that router to determine the L2-routing-header
      corresponding to the given PktWay-address.



Cohen et al                                                    [Page  5]

Internet-Draft         PktWay End-to-End Protocol           October 1997


      In L2-forwarding the source prefixes the packet with all the
      L2-routing headers needed along the entire path to the
      destination.  Each router has only to get the L2-routing-header
      from the leading L2RH (L2-Routing-Header record) that was provided
      by the source.

      PktWay allows hosts to construct a source-route built entirely of
      Level-2 headers, allowing each SAN to exploit the full performance
      of its native interconnection fabric.  These SAN-headers
      (equivalent to MAC-headers) are provided by the SANs that will use
      them, in their native format.  PktWay does not define the format
      of the local routing envelope.  Instead, it defines how the
      encapsulated PktWay packets should be passed between half-routers,
      leaving it up to the local network of each SAN to properly deliver
      the packet.

      If hosts so prefer, they can address their destinations either by
      any arbitrary name, a PktWay physical address (which is handled
      like the Level-3 IP-address), or by concatenating a sequence of
      Level-2 SAN-headers.  Although the generation of a sequence of L2
      Routing Headers requires more effort to construct initially,
      PktWay source routing results in considerably lower network
      latencies, as the packets are allowed to cut-through route through
      the intervening SAN networks .



2. A note about the PacketWay Documents

   The PacketWay protocol is defined by a series of documents:
      * EEP (End-to-End Protocol)
      * RRP-1 (basic Router-to-Router Protocol)
      * RRP-2 (dynamic inter-SAN routing)
      * PktWay enumerations

   Each of these documents should include the same "PacketWay at a
   Glance (Cheat-Sheet)", this note, and the Notations page.  They
   should include also (as appendices) a copy of the PacketWay glossary
   of terms and its acronyms and abbreviations list.

   The EEP and the RRP documents will be published first as
   Internet-Drafts and later as Proposed-Standards, Draft-Standards,
   and Standards.

   The Enumeration Document will be first published as an
   "Informational-RFC" and later will be maintained by IANA.

   The enumeration document may be attached to the EEP/RRP documents, as
   a matter of convenience.  The enumeration is NOT a part of the PktWay
   standard, just as RFC0739 (the original "Assigned Numbers" RFC) is
   not a part of RFC0791, that defines IP.

Cohen et al                                                    [Page  6]

Internet-Draft         PktWay End-to-End Protocol           October 1997

   Similarly, the EEP-document has "Appendix-A: A Recommendation for
   PktWay Address Assignment" which is a recommendation only and NOT
   a part of the PktWay standard, just as IP-address-assignment is not
   a part of RFC0791, that defines IP.

   The appendices are brought for clearance and convenience.  They are
   not a part of the PktWay specification.

   Information about the PktWay activity may be found in the URL:
   http://www.erc.msstate.edu/PktWay/


3. Notations

   The shorter "PktWay" is used for "PacketWay".

   8B means "8-byte" (64 bits).

   0x indicates hexadecimal values,  e.g., 0x0100 is 2^8=256(decimal).

   0b indicates binary values, e.g., 0b0100 is 4(decimal).

   xxxx indicate a field that is discarded without any checking (e.g.,
        padding).

   [fff] indicates that fff is an optional field, when appropriate.

   [exp] in equations, is the integral part, rounded down, of `exp`.
         e.g., [23/8]=2.

   All length fields do not include themselves, and therefore may be
   zero.

   Lengths are specified either (a) by byte count, implying that some
   padding bytes may follow to fill 8B-words, or (b) by 8B-word count
   and PL, the number of trailing padding bytes (with PL between 0
   and 7).



4. PktWay EEP Messages

   4a. The Pktway Message Structure

      PktWay messages have 6 components, including 4 optional ones:

      [1]: [Optional Sequence of L2-Routing-Headers and Symbols]
      [2]: EEP Header (16 bytes) (PH)
      [3]: [Optional Header fields] (OH)
      [4]: [Optional, Most likely: Data Block] (DB)
      [5]: [Optional Trailer fields] (OT)
      [6]: EEP Trailer (8 bytes) (TAIL)


Cohen et al                                                    [Page  7]

Internet-Draft         PktWay End-to-End Protocol           October 1997


   4b. The Optional Fields

      [1]: as explained later, if the 9th+10th bits of a messages are
           0b10 then the message starts with an L2RH, but if the 9th
           through the 12th bits of a message are 0b1111 then this
           message starts with a "symbol".  The other values of these
           4 bits indicate the lack of L2RH and symbols and that the
           message begins with the EEP-header.

      [3]: if the h-bit in the EEP header [2] is 1 then there are
           optional header (OH) fields.  The sequence of these OH fields
           is terminated with an OH field marked as being the last one
           (with C=1).

      [4]: if DL>0, in the EEP header, zero then a Data Block (DB) is
           included in this message.

      [5]: the optional header fields, [3], may indicate that some
           optional trailer fields are present after the DB, [4].  The
           order and the formats of the trailer fields are defined by
           the optional header fields.

      It is expected that most messages will have Data Blocks (DB), and
      that most messages will not have Optional Header fields (OH), nor
      Optional Trailer fields (OT).

      Leading L2RHs and symbols [1] are consumed by the HRs before
      reaching the destination which receives only the other components,
      [2] through [6].  These parts, [2] to [6], constitute the
      End-to-End Protocol of PktWay.

      TAIL, the EEP trailer, [6] may be modified along the way to the
      destination, unlike [2], [3], [4] and [5], which arrive exactly as
      sent by the source.

      Each PktWay packet may be first L2-forwarded (zero or more times)
      before being L3-forwarded (zero or more times).

      Although PktWay headers and trailers are always in Big Endian
      order, the byte order of the Data Block is not defined by PktWay.

      Since all the elements of PktWay (L2RHs, EEP-headers, optional
      fields, data, and EEP-trailers) are always multiples of 8B-words,
      it is recommended that PktWay headers (and data) be aligned on
      8B-boundaries in the nodes' memory.







Cohen et al                                                    [Page  8]

Internet-Draft         PktWay End-to-End Protocol           October 1997


5. Optional Sequence of L2RHs and Symbols [1]

   PktWay messages may start with a mix of L2RHs and symbols.

   A PktWay source may specify native routes, by placing the native
   routes before the PktWay Header.  The native routes (for all SANs and
   LANs beyond the initial one) must appear within a sequence of PktWay
   L2-Routing-Header records (L2RH).

   In certain situations symbols may be included among the L2RHs.  These
   symbols are used for conveying information to the routers that handle
   the messages, such as about encryption.  A symbol does not specify
   its destination and is processed (and consumed) by the entity that
   encounters it.

   In L2-forwarding each intermediate HR consumes an L2RH and the
   preceeding symbols (if any).  When a packet reaches its destination
   all of [1] (the Optional Sequence of L2RHs and Symbols) should be
   consumed.

   5a. L2 Routing Headers (L2RHs) Records

      The contents of the L2RH are totally SAN dependent, with the
      exception of the first 2 bytes that distinguish this record from
      an EEP-header and also provide the Length (0<L<64) indicating the
      number of routing bytes of that L2RH (not including these 2
      bytes).

      This distinction (between L2RHs and EEP-headers) is necessary for
      routers that L2-forward packets starting with L2RHs, but
      L3-forward packets starting with EEP-headers.  Similarly, hosts
      expect packets to start with EEP-headers (with optionally
      preceeding symbols), and may discard packets that start with
      L2RHs.

      It's up to each SAN to provide padding, as needed, to fill the
      L2RH words.

      Each L2RH is defined by the entity that will process it.  In
      addition to routing information per se, it may also include
      demuxing information such as a local message-type.  For example,
      over Myrinet the L2RH should end with 0x0300 which is the
      Myrinet-type assigned to PktWay (and possibly some padding, too).

      The L2RH must contain enough information to allow a router to
      create any necessary local routing headers and trailers.  Although
      the low-level network implementation is beyond the scope of this
      document, the native source routing format must be documented in
      sufficient detail to allow for heterogeneous network
      interoperability.


Cohen et al                                                    [Page  9]

Internet-Draft         PktWay End-to-End Protocol           October 1997


      When a PktWay message is encapsulated inside any native SAN
      message (Paragon or Myrinet, for example), it's up to that SAN to
      distinguish between it and its own native packets.  This is not a
      PktWay issue.  For example, Myrinet uses its Message-Type to
      recognize PktWay messages.

      PktWay-Routers on boundaries between SANs L2-forward packets
      starting with L2RH or L3-forward packets starting with
      EEP-headers.  L2RH are distinguished from EEP-headers by the value
      of the first two bits of the Destination-Type field.

      5a1. L2RH FORMAT:

         Each L2RH is in the format:

+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|10LLLLLL|  SR01  |  SR02  |........|........|........|  xxxx  |
+--------+--------+--------+--------+--------+--------+--------+--------+
          ^^

         The first 2 bits are vv=0b00 for the working version of the
         protocol.  They may have other values for experimental
         versions.

         The next 6 bits should be all zeroes.

         The next two bits must be 0b10 to indicate that this is an L2RH
         record.  This 0b10 was chosen to be consistent with the 0b10 of
         PktWay-addresses, as described in [2] below.

         The next 6 bits are the byte count (L) of the routing
         information that starts in the next byte and is followed by as
         many padding bytes as needed to fill to the next 8B-boundary.

         L does not include itself, hence it could be between 0 and 63.
         However, since this record contains some routing bytes, L is
         greater than 0.  The total number of 8B-words in the L2RH is
         [(L+9)/8] where the square brackets indicate the integer part,
         rounded down, of the quantity within.  Therefore, the number of
         padding bytes is PL=8*[(L+9)/8]-2-L.

      5a2. L2RH EXAMPLES:

         An L2RH with an SR with 5 routing bytes:

        0b10   L=5    #1       #2       #3       #4       #5    padding
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|10000101|  SR01  |  SR02  |  SR03  |  SR04  |  SR05  |  xxxx  |
+--------+--------+--------+--------+--------+--------+--------+--------+
          ^^ |<---------- routing information ----------->|


Cohen et al                                                    [Page 10]


Internet-Draft         PktWay End-to-End Protocol           October 1997


         An L2RH with an SR with 13 routing bytes:

        0b10  L=13    #1       #2       #3       #4       #5       #6
+--------+--------+--------+--------+--------+--------+--------+--------+
|00000000|10001101|  SR01  |  SR02  |  SR03  |  SR04  |  SR05  |  SR06  |
+--------+--------+--------+--------+--------+--------+--------+--------+
|  SR07  |  SR08  |  SR09  |  SR10  |  SR11  |  SR12  |  SR13  |  xxxx  |
+--------+--------+--------+--------+--------+--------+--------+--------+
    #7       #8       #9       #10      #11      #12      #13    padding




   5b. Symbol Records

      5b1. Symbol Format:

         Each symbol is in the format:

+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|1111ssss|ssssssss|ssssssss| Length |  data  |........|........|
+--------+--------+--------+--------+--------+--------+--------+--------+
          ^^^^<---- Symbol-Type --->

         The 5th byte is the byte-count (L) of the data for this field
         that starts in the next byte, and is padded with as many
         padding bytes as needed to fill 8B-words.

         The length (L) does not include itself, hence it is between 0
         and 255.  The total number of 8B-words in the symbol L2RH is
         [(L+12)/8] where the square brackets indicate the integer part,
         rounded down, of the quantity within.  Therefore, the number of
         padding bytes is PL=8*[(L+12)/8]-2-L.

         Symbols may be mixed among the L2RHs, before the EEP-header.

         The values of the Symbol-Type field are defined in the PktWay
         Enumeration document.

      5b2. Symbol Example:

         A symbol with 9 data bytes.

        0b1111<---- Symbol Type --->L=9 Bytes
+--------+--------+--------+--------+--------+--------+--------+--------+
|vv000000|1111ssss|ssssssss|ssssssss|00001001|  data1 |  data2 |  data3 |
+--------+--------+--------+--------+--------+--------+--------+--------+
|  data4 |  data5 |  data6 |  data7 |  data8 |  data9 |  xxxx  |  xxxx  |
+--------+--------+--------+--------+--------+--------+--------+--------+



Cohen et al                                                    [Page 11]


Internet-Draft         PktWay End-to-End Protocol           October 1997



6. EEP Header [2]

   The EEP (aka PH) has 16 bytes.

 2   6               24                     16                16
+-+------+-------+--------+---------+--------+--------+--------+--------+
|V|  P   |    Destination-Type      |  Type-Extension |   Packet-Type   |
+-+-+---++--------------------------+-+------+--------+--------+--------+
| E | PL| Data-Length>=0 (8B-words) |h|  RZ  |0     Source-Address      |
+---+---+--------+--------+---------+-+------+--------+--------+--------+
     4    3             25              1   7                24

   These fields are described below:

                                   Bytes.bits
   a. Version                    (V)      0.2
   b. Priority                   (P)      0.6
   c. Destination-Type           (DT)     3.0
   d. Packet Type Extension      (TE)     2.0
   e. Packet Type                (PT)     2.0
   f. Endianness                 (E)      0.4
   g. Padding Length             (PL)     0.3
   h. Data Length                (DL)     3.1
   i. Options flag               (h)      0.1
   j. Reserved                   (RZ)     0.7
   k. Source Address             (SA)     3.0


   6a. Version (V) 2 bits

      This field is static.  Its 2 bits are 0b00 for the working version
      of the protocol.  These bits should have other values for
      co-existing experimental versions.

   6b. Priority (P) unsigned integer, 6 bits

      It is anticipated that some SANs, especially those working in real
      time, will want to implement priorities.  This field supports such
      usage.

      All ones is the highest priority, and all zeroes the lowest.
      Ideally, packets with higher priority should gain access to
      contested resources before packets with lower priority.
      Implementations may ignore the Priority field.







Cohen et al                                                    [Page 12]


Internet-Draft         PktWay End-to-End Protocol           October 1997


   6c. Destination-Type (DT) 24 bits

     The purpose of this field is to specify the header type, as well as
     the destination of the packet, when applicable.


     This field may specify:

        * A physical PktWay address (of 23 bits);
        * An L2-Routing-Header (L2RH) of a variable length;
        * A logical address (of 20 bits); or
        * A symbol (of 20 bits).

     In addition, it is anticipated that additional types will be needed
     in the future.

     A variant of Huffman coding is used to accommodate all these
     methods for the Destination-Type field.  This is done by assigning
     the MSbit of 0 to physical addresses, 2 MSbits of 0b10 to L2RH,
     3 MSbits of 0b110 to future needs, 4 MSbits of 0b1110 to logical
     addresses, and 4 MSbits of 0b1111 to symbols.

     This assignment is summarized in the following table:

                              MSbits | Method
                             --------+----------
                               0xxx  | Physical
                               10xx  | L2RH
                               110x  | Reserved
                               1110  | Logical
                               1111  | Symbol

     A single C-style 16-way switch can dispatch quickly the protocol
     processor to the right handler required for any of the methods used
     to specify the destination.

     The Physical addresses are unique within each instance of PktWay.
     Nodes should have addresses assigned to them.  The method of
     assigning unique addresses within each PktWay is not specified
     here.

     Examples of potentially addressable PktWay nodes include: groups of
     cooperating processes, an entire MPP, or each of an MPP's many
     processors or processes.

     The 0b10xx was chosen for L2RH to be consistent with the 0b10
     indication of L2RHs, as described earlier in this document.

     "Logical Addresses" (e.g., for broadcast and for multicast groups)
     are also in this address space.  The destination-Type is a "Logical
     Address" if its 4 MSbits are set to 0b1110.

Cohen et al                                                    [Page 13]


Internet-Draft         PktWay End-to-End Protocol           October 1997


       A few Physical-addresses are reserved:

       0x000000 Undefined address (illegal where an address is expected,
                but is allowed in the SA field)

       0x7FFFFE ("Hey-You!") This address could be used at power up
             to address nodes or routers, over point-to-point links.
             ("If you receive it, it's for you.")

       0x7FFFFF (Broadcast) This address is reserved for broadcast
             operations which may be added in later versions.
             ("If you receive it, it's for you.")


   6d. Type Extension (TE) 2 bytes

     An extension of the following PT field.

     Logically, the TE should be after the PT.  However, the PT is
     8B-word aligned, easier to process than the TE which is 2B-aligned,
     but not 8B-aligned. Since the PT is more frequently used than the
     TE, it was assigned to the better aligned field.


   6e. Packet Type (PT) 2 bytes

     The PT field provides the information needed for efficient
     de-multiplexing of multiple protocol layers.  Whereas traditional
     protocol layering requires several stages of sequential
     de-multiplexing, PktWay provides enough information to support a
     single combined de-multiplexing operation (such as in support of
     zero copy TCP).  Thus, the PT field may indicate, for example, that
     the data blocks contain IP, SNMP, ATM, Ethernet, or other layered
     protocols.

     PT values to support popular parallel programming APIs such as MPI
     have been defined.  The PktWay Enumeration document defines several
     values for this PT field.

     The PT field value of "RRP" indicates that message contains
     commands used in the PktWay Router-to-Router Protocol (RRP).

     Some PTs will also use the 2 byte Type Extension (TE) field which
     precedes the PT for passing PT-specific parameters, such as
     implementation specific de-multiplexing information.

     RRP messages (as described in the PktWay RRP document) use the TE
     field to distinguish among the various RRP-messages.




Cohen et al                                                    [Page 14]


Internet-Draft         PktWay End-to-End Protocol           October 1997


     Special Packet Types

       RRP - PktWay's Router/Router protocol (see the RRP document).

       ERR - Error reporting packet, usually sent to the Source Address
             (SA, see below) in response to a PktWay message that could
             not be properly handled, such as "Destination Unknown."
             The TE indicates the nature of the error (e.g., UNK) as
             defined in the PktWay Enumeration document.

   6f. Endianness (E), 4 bits

     If the SAN interface of the receiving-node detects Endianness
     that is different than its own and if the entire Data Block (DB)
     consists of N-byte fields, then it may activate byte-swapping
     hardware for N-byte fields, saving much work for the receiving
     node.

     The first bit (MSbit) of E, 'e' indicates whether the DB is in
     Big-Endian order (e=0) or in Little-Endian order (e=1).  The next
     3 bits could control hardware byte swapping, if any, which assumes
     that all the data consists of words of the same length.

     The meaning associated with the values of the 3 LSbits of this
     field are defined in the PktWay enumeration document.

   6g. Pad Length (PL) unsigned integer, 3 bits

     The number of padding bytes that were added at the end of the DB
     (i.e., from the end of the data to the end of the DB).  PL can be
     between 0 and 7.

   6h. Data Length (DL) unsigned integer, 25 bits

     Length, in 8B-words, of the data block, not including the L2RHs,
     EEP-header, OH, OT, and TAIL, including any optional padding.
     Hence, the net length of the Data Block is 8*DL-PL bytes.  The
     minimum is zero, and the maximum length is (2^25-1)*8 bytes = ~2^28
     = 256 MBytes.

   6i. Optional Header-Field Flag (h) 1 bit

     This bit is set to 1 if there are one (or more) optional header
     (OH) fields following the standard 16-byte EEP-header.

   6j. Reserved (RZ) 7 bits

     This field is reserved for future use.  Applications should neither
     use it, nor count on others not to use it.  It should be always set
     to zero (0b0000000).


Cohen et al                                                    [Page 15]


Internet-Draft         PktWay End-to-End Protocol           October 1997


   6k. Source Address (SA) 24 bit

     This field contains the physical address of the packet's original
     source in the same format as the DT.  However, unlike the DT, the
     SA must be a physical address.

     Filling in this field is optional.  A value of zero means that the
     SA is not specified.

     Routers may use this field to identify the sender to which error
     messages may be returned.



7. Optional Header Fields (OH) [3]

   A PktWay-message has Optional Header fields (OH) following the
   EEP-header, if the Option-Flag (h) is set to 1 in the EEP-header.

   Each OH is in the format:

+--------+--------+--------+--------+--------+--------+--------+--------+
|tttttttt|LLLLLLLL|  data  |........|........|........|........|........|
+--------+--------+--------+--------+--------+--------+--------+--------+

   The first byte indicates the optional header field type (OH-TYPE).

   The first bit, T, of the first byte indicates the processing of this
   OH-TYPE:

    T=0: Optional (may drop this field if this OH-TYPE is unknown)
    T=1: Mandatory (should not process this message if this OH-TYPE
                    is unknown)

   The second bit, C, of the first byte indicates whether there are more
   header fields (i.e., whether this is the last field of this message).

    C=0: More Optional Header fields follow
    C=1: End of Optional Header fields group (i.e., this is the last OH)

   The other 6 bits of this byte, tttttt, define application-specific
   OH-TYPEs.

   The second byte is the byte-count (L) of the data for this field that
   starts in the next byte, and is padded with as many padding bytes as
   needed to fill 8B-words.

   The length (L) does not include itself, hence it is between 0 and
   255.  The total number of 8B-words in the symbol L2RH is [(L+9)/8]
   where the square brackets indicate the integer part, rounded down,
   of the quantity within.  Therefore, the number of padding bytes is
   PL=8*[(L+9)/8]-2-L.

Cohen et al                                                    [Page 16]


Internet-Draft         PktWay End-to-End Protocol           October 1997


   Example: An Optional Header Field (OH) with a mandatory OH-TYPE and
   4 data bytes:

               L=4    #1       #2       #3       #4    padding  padding
+--------+--------+--------+--------+--------+--------+--------+--------+
|1xtttttt|00000100| data01 | data02 | data03 | data04 |  xxxx  |  xxxx  |
+--------+--------+--------+--------+--------+--------+--------+--------+
                  |<------------- value ------------->|




8. Optional Data Block (DB) [4]

   The DB is free for applications to use in any way.  Routers must not
   modify this field.

   The DB has DL 8B-words, including optional padding (at the end) of PL
   bytes.  Hence, the number of data bytes is 8*DL-PL.  Both DL and PL
   are specified in the EEP-header.

   The maximum length of the DB is 8*(2^25-1)B = ~256 MByte.




9. Optional Trailer Fields (OT) [5]

   A PktWay-message has Optional Trailer fields (OT) if so indicated in
   an Optional Header field, e.g., an OH field may indicate that a CRC64
   is in the OT.

   An OT may have just the data for an OH defined above (following the
   EEP header), or be a stand alone, self-defined field in the same
   format as OH.

   The OT-fields are in the order defined by the OHs.  For example, if
   an OH-field indicating that a CRC32 is in the OT, is followed by
   another OH-fields indicating that a CRC64 is in the OT, then the OT
   with the CRC32 should be followed by the OT with the CRC64.  Self
   defined OT fields must follow OTs defined by the OHs.











Cohen et al                                                    [Page 17]


Internet-Draft         PktWay End-to-End Protocol           October 1997


10. EEP Trailer (TAIL) [6]

   The TAIL consists of only the Error Indication (EI) field which is a
   single 8B-word.

   Routers may start forwarding packets toward their destinations before
   detecting transmission errors (such as in wormhole routing).  The EI
   field provides such routers with a means to append an error
   indication to the end of a packet.

   An all zero EI value means that no error was indicated.  Any non-zero
   EI value indicates one or more errors.

   The packet source will usually initialize the EI field to all zeros.
   However, as an alternative example, a memory board may create a
   packet with a non zero EI field (EI=1) that indicates that a parity
   error was detected by the memory board.

   Each router does an arithmetic left shift, on the EI field by one bit
   unless its MSbit is 1.  Routers that detect transmission errors also
   set the LSbit (after the shift) to 1.

   This provides the ability to identify which routers have indicated
   errors (if the route is known).



11. Appendix-A: A Recommendation for PktWay Address Assignment

   This section of the EEP document is a recommendation only, and not a
   part of the PktWay standard.

   Unlike IP addresses, physical PktWay addresses are not globally
   unique, but must be locally unique within each PktWay configuration.
   Hence, when SANs that were developed independently are interconnected
   to form a PktWay, conflicting physical addresses may occur.

   It is recommended not to attempt to assure local uniqueness of
   physical addresses by subdividing the global address space (hence,
   attempting to achieve global uniqueness).

   Instead, it is recommended that every SAN would have local PktWay
   addresses, between 1 and the number of its local nodes, and also have
   a global "bias" to be added to all the addresses in that SAN.  Hence,
   by proper setting of the biases of interconnected SANs, the local
   uniqueness of PktWay addresses is achieved.

   The coordination of these biases is left (at least now) for manual
   (static) out-of-band coordination.

   The use of such biases simplifies the mapping of physical addresses
   to their SANs.

Cohen et al                                                    [Page 18]


Internet-Draft         PktWay End-to-End Protocol           October 1997


12. Appendix-B: Glossary

   Address:       A unique designation of a node (actually an interface
                  to that node) or a SAN.
   Buddy-HR:       HRs are "buddies" if they are on the same SAN.
   Cut-Thru:       See wormhole.
   Destination:    The node to which a packet is intended
   Dynamic-Routing: Routing according to dynamic information
                   (i.e., acquired  at run time, rather than pre-set).
   Endianness:     The property of being Big-Endian or Little-Endian
                   (transmission order, etc.)
   Ethertype:      A 16-bit value designating the type of Level-3
                   packets carried by a Level-2 communication system.
   HR:             Half-Router, the part of a router that handles one
                   network only.
   L2-Forwarding:  Forwarding based on Level-2 (i.e., data-link layer
                   of the ISORM) information, e.g., the native technique
                   of each SAN or LAN.  Also called "source routing."
   L3-Forwarding:  Forwarding based on end-to-end
                   (Level-3 i.e., network layer of the ISORM) addresses.
                   Also called "destination routing."
   Map:            The topology of a network.
   Mapper:         A node on a SAN/LAN that has the map and an RT
                   for that network.  It is expected that the mapper
                   dynamically updates the map and the RT.
   Multi-homed Node: A node with more than one network interface, where
                   each interface has another address.
   Node:           Whatever can send and receive packets
                   (e.g., a computer, an MPP, a software process, etc.)
   Node structure: A C-struct (or equivalent) containing values for some
                   attributes of a node.
   Planned Transfer: Transfer of information, occurs after an initial
                   phase in which the sender decides which Level-2 route
                   to use for that transfer.
   RCVF:           The "Received From" set includes all the physical
                   addresses through which an RT was disseminated,
                   starting with the address of the mapper that created
                   that RT.
   Re-direct-message: A message that tells nodes which HR should be
                   used in order to get to a certain remote address.
   Router:         The inter-SAN communication device
   Security Context: A relationship between 2 (or more) nodes that
                   defines how the nodes utilize security services to
                   communicate securely.
   Source:         The node that created a packet.
   Source-Route:   A Level-2 route that is chosen for a packet by its
                   source.
   Symbol:         Data preceeding the EEP header of a PktWay message,
                   interleaving with the L2RHs.



Cohen et al                                                    [Page 19]


Internet-Draft         PktWay End-to-End Protocol           October 1997

   Twin-HR:        Two HRs are twins if they both are parts of the same
                   inter-SAN router.
   Wormhole-routing: (aka cut-thru routing) forwarding packets out of
                   switches as soon as possible, without storing that
                   entire packet in the switch (unlike Stop-and-forward)
   Zero-copy TCP:  A TCP system that copies data directly between the
                   user area and the network device, bypassing OS copies


13. Appendix-C: Acronyms and Abbreviations

   0bNNNN  The binary number NNNN (e.g., 0b0100 is 4-decimal)
   0xNNNN  The hexadecimal number NNNN (e.g., 0x0100 is 256-decimal)
   8B      8 byte (64 bits) entity
   ADDR    The Address-record of RRP
   APIn    Application/Program Interface
   AT      Address Type
   ATM     Asynchronous Transmission Mode
   B       Byte (e.g., 4B)
   b       bit (e.g., 32b)
   BC      Byte Count (of parameters)
   BER     Bit Error Rate
   CAPA    The CAPAbility-record of RRP
   CC      Capability Code
   CSR     Common Source-Route
   DA      Destination Address
   DB      Data Block
   DL      Data Length (in 8B words)
   DSP     Digital Signal Processor
   DT      Destination-Type
   e       The MSbit of E
   E       The Endianness field (in the EEP header)
   EEP     End/End Protocol
   EI      Error Indication
   GP      General Purpose
   GVL2    An RRP message, requesting L2 route to a given destination
   GVRT    An RRP message asking an HR to give its routing tables
   h       Optional header fields flag
   HR      Half Router
   HRTO    An RRP message asking which HR to use for a given destination
   ID      Identification
   IGMP    Internet Group Management Protocol
   INFO    An RRP message providing information about nodes
   IP      The Internet protocol
   ISORM   The ISO Reference Model
   L       Length field (exclusive of itself)
   L2      Level-2 of the ISORM (Link)
   L2RH    Level-2 Routing Header
   L2SR    Source Route
   L3      Level-3 of the ISORM (Network)
   LA      Logical Address
   LADR    The Logical-addresses-record of RRP

Cohen et al                                                    [Page 20]


Internet-Draft         PktWay End-to-End Protocol           October 1997

   LAN     Local Area Network
   LRT     Local Routing Table
   LSbit   Least Significant bit
   LSbyte  Least Significant byte
   MAC     Message Authentication Code / Media Access Control
   MPI     Message Passing Interface
   MPP     Massively Parallel Processing system
   MSbit   Most Significant bit
   MSbyte  Most Significant byte
   MSU     Mississippi State University
   MTU     Maximum Transmission Unit
   MTUR    The MTU-record of RRP
   M/C     Multicast
   NAME    The name-record of RRP
   NFS     Network File Server
   OH      Optional Header field
   OH-TYPE The Type of an Optional Header field
   OT      Optional Trailer field
   P       The Priority field
   PAD     Padding After Data
   PBD     Padding Before Data
   PCI     The Peripheral Component Interconnect "standard"
   PH      PacketWay Header
   PL      Padding Length (always in bytes)
   PPP     The Point-to-Point Protocol
   PROM    Programmable ROM (Read-Only-Memory)
   PT      Packet Type (2B)
   PVM     Parallel Virtual Machine
   PW      The Myrinet Packet Type assigned to PktWay (PW=0x0300)
   Q       Quality (of a path)
   RCVF    Received-From list, or the Received-From record of RRP
   RDRC    A re-direct message of RRP
   RH      Routing Header
   RID     Record ID
   RL      Record Length (in 8B-words)
   RRP     Router/Router Protocol
   RT-hd   RT (Routing Table) header
   RT      Routing Table
   RTBL    An RRP message proving a Routing Table
   RTHD    The Routing-Table-Header record of RRP
   RTyp    RRP's Record Type
   RZ      The Reserved field (in the EEP header)
   SA      Source Address
   SAN     System Area Network
   SAN-ID  The 24-bit PktWay-address of a SAN
   SAR     Segmentation and Reassembly
   SN      Serial Number
   SNID    SAN-ID
   SNMP    Simple Network Management Protocol
   SR      Source Route (always at Level-2)
   SRQR    The Source-Route-and-Q-record of RRP
   ST      Symbol Type

Cohen et al                                                    [Page 21]


Internet-Draft         PktWay End-to-End Protocol           October 1997

   TAIL    PacketWay EEP Trailer
   TE      Type Extension (2B)
   TELL    An RRP message requesting information about nodes
            partially specified
   UNK     Unknown
   V       Version
   WRU?    An RRP message asking its recipient to identify itself
   XRT     External Routing Table
   xxxx    A padding byte


14. Appendix-4: PktWay at a Glance (aka "The Cheat-Sheet")

 2   6    type       24                     16                16
+-+------+-------+--------+---------+--------+--------+--------+--------+
|V|  P   |     Destination-Type     |  Type-Extension |   Packet-Type   |
+-+-+---++--------------------------+-+------+--------+-----------------+
| E | PL|   Data-Length (8B-words)  |h|  RZ  |0     Source-Address      |
+---+---+--------+--------+---------+-+------+--------+--------+--------+
  4    3             25              1   7    1           23

                type = 0xxx Physical Address
                       10xx L2RH
                       110x Reserved
                       1110 Logical Address
                       1111 Symbols
L2RH:
  2   6    2   6      8        8        8        8        8        8
+--------+--------+--------+--------+--------+--------+--------+--------+
|V|  P   |10LLLLLL|  SR01  |  SR02  |........|........|........|........|
+--------+--------+--------+--------+--------+--------+--------+--------+
            Length
Symbol:
  2   6     4   6     8        8        8        8        8        8
+--------+--------+--------+--------+--------+--------+--------+--------+
|V|  P   |1111ssss|ssssssss|ssssssss| Length |  data  |........|........|
+--------+--------+--------+--------+--------+--------+--------+--------+
              <---- Symbol Type --->
Optional Header:
  2   6      8        8        8        8        8        8        8
+--------+--------+--------+--------+--------+--------+--------+--------+
|TCtttttt|LLLLLLLL|  data  |........|........|........|........|........|
+--------+--------+--------+--------+--------+--------+--------+--------+
T: 0=optional, 1=mandatory;  C: 0=more OH-fields follow, 1=last OH-field

RRP Record:
    8        8        8        8        8        8        8        8
+--------+--------+--------+--------+--------+--------+--------+--------+
|  RTyp  |   PL   |       RL        |........|........|........|........|
+--------+--------+--------+--------+--------+--------+--------+--------+
RRP-messages: GVL2, L2SR, RDRC, TELL, INFO, HRTO, WRU,  GVRT, RTBL;
        RTyp: ADDR, NAME, CAPA, LADR, SRQR, MTUR, RCVF, RTHD;

Cohen et al                                                    [Page 22]


Internet-Draft         PktWay End-to-End Protocol           October 1997




15. Security Considerations

   This RFC raises no security issues.



16. Editor' Address

   Danny Cohen
   Myricom, Inc.
   325 N. Santa Anita Ave
   Arcadia, CA 91006

   Phone: 626-821-5555
   Fax:   626-821-5316
   Email: Cohen@myri.com


































Cohen et al                                                    [Page 23]