Network Working Group                                      R. R. Stewart
INTERNET-DRAFT                                                    Q. Xie
                                                                Motorola
                                                                 T. Bova
                                                              S. Hussain
                                                          T. Krivoruchka
                                                                R. Revis
                                                                   Cisco

expires in six months                                      April 19 1999

           MULTI-NETWORK DATAGRAM TRANSMISSION PROTOCOL
                <draft-ietf-sigtran-mdtp-04.txt>

Status of This Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups.  Note that other groups may also distribute
working documents as Internet-Drafts.

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This Internet Draft discusses an experimental call control signaling
transport protocol, namely the Multi-network Datagram Transmission
Protocol (MDTP), that is intended to provide fault-tolerant reliable
data transfer between communicating entities over IP networks [1].

MDTP is proposed as an application-level protocol which is designed
with a high emphasis on supporting redundant networks and transparent
fault management. MDTP also gives the user a great degree of timing
control and configuration flexibility in order to meet the stringent
time constraints often found in telephony signaling protocols. The
motivation for developing MDTP is to establish a framework for
supporting Internet-based, high-reliability, real-time commercial
applications such as signaling and call control for Internet
telephony.

                        TABLE OF CONTENTS

1.  Introduction
     1.1 Design Requirements of MDTP
     1.2 Interfaces to MDTP
2.  MDTP Datagram Format
     2.1 Header Field Descriptions
     2.2 Data Field
3.  Transmission Initialization
     3.1 Endpoint Association Initialization
       3.1.1 Choice of Tag Value
     3.2 Data Field Format of Initiation Datagrams
     3.3 Initialization Collision
     3.4 Association Re-initialization
4.  Reliable Transfer of Datagrams
     4.1 Timer Management Rules
       4.1.1 Link Rotation
     4.2 Gap Acknowledgment for Missing Datagrams
     4.3 Flow and Congestion Controls
       4.3.1 Sending with Window Control
       4.3.2 Window Length Adjustment
       4.3.3 Flow Control using In-Queue Information
       4.3.4 T3-send Timer Adjustment with RTT
     4.4 Sequence Number Reset
     4.5 Datagram Re-transmission
       4.5.1 Re-transmission on Redundant networks
     4.6 RTT Measurement
       4.6.1 RTT Datagram Header Format
       4.6.2 Measure RTT
     4.7 Link Heart Beat
     4.8 Advisory Acknowledgment
     4.9 Termination of an Association
     4.10 Draining of an Association
5. Interface with upper level protocols
6. Suggested MDTP Protocol Parameter Values
7. Acknowledgments
8. Author's Addresses
9. References
Appendix A: Stream-based Reliable and Ordered Delivery
     A.1 Stream Initiation
     A.2 Stream Termination
     A.3 Stream Datagram Transfer
       A.3.1 Header Format in Stream Datagrams with User Data
       A.3.2 Transmission of Stream Datagrams
       A.3.3 Extended Stream Ack
     A.4 Other Issues with Stream Transfer
Appendix B: Bundled Message Transfer
     B.1 Format of Bundled Datagram
     B.2 Bundled Datagram Transfer
Appendix C: Fragmented Message Transfer
Appendix D: Multicast Datagram Transfer
     D.1 Multicast Datagram Header Format
     D.2 Transmission of Multicast Datagrams
Appendix E: Unreliable Delivery
     E.1 Ordered Unreliable Delivery

1.  Introduction

This Internet Draft discusses an experimental protocol, namely the
Multi-network Datagram Transmission Protocol (MDTP). The intention of
developing MDTP is to provide a fault-tolerant, real-time reliable
data transfer mechanism between communicating endpoints over IP
networks [1].

MDTP is proposed as an application-level protocol which is designed
with a high emphasis on supporting redundant networks and transparent
fault management. MDTP also gives the user a great degree of timing
control and configuration flexibility in order to meet the stringent
time constraints often found in telephony signaling protocols. The
motivation for developing MDTP is to establish a framework for
supporting Internet-based, high-reliability, real-time commercial
applications such as signaling and call control for Internet
telephony.

MDTP is also designed to be scalable in order to support different
signaling transport requirements for different interfaces in a
telephony network.

For example, the transportation of signaling protocols such as PRI
ISDN may not require redundant links, and hence only a subset of MDTP
will need to be implemented.  On the other hand, redundant networks
may be mandated when transporting SS7 signaling messages amongst
different components in a carrier-grade telephony core network.  In
such cases, the transparent support for redundant networks, load
sharing, and fault management defined in MDTP become essential and
likely need to be fully supported in an implementation.

Many of the fundamental concepts that have made TCP such a useful
protocol are reused in MDTP, and some of the advantages of UDP are
also merged into the design. This has led to a highly effective,
robust protocol for fault-tolerant data communications.

This document describes the functional interface and the details
necessary for implementing MDTP. The main body of this document
contains the minimal set of MDTP functions that must be
implemented. The Appendices define a set of additional MDTP
functions, such as reliable stream, multicast, message bundling, and
message fragmentation. Those additional functions are optional to
implement.

1.1 Design Requirements of MDTP

The following are some of the design requirements of MDTP, in order to
make MDTP capable of supporting real-time call control environments
which potentially may employ redundant networks:

A) High communication fan-out: an endpoint may need to be in
   simultaneous communication with hundreds or thousands of endpoints
   performing various call processing functions. These endpoints may
   be codec converters, SS7 to IP translation applications, or, in the
   case of mobile networks, data selector and combiner applications.

B) Stringent timer control: an endpoint needs to have very fine
   control over the timing for delivering a datagram. The timing
   should be easily adjusted depending on the message type and the
   destination. For example, after a few seconds of non-delivery the
   call to which the message relates may no longer exist.

C) Support redundant links: an endpoint communicating with a peer
   should be able to take advantage of the redundant networks in a
   transparent way. This means that the application or upper layer
   protocols need not be involved in the network fault
   management. Instead, when a network failure occurs MDTP should be
   able to automatically re-route the out-bound datagram to the
   alternate network (if one exists) without intervention from the
   application.

D) Orderly delivery: datagrams may arrive out of order, or may arrive
   in duplicate copies. This is especially true if redundant networks
   are used. MDTP should be robust enough to properly handle both
   situations with little intervention from the upper layer protocols
   or applications.

E) Support stream sequencing: on the demand of the upper layer
   protocols or applications, MDTP should be able to support sequenced
   delivery with regard to each individual stream, i.e., the delay caused
   by the loss and retransmission of a datagram should be isolated to
   only the stream to which the datagram belongs. This is particularly
   important in some call control applications, where the loss of a
   message should only affect the call to which the message belongs.

1.2 Interfaces to MDTP

The application programs or upper layer protocols interface with MDTP
through a set of primitives (see section 5 for details).

Towards the networks, it is assumed that a UDP-like data transport
protocol will provide the interface between MDTP and the operating
system. No special interfaces or changes are assumed within the
operating system; all queuing and endpoint association information is
maintained inside the MDTP layer.

2.  MDTP Datagram Format

MDTP inserts the following protocol header at the beginning of every
user datagram. The integer fields shall be transmitted in network byte
order.

                         MDTP Header Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  MDTP Protocol Identifier                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version     |              Flags            |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Acknowledgment Number (Seen)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Sequence Number (Send)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                             data                              /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2.1 Header Field Descriptions

    MDTP Protocol Identifier: 32 bits

      This shall be a fixed long value of 0xf7873072. The receiver
      shall always verify this Protocol Identifier before it proceeds
      any further in interpreting the header fields.

    Version: 8 bits

      This field represents the version number of the MDTP protocol
      (value TBD).

    Flags: 16 bits

      NOM - shall be set to 1 (reserved for fragmentation, see
      Appendix C)

      NOB - shall be set to 1 (reserved for bundling, see Appendix B)

      WIN - Window Up. This bit is set by the sender of this datagram
      to indicate that the sender needs the receiver to acknowledge on
      previously received datagrams before it can send more datagrams.

      ISB - shall be set to 0 (reserved for bundling, see Appendix B)

      FIR - First Datagram. This flag is set to indicate that this is an
      Initiation datagram.

      RTM - normally set to 0 (used for Link Heart Beat and RTT
      measurement, see sections 4.6 and 4.7)

      DAT - Data Present. This bit is set to indicate that, following
      this header, application data is present in this datagram.

      ACK - Acknowledge. This bit is set to indicate that the sender is
      acknowledging the reception of the specified Acknowledgment Number.

      MUL - shall be set to 0 (reserved for multicast, see Appendix D)

      SHU - Shutdown. This bit is set when the sender initiates its
      closing procedure and indicates to the receiver that the sender
      is no longer a valid destination. If the UNR bit is set in
      conjunction with the SHU bit, an incomplete shutdown is
      specified. After an incomplete shutdown, the receiver can still
      re-establish the communication with the sender by re-initiating
      with the sender (see section 4.9).

      WNR - Window Up Response. This bit is set in the acknowledgment
      reply to a Window Up flag.

      RE1 - normally set to 0 (used for advisory ACK, see section 4.8)

      RTC - normally set to 0 (used for RTT, see section 4.6)

      FLO - shall be set to 0 (reserved for reliable stream, see
      Appendix A)

      GAR - shall be set to 1 (reserved for unreliable mode, see
      Appendix E)

      UNR - shall be set to 0 (reserved for unreliable mode, see
      Appendix E)

    In Queue: 8 bits

      This field contains the number of messages the sender has on its
      incoming queue, waiting to be read by the application. This
      gives the receiver an indication of the flow control conditions
      within the sender.

    Acknowledgment Number (or Seen): 32 bits

      If the ACK flag is set, this value is the last sequence number
      that the sender of this datagram received from the
      receiver of this datagram.

    Sequence Number (or Send): 32 bits

      If the DAT flag is set, this value represents the sequence number
      of the current data unit following this header. Otherwise, this
      value will be the sequence number of the next data unit that
      will be sent.

    Data Size: 16 bits

      This value represents, in number of octets, the size of the data
      field that follows this header in the current datagram.

    Part: 8 bits

      shall have value '0' (reserved for fragmentation, see Appendix C)

    Of: 8 bits

      shall have value '1' (reserved for fragmentation, see Appendix C)
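
The header layout above can be illustrated with a short Python sketch.
This is only an illustration, not part of the protocol definition. It
assumes that the leftmost flag in the diagram (NOM) is the most
significant bit of the 16-bit Flags field, and it uses a placeholder
version number of 1 because the Version value is still TBD. The names
MdtpHeader, FLAGS and BASE_FLAGS are hypothetical.

```python
import struct
from dataclasses import dataclass

MDTP_PROTOCOL_ID = 0xF7873072

# Network byte order: protocol id (32), version (8), flags (16),
# in-queue (8), seen (32), send (32), data size (16), part (8), of (8).
HEADER_FORMAT = "!IBHBIIHBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 20 octets

# Assumption: flags are listed from most to least significant bit.
FLAG_NAMES = ["NOM", "NOB", "WIN", "ISB", "FIR", "RTM", "DAT", "ACK",
              "MUL", "SHU", "WNR", "RE1", "RTC", "FLO", "GAR", "UNR"]
FLAGS = {name: 1 << (15 - i) for i, name in enumerate(FLAG_NAMES)}

# Bits that the main body of the document says "shall be set to 1".
BASE_FLAGS = FLAGS["NOM"] | FLAGS["NOB"] | FLAGS["GAR"]


@dataclass
class MdtpHeader:
    version: int
    flags: int
    in_queue: int
    seen: int
    send: int
    data_size: int
    part: int = 0
    of: int = 1

    def pack(self) -> bytes:
        return struct.pack(HEADER_FORMAT, MDTP_PROTOCOL_ID, self.version,
                           self.flags, self.in_queue, self.seen,
                           self.send, self.data_size, self.part, self.of)

    @classmethod
    def unpack(cls, datagram: bytes) -> "MdtpHeader":
        if len(datagram) < HEADER_SIZE:
            raise ValueError("datagram shorter than MDTP header")
        proto_id, *fields = struct.unpack_from(HEADER_FORMAT, datagram)
        # The receiver verifies the identifier before anything else.
        if proto_id != MDTP_PROTOCOL_ID:
            raise ValueError("not an MDTP datagram")
        return cls(*fields)
```

A data datagram carrying 100 octets could then be built with
MdtpHeader(version=1, flags=BASE_FLAGS | FLAGS["DAT"] | FLAGS["ACK"],
in_queue=0, seen=0, send=1, data_size=100).pack().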

2.2 Data Field

When the DAT flag is set to 1, the MDTP datagram header will be
followed by a data field. An implementation may choose to pad some
'0's at the end of the data field so as to align with certain memory
boundaries. However, the padded '0' octets, if there are any, shall
not be counted in the Data Size.

The maximal Data Size for a single MDTP datagram is the MTU size of
the underlying transport protocol (e.g., UDP) minus the MDTP header
size.

3.  Transmission Initialization

3.1 Endpoint Association Initialization

Before the first data transmission can take place from one endpoint
("A") to another endpoint ("Z"), the two endpoints will need to
complete an initialization process in order to set up an association
between them.

The initialization procedure should be made transparent to the upper
layer protocol, i.e., it should take place automatically whenever the
upper layer tries to send a datagram to an endpoint which has never
been sent to before. The user datagram shall be withheld by MDTP from
transmission till the completion of the initialization.

A tag-and-lock mechanism is employed during the initialization in
order to guard against erroneous or stale datagrams (which are
especially likely if redundant networks are deployed).

The initialization process consists of the following steps (assuming
the upper layer at "A" tries to send data to "Z" for the first time):

A) "A" first sends an Initiation (FIR) to "Z", with Seen field set
   to 0 and Send field set to Tag_A, and then enters the Tag-lock mode
   (see below).

B) "Z" responds immediately with an Initiation Ack (FIR|ACK), with
   Seen set to Tag_A and Send set to Tag_Z, and then enters the
   Tag-lock mode, too (see below).

Note that no user data should be carried in the Initiation or
Initiation Ack datagram.

At this point "Z" is ready to send user data to "A". And upon the
receipt of the above Initiation Ack from "Z", "A" can also start
sending user data to "Z".

However, the first datagram with user data transmitted by "A" to "Z"
shall have the Seen value set to Tag_Z, which is obtained from the
Initiation Ack. And similarly, the first datagram with user data
transmitted by "Z" to "A" shall have the Seen value set to Tag_A,
which comes from the Initiation datagram.

In the Tag-lock mode, each side will silently discard any datagrams
with user data from the other side until it receives the first
datagram with user data and with a Seen value that matches its own
Tag. Once that datagram is received, that endpoint will leave the
Tag-lock mode and immediately send back a data acknowledgment, and
start using the sequence numbers to filter out missing and duplicate
datagrams.

If another Initiation from "A" is received by "Z" after it sent out
the Initiation Ack, "Z" will acknowledge this Initiation by re-sending
the Initiation Ack only when the Send field of this new Initiation has
the same tag as that of the original Initiation.  Otherwise, "Z" will
send an Initiation of its own with Send field set to Tag_Z back to "A"
to elicit an Initiation Ack from "A".

In the following example, "A" initiates the association first and then
sends a datagram with user data to "Z":

   Endpoint A                                          Endpoint Z

   {first app message to Z}
   [Header Flags=FIR
             & other options
           Seen=0,Send=Tag_A] ----------------------->
   (Start T1-init timer)
   (Enter Tag_A-lock mode)
                                              [Header Flags=FIR|ACK
                                                        & other options
                                   /---------- Seen=Tag_A,Send=Tag_Z]
                                  /           (Enter Tag_Z-lock mode)
   (Cancel T1-init timer)<-------/

   [Header Flags=ACK|DAT
             & other options
           Seen=Tag_Z,Send=1]
           [data field]   -----------\
   (Start T3-send timer)              \
                                       \----> (Leave Tag_Z-lock mode)

If the T1-init timer expires at "A" after the Initiation is sent, the
same Initiation datagram with the same Tag_A value will be
retransmitted and the timer restarted. This will be repeated up to
Max.Init.Retransmit times before "A" considers "Z" unreachable and
optionally reports the failure.
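
The tag-and-lock exchange described in this section can be sketched
as a small state machine in Python. This is a simplified illustration
under stated assumptions: messages are reduced to their flags and
Seen/Send fields, timers and retransmission are left out, and the
names Msg and Endpoint are hypothetical.

```python
from dataclasses import dataclass

FIR, ACK, DAT = "FIR", "ACK", "DAT"


@dataclass
class Msg:
    flags: frozenset
    seen: int
    send: int


class Endpoint:
    def __init__(self, tag: int):
        self.tag = tag            # this endpoint's own tag
        self.peer_tag = None      # tag learned from the peer
        self.tag_locked = False

    def initiate(self) -> Msg:
        # Step A: send Initiation, then enter Tag-lock mode.
        self.tag_locked = True
        return Msg(frozenset({FIR}), seen=0, send=self.tag)

    def on_initiation(self, msg: Msg) -> Msg:
        # A repeated Initiation with a different tag, received while
        # still in Tag-lock mode, is answered with our own Initiation.
        if (self.tag_locked and self.peer_tag is not None
                and msg.send != self.peer_tag):
            return Msg(frozenset({FIR}), seen=0, send=self.tag)
        # Step B: reply with Initiation Ack, then enter Tag-lock mode.
        self.peer_tag = msg.send
        self.tag_locked = True
        return Msg(frozenset({FIR, ACK}), seen=msg.send, send=self.tag)

    def on_initiation_ack(self, msg: Msg) -> bool:
        if msg.seen != self.tag:
            return False          # stale or erroneous; ignore
        self.peer_tag = msg.send
        return True

    def first_data(self) -> Msg:
        # The first user datagram carries the peer's tag in Seen.
        return Msg(frozenset({ACK, DAT}), seen=self.peer_tag, send=1)

    def on_data(self, msg: Msg) -> bool:
        """Return True if the datagram is accepted."""
        if self.tag_locked:
            if msg.seen != self.tag:
                return False      # silently discarded in Tag-lock mode
            self.tag_locked = False
        return True
```

With a = Endpoint(tag_a) and z = Endpoint(tag_z), the exchange in the
diagram corresponds to z.on_initiation(a.initiate()), then
a.on_initiation_ack(...) on the reply, then z.on_data(a.first_data()).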

3.1.1 Choice of Tag Value

Tag values should be selected from the range of 0x80000000 to
0xffffffff.
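
As a minimal sketch, a tag in this range could be drawn as follows.
Choosing the tag at random is an assumption made here; the text above
only specifies the range. The optional 'previous' argument is a
hypothetical convenience for re-initialization, where a tag different
from the previous one is required (see section 3.4).

```python
import secrets

TAG_MIN = 0x80000000
TAG_MAX = 0xFFFFFFFF


def choose_tag(previous=None) -> int:
    """Pick a tag in [TAG_MIN, TAG_MAX], avoiding the previous one."""
    while True:
        tag = TAG_MIN + secrets.randbelow(TAG_MAX - TAG_MIN + 1)
        if tag != previous:
            return tag
```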

3.2 Data Field Format of Initiation Datagrams

If redundant networks exist between two endpoints, the data field of
the Initiation and Initiation Ack datagrams will carry the redundant
network information.

The following shows the data field format carrying information for N
redundant IPv4 networks:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Number of Networks = N                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Size of address=8       |    Type of Address=AF_INET (2)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             IP Address of Network 1                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Port # 1              |      Padding = 0              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   /                                                               /
   \                              ...                              \
   /                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Size of address=8       |    Type of Address=AF_INET (2)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             IP Address of Network N                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Port # N              |      Padding = 0              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Additional implementation-specific data is allowed after the redundant
network information. No user data, however, is allowed to be
transported in Initiation or Initiation Ack datagrams.
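
The layout above can be encoded and decoded with a few lines of
Python. This is a sketch only. It assumes that the "Size of address"
value of 8 covers the IP address, port and padding of one entry, and
it ignores any implementation-specific data that may follow the
network list. The function names are hypothetical.

```python
import socket
import struct

AF_INET_TYPE = 2
ENTRY_SIZE_VALUE = 8
# size (16), type (16), IPv4 address (32), port (16), padding (16)
ENTRY_FORMAT = "!HH4sHH"
ENTRY_LENGTH = struct.calcsize(ENTRY_FORMAT)  # 12 octets per network


def pack_networks(networks) -> bytes:
    """networks is a list of (dotted-quad address, port) pairs."""
    parts = [struct.pack("!I", len(networks))]
    for ip, port in networks:
        parts.append(struct.pack(ENTRY_FORMAT, ENTRY_SIZE_VALUE,
                                 AF_INET_TYPE, socket.inet_aton(ip),
                                 port, 0))
    return b"".join(parts)


def unpack_networks(data: bytes):
    (count,) = struct.unpack_from("!I", data, 0)
    offset = 4
    networks = []
    for _ in range(count):
        size, addr_type, raw_ip, port, _pad = struct.unpack_from(
            ENTRY_FORMAT, data, offset)
        if size != ENTRY_SIZE_VALUE or addr_type != AF_INET_TYPE:
            raise ValueError("unsupported address entry")
        networks.append((socket.inet_ntoa(raw_ip), port))
        offset += ENTRY_LENGTH
    return networks
```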

3.3 Initialization Collision

If two endpoints attempt to initialize an association with each other
at about the same instant, a collision will occur, i.e., each side
will receive an Initiation datagram from the other side after it
transmitted its own. In such a case, both sides shall acknowledge the
Initiation datagram of the other side in the normal procedure as
described above.

3.4 Association Re-initialization

An endpoint shall be allowed to re-initialize an established
association with another endpoint.

In such a case, the endpoint that initiates the re-initialization
(i.e., the initiator) shall use a tag different from the one used in
the previous initialization. The initiator shall follow the normal
initialization procedure as stated in section 3.1.

Once it has left the Tag-lock mode of the current association
initialization, an endpoint shall treat any new incoming Initiation
from its peer as a re-initialization event. Upon the arrival of the new
Initiation datagram from the peer, the receiving endpoint shall also
follow the procedure stated in section 3.1 to respond.

4.  Reliable Transfer of Datagrams

Reliable transfer is indicated if the datagram being transferred has
the GAR bit set to 1 and the UNR bit set to 0. The receiver of a
reliable datagram shall always acknowledge it to the sender.

Normally, delayed acknowledgment is used, and the acknowledgment can
either be sent separately or piggy-backed on a datagram traveling
in the opposite direction.

The following example illustrates both separate and piggy-backed
acknowledgments with both ends transmitting in reliable mode:

Endpoint A                                      Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=0,Send=1,Size=100]-------------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=0,Send=2,Size=100]----------->
(Restart T3-send timer)

[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=0,Send=3,Size=100]----------->
(Restart T3-send timer)
                                              ...
                                              {Timer T2 expires}
                                 /----------- [Header Flags=ACK
                                /              Part=0,Of=0
                               /               Seen=3,Send=1]
                              /
(cancel T3-send timer) <------
...
...
{App sends 1 message}
[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=1,Send=4,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer)
                                              ...
                                              {App sends 1 message}
                                              (cancel T2-receive timer)
                                 /----------- [Header Flags=DAT|ACK|GAR
                                /              Part=0,Of=1
                               /               Seen=4,Send=1,Size=45]
                              /               (Start T3-send timer)
(cancel T3-send timer) <------
(Start T2-receive timer)
..
{Timer T2 Expires}
[Header Flags=ACK
        Part=0,Of=0
        Seen=1,Send=5]------------------> (cancel T3-send timer)

Note that if the datagrams previously received from the same sending
endpoint were transmitted in Unreliable transfer mode (see Appendix E
for details on Unreliable transfer), the receiving endpoint must
reset its Seen counter to the value of the Send field in the current
reliable datagram.

4.1 Timer Management Rules

The following rules shall be used to manage the timers during
normal Reliable transfer, unless otherwise stated for some special
cases:

A) When a reliable datagram with user data (i.e., with DAT flag set) is
   received, the endpoint shall start a T2-receive timer if no other
   timer is running, and upon the expiration of the T2-receive timer,
   the endpoint shall ack to the sender all the un-acked datagrams
   it has received.

B) When a reliable datagram with user data is sent out, the sending
   endpoint shall start a T3-send timer. If the T3-send timer is
   already running, the endpoint shall first stop the old T3 timer
   and then start a new one. If the T2-receive timer is running, the
   endpoint shall first stop the T2 timer, piggyback an Ack onto the
   out-bound datagram, and then start a T3-send timer. Upon the
   expiration of the T3-send timer, the endpoint shall follow the rules
   described in 4.5 for possible re-transmission of the un-acked
   datagrams. Whenever the T3-send timer is started the RTT estimate
   last calculated for that network should be added to the base
   T3-send timer value (if an RTT value is measured, see section 4.6).

C) When all outstanding datagrams are acknowledged, the T3-send timer
   shall be stopped if one is still running.

The following example shows the use of various timers.

Endpoint A                                         Endpoint Z
{App sends 2 messages}
[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=1,Send=6,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK|GAR
        Part=0,Of=1                           {App sends 1 message}
        Seen=1,Send=7,Size=100]---\      /--- (cancel T2-receive timer)
(Restart T3-send timer)            \    /     [Header Flags=DAT|ACK|GAR
                                    \  /       Part=0,Of=1
                                     \/        Seen=6,Send=2,Size=100]
                                     /\       (Start T3-send timer)
                                    /  \
                              <----/    ---->
...
...
{T3-send timer expires}
(re-transmit 2nd datagram)
[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=2,Send=7,Size=100]---------> (Cancel T3-send timer)
(Restart T3-send timer)                       (Start T2-receive timer)

                                              ..
                                              {Timer T2 expires}
(Cancel T3-send timer)        <-------------- [Header Flags=ACK
                                               Part=0,Of=0
                                               Seen=7,Send=3]
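
Rules A to C above can be sketched in Python as a small bookkeeping
class that tracks timer deadlines instead of running real timers. This
is an illustration under stated assumptions: rule A is read as "start
T2 unless T2 is already running", times are integer milliseconds, and
the class name, method names and interval values are hypothetical.

```python
class TimerRules:
    def __init__(self, t2_interval_ms: int, t3_base_ms: int):
        self.t2_interval_ms = t2_interval_ms
        self.t3_base_ms = t3_base_ms
        self.rtt_ms = None        # last measured RTT, if any
        self.t2_deadline = None   # None means "not running"
        self.t3_deadline = None

    def on_reliable_data_received(self, now_ms: int) -> None:
        # Rule A: start T2-receive if it is not already running.
        if self.t2_deadline is None:
            self.t2_deadline = now_ms + self.t2_interval_ms

    def on_reliable_data_sent(self, now_ms: int) -> bool:
        """Rule B. Returns True if an Ack must be piggybacked."""
        piggyback = self.t2_deadline is not None
        self.t2_deadline = None   # stop T2; the Ack rides on the data
        # (Re)start T3-send with the base value plus the RTT estimate.
        self.t3_deadline = now_ms + self.t3_base_ms + (self.rtt_ms or 0)
        return piggyback

    def on_all_acked(self) -> None:
        # Rule C: nothing outstanding, so T3-send is stopped.
        self.t3_deadline = None

    def expired(self, now_ms: int) -> list:
        """Return the names of timers that have fired by now_ms."""
        fired = []
        if self.t2_deadline is not None and now_ms >= self.t2_deadline:
            self.t2_deadline = None
            fired.append("T2-receive")   # caller sends a separate Ack
        if self.t3_deadline is not None and now_ms >= self.t3_deadline:
            self.t3_deadline = None
            fired.append("T3-send")      # caller applies section 4.5
        return fired
```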

4.1.1 Link Rotation

When multiple networks exist between two communicating endpoints, the
MDTP implementation MUST record, every time the application transmits
a datagram, which network the transmission was sent on in the MDTP
protocol variable 'last.sent.intf'. If the user does not specifically
override rotation, each send should be rotated in a round robin
fashion amongst all available networks and the protocol variable
'last.sent.intf' should be updated to indicate which interface was
used last.

The MDTP implementation MUST allow a user to override this rotation
on each send. The implementation must also provide an interface to
add and remove a link from rotation eligibility.
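
A round robin rotation with a per-send override might look like the
following Python sketch. The class and method names are hypothetical;
only the variable 'last.sent.intf' (spelled last_sent_intf here) comes
from the text above. The sketch assumes that an overridden send also
updates last_sent_intf, since the variable records the network
actually used.

```python
class LinkRotator:
    def __init__(self, links):
        self.eligible = list(links)   # links taking part in rotation
        self.last_sent_intf = None

    def add_link(self, link) -> None:
        if link not in self.eligible:
            self.eligible.append(link)

    def remove_link(self, link) -> None:
        if link in self.eligible:
            self.eligible.remove(link)

    def next_link(self, override=None):
        """Choose the network for the next send."""
        if override is not None:
            chosen = override             # user defeats the rotation
        else:
            if not self.eligible:
                raise RuntimeError("no link eligible for rotation")
            if self.last_sent_intf in self.eligible:
                index = self.eligible.index(self.last_sent_intf) + 1
                chosen = self.eligible[index % len(self.eligible)]
            else:
                chosen = self.eligible[0]
        self.last_sent_intf = chosen
        return chosen
```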

4.2 Gap Acknowledgment for Missing Datagrams

If reliable datagrams go missing during a series of transmissions,
a special type of acknowledgment known as the Gap Ack will be sent
back to ask the sender to re-transmit the missing datagrams.

The following example shows the use of Gap Ack.

Endpoint A                                    Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=3,Send=8,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=3,Send=9,Size=100]-----X (lost)
(Restart T3-send timer)

[Header Flags=DAT|ACK|GAR
        Part=0,Of=1
        Seen=3,Send=10,Size=100]-----------> (A gap detected in data)
(Restart T3-send timer)
                                             ..
                                             {T2-receive timer expires}
                                    /------- [Header Flags=ACK
                                   /          Seen=9,Send=3,
                                  /           Part=1,Of=1
                                 /            data=(long integer)10]
(Prepare retransmit)   <--------/

In this example, when "Z" receives the third datagram from "A" it
realizes that a gap exists in the received data.  At the expiration of
T2-receive timer, "Z" sends a Gap Ack, in place of a normal Ack, to
"A" to indicate the missing datagram.

In the Gap Ack, the Part and Of fields are both set to '1', as opposed
to '0' as in a normal Ack. The data field of the Gap Ack is a four (4)
octet long integer containing the sequence number of the next datagram
after the gap (which is 10 in this example).  The Seen field in
the Gap Ack will contain the sequence number of the datagram of the
gap.  Using these two values, "A" should be able to calculate the
missing datagram numbers (only 9 in this example) and thus determine
which datagrams will need to be retransmitted.

Note that Gap Acks cannot be piggy-backed with user data; if there is
user data to be sent when a gap is detected, the Gap Ack must be sent
out first before the datagram carrying user data can be sent.
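
The calculation "A" performs on a Gap Ack can be sketched in Python.
The sketch assumes that the Seen field holds the first missing
sequence number, which matches the example (Seen=9, data=10, datagram
9 missing) but is an interpretation of the text. Sequence number
wrap-around is not handled, and the function names are hypothetical.

```python
import struct


def detect_gap(next_expected: int, received):
    """Receiver side: return (seen, next_after_gap) or None."""
    for seq in sorted(s for s in received if s >= next_expected):
        if seq == next_expected:
            next_expected += 1
        else:
            # next_expected is missing; seq is the first one after it.
            return next_expected, seq
    return None


def build_gap_ack_data(next_after_gap: int) -> bytes:
    # Four-octet integer in network byte order.
    return struct.pack("!I", next_after_gap)


def missing_from_gap_ack(seen: int, data: bytes):
    """Sender side: list the sequence numbers to retransmit."""
    (next_after_gap,) = struct.unpack("!I", data)
    return list(range(seen, next_after_gap))
```

For the example above, detect_gap(8, {8, 10}) yields (9, 10), and
missing_from_gap_ack(9, build_gap_ack_data(10)) yields [9].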

4.3 Flow and Congestion Controls

Several different mechanisms shall be used jointly to achieve
flow and congestion controls in MDTP.

4.3.1 Sending with Window Control

The sending endpoint shall use a transmission window to control the
number of outstanding datagrams, i.e., datagrams that have been sent
but have yet to be acknowledged. The length of the window is defined
as the maximal number of outstanding datagrams a sending endpoint can
allow. This length is adjusted dynamically, depending on the current
number of successful transmissions as well as the number of lost
datagrams.

When the number of outstanding datagrams reaches the current window
length, the endpoint shall still accept send requests from its upper
layer, but shall transmit no more datagrams until an Ack is received.

Moreover, when the window length is reached, the next send request
from the upper layer will trigger the sending endpoint to transmit a
special Window Up message. Upon receiving this Window Up (WIN|ACK) the
receiver must respond with a Window Up Response (WNR|ACK), as
illustrated by the following example (assuming current window length
is 3):

Endpoint A                                      Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|GAR|ACK
        Part=0,Of=1
        Seen=0,Send=11,Size=100]-----------> (Start T2-recv timer)
(Start T3-send timer)

[Header Flags=DAT|GAR|ACK
        Part=0,Of=1
        Seen=0,Send=12,Size=100]----------->
(Restart T3-send timer)

[Header Flags=DAT|GAR|ACK
        Part=0,Of=1
        Seen=0,Send=13,Size=100]----------->
(Restart T3-send timer)

{App sends a new message}
(queue new message and send Win Up)
[Header Flags=WIN|ACK
        Seen=0,Send=14]--------------------> (cancel T2-recv timer)
                                      /----- [Header Flags=WNR|ACK
                                     /        Part=0,Of=0
                                    /         Seen=14,Send=0]
[Header Flags=DAT|GAR|ACK <--------/
        Part=0,Of=1
        Seen=0,Send=15,Size=100]-----------> (Start T2-recv timer)
(Restart T3-send timer)

In the above example, after the transmission of the first three
datagrams, "A" reached its window length. The next message from the
user triggered a Window Up that was sent to "Z". The Window Up shall
contain no user data. In response, "Z" cancelled timer T2 and
immediately sent a Window Up Response. The arrival of this Window Up
Response effectively resolved all the outstanding datagrams at "A",
thus allowing "A" to send out the next datagram.
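
The sender side of this behavior can be sketched in Python as
follows. This is a simplified model with hypothetical names: it counts
outstanding datagrams rather than tracking sequence numbers, it omits
the window length adjustment of section 4.3.2, and it assumes that
only one Window Up is sent while the window stays full.

```python
from collections import deque


class WindowedSender:
    def __init__(self, window_length: int = 2):
        self.window_length = window_length
        self.outstanding = 0          # sent but not yet acknowledged
        self.queue = deque()          # accepted but not yet sent
        self.window_up_sent = False

    def send(self, message):
        """Return the list of datagrams to put on the wire."""
        if self.outstanding < self.window_length:
            self.outstanding += 1
            return [("DAT", message)]
        # Window is full: still accept the request, but hold it back.
        self.queue.append(message)
        if not self.window_up_sent:
            self.window_up_sent = True
            return [("WIN", None)]    # Window Up carries no user data
        return []

    def on_ack(self, acked_count: int):
        """Handle an Ack or Window Up Response covering acked_count."""
        self.outstanding = max(0, self.outstanding - acked_count)
        self.window_up_sent = False
        released = []
        while self.queue and self.outstanding < self.window_length:
            self.outstanding += 1
            released.append(("DAT", self.queue.popleft()))
        return released
```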

4.3.2 Window Length Adjustment

The window length shall be initially set to 2, and shall then be
dynamically adjusted based on the datagram loss and acknowledgment
conditions of the underlying network.

When 4 consecutive outstanding datagrams are acknowledged at once by
the receiver, the sender's window length will be raised by 1 until it
reaches the protocol parameter 'Max.Outstanding.dg' (which should be a
user configurable parameter).

If the current window length is less than 4, each time the number of
consecutive outstanding datagrams acknowledged in a single Ack is
equal to or greater than the current window length, the sender's
window length shall be raised by 1, until it reaches
'Max.Outstanding.dg'.

In the following circumstances, the sender's window length shall be
decreased. However, when the window length reaches 2 it shall not be
decreased any further.

If between 1 and 3 consecutive datagrams are lost, the window length
will be decreased by 1. If between 4 and 7 datagrams are lost, the
window length will be decreased by 2. If 8 or more datagrams are lost,
the window length will be decreased by 4.

Moreover, any time a Window Up is sent to the receiving endpoint, the
sender's window length will be decreased by 1. Also, if a timeout
forces a retransmission, the sender's window length will be reduced
to half of its current value.

The following table summarizes these rules:
-----------------------------------------------------------------------
  Duplicate Ack received by sender  | Adjust down by 4
-----------------------------------------------------------------------
  8 or more datagrams lost          | Adjust down by 4
-----------------------------------------------------------------------
  4 to 7 datagrams lost             | Adjust down by 2
-----------------------------------------------------------------------
  1 to 3 datagrams lost             | Adjust down by 1
-----------------------------------------------------------------------
  Timeout forces retransmission     | Adjust down by 1/2 of the current
                                    | window.
-----------------------------------------------------------------------
  Window Up sent                    | Adjust down by 1
-----------------------------------------------------------------------
  4 or more consecutive datagrams   | Adjust up by 1
  acknowledged (window length > 4)  |
-----------------------------------------------------------------------
  1/2 Window length or more acked   | Adjust up by 1
  (window length <=4)               |
-----------------------------------------------------------------------
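
The adjustment rules can be sketched in Python as follows. This is an
illustrative sketch, not part of the protocol definition. The function
names and the max_outstanding argument (standing in for
'Max.Outstanding.dg') are hypothetical. Where the prose and the
summary table differ, the sketch follows the prose for the raise
rules, and it assumes that a halved window length is rounded down.

```python
MIN_WINDOW = 2  # the window length is never decreased below 2


def on_ack(window, acked_at_once, max_outstanding=20):
    """New window length after one Ack covering `acked_at_once`
    consecutive outstanding datagrams."""
    if acked_at_once >= 4 or (window < 4 and acked_at_once >= window):
        return min(window + 1, max_outstanding)
    return window


def on_loss(window, lost):
    """New window length after `lost` consecutive datagrams are lost."""
    if lost >= 8:
        step = 4
    elif lost >= 4:
        step = 2
    elif lost >= 1:
        step = 1
    else:
        step = 0
    return max(window - step, MIN_WINDOW)


def on_duplicate_ack(window):
    """Rule taken from the summary table only."""
    return max(window - 4, MIN_WINDOW)


def on_window_up_sent(window):
    return max(window - 1, MIN_WINDOW)


def on_timeout_retransmit(window):
    # Assumption: integer halving, rounded down.
    return max(window // 2, MIN_WINDOW)
```

For example, on_loss(10, 5) yields 8 and on_timeout_retransmit(9)
yields 4, while on_loss(2, 8) stays at the floor of 2.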

4.3.3 Flow Control using In-Queue Information

By using the In Queue field in the MDTP header, the sender can inform
the receiver of the number of pending datagrams that the sender has
received but has yet to deliver to its application. The following
example shows how the endpoints use the In Queue value to accomplish
flow control.

Assume that Endpoint A has sent Endpoint Z 20 datagrams, and when
Endpoint Z sends an Ack on the reception of these 20 datagrams, only
the first one of them has been delivered to the upper layer at
Endpoint Z.

In the Ack sent by Endpoint Z, the In Queue field would then have a
value of 19, indicating the number of datagrams pending for delivery
to its upper layer. This value would be checked by Endpoint A before
it sent the next datagram to Endpoint Z. If this value was found to be
greater than its current window length, Endpoint A would not send the
next datagram. Instead, Endpoint A would start its T3-send timer and
send a Window Up message to Endpoint Z at the expiration of the timer.
This would force Endpoint Z to send another Ack with an updated In
Queue value. If the new In Queue value was still greater than its
window length, Endpoint A would re-start its T3-send timer, and repeat
this procedure until the In Queue value of Endpoint Z dropped below
the current window length of Endpoint A.  Then, the transmission at
Endpoint A would resume.
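
The probing loop described above can be sketched in Python as follows.
This is a sketch under stated assumptions: the function name is
hypothetical, timers are not modeled, and an In Queue value equal to
the window length is treated as permission to send, since the text
only blocks on values greater than the window length.

```python
def window_ups_before_resume(in_queue_reports, window):
    """Count the Window Up probes Endpoint A sends before it resumes.

    in_queue_reports: In Queue values carried in successive Acks from
    Endpoint Z, starting with the Ack that first reported the backlog.
    Returns None if A is still blocked after the last report.
    """
    probes = 0
    for in_queue in in_queue_reports:
        if in_queue <= window:
            return probes        # transmission at A resumes
        # T3-send expires, A sends a Window Up, Z answers with a new Ack.
        probes += 1
    return None
```

With a window length of 5 and reported In Queue values of 19, 12 and
4, Endpoint A sends two Window Up messages before it resumes.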

4.3.4 T3-send Timer Adjustment with RTT

If an RTT measurement is available for a specific network, the sender
shall adjust the T3-send timer each time it sends a datagram over
that network. The calculation and adjustment of the timer should
follow the method described in [4]. RTT measurements shall be tracked
separately for each network if redundant networks are in use.

MDTP defines two optional methods to obtain RTT measurements, see
sections 4.6 and 4.7.

4.4 Sequence Number Reset

When the datagram sequence number reaches the value 0x7fffffff the
next sequence number shall be set to 1.
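
A minimal Python sketch of this rule (the function name is
hypothetical):

```python
MAX_SEQ = 0x7FFFFFFF


def next_seq(seq):
    """Return the sequence number that follows `seq`, skipping 0."""
    return 1 if seq == MAX_SEQ else seq + 1
```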

4.5 Datagram Re-transmission

Whenever a T3-send timer expires, the endpoint shall re-transmit the
un-acked datagram that has the lowest Send value, unless:

A) If the current window length is reached, a Window Up message will
   be sent out (see 4.3 Congestion Control), or

B) If the current window length is not reached and there is still
   user data pending for transmission, a new datagram with user data
   shall be sent out and T3-send timer shall be restarted.

When a T3-send timer is started at a re-transmission, the length of
the next T3-send timer for this destination should be doubled and the
last estimated RTT value for that network should be added to the timer.
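
One possible reading of this rule, sketched in Python: the next timer
length is twice the current length plus the last estimated RTT. The
function name is hypothetical, and the order of doubling and adding is
an assumption, as the text does not state it explicitly.

```python
def next_t3_ms(current_t3_ms, last_rtt_ms):
    """T3-send length to use after a re-transmission (assumed reading:
    double the current length, then add the last estimated RTT)."""
    return 2 * current_t3_ms + last_rtt_ms
```

Starting from 160 ms with an estimated RTT of 30 ms, two successive
re-transmissions would give timer lengths of 350 ms and 730 ms.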

4.5.1 Re-transmission on Redundant networks

When redundant networks exist between two communicating endpoints, the
re-transmission shall be attempted on the network specified in the
MDTP protocol variable 'last.good.intf'. The value of 'last.good.intf'
is always updated to refer to the network on which the last datagram
from the peer endpoint arrived.

Moreover, the number of consecutive re-transmissions is also recorded
in a variable 'retran.count' for each network. Every time a datagram
is received on a network, the corresponding 'retran.count' shall be
reset to 0.

If the value of 'retran.count' for the current network exceeds half
of the value of the protocol parameter 'Max.Retransmit',
'last.good.intf' will be changed so as to force the next
re-transmission to be directed to an alternate network. A failure
condition may optionally be reported.

The total number of consecutive re-transmissions across all the
networks in an association is also recorded. If this value exceeds the
limit defined by 'Max.Retransmit', the sending endpoint shall consider
the peer endpoint unreachable and stop transmitting data to it, and
optionally report the failure.
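
The bookkeeping described in this section can be sketched in Python as
follows. This is an illustrative sketch: the class name, the
round-robin choice of the alternate network, and the reset of the
association-wide counter on any reception are assumptions not stated
in the text. The sketch also assumes that the attempt which would
exceed 'Max.Retransmit' is not sent.

```python
class Retransmitter:
    def __init__(self, networks, max_retransmit=10):
        self.networks = list(networks)
        self.max_retransmit = max_retransmit
        self.last_good_intf = self.networks[0]
        self.retran_count = {n: 0 for n in self.networks}
        self.total_retran = 0          # consecutive, across all networks
        self.peer_unreachable = False

    def on_receive(self, network):
        """A datagram from the peer arrived on `network`."""
        self.last_good_intf = network
        self.retran_count[network] = 0
        self.total_retran = 0          # assumption: any arrival resets it

    def on_retransmit(self):
        """Return the network to re-transmit on, or None if the peer
        is considered unreachable."""
        if self.peer_unreachable:
            return None
        self.total_retran += 1
        if self.total_retran > self.max_retransmit:
            self.peer_unreachable = True
            return None
        network = self.last_good_intf
        self.retran_count[network] += 1
        if self.retran_count[network] > self.max_retransmit / 2:
            # Force the next re-transmission onto an alternate network.
            idx = self.networks.index(network)
            self.last_good_intf = self.networks[(idx + 1) % len(self.networks)]
        return network
```

With two networks and 'Max.Retransmit' set to 10, six consecutive
re-transmissions go to the first network, the next four go to the
second, and the eleventh attempt marks the peer unreachable.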

4.6 RTT Measurement

This defines the mechanism for round-trip-time (RTT) measurement in
MDTP.

On occasion, either side of an association may need to perform an RTT
measurement of the network (or one of the redundant networks) between
them.

4.6.1 RTT Datagram Header Format

The following shows the header format an endpoint shall use for RTT
measurement:

                   MDTP Header Format - RTT measurement

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  MDTP Protocol Identifier                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version     |           Flags               |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Acknowledgment Number (Seen)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Sequence Number (Send)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Transparent Time Int-1                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Transparent Time Int-2                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Two long integers are used in the data field to carry the time value.
The RTT datagram is identified by setting the RTC or RTM bit to 1.
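
The layout above can be packed with Python's struct module as
sketched below. This is a sketch under stated assumptions: the
leftmost flag in the diagram (NOM) is taken to be the most significant
bit of the 16-bit Flags field, all fields are taken to be in network
byte order, and the protocol identifier and version values are left
to the caller because they are not given in this section. The names
FLAGS, RTT_HEADER and pack_rtt_request are hypothetical.

```python
import struct

FLAG_NAMES = ["NOM", "NOB", "WIN", "ISB", "FIR", "RTM", "DAT", "ACK",
              "MUL", "SHU", "WNR", "RE1", "RTC", "FLO", "GAR", "UNR"]
# Assumption: leftmost flag in the diagram is the most significant bit.
FLAGS = {name: 1 << (15 - i) for i, name in enumerate(FLAG_NAMES)}

# Protocol id, version, flags, in queue, seen, send, data size,
# part, of, Time Int-1, Time Int-2 (big-endian, no padding).
RTT_HEADER = struct.Struct(">IBHBIIHBBII")


def pack_rtt_request(protocol_id, version, seen, send, sec, usec,
                     in_queue=0):
    """Build an RTT request header with the ACK, GAR and RTC bits set.
    Size=0, Part=0 and Of=1 are assumed values for a datagram that
    carries no user data."""
    flags = FLAGS["ACK"] | FLAGS["GAR"] | FLAGS["RTC"]
    return RTT_HEADER.pack(protocol_id, version, flags, in_queue,
                           seen, send, 0, 0, 1, sec, usec)
```

The packed header is 28 octets long, matching the seven 32-bit rows
of the diagram.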

4.6.2 Measure RTT

At the request of its upper layer, an endpoint shall initiate an RTT
measurement by sending an RTT datagram with GAR, ACK, and RTC bits set
to 1 (to a specific network if redundant networks exist). No
user data shall be carried. The sender shall also place in Time Int-1
and Time Int-2 the value of the current time of day in seconds and
microseconds.

Upon the reception of this RTT datagram, the recipient shall
immediately return the datagram to the sender (over the same network
on which the datagram arrives if redundant networks exist), with the
RTM and ACK bits set to 1.

Upon the reception of this reply, the sender shall use the Time Int-1
and Time Int-2 in the reply datagram to calculate the RTT (of the
specific network if redundant networks exist).

Endpoint A                                      Endpoint Z
RTT - Request Now=x.y
[Header Flags=ACK|GAR|RTC
        Part=0,Of=1
        Seen=1,Send=31,Size=0
        Time-Int1=x
        Time-Int2=y] ----------------------->
                                      ------ [Header Flags=ACK|RTM
                                     /        Part=0,Of=0
                                    /         Seen=31,Send=1
                                   /          Time-Int1=x
                                  /           Time-Int2=y]
                                 /
(Endpoint A uses     <-----------
 current time minus x.y to
 calculate RTT)
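
The RTT is the time elapsed between the echoed timestamp x.y and the
arrival of the reply. A minimal Python sketch, with a hypothetical
function name and the result expressed in microseconds:

```python
def rtt_microseconds(sent_sec, sent_usec, now_sec, now_usec):
    """Elapsed time between the echoed timestamp (Time Int-1 seconds,
    Time Int-2 microseconds) and the current time of day."""
    return (now_sec - sent_sec) * 1_000_000 + (now_usec - sent_usec)
```

For example, a request stamped 100.900000 whose reply arrives at
101.100000 gives an RTT of 200000 microseconds.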

4.7 Link Heart Beat

This defines the mechanism for activating and transmitting link
heart beats in MDTP.

At the request of its upper layer, an endpoint shall enable heart beat
on a specific peer with which it has an established association in the
Reliable transfer mode.

The RTT datagram defined in section 4.6.1 shall be used as the Heart
Beat.

Once heart beat is enabled, the endpoint shall transmit a Heart
Beat to that specific peer and start a T5-heartBeat timer. The peer
shall immediately respond to the Heart Beat in the same manner as to
an RTT datagram, as described in section 4.6. This response shall be
stored by the first endpoint (it can also be used to update its RTT
measurement).

When the T5-heartBeat timer expires, the endpoint shall first check
whether the previous Heart Beat has been responded to (on the same
network on which it was sent, in the case of redundant networks). If
not, this shall be counted as a transmission failure on the network on
which the last Heart Beat was sent, and be handled following the rules
described in section 4.5. Then, the endpoint shall send another Heart
Beat and re-start the T5-heartBeat timer.

In the case where redundant networks exist, the sending of Heart beats
shall follow the link rotation rules outlined in section 4.1.1.

If, before the expiration of T5-heartBeat timer, a datagram is
transmitted or received by the endpoint, the T5-heartBeat timer shall
be stopped and the appropriate T2-T4 timer shall be started. In other
words, the T5-heartBeat timer has the lowest precedence.

When there is no datagram to send and no other timers are running, the
T5-heartBeat timer shall be started and the above procedure shall
continue.

The suggested interval for T5-heartBeat timer is 4000 ms.
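
The check performed at each T5-heartBeat expiration can be sketched in
Python as follows. This is an illustrative sketch: the class and
method names are hypothetical, the timer itself and the precedence of
the T2-T4 timers are not modeled, and failures are only counted here
rather than passed to the handling of section 4.5.

```python
class HeartBeatMonitor:
    def __init__(self):
        self.awaiting_reply = False
        self.failures = 0   # Heart Beats that went unanswered
        self.sent = 0       # Heart Beats transmitted so far

    def enable(self):
        self._send()

    def on_reply(self):
        self.awaiting_reply = False

    def on_t5_expired(self):
        if self.awaiting_reply:
            # Previous Heart Beat was not responded to.
            self.failures += 1
        self._send()

    def _send(self):
        self.sent += 1
        self.awaiting_reply = True
```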

4.8 Advisory Acknowledgment

This defines the mechanism for sending and handling Advisory
Acknowledgments in MDTP.

An endpoint may use Advisory Acks to increase bandwidth utilization
when transmitting over a reliable association.

An Advisory Ack shall be indicated by setting RE1 flag to 1 in the
datagram.

The endpoint shall send an Advisory Ack to its peer when it reaches
half of its current window length, and also when it detects that the
next send will reach the full window length.

Upon the reception of an Advisory Ack, the peer endpoint shall
immediately acknowledge all the datagrams it has received but not yet
acknowledged, and then cancel the T2-recv timer if one is still
running.

The following shows an example of using Advisory Ack:

Endpoint A                                      Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|GAR|ACK
        Part=0,Of=1
        Seen=0,Send=1,Size=100]-------------> (Start T2-recv timer)
(Start T3-send timer)

[Header Flags=DAT|GAR|ACK
        Part=0,Of=1
        Seen=0,Send=2,Size=100]----------->
(Restart T3-send timer)
{detects window half full, use Advisory Ack}
[Header Flags=DAT|GAR|ACK|RE1
        Part=0,Of=1
        Seen=0,Send=3,Size=100]------\
(Stop and restart T3-send timer)      \
                                       \----> (cancel T2-receive timer)
                      <---------------------- [Header Flags=ACK
                                               Part=0,Of=0
                                               Seen=3,Send=1]
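
The decision to set the RE1 flag can be sketched in Python as follows.
This is one possible reading of the rule, not a definitive one: the
function name is hypothetical, "outstanding" counts the un-acked
datagrams including the one about to be sent, half of an odd window
length is rounded up, and the flag is set on the datagram that fills
the window.

```python
def wants_advisory_ack(outstanding, window):
    """Return True if the datagram about to be sent should carry RE1.

    outstanding: un-acked datagrams, including the one being sent.
    window: current window length.
    """
    half = (window + 1) // 2   # assumption: round half up
    return outstanding == half or outstanding == window
```

Under these assumptions and a window length of 6, the third and the
sixth datagram in flight would carry the RE1 flag.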

4.9 Termination of an Association

When an endpoint terminates, it shall send a Shutdown datagram
(FIR|SHU) to each of the peer endpoints in all its existing
associations.  The Shutdown datagram itself is sent in unreliable
transfer mode and thus need not be acknowledged.

When a peer endpoint receives the Shutdown, it will remove the sender
from its record, and optionally report the termination of the sender
to the upper layer.

The following shows an example of the termination of Endpoint A:

Endpoint A
{App indicates termination}
[Header Flags=FIR|SHU
        Seen=3,Send=14]    ------------------------> to Endpoint X

[Header Flags=FIR|SHU
        Seen=1496,Send=101]------------------------> to Endpoint Y

[Header Flags=FIR|SHU
        Seen=14,Send=2]   -------------------------> to Endpoint Z

4.10 Draining of an Association

An endpoint in an association may decide to "drain" the association
without completely shutting it down. By draining an association, both
endpoints will remove any record and pending datagrams associated with
the association.  Further communications between the two endpoints can
be resumed by going through a re-initialization procedure (see
section 3).

In such a case, a Drain datagram (FIR|SHU|UNR) is sent to the peer
endpoint of the association, and no Ack is required.

The following sequence shows an example of Draining:

Endpoint A
{App indicates draining}
[Header Flags=FIR|SHU|UNR
        Seen=146,Send=1301]------------------------> to Endpoint X

5. Interface with upper level protocols

The upper layer protocols (ULP) shall request services by passing
primitives to MDTP and shall receive notifications from MDTP for
various events.

The primitives and notifications described in this section should be
used as a guideline for implementing MDTP.

A) Init.MDTP primitive

This primitive allows MDTP to initialize its internal data structures
and allocate necessary resources for setting up its operation
environment. Note that once MDTP is initialized, ULP can communicate
directly with any other endpoints without re-invoking this primitive.

Mandatory attributes:

None.

Optional attributes:

The following types of attributes may be passed along with
the primitive:

 o Timer selection and its operation syntax -- to indicate to MDTP
   an alternative timer the MDTP should use for its operation.
 o Initial MDTP operation mode;
 o IP port number, if ULP wants it to be specified;

B) Send.Data primitive

This is the main method to send datagrams via MDTP.

Mandatory attributes:

 o data - This is the payload ULP wants to transmit;
 o size - The size of the payload in number of octets;
 o to-address - The IP address and port number of the intended
   receiver. In case of redundant networks, to-address can be any one
   of the multiple IP addresses of the receiver. The network through
   which the datagram is actually sent will be determined by MDTP
   according to link rotation, unless the current mode prohibits MDTP
   link rotation; in that case the datagram will be sent through the
   network specified by to-address (see section 4.5).

Optional attributes:

 o mode-flags - This indicates a new MDTP operation mode, taking effect
   immediately including the current datagram send;

 o context - optional information that will be carried in the
   Send.Failure notification to the ULP if the transportation of
   this datagram fails.

C) Receive.Data primitive

This primitive shall return the first datagram in the MDTP in-queue to
ULP, if there is one available. It may, depending on the specific
implementation, also return other information such as the sender's
address, whether there are more datagrams available for retrieval,
etc. The behavior is undefined if no datagram is available when this
primitive is invoked.

Mandatory attributes:

 o buffer - the memory location indicated by the ULP to store the
   received datagram and other information.

Optional attributes:

   None.

D) Data.Arrive notification

MDTP shall invoke this notification on the ULP when a datagram is
successfully received and ready for retrieval.

E) Send.Failure notification

If a datagram cannot be delivered, MDTP shall invoke this notification
on the ULP.

The following may be optionally passed with the notification:

 o data - the location where the ULP can find the un-delivered
   datagram.
 o context - optional information associated with this datagram (see
   the Send.Data primitive).

F) Link.Status.Change notification

When a link is marked down (e.g., when MDTP detects a link failure),
or marked up (e.g., when MDTP detects a link recovery), MDTP shall
invoke this notification on the ULP.

The following shall be passed with the notification:

 o link-address - This indicates the IP address of the affected link;
 o new-status - This indicates the new status of the link;

G) Communication.Up notification

This notification is used when MDTP becomes ready to send or receive
datagrams, or when a lost communication to an endpoint is restored.

The following shall be passed with the notification:

 o status - This indicates what type of event has occurred;
 o endpoint-id - The IP address and port number to identify the
   endpoint;

H) Communication.Lost notification

When MDTP loses communication to an endpoint completely or detects
that the endpoint has performed a shut-down operation, it shall invoke
this notification on the ULP.

The following shall be passed with the notification:

 o status - This indicates what type of event has occurred;
 o endpoint-id - The IP address and port number to identify the
   endpoint;
 o packets-enqueue - The number and location of un-sent datagrams
   still held by MDTP;
 o last-acked - the sequence number last acked by that peer endpoint;
 o last-sent - the sequence number last sent to that peer endpoint;

I) Change.Link.Rotation primitive

When the upper layer wants MDTP to make a specific network eligible
or ineligible for link rotation, the upper layer will send this
primitive to MDTP.

Mandatory attributes:

 o  action - This indicates if the network is to be made eligible or
             ineligible for link rotation.
 o  network-id - This is the IP address and port of the network to be
    added or removed from link rotation consideration.

J) Open.Stream primitive

This shall be used by the upper layer to open a new stream.

Mandatory attributes:

 o endpoint-id - The IP address and port number to identify the
   peer endpoint to which the stream is to be opened. An association
   must already exist at the time the stream is opened.

Returned attributes:

 o The stream number that is opened.

K) Close.Stream primitive

This shall be used by the upper layer to request to close a stream.

Mandatory attributes:

 o endpoint-id - The IP address and port number to identify the
   peer endpoint to which the stream is to be closed.

 o stream number - The stream number to identify the stream to be
   closed (this should be the number returned by the Open.Stream
   primitive on this stream).

6. Suggested MDTP Protocol Parameter Values

The following are suggested timer values for MDTP:

T1-init Timer    -  160 ms
T2-receive Timer -   20 ms
T3-send Timer    -  160 ms + Last calculated RTT for that network.

The following protocol parameters are recommended:

Max.Outstanding.dg      - 20 messages
Max.Retransmit          - 10 attempts
Max.Init.Retransmit     - 8  attempts
Min.Mcast.Time.To.Reset - 5 seconds
Num.Of.Mcast.Reset.Msg  - 5 messages
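
An implementation might collect these suggested values in a single
configuration object, as in the following Python sketch. The class and
field names are hypothetical. The T5-heartBeat value is the interval
suggested in section 4.7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MdtpParams:
    t1_init_ms: int = 160
    t2_receive_ms: int = 20
    t3_send_base_ms: int = 160
    t5_heartbeat_ms: int = 4000
    max_outstanding_dg: int = 20
    max_retransmit: int = 10
    max_init_retransmit: int = 8
    min_mcast_time_to_reset_s: int = 5
    num_of_mcast_reset_msg: int = 5

    def t3_send_ms(self, last_rtt_ms):
        """T3-send length: base value plus the last calculated RTT
        for the network in question."""
        return self.t3_send_base_ms + last_rtt_ms
```

For example, MdtpParams().t3_send_ms(25) gives 185 ms, and a user may
override any single value, as in MdtpParams(max_retransmit=6).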

7. Acknowledgments

The authors wish to thank Brian Wyld, A. Sankar, Henry Houh, Gary
Lehecka, Ken Morneault, Lyndon Ong, and others for their very valuable
comments.

8.  Authors' Addresses

Randall R. Stewart                          Tel: +1-847-632-7438
Cellular Infrastructure Group               EMail: stewrtrs@cig.mot.com
Motorola, Inc.
1475 W. Shure Drive, #2C-6
Arlington Heights, IL 60004
USA

Qiaobing Xie                                Tel: +1-847-632-3028
Cellular Infrastructure Group               EMail: xieqb@cig.mot.com
Motorola, Inc.
1501 W. Shure Drive, #2309
Arlington Heights, IL 60004
USA

Tom Bova                                    Tel: +1-703-484-3331
Cisco Systems Inc.                          EMail: tbova@cisco.com
13615 Dulles Technology Drive
Herndon, VA  20171

Suheel Hussain                              Tel: +1-919-472-2312
Cisco Systems Inc.                          EMail: ssh@cisco.com
7025 Kit Creek Road
Research Triangle Park, NC  27709

Ted Krivoruchka                             Tel: +1-703-484-3331
Cisco Systems Inc.                          EMail: tedk@cisco.com
13615 Dulles Technology Drive
Herndon, VA  20171

Renee Revis                                 Tel: +1-703-472-5681
Cisco Systems Inc.                          EMail: drrevis@cisco.com
7025 Kit Creek Road
Research Triangle Park, NC  27709

9. References

[1] Postel, J. (ed.), "Internet Protocol - DARPA Internet Program
Protocol Specification", RFC 791, USC/Information Sciences Institute,
September 1981.

[2] Postel, J., "User Datagram Protocol", RFC 768, USC/Information Sciences
Institute, August 1980.

[3] Postel, J. (ed.), "Transmission Control Protocol", RFC 793, USC/
Information Sciences Institute, September 1981.

[4] Jacobson, V., "Congestion Avoidance and Control", Proceedings of
SIGCOMM '88, pp. 314-329, August 1988.

[5] Seth, T., et al., "Performance Requirements for Signaling in
Internet Telephony", Internet-Draft <draft-seth-sigtran-req-00.txt>,
May 1999.

Appendix A: Stream-based Reliable and Ordered Delivery

This defines a reliable and ordered stream mechanism for MDTP. It is
optional for implementation.

A stream in MDTP is defined as a sequence of user datagrams that needs
to be reliably delivered with sequence preservation of its own. In
other words, the delivery of a stream shall not be delayed because of
losses or re-transmissions that occur in other streams within the
same MDTP association. This capability is a critical requirement of
some telephony call signaling protocols [5].

Stream datagrams are identified by setting the FLO bit to 1.

A.1 Stream Initiation

First, an MDTP association between the two endpoints must be initiated
before any stream operation.

A stream shall be initiated (opened) by the sender before datagrams
can be sent in the stream, and after the stream is complete it shall
be terminated (closed) by the user. Also, both sides of the
association shall be able to initiate or terminate streams
independently.

The sender initiates a stream by sending a Stream Initiation
(FLO|UNR), using the following header format:

                          Stream Initiation

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version     |              Flags            |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Seen = 0x0 (or Tag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Send = 0x0                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        New Stream Number      |              0x0              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note that in the Stream Initiation, the Seen and Send shall be set to 0,
and the number of the new stream being initiated shall be indicated
in the first two octets of the data field.

However, if this is the first datagram sent out after receiving the
Initiation Ack from the peer (see section 3.1), the Seen field of
above Stream Initiation shall be set to the Tag value carried in the
Initiation Ack.

Upon the reception of the Stream Initiation, the peer shall respond
immediately with a Stream Initiation Ack (FLO|UNR|ACK), using the
following header format:

                        Stream Initiation Ack

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version     |              Flags            |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Seen = Stream Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Send = 0x0                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The following example shows the opening of stream 5 by "A":

Endpoint A                                      Endpoint Z
{App Initiates stream 5}
[Header Flags=FLO|UNR
        Part=0,Of=1
        Seen=0,Send=0,Size=0,
        Stream=5 ]--------------------------->
(Start T3-send timer)
(Cancel T3-send timer) <--------------------- [Header Flags=FLO|UNR|ACK
                                               Mode=UNR
                                               Part=0,Of=1
                                               Seen=5,Send=0]

A.2 Stream Termination

For an existing stream, either side shall be allowed to terminate the
stream by sending a Stream Termination (FLO|UNR|SHU) to the other side.

Apart from the SHU flag, the Stream Termination shall use the same
header format as that used in the Stream Initiation datagram (see A.1).

A Stream Termination Ack (FLO|UNR|SHU|ACK) shall be sent by the peer
endpoint in response.

The following example shows the termination of stream 5 by "A":

Endpoint A                                      Endpoint Z
{App terminates stream 5}
[Header Flags=FLO|UNR|SHU
        Part=0,Of=1
        Seen=0,Send=0,Size=0,
        Stream=5 ]--------------------->
(Start T3-send timer s-5)
(Cancel T3-send timer s-5) <------------ [Header Flags=FLO|UNR|SHU|ACK
                                          Part=0,Of=1
                                          Seen=5,Send=0]

Datagrams associated with a terminated stream received by either side
should be silently discarded. It is up to the side which terminates
the stream to assure that all outstanding user datagrams in the stream
are acknowledged before the termination.

A.3 Stream Datagram Transfer

A.3.1 Header Format in Stream Datagrams with User Data

The MDTP header in a stream datagram with user data shall have the
following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  MDTP Protocol Identifier                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version     |              Flags            |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Seen                              |
   |         Stream Number         |    Sequence Number            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Send                              |
   |         Stream Number         |    Sequence Number            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                             data                              /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The stream number and sequence number in the Send field shall be used
by the sender to identify the current stream datagram. The stream
number and sequence number in the Seen field shall be used by the
sender to acknowledge stream datagrams it has received.

Stream number 0 and sequence number 0 are reserved for special
purposes and are not valid stream or sequence numbers.

A.3.2 Transmission of Stream Datagrams

The rules of using the Seen Sequence Number and Send Sequence Number
are similar to those defined for normal MDTP non-stream datagram
transmissions (see section 4), except that for stream transfer the
sequence numbers shall roll-over to 1 after 0xFFFF.

Moreover, each stream maintains its individual T3-send timer, but only
one global T2-receive timer is maintained for all existing streams.

Acknowledgment to a stream datagram shall either be sent separately
or be piggy-backed with a stream datagram (not necessarily belonging
to the same stream) traveling in the opposite direction. For a
separate Stream Ack, the Send field will be set to 0000:0000.
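
The split Seen and Send fields and the stream sequence roll-over can
be sketched in Python as follows. This is a sketch under stated
assumptions: the function names are hypothetical, and the stream
number is taken to occupy the high-order 16 bits of each 32-bit field,
following its position on the left of the header diagram in A.3.1.

```python
def pack_stream_field(stream, seq):
    """Combine a 16-bit stream number and a 16-bit sequence number
    into one 32-bit Seen or Send value."""
    if not (0 <= stream <= 0xFFFF and 0 <= seq <= 0xFFFF):
        raise ValueError("stream and seq must fit in 16 bits")
    return (stream << 16) | seq


def unpack_stream_field(value):
    """Split a 32-bit Seen or Send value into (stream, seq)."""
    return value >> 16, value & 0xFFFF


def next_stream_seq(seq):
    """Stream sequence numbers roll over to 1 after 0xFFFF."""
    return 1 if seq == 0xFFFF else seq + 1
```

Stream 5, sequence 1 packs to 0x00050001, and the Send value of a
separate Stream Ack (0000:0000) is simply 0.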

The following shows an example of transmitting a stream datagram
(FLO|REL|DAT) and a separate Stream Ack (FLO|REL|ACK):

Endpoint A                                      Endpoint Z
{App sends first data on stream 5}
[Header Flags=FLO|REL|DAT
        Part=0,Of=1
        Seen=0-0,Send=5-1,Size=20]----\
(Start T3-send timer-s5)               \--->(Start T2-recv)
                                            ...
                                            {T2-recv Timer Expires}
(Cancel T3-send timer-s5)   <--------------- [Header Flags=FLO|REL|ACK
                                             Part=0,Of=1
                                             Seen=5-1,Send=0-0,Size=0]

The following example shows the use of a piggy-backed Stream Ack.

{App sends new data on stream 5}
[Header Flags=FLO|REL|DAT
        Part=0,Of=1
        Seen=0-0,Send=5-4,Size=20]--------->(Start T2-recv)
(Start T3-send timer-s5)                    ...
                                            {App sends data on stream 11}
                                            (cancel T2-recv Timer)
                                     /----- [Header Flags=FLO|REL|DAT|ACK
                                    /        Part=0,Of=1
                                   /         Seen=5-4,Send=11-8,Size=10]
                                  /         (Start T3-send timer-s11)
(Cancel T3-send timer-s5)  <-----/
(Start T2-recv timer)
...
{T2-recv Timer Expires}
[Header Flags=FLO|REL|ACK
        Part=0,Of=1
        Seen=11-8,Send=0-0,Size=0]--------->(Cancel T3-send-s11)

Note that when more than one stream has un-acked datagrams and a
Stream Ack is to be piggy-backed on an out-bound stream datagram, the
endpoint shall choose one of those streams, piggy-back a Stream Ack
for it on the out-bound datagram, and leave the T2-recv timer running.

A.3.3 Extended Stream Ack

Upon the expiration of the T2-recv timer, if more than one stream
datagram has been received but not yet acknowledged by the endpoint,
an Extended Stream Ack shall be used.

The following defines the header format of the Extended Stream Ack
that acknowledges N stream datagrams received:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  MDTP Protocol Identifier                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version     |              Flags            |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Seen                              |
   |         Stream Number #0      |    Sequence Number #0         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Number of Extra Acks = N-1                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Stream Number #1      |    Sequence Number #1         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               /
   /                                                               \
   \                                                               /
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Stream Number #N-1    |    Sequence Number #N-1       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note that an Extended Stream Ack is identified by setting the Send
field to the number of extra acks carried in its data field, as shown
above. Also, Extended Stream Acks shall not be piggy-backed.

The following example shows the use of an Extended Stream Ack
(NOB|REL|ACK) by "Z":

Endpoint A                                      Endpoint Z
{App sends data on stream 5}
[Header Flags=FLO|REL|DAT
        Part=0,Of=1
        Seen=0-0,Send=5-2,Size=20]----------> (Start T2-recv)
(Start T3-send timer-s5)
{App sends data on stream 9}
[Header Flags=FLO|REL|DAT
        Part=0,Of=1
        Seen=0-0,Send=9-4,Size=20]---------->
(Start T3-send timer-s9)
{App sends more data on stream 5}
[Header Flags=FLO|REL|DAT
        Part=0,Of=1
        Seen=0-0,Send=5-3,Size=20]---------->
(Restart T3-send timer-s5)
{App sends data on stream 7}
[Header Flags=FLO|REL|DAT
        Part=0,Of=1
        Seen=0-0,Send=7-11,Size=20]--------->
(Start T3-send timer-s7)
                                              ...
                                              {T2-recv Timer Expires}
(Cancel T3-send timer-s5)     <-------------- [Header Flags=FLO|REL|ACK
(Cancel T3-send timer-s7)                      Part=0,Of=1
(Cancel T3-send timer-s9)                      Seen=5-3,NumExtAck=2,
                                               Size=0,
                                               ext[0]=9-4,
                                               ext[1]=7-11]
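
The way "Z" fills in the Extended Stream Ack in the example above can
be sketched as follows (Python). The sketch assumes that datagrams
arrive in order on each stream, so that the last sequence number
received on a stream acknowledges everything before it, and that
streams are listed in order of first arrival. The function name is
hypothetical.

```python
def build_extended_ack(received):
    """Collapse received (stream, seq) pairs into Extended Stream Ack fields.

    Returns (seen, num_extra_acks, extras): seen is the first
    (stream, seq) pair and extras holds the remaining N-1 pairs.
    """
    latest = {}
    for stream, seq in received:
        latest[stream] = seq  # a later datagram supersedes an earlier one
    pairs = list(latest.items())  # dicts keep first-insertion order
    if not pairs:
        raise ValueError("nothing to acknowledge")
    return pairs[0], len(pairs) - 1, pairs[1:]
```

With the four datagrams of the example, the call
build_extended_ack([(5, 2), (9, 4), (5, 3), (7, 11)]) yields
Seen=5-3, two extra acks, and the extra entries 9-4 and 7-11.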

A.4 Other Issues with Stream Transfer

-- Congestion control, including the rules for timer management and
window management, shall apply to Stream Transfer the same way as it
does to non-Stream based transfer, as defined in section 4.3.

-- When an association is re-initialized (see section 3.4), all
existing streams within that association will be automatically
terminated.

-- The receiver shall silently discard any datagrams associated
with a stream which has not been initiated or has already been
terminated.

-- The same re-transmission and link rotation rules as defined in
section 4 shall apply to Stream Transfer.

-- Bundled Message (see Appendix B) may be allowed in Stream Transfer,
but fragmentation (see Appendix C) shall not be allowed.

Appendix B: Bundled Message Transfer

This appendix defines the mechanism for bundled datagram transport in
MDTP. It is optional for implementation.

Bundling is sometimes desired by the user when transferring small
datagrams, as a way of improving network utilization.

In bundled transfer, MDTP allows an endpoint to bundle small
application messages into one single datagram for transmission. This
bundled mode can be applied to both reliable and unreliable datagrams
(see Appendix E for Unreliable Delivery).

Note that an endpoint shall never send bundled messages to a peer if
that peer endpoint set the NOB bit to 1 during their association
initialization (see section 3).

B.1 Format of Bundled Datagram

The ISB bit in the flag field is set to indicate the current datagram
is bundled, i.e., it contains multiple messages. The format of a
bundled datagram is defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  MDTP Protocol Identifier                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version     |              Flags            |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Acknowledgment Number (Seen)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Sequence Number (Send)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Total Number Of Messages=N   |   Message #1 Size = B1        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                     B1 octets of data                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Message #2 Size = B2        |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                     B2 octets of data                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                                                               /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Message #N Size = BN        |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                     BN octets of data                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Data_Size in a bundled datagram indicates the actual size of the
data field of the datagram, including both the bundling overhead and
the actual user data. Since no fragmentation is allowed in a bundled
datagram, the Part field will always be '0' and the Of field will
always be '1'.

The first two octets of the data field form a 16-bit integer
indicating the number of messages bundled in the current datagram.
This is followed immediately by a list of bundled messages. Each
bundled message starts with a two-octet integer indicating the size
of the data in the message, followed by the data itself.

All integers in the datagram should be transmitted in the network byte
order.
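
The data field layout described above can be sketched as follows
(Python). Only the data field is built; the MDTP header that precedes
it is not shown. The function names are hypothetical.

```python
import struct


def pack_bundle(messages):
    """Build the data field of a bundled datagram from a list of bytes."""
    out = [struct.pack("!H", len(messages))]  # total number of messages
    for msg in messages:
        out.append(struct.pack("!H", len(msg)))  # size of this message
        out.append(msg)
    return b"".join(out)


def unpack_bundle(data):
    """Split the data field of a bundled datagram into its messages."""
    (count,) = struct.unpack_from("!H", data, 0)
    offset = 2
    messages = []
    for _ in range(count):
        (size,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if offset + size > len(data):
            raise ValueError("truncated bundled message")
        messages.append(data[offset:offset + size])
        offset += size
    return messages
```

The bundling overhead is therefore two octets for the message count
plus two octets per bundled message.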

B.2 Bundled Datagram Transfer

The T4-bundle timer and two protocol parameters, namely Min.Bundle
and Max.Bundle, are used to control the bundling of user datagrams.

The endpoint will withhold the datagram from transmission and start
the T4-bundle timer if the combined size of all user datagrams
currently pending for transmission in the out-bound buffer is smaller
than 'Min.Bundle'.

Each time new out-bound user data becomes available for
transmission, the endpoint will attempt to bundle the new data with
the current withheld datagram by using the following rules:

A) If the size of the new data is greater than or equal to
   'Min.Bundle', the current withheld datagram will be transmitted and
   T4-bundle timer will be canceled. Then, the new data will be
   transmitted in a separate datagram.

B) If the size of the new data is less than 'Min.Bundle', but the
   combined size of the current datagram and the new data is greater
   than or equal to 'Max.Bundle', the current datagram will be sent and
   the new data will be withheld as the new current datagram.

C) If the size of the new data is less than 'Min.Bundle', and the
   combined size of the current datagram and the new data is greater
   than 'Min.Bundle' but less than 'Max.Bundle', the new data will be
   bundled into the current datagram, the bundled datagram will be
   transmitted immediately, and the T4-bundle timer will be canceled.

D) If the size of the new data is less than 'Min.Bundle', and the
   combined size of the current datagram and the new data is less than
   'Min.Bundle', the new data will be bundled into the current
   datagram and the T4-bundle timer will be restarted.

E) If T4-bundle timer expires, the current datagram will be sent
   immediately.

F) When a T2-receive timer expires, any bundled data waiting to be
   transmitted should be sent immediately with a piggy-backed Ack to
   acknowledge all un-acked data previously received.

G) If a T4-bundle timer is running and data arrives, the T2-receive
   timer should not be started.

H) A T4-bundle timer should never be canceled unless it is being
   supplanted by a T3-send timer.

When a bundled datagram arrives at the receiving endpoint, each
message is unbundled and delivered separately to the upper layer.

The following are the suggested protocol parameter values for bundled
datagram transfer:

T4-bundle Timer  -   40 ms
Min.Bundle       - 1000 octets
Max.Bundle       - 1432 octets
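
The size tests in rules A through D can be sketched as a single
decision function (Python), using the suggested parameter values.
Two points are assumptions of this sketch rather than statements of
the draft: bundling overhead is ignored in the size comparison, and a
combined size exactly equal to 'Min.Bundle', which the rules do not
cover, is treated under rule C. The function name is hypothetical.

```python
MIN_BUNDLE = 1000  # octets, suggested value
MAX_BUNDLE = 1432  # octets, suggested value


def bundle_action(held_size, new_size):
    """Return the letter of the rule (A to D) that applies to new data.

    held_size is the size of the datagram currently withheld and
    new_size is the size of the newly available user data, in octets.
    """
    combined = held_size + new_size
    if new_size >= MIN_BUNDLE:
        return "A"  # send held datagram, then send new data separately
    if combined >= MAX_BUNDLE:
        return "B"  # send held datagram, withhold the new data
    if combined >= MIN_BUNDLE:
        return "C"  # bundle, send at once, cancel T4-bundle (== is assumed)
    return "D"      # bundle, keep withholding, restart T4-bundle
```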

Appendix C: Fragmented Message Transfer

This appendix defines the mechanism for fragmented datagram transport
in MDTP. It is optional for implementation.

When the size of an out-bound user message exceeds the value defined
in the protocol parameter Max.Bundle, the endpoint shall fragment the
message into smaller pieces of size equal to or smaller than
'Max.Bundle' and send each piece out in a separate datagram.

The "Part" and "Of" fields are used to disassemble and reassemble the
fragmented message. The combination of the maximal 'Of' value, which
is 255, and the maximal Data Size (see section 2.2) determines the
maximal size of a single user message that MDTP can send or receive
in fragmented message transfer mode.
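
A minimal sketch of the fragmentation step follows (Python). It
assumes that Part counts from 0 and that Of carries the total number
of fragments, as in the example later in this appendix. The function
name is hypothetical.

```python
MAX_BUNDLE = 1432  # octets, suggested value from Appendix B
MAX_OF = 255       # largest value of the 8-bit 'Of' field


def fragment(message, max_size=MAX_BUNDLE):
    """Split message into (part, of, data) tuples, Part counting from 0."""
    of = max(1, -(-len(message) // max_size))  # ceiling division
    if of > MAX_OF:
        raise ValueError("message too large to fragment")
    return [(part, of, message[part * max_size:(part + 1) * max_size])
            for part in range(of)]
```

For a 3300-octet message and Max.Bundle=1432 this produces three
fragments of 1432, 1432 and 436 octets.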

However, an endpoint shall never send fragmented datagrams to a peer
if that peer set the NOM bit to 1 during their association
initialization.

The following example shows the transmission of a fragmented message
(assuming Max.Bundle=1432, Min.Bundle=1000):

Endpoint A                                      Endpoint Z
{App sends message size=3300 octets}
[Header Flags=DAT|ACK|GAR
        Part=0,Of=3
        Seen=3,Send=16,Size=1432]-------> (Start T2-receive timer)
[Header Flags=DAT|ACK|GAR
        Part=1,Of=3
        Seen=3,Send=17,Size=1432]------->
[Header Flags=DAT|ACK|GAR
        Part=2,Of=3
        Seen=3,Send=18,Size=436]-------->
(Start T3-send timer)
                                              ..
                                              {Timer T2 Expires}
                                 /----------- [Header Flags=ACK
                                /              Mode=0
                               /               Part=0,Of=0
(cancel timer T3) <-----------/                Seen=18,Send=4]

Notice that "A" is using the reliable transfer mode to send the
fragmented message, therefore "Z" will hold the fragments and request
retransmission if a fragment is found missing, i.e., if a gap is found
in the received data (see ). When all the parts of the fragmented
message are received, the receiving endpoint will re-assemble the
message and dispatch it to the upper layer.
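
The hold-until-complete behavior in reliable mode can be sketched as
follows (Python). How a receiver groups the fragments that belong to
one message is not covered here; the sketch assumes the caller has
already collected them in a dictionary keyed by Part. The function
name is hypothetical.

```python
def reassemble(fragments):
    """Return the whole message, or None while a fragment is missing.

    fragments maps Part -> (of, data) for one fragmented message.
    """
    if not fragments:
        return None
    of = next(iter(fragments.values()))[0]
    if any(part not in fragments for part in range(of)):
        return None  # gap found: keep holding the fragments
    return b"".join(fragments[part][1] for part in range(of))
```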

MDTP also allows a fragmented message to be sent using Unreliable
Transfer mode (see section 4.5). However, in unreliable mode, each
fragment will be dispatched to the application upon its arrival, and
no retransmission will be requested even if a fragment is found
missing.

Bundling is prohibited if the current datagram contains a fragment of
a fragmented message.

Appendix D: Multicast Datagram Transfer

This appendix defines the mechanism for unreliable transportation of
multicast datagrams in MDTP. It is optional for implementation.

D.1 Multicast Datagram Header Format

Multicast datagrams are identified by setting MUL, UNR, and DAT bits
to 1.

Two new fields are added to the standard MDTP datagram header to
support multicast:

Multicast To Transmit address - This is the multicast address, in
network byte order, that the sender transmitted the data to. The
receiver can use this information for internal tracking purposes.

Multicast From - This is the network address (or the IP Address of
Network 1 as described in 3.2, if redundant networks exist) of the
sender, in network byte order.

                 MDTP Header Format - Multicast Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  MDTP Protocol Identifier                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Version     |              Flags            |   In Queue    |
   |               |N N W I F R D A M S W R R F G U|               |
   |               |O O I S I T A C U H N E T L A N|               |
   |               |M B N B R M T K L U R 1 C O R R|               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Acknowledgment Number (Seen)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Sequence Number (Send)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Multicast To Transmit address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Multicast From - senders base address             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                             data                              /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
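
The two multicast fields can be encoded as sketched below (Python),
assuming IPv4 addresses, which matches the 32-bit width drawn above.
The function name and the sample addresses are hypothetical.

```python
import socket


def pack_multicast_fields(group_addr, sender_addr):
    """Encode the two multicast fields from dotted-quad IPv4 strings.

    inet_aton returns the four address octets in network byte order.
    """
    return socket.inet_aton(group_addr) + socket.inet_aton(sender_addr)
```

For example, pack_multicast_fields("224.0.0.1", "10.0.0.2") gives the
eight octets that follow the Data Size, Part and Of fields.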

For multicast datagrams, the value in the Send field shall indicate
the sequence number of multicast datagrams transmitted by the
sender. This information helps the receiver of the multicast to detect
duplicated multicast datagrams and also to detect lost multicast
datagrams from the same sender. The Seen field shall normally be
set to 0, except in the special cases stated below.

Bundling and fragmentation are not allowed in either multicast or
broadcast datagrams.

No initiation shall be needed for an endpoint to transmit to a
multicast address.

D.2 Transmission of Multicast Datagrams

The following example illustrates multicast transmissions between two
endpoints.

Endpoint A                                               Endpoint Z
{App multicasts a message}
[Header Flags=MUL|UNR|DAT
        Part=0,Of=1
        Seen=0,Send=5,Size=250]--------------> (no Ack necessary)

...
{App multicasts a message}
[Header Flags=MUL|UNR|DAT
        Part=0,Of=1
        Seen=0,Send=6,Size=500]--------------> (no Ack necessary)


Notice the values of the Send field in the multicast datagrams
(5 and 6, respectively). They represent the sequence numbers
of the multicast datagrams "A" has sent out. Endpoint Z should use
these values to detect missing or duplicate datagrams.

Duplicate datagrams will be discarded and no effort will be made to
retransmit lost multicast datagrams.
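
A receiver-side check of the Send field can be sketched as follows
(Python). The sketch tracks one sender, and it ignores both sequence
number wrap-around and the reset indicator of D.3. The function name
and the result labels are hypothetical.

```python
def classify_multicast(last_seq, seq):
    """Classify a multicast datagram against the last Send value seen.

    Returns ("duplicate", 0), ("in_order", 0) or ("gap", n_missing).
    """
    if seq <= last_seq:
        return ("duplicate", 0)   # to be discarded
    if seq == last_seq + 1:
        return ("in_order", 0)
    return ("gap", seq - last_seq - 1)  # lost datagrams, not retransmitted
```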

D.3 Reset of the Multicast Datagram Sequence Number

If the Seen field of a received multicast datagram equals '1', this
indicates that the sender has reset its multicast datagram sequence
number. The receiving endpoint, upon detecting this reset indicator in
the incoming multicast datagram, should start a procedure to adopt the
new sequence number for error detection. However, caution
should be taken to prevent false resets due to duplicated datagrams
with reset indicator propagating through multiple networks.

To guarantee that all receivers of the multicast group adopt the new
sequence number, the reset indicator should be repeated within the
first N multicast datagrams sent out after the reset. N is predefined
by the protocol parameter 'Num.Of.Mcast.Reset.Msg'.

At the receiving endpoint, when the reset indicator is detected the
new sequence number will be adopted. However, if two reset events are
detected within a predefined time interval (Min.Mcast.Time.To.Reset),
the second reset indicator will be ignored.

The suggested values for these two protocol parameters are:
   Min.Mcast.Time.To.Reset - 5 seconds
   Num.Of.Mcast.Reset.Msg  - 5 messages
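
One way a receiver might apply Min.Mcast.Time.To.Reset is sketched
below (Python). The class, its attributes, and the choice to pass the
current time in as an argument are assumptions of this sketch; a
datagram whose reset indicator is ignored is still processed as an
ordinary multicast datagram.

```python
MIN_MCAST_TIME_TO_RESET = 5.0  # seconds, suggested value


class McastResetTracker:
    """Per-sender multicast sequence state (hypothetical helper)."""

    def __init__(self):
        self.last_seq = None
        self.last_reset_time = None

    def on_datagram(self, seen, send, now):
        """Record one datagram; return True if a reset was adopted."""
        if seen == 1:
            recent = (self.last_reset_time is not None and
                      now - self.last_reset_time < MIN_MCAST_TIME_TO_RESET)
            if not recent:
                self.last_reset_time = now
                self.last_seq = send  # adopt the new sequence number
                return True
        # No indicator, or a repeated indicator inside the interval
        if self.last_seq is None or send > self.last_seq:
            self.last_seq = send
        return False
```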

Appendix E: Unreliable Delivery

This appendix defines the support for sending Unreliable datagrams in
MDTP.  It is optional for implementation.

The unreliable transfer mode allows two endpoints to send to each
other without acknowledging receipt. This can usually achieve
higher data throughput than the reliable transfer mode. To indicate
the unreliable transfer mode, the sender of a datagram with user data
simply sets the UNR flag to 1. The following sequence illustrates
unreliable data transfer.

Endpoint A                                      Endpoint Z
{App sends 2 messages}
[Header Flags=UNR|DAT|ACK
        Part=0,Of=1
        Seen=0,Send=4,Size=100]-------->
[Header Flags=UNR|DAT|ACK
        Part=0,Of=1
        Seen=0,Send=5,Size=100]-------->

                                             {App sends 1 message}
                                   <------- [Header Flags=UNR|DAT|ACK
                                             Part=0,Of=1
                                             Seen=5,Send=1,Size=450]
...
{App sends 2 more messages}
[Header Flags=UNR|DAT|ACK
        Part=0,Of=1
        Seen=1,Send=6,Size=100]------>

[Header Flags=UNR|DAT|ACK
        Part=0,Of=1
        Seen=1,Send=7,Size=100]------>

Note that no timers shall be started by either end, and that even
though both ends are in Unreliable transfer mode, the ACK flag is
still set by the sender of the datagram. This means that the Seen
field in the datagram header is still valid and indicates the sequence
number of the last datagram received by the sender.  The upper layer
can use this information to help detect missing or duplicated
datagrams. However, MDTP shall make no effort to detect or retransmit
missing data or to screen out duplicated datagrams.

E.1 Ordered Unreliable Delivery

In unreliable transfer, the sender should be allowed to request
ordered delivery by setting the RE1 flag to 1.

When Ordered Unreliable Delivery is indicated, the receiver shall
order the newly arrived datagram with any datagrams it has received
but not yet passed to its upper layer.

If it receives a datagram which is older than the last datagram it has
passed to the upper layer, that datagram shall be silently discarded.
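
The receiver behavior described above can be sketched as follows
(Python). When the pending datagrams are handed to the upper layer is
not specified here, so the sketch leaves that to an explicit call;
sequence number wrap-around is ignored. The class and method names
are hypothetical.

```python
import heapq


class OrderedUnreliableReceiver:
    """Orders pending datagrams and drops those that arrive too late."""

    def __init__(self):
        self.pending = []          # min-heap of (seq, data) not yet delivered
        self.last_delivered = None

    def receive(self, seq, data):
        """Queue a datagram; return False if it was discarded as stale."""
        if self.last_delivered is not None and seq < self.last_delivered:
            return False           # older than the last datagram delivered
        heapq.heappush(self.pending, (seq, data))
        return True

    def deliver(self):
        """Pass all pending datagrams to the upper layer in order."""
        out = []
        while self.pending:
            seq, data = heapq.heappop(self.pending)
            self.last_delivered = seq
            out.append((seq, data))
        return out
```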

      This Internet Draft expires in 6 months from April 1999.