draft-ietf-rmt-bb-norm-04

RMT Working Group                                        B.  Adamson/NRL
INTERNET-DRAFT                                      C.  Bormann/Tellique
draft-ietf-rmt-bb-norm-04.txt                          M.  Handley/ACIRI
Expires: Jan 2003                                         J.  Macker/NRL
                                                               July 2002


    NACK-Oriented Reliable Multicast (NORM) Protocol Building Blocks

Status of this Memo


     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC2026.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as Internet-
     Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other docu-
     ments at any time.  It is inappropriate to use Internet-Drafts as
     reference material or to cite them other than as "work in
     progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.

     Copyright Notice

     Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

     This memo describes issues related to the creation of negative-
     acknowledgment (NACK) oriented reliable multicast (NORM) protocols.
     The general goals and  assumptions for NORM are defined.  The tech-
     nical challenges related to NACK-oriented (and in some cases gen-
     eral) reliable multicast protocol design are identified.  These
     challenges are resolved into a set of applicable "building blocks"
     which are described in further detail.  It is anticipated that
     these building blocks (as they are further refined and defined in
     future revisions of this document) will be useful in generating
     different instantiations of reliable multicast protocols.



Adamson, Bormann, et al.  Expires January 2003                  [Page 1]


Internet Draft            NORM Building Blocks                 July 2002


1.0 Background

     Reliable multicast transport is a desirable technology for the
     efficient and reliable distribution of data to a group on the
     Internet.  The complexities of group communication paradigms neces-
     sitate different protocol types and instantiations to meet the
     range of performance and scalability requirements of different
     potential reliable multicast applications and users [Mankin98].
     Properly designed negative-acknowledgement (NACK) oriented reliable
     multicast (NORM) protocols offer scalability advantages for appli-
     cations and/or network topologies where, for  various reasons, it
     is prohibitive to construct a higher order  delivery infrastructure
     above the basic Layer 3 IP multicast service (e.g.  unicast or
     hybrid unicast/multicast data distribution trees).  Additionally,
     the scalability property of NACK-oriented protocols [Pingali93,
     Levine96] may be applicable where broad "fanout" is expected for a
     single network hop (e.g.  cable-TV data delivery, satellite, or
     other broadcast communication communication services).  Further-
     more, the simplicity of a protocol based on "flat" group-wide mul-
     ticast distribution may offer advantages for a broad range of dis-
     tributed services or dynamic networks and applications.

     While different protocol instantiations may be required to meet
     specific application and network architecture demands [Clark90],
     there are a number of fundamental components which may be common to
     these different instantiations.  This document describes the frame-
     work and  common "building block" components relevant to multicast
     protocols based primarily on NACK operation for reliable transport.

2.0 Applicability

     Each potential protocol instantiation using the building blocks
     presented here (and other applicable building block documents) will
     have specific criteria which will influence individual protocol
     design.  To support the development of applicable building blocks,
     it is useful to identify and summarize driving general protocol
     design goals and assumptions.  These are areas which each protocol
     instantiation will need to address in detail.  Each building block
     description in this document will include a discussion of the
     impact of these design criteria.  The categories of design criteria
     considered here include:

       1)   Data content model
       2)   Group membership dynamics
       3)   Sender/receiver relationships
       4)   Group size
       5)   Data delivery performance
       6)   Network topology



Adamson, Bormann, et al.  Expires January 2003                  [Page 2]


Internet Draft            NORM Building Blocks                 July 2002


       3)   Router/intermediate system assistance

     In some cases, a building block may be designed to address a wide
     range of assumptions while in other cases there will be trade-offs
     required to meet different application needs.  Where necessary,
     building block features will designed to be parametric to meet dif-
     ferent requirements.  Of course, an underlying goal will be to min-
     imize design complexity and to at least recommend default values
     for any such parameters which meet a general purpose "bulk data
     transfer" requirement in a typical Internet environment.

2.1 Data content model

     The implicit goal of a reliable multicast protocol is the reliable
     delivery of "data" among a group of members communicating through
     IP multicast datagram service.  However, the nature of the data
     content and the service the application is attempting to provide
     can impact design decisions.  The service model may range from
     long-lived transfer sessions of bulk  quantities of data (file
     broadcast) to more interactive exhanges of  small messages (e.g.
     white-boarding, text chat).  And within those  different models
     there are other issues such as the sender's ability  to cache
     transmitted data (or state referencing it) for retransmission or
     repair.  The needs for ordering and/or causality in the sequence of
     transmissions and receptions among members in the group may be dif-
     ferent depending upon data content.  The group communication
     paradigm differs significantly from the point-to-point model in
     that, depending upon the data content type, some receivers may com-
     plete reception of a portion of data content and be able to act
     upon it before other members have received the content.  This may
     be acceptable (or even desirable) for some applications but not for
     others.  These varying requirements drives the need for a number of
     different protocol instantiation designs.

     A significant challenge in developing generally useful building
     block mechanisms is accommodating even a limited range of these
     capabilities without defining specific application-level details.
     Note that this particular building block "guideline" may be gener-
     ally applicable beyond the realm of NACK-oriented reliable multi-
     cast.

2.2 Group membership dynamics

     Group communication can differ from point-to-point communications
     with respect to the fact that even if the composition of the group
     changes, the "thread" of communication can still exist.  This con-
     trasts with the point-to-point communication model where, if either
     of the two parties leave, the communication process (exchange of



Adamson, Bormann, et al.  Expires January 2003                  [Page 3]


Internet Draft            NORM Building Blocks                 July 2002


     data) is terminated (or at least paused).  Depending upon  applica-
     tion goals, senders and receivers participating in a reliable  mul-
     ticast transport "session" may be able to join late, leave, and/or
     potentially rejoin while the ongoing group communication "thread"
     still remains functional and useful.

     Also note that this can impact protocol message content.  If "late
     joiners" are supported, some amount of additional information may
     be placed in  message headers to accommodate this functionality.
     Alternatively, the information may be sent in its own message (on
     demand or intermittently) if the impact to the overhead of typical
     message transmissions is deemed too great.  Group dynamics can also
     impact other protocol mechanisms such as NACK or other timing, con-
     gestion control operation, etc.

2.3 Sender/receiver relationships

     The relationship of senders and receivers among group members
     requires consideration.  In some applications, there may be a sin-
     gle  sender multicasting to a group of receivers.  In other cases,
     there  may be more than one sender or the potential for everyone in
     the  group to be a sender _and_ receiver of data may exist.

2.4 Group size

     Native IP multicast [Deering89] may scale to extremely large group
     sizes.  It may be desirable for some applications to be able to
     scale along with the multicast infrastructure's ability to scale.
     In its simplest form, there are limits to the group size to which a
     NACK-oriented protocol can apply without NACK implosion problems.
     However, the potential for router assistance or other non-linear
     NACK suppression mechanisms may enable these protocols to scale to
     very large group sizes.  In large scale cases, it may be pro-
     hibitive for members to maintain state on all other members (in
     particular, other receivers) in the group.  The impact of group
     size needs to be considered in the development of generally appli-
     cable building blocks.

2.5 Data delivery performance

     There is a trade-off between scalability and data delivery latency
     when designing NACK-oriented protocols.  If NACK suppression is to
     be used, there will be some delays built into the NACK generation
     and repair transmission process to allow suppression to occur and
     for the sender of data to identify appropriate content for effi-
     cient repair transmission.  For example, backoff timeouts can be
     used to ensure efficient NACK suppression and repair transmission,
     but this comes at a cost of increased delivery latency and



Adamson, Bormann, et al.  Expires January 2003                  [Page 4]


Internet Draft            NORM Building Blocks                 July 2002


     increased buffering requirements for both senders and receivers.
     The building blocks should allow applications to establish bounds
     for data delivery performance.  Note that application designers
     must be aware of the scalabilty trade-off which is made when such
     bounds are applied.

2.6 Network topology

     The Internet Protocol has historically assumed a role of providing
     service across heterogeneous network topologies.  It is desirable
     that a reliable multicast protocol be capable of effectively oper-
     ating across a wide range of the networks to which general purpose
     IP service applies.  The bandwidth available on the links between
     the members of a single group today may vary between  low numbers
     of kbit/s for wireless links and multiple Gbit/s for high speed LAN
     connections, with varying degrees of contention from other flows.
     Recently, a number of asymmetric network services including
     56K/ADSL modems, CATV Internet service, satellite and other wire-
     less communication services have begun to proliferate.   Many of
     these are inherently broadcast media with potentially large
     "fanouts" to which IP multicast service is highly applicable.

     Additionally, policy and/or technical issues may result in topolo-
     gies where multicast connectivity is limited to a single logical
     direction from a specific source or set of sources to the group at
     large.  Receivers in the group may be restricted to unicast feed-
     back for NACKs and other messages.  Consideration must be given, in
     building block development and protocol design, to the nature of
     the underlying networks over which the protocols may be operating.

2.7 Router/Intermediate System Assistance

     While intermediate assistance from devices/systems with direct
     knowledge of the underlying network topology may used to leverage
     the performance and scalability of reliable multicast protocols,
     there will continue to be a number of instances where this is not
     available or practical.  Any building block components for NACK-
     oriented reliable multicast should be capable of operating without
     such assistance but should also be capable of utilizing these fea-
     tures when available.

3.0 Building Block Areas

     The previous section has presented in general what building blocks
     are intended to be and some of the criteria which may affect build-
     ing block identification/design. This section describes different
     building block areas applicable to  NACK-oriented reliable multi-
     cast protocols.  Some of these areas are  specific to NACK-oriented



Adamson, Bormann, et al.  Expires January 2003                  [Page 5]


Internet Draft            NORM Building Blocks                 July 2002


     protocols.  Detailed descriptions of such  areas are provided
     below.  In other cases, the areas (e.g.  node  identifiers, FEC,
     etc) may be generally applicable to other forms of  reliable multi-
     cast.  In those cases, the discussion below describes requirements
     placed on those other general building block areas from  the stand-
     point of NACK-oriented reliable multicast.

     For the individual building blocks to be incorporated as part of a
     specific protocol instantiation, it is expected that a description
     of some notional "interface" to the building blocks' functionality
     be provided.  For example, a building block component may require
     some  form of "input" from another building block component or
     other source  in order to perform its function.  Any "inputs"
     required by each  building block component and/or any resultant
     "output" provided by the building block will be defined and
     described as the building  block component's interface definition.

     The following building block areas are described below:

     (NACK-Oriented Components)
       1)   Sender transmission
       2)   NACK-oriented Repair Process
       3)   "Late-joining" Receiver Policies and Mechanisms

     (Generally-applicable Components)
       4)   Node (member) Identification
       5)   Data Content Identification
       6)   Forward Error Correction
       7)   Round-trip Timing Collection
       8)   Group Size Determination/Estimation
       9)   Congestion Control Operation
       10)  Router/Intermediate System Assistance
       11)  Additional Protocol Mechanisms

     Figure 1 provides an pictoral overview of these building block
     areas and their relationships.  For example, the content of the
     data messages that sender initially transmits depends upon the
     "Node Identification", Data Content Identification", "FEC" , and
     "Congestion Control components (Note that the rate of message
     transmission will generally depend upon the "Congestion Control"
     component).  Subsequently, the receivers response to these trans-
     missions (e.g. NACKing for repair) will depend upon the content of
     these transmissions and inputs from other building block compo-
     nents.  Then, the sender's processing of these responses will feed
     back into its transmission strategy.

                                           Application Data
                                                 |



Adamson, Bormann, et al.  Expires January 2003                  [Page 6]


Internet Draft            NORM Building Blocks                 July 2002


                                                 V
    .---------------------.            .-----------------------.
    | Node Identification |----------->|  Sender Transmission  |<------.
    `---------------------'       _.-' `-----------------------'       |
    .---------------------.   _.-' .'            |                     |
    | Data Identification |--'   .''             | ("Join" Policy)     |
    `---------------------'    .' '              V                     |
    .---------------------.  .'  '     .------------------------.      |
 .->| Congestion Control  |-'   '      | Receiver NACK-oriented |      |
 |  `---------------------'   .'       | Repair Process         |      |
 |  .---------------------. .'         | .------------------.   |      |
 |  |        FEC          |'.          | | NACK Initiation  |   |      |
 |  `---------------------'` `._       | `------------------'   |      |
 |  .---------------------. ``. `-._   | .------------------.   |      |
 `--|    RTT Collection   |._` `    `->| | NACK Content     |   |      |
    `---------------------' .`- `      | `------------------'   |      |
    .---------------------.   `-`._   | .------------------.   |      |
    |    Group Size Est.  |---.-`---`->| | NACK Suppression |   |      |
    `---------------------'`.  ` `     | `------------------'   |      |
    .---------------------.  `  ` `    `------------------------'      |
    |       Other         |   `  ` `             |                     |
    `---------------------'    `  ` `            | (Router Assistance) |
                                `. ` `           V                     |
                                  `.`' .-------------------------.     |
                                     `>| Sender NACK Processing  |_____/
                                       | and Repair Response     |
                                       `-------------------------'

                    ^                         ^
                    |                         |
                  .-----------------------------.
                  |         (Security)          |
                  `-----------------------------'

                 Fig. 1 - NORM Building Block Framework

The components on the left side of this figure represent the components
which may be generally applicable beyond NORM and those on the right are
specific to NORM protocols.  Some components (e.g. "Security") will
impact many aspects of the protocol and others such as "Router Assis-
tance" may be more transparent to the core protocol processing.  The
sections below discuss issues with regards to these building block com-
ponents and their relationships.  Where applicable, specific technical
recommendations are made for mechanisms which will properly satisfy the
goals of reliable multicast bulk transport for the Internet.

3.1 Sender transmission




Adamson, Bormann, et al.  Expires January 2003                  [Page 7]


Internet Draft            NORM Building Blocks                 July 2002


Senders will transmit data content to the multicast session.  The data
content will be application dependent.  The sender will transmit data
content at a rate and with packet sizes determined by application and/or
network architecture requirements.  When congestion control mechanisms
are used (recommended), the transmission rate will be controlled by the
congestion control mechanism.  It is recommended that all data transmis-
sions from senders be subject to rate limitations determined by the
application or congestion control algorithm.  The sender's transmissions
should make good utlization of the available capacity (which may be lim-
ited by the application and/or by congestion control).  As a result, it
is expected there will be overlap and multiplexing of new data content
transmission with repair content.

In addition to data content, other sender messages or commands may be
employed as part of protocol operation.  For NACK-oriented operation,
the reliability of any such commands may depend upon redundant transmis-
sion.  Other factors related to NACK-oriented operation may determine
sender transmission formats and methods.  Some consideration needs to be
given to the sender's behavior during intermittent idle periods when it
has no data to transmit.  While many aspects may be protocol-specific,
there are techniques which may be generally applicable to NACK-oriented
reliable multicast.  For example, the periodicity of redundant transmis-
sion of command messages issued by a sender should be dependent upon the
greatest round trip timing estimate and the resultant NACK operation.
More specifically, a command message might be redundantly transmitted by
a sender to indicate that it is temporarily (or permanently) halting
transmission.  At this time, it may be appropriate for receivers to
respond with NACKs for any outstanding repairs they require following
the rules of the NACK process described below.  For efficiency, the
sender should allow sufficient time between redundant transmissions to
receive any NACK-oriented responses from the receiver set to this com-
mand.  Other protocol components may benefit from a similar approach.

Inputs:

       1)   Data content
       2)   Segmentation size
       3)   Transmission rate
       4)   Application controls

     Outputs:

       1)   Rate-controlled stream of packets with headers uniquely
            identifying data or repair content within the context of the
            reliable multicast session.
       2)   Commands indicating sender's status or other transport con-
            trol actions to be taken.




Adamson, Bormann, et al.  Expires January 2003                  [Page 8]


Internet Draft            NORM Building Blocks                 July 2002


3.2 NACK-oriented repair process

     The most critical component of the NACK-oriented reliable multicast
     protocol building block is the NACK repair process.  There are four
     primary elements of a general NACK repair process:

       1)   Method for determining when receivers will initiate the NACK
            process in response to sender transmission for which they
            need repair.
       2)   NACK message content.
       3)   NACK suppression mechanisms to promote scalability of the
            protocol.
       4)   Sender NACK reception, aggregation, and repair response.

3.2.1 NACK Process Initiation

     The NACK process (cycle) will be initiated by receivers who detect
     they require repair transmissions from a specific sender at defined
     opportunities.  When FEC is applied, a NACK cycle should only be
     initiated when it is known by the receiver that its repair
     requirements exceed the amount of FEC pending transmission for a
     given coding block of packets.  This can be determined at the end
     of the current transmission block (if it is indicated) or upon the
     start of reception of a subsequent coding block or transmission
     object.  This implies the data content is marked to identify its
     FEC block (See object/stream data  content identifier discussion
     below).

     If the sender's transmission advertises the quantity of repair
     packets it is already planning to send for a block, the receiver
     may be able to initiate the NACK processor earlier.  Allowing
     receivers to initiate NACK cycles at any time they detect  their
     repair needs have exceeded pending repair transmissions may  result
     in slightly quicker repair cycles.  However, it may be useful to
     limit the initiation opportunities to specific events such as at
     the end-of-transmission of an FEC coding block (or alternatively at
     detection of subsequent coding blocks).  This can  allow receivers
     to aggregate NACK content into a smaller number of NACK  messages.

     The NACK cycle should begin with receivers observing backoff time-
     outs to facilitate timer-based NACK suppression as described below.
     In conjunction with initiating this backoff timeout, it is impor-
     tant that the receivers record the current position in the sender's
     transmission sequence at which they initiate the NACK cycle.  When
     the suppression backoff timeout expires, the receivers should only
     consider their repair needs up to this recorded transmission posi-
     tion in making the decision to transmit or suppress a NACK.  With-
     out this restriction, suppression is greatly reduced as additional



Adamson, Bormann, et al.  Expires January 2003                  [Page 9]


Internet Draft            NORM Building Blocks                 July 2002


     content is received from the sender during the time a NACK message
     propagates across the network to other receivers.

     Interface Description

     Inputs:

       1)   Object/stream data content and sequencing identifiers from
            sender transmissions.

     Outputs:

       1)   NACK cycle initiation decision
       2)   Recorded sender transmission sequence position.

3.2.2 NACK Suppression Mechanisms

     A primary NACK suppression mechanism is the use of initial backoff
     timeouts by receivers wishing to transmit NACK messages[Floyd95].
     Upon expiration of the timeout, a receiver will transmit a NACK
     unless the content of the pending repair request is completely
     superseded by NACK messages heard from other receivers (when
     receivers are multicasting NACKs) or from some indicator from the
     sender.  (Note: When receivers are unicasting NACK messages, the
     sender may facilitate NACK suppression by forwarding appropriate
     NACKs it has received to the group at large or provide some other
     indicator of the repair information it will be subsequently trans-
     mitting).

     The backoff timeout periods used by receivers should be  indepen-
     dently, randomly picked by receivers with an exponential distribu-
     tion [Nonnenmacher98].  This results in the bulk of the  receiver
     set holding off transmission of NACK messages under the  assumption
     that the smaller number of "early NACKers" will supersede the
     repair needs of the remainder of the group.  The mean of the dis-
     tribution should be determined as a function of the current esti-
     mate of sender<->group greatest round trip time (GRTT) and a group
     size estimate which determined by other mechanisms within the pro-
     tocol (See section below) or preset by the multicast application.

     A simple algorithm can be constructed to generate random backoff
     timeouts with the appropriate distribution.  Additionally, the
     algorithm may be designed to optimize the backoff distribution
     given the number of receivers (R) potentially generating feedback.
     This "optimization" minimizes the number of feedback messages (e.g.
     NACK) in the worst-case situation where all receivers generate a
     NACK. The maximum backoff timeout (T_maxBackoff) can also be con-
     trolled to allow the  application or protocol tradeoff NACK latency



Adamson, Bormann, et al.  Expires January 2003                 [Page 10]


Internet Draft            NORM Building Blocks                 July 2002


     versus volume of feedback traffic.  A larger value of T_maxBackoff
     will result in a lower density of feedback traffic for a given
     repair cycle.  A smaller value of T_maxBackoff results in shorter
     latency which reduces the buffering requirements of senders and
     receivers for reliable transport.

     Given the receiver group size (R), and maximum allowed backoff
     timeout (T_maxBackoff),  random backoff timeouts (t') with a trun-
     cated exponential distribution can be picked with the following
     algorithm:

     1)  Establish an optimal mean (L) for the exponential backoff based
         on the group size:

                                L = ln(R) + 1

     2) Pick a random number (x) from a uniform distribution
        over a range of:

                   L                           L                   L
           --------------------  to   --------------------  +  ----------
          T_maxBackoff*(exp(L)-1)    T_maxBackoff*(exp(L)-1)  T_maxBackoff

     3) Transform this random variate to generate the
        desired random backoff time (t) with the following
        equation:

        t' = T_maxBackoff/L * ln(x * (exp(L) - 1) * (T_maxBackoff/L))

     This is a C language function which can be used to perform this
     function:

     double RandomBackoff(double maxTime, double groupSize)
     {
         double lambda = log(groupSize) + 1;
         double x = UniformRand(lambda/maxTime) +
                    lambda / (maxTime*(exp(lambda)-1));
         return ((maxTime/lambda) *
                 log(x*(exp(lambda)-1)*(maxTime/lambda)));
     }  // end RandomBackoff()

     where UniformRand(double max) returns random numbers with a uniform
     distribution from the range of 0..max.  For example, based on the
     POSIX "rand()" function, the following code can be used:

     double UniformRand(double max)
     {
         return (max * ((double)rand()/(double)RAND_MAX));



Adamson, Bormann, et al.  Expires January 2003                 [Page 11]


Internet Draft            NORM Building Blocks                 July 2002


     }

     The number of expected NACKs generated (N) within the first round
     trip time for a single loss event can be approximately expected to
     be:

                  N = exp(1.2 * L / (2*T_maxBackoff/GRTT))

     Thus the maximum backoff time (T) can be adjusted to tradeoff
     worst-case NACK feedback volume versus latency.  This is derived
     from [Nonnenmacher98] and assumes  T_maxBackoff >= GRTT, and L is
     the mean of the distribution optimized for the given group size as
     shown in the algorithm above.  Note that other mechanisms within
     protocol may work to reduce redundant NACK generation further.  It
     is suggested that T_maxBackoff be selected as an integer multiple
     of the sender's current advertised GRTT estimate such that:

                    T_maxBackoff = K * GRTT where K >= 1

     Then, given that (K*GRTT) is the maximum backoff time used by the
     receivers to initiate NACK transmission, other timeout periods
     related to the NACK repair process can be scaled accordingly.  One
     of those timeouts is the amount of time a receiver should wait
     after generating a NACK message before allowing itself to initiate
     another NACK backoff/transmission cycle (T_holdoff).  This delay
     should be sufficient for the sender to respond to provide repair
     messages in response to the received NACK.  An appropriate value
     depends upon the amount of time for the NACK to reach the sender
     and the sender to provide a repair response.  This must also
     include any amount of sender NACK aggregation period during which
     possible multiple NACKs are accumulated to determine an efficient
     repair response.  These timeouts are further discussed in the sec-
     tion below on Sender NACK Processing and Repair Response.

     There are also secondary NACK suppression mechanisms which can be
     applied.  For example, the sender's transmissions may follow an
     ordinal sequence of transmission (observed through data/repair con-
     tent <objectIds> and/or <symbolIds>) which is "rewound" during
     repair transmissions.  Receivers may wish to limit transmission of
     their NACKs to only when the sender's current sequenced position of
     transmission passes the point at which the receiver has incomplete
     transmissions, thus reducing premature requests for repair of data
     the sender may be providing in response to other receiver requests.
     This mechanism can be very effective for protocol convergence in
     high loss conditions when transmissions of NACKs from other
     receivers (or indicators from the sender) are lost.

     Another mechanism (particularly applicable when FEC is used) is for



Adamson, Bormann, et al.  Expires January 2003                 [Page 12]


Internet Draft            NORM Building Blocks                 July 2002


     the sender to embed an indication of impending repair transmissions
     in current packets sent.  For example, the indication may be as
     simple as an advertisment of the number of FEC packets to be sent
     for the current applicable coding block.

     Finally, some consideration might be given to using the NACKing
     history of receivers to weight their selection of NACK backoff
     timeout intervals.  For example, if a receiver has historically
     been  experiencing the greatest degree of loss, it may promote
     itself to  statistically NACK sooner than other receivers.  Note
     this assumes  there is some degree of correlation over successive
     intervals of time in the loss experienced by a receiver.  This
     adjustment of backoff timeout selection may require the creation of
     an "early NACK" slot for these historical NACKers.  This additional
     slot in the NACK backoff window will result in a longer repair
     cycle process which may not be desirable for some applications.
     The resolution of these trade-offs may be dependent upon the proto-
     col's target application set or network.


     After the random backoff timeout has expired, the receiver will
     make a decision on whether to generate a NACK repair request or not
     (i.e. it has been suppressed).  The NACK will be suppressed when
     any of the following conditions has occurred:

     1) The accumulated state of NACKs heard from other receivers (or
     forwarding of this state by the sender) is equal to or supersedes
     the repair needs of the local receiver.  Note that the local
     receiver should consider its repair needs only up to the sender
     transmission sequence position recorded at the NACK cycle initia-
     tion (when the backoff timer was activated).

     2) The sender's data content transmission sequence position
     "rewinds" to a point ordinally less than that of the lowest
     sequence position of the local receiver's repair needs. (This
     detection of sender "rewind" indicates the sender has already
     responded to other receiver repair needs of which the local
     receiver may not have been aware).  This "rewind" event can occur
     any time between when the NACK cycle was initiated with the backoff
     timeout activation to the current moment when the backoff timeout
     has expired to suppress the NACK.  Another NACK cycle must be ini-
     tiated by the receiver when the sender's transmission sequence
     position exceeds the receiver's lowest ordinal repair point.  Note
     it is possible that the local receiver may have had its repair
     needs satisfied as a result of the sender's response to the repair
     needs of other receivers.

     If these conditions have not occurred and the receiver still has



Adamson, Bormann, et al.  Expires January 2003                 [Page 13]


Internet Draft            NORM Building Blocks                 July 2002


     pending repair needs, a NACK message is generated and transmitted.
     The NACK should consist of an accumulation of repair needs from the
     receiver's lowest ordinal repair point up to the current sender
     transmission sequence position.  A single NACK message should be
     generated and the NACK message content should be truncated if it
     exceeds the payload size of single protocol message.


     Interface Description

     Inputs:

       1)   Group greatest round trip time estimate (GRTT).
       2)   Group size estimate.
       3)   Application-defined bound on backoff timeout period.
       4)   NACKs from other receivers.
       5)   Pending repair indication from sender (may be forwarded
            NACKs).
       6)   Recorded sender transmission sequence position.

            7)   Current sender transmission sequence position.

     Outputs:

       1)   Yes/no decision to generate NACK message upon backoff timer
            expiration.

3.2.3 NACK Content

     The content of NACK messages generated by reliable multicast
     receivers will include information detailing the current repair
     needs of each receiver.  The specific information depends on the
     use and type of FEC in the NORM repair process.  The identification
     of repair needs is dependent upon the data content identification
     (See Section 3.5 below).  At the highest level the NACK content
     will identify the data transport object (or stream) within the mul-
     ticast session which needs repair.  For the indicated transport
     entity, the NACK content will then identify the specific coding
     blocks and/or segments of coding blocks it needs to reconstruct the
     transmitted data.  This content may consist of FEC block erasure
     counts and/or explicit indication of missing blocks or segments of
     data and FEC content.

3.2.3.1 NACK Content and FEC Repair Strategies

     Where FEC-based repair is used, the NACK message content will mini-
     mally need to identify the coding block(s) for which repair is
     needed and a count of erasures (missing packets) for the coding



Adamson, Bormann, et al.  Expires January 2003                 [Page 14]


Internet Draft            NORM Building Blocks                 July 2002


     block.  Note that this assumes the FEC algorithm is capable of
     repairing _any_ loss combination within the coding block and that
     the quantity of unique FEC parity packets the server has available
     to transmit is essentially unlimited (i.e. the server will always
     be able to provide new parity packets in response to anysubsequent
     repair requests for the same coding block).  In other cases, the
     NACK content will need to also _explicitly_ identify which packets
     (information and/or parity) the receiver requires to successfully
     reconstruct the content of the coding block.  This will be true of
     many applicable small to medium size block codes (e.g. Reed
     Solomon).

     When FEC is not used as part of the repair process or the protocol
     instantiation is required to provide reliability even when the
     sender has transmitted all available parity for a given coding
     block (or the sender's ability to buffer transmission history is
     exceeded by the delay*bandwidth*loss characteristics of the network
     topology), the NACK content will need to contain _explicit_  coding
     block and/or packet loss information so that the sender can provide
     appropriate repair packets and/or data retransmissions.  Explicit
     loss information in NACK content may also potentially serve other
     purposes.  For example, it may be useful for decorrelating loss
     characteristics among a group of receivers to help differentiate
     candidate congestion control bottlenecks among the receiver set.

     For cases where the amount of loss is very small with respect to
     the coding block size, it may be efficient to simply provide a list
     of coding block vector identifiers.  However, in many cases, a bit
     mask marking the locations of missing packets may be significantly
     more efficient for communicating receiver repair needs.  And since
     the data content is logically divided into coding blocks, a system
     of hierarchical bit masks can be used to encode missing content at
     the object/stream, FEC block, and individual packet levels.  Hier-
     archical bit mask encoding can provide compact NACK messages even
     in high delay*bandwidth*loss conditions.  Bit mask based NACK con-
     tent can also be efficiently processed with logical operations dur-
     ing protocol operation.

     When FEC is used and NACK content is designed to contain explicit
     repair requests, there is a strategy where the receivers can NACK
     for specific content which will help facilitate NACK suppression
     and repair efficiency.  The assumptions for this strategy are that
     sender may potentially exhaust its supply of new, unique parity
     packets available for a given coding block and be required to
     explicitly retransmit some data or parity segments to complete
     reliable transfer.  Another assumption is that an FEC algorithm
     where any parity packet can fill any erasure within the coding
     block is used.  The goal of this strategy is to make maximum use of



Adamson, Bormann, et al.  Expires January 2003                 [Page 15]


Internet Draft            NORM Building Blocks                 July 2002


     the available parity and provide the minimal amount of data and
     repair transmissions during reliable transfer of a coding block to
     the group.

     In this approach, the sender transmits the data content of the cod-
     ing block (and optionally some quantity of parity packets) in its
     initial transmission.  Note that a coding block is considered to be
     logically made up of the contiguous set of data vectors plus parity
     vectors for the given FEC algorithm used.  For example, a coding
     scheme which provides for 64 data segments and 32 parity segments
     per coding block would contain <symbolIds> in the range of 0 to 95.

     Receivers then can construct NACK messages requesting sufficient
     content to satisfy their repair needs.  For example, if the
     receiver has three erasures in a given received coding block, it
     will request transmission of the three lowest ordinal parity vec-
     tors in the coding block. In our example coding scheme from the
     previous paragraph, the receiver would explicitly request parity
     segments 64 to 66 to fill its three erasures for the coding block.
     Note that if the receiver's loss for the coding block exceeds the
     available parity quantity (i.e. greater than 32 missing segments in
     our example), the receiver will be required to construct a NACK
     requesting all (32) of the available parity segments plus some
     additional portions of its missing data segments in order to recon-
     struct the block.  If this is done consistently across the receiver
     group, the resulting NACKs will comprise a minimal set of sender
     transmissions to satisfy their repair needs.

     To summarize a consistent NACK content strategy to be employed by
     the receivers, the rule is to request the lower ordinal portion of
     the parity content for the FEC coding block to satisfy the erasure
     repair needs on the first NACK cycle.  If the available number of
     parity segments is insufficient, the receiver will also request the
     subset of ordinally highest missing data segments to cover what the
     parity segments will not fill.  (Note this strategy assumes FEC
     codes such as Reed-Solomon for which a single parity segment can
     repair and erased segment.  This strategy would need minor modifi-
     cation to take into account the possibly limited repair capability
     of other FEC types).  On subsequent NACK repair cycles where the
     receiver may have received some portion of its previously requested
     repair content, the receiver will use the same strategy, but only
     NACK for the set of parity and/or data segments it has not yet
     received.  Optionally, the receivers could also provide a count of
     erasures as a convenience (saves processing time) to the sender.

     After receipt and accumulation of NACK messages during the aggrega-
     tion period, the sender can begin transmission of fresh (previously
     untransmitted) parity vectors for the coding block based on the



Adamson, Bormann, et al.  Expires January 2003                 [Page 16]


Internet Draft            NORM Building Blocks                 July 2002


     erasure count _if_ it has a sufficient quantity of vectors which
     were _not_ previously transmitted.  Otherwise, the sender will
     resort to transmitting the explicit set of repair vectors
     requested.  With this approach, the sender needs to maintain very
     little state on requests it has received from the group without
     need for synchronization of repair requests from the group.  Since
     all receivers use the same consistent algorithm to express their
     explicit repair needs, NACK suppression among receivers is simpli-
     fied over the course of multiple repair cycles.  The receivers can
     simply compare NACKs heard from other receivers against their own
     calculated repair needs to determine whether they should transmit
     or suppress their pending NACK messages.

3.2.3.2 NACK Content Encapsulation

     The format of NACK content will depend on the protocol's data ser-
     vice model and the format of data content identifiication the pro-
     tocol uses.  This is also dependent upon the type of FEC encoding
     (if any) is used.  Figure 2 illustrates a general logical hierarchy
     of transmission content identification, denoting that the notion of
     objects (or streams) and/or FEC blocking is optional at the proto-
     col instantiation's discretion.  Since the notion of session
     "streams" and "blocks" are optional, the framework degenerates to
     that of typical transport data segmentation and reassembly in its
     simplest form.

             Session_
                      \_
                         [Object/Stream(s)]_
                                            \_
                                               [FEC Blocks]_
                                                            \_
                                                               Segments

            Figure 2: Hierarchy of Reliable Data Transfer Content

     The format of NACK messages should meet the following goals:

     1) Describe a basic unit to identify transport data unit transmis-
     sions required to repair a portion of the received content, whether
     it is an entire missing object/stream (or range), entire FEC coding
     block(s), or sets of segments,

     1) Be simple to process for NACK aggregation and suppression,

     2) Be capable of including NACKs for multiple objects, fec coding
     blocks and/or symbols in a single message.  FEC erasure counts may
     also be desirable.



Adamson, Bormann, et al.  Expires January 2003                 [Page 17]


Internet Draft            NORM Building Blocks                 July 2002


     3) Have a compact format, and

     4) Be capable of working with the Generic Router Assist (GRA)
     building block.

     The concatenation of "<objectId><blockId><symbolId>" comprises a
     basic transport protocol data unit (TPDU) identifier of segments
     transmitted from a given source.  NACK content can be composed of
     lists and/or ranges of these TPDU identifiers to build up NACK mes-
     sages to describe the receivers repair needs.  If no hierarchical
     object delineation or FEC blocking is used, the TPDU is a simple
     linear representation of the data segments transmitted by the
     sender.  When the TPDU represents a hierarchy for purposes of
     object/stream delineation and/or FEC blocking, the NACK content
     unit may require flags to indicate which portion of the TPDU is
     applicable.  For example, if an entire "object" (or range of
     objects" is missing in the received data, the receiver will not
     necessarily know the appropriate range of <blockIds> or <symbolIds>
     for which to request repair and thus requires some mechanism to
     request repair (or retransmission) of the entire unit represented
     by an <objectId>.  The same is true if entire FEC coding blocks
     represented by one or a range of <blockIds> is missing for a
     receiver.

3.2.4 Sender NACK Processing and Repair Response

     Upon reception of a repair request from a receiver in the group,
     the sender will initiate a repair response procedure.  The sender
     may wish to delay transmission of repair content until it has had
     sufficient time to accumulate potentially multiple NACKs from the
     receiver set.  This allows the sender to determine the most effi-
     cient repair strategy for a given transport stream/object or FEC
     coding block.  Depending upon the approach used, some protocols may
     find it beneficial for the sender to provide an indicator of pend-
     ing repair transmissions as part of the its current transmitted
     message content.  This can aid some NACK suppression mechanisms.
     The amount of time to perform this NACK aggregation should be suf-
     ficient to allow for the maximum receiver NACK backoff window
     ("T_maxBackoff" from Section 3.2.4) and propagation of NACK mes-
     sages from the receivers to the sender.  Note the maximum transmis-
     sion delay of a message from a receiver to the sender may be
     approximately 1*GRTT in the case of very asymmetric network topol-
     ogy with respect to transmission delay.  Thus if the maximum
     receiver NACK backoff time is T_maxBackoff = K*GRTT, the sender
     NACK aggregation period should be equal to at least:

            T_aggregation = T_maxBackoff + (1*GRTT) = (K+1)*GRTT




Adamson, Bormann, et al.  Expires January 2003                 [Page 18]


Internet Draft            NORM Building Blocks                 July 2002


     Recall that the receivers' will employ a holdoff timeout after gen-
     erating a NACK message to allow time for the sender's response.
     Given a sender T-aggregation time of (K+1)*GRTT, the receivers
     should use holdoff timeouts of:

              T_holdoff = T_aggregation + (1*GRTT) = (K+2)*GRTT

     This allows for a worst case propagation time of 1*GRTT of the
     senders response back to the receiver and the sender's aggregation
     time.

     Additionally, in the case of unicast feedback from the receiver
     set, it may be useful for the sender to forward (via multicast)
     superseding NACK messages to the group to allow for NACK suppres-
     sion when there is not multicast connectivity among the receiver
     set.

     At the expiration of the "T_aggregation" timeout, the sender will
     begin transmitting repair messages according to the accumulated
     content of NACKs received.  There are a couple of guidelines with
     regards to FEC-based repair and the ordering of the repair response
     from the sender which can improve reliable multicast efficiency.

     When FEC is used, it is beneficial that the sender transmit previ-
     ously untransmitted parity content as repair messages whenever pos-
     sible.  This  maximizes the receiving nodes' ability to reconstruct
     the entire transmitted content from their individual subsets of
     received messages.

     The transmitted object and/or stream data and repair content should
     be indexed with  monotonically increasing sequence numbers (within
     a reasonably large ordinal space).  If the sender observes the dis-
     cipline of  transmitting repair for the earliest content (e.g.
     ordinally lowest FEC blocks) first, the receivers can use a strat-
     egy of witholding repair requests for later content until the
     sender once again returns to that point in the object/stream trans-
     mission sequence.  This can increase overall message efficiency
     among the group and help work to keep repair cycles relatively syn-
     chronized without dependence upon strict time synchronization among
     the sender and receivers.  This also helps minimize the buffering
     requirements of receivers and senders and reduces redundant trans-
     mission of data to the group at large.

     Interface Description

     Inputs:

       1)   Receiver NACKs



Adamson, Bormann, et al.  Expires January 2003                 [Page 19]


Internet Draft            NORM Building Blocks                 July 2002


       1)   Group timing information

     Outputs:

       1)   Repair messages (FEC and/or Data content retransmission)

3.3 Group "Join" Policies/ Procedures

     Consideration should be given to how new receivers join a group
     (perhaps where reliable transmission is already in progress) and
     beging NACKing for any repair needs. If this is unconstrained, the
     dynamics of group membership may impede the application's ability
     to meet it goals in progressing the transmission of data.  Policies
     limiting the opportunities at which receivers begin participating
     in the NACK process may be used to achieve the desired behavior.
     For example, it may be beneficial if receivers only attempt reli-
     able reception from a newly-heard sender when upon non-repair
     transmissions of data in the first FEC block of an object or logi-
     cal portion of a stream.  The sender may also implement policies
     limiting which receivers from which it will accept NACK requests,
     but this may be prohibitive for scalability reasons in some situa-
     tions.  In some types of bulk transfer applications (and for poten-
     tial interactive applications), it may alternatively desirable to
     have a looser transport synchronization policy and rely upon ses-
     sion management mechanisms to control group dynamics which may
     result in poor behavior.

     Inputs:

       1)   Current object/stream data/repair content and sequencing
            identifiers from sender transmissions.

     Outputs:

       1)   Receiver yes/no decision to begin receiving and NACKing for
            reliable reception of data

3.4 Reliable Multicast Member Identification

     In a NORM protocol (or other multicast  protocols) where there is
     the potential for multiple sources of data, it is necessary to pro-
     vide some mechanism to uniquely identify the sources (and possibly
     some or all receivers in some cases) within the group.  Identity
     based on arriving packet source addresses is insufficient for sev-
     eral reasons.  These reasons include routing changes for hosts with
     multiple interfaces which result different packet source addresses
     for a given host over time, network address translation (NAT) or
     firewall devices, or other transport/network bridging approaches.



Adamson, Bormann, et al.  Expires January 2003                 [Page 20]


Internet Draft            NORM Building Blocks                 July 2002


     As a result, some type of unique source identifier <sourceId> field
     should be present in packets transmitted by reliable multicast ses-
     sion members.

3.5 Data Content Identification

     The data and repair content transmitted by a NORM sender requires
     some form of identification in the protocol header fields.  This
     identification is required to facilitate the reliable NACK-oriented
     repair process.  These identifiers will be used in NACK messages
     generated.

     There are two very general types of data which may comprise bulk
     transfer session content.  These data types are static objects of
     finite size and continuous non-finite streams.  A given application
     may wish to reliably multicast data content using either one or
     both of these data models.  While it may be possible for some
     applications to further generalize this model and provide mecha-
     nisms to encapsulate static objects as content embedded within a
     stream, there are advantages to many applications to provide dis-
     tinct support for static bulk objects and messages with the context
     of a reliable multicast session.  These applications may include
     content caching servers, file transfer or collaborative tools with
     bulk content.  Applications with requirements for these static
     object types can then take advantage of transport layer mechanisms
     (i.e.  segmentation/reassembly, caching, integrated forward error
     correction coding, etc) rather than being required to provide their
     own mechanisms for these functions at the application layer.

     As noted, some applications may alternatively desire to transmit
     bulk content in the form of one or more streams of non-finite size.
     Example streams include continuous quasi-realtime message broad-
     casts (e.g. stock ticker) or some content types which are part of
     collaborative tools or other more complex applications.  And, as
     indicated above, some applications may wish to encapsulate other
     bulk content (e.g. files) into one more streams within a multicast
     session.  Additionally, multiple streams can allow for parallized
     transmission within the context of a single multicast session.

     The components described within this building block draft document
     are envisioned to be applicable to both of these models with the
     potential for a mix of both types within a single multicast ses-
     sion.  To support this requirement, the normal data content identi-
     fication should include a field to uniquely identify the object or
     stream <objectId> within some reasonable temporal or ordinal inter-
     val.  Note that it is _not_ expected that this data content identi-
     fication will be globally unique.  It is expected that the
     object/stream identifier will be unique with respect to a given



Adamson, Bormann, et al.  Expires January 2003                 [Page 21]


Internet Draft            NORM Building Blocks                 July 2002


     sender within the reliable multicast session and during the time
     that sender is supporting a specific transport instance of that
     object or stream.

     Since the "bulk" object/stream content usually require  segmenta-
     tion, some form of segment identification must also be  provided.
     This segment identifier will be relative to any object or stream
     identifier which has been provided.  Thus, in some cases, NORM pro-
     tocol instantiations may be able to receive  transmissions and
     request repair for multiple streams and one or more sets of static
     objects in parallel.  For protocol instantiations employing FEC the
     segment identification portion of the data content identifier may
     consist of a logical concatenation of a coding block identifier
     <blockId> and an identifer for the specific data or parity symbol
     <symbolId> of the code block.  The RMT FEC Building Block (cur-
     rently "draft-ietf-rmt-bb-fec-04.txt") provides a standard message
     format for identifying FEC transmission content.

     Additionally, flags to determine the usage of the content identi-
     fier fields (e.g.  stream vs.  object) may be applicable.   Flags
     may also serve other purposes in data content identification.  It
     is expected that any flags defined will be dependent upon individ-
     ual protocol instantiations.

     In summary, the following data content identification fields may be
     required for NORM protocol data content messages:

       1)   Source node identifier (<senderId>)
       2)   Object/Stream identifier (<objectId>), if applicable.
       3)   FEC Block identifier (<blockId>), if applicable.
       4)   FEC Symbol identifier (<symbolId>)
       5)   Flags to differentiate interpretation of identifier fields
            or identifier structure which implicitly indicates usage.
       6)   Additional FEC transmission content fields per FEC Building
            Block

     These fields have been identified since any generated NACK messages
     will use these identifiers in requesting repair or retransmission
     of data.  NORM protocols are expected to greatly benefit from
     interaction with other reliable multicast building blocks (Generic
     Router Assist(GRA), in particular) and those other building blocks
     will need to appropriately consider these anticipated requirements.

3.6 Forward Error Correction

     Multiple forward error correction (FEC) approaches have been iden-
     tified which can provide great performance enhancements to the
     repair process of NACK-oriented and other reliable multicast



Adamson, Bormann, et al.  Expires January 2003                 [Page 22]


Internet Draft            NORM Building Blocks                 July 2002


     protocols [Metzner84, Macker97].  NORM protocols can reap addi-
     tional benefits since FEC-based repair does not _generally_ require
     explicit knowledge of repair content within the bounds of its cod-
     ing block size (in packets).

     Generally, repair packets generated using FEC algorithms with good
     erasure filling properties (e.g.  Reed-Solomon) will be transmitted
     only in response to NACK repair requests from receiving nodes.
     However, there are benefits in some network environments for trans-
     mitting some predetermined quantity of FEC repair packets multi-
     plexed with the regular data segment transmissions [Gossink98].
     This can reduce the amount of NACK traffic generated with rela-
     tively  little overhead cost when group sizes are very large or the
     network  connectivity has a large delay*bandwidth product with some
     nominal level of expected packet loss.  While the application of
     FEC is not unique to NORM, these sorts of requirements may dictate
     the types of algorithms and protocol approaches which are applica-
     ble.

     A specific issue related to the use of FEC with NORM is the mecha-
     nism used to identify which portion(s) of transmitted data content
     to which specific FEC packets are applicable.  It is expected that
     FEC algorithms will be based on generating a set of parity repair
     packets for a corresponding block of transmitted data packets.
     Since data content packets are uniquely identified by the concate-
     nation of <sourceId/objectId/blockId/symbolId> during transport, it
     is expected that FEC packets will be identified in a similar man-
     ner.  The RMT FEC Building Block specification (currently "draft-
     ietf-rmt-bb-fec-04.txt") provides detailed recommendations concern-
     ing application of FEC and standard formats for related reliable
     multicast protocol messages.

3.7 Round-trip Timing Collection

     The measurement of packet propagation round-trip time (RTT) among
     members of the group is required to support NACK suppression algo-
     rithms, timing of sender commands or certain repair functions, and
     congestion control operation.  The nature of the round-trip infor-
     mation collected is dependent upon the type of interaction among
     the members of the group.  In the case where only "one-to-many"
     transmission is required, it may be necessary that only the sender
     require RTT knowledge of the greatest RTT (GRTT) among the receiver
     set and/or RTT knowledge of only a portion of the group.  Here, the
     GRTT information might be collected in a reasonably scalable man-
     ner.  For congestion control operation, it is possible that RTT
     information may be required by each receiver in the group.  In this
     case, an alternative RTT collection scheme may be utilized where
     receivers collect individual RTT measurements with respect to the



Adamson, Bormann, et al.  Expires January 2003                 [Page 23]


Internet Draft            NORM Building Blocks                 July 2002


     sender and advertise them (or an competed subset) to the group or
     sender.  Where it is likely that exchange of reliable multicast
     data will occur among the group on a "many-to-many" basis, there
     are alternative measurement techniques which might be employed for
     increased efficiency [Ozdemir99].  And in some cases, there might
     be absolute time synchronization among hosts which may simplify RTT
     measurement.  There are trade-offs in multicast congestion control
     design which need further consideration before a universal recom-
     mendation on RTT (or GRTT) measurement can be specified.  Regard-
     less of how the RTT information is collected (and more specifically
     GRTT) with respect to congestion control or other requirements, the
     sender will need to advertise its current GRTT estimate to the
     group for timing of the

3.7.1 One-to-Many Sender GRTT Measurement

     The goal of this form of RTT measurement is for the sender to learn
     the GRTT among the receivers who are actively participating in NORM
     operation.  The set of receivers participating in this process may
     be the entire group or some subset of the group determined from
     another mechanism within the protocol instantiation.  An approach
     to collect this GRTT information follows.

     The sender periodically polls the group with a message (independent
     or "piggy-backed" with other transmissions) containing a <sendTime>
     timestamp relative to an internal clock at the sender.  Upon recep-
     tion of this message, the receivers will record this <sendTime>
     timestamp and the time (referenced to their own clocks) at which it
     was received <recvTime>.  When the receiver provides feedback to
     the sender (either explicitly or as part of other feedback messages
     depending upon protocol instantion specification), it will con-
     struct a "response" using the formula:

         <grttResponse> = <sendTime> + (<currentTime> - <recv_Time>)

     where the <sendTime> is the timestamp from the last probe message
     received from the source and the (<currentTime> - <recvTime>) is
     the amount of time differential since that request was received
     until the receiver generated the response.

     The sender processes each receiver response by calculating a cur-
     rent RTT measurement for the receiver from whom the response was
     received using the following formula:

               <receiverRtt> = <currentTime> - <grttResponse>

     During the each periodic GRTT probing interval, the source keeps
     the peak round trip estimate from the set of responses it has



Adamson, Bormann, et al.  Expires January 2003                 [Page 24]


Internet Draft            NORM Building Blocks                 July 2002


     received.  The GRTT estimate should be filtered to be conservative
     towards maintaining an estimate biased towards the greatest
     receiver RTT measurements received.  A conservative estimate of
     GRTT maximizes the efficiency redundant NACK suppression and repair
     aggregation.  The update to the source's ongoing estimate of GRTT
     is done observing the following rules:

       1)   If a receiver's response round trip calculation is greater
            than the  current GRTT estimate AND current peak, the
            response value is immediately fed into the GRTT  update fil-
            ter given below.  In any case, the source records the "peak"
            receiver RTT measurement for the current probe interval.

       2)   At the end of the response collection period (i.e. the GRTT
            probe interval), if the recorded "peak" response is less
            than the current GRTT estimate AND this is the third consec-
            utive collection period with a peak less than the current
            GRTT estimate the recorded peak is fed into the GRTT update.
            (Implicitly, Rule #1 was applied otherwise so no new update
            is required).

       3)   At the end of the response collection period, the peak
            tracking value is set to either ZERO if the "peak" is
            greater than or equal to the current GRTT estimate (i.e.
            Already incorporated into the filter under Rule #1) or kept
            the same if its value is less than the current GRTT estimate
            AND was not yet incorporated into the GRTT update filter
            according to Rule #2. Thus for decreases in the source's
            estimate of GRTT, the "peak" is tracked across three consec-
            utive probe intervals.  The current MDP implementation uses
            the following GRTT update filter to incorporate new peak
            responses into the the GRTT estimate:

          if (peak > current_estimate)
              current_estimate = 0.25 * current_estimate + 0.75 * peak;
          else
              current_estimate = 0.75 * current_estimate + 0.25 * peak;

     This update method is biased towards maintaining an estimate of the
     worst-case round trip delay.  The reason the GRTT estimate is
     reduced only after 3 consecutive collection periods with smaller
     response peaks is to be conservative where packet loss may have
     resulted in lost response messages.  And then the reduction is
     additionally conservatively weighted using the averaging filter
     from above.

     The GRTT collection period (i.e. period of probe transmission)
     could be fixed at a value on the order of that expected for group



Adamson, Bormann, et al.  Expires January 2003                 [Page 25]


Internet Draft            NORM Building Blocks                 July 2002


     membership and/or network topology dynamics.  For robustness, more
     rapid probing could be used at protocol startup before settling to
     a less frequent, steady-state interval.  Optionally, an algorithm
     may be developed to adjust the GRTT collection period dynamically
     in response to the current GRTT estimate (or variations in it) and
     to an estimation of packet loss.  The overhead of probing messages
     could then be reduced when the GRTT estimate is stable and unchang-
     ing, but be adjusted to track more dynamically during periods of
     variation with correspondingly shorter GRTT collection periods.

     In summary, although NORM repair cycle timeouts are based on GRTT,
     it should be noted that convergent operation of the protocol does
     not _strictly_ depend on highly accurate GRTT estimation.  The cur-
     rent mechanism has proved sufficient in simulations and in the
     environments in which NORM-like protocols have been deployed to
     date.  The estimate provided by the algorithm tracks the peak enve-
     lope of actual GRTT (including operating system effect as well as
     network delays) even in relaitvely high loss connectivity.  The
     steady-state probing/update interval may potentially be varied to
     accommodate different levels of expected network dynamics in dif-
     ferent environments.

3.7.2 One-to-Many Receiver RTT Measurement

     (TBD - Receivers "ping" sender for RTT measurement, and then
     receivers competitively (worst case RTT metric) advertise their
     measurements to the sender and optionally group so the sender can
     determine GRTT ... Sender should still robustly advertise its cur-
     rent GRTT knowledge to the group so group can use appropriate tim-
     ing ... or the TFMCC approach is a bit more sophisticated)

3.7.3 Many-to-Many RTT Measurement

     (TBD - Describe approach based on Ozdemir99, if appropriate/appli-
     cable?)

3.7.4 Sender GRTT Advertisement

     To facilitate determistic NORM protocol operation, the sender
     should robustly advertise its current estimation of GRTT to the
     receiver set.  Common, robust knowledge of the sender's current
     operating GRTT estimate among the group will allow the protocol to
     progress in its most efficient manner.  The sender's GRTT estimate
     can be robustly advertised to the group by simply embedding the
     estimate into all pertinent messages transmitted by the sender.
     The overhead of this can be made quite small by quantizing (com-
     pressing) the GRTT estimate to a single byte of information.  The
     following C-lanquage function algorithm allows this to be done over



Adamson, Bormann, et al.  Expires January 2003                 [Page 26]


Internet Draft            NORM Building Blocks                 July 2002


     a wide range of GRTT values while maintaining a greater range of
     precision for small GRTT values and less precision for large val-
     ues:

          unsigned char QuantizeGrtt(double grtt)
          {
              if (grtt > 1.0e03)
                  grtt = 1.0e03;
              else if (grtt < 1.0e-06)
                  grtt = 1.0e-06;
              if (grtt < 3.3e-05)
                  return ((unsigned char)(grtt * 1.0e06) - 1);
              else
                  return ((unsigned char)(ceil(255.0.-
                                          (13.0 * log(1.0e03/grtt)))));
          }

     Note that this function is useful for quantizing GRTT times in the
     range of 1 microsecond to 1000 seconds.  Of course, NORM protocol
     implementations may wish to further constrain advertised GRTT esti-
     mates (e.g. limit the maximum value) for practical reasons.

3.8 Group Size Determination/Estimation

     (TBD)  Group size may be approximated from the density of feedback
     messages which follow the exponentially weighted random backoff.
     NORM_NACK messages might be used during normal protocol operation
     or a bootstrap procedure can be created to obtain an initial size
     estimation and track group size with receiver join/leave dynamics.
     This might also be combined with congestion control feedback col-
     lection.  The details of this are TBD.

3.9 Congestion Control Operation

     (TBD - A NACK-oriented protocol may place limitations/requirements
     on collection of information to facilitate congestion control of
     senders.  There are a number of specific issues of TCP-Friendly
     Multicast Congestion Control (TFMCC)which must be addressed.)

3.10 Router/Intermediate System assistance

     (TBD - NACK-oriented protocols can potentially benefit greatly from
     router assistance.  In particular, additional NACK suppression
     would be beneficial (This may impact how synchronized receiver NACK
     cycles are, sender advertisement of NACK-cycle parameters (i.e.
     GRTT, group size, etc), NACK content, others)

3.11 Additional protocol mechanisms



Adamson, Bormann, et al.  Expires January 2003                 [Page 27]


Internet Draft            NORM Building Blocks                 July 2002


     (TBD- e.g.  positive acknowledgement collection, performance
     statistics collection, session management, etc)

4.0 Security Considerations (TBD)


5.0 References

     [Mankin98] A. Mankin, A. Romanow, S.  Bradner, V.  Paxson, "IETF
     Criteria for Evaluating Reliable Multicast Transport and Applica-
     tion Protocols", RFC 2357, June 1998.

     [Pingali93] S. Pingali, D. Towsley, J.  Kurose.  "A Comparison of
     Sender-Initiated and Receiver-Initiated Reliable Multicast Proto-
     cols".  In Proc.  INFOCOM, San Francisco, CA, October 1993.

     [Levine96] B.N. Levine, J.J. Garcia-Luna-Aceves.  "A Comparison of
     Known Classes of Reliable Multicast Protocols", Proc.  Interna-
     tional Conference on Network Protocols (ICNP-96), Columbus, Ohio,
     Oct 29--Nov 1, 1996.

     [Clark90] D. Clark, D. Tennenhouse, "Architectural Considerations
     for a New Generation of Protocols".  In Proc.  ACM SIGCOMM, pages
     201--208, September 1990.

     [Deering89] S.  Deering.  "Host Extensions for IP Multicasting".
     Internet RFC1112, August 1989.

     [Floyd95] S. Floyd, V.  Jacobson, S.  McCanne, C.  Liu, and L.
     Zhang.  "A Reliable Multicast Framework for Light-weight Sessions
     and Application Level Framing", Proc.  ACM SIGCOMM, August 1995.

     [Nonnenmacher98] J.  Nonnenmacher and E.  W.  Biersack, "Optimal
     multicast feedback," in IEEE Infocom , (San Francisco, California),
     p.  964, March/April 1998.

     [Gossink98] D. Gossink, J.  Macker, "Reliable Multicast and Inte-
     grated Parity Retransmission with Channel Estimation", IEEE GLOBE-
     COM 98'.

     [Metzner84] J. Metzner, "An Improved Broadcast Retransmission Pro-
     tocol", IEEE Transactions on Communications, Vol.  Com-32, No.6,
     June 1984.

     [Macker97a] J. Macker, "Integrated Erasure-Based Coding for Reli-
     able Multicast Transmission", IRTF Meeting presentation, March
     1997.




Adamson, Bormann, et al.  Expires January 2003                 [Page 28]


Internet Draft            NORM Building Blocks                 July 2002


     [Macker97b] J. Macker, "Reliable Multicast Transport and Integrated
     Erasure-based Forward Error Correction", Proc. IEEE MILCOM 97,
     October  1997.

     [Ozdemir99]  V. Ozdemir, S. Muthukrishnan, I. Rhee, "Scalable, Low-
     Overhead Network Delay Estimation", NCSU/AT&T White Paper, February
     1999.


6.0 Authors' Addresses

     Brian Adamson
     adamson@itd.nrl.navy.mil
     Naval Research Laboratory
     Washington, DC, USA, 20375

     Carsten Bormann
     cabo@tellique.de
     Tellique Kommunikationstechnik GmbH
     Gustav-Meyer-Allee 25 Geb ude 12
     D-13355 Berlin, Germany

     Mark Handley
     mjh@aciri.org
     1947 Center Street, Suite 600
     Berkeley, CA 94704

     Joe Macker
     macker@itd.nrl.navy.mil
     Naval Research Laboratory
     Washington, DC, USA, 20375




















Adamson, Bormann, et al.  Expires January 2003                 [Page 29]

Document	Document type	This is an older version of an Internet-Draft that was ultimately published as RFC 3941. Expired & archived
	Select version	03 04 05 06 07 08 09 RFC 3941
	Compare versions
	Author
	RFC stream
	Other formats	txt pdf pdf bibtex bibxml
	Additional resources	Mailing list discussion