RMT Working Group                                         Brian Whetten
 Internet Engineering Task Force                                Talarian
 Internet Draft                                            Dah Ming Chiu
 Document: draft-ietf-rmt-bb-track-00.txt               Sun Microsystems
 17 November, 2000                                       Miriam Kadansky
 Expires 17 May 2001                                    Sun Microsystems
                                                           Gursel Taskale
                                                                 Talarian
 
 
          Reliable Multicast Transport Building Block for TRACK
                    <draft-ietf-rmt-bb-track-00.txt>
 
 
 Status of this Memo
 
    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.
 
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that
    other groups may also distribute working documents as Internet-
    Drafts. Internet-Drafts are draft documents valid for a maximum of
    six months and may be updated, replaced, or obsoleted by other
    documents at any time. It is inappropriate to use Internet- Drafts
    as reference material or to cite them other than as "work in
    progress."
 
    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt
    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.
 
 Abstract
 
    This document describes the TRACK Building Block.  It contains
    functions relating to positive acknowledgments and hierarchical
    tree construction and maintenance.  It is primarily meant to be
    used as part of the TRACK Protocol Instantiation.  It is also
    designed to be useful as part of overlay multicast systems that
    wish to offer efficient confirmed delivery of multicast messages.
 
 Conventions used in this document
 
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
    this document are to be interpreted as described in RFC-2119.
 
 
 1.0 Introduction
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   1
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    This document describes the TRACK Building Block.  It contains
    functions relating to positive acknowledgments and hierarchical
    tree construction and maintenance.  It is primarily meant to be
    used as part of the TRACK Protocol Instantiation.  It is also
    designed to be useful as part of overlay multicast systems that
    wish to offer efficient confirmed delivery of multicast messages.
 
    As pointed out in the building blocks rationale draft [WVKHFL00],
    there are two different reliability tasks that can be provided by a
    reliable multicast transport: ensuring goodput and confirming
    delivery of application level messages.  The NACK Protocol
    Instantiation and ALC Protocol Instantiation are each primarily
    concerned with ensuring goodput.  The TRACK BB and TRACK PI relies
    on a repair tree to provide goodput as well as confirmed delivery.
    If Forward Error Correction, Generic Router Assist or other
    mechanisms are used to help provide goodput, they are assumed to
    work transparently at a layer below this BB, as if the IP multicast
    service has lower error rate.
 
    The TRACK BB also assumes that there is an automatic tree
    configuration algorithm which provides the list of parents each
    node should join to.  If Receivers are used that may also serve as
    Repair Heads, the TRACK BB assumes the Auto Tree BB is also
    responsible for selecting the role of each Receiver as either
    Receiver or Repair Head.
 
    The TRACK BB also assumes that a separate session advertisement
    protocol notifies the receivers as to when to join a session, the
    data multicast address for the session, and the control parameters
    for the session.
 
    The TRACK BB provides additional information and aggregation
    capabilities, which are useful for congestion control.
 
    The TRACK BB provides the following detailed functionality.
 
    @ Hierarchical Session Creation and Maintenance.  This set of
       functionality is responsible for creating and maintaining (but
       not configuring) the hierarchical tree of Repair Heads and
       Receivers.
         o Bind.  When a child knows the parent it wishes to join to
            for a given data session, it binds to that parent.
         o Child Initiated Unbind.  When a child wishes to leave a
            data session, either because the session is over or because
            the application is finished with the session, it initiates
            an unbind operation with its parent.
         o Parent Initiated Unbind.  A parent can also force a child
            to unbind.  This happens if the parent needs to leave the
            session, if the child is not behaving correctly, or if the
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   2
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
            parent wants to move the child to another parent as part of
            tree configuration maintenance.
         o Fault Detection.  In order to verify liveness, parents and
            children send regular heartbeat messages between
            themselves.  The sender also sends regular null data
            messages to the group, if it has no data to send.
         o Fault Recovery.  When a child detects that its parent is no
            longer reachable, it may switch to another parent.  When a
            parent detects that one of its children is no longer
            reachable, it removes that child from its membership list
            and reports this up the tree to the Sender of the Data
            Session.
 
    @ TRACK Generation.  This set of functionality is responsible for
       periodically generating TRACK messages from all receivers to
       acknowledge receipt of data, report missing messages, advance
       flow control windows, provide roundtrip time measurements and
       provide other group management information.  The algorithms
       include:
         o TRACK Timing.  In order to avoid ACK implosion, the
            Receivers and Repair Heads use the rotating TRACK
            algorithm.
         o Flow Control and Buffer Management.  Receivers and Repair
            Heads maintain a set of buffers that are at least as large
            as the Sender's transmission window.  The Receivers pass
            their reception status up to the sender as part of their
            TRACK messages.  This is used to acknowledge receipt of
            delivery, to advance the buffer windows at each node, and
            to limit the sender's window advancement to the speed of
            the slowest receiver.
         o Application Level Confirmed Delivery.  Confirmed Delivery
            provides transport level confirmation of delivery.  Senders
            can put a "synch point" request in data messages, asking
            for application level confirmation.  Data messages with
            this flag set are only confirmed by the Receivers after the
            Receiver applications confirm receipt.
 
    @ Local Recovery. This functionality describes how repair heads
       maintain state on their children and provide repairs in response
       to requests for retransmission contained in TRACK messages.
       This has overlap with the NACK BB, which is unified in the TRACK
       PI.
 
    @ TRACK Aggregation. In order to provide the highest levels of
       scalability and reliability, interior tree nodes provide
       aggregation of control traffic flowing up the tree. The
       aggregated feedback information includes that used for end-to-
       end confirmed delivery, flow control, congestion control, and
       group membership monitoring and management.
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   3
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    @ Distributed RTT Calculations.  One of the primary challenges of
       congestion control is efficient RTT calculations.  TRACK
       provides two methods to perform these calculations.
         o Sender Per-Message RTT Calculations.  Each message is
            stamped with a timestamp from the sender.  As each is
            passed up the tree, the amount of dally time spent waiting
            at each node is accumulated.  The highest measurements are
            passed up the tree, and the dally time is subtracted from
            the original measurement.
         o Local Per-Level RTT Calculations.  Each parent measures the
            local RTT to each of its children as part of the keep-alive
            messages used for failure detection.
 
 
 2. Design Rationale
 
    Much of the design rationale behind the protocol instantiations and
    building blocks being standardized by the RMT working group are
    laid out in [WVKHFL00].  In addition, the design rationale for the
    TRACK PI is laid out in [WCP00].  This building block conforms with
    the design rationales laid out in both of those documents.
 
    TRACK is designed to provide confirmed delivery, receiver-based
    flow control, distributed management of group membership (some of
    them may be dedicated servers in a repair tree), as well as
    providing aggregation of information up the tree.  It also provides
    requests for retransmissions as part of TRACK messages, and local
    recovery of lost packets.
 
    This TRACK BB is primarily designed to work as part of the TRACK
    BB, in conjunction with other BB's including NACK, FEC, and Auto
    Tree.  In the spirit of modular reuse specified in [WVKHFL00], it
    is also designed to be useful as an additional layer of
    functionality on top of any of the following services.  1) The
    functionality (if not the exact message headers) of the NORM PI.
    2) The functionality (if not the exact message headers) of the ALC
    PI.  3) Running directly on top of an unreliable IP multicast
    routing protocol, but on a carefully provisioned network.  4) On
    top of an overlay multicast (aka application layer multicast)
    system.
 
    Overlay multicast is a system where servers in the network provide
    multicast (and unicast) routing as well as reliable multicast
    delivery, all on top of a combination of unicast (i.e. TCP) and, as
    available, reliable multicast services.
 
    There is a fundamental tradeoff between reliability and real-time
    performance in the face of failures.  There are two primary types
    of single layer reliability that have been proposed to deal with
    this:  sender reliable and receiver reliable delivery.  Sender
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   4
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    reliable delivery is similar to TCP, where the sender knows the
    identity of the receivers in a data session, and is notified when
    any of them fails to receive all the data messages.  Receiver
    reliable delivery limits knowledge of group membership and failures
    to only the actual receivers.  Senders do not have any knowledge of
    the membership of a group, and do not require receivers to
    explicitly join or leave a data session.  Receiver reliable
    protocols scale better in the face of networks that have frequent
    failures, and have very high isolation of failures between
    receivers.  This TRACK BB provides sender reliable delivery,
    potentially on top of a receiver reliable system.
 
    This BB is specified according to the guidelines in [KV00].  In
    addition, it specifies all communication between entities in terms
    of messages, rather than packets.  A message is an abstract
    communication unit, which may be part of, or all of, a given
    packet.  It does not have a specific format, although it does
    contain a list of fields, some of which may be optional, and some
    of which may have fixed lengths associated with them.  It is up to
    each protocol instantiation to combine the set of messages in this
    BB, with those in other components, and create the actual set of
    message formats that will be used.
 
    As mentioned in the introduction, this BB assumes the existence of
    a separate Auto Tree Configuration BB.  It also assumes that data
    sessions are advertised to all receivers as part of an external BB
    or other component.  It expects to also interact with other BB's
    through the TRACK PI, but does not require this.
 
 3. Applicability Statement
 
    It is widely recognized that no single reliable multicast protocol
    can meet the needs of all application types over all network types.
    Distinguishing factors include functionality and performance.
 
    From a functionality perspective, TRACK and NACK based reliable
    multicast protocols present an inherently different reliability
    model.  TRACK based protocols are able to remove messages from the
    retransmission window when all the children have acknowledged them.
    NACK based protocols have to rely on other means for determining
    how long to keep messages in the retransmission window.  A popular
    method is a time based scheme[SFCGLTLLBEJMRSV00].  TRACK based
    protocols can keep track of the membership of the data session, and
    provide confirmed delivery against that membership list.  NACK
    protocols have anonymous membership.
 
    Since reliability is obtained through control traffic, the
    difference in the semantics of the term reliability lead to the
    second distinguishing factor: performance.  When a persistent
    failure occurs among the members of a TRACK based protocol, there
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   5
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    is a possibility that this may slow down other members of the
    group.  NACK protocols have higher isolation of failures, as well
    as smaller amounts of control traffic under many scenarios.
 
 3.1 Application types
 
    The objectives of TRACK are to provide high level reliability, high
    scalability, congestion control and flow control for one to many
    bulk data dissemination.  TRACK is not designed for many to many
    applications Examples of applications that fit into the one-to-many
    data dissemination model are: real time financial news and market
    data distribution, electronic software distribution, audio video
    streaming, distance learning, software updates and server
    replication.  But, not all of these application types have the same
    reliability requirements.
 
    Historically, financial applications have had the most stringent
    reliability requirements, while audio video streaming have had the
    least stringent.  For applications that want to have strong
    confirmation of delivery guarantees, TRACK may be more applicable
    than alternatives such as NORM or ALC.  For applications that do
    not require this level of reliability, or that demand the lowest
    levels of latency and the highest levels of failure isolation,
    TRACK may be less applicable.
 
    The TRACK BB, in particular, is designed to optionally work on top
    of a NORM or ALC PI, to allow applications to select this tradeoff
    on a dynamic basis.
 
 3.2 Network Infrastructure
 
    The TRACKs also serve to provide feedback information to the
    sender.  The sender uses this information for congestion and flow
    control.  This allows TRACK to be applicable in most networks (i.e.
    managed and shared networks, and high congestion networks.)
 
    Asymmetric networks with very low upbound bandwidth and a very low
    loss data channel may be better served through NACK based
    protocols, particularly if high reliability is not required.  A
    good example is some satellite networks.
 
    Networks that have very high loss rates, and regularly experience
    partial network partitions, router flapping, or other persistent
    faults, may be better served through NACK only protocols.
 
 
 4. Message Types
 
    The following table summarizes the messages and their fields used
    by the TRACK BB.
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   6
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
 
 
 
 Message          From    To      Mcast?  Fields
 
 
 
 BindRequest      Child   Parent  no      Session, Scope, Level,
                                          Role, SubTreeCount
 
 BindConfirm      Parent  Child   no      Session, Level, RepairAddr,
                                          SeqNum, MemberId
 
 BindReject       Parent  Child   no      Session, Reason
 
 UnbindRequest    Child   Parent  no      Session, Reason
 
 UnbindConfirm    Parent  Child   no      Session
 
 EjectRequest     Parent  Child   either  Session, Reason
 
 EjectConfirm     Child   Parent  no      Session
 
 Heartbeat        Parent  Child   either  Session, Level,
                                           ChildrenList, SeqNum,
                                            SendTime
 
 NullData         Sender  all     yes     Session, SeqNum, Rate,
                                          AppSynch
 
 Data             Sender  all     yes     Session, SeqNum, Rate,
                                           AppSynch, HighestReleased,
                                            SendTimeStamp
 
 Retransmission   Parent  Child   yes     Session, SeqNum, Rate,
                                          AppSynch, HighestReleased,
                                            SendTimeStamp
 
 Track            Child   Parent  no      Session, SeqNum, BitMask,
                                           Scope, HighestAllowed,
                                            AppSynch, SubTreeCount,
                                            Slowest, RTT, ParentThere,
                                            ParentTimeStamp,
                                            LocalDallyTime,
                                            SenderTimeStamp,
                                            SenderDallyTime
 
 
 
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   7
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
 
 
    The various fields of the messages are described as follows:
 
    - Session: an id that identifies the session.
 
    - Level: an integer that indicates the level in the repair tree.
    This value is used to keep loops in the tree from forming, in
    addition to indicating the distance from the sender.
 
    - SeqNum: an integer indicating the sequence number of a data
    message within a given data session.
 
    - AppSynch: a sequence number signaling a request for confirmed
    delivery by the application.
 
    - HighestAllowed: a sequence number, used for flow control from the
    receivers.  It signals the highest
    sequence number the sender is allowed to send that will not overrun
    the receivers' buffer pools.
 
    - BitMask: an array of 1's and 0's.  Together with a sequence
    number it is used to indicate lost data messages.  If the i'th
    element is a 1, it indicates the message SeqNum+i is lost.
 
    - Scope: an integer to indicate how far a repair message travels.
    This is optional.
 
    - Role: This indicates if the bind requestor is a receiver or
    repair head.
 
    - SubTreeCount: This is an integer indicating the current number of
    receivers below the node.
 
    - ChildrenList: This field contains the identifiers for a list of
    children.  As part of the keepalive message, this field together
    with the SeqNum field is used to urge those listed receivers to
    send a TRACK (for the provided SeqNum).  The repair head sending
    this must have been missing the regular TRACKs from these children
    for an extended period of time.
 
    - Rate: This field is used by the sender to tell the receivers its
    sending rate, in packets per second.  It is part of the data or
    nulldata messages.
 
    - HighestReleased:  This field contains a sequence number,
    corresponding to the trailing edge of the sender's retransmission
    window.  It is used (as part of the data, nulldata or
    retransmission headers) to inform the receivers that they should no
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   8
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    longer attempt to recover those messages with a smaller (or same)
    sequence number.
 
    - HighestPacketSent:  This field is part of nulldata messages.  It
    contains the highest sequence numbered data message sent so far as
    part of this data session.
 
    - Slowest: This field contains a field that characterizes the
    slowest receiver in the subtree beneath (and including) the node
    sending the TRACK.  This is used to provide information for the
    congestion control BB, and the aggregation methods on this
    information are defined by that BB.
 
    - ParentThere: This field indicates to the parent that the receiver
    sending the TRACK has not been receiving the regular keepalive
    messages from its parent, and is wondering if it needs to find a
    new parent.
    - Reason: a code indicating the reason for the BindReject,
    UnbindRequest, or EjectRequest message.
 
    - RepairAddr: This field in the BindConfirm message is used to tell
    the receiver which multicast address the repair head will be
    sending retransmissions on.  If this field is null, then the
    receiver should expect retransmissions to be sent on the sender's
    data multicast address.  This SeqNum field in the same BindConfirm
    message indicates the sequence number starting from which the
    repair head promises to provide repair service.
 
    - SenderTimestamp: This field is included in Data messages to
    signal the need to do a roundtrip time measurement from the sender,
    through the tree, and back to the sender.  It is the time (measured
    by the sender's local clock) when it sent the packet.
 
    - SenderDallyTime: This field is included in TRACK messages.  It is
    associated with a SenderTimestamp field.  It contains the sum of
    the waiting time that should be subtracted from the RTT measurement
    at the sender.
 
    - ParentTimestamp: This field is included in Heartbeat messages to
    signal the need to do a local RTT measurement from a parent.  It is
    the time when the parent sent the packet.
 
    - ParentDallyTime: This is the same as the SenderDallyTime, but is
    associated with a ParentTimestamp instead of a SenderTimestamp.
 
    - MemberId: This is an integer the repair head assigns to a
    particular child.  The child receiver uses this value to implement
    the rotating TRACK Generation algorithm.
 
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                   9
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
 5. Global Configuration Variables and Error Codes
 
 6. External API
 This section describes external interfaces for the building block.
 
 6.1 Interfaces to the BB from the PI
 
 6.1.1 Start(boolean RepairHead, boolean RejoinAllowed,
       Advertisement)
 
 Start instructs the BB to initiate operation. RepairHead indicates
 whether or not the node may also operate as a repair head.
 RejoinAllowed indicates whether or not the node is allowed to rejoin
 the session if the only repair heads available are missing some repair
 data.  In particular, this parameter also controls whether or not the
 node is allowed to join the session after the first data messages have
 become unrecoverable (late join).  The Advertisement parameter passes
 to the BB all of the parameters from the session advertisement.
 
 6.1.2 End
 
 End instructs the BB to end its operation.  If the node is the Sender,
 it indicates to the group that the session has ended.  If the node is
 a repair head, it continues to operate as a repair head until all of
 its children have finished.
 
 6.1.3 SendMessage(Message)
 
 SendMessage presents the BB with a data message to send to the group.
 
 6.1.4 MessageSynched(Message)
 
 MessageSynched tells the BB that the indicated message has been
 synched with the application.
 
 6.2 Interfaces from the BB to the PI
 
 6.2.1 MessageReceived(Message, boolean SynchMessage)
 
 MessageReceived passes a data message up to the PI. SynchMessage
 indicates whether or not the PI should call MessageSynched once the
 message has been consumed by the application.
 
 6.2.2 SenderLost
 
 SenderLost tells the PI that contact with the sender has been lost.
 
 6.2.3 UnrecoverableData
 
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  10
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
 UnrecoverableData indicates to the PI that the BB was unable to
 recover some session data.
 
 6.2.4 SessionDone
 
 SessionDone indicates to the PI that the sender has completed sending
 the data, and the node has left the session.
 
 7. Algorithms
 
 7.1 Tree Based Session Creation and Maintenance
 
    7.1.1 Overview of Tree Configuration
 
    Before a Data Session starts delivering data, the tree for the Data
    Session needs to be created.  This process binds each Receiver to
    either a Repair Head or the Sender, and binds the participating
    Repair Heads into a loop-free tree structure with the Sender as the
    root of the tree.  This process requires tree configuration
    knowledge, which can be provided with some combination of manual
    and/or automatic configuration.  The algorithms for automatic tree
    configuration are part of the Automatic Tree Configuration BB.
    They return to each node the address of the parent it should bind
    to, as well as zero or more backup parents to use if the primary
    parent fails.
 
    In addition to receiving the tree configuration information, the
    receivers all receive a Session Advertisement message from the
    senders, informing them of the Data Multicast Address and other
    session configuration information.  This advertisement may
    advertise other relevant session information such as whether or not
    Repair Heads should be used, whether manual or automatic tree
    configuration should be used, the time at which the session will
    start, and other protocol constants.  This advertisement is created
    as part of either the PI or as part of an external service.  In
    this way, the Sender enforces a set of uniform Session
    Configuration Parameters on all members of the session.
 
    As described in the automatic tree configuration BB, the general
    algorithm for a given node in tree creation is as follows.
    1) Get advertisement that a session is starting
    2) Get list of neighbor candidates, contact them
    3) Select best neighbor as parent in a loop free manner
    4) Bind to parent
    5) Optionally, later rebind to another parent
 
    When a child finishes step 4, it is up to automatic tree
    configuration to, if necessary, continue building the tree in order
    to connect the node back to the Sender.  After the session is
    created, children can unbind from their parents and bind again to
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  11
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    new parents.  This happens when faults occur, or as part of a tree
    optimization process.  Steps 1 through 3 are external to the TRACK
    BB.  Step 4 is performed as part of session creation.  Step 5 is
    performed as part of session maintenance in conjunction with
    automatic tree building, as either a child initiated unbind or a
    parent initiated unbind, combined with another bind operation.
 
    Once steps 1 through 3 are completed, Receivers join the Data
    Multicast Address, and attempt to bind to either the Sender or a
    local Repair Head.  A Receiver will attempt to bind to the first
    node in the tree configuration list returned by step 3, and if this
    fails, it will move to the next one.  A Receiver only binds to a
    single Repair Head or Sender, at a time, for each Data Session.
 
    The automatic tree building BB ensures that the tree is formed
    without loops.  As part of this, when a Repair Head has a Receiver
    attempt to bind to it for a given Data Session, it may not at first
    be able to accept the connection, until it is able to join the tree
    itself.  Because of this, a Receiver will sometimes have to
    repeatedly attempt to bind to a given parent before succeeding.
 
    Once the Sender initiates tree building, it is also free to start
    sending Data messages on the Data Multicast Address.  Repair Heads
    and Receivers may start receiving these messages, but may not
    request retransmission or deliver data to the application until
    they receive confirmation that they have successfully bound to the
    group.
 
 7.1.2 Bind
 
 7.1.2.1 Input Parameters
 
    In order to join a data session and bind to the tree, the following
    nodes need the following parameters.
 
    A Repair Head requires the following parameters.
 
    - Session:  the unique identifier for the data session to join,
    received from the Session Advertisement algorithm in the PI.
 
    - ParentAddress:  the address and port of the parent node to which
    the node should connect
 
    - UDPListenPort:  the number of the port on which the node will
    listen for its children's control messages
 
    - RepairAddr:  the multicast address, UDP port, and TTL on which
    this node sends control messages to its children.
 
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  12
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    A Sender requires the above parameters, except for the
    ParentAddress.  A Receiver requires the above parameters, except
    for the UDPListenPort and RepairAddr.
 
 7.1.2.2 Bind Algorithm
 
    A Bind operation happens when a child wishes to join a parent in
    the distribution tree for a given data session.  The Receivers
    initiate the first Bind protocols to their parents, which then
    cause recursive binding by each parent, up to the Sender.  Each
    Receiver sends a separate BindRequest message for each of the
    streams that it would like to join.  At the discretion of the PI,
    multiple BindRequest messages may be bundled together in a single
    message.
 
    A node sends a BindRequest message to its automatically selected or
    manually configured parent node.  The parent node sends either a
    BindConfirm message or a BindReject message.  Reception of a
    BindConfirm message terminates the algorithm successfully, while
    receipt of a BindReject message causes the node to either retry the
    same parent or restart the Bind algorithm with its next parent
    candidate (depending on the BindReject reason code), or if it has
    none, to declare a REJECTED_BY_PARENT error.
 
    Reliability is achieved through the use of a standard request-
    response protocol.  At the beginning of the algorithm, the child
    initializes TimeMaxBindResponse to the constant
    TIMEOUT_PARENT_RESPONSE and initializes NumBindResponseFailures to
    0.  Every time it sends a BindRequest message, it waits
    TimeMaxBindResponse for a response from the parent node.  If no
    response is received, the node doubles its value for
    TimeMaxBindResponse, but limits TimeMaxBindResponse to be no larger
    than MAX_TIMEOUT_PARENT_RESPONSE.  It also
    increments NumBindResponseFailures, and retransmits the BindRequest
    message.  If NumBindResponseFailures reaches
    NUM_MAX_PARENT_ATTEMPTS, it reports a PARENT_UNREACHABLE error.
 
    When a parent receives a BindRequest message, it first consults the
    automatic tree building BB for approval, for instance to ensure
    that accepting the BindRequest will not cause a loop in the tree.
    Then the parent checks to be sure that it does not have more than
    NUM_MAX_CHILDREN children already bound to it for this session.  If
    it can accept the child, it sends back a BindConfirm message.
    Otherwise, it sends the node a BindReject message. Then the parent
    checks to see if it is already a member of this data session.  If
    it is not yet a member of this session, it attempts to join the
    tree itself.
 
    The BindConfirm message contains the lowest sequence number that
    the repair head has available.  If this number is 0 or 1, then the
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  13
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    repair head has all of the data available from the start of the
    session.  Otherwise, the requesting node is attempting a late join,
    and can only use this repair head if late join was allowed by the
    PI.  If late join is not allowed, the node may try another repair
    head, or give up.
 
    Similarly, if a failure recovery occurs, when a node tries to bind
    to a new repair head, it must follow the same rules as for a late
    join.
 
    7.1.3 Child Initiated Unbind
 
    A child may decide to leave a data session for the following
    reasons.  1) It detects that the data session is finished.  2) The
    application requests to leave the data session.  3) It is not able
    to keep up with the data rate of the data session.  When any of
    these conditions occurs, it initiates an Unbind process.
 
    A Child Initiated Unbind is, like the Bind function, a simple
    request-reply protocol.  Unlike the Bind function, it only has a
    single response, UnbindAccept.   With this exception, the Unbind
    operation uses the same state variables and reliability algorithms
    as the Bind function.
 
    When a child receives an UnbindAccept message from its parent, it
    reports a LEFT_DATA_SESSION_GRACEFULLY event.  If it does not
    receive this message after NUM_MAX_PARENT_ATTEMPTS, then it reports
    a LEFT_DATA_SESSION_ABNORMALLY event.
 
 7.1.4 Parent Initiated Unbind
 
    A parent may decide to remove one or more of its children from a
    data stream for the following reasons.  1) The parent needs to
    leave the group due to application reasons.  2) The repair head
    detects an unrecoverable failure with either its parent or the
    sender.  3) The parent detects that the child is not able to keep
    up with the speed of the data stream.  4) The parent is not able to
    handle the load of its children and needs some of them to move to
    another parent.  In the first two cases, the parent needs to
    multicast the advertisement of the termination of one or more data
    sessions to all of its children.  In the second two cases, it needs
    to send one or more unicast notifications to one or more of its
    children.
 
    Consequently, a Parent Initiated Unbind can be done either with a
    repeated multicast advertisement message to all children, or a set
    of unicast request-reply messages to the subset of children that it
    needs to go to.
 
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  14
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    For the multicast version of Parent Initiated Unbind, the parent
    sends a multicast UnbindNotification message to all of its children
    for a given Data Session, on its Local Multicast Channel.  It is
    only necessary to provide statistical reliability on this message,
    since children will detect the parent's failure even if the message
    is not received.  Therefore, the UnbindNotification message is sent
    NUM_REDUNDANT_MULTICAST_ADVERTISEMENT times, which has a
    recommended value of 3.
 
    For the unicast version of Parent Initiated Unbind, the parent
    sends a unicast UnbindNotification message to all of its children.
    Each of them respond with an EjectConfirm.  Reliability is ensured
    through the same request-reply mechanism as the Bind operation.
 
 7.1.5 Fault Detection
 
    There are three cases where fault detection is needed.  1)
    Detection (by a child) that a parent has failed.  2) Detection (by
    a parent) that a child has failed.  3) Detection (by either a RH or
    Receiver) that a Sender has failed.
 
    In order to be scaleable and efficient, fault detection is
    primarily accomplished by periodic keep-alive messages, combined
    with the existing TRACK messages.  Nodes expect to see keep-alive
    messages every set period of time.  If more than a fixed number of
    periods go by, and no keep-alive messages of a given type are
    received, the node declares a preliminary failure.  The detecting
    node may then ping the potentially failed node before declaring it
    failed, or it can just declare it failed.
 
    Failures are detected through three keep-alive messages:
    Heartbeat, TRACK, and NullData.  The Heartbeat message is multicast
    periodically from a parent to its children on its local control
    channel.  NullData messages are multicast by a sender on the global
    multicast address when it has no data to send.  TRACK messages are
    generated periodically, even if no data is being sent to a data
    session, as described in section 7.2.
 
    Heartbeat messages are multicast every HeartbeatPeriod seconds,
    from a parent to its children.  Every time that a parent sends a
    Retransmission message or a Heartbeat message (as well as at
    initialization time), it resets a timer for HeartbeatPeriod
    seconds.  If the timer goes off, a Heartbeat is sent.  The
    HeatbeatPeriod is dynamically computed as follows:
 
            interval = AckWindow / PacketRate
 
            HeartbeatPeriod = 2 * interval
 
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  15
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    If no Retransmission messages are sent within HeartbeatPeriod, a
    Heartbeat message is generated.  Global configuration parameters
    ConstantHeartbeatPeriod and MinimumHeartbeatPeriod can be used to
    either set HeartbeatPeriod to a constant, or give HeartbeatPeriod a
    lower bound, globally.
 
    Similarly, a NullData message is multicast by the sender to all
    data session members, every NULL_DATA_PERIOD.  The NullData timer
    is set to NULL_DATA_PERIOD, and is reset every time that a Data or
    NullData message is sent by the Sender.
 
    The key parameter for failure detection is the global tree
    parameter FAILURE_DETECTION_REDUNDANCY.  The higher the value for
    this parameter, the more keep-alive messages that must be missed
    before a failure is declared.
 
    A major goal of failure detection is for children to detect parent
    failures fast enough that there is a high probability they can
    rejoin the stream at another parent, before flow control has
    advanced the buffer window to a point where the child can not
    recover all lost messages in the stream.  In order to attempt to do
    this, children detect a failure of a parent if
    FAILURE_DETECTION_REDUNDANCY * HeartbeatPeriod time goes by without
    any heartbeats.  As part of buffer window advancement, described in
    section 7.2.4, all parents MAY choose to buffer all messages for a
    minimum of FAILURE_DETECTION_REDUNDANCY * 2 * HeartbeatPeriod
    seconds, which gives children a period of time to find a new parent
    before the buffers are freed.
 
    A parent detects a preliminary failure of one of its children if it
    does not receive any TRACK messages from that child in
    FAILURE_DETECTION_REDUNDANCDY * TrackTimeout seconds (see
    discussion of how TrackTimeout is computed in 7.2).  Because a
    failed child can slow down the group's progress, it is very
    important that a parent resolve the child's status quickly.  Once a
    parent declares a preliminary failure of a child, it issues a set
    of up to FAILURE_DETECTION_REDUNDANCY Heartbeat messages that are
    unicast (or multicast) to the failed receiver(s).  These messages
    are spaced apart by 2*LocalRTT, where LocalRTT is the round trip
    time that has been measured to the child in question (see 7.4 for
    description of how LocalRTT is measured).  These Heartbeat messages
    contain a ChildrenList field that contains the children who are
    requested to send a TRACK immediately.
 
    Whenever a child receives a Heartbeat message with an
    ImmediateTRACK field set to 1, it immediately sends a TRACK to its
    parent.  If a parent does not receive a TRACK message from a child
    after waiting a period of 2*ChildRTT after the last Heartbeat
    message to that child, it declares the child failed, and removes it
    from the parent's child membership list.
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  16
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
 
    A child or a repair head detects the failure of a sender if it does
    not receive a Data or NullData message from a sender in
    FAILURE_DETECTION_REDUNDANCY * NULL_DATA_PERIOD.
 
    Note that the more receivers there are in a tree, and the higher
    the loss rate, the larger FAILURE_DETECTION_REDUNDANCY must be, in
    order to give the same probability that erroneous failures won't be
    declared.
 
 7.1.6 Fault Notification
 
    When a parent detects the failure of a child, it adds a failure
    notification field to the next FAILURE_DETECTION_REDUNDANCY TRACK
    messages that it sends up the tree.  It sends this notification
    multiple times because TRACKs are not delivered reliably.  A
    failure notification field includes the failure code, as well as a
    list of one or more failed nodes.  Failure notifications are
    aggregated up the tree, according to the rules in 7.3.  A failure
    notification is not a definitive report of a failure, as the child
    may have moved to a different repair head.
 
 7.1.7 Fault Recovery
 
    The Fault Recovery algorithms require a list of one or more
    addresses of alternate parents that can be bound to, and that still
    provide loop free operation.
 
    If a child detects the failure of its parent, it then re-runs the
    Bind operation to a new parent candidate, in order to rejoin the
    tree.  As described above in section 7.1.2, a node may perform a
    late bind, i.e. binding with a repair head which cannot provide all
    the necessary repair data, only if allowed by the PI.
 
 7.2 TRACK Generation
 
    This section describes the algorithms used by the receiver to
    determine when to send the TRACK messages.
 
    TRACK messages are sent from receivers to their parents.  TRACK
    messages may be sent for the following purposes:
    - to request retransmission of messages
    - to advance the sender's transmission window for flow control
    purposes
    - to deliver end-to-end confirmation of data reception
    - to propagate other relevant feedback information up through the
    session (such as RTT and loss reports, for congestion control)
 
    The TRACK PI also makes use of the NACK BB, which request
    retransmission of messages from a parent.  The TRACK request and
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  17
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    response algorithms should be highly similar to the NACK algorithms
    for this specific case.
 
 7.2.1 TRACK Generation with the Rotating TRACK Algorithm
 
    Each receiver sends a TRACK message to its parent once per
    AckWindow (a globally configured variable) of data messages
    received.  A receiver uses an offset from the boundary of each
    AckWindow to send its TRACK, in order to reduce burstiness of
    control traffic at the parents.  Each parent has a maximum number
    of children, MaxChildren.  When a child binds to the parent, the
    parent assigns a locally unique ChildID to that child, between 0
    and MaxChildren-1.
 
    Each child in a tree generates a TRACK message at least once every
    AckWindow of data messages, when the most recent data message's
    sequence number, modulo MemberID, is equal to AckWindow.  If the
    message that would have triggered a given TRACK for a given node is
    missed, the node will generate the TRACK as soon as it learns that
    it has missed the message, typically through receipt of a higher
    numbered data message.
 
    Together, AckWindow and MaxChildren determine the maximum ratio of
    control messages to data messages seen by each parent, given a
    constant load of data messages.
 
    In each data message, the sender advertises the current PacketRate
    (measured in messages per second) it is sending data at.  This rate
    is generated by the congestion control algorithms in use at the
    sender.
 
    At the time a node sends a regular TRACK, it also computes a
    TRACKTimeout value:
 
            interval = AckWindow / PacketRate
 
            TRACKTimeout = 2 * interval
 
    If no TRACKs are sent within TRACKTimeout interval, a TRACK is
    generated, and TRACKTimeout is increased by a factor of 2, up to a
    value of MaxTRACKTimeout.
 
    This timer mechanism is used by a receiver to ensure timely repair
    of lost messages and regular feedback propagation up the tree even
    when the sender is not sending data continuously. This mechanism
    complements the AckWindow-based regular TRACK generation mechanism.
 
 7.2.2 Local Repair
 
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  18
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    A repair head maintains the following state for each of its
    children, for the purpose of providing repair service to the local
    group:
 
    - HighestConsecutivelyReceived: a sequence number indicating all
       Data messages before this number (inclusive) have been received
       by a given child.
 
    - MissingPackets: a data structure to keep track of the reception
       status of the Data messages with sequence number higher than
       HighestConsecutivelyReceived.
 
    In addition, a repair head also maintains other state for purposes
    of feedback aggregation described in the next section.
 
    The minimum HighestConsecutivelyReceived value of all its children
    is kept as the variable LocalStable.
 
    A repair head also maintains a retransmission buffer. The size of
    the retransmission buffer must be greater than the maximum value of
    a sender's transmission window. The retransmission buffer must keep
    all the Data messages received by the repair head with sequence
    number higher than LocalStable, optionally some messages with
    sequence number lower than LocalStable if there is room (beyond the
    maximum value of sender's transmission window).  The latter
    messages are kept in the retransmission buffer in case a receiver
    from another group losses its parent and needs to join this group.
 
    As TRACK messages are received, the repair head updates the above
    states.
 
    To perform local repair, a repair head implements a retransmission
    queue with memory.  Each lost message (reported by a child using
    the BitMask field) is entered into the retransmission queue in
    increasing order according to its sequence number. If the same Data
    message has already been retransmitted recently (recognized due to
    the queue's memory) it is delayed by the local group RTT (see
    roundtrip time measurement) before retransmission.
 
    The retransmissions are sent using the same PacketRate is that used
    by the sender.
 
 7.2.3 Flow Control Window Update
 
    When a receiver sends a TRACK to its parent, the HighestAllowed
    field provides information on the status of the receiver's flow
    control window.  The value of HighestAllowed is computed as
    follows:
 
    HighestAllowed = seqnum + ReceiverWindow
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  19
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
 
    Where seqnum is the highest sequence number of consecutively
    received data message at the receiver.  The size of the
    ReceiverWindow may either be based on a parameter local to the
    receiver or be a global parameter.
 
 7.2.4 Reliability Window
 
    The sender and each repair head maintain a window of messages for
    possible retransmission.  As messages are acknowledged by all of
    its children, they are released from the parent's retransmission
    buffer, as described in 7.2.2. In addition, there are two global
    parameters that can affect when a parent releases a data message
    from the retransmission buffer -- MIN_HOLD_TIME, and MAX_HOLD_TIME.
 
    MIN_HOLD_TIME specifies a minimum length of time a message must be
    held for retransmission from when it was received. This parameter
    is useful to handle scenarios where one or more children have been
    disconnected from their parent, and have to reconnect to another.
    If, for example, MIN_HOLD_TIME is set to
    FAILURE_DETECTION_REDUNDANCY * 2 * ConstantHeartbeatPeriod, then
    there is a high likelihood that any child will be able to recover
    any lost messages after reconnecting to another parent.
 
    The sender continually advertises to the members of the data
    session both edges of its retransmission window.  The higher value
    is the HighestPacketSent field in each data message, which
    specifies the highest sequence number of any data message sent.
    The trailing edge of the window is advertised in the
    HighestReleased field.  This specifies the largest sequence number
    of any message sent that has subsequently been released from the
    sender's retransmission window.  If both values are the same then
    the window is presently empty.  Zero is not a legitimate value for
    a data sequence number, so if either field has a value of zero,
    then no messages have yet reached that state.  All sequence number
    fields use sequence number arithmetic so that a data session can
    continue after exhausting the sequence number space.
 
    When a member of a data session receives an advertisement of a new
    HighestReleased value, it stores this, and is no longer allowed to
    ask for retransmission for any messages up to and including the
    HighestReleased value.  If it has any outstanding missing messages
    that are less than or equal to HighestReleased, it MAY move forward
    and continue delivering the next data messages in the stream.  It
    also SHOULD report an error for the messages that are no longer
    recoverable.
 
    MAX_HOLD_TIME specifies the maximum length of time a message may be
    held for retransmission.  This parameter is set at the sender who
    uses it to set the HighestReleased field in data message headers.
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  20
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    This is particularly useful for real-time, semi-reliable streams
    such as live video, where retransmissions are only useful for up to
    a few seconds.  When combined with Unordered delivery semantics,
    and application-level jitter control at the receivers, this
    provides Time Bounded Reliability.  Obviously, MAX_HOLD_TIME must
    always be larger than MIN_HOLD_TIME.
 
 7.2.5 Confirmed Delivery
 
    Flow control and the reliability window are concerned with goodput,
    of delivering data with a high probability that it is delivered at
    all receivers.  However, neither mechanism provides explicit
    confirmation to the sender as to the list of recipients for each
    message.  Confirmed delivery allows applications to determine the
    set of applications that have received a set of data messages.
 
    To request this service, a sender fills the AppSynch field of data
    messages with the sequence number of the highest data message it
    wishes to confirm delivery of.  It continues to do so until it
    receives confirmation, moves the AppSynch point forward to a higher
    sequence number, or declares an error.
 
    When a receiver gets a data message with a non-zero AppSynch field,
    it starts including the highest sequence number that has been
    acknowledged by the application in the ApplicationConfirms field of
    each TRACK message that it sends up the tree.  In order to provide
    reliable delivery of this acknowledgement, this continues so long
    as a receiver gets data messages with non-zero AppSynch fields.
 
    Each receiver is responsible for locally deciding the value of the
    ApplicationConfirms field.  There are two primary issues a receiver
    must consider in setting this field:  the reliability semantics of
    the data stream, and when a given message is considered confirmed
    at the receiver.  As this is an application level confirmation, a
    handshake with the application is required to get this
    confirmation.
 
    One example of how an application can implicitly signal
    confirmation of delivery is through the freeing of buffers passed
    to it by the transport.  The API could specify that whenever an
    application has freed up a buffer containing one or more data
    messages, then these messages are considered acknowledged by the
    application.  Alternatively, the application could be required to
    explicitly acknowledge each message.
 
    With a given transport-application API for signaling
    acknowledgement, the transport then keeps track of all contiguous
    acknowledgements from that application, and reports these up in the
    ApplicationConfirms field.  If one or more messages can not be
    acknowledged, the receiver should pass an error report attached to
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  21
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    the ApplicationConfirms field, detailing the type of failure that
    occurred, and the sequence number of the first message that has not
    yet been delivered.
 
    If MAX_HOLD_TIME is not in use for a data stream, so that delivery
    is fully reliable, then any message that can not be delivered will
    be considered a fatal error for that receiver.  If MAX_HOLD_TIME
    has a non-zero value, then any messages that could not be
    delivered, but are less than HighestReleased as advertised by the
    sender, are not reported as errors.
 
    In addition to the AppSynch field, a sender may also set the
    ImmediateACK field to 1.  When a node gets a data message that has
    this flag set, it will immediately send a TRACK after processing
    that message.
 
 7.3 Feedback Aggregation
 
    This section describes how repair heads perform aggregation on
    feedback information sent up in the fields of the TRACK message,
    and the purposes for performing such aggregation.
 
    There are many reasons for providing feedback from all the
    receivers to the sender in an aggregated form.  The major ones are
    listed below:
 
    1) End-to-end delivery confirmation.  This confirmation tells the
    sender that all the receivers (in the entire tree) have received
    data packets up to a certain sequence number.  The field that
    carries this information is AppSynch.
 
    2) Flow control.  The aggregated information is carried in the
    field HighestAllowed.  It tells the sender the highest sequence
    number that all the receivers (in the entire tree) are prepared to
    receive.
 
    3) Identifying the slowest receiver.  The aggregated information is
    carried in the field Slowest.  The sender can used this value as
    part of congestion control.
 
    4) Counting current membership in the group.  This information is
    carried in the field SubTreeCount.  This lets the sender know the
    number of receivers currently connected to the repair tree.
 
    5) Measuring the round-trip time from the sender to the "worst"
    receiver.
 
    A repair head maintains state for each child.  Each time a TRACK
    (from a child) is received, the corresponding states for that child
    are updated based on the information in the TRACK message. When a
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  22
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    repair head sends a TRACK message to its parent, the following
    fields of its TRACK message are derived from the aggregation of the
    corresponding states for its children.  The following rules
    describe how the aggregation is performed:
 
    - AppSynch: take the minimum of the AppSynch value from all
    children
 
    - HighestAllowed: take the minimum of the HighestAllowed value from
    all children
 
    - Slowest: this is a measure of how slow the slowest member in the
    whole subtree is; take either the minimum (or maximum) of the
    Slowest value from all children (depending what the Slowest measure
    is).
 
    - SubTreeCount: take the sum of the SubTreeCount from all
    children
 
    - SenderDallyTime: take the minimum value of the SenderDallyTime
    from all children and add in the local dally time.  Note, the
    SendTimeStamp field is left alone.  The sender will derive the
    roundtrip time to the worst receiver by doing its local aggregation
    for SenderDallyTime and then compute:
 
            RTT = currentTime - SendTimeStamp - SenderDallyTime.
 
 
 7.4 Measuring Round Trip Times
 
 
 
    This TRACK BB provides two algorithms for distributed RTT
    calculations-LocalRTT measurements and SenderRTT measurements.
    LocalRTT measurements are only between a parent and its children.
    SenderRTT measurements are an end to end RTT measurement, measuring
    the RTT to the worst receiver as selected by the congestion control
    algorithms.
 
    The SenderRTT is useful for congestion control. It can be used to
    set the data rate based on the TCP response function, which is
    being proposed for the congestion control building block.
 
    The LocalRTT can be used to (a) quickly detect faulty children (as
    described in 7.1) or (b) avoid sending unnecessary retransmissions
    (as described in 7.2 in the local repair algorithm).
 
    In the case of LocalRTT measurements, a parent initiates
    measurement by including a ParentTimestamp field in a Heartbeat
    message sent to its children.  When a child receives a Heartbeat
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  23
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    message with this field set, it notes the time of receipt using its
    local system clock, and stores this with the message as
    HeartbeatReceiveTime.  When the child next generates a TRACK, just
    before sending it, it measures its system clock again as
    TRACKSendTime, and calculates the LocalDallyTime.
 
    LocalDallyTime = TRACKSendTime - HeartbeatReceiveTime.
 
    The child includes this value, along with the ParentTimestamp
    field, as fields in the next TRACK message sent.  Every heartbeat
    message that is multicast to all children SHOULD include a
    ParentTimestamp field.
 
    The SenderRTT algorithm is similar.  A sender initiates the process
    by including a SenderTimestamp field in a data message.  When a
    receiver gets a message with this field set, it keeps track of the
    DataReceiveTime for that message, and when it generates the next
    TRACK message, includes the SenderTimestamp and SenderDallyTime
    value.  These values are aggregated by Repair Heads, as described
    in section 7.3.
 
    Each node only keeps track of the most recent value for
    {SenderTimestamp, DataReceiveTime} and {ParentTimestamp,
    HeartbeatReceiveTime}, replacing any older values any time that a
    new message is received with these values set.  As long as it has
    non-zero values to report, each node sends up both a
    {SenderTimestamp, SenderDallyTime} and a {ParentTimestamp,
    LocalDallyTime} set of fields in each TRACK message generated.
 
    These measurements need to be averaged by the TRACK PI.
 
 8. Security
 
    This BB does not specifically deal with security.  It is the
    responsibility of the TRACK PI or the Security BB.  This issue is
    covered in the Security Requirements For TRACK draft [HV00].
 
 9. References
 
    [HV00] T. Hardjono, B. Whetten, "Security Requirements For TRACK,"
    Internet Draft, Internet Engineering Task Force, June, 2000.
 
    [KV00] R. Kermode, L. Vicisano, "Author Guidelines for RMT Building
    Blocks and Protocol Instantiation Documents," Internet Draft,
    Internet Engineering Task Force, June, 2000.
 
    [SFCGLTLLBEJMRSV00] T. Speakman, D. Farinacci, J. Crowcroft, J.
    Gemmell, S. Lin, A. Tweedly, D. Leshchiner, M. Luby, N. Bhaskar, R.
    Edmonstone, K. M. Johnson, T. Montgomery, L. Rizzo, R.
    Sumanasekera, and L. Vicisano, "PGM Reliable Transport Protocol
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  24
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    Specification," Internet Draft, Internet Engineering Task Force,
    April 2000.
 
    [WCP00] B. Whetten, D. Chiu, S. Paul, M. Kadansky, G. Taskale,
    "TRACK Architecture, A Scalable Real-Time Reliable Multicast
    Protocol," Internet Draft, Internet Engineering ask Force, July
    2000.
 
    [WVKHFL00]  B. Whetten, L. Vicisano, R. Kermode, M. Handley, S.
    Floyd, and M. Luby, "Reliable Multicast Transport Building Blocks
    for One-to-Many Bulk-Data Transfer," Internet Draft, Internet
    Engineering Task Force, October 2000.
 
 
 10.  Acknowledgments
    We would like to thank the follow people: Sanjoy Paul, Seok Joo
    Koh, Supratik Bhattacharyya, Joe Wesley, and Joe Provino.
 
 
 11. Authors' Addresses
 
    Dah Ming Chiu
    dahming.chiu@sun.com
    Miriam Kadansky
    miriam.kadansky@sun.com
    Sun Microsystems Laboratories
    1 Network Drive
    Burlington, MA 01803
 
    Gursel Taskale
    gursel@talarian.com
    Brian Whetten
    whetten@talarian.com
    Talarian
    333 Distel Circle
    Los Altos, CA 94022-1404
 
 
    Full Copyright Statement
 
    "Copyright (C) The Internet Society (2000). All Rights Reserved.
    This document and translations of it may be copied and furnished to
    others, and derivative works that comment on or otherwise explain
    it or assist in its implementation may be prepared, copied,
    published and distributed, in whole or in part, without restriction
    of any kind, provided that the above copyright notice and this
    paragraph are included on all such copies and derivative works.
    However, this document itself may not be modified in any way, such
    as by removing the copyright notice or references to the Internet
    Society or other Internet organizations, except as needed for the
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  25
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt       November 2000
 
 
    purpose of developing Internet standards in which case the
    procedures for copyrights defined in the Internet Standards process
    must be followed, or as required to translate it into languages
    other than English.
 
    The limited permissions granted above are perpetual and will not be
    revoked by the Internet Society or its successors or assigns.
 
    This document and the information contained herein is provided on
    an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
    ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 INTERNET DRAFT      draft-ietf-rmt-bb-track-00.txt                  26