RMT Working Group                                  Dah Ming Chiu, CUHK
 Internet Engineering Task Force                     Seok Joo Koh, ETRI
 Category: Informational              Miriam Kadansky, Sun Microsystems
 December 2003                                Brian Whetten, Consultant
 Expires June 2004                                Gursel Taskale, Tibco


               Tree Auto-Configuration (TREE) Building Block
                      for Reliable Multicast Transport

                  <draft-sjkoh-rmt-bb-tree-config-03.txt>



 Status of this Memo

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC 2026.

    Internet-Drafts are valid for a maximum of six months and may be
    updated, replaced, or obsoleted by other documents at any time.  It
    is inappropriate to use Internet-Drafts as reference material or to
    cite them other than as a "work in progress".

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt
    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.


 Abstract

    This document defines the TREE building block (BB) for tree
    configuration within a reliable multicast transport (RMT) protocol.
    As an RMT building block, the TREE BB is a coarse-grained modular
    component that may be common to multiple RMT protocols.  The TREE
    BB provides automatic configuration of a logical ACK-tree for RMT
    protocols in the tree-based ACK family.  A companion document
    [RFCxxxx] describes the TRACK (Tree-based ACK) building block,
    which assumes that the logical ACK tree has been configured by the
    TREE BB described here.











 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 1]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003








                            Table of Contents

    1. Introduction..................................................3
    2. Terminology...................................................4
    3. TREE BB Rationale.............................................5
    4. Functionality of TREE BB......................................5
       4.1 Assumptions and Requirements..............................5
       4.2 Functional Overview.......................................6
    5. BB Details: Session Announcement..............................9
    6. BB Details: Repair Head Discovery and Selection...............9
       6.1 Repair Head Discovery Algorithms..........................9
       6.2 Distance Measurement.....................................16
       6.3 Repair Head Selection....................................18
    7. BB Details: Tree Formation...................................19
    8. BB Details: Tree Maintenance.................................21
       8.1 Monitoring Parents and Children..........................21
       8.2 Optimizing the Tree......................................21
    9. External Interfaces..........................................22
       9.1 Interfaces to this BB....................................22
       9.2 Interfaces from this BB to the PI or a higher-level BB...23
    10. Applicability Statement.....................................24
    11. Messages for TREE BB........................................25
    12. Requirements from other Building Blocks.....................26
    13. Security Considerations.....................................26
    14. References..................................................27
    Acknowledgments.................................................27
    Author's Addresses..............................................28


















 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 2]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 1. Introduction

    The Reliable Multicast Transport (RMT) working group was chartered
    to standardize IP multicast transport services [RFC2887].  Rather
    than create a set of monolithic protocol specifications, the RMT WG
    has chosen to break the reliable multicast protocols into Building
    Blocks (BB) and Protocol Instantiations (PI).  A Building Block is
    a specification of the algorithms of a single component, with an
    abstract interface to other BBs and PIs.  A PI combines a set of
    BBs, adds in the additional required functionality not specified in
    any BB, and specifies the specific instantiation of the protocol.

    This document describes the Tree auto-configuration building block,
    or TREE BB.  The function of the TREE BB is to construct a session
    tree, comprised of a single sender, repair heads, and receivers,
    for use by a tree-based ACK reliable delivery protocol, for example
    the Tree-Based ACK (TRACK) BB [RFCxxxx].  The design goals of the
    TREE BB are motivated by the TRACK BB, but trees it constructs
    could be useful for other functions.

    The process of session tree construction for IP multicast is
    difficult. The best session trees match the underlying multicast
    routing tree topology; however, the multicast service model does
    not provide explicit support for discovering routing tree topology.
    There are several viable solutions for constructing a tree,
    depending on network conditions.  The optimality of a tree may
    depend on a variety of factors, such as the need for load balancing
    and the need to minimize the tree depth used for collecting
    feedback information.

    The TREE building block specifies a distributed procedure for
    automatically constructing a tree that is loop-free and as
    efficient as possible, given the information available.  It
    provides a unified solution for tree construction in the presence
    of different multicast service models and routing protocols. In
    particular, it specifies a single procedure which may be used with
    various techniques for repair head discovery and distance
    measurements, several of which are specified within this document.
    The difference in these techniques primarily affects the optimality
    of the tree.












 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 3]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 2. Terminology

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
    this document are to be interpreted as described in RFC 2119.

    In addition, the following terms are applied in this document as
    well as the TRACK BB document [RFCxxxx].

    Session

       A session is used to distribute data over a multicast address. A
       Session Tree is used to provide reliability and feedback
       services for a session.

    Session Identifier

       A fixed-size number, chosen either by the application that
       creates the session or by the transport.  Senders and receivers
       use the session Identifier to distinguish sessions.

    Repair Head (RH)

       A node within the tree which receives and retransmits data, and
       aggregates and forwards control information toward the sender.
       The sender operates as the root repair head in any session tree.
       An RH that has accepted children for a session is called a
       parent.

    Session Tree (ST)

       The session tree is a tree spanning all receivers of a multicast
       session.  It is rooted at the sender, consisting of zero of more
       repair heads as interior nodes, and zero or more receivers as
       leaf nodes.

    Parent

       A parent is an RH or receiver's predecessor in the ST on the
       path toward the sender.  Every RH or receiver on the tree except
       the sender itself has a parent. Each parent communicates with
       its children using either an assigned multicast address or
       through unicast.

    Children

       The set of receivers and RHs for which an RH or the sender is
       providing repair and feedback services.




 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 4]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 3. TREE BB Rationale

    In order to accommodate various multicast deployments, this
    document divides the tree building process into the following major
    components:

      A. Several techniques for discovering neighboring repair heads:
         static, expanding ring search, and mesh. Discovering
         neighboring repair heads is a necessary condition for getting
         connected, so each node in the session must implement at least
         one of the above techniques.

      B. Several algorithms for selecting neighboring repair heads: the
         measurement and use of neighbor distance and sender distance
         are described.  These repair head selection algorithms help
         produce a good tree.

      C. A single distributed procedure for construction and
         maintenance of loop-free session Trees.


 4. Functionality of TREE BB

 4.1 Assumptions and Requirements

    This TREE BB is designed based on the recommendations for tree
    configuration (Section 4.5 of RFC 3048, as also shown below), along
    with RFC 2357, RFC 2887 and RFC 3269.

       It has been shown that the scalability of RM protocols can be
       greatly enhanced by the insertion of some kind of retransmission
       or feedback aggregation agents between the sender and receivers.
       These agents are then used to form a tree with the sender at the
       root, the receivers at the leaves of the tree, and the
       aggregation/local repair nodes in the middle.  The internal
       nodes can either be dedicated software for this task, or they
       may be receivers that are performing dual duty.

       The effectiveness of these agents to assist in the delivery of
       data is highly dependent upon how well the logical tree they use
       to communicate matches the underlying routing topology.  The
       purpose of this building block would be to construct and manage
       the logical tree connecting the agents.  Ideally, this building
       block would perform these functions in a manner that adapts to
       changes in session membership, routing topology, and network
       availability.

    This TREE BB document describes how to build trees under the
    following conditions:



 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 5]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003


       a. The multicast group has only a single sender.
       b. A single RH can serve multiple sessions.
       c. Sessions can take advantage of a pre-deployed infrastructure
          of RHs (ones that are not necessarily aware of a session
          before the receivers), or recruit receivers to be RHs.
       d. Generic Router Assist and Expanding Ring Search are not
          required of the network infrastructure, but if available they
          should be able to be utilized.

    For tree configuration, the following are specifically required:

       a. While the tree building process may take advantage of
          information from the routing layer, the mechanisms described
          are independent of the routing protocol(s) used by the
          underlying multicast tree.
       b. All trees constructed must be loop-free
       c. These mechanisms must support late joiners and tree
          optimization


 4.2 Functional Overview

    Session trees are spanning trees rooted at the sender. Session
    trees can be used for forwarding control information (i.e. ACKs)
    towards the root, or for forwarding data (i.e. repairs) towards the
    leaf nodes.

    Figure 1 illustrates overall procedures for tree configuration.























 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 6]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003


                   +--------------------+
                   |    1. Session      |
                   |   Advertisement    |
                   +--------------------+
                             |
                             | Node receives tree-building parameters
                             V
                   +--------------------+
                   |  2. Measurements   |<-------------------------|
                   |   to the sender    |                          |
                   |     (optional)     |                          |
                   +--------------------+                          |
                             |                                     |
                             |                                     |
                             V                                     |
                   +--------------------+                          |
                   |    3. Neighbor     |                          |
                   |      Discovery     |                          |
                   +--------------------+                          |
                             |                                     |
                             |                                     |
                             V                                     |
                   +--------------------+                          |
                   |  4. Repair Head    |                          |
                   |     Selection      |                          |
                   +--------------------+                          |
                             |                                     |
                             | Node picks best neighbor            |
                             V                                     |
                   +--------------------+                          |
                   |   5. Binding to    |---------------------------
                   |    Repair Head     |   6. Optimization (optional)
                   |                    |      and fault recovery
                   +--------------------+


           Figure 1. Generic procedures for tree configuration


    Session trees are constructed per sender; each node wishing to join
    the tree discovers its neighboring RHs and then selects its best
    parent based on locally available information, such as the relative
    sender distances and neighbor distances.  This document specifies
    several techniques for measuring distances.

    It is important to note that RHs may be actual receivers (e.g.
    receivers willing and able to also function as RHs) or pre-deployed
    "specialized" servers that are signaled to join the tree by



 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 7]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

    receivers.  We use the term repair head to refer to either a
    receiver or server that is participating as part of the logical
    tree formation.

    Tree construction, regardless of RH discovery and selection
    algorithm, proceeds generically as follows.

 4.2.1 Session Announcement

    Receivers of a session use standard out-of-band mechanisms for
    discovery of a session's existence (e.g. session advertisement in
    RFC 2974).  In this way, a receiver discovers the multicast group
    address, the sender's address, and other information necessary for
    logical tree construction.

    Sessions may be announced in two parts, the first part containing
    generic information about the session, such as the multicast
    address, and the second part, announced on the multicast address,
    containing additional information.

 4.2.2 Measurements to the sender (optional)

    All RHs and receivers that know about the session optionally
    determine their distance to the sender.

 4.2.3 Neighbor Discovery

    Meanwhile, each receiver discovers nearby RHs (candidate parents)
    for the session using the neighbor discovery algorithm(s).

 4.2.4 Repair Head Selection

    Once a receiver (or RH needing to join the tree) discovers a nearby
    RH, it obtains the RH's distance to the sender as well as the RH's
    distance to the receiver and other suitability values, if available.
    After discovery is complete, the best RH is selected.

 4.2.5 Binding to Repair Head

    The receiver or RH then binds to the chosen RH.  If a bind is
    unsuccessful, the receiver or RH retries with another nearby RH, or
    starts the discovery process all over again.   Once an RH receives
    a bind from a child, that RH then joins the tree if it has not
    already, discovering an RH of its own, possibly using a different
    method than leaf receivers.







 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 8]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 4.2.6 Optimization (optional) and Fault Recovery

    During a session, a receiver or RH may change to a different RH for
    a number of reasons described below, including fault tolerance.
    The session Tree is maintained and optimized over time.


 5. BB Details: Session Announcement

    The first step in the tree-building process is for a node to be
    informed of the session's existence.  This can be done using some
    out-of-band method, such as Session Advertisement [RFC2974], URL,
    e-mail, etc.

    RHs do not necessarily receive these advertisements.  If an RH is
    not a receiver, it obtains the advertisement information once it is
    contacted by a receiver.

    The advertisement includes the multicast address being used, the
    sender's address, the session Identifier, any specific port numbers
    to be used, and any global information useful for tree construction.

    The advertisement may also contain information about one or more
    repair head Discovery options (such as Static, ERS, and Mesh) that
    can possibly be used by receivers in the session.


 6. BB Details: Repair Head Discovery and Selection

    Discovery is the process by which a node determines a suitable
    repair head.  During the discovery process, suitable neighbors are
    found, sender distances are optionally exchanged, and the best RH
    is selected.

 6.1 Repair Head Discovery Algorithms

    This draft describes three algorithms for discovering RHs: Static,
    Expanding Ring Search (ERS), and Mesh.

    Multiple algorithms may be used within a single session.  For
    example, RHs may use the Mesh algorithm, while the receivers use
    static configuration to discover the RHs; alternatively, some
    receivers may use static configuration while other receivers depend
    on ERS (in an intranet where ERS is available). Each receiver may
    pre-configure which algorithm to use before it starts.

    The transport protocols request the following information from this
    BB using the getRHs interface.




 Chiu, Koh, Kadansky, Whetten, Taskale                         [Page 9]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

    Repair Heads:

       1. parentAddress:  the address and port of the parent node to
                          which the node should connect
       2. UDPListenPort:  the UDP port number on which the node will
                          listen for its children's control messages
       3. RepairAddr:     the multicast address, UDP port, and TTL on
                          which this node sends control messages to its
                          children.

    Receivers:

       1. parentAddress.

    Senders:

       1. UDPListenPort

       2. RepairAddr

    After the above information is obtained from tree-configuration,
    the transport protocol may perform the necessary Bind operations
    for participating in the session tree.

    Tree configuration may rely on a functional entity, named Tree
    Configurator (TC), which is pre-configured by sender for tree
    configuration. TC may be simply implemented by a program and thus
    installed at sender or any other host. It is not necessarily
    specialized infrastructure.


 6.1.1 Static

    In the static scheme, TC is used to govern the tree building based
    on its own (session-specific) tree configuration and RH(s)
    selection rules for the new joiners.

    If a TC is used for tree building, its address and port are
    included with the session advertisement. Receivers and RHs will
    realize there is a TC for the session via session Announcement, and
    they can contact with the TC to get a list of candidate RHs by
    sending a unicast Query message.

    In response to a Query message, TC replies with an advertisement
    message that contains a list of candidate RHs available to the new
    joiner. The rule of determining such candidate RHs may depend on
    the pre-configured mechanism taken by TC. For example, TC may
    determine the candidate RH list for a node among the possible RHs
    (it contains at that time) by considering which RHs are in the same



 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 10]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

    network domain with the node (i.e., via comparing their network
    prefix), or by considering the load balancing for the tree topology
    it has configured until then. For this purpose, TC maintains a pool
    of active RHs for the session. The list of candidate RHs carried by
    Advertise message is ordered in decreasing levels of preference, in
    which a lower number represents a higher preference.

    When a receiver node receives the responding advertisement message
    from TC, the node MAY proceed to try to bind to a candidate RH
    following the given order by sending a BindRequestmessage, and then
    waits for the responding message such as BindConfirm or BindReject
    from TC. These binding steps will be done according to the TRACK
    mechanism, as described in the TRACK BB.

    In the static algorithm, RH discovery with a TC proceeds as
    follows:

        1. The node joins the multicast session and learns of the
           location of TC (from the session announcement).

        2. The node optionally discovers its distance from the sender.
           Any metric described in Section 4.2 may be used.

        3. The node sends a Query message to the TC for a parent. The
           request includes (optionally) the node's distance to the
           sender and whether the node functions as an RH or not.

        4. The TC chooses one or more candidate parents (RHs) for the
           node from the active RHs by its own tree configuration rule.
           The selection of candidate parents may be done by comparing
           the network prefix or by referring to any other information
           such as the number of currently attached children, etc.

        5. The TC responds to the Query message with an Advertise
           message, which include the candidate parent list. In the
           list, each entry contains the corresponding IP address and
           port of an RH.

           All the entries in the list need to be arranged in the
          decreasing order of preference levels.

          In the rejection case, the Advertise message does not include
          any candidate parent. In this case, the node may resort to
          the other mechanism such as ERS and Mesh.

          In the success case, the node will be enrolled as an active
          RH by the TC, if it functions as an RH in the session. Each
          active RH (functioning as a parent for the session Tree)
          sends the TC the periodic Query messages with a flag



 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 11]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

          indicating that it is active over a specific time interval.
          Based on the Query messages, the TC updates a pool of active
          RHs in the session. In response to the Query message, the TC
          sends an Advertise with a flag simply indicating that the
          Query message is received.

        6. After receiving a successful Advertise message from the TC,
           the node will try to connect to its parent by sending
           BindRequest messages based on the candidate parent list.

    Depending on implementation, one or more TCs may be locally
    deployed, in which each receiver can locally access some configured
    list of the prospective RHs.


 6.1.2 ERS

    ERS is used only in the network environments where IP multicast
    transmissions are allowed by receivers and RHs.

    In ERS algorithm, the RH discovery with a TC proceeds as follows:

        1. The nodes first look for Neighbors using a multicast Query
           message. The initial TTL value in the Query message,
           TTLNeighborInit, is specified in the session announcement
           and may be as large as the session TTL (TTLSession).

           An RH that is able to handle additional children responds to
           a Query message by multicasting an Advertise message.

           If the RH is not yet a parent, the TTL used in this response
           is the same TTL used in the Query message. If the RH is a
           parent, the TTL used is the greater of the Query TTL and the
           parent's current Advertise TTL.

        2. The node listens for Advertise messages after sending the
           Query message. If one or more Advertise messages are
           received during a SolicitPeriod, the best RH among them is
           selected.

        3. If no Advertise messages are received, the node sends
           another multicast Query message with a TTL that is
           incremented by TTLIncrement.  The process of sending the
           multicast Query message with an increasing TTL value
           continues until a response is received.

           The TTL value is limited by a value, TTLMax, also specified
           in the session announcement.  TTLMax defaults to the value
           of TTLSession.



 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 12]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003


           If the TTL value required to reach the soliciting node is
           greater than the TTL used to reach the RH, an Advertise
           message may not reach the node. However, if future Query
           messages have increased TTL values, the TTL may eventually
           be large enough for the Advertise message to reach the node.
           However, it is possible that the node will not locate any
           RHs using Expanding Ring Search. It is advisable that a
           backup method, such as static, be available.

        4. RHs need to suppress sending Advertise messages in response
           to Query messages if one was sent with at least the Query's
           TTL within the last SolicitPeriod.

           After multicasting a Query message, a node waits for an
           interval, BetweenQuery, before sending another Query message.
           Nodes will suppress sending Query messages for BetweenQuery
           seconds when they first start in order to collect
           information from Advertise messages already solicited by
           other nodes.

        5. Getting a successful Advertise message via ERS, the node
           will try to connect to the parent by sending BindRequest
           messages.

    The following variables are used in the Expanding Ring Search
    algorithm.

    - TTLNeighborInit: This is the initial TTL value to be used by the
      ERS if no other TTL value is specified by the algorithm.

    - TTLIncrement: This is the periodic increment for the TTL used in
      ERS.

    - TTLMax: This is a configured maximum TTL value to be used by
      either Query or Advertise messages.

    - TTLSession: This is the session TTL value for the multicast
      session.

    - SolicitPeriod: Each receiver MUST not send more than one QUERY
      message per SolicitPeriod.  When a RH responds to QUERY messages,
      it also suppresses its ADVERTISE message if one has been sent
      less than SolicitPeriod ago.  This parameter is used to control
      the amount of control traffic during tree construction if ERS is
      used.

    - BetweenQuery: This is the time interval a node waits before
      sending successive Query messages.



 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 13]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003


    - MaxBindAttempts: This variable is an integer used to control how
      many times a receiver tries to bind to a RH before giving up and
      try another one.

    - BindPeriod: In order to prevent loops, sometimes a RH may reject
      a BindRequest (for example, when the RH is not on the tree yet
      and has a BindRequest outstanding itself) from a receiver.  In
      this case, if the receiver needs to retry binding to the same RH
      again (perhaps because the receiver does not discover any
      alternative RHs), then it waits for BindPeriod seconds.

 6.1.3 Mesh

    The mesh approach relies on a set of RHs already deployed as
    infrastructure servers.  These RHs are on-line, but are not
    necessarily aware of any particular session unless informed by the
    following mechanisms.

    RHs in the mesh are configured to know who their neighbor RHs are,
    and exchange reachability information with their neighbors in a way
    analogous to routers in a network.  The actually protocol used by
    RHs to exchange such reachability information is outside the scope
    of this BB.

    Instead, this BB specifies the following properties that the mesh
    of RHs MUST satisfy:

       a) Each RH knows a subset of RHs in the mesh as its immediate
          neighbors.

       b) Each RH has a "forwarding table", such that given any other
          (destination) RH in the mesh, the forwarding table gives a
          "next-hop" RH that can be used to reach the destination RH,
          plus the distance to the destination RH from the local RH.

       c) A given RH in the mesh can "broadcast" information to all
          other RHs in the mesh (in the sense of having a means of
          sending the same information to all other RHs, but not
          necessarily simultaneously).

       d) All potential sender and receivers of a multicast session
          can discover a "neighboring" RH in the mesh, using the
          neighbor discovery mechanisms described in Section 4.1.

    The reason for running a routing-like algorithm to maintain the
    forwarding tables in each RH is to provide fault tolerance when
    some RHs in the mesh fail.  When that happens, the remaining RHs
    exchange information with each other to update the forwarding



 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 14]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

    tables. In the steady state, the mesh of RHs still needs to satisfy
    the above four properties.

    In the simplest form, each RH in the mesh has a forwarding table
    that contains all other RHs in the mesh.  This is called a fully-
    connected mesh.

    As mentioned earlier, the mesh scheme assumes that there is a pre-
    deployed infrastructure of RH servers. That is, the RH-RH bindings
    and sender-RH (Sender's RHs) will be performed internally by the
    provider's own policy. The only thing done by sender is to inform
    its RH's that a session starts. The relationships between sender
    and its RH's (i.e., sender-RH bindings) MUST also be pre-configured.

    For example, the internal bindings between sender and RHs MAY be
    done as follows:

       1. The sender locates a neighbor RH in the mesh by a pre-
         configured mechanism. This RH is referred to as the sender's
         RH.

      2. The sender sends the multicast session id, address and port
         (all these can be set as a abbreviated session announcement
         message) to the sender's RH.

      3. The sender's RH in turn "broadcasts" the session information
         to all RHs in the mesh; since RHs can support multiple
         sessions simultaneously, they keep the information about each
         session in an entry in a session table.

      4. After the sender-RH bindings, sender multicasts its session
         announcement to the multicast receivers. Then, the sender's RH
         binds to the sender by sending a BindRequest message.

    During the internal processing of sender-RH binding described up to
    now, no Query and Advertise messages are used.

    When a session starts, the bindings between RH and receivers will
    be done. The tree binding of RH-receiver is done with the Tree
    Configurator (TC), which was used in the Static algorithm. The main
    difference between Static algorithm with TC and Mesh algorithm with
    TC is that the active RHs are considered as candidate RHs in the
    Static scheme, while the pre-deployed RHs are considered as
    candidates in the Mesh scheme. That is, in the Mesh scheme, the TC
    is required to already get the information on the locations of the
    pre-deployed RHs.

    Then the tree buildings is done as follows:




 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 15]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

       1. When a session starts, a receiver sends a Query message to
         the TC. The receiver may optionally discover its distance from
         the sender. Any metric described in Section 4.2 may be used.

       2. The TC chooses one or more candidate parents RHs for the
         receiver from the pre-deployed RHs by its own tree
         configuration rule, as described in Section 4.1.1.

       3. TC responds to the Query message with an Advertise message,
         which includes the candidate parent list. In the list, each
         entry contains the corresponding IP address and port of the
         parent.

       4. Receiving a successful Advertise message from the TC, the
         node will try to connect to its parent by sending BindRequest
         messages based on the candidate parent list.

    Once a receiver is connected to a parent RH, the RH-RH bindings
    will also be done internally by the pre-configured provider's
    policy. For example, each RH in the mesh tries to bind to its
    "next-hop" RH. If the "next-hop" RH is not reachable for some
    reason, an RH may also try to bind to any neighbor RH as a back-up
    alternative. These procedures will be devised to ensure that the
    loop freedom is guaranteed.


 6.2 Distance Measurement

    Different techniques can be used to determine distances between
    nodes in a session.  The distances of interest are:

       Sender distance - distance from a RH to sender

       Neighbor distance - distance from a receiver to a neighboring RH

    These distances can be used in selecting a repair head if several
    are discovered.

    These techniques quantify distance differently.  Each specific way
    of quantifying distance is called a metric.  Different Metrics are
    not necessarily comparable.  For example, if distance between A and
    B is X using metric m1, and distance between A and C is Y using
    metric m2, then X > Y does not necessarily imply B is farther from
    A than C.

    Only distances of the same metric will be compared and ranked.
    Therefore, a receiver will only rank two RHs based on their
    respective sender distance if those distances are based on the same
    metric.



 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 16]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003


    On the other hand, it is not necessary for all receivers to use the
    same metric to select their neighboring RH to connect to.  Suppose
    the receivers use neighbor distance as a selection criterion.  One
    receiver may determine neighbor distances to RHs based on hop count,
    whereas another receiver may determine neighbor distances to its
    neighboring RHs based on delay.

 6.2.1 TTL Hop-Count

    If this metric is used, a node periodically sends a Beacon message
    on the session's multicast address.  The Beacon message includes
    the original time-to-live value set by the node.  The distance to
    the node is then calculated as

         Beacon's original TTL  -  Beacon's current TTL

    One node is closer than another if its distance is a lower number.
    Note that the TTL value may not be available in some implementation
    environments.

 6.2.2 Number of Levels

    One metric a receiver can use for RH selection is the number of
    levels the RH is from the sender.   For example, given two RHs in
    close proximity to a receiver, if one RH is n levels from the
    sender and the other is m levels, where m<n, the receiver selects
    the RH with m levels.  This is because a shallower tree allows
    faster propagation of feedback information to the sender.  (Note,
    we assumed the choice is between two RHs equally close to the
    receiver.  The receiver to RH distance is another consideration).

    The number of levels metric is not generally available, as the tree
    may be constructed bottom up.  If the mesh approach is used,
    however, the distance in the RH's forwarding tables can be
    implemented as an estimate of the number of levels from the sender.

 6.2.3 Delay

    Another metric is the delay from an RH to the sender.  If the RH is
    directly connected to the sender, then the delay would simply be
    the time to send feedback from the RH to the sender.  If the RH is
    several levels down from the sender, then the delay would be the
    sum of the delays for each level (with some jitter time added in
    each level).  For example, given two RHs in close proximity to a
    receiver, the receiver selects the RH with a smaller delay to the
    sender.  This is again for the purpose of minimizing the feedback
    time.




 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 17]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

    If the tree is built bottom up, this metric cannot be used.  If the
    mesh approach is used, this metric can be implemented, although it
    requires the RHs in the mesh to exchange distance information based
    on the delay metric.

 6.2.4 Address

    For IPv6 addresses, the distance can be approximately determined by
    the number of aggregation levels one address has in common with
    another.  For this metric, one node is closer than another if its
    address has more aggregation levels in common with the querying
    node's address.

 6.2.5 Static

    The node's distance to other nodes may be made available in some
    well-known location.  One node's is closer than another if its
    distance is a lower number.

 6.2.6 GRA

    The node's distance to the sender may be determined with help of a
    Generic Router Assist (GRA) message that lists the set of GRA
    routers on the path from the source.


 6.3 Repair Head Selection

    Once Neighbors have been discovered, a node selects the best one
    using whatever distance information is available.

    If there is no sender distance information to compare, the best RH
    is simply the one that is closest to the node, without a loop being
    formed by binding the node to the RH.

    If sender distances are available, there are two cases:

       Leaf nodes: For leaf nodes, the goal is to use the closest RH
                   possible.

       RHs: For RHs joining the tree, it is important to pick an RH
            that is closer to the sender; neighbor distance is a
            secondary factor.

    Once an RH has been selected, the node tries to bind to it. Loop
    prevention is done during the bind process using only Tree Level
    information.





 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 18]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

    This algorithm is recommended because it assigns each node the
    closest RH, and does not require all nodes to measure their sender
    distance at the start of the session.  Depending on the selected
    metric, multiple nodes measuring sender distance could cause
    message implosion, and delay tree construction.  On the other hand,
    the RHs selected may actually be further from the sender than their
    children are.  However, it may be necessary to assign nodes to non-
    optimal RHs in order to get them on the tree, since it is possible
    that no RH closer to the sender can accept any more children.

    Alternatively, nodes may be required to measure sender distance
    before selecting an RH in order to ensure that each parent is
    closer to the sender than its children.  Presumably, this results
    in a tree in which parents detect message loss before their
    children, minimizing repair requests.


 7. BB Details: Tree Formation

    The following is a detailed description of the tree formation
    process. All tree construction follows this pattern. The ONLY
    differences between instantiations of this building block lie in
    how nodes discover and select neighbors.

    An architectural constant, tree level, is used to represent the
    tree height and its value is limited to 127. That is, a node with
    tree level of 128 means that it has not been connected to the tree
    yet.

    Once an RH has been selected, the node sends a BindRequest message
    to the RH. If the RH has an outstanding request to bind to another
    RH, it must refuse the incoming bind request in case it would form
    a loop.

    Otherwise, it MAY accept the node as a child as long as selecting
    it would not cause a loop in the tree.  Loop freedom is guaranteed
    by these two rules:

    1.  If the requesting node does not have children, the RH can
        accept it as a child as long as the RH has no outstanding
        bind requests.  If it does have an outstanding bind request,
        the RH can accept the node as a child if its address is less
        than the child's address.

    2.  If the requesting node has children, the RH can accept it as
        a child if

       a.  RH's level is 128, i.e., it is the top of a sub-tree not
           yet connected to the sender, or



 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 19]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003


       b.  RH's level is less than 128, i.e., it is connected to the
           sender.

    The second rule prevents a node from selecting one of its own
    children as its parent.  Two nodes at level 128 are prevented from
    selecting each other using tie-breaking criteria described step 1
    above.

    If the RH accepts the node as a child, it returns a BindConfirm
    message. If it does not accept the node, it sends a BindReject
    message. If the node does not receive a response after
    MaxBindAttempts tries every BindPeriod seconds, it MAY select the
    next best Neighbor from its cached list, or else run the repair
    head Discovery process again to determine an alternate RH to try.

    The BindReject message contains a reason code.  If the code
    indicates that the node was rejected because the RH was not yet on
    the tree, the node MAY choose to retry that RH after BindPeriod
    seconds, or select a different available RH.

    The BindReject message may also include a list of alternative RHs
    for the node to try.

    The BindConfirm message includes the parent's current Tree Level.
    The node sets its Tree Level to one more than the parent's level.

    The BindConfirm message also indicates the starting sequence number
    of the message from which data reliability is assured.  This
    information is included in the BindConfirm message to enable
    receivers to meet the PI's late join requirements.  If nodes join
    the tree after the sender has started to send data, it is possible
    that some of the data is no longer available within the tree. nodes
    may need to have specific information about repair availability
    before selecting a parent.

    Repair Heads MAY limit the number of children they support
    depending on their capacity.  Once an RH has accepted its maximum
    number of children, it stops accepting new children until a change
    in membership causes its count of children to go below this limit.

    If an RH limits the number of children it supports, it reserves at
    least one child slot for other RHs. This guarantees the growth of
    the repair tree.








 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 20]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 8. BB Details: Tree Maintenance

    Tree maintenance is an ongoing process active in every node.
    Because the tree is based on the operation of RHs, as well as the
    various underlying metrics that may change over time, it is
    important that these dependencies be monitored for changes.  Nodes
    monitor the parents for liveness and changes in tree level, and
    continue to run the Neighbor Discovery and Selection process in
    order to optimize their choice of RH. Parents also monitor children
    for liveness.

 8.1 Monitoring Parents and Children

    The upper Building Block or Protocol Instantiation is responsible
    for monitoring parents and children.  Monitoring messages from
    parents to children MUST contain the parent's current Tree Level.
    Children MUST set their Tree Level to one more than their parent's
    level.

    If a child loses contact with its parent for a period of time, it
    reports it using the lostRH interface, and attempt to bind with an
    alternate RH. A child that is leaving a session sends a unicast
    UnbindRequest message to its parent.  The parent responds with an
    UnbindConfirm message.

    A parent that is leaving the session sends an EjectRequest message
    to its children indicating that they need to bind with an alternate
    RH.  If possible the EjectRequest message is multicasted, but the
    EjectRequest message can also be sent via unicast to each child
    individually.  Upon receiving an EjectRequest message from its
    parent, a receiver sets its Tree Level to 128 again.  Using the
    heartbeat mechanism, the Tree Level for all receivers in the
    affected subtree will be updated (to a value higher than 128).

    If a parent does not hear from a child for a period of time, or it
    receives a UnbindRequest message from a child, it removes that
    child from its list of children, and reports the loss using the
    removeChild interface.

 8.2 Optimizing the Tree

    Implementations of this building block should continue to run the
    Neighbor Discovery and Selection process in order to optimize the
    choice of RH.  This continuous process also keeps the distance
    information for the current parent up-to-date.  Whenever the
    process returns a better RH than the current one, the node MAY bind
    to the new RH.  Once the new RH is bound to, the node sends a
    UnbindRequest message to the original parent.  A parent with no
    children MAY leave the session.



 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 21]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 9. External Interfaces

    This section describes external interfaces for the building block.


 9.1 Interfaces to this BB.

    These may be used by a PI or by a higher-level BB.

  (1) start(boolean RH, advertisement)

    start notifies the BB to begin operation.  If the RH parameter is
    set to TRUE, the BB also starts RH operation.

  (2) end()

    end notifies the BB to end operation.

  (3) incomingMessage(message)

    This interface is used to pass an incoming message down from the PI.

  (4) getStatistics

    getStatistics returns current BB statistics to the upper BB or PI.

  (5) getRHs

    getRHs instructs the BB to start the process of finding RH
    candidates for this node.  getRHs may return immediately with a
    list of candidate RH, or may use the RHlist interface to return the
    list at a later time.

  (6) setRH(RH)

    setRH informs the BB that the PI or upper BB has successfully bound
    to an RH.

  (7) acceptChild(node)

    acceptChild asks the BB to accept or reject the node as a member.
    The BB returns a boolean value in response.

  (8) removeChild(node)

    removeChild is called to inform the BB that the child is no longer
    a member.





 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 22]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

  (9) treeLevelUpdate(newLevel)

    This interface is used to pass down any update to the node's tree
    level that the upper BB or the PI has learned.  newLevel replaces
    the BB's current tree level.

  (10) lostRH

    lostRH notifies the BB that the connection to the RH was lost.

  (11) setOptimization(boolean)

    setOptimization tells the BB to start or stop the tree optimization
    process. The upper BB or PI may want to control when tree
    optimization takes place.

  (12) recordRHs(destination)

    recordRHs tells the BB to record the current RH, plus the list of
    alternates, to the indicated destination.


 9.2 Interfaces from this BB to the PI or a higher-level BB.

  (1) outgoingMessage(message)

    outgoingMessage instructs the PI to send the message.

  (2) RHlist(list)

    RHlist returns to the upper PI or BB a list of RH candidates.  The
    list may contain additional information for each candidate, such as
    details about the data packets that it has available for repair.



















 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 23]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 10. Applicability Statement

    The authors recognize that automatic tree construction is a very
    difficult problem.  Nonetheless, complete reliance on manual
    configuration is very user-unfriendly and error prone as well.

    This building block describes a procedure for constructing loop-
    free trees when there is minimal manual configured information
    available.

    This is analogous to providing a system with default configurations
    that allow the system to work correctly, but not necessarily
    optimally.

    There are many possible criteria for tree optimality.  This BB does
    not attempt to define a single optimality criterion, nor does it
    try to produce an optimal tree.  It is, however, the goal of the BB
    to construct better trees as more configuration and measurement
    data are introduced to the procedure.

    This BB describes only a subset of the possible parameters for
    constructing optimal trees, in particular sender distance and
    neighbor distance.  There are many techniques for measuring these
    distances.  Some of the techniques may not be applicable globally.

    Expanding ring search (ERS) is an effective technique in a local
    subnet or intranet (especially when the IP multicast routing
    protocol is dense-mode based).  On the other hand, it is not
    practical in a multi-domain network; it is not effective when the
    routing protocol is sparse-mode based; and it can add significant
    control traffic overhead.

    Generic Router Assist (GRA) can provide measurement hooks to
    determine RHs that are located along the path for multicast data
    distribution. However, such facilities may not be available in all
    networks.

    The tree construction procedure does allow manual configuration and
    various distance measurement techniques to be selectively and
    independently applied for different subgroups of receivers and RHs,
    to achieve incremental improvement to the quality of the tree.

    There are many other criteria for tree-building than what is
    described in this document, for instance, methods based on load
    balancing and minimizing feedback latency.







 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 24]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 11. Messages for TREE BB

    These messages are required for implementations of this BB.  The
    list below indicates which message contents are required by
    implementations.  Implementations may also include other protocol-
    specific information in these messages.  Note that these messages
    are parts of packets specified in PI's that use this BB.


            Table 1. Messages for Tree Configuration

 +----------------+-----------------------------+---------------------+
 | Message Name   |       Description           |                     |
 | m- or u-cast   |                             |     Contents        |
 +----------------+-----------------------------+---------------------+
 |   Query        | A message used to discover  | sender distance     |
 |    both        | neighbors. The unicast      | (optional),         |
 |                | is sent to TC; the multicast| TTL (m-cast only)   |
 |                | is used in ERS.             |                     |
 +----------------+-----------------------------+---------------------+
 |   Advertise    | A message used to advertise | IP address, port,   |
 |    both        | an RH. The unicast is sent  | sender distance     |
 |                | from the TC; the multicast  | (optional)          |
 |                | is sent by RHs themselves   |                     |
 +----------------+-----------------------------+---------------------+
 |  BindRequest   | Request to RH to join tree  | Current Tree Level  |
 |   unicast      |                             |                     |
 +----------------+-----------------------------+---------------------+
 |  BindConfirm   | RH accepts BindRequest      | Current Tree Level  |
 |   unicast      |                             |                     |
 +----------------+-----------------------------+---------------------+
 |  BindReject    | RH rejects BindRequest      | Reject reason,      |
 |   unicast      |                             | alternate RH list   |
 +----------------+-----------------------------+---------------------+
 | UnbindRequest  | child leaving parent(u-cast)| Unbind reason       |
 |    unicast     | parent leaving tree (m-cast)|                     |
 +----------------+-----------------------------+---------------------+
 | UnbindConfirm  |  Acknowledgement of         |                     |
 |   unicast      |  UnbindRequest message      |                     |
 +----------------+-----------------------------+---------------------+
 |  EjectRequest  | parent refusing or leaving  | Eject reason        |
 |    both        | service to children         |                     |
 +----------------+-----------------------------+---------------------+
 |  EjectConfirm  |  Acknowledgement of         |                     |
 |   unicast      |  EjectRequest message       |                     |
 +----------------+-----------------------------+---------------------+






 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 25]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 12. Requirements from other Building Blocks

    This TREE BB can be interfaced to any other BB or PI wishing to use
    a tree structure.  To actually use this BB's features, the PI needs
    to include the messages described in this BB in its packets. An
    example of how to use this Tree Configuration BB can be found in
    the TRACK BB [RFCxxxx].


 13. Security Considerations

    Basically, this document is for informational and security issues
    are not applied. The following considerations are given just for
    information:

       a. The primary security requirement for a TRACK protocol is
          protection of the transport infrastructure.  This is
          accomplished through the use of lightweight group
          authentication of the control and, optionally, the data
          messages sent to the group.  These algorithms use IPsec and
          shared symmetric keys.

       b. For TRACK, it is recommended that there be one shared key for
          the Data session and one for each Local Control Channel.
          These keys are distributed through a separate key manager
          component, which may be either centralized or distributed.
          Each member of the group is responsible for contacting the
          key manager, establishing a pair-wise security association
          with the key manager, and obtaining the appropriate keys.

       c. The exact algorithms for this BB are presently the subject of
          research within the IRTF Secure Multicast Group (SMuG) and/or
          standardization within the IETF Multicast Security (MSEC)
          working group.


















 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 26]


                 RMT BB: Tree Auto-Configuration (TREE)   December 2003

 14. References

 Normative:

 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels," BCP 14, RFC 2119, March 1997

 [RFC3048] Whetten, B., Vicisano, L., Kermode, R., Handley, M., Floyd,
           S. and M. Luby, "Reliable Multicast Transport Building
           Blocks for One-to-Many Bulk-Data Transfer," RFC 3048,
           January 2001.

 [RFCxxxx] Whetten, B., Chiu, D., Kadansky, M., Koh, S. and G. Taskale,
           "Tree-based ACK (TRACK) Building Block for Reliable
           Multicast Transport", RFC xxxx, 2004.

 Informative:

 [RFC3269] Kermode, R., Vicisano, L., "Author Guidelines for Reliable
           Multicast Transport (RMT) Building Blocks and Protocol
           Instantiation documents," RFC 3269, April 2002.

 [RFC2887] Handley, M., Whetten, B., Kermode, R., Floyd, S., Vicisano,
           L., and Luby, M., "The Reliable Multicast Design Space for
           Bulk Data Transfer," RFC 2887, August 2000.

 [RFC2357] Mankin, A., Romanow, A., Bradner, S. and V. Paxson, "IETF
           Criteria for Evaluating Reliable Multicast Transport and
           Application Protocols," RFC 2357, June 1998.

 [RFC2974] Handley, M., Perkins, C., Whelan, E., "Session Announcement
           Protocol," RFC 2974, October 2000.


 Acknowledgments

   The authors would like to give special thanks to Brian Neil Levine,
   Dave Tahler, Joe Wesley and Juyoung Park for their valuable comments.














 Chiu, Koh, Kadansky, Whetten, Taskale                        [Page 27]


               RMT BB: Tree Auto-Configuration (TREE)   December 2003


 Author's Addresses

      Dah Ming Chiu
      dmchiu@ie.cuhk.edu.hk
      Information Engineering Department,
      The Chinese University of Hong Kong Shatin, N.T. Hong Kong

      Seok Joo Koh
      sjkoh@etri.re.kr
      Protocol Engineering Center,
      ETRI, 161 Kajung-Dong Yusong-Gu, TAEJON, 305-350, KOREA

      Miriam Kadansky
      miriam.kadansky@sun.com
      Sun Microsystems Laboratories
      1 Network Drive Burlington, MA 01803

      Brian Whetten
      brian@whetten.net
      2430 20th Street #D, Santa Monica, CA  90405

      Gursel Taskale
      gursel@tibco.com
      TIBCO
      3303 Hillview Ave. Palo Alto, CA. 94304-1213


























hiu, Koh, Kadansky, Whetten, Taskale                        [Page 28]


               RMT BB: Tree Auto-Configuration (TREE)   December 2003

 Full Copyright Statement

  Copyright (C) The Internet Society (2003).  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain it
  or assist in its implementation may be prepared, copied, published
  and distributed, in whole or in part, without restriction of any
  kind, provided that the above copyright notice and this paragraph are
  included on all such copies and derivative works. However, this
  document itself may not be modified in any way, such as by removing
  the copyright notice or references to the Internet Society or other
  Internet organizations, except as needed for the purpose of
  developing Internet standards in which case the procedures for
  copyrights defined in the Internet languages other than English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on an
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."



























hiu, Koh, Kadansky, Whetten, Taskale                        [Page 29]