RMT Working Group                     Miriam Kadansky/Sun Microsystems
Internet Engineering Task Force                Brian Neil Levine/UMass
INTERNET-DRAFT                          Dah Ming Chiu/Sun Microsystems
draft-ietf-rmt-bb-tree-config-01.txt            Brian Whetten/Talarian
22 November, 2000                              Gursel Taskale/Talarian
Expires 22 May 2001                             Brad Cain/Mirror Image
                                                 Dave Thaler/Microsoft
                                               Seok Joo Koh/ETRI/Korea
  Reliable Multicast Transport Building Block: Tree Auto-Configuration

                 <draft-ietf-rmt-bb-tree-config-01.txt>


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are valid for a maximum of six months and may be
   updated, replaced, or obsoleted by other documents at any time.  It
   is inappropriate to use Internet-Drafts as reference material or to
   cite them other than as a "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The Reliable Multicast Transport Working Group has been chartered to
   standardize multicast transport services. This working group expects
   to initially standardize three protocol instantiations. This draft is
   concerned with the requirements of the tree-based ACK protocol. In
   particular, it is concerned with defining the building block for
   auto-configuration of the logical ACK-tree. According to the charter,
   a building block is "a coarse-grained modular component that is
   common to multiple protocols along with abstract APIs that define a
   building block's access methods and their arguments."  For more
   information, see the Reliable Multicast Transport Building Blocks and
   Reliable Multicast Design Space documents [WLKHFL00][HWKFV00].





















draft-ietf-rmt-bb-tree-config-01                                [Page 1]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


Table of Contents

         1. Introduction
            1.1 Terminology
            1.2 Assumptions
            1.3 Requirements
            1.4 Applicability Statement
         2. Overview
         3. Distance Metrics
            3.1 Multicast Beacon
            3.2 Mesh
            3.3 Address
            3.4 Static
            3.5 GRA
         4. Session Announcement
         5. Service Node Discovery
            5.1 Service Node Discovery Algorithms
                5.1.1 Static
                5.1.2 ERS
                5.1.3 POC
                5.1.4 Mesh
            5.2 Service Node Selection
         6. Tree Formation
         7.  Tree Maintenance
            7.1 Monitoring Parents and Children
            7.2 Optimizing the Tree
         8. Messages
         9. External Interfaces
            9.1 Interfaces to the BB
                9.1.1 findSNs
                9.1.2 messageFromSN
                9.1.3 lostSN
                9.1.4 activateSN
                9.1.5 setOptimization
                9.1.6 setSN
                9.1.7 acceptMember
            9.2 Interfaces from the BB
                9.2.1 SNlist
                9.2.2 SNactive
         10. References
         Authors' Addresses






















draft-ietf-rmt-bb-tree-config-01                                [Page 2]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


1. Introduction

   The Reliable Multicast Transport (RMT) working group has been
   chartered to standardize IP multicast transport services. This draft
   is concerned with the requirements of the tree-based ACK
   protocol[WCP00]. In particular, this draft defines a building block
   for auto-configuration of a logical ACK-tree comprised of a single
   Sender, Service Nodes, and Receivers into a tree (called a Session
   Tree in this document). The design goals of this draft are motivated
   by the needs of TRACK-based protocols, however the trees it
   constructs are useful for other services.

   The process of session tree construction is difficult for IP
   multicast. The best session trees match the underlying (i.e.
   forwarding tree) multicast routing tree topology[LPG98], however the
   multicast service model[DEE89] does not provide explicit support for
   discovering routing tree topology. Furthermore, deployed multicast
   architectures can vary; for example, hosts may be restricted to
   multicast reception and not transmission with Source-Specific
   routing[HC00]; and routers may provide special extended routing
   services with Generic Router Assist [CST00].  The RMT charter does
   not restrict the use of any particular network service in
   constructing the tree, it only suggests preferred scenarios.
   Accordingly, there are several viable solutions for constructing a
   tree.

   The optimality of a tree may also depend on other factors, such as
   the need for load balancing, and the need to minimize the depth when
   used for collecting feedback information. The goal of this building
   block is to specify a distributed procedure for automatically
   constructing a tree that is loop-free, but only as good as possible
   depending on the information available.

   This draft describes a unified solution for tree construction in the
   presence of different multicast service models and routing protocols.
   In particular, it specifies a single procedure which may be used with
   various techniques for service node discovery and distance
   measurements, several of which are specified within this document.
   The difference in these techniques primarily affects the optimality
   of the tree.  The unified algorithm ensures that different
   implementations can interoperate and construct a loop-free tree.

   In order to accommodate various multicast deployments, this document
   divides the tree building process into the following major
   components:
      1. Several techniques for discovering neighboring service nodes.
         In particular: Static, Expanding Ring Search, Point-of-Contact, and
         Mesh.  Discovering neighboring service nodes is a necessary
         condition for getting connected, so each node in the session
         must implement at least one of the above techniques.

      2. Several algorithms for selecting neighboring service nodes.
         In particular, the measurement and use of distance-to-neighbor
         and distance-to-sender are described.  These service node
         selection algorithms help produce a good tree.

      3. A single distributed procedure for construction and maintenance
         of loop-free Session Trees.

   This draft is required to conform to the guidelines specified in



draft-ietf-rmt-bb-tree-config-01                                [Page 3]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   draft-ietf-rmt-author-guidelines-00, Author Guidelines for RMT
   Building Blocks and Protocol Instantiation Documents [KV00].  A
   future version of this draft will more explicitly describe this
   conformance.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

Session

   A session is used to distribute data over a multicast address.  A
   tree is used to provide reliability service for a session.

Sender

   The single sender of data on a multicast session. The Sender is the
   root of the Session Tree.

Receiver

   A receiver receives data from the sender via the multicast session.

Session Identifier

   A fixed-size number, chosen either by the application that creates
   the session or by the transport.  Senders and Receivers use the
   Session Identifier to distinguish sessions.  The size of this number
   is specified by the PI.

Service Node (SN)

   A node within the tree which receives and retransmits data, and
   aggregates and forwards control information toward the Sender.  The
   Sender operates as the root Service Node in any session tree.
   Service Nodes MAY be dedicated servers within a network designed to
   participate in multiple Sessions and support multiple trees, or they
   MAY be Receivers participating in an individual session. SNs MAY
   limit the number of Children they choose to service, and MAY also
   make other restrictions on the characteristics of each Child
   (distance, location, etc.).  An SN that has accepted Children for a
   session is called a Parent.

Session Tree (ST)

   The Session Tree is a spanning tree per Session, rooted at the Sender
   and consisting of a number of Service Nodes.  An ST is constructed
   for the forwarding of control information back to the Sender as well
   as for the resending of missed data to the Receivers. The ST for a
   particular session may change over the course of the session.

Parent

   A Parent is an SN or Receiver's predecessor in the ST on the path
   toward the Sender.  Every SN or Receiver on the tree except the
   Sender itself has a parent.  Each Parent communicates with its
   children using either an assigned multicast address or through
   unicast.  If a multicast address is used, this may be the same



draft-ietf-rmt-bb-tree-config-01                                [Page 4]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   address used by the session, or one specifically assigned to the
   Parent.

Children

   The set of Receivers and SNs for which an SN is providing repair and
   feedback services.

Tree Level

   A number indicating the number of "generations" away from the root
   The sender is at TL=0.  Those that use the sender as service node are
   at TL=1 and so on.  When a receiver is not connected to the tree yet,
   it has a tree level value greater or equal to 128.  The reason for
   reserving part of the space (of tree levels) for indicating "off-
   tree" is so that special measures can be used to prevent forming
   loops.  The largest value is 256, so the range of off-tree levels are
   in the range 128 - 256.  Initially, all receivers have a TL value of
   128.  Once a Node joins the tree, its Tree Level is updated to be one
   more than its Parent's level.

Distance Metric

   There are several techniques to quantify distances between nodes
   (Receivers, SNs, and the Sender) in a session.  Each type of
   quantification is called a distance metric.  Several Distance Metrics
   are described in this draft.

Sender Distance

   The distance from a node to the Sender.

Neighbor Distance

   The distance from a node to a Neighbor.

Neighbor

   A node's (Receiver or SN) Neighbors are SNs that are close to the
   node, according to the Distance Metric(s) used by the node.


1.2 Assumptions

   This document describes how to build trees under the following
   conditions:

     a. The multicast group has only a single sender.
     b. A single SN can serve multiple sessions.
     c. Sessions can take advantage of a pre-deployed infrastructure of
        SNs (ones that are not necessarily aware of a session before
        the receivers), or recruit Receivers to be SNs.
     d. Generic Router Assist[CST00] and Expanding Ring Search[YGS95]
        are not required of the network infrastructure, but if available
        they should be able to be utilized.

1.3 Requirements

   The following are specifically required:




draft-ietf-rmt-bb-tree-config-01                                [Page 5]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


     a. While tree-building may take advantage of information from the
        routing layer, the mechanisms described are independent of the
        routing protocol(s) used by the underlying multicast tree.
     b. All trees constructed must be loop-free
     c. These mechanisms must support late joiners and tree optimization

1.4 Applicability Statement

   The authors recognize that automatic tree construction is a very
   difficult problem.  Nonetheless, complete reliance on manual
   configuration is very user unfriendly and error prone as well.

   This building block describes a procedure for constructing loop-free
   trees when there is minimal manual configured information available.
   This is analogous to providing a system with default configurations
   that allow the system to work correctly, but not necessarily
   optimally.

   There are many possible criteria for tree optimality.  This BB does
   not attempt to define a single optimality criterion, nor does it
   tries to produce an optimal tree.  It is, however, the goal of the BB
   to construct better trees as more configuration and measurement data
   are introduced to the procedure.

   This BB describes only a subset of the relevant parameters for
   constructing a better tree, in particular sender distance and
   neighbor distance.  There are many techniques for measuring these
   distances.  Some of the techniques may not be applicable globally.

   Expanding ring search (ERS) is an effective technique in a local
   subnet or intranet (specially when the IP multicast routing protocol
   is dense-mode based).  On the other hand, it is not practical in a
   multi-domain network; it is not effective when the routing protocol
   is sparse-mode based; and it can add a significant control traffic
   overhead.

   Generic Router Assist (GRA) can provide measurement hooks to
   determine SNs that are located along the path for multicast data
   distribution. But such facilities many not be available in all
   networks.

   The tree construction procedure does allow manual configuration and
   various distance measurement techniques to be selectively and
   independently applied for different subgroup of receivers and SNs, to
   achieve incremental improvement to the quality of the tree.


2. Overview

   The tree building process described within this document builds
   logical trees which consist of:

           1. A root node which is the Sender
           2. Intermediate nodes (Service Nodes or SNs) which may be either
              Receivers or nodes specifically allocated to the task of repair
              and aggregation
           3. Leaf nodes which are Receivers only

   ACK trees are spanning trees rooted at the Sender. ACK trees can be
   used for the forwarding of control information (i.e. ACKs) towards



draft-ietf-rmt-bb-tree-config-01                                [Page 6]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   the root, or for forwarding of data (i.e. repairs) towards the leafs.

   ACK trees are constructed per Sender; each node wishing to join the
   tree discovers its neighboring SNs and then selects its best parent
   based on locally available information, such as the relative sender
   distances and neighbor distances.  This document specifies several
   techniques for measuring distances.

   It is important to note that SNs may be actual Receivers (e.g.
   Receivers willing and able to also function as SNs) or pre-deployed
   "specialized" servers that are prodded into service by Receivers.  We
   use the term Service Node to refer to either a Receiver or "server"
   which is participating as part of the logical tree formation.

   Tree construction, regardless of SN discovery and selection
   algorithm, proceeds generically as follows.

      1. Session Announcement
         Receivers of a session use standard out-of-band mechanisms for
         discovery of a session's existence (e.g. Session Advertisement
         [HPW00], URL, etc).  In this way, a Receiver discovers the multicast
         group address, the Sender's address, and other information necessary
         for logical tree construction.  Sessions may be announced in two
         parts, the first part containing generic information about the
         session, such as the multicast address, and the second part,
         announced on the multicast address, containing additional information.

      2. Measurements (optional)
         All SNs and Receivers that know about the session optionally determine
         their distance to the Sender.

      3. Neighbor Discovery
         Meanwhile, each Receiver discovers nearby SNs for the Session using
         the neighbor discovery algorithm(s).

      4. Service Node Selection
         Once a Receiver (or SN needing to join the tree) discovers a nearby SN,
         it obtains the SN's distance to the Sender as well as its distance to
         the Receiver, tree level, and other suitability values, if available.
         After discovery is complete, the best SN is selected.
         (Note, in the case when the POC algorithm is used for neighbor
         discovery, neighbor discovery and selection may become a combined
         step, performed at the POC).

      5. Binding to Service Node
         The Receiver or SN then binds to the chosen SN.  If a bind is
         unsuccessful, the Receiver or SN retries with another nearby SN,
         or starts the discovery process all over again.   Once an SN receives
         a bind from a child, that SN must then also join the tree if it has
         not already, discovering an SN of its own, possibly using a different
         method than leaf Receivers.

      6. Optimization and Fault Recovery (optional)
         During a session, a Receiver or SN may change to a different SN for a
         number of reasons described below, including fault tolerance.  The Session
         Tree is maintained and optimized over time.







draft-ietf-rmt-bb-tree-config-01                                [Page 7]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   The building block provides mechanisms for maintaining and optimizing
   the tree, as well as tearing it down when the Sender is done with it.
   In the rest of this  document, the term 'Node' denotes a Receiver or
   SN.

                  +--------------------+
                  |                    |
                  |    1. Session      |
                  |   Advertisement    |
                  |                    |
                  +--------------------+
                            |
                            | Node receives tree-building parameters
                            |
                            V
                  +--------------------+
                  |                    |
                  |  2. Measurements   |<-------------------------|
                  |                    |                          |
                  |     (optional)     |                          |
                  +--------------------+                          |
                            |                                     |
                            |                                     |
                            |                                     |
                            V                                     |
                  +--------------------+                          |
                  |                    |                          |
                  |    3. Neighbor     |                          |
                  |      Discovery     |                          |
                  |                    |                          |
                  +--------------------+                          |
                            |                                     |
                            |                                     |
                            |                                     |
                            V                                     |
                  +--------------------+                          |
                  |                    |                          |
                  |  4. Service Node   |                          |
                  |     Selection      |                          |
                  |                    |                          |
                  +--------------------+                          |
                            |                                     |
                            |                                     |
                            | Node picks best Neighbor            |
                            |                                     |
                            V                                     |
                  +--------------------+                          |
                  |                    |                          |
                  |   5. Binding to    |---------------------------
                  |    Service Node    |                6. Optimization and
                  |                    |                   Fault Tolerence
                  +--------------------+


3. Distance Measurement

   Different techniques can be used to determine distances between nodes
   in a session.  These distances can be used in selecting a Service
   Node if several are discovered.




draft-ietf-rmt-bb-tree-config-01                                [Page 8]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   These techniques quantify distance differently.  Each specific way of
   quantifying distance is called a metric.  Different Metrics are not
   necessarily comparable.  For example, if distance between A and B is
   X using metric m1, and distance between A and C is Y using metric m2,
   then X > Y does not necessarily imply B is farther from A than C.
   Only distances of the same metric should be compared and ranked.

   Metrics quantify how close two nodes are from each other.  Though not
   all of them measure specific distances, we will refer to these
   measures as distances in this document.

   It is not necessary for all nodes to use the same metric to select
   their neighboring SN to connect to.

3.1 Multicast Beacon

   Periodically, a node sends a Beacon message on the Session's
   multicast address.  The Beacon message includes the original time-
   to-live value set by the node.  The distance to the node is then
   calculated as


             Beacon's original TTL  -  Beacon's current TTL

   One node is closer than another if its distance is a lower number.

3.2 Mesh

   This metric is just like the Multicast Beacon Metric, except that
   nodes query their local router to determine their distance to the
   sender.

3.3 Address

   For IPV6 addresses[HOD98], distance can be approximately determined
   by the number of aggregation levels one address has in common with
   another.  For this metric, one node is closer than another if its
   address has more aggregation levels in common with the querying
   node's address.

3.4 Static

   The node's distance to other nodes is made available in some well-
   known location.  One node's is closer than another if its distance is
   a lower number.

3.5 GRA

   The node's distance to the sender is determined by a GRA message that
   lists the set of GRA routers on the path from the source. Nodes (and
   POCs) can use these path-strings to determine relative locations of
   SNs.

4. Session Announcement

   The first step in the tree-building process is for a node to be
   informed of the session's existence.  This can be done using some
   out-of-band method, such as Session Advertisement [HPW00], URL, e-
   mail, etc.




draft-ietf-rmt-bb-tree-config-01                                [Page 9]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   SNs do not necessarily receive these advertisements.  If an SN is not
   a Receiver, it obtains the advertisement information once it is
   contacted by a Receiver.

   The advertisement includes the multicast address being used, the
   Sender's address, the Session Identifier, any specific port numbers
   to be used, and any global information useful for tree construction
   (such as a sender POC), to be described later.

5. Service Node Discovery

   Discovery is the process by which a node determines a suitable
   Service Node.  During the discovery process, SNs are found, Sender
   distances are exchanged, and the best SN is selected.

5.1 Service Node Discovery Algorithms

   This draft describes four algorithms for discovering SNs: Static,
   Expanding Ring Search (ERS), Point-of-Contact (POC), and Mesh.
   Multiple algorithms may be used within a single session.  In
   particular, one algorithm may be used by statically configured SNs in
   a session, while a different algorithm is used by Receivers of the
   session.  For instance, SNs for a session may use the Mesh algorithm,
   while the receivers use static configuration to discover the SNs.

   The transport protocols request the following information from this
   BB by providing a unique identifier for the data session to join.
   This identifier, called a Session Identifier, is obtained from the
   Session Advertisement algorithm in the PI.

   Service Nodes:

      1. ParentAddress:  the address and port of the parent node to which
                         the node should connect
      2. UDPListenPort:  the number of the port on which the node will listen
                         for its children's control messages
      3. RepairAddr:     the multicast address, UDP port, and TTL on which this
                         node sends control messages to its children.

   Receivers:

      1. ParentAddress.

   Senders:

      1. UDPListenPort
      2. RepairAddr

   After the above information is obtained from auto-tree-config, the
   transport protocol may perform the necessary Bind operations for
   participating in the Session Tree.

5.1.1 Static

   A list of Neighbors is made available to a Node in some well-known
   location. The location of the Neighbor list, which could be a file
   name, a URL, or a DNS name, MAY be included in the session
   advertisement.  A node MAY query these neighbors for their Sender
   distances and pick the best one, but it may simply pick one and try
   to bind to it.  It is important to carefully configure static trees



draft-ietf-rmt-bb-tree-config-01                               [Page 10]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   without loops.  If the tree-building process detects that binding two
   nodes would form a loop, the bind will not take place and the node
   will not be able to join the tree unless there are alternative static
   neighbors.

5.1.2 ERS

   Nodes look for Neighbors using a multicast QUERY message. The initial
   TTL value in the QUERY message is specified in the session
   announcement and may be as large as the session TTL.  An SN that is
   able to handle additional Children SHOULD respond to a QUERY message
   by multicasting a ADVERTISE message. If the SN is not yet a Parent,
   the TTL used in this response is the  same TTL used in the QUERY
   message.  If the SN is a Parent, the TTL used is the greater of the
   QUERY TTL and the Parent's current ADVERTISE TTL.

   The Node listens for ADVERTISE messages after sending the QUERY
   message. If one or more ADVERTISE messages are received during a
   solicit period, the best SN among them is selected as described in
   section 5.2.  If no ADVERTISE messages are received, the Node sends
   another multicast QUERY message with an incremented TTL.  The process
   of sending the multicast QUERY message with an increasing TTL value
   continues until a response is received.  The TTL value is limited by
   a value specified in the advertisement.

   If the TTL value required to reach the soliciting Node is greater
   than the TTL used to reach the SN, a ADVERTISE message may not reach
   the Node. However, if future QUERY messages have increased TTL
   values, the TTL may eventually be large enough for the ADVERTISE
   message to reach the Node.  However, it is possible that the Node
   will not locate any SNs using Expanding Ring Search. It is advisable
   that a backup method, such as static, be available.

   SNs SHOULD suppress sending ADVERTISE messages in response to QUERY
   messages if one was sent with at least the QUERY's TTL within the
   last SolicitInterval (a multicast session parameter).

   Nodes MAY suppress sending QUERY messages for one interval when they
   first start in order to collect information from ADVERTISE messages
   already solicited by other nodes.

5.1.3 POC

   Nodes query using unicast a designated node, the POC, for Neighbors.
   The POC returns one or more choices of SN, from which the node
   chooses. The POC scheme is suited for scenarios where receivers lack
   the ability to use multicast for ERS, no mesh is deployed, and
   dynamic construction of the session tree is required.

   POCs may be well-known or discovered. POCs do not join the multicast
   tree of a session.  Receivers and SNs inform the POC they wish to
   join the session tree and the POC recommends an SN to use.  Local
   POCs contact the Sender's POC for interdomain connectivity.

   POCs are not necessarily specialized infrastructure. Any receiver,
   SN, or the Sender may be assigned the role at the start of the
   session, as long as all hosts have a unified view of who the local
   POC is. We expect an existing local SN may typically be appointed as
   the POC.




draft-ietf-rmt-bb-tree-config-01                               [Page 11]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   Alternatively, POCs could be specially deployed redirection servers
   similar to DNS redirect servers implementing the messages and
   interfaces described in this BB.

   The address of the Sender's POC MUST be included with the session
   advertisement. Local POCs SHOULD be used to configure trees within a
   single domain.  Receivers and SNs may discover local POCs via a
   static configuration or other more dynamic mechanisms such as an
   anycast service [PTM93] or directory services. These local POCs MUST
   be told of the Sender's POC in order to connect all trees across
   domains.  POCs also serve as an interface between interdomain and
   intradomain tree construction.

   SN discovery with POCs proceeds as follows:

       1. The node joins the multicast session and learns of the Sender's
       POC or a local POC from the session announcement or directory services.

       2. The node optionally discovers its distance from the Sender. Any
       metric described in Section 3 may be used.

       3. The node sends a QUERY message to the POC for a parent. The request
       includes (optionally) the node's distance to the sender.

       4. The POC chooses a parent for the node and sends the IP address (and
       port) of the parent in an ADVERTISE packet.

       5. A SN wishing to accept children sends (unicasts) an ADVERTISE message
       to a POC.  The POC used these ADVERTISE messages as replies to
       QUERY messages.

5.1.4 Mesh

   The job of an IP Multicast routing protocol is very similar to that
   of the auto tree configuration building block-to dynamically
   construct and maintain a set of distribution trees, where each router
   is a node in the tree.  To do so, the multicast routing protocol
   makes use of a unicast routing protocol, which is responsible for
   discovering and maintaining a set of unicast routes between
   neighbors, such that any node in the interconnected mesh can be
   reached by any other.  The candidates for those neighbors are
   statically configured as part of the router configuration, while host
   entities (senders and receivers) dynamically find their nearest SN
   using neighbor discovery algorithms described in this BB.

   The mesh approach to neighbor discovery and distance metrics is
   directly analogous to this scenario, providing functionality similar
   to what OSPF offers.  The mesh approach requires that all of the SN's
   that will participate in any given tree construction process exist as
   persistent infrastructure entities, with statically configured routes
   among their neighbors.  Each of these neighbor routes among the SNs
   in a mesh exchange unicast routing updates, using a standard unicast
   routing protocol such as OSPF or IS-IS, or even a very simple link-
   state routing protocol.  The link for each route SHOULD be
   implemented as a TCP connection.

   The result of this unicast routing protocol is a state table at each
   SN, with a next-hop neighbor SN forwarding entry for each SN that is
   presently connected into the mesh.  This next-hop neighbor is used as
   the selection for the best neighbor service node to bind to, for SN



draft-ietf-rmt-bb-tree-config-01                               [Page 12]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   to SN bindings.  Receivers and senders locate a neighboring SN
   through any of the other mechanisms described in section 5.

   The following algorithm is used.
       1. At the time they start running, each SN contacts its statically
       configured neighbors via TCP, and joins in the unicast routing protocol
       being run over all of the SNs in this mesh.

       2. At the time a sender wishes to join a tree, it finds a SN that is part
       of the mesh, using any of the other neighbor discovery algorithms in section
       5.  The sender binds to this SN, and then advertises this SN's address in the
       session advertisement it sends to all receivers.

       3. When a receiver wishes to join the tree, it finds a neighboring SN using
       any of the other neighbor discovery methods in section 5.

       4. For neighbor selection, each SN looks up the sender's SN in its own
       forwarding table, and uses the primary next-hop neighbor in this table.

       5. If that neighbor SN does not wish to join the tree, for example due to
       the load at that SN, in the Reject message of the bind it can pass back its
       next-hop neighbor to the sender's SN, which can be used to attempt to bind to
       instead.

       6. Loop freedom is guaranteed by the algorithms in section 6.

5.2 Service Node Selection

   Once Neighbors have been discovered, a node selects the best one
   using whatever distance information is available.

   If there is no sender distance information to compare, the best SN is
   simply the one that is closest to the node, without a loop being
   formed by binding the node to the SN.

   If sender distances are available, there are two cases:

   Leaf nodes: For leaf nodes, the goal is to use the closest SN
   possible.

   SNs: For SNs joining the tree, it is important to pick an SN that is
   closer to the sender; neighbor distance is a secondary factor.

   Once an SN has been selected, the node tries to bind to it as
   described in Section 6.  Loop prevention is done during the bind
   process using only Tree Level information.

   This algorithm is recommended because it assigns each node the
   closest SN, and does not require all nodes to measure their sender
   distance at the start of the session.  Depending on the selected
   metric, multiple nodes measuring sender distance could cause message
   implosion, and delay tree construction.  On the other hand, the SNs
   selected may actually be further from the Sender than their children
   are.  However, it may be necessary to assign nodes to non-optimal SNs
   in order to get them on the tree, since it is possible that no SN
   closer to the Sender can accept any more children.

   More optimal trees can be constructed by using stricter rules for SN
   selection. For instance, a session MAY ensure loop-freedom by
   building the tree down from the Sender.  In this case, nodes only



draft-ietf-rmt-bb-tree-config-01                               [Page 13]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   need to select SNs whose level is less than 128 to ensure the tree is
   loop-free.  However, this makes tree-building take more time.

   Alternatively, nodes may be required to measure sender distance
   before selecting an SN in order to ensure that each parent is closer
   to the Sender than its children.  Presumably, this results in a tree
   in which parents detect message loss before their children,
   minimizing repair requests.

   There are other criteria for SN selection based on load balancing and
   minimizing the number of levels.

6. Tree Formation

   The following is a detailed description of the tree formation
   process. All tree construction follows this pattern. The ONLY
   differences between instantiations of this building block lie in how
   nodes discover neighbors.

   Once an SN has been selected, the Node sends a BIND message to the
   SN. If the SN has an outstanding request to bind to another SN, it
   must refuse the incoming bind request in case it forms a loop.

   Otherwise, it MAY accept the Node as a child as long as selecting it
   would not cause a loop in the tree.  Loop freedom is guaranteed by
   these two rules:

       1. If the requesting node does not have children, the SN can accept it as
          a child as long as the SN has no outstanding bind requests.  If it does
          have an outstanding bind request, the SN can accept the node as a child
          if it's address is less than the child's address.

       2. If the requesting node has children, the SN can accept it as a child if

          a. the SN's level is 128, i.e., it is the top of a sub-tree not yet
          connected to the Sender, or

          b. the SN's level is less than 128, i.e., it is connected to the Sender.

   The second rule prevents a node from selecting one of its own
   children as its parent.  Two nodes at level 128 are prevented from
   selecting each other using tie-breaking criteria described step 1
   above.

   If the SN accepts the Node as a Child, it returns an ACCEPT message.
   If it does not accept the Node, it sends a REJECT message.  If the
   Node does not receive a response after a reasonable number of tries
   and timeout period, it MAY select the next best Neighbor from its
   cached list, or else run the Service Node Discovery process again to
   determine an alternate SN to try.

   The REJECT message contains a reason code.  If the code indicates
   that the node was rejected because the SN was not yet on the tree,
   the node MAY choose to retry that SN after a suitable timeout.

   The REJECT message may also include a list of alternative SNs for the
   node to try.

   The ACCEPT message MUST include the Parent's current Tree Level.  The
   Node MUST set its Tree Level to one more than the Parent's level.  If



draft-ietf-rmt-bb-tree-config-01                               [Page 14]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   the Node itself has Children, it MUST send a HEARTBEAT message
   containing the new tree level.  The ACCEPT message MAY also indicate
   the  starting sequence number of the message from which data
   reliability is assured.

   Service Nodes MAY limit the number of children they support depending
   on their capacity.  Once an SN has accepted its maximum number of
   children, it stops accepting new children until a change in
   membership causes its count of children to go below this limit.

   If an SN limits the number of children it supports, it SHOULD reserve
   at least one child slot for other SNs. This guarantees the growth of
   the repair tree.

   A note about late joiners: a late joiner is a node seeking membership
   in the tree after the Sender has started to send data.  In this case,
   it is possible that some of the data is no longer available within
   the tree.  Nodes may need to have specific information about repair
   availability before selecting a Parent.

   In order to accommodate this requirement, the QUERY, BIND, ACCEPT,
   and HEARTBEAT messages may contain information about both a Node's
   requirements for repair data, and an SN's ability to provide that
   data.

7.  Tree Maintenance

   Tree maintenance is an ongoing process active in every Node.  Because
   the tree is based on the operation of SNs, as well as the various
   underlying metrics that may change over time, it is important that
   these dependencies be monitored for changes.  Nodes MUST monitor
   Parents for liveliness and changes in tree level, and SHOULD continue
   to run the Neighbor Discovery and Selection process in order to
   optimize their choice of SN.  Parents must also monitor Children for
   liveliness.

7.1 Monitoring Parents and Children

   Parents periodically send out HEARTBEAT messages unicast or on their
   assigned multicast addresses as a keep-alive to their Children.  The
   HEARTBEAT message MUST contain the Parent's current Tree Level.
   Children MUST set their Tree Level to one more that their Parent's
   level.  Children respond to their Parent's HEARTBEAT message with a
   HEARTBEATCONFIRM message.  Both messages MAY be suppressed if other
   messages in the protocol instantiation have recently provided a
   keep-alive function.

   If a Child does not receive a HEARTBEAT message from its Parent for a
   period of time, it MUST attempt to bind with an alternate SN.

   A Child that is leaving a session MUST send a unicast LEAVE message
   to its Parent.  The Parent SHOULD respond with a LEAVECONFIRM
   message.

   A Parent that is leaving the session MUST send a multicast LEAVE
   message to its Children indicating that they need to bind with an
   alternate SN.

   If a Parent does not hear a HEARTBEATCONFIRM message from a Child for
   a period of time, or it receives a LEAVE message from a Child, it



draft-ietf-rmt-bb-tree-config-01                               [Page 15]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   removes that Child from its list of Children.

7.2 Optimizing the Tree

   Implementations of this building block SHOULD continue to run the
   Neighbor Discovery and Selection process in order to optimize the
   choice of SN.  This continuous process also keeps the distance
   information for the current Parent up- to-date.  Whenever the process
   returns a better SN than the current one, the Node MAY bind to the
   new SN.  Once the new SN is bound to, the Node MUST send a LEAVE
   message to the original Parent.  A Parent with no Children MAY leave
   the session.



















































draft-ietf-rmt-bb-tree-config-01                               [Page 16]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


8. Messages

   These messages are required for implementations of this building
   block.  The list below indicates which message contents are required
   by implementations.  Implementations may also include other
   protocol-specific information in these messages.

   +------------------+---------------------------------+--------------------------+
   |  Message Name    |           Description           |                          |
   | m-cast or u-cast |                                 |          Contents        |
   |------------------+---------------------------------+--------------------------+
   |     QUERY        | A message used to discover      | Sender distance, TTL     |
   |      both        | neighbors.  The unicast version | (optional)               |
   |                  | is sent to POCs; the multicast  |                          |
   |                  | version is used in ERS.         |                          |
   |------------------+---------------------------------+--------------------------+
   |   ADVERTISE      | A message used to advertise a   | IP address, port         |
   |      both        | SN. The unicast version is sent |                          |
   |                  | from the POC; the multicast     |                          |
   |                  | version is sent by SN themselves|                          |
   |------------------+---------------------------------+--------------------------+
   |      BIND        | request to SN to join tree      |                          |
   |     unicast      |                                 | current Tree Level       |
   |------------------+---------------------------------+--------------------------+
   |     ACCEPT       | acceptance of BIND from SN      | current Tree Level       |
   |     unicast      |                                 |                          |
   |------------------+---------------------------------+--------------------------+
   |     REJECT       | rejection of BIND from SN       | reject reason            |
   |     unicast      |                                 | alternate SN list        |
   |------------------+---------------------------------+--------------------------+
   |      LEAVE       | Child leaving Parent (u-cast),  |                          |
   |      both        | or Parent leaving tree (m-cast) |                          |
   |------------------+---------------------------------+--------------------------+
   |  LEAVECONFIRM    | Acknowledgement of LEAVE        |                          |
   |     unicast      | message                         |                          |
   |------------------+---------------------------------+--------------------------+
   |     HEARTBEAT    | Periodic keep-alive from Parent | current Tree Level       |
   |      both        | to Children                     | Sender distance          |
   |                  |                                 | Advertisement (optional) |
   |------------------+---------------------------------+--------------------------+
   | HEARTBEATCONFIRM | Acknowledgement of HEARTBEAT    |                          |
   |     unicast      | from Child                      |                          |
   |------------------+---------------------------------+--------------------------+




















draft-ietf-rmt-bb-tree-config-01                               [Page 17]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


9. External Interfaces This section describes external interfaces for
   the building block.

9.1 Interfaces to this BB.  These may be used by a PI, or by a higher-
   level BB.

9.1.1 findSNs findSNs instructs the BB to start the process of finding
   SN candidates for this node.  findSNs may return immediately with a
   list of candidate SN, or may use the SNlist interface (see section
   9.2.1) to return the list at a later time.

9.1.2 messageFromSN(message) messageFromSN passes along a message
   received from the SN.  The PI or BB should pass messages down to this
   BB in case they contain information necessary for the BB.

9.1.3 lostSN lostSN notifies the BB that the connection to the SN was
   lost.

9.1.4 activateSN activateSN notifies the BB to begin SN operation if
   necessary.

9.1.5 setOptimization(boolean) setOptimization tells the BB to start or
   stop the tree optimization process. The upper BB or PI may want to
   control when tree optimization takes place.

9.1.6 setSN(SN) setSN informs the BB that the PI or upper BB has
   successfully bound to an SN.

9.1.7 acceptMember(node) acceptMember asks the BB to accept or reject
   the node as a member.  The BB returns a boolean value in response.

9.2 Interfaces from this BB to the PI or a higher-level BB.

9.2.1 SNlist(list) SNlist returns to the upper PI or BB a list of SN
   candidates.

9.2.2 SNactive(boolean) SNactive informs the upper PI or BB that this
   node has become an SN or has ceased SN operation.

























draft-ietf-rmt-bb-tree-config-01                               [Page 18]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   10. References

   [CST00] B. Cain, T. Speakman, D. Towsley, "Generic Router Assist
   (GRA) Building Block - Motivation and Architecture",  Internet Draft,
   Internet Engineering Task Force, March 2000.

   [DEE89] Deering, S., "Host Extensions for IP Multicasting", RFC 1112.

   [ECTP00] ITU-T SG7/Q.13 and ISO/IEC JTC1/SC6/WG7, "Enhanced
   Communication Transport Protocol," ITU-T Draft Recommendation X.ectp
   and JTC1/SC6 Committee Draft 14476, June, 2000.

   [HC00] H. Holbrook, B. Cain, "Source-Specific Multicast for IP,"
   Internet Draft, Internet Engineering Task Force, March 2000.

   [HOD98} R. Hinden, M. O'Dell, S. Deering, "An IPv6 Aggregatable
   Global Unicast Address Format," Request For Comments 2374, July 1998.

   [HPW00] M. Handley, C. Perkins, E. Whelan, "Session Announcement
   Protocol", Internet Draft, Internet Engineering Task Force, March
   2000.

   [HWKFV00] M. Handley, B. Whetten, R. Kermode, S. Floyd, L. Vicisano,
   and M.  Luby, "The Reliable Multicast Design Space for Bulk Data
   Transfer," Internet Draft, Internet Engineering Task Force, March
   2000.

   [KV00] R. Kermode, L. Vicisano, "Author Guidelines for RMT Building
   Blocks and Protocol Instantiation Documents," Internet Draft,
   Internet Engineering Task Force, July, 2000.

   [LPG98] B.N. Levine, S. Paul, and J.J. Garcia-Luna-Aceves,
   "Organizing Multicast Receivers Deterministically According to
   Packet-Loss Correlation," Proc. Sixth ACM International Multimedia
   Conference (ACM Multimedia 98), Bristol, UK, September 1998.

   [PTM93] C. Partridge, T. Mendez, and W. Milliken. "Host anycasting
   service."  Request For Comments 1546, November 1993.

   [WCP00] B. Whetten, D. Chiu, S. Paul, "TRACK Architecture, A Scalable
   Real-Time Reliable Multicast Protocol," Internet Draft, Internet
   Engineering Task Force, July 2000.

   [WLKHFL00]  B. Whetten, L. Vicisano, R. Kermode, M. Handley, S.
   Floyd, and M. Luby, "Reliable Multicast Transport Building Blocks for
   One-to-Many Bulk-Data Transfer," Internet Draft, Internet Engineering
   Task Force, March 2000.

   [YGS95] R. Yavatkar, J. Griffioen and M. Sudan, "A Reliable
   Dissemination Protocol for Interactive Collaborative Applications",
   University of Kentucky, 1995.












draft-ietf-rmt-bb-tree-config-01                               [Page 19]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


Authors' Addresses

      Brad Cain
      brad.cain@mirror-image.com
      Mirror Image
      49 Dragon Court
      Woburn, MA 01801

      Dah Ming Chiu
      dahming.chiu@sun.com
      Miriam Kadansky
      miriam.kadansky@sun.com
      Sun Microsystems Laboratories
      1 Network Drive
      Burlington, MA 01803

      Seok Joo Koh
      sjkoh@pec.etri.re.kr
      ETRI/Korea
      161 Kajong-dong
      Yusong-Gu, TAEJON, 305-350, KOREA

      Brian Neil Levine
      brian@cs.umass.edu
      Computer Science Department
      University of Massachusetts
      Amherst, MA 01003

      Dave Thaler
      dthaler@microsoft.com
      Microsoft Corporation
      One Microsoft Way
      Redmond, WA 98052

      Gursel Taskale
      gursel@talarian.com
      Brian Whetten
      whetten@talarian.com
      Talarian
      333 Distel Circle
      Los Altos, CA 94022-1404


Full Copyright Statement

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet languages other than English.

   The limited permissions granted above are perpetual and will not be



draft-ietf-rmt-bb-tree-config-01                               [Page 20]


INTERNET DRAFT    draft-ietf-rmt-bb-tree-config-01.txt  20 November 2000


   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."























































draft-ietf-rmt-bb-tree-config-01                               [Page 21]