INTERNET-DRAFT                                                  T. Anker
                                                            D. Breitgand
File: draft-anker-congress-01.txt                               D. Dolev
                                                                 Z. Levy
                                           The Hebrew Univ. of Jerusalem
                                             Expiration:    18 July 1998

                  IMSS: IP Multicast Shortcut Service

Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

Abstract

   This memo describes an IP Multicast Shortcut Service (IMSS) over a
   large ATM cloud. The service enables cut-through routing between
   routers serving different Logical IP Subnets (LISs). The presented
   solution is complementary to MARS [2], adopted as the IETF standard
   solution for IP multicast over ATM.

   IMSS consists of two orthogonal components: CONnection-oriented Group
   address RESolution Service (CONGRESS) and IP multicast SErvice for
   Non-broadcast Access Networking TEchnology (IP-SENATE). An IP class D
   address is resolved into a set of addresses of multicast routers that
   should receive the multicast traffic targeted to this class D
   address. This task is accomplished using CONGRESS. The cut-through
   routing decisions and actual data transmission are performed by IP-
   SENATE.

   IMSS preserves the classical LIS model [8]. The scope of IMSS is to
   facilitate inter-LIS cut-through routing, while MARS provides tools
   for the intra-LIS IP multicast.



Anker, Breitgand et. al    Expires July 1998                    [Page 1]


Internet Draft       <draft-anker-congress-00.txt>           1 July 1997


Table of Contents

   1.     ................................................Introduction
   1.1    ..................................................Background
   1.2    ....................................................CONGRESS
   1.3    ...................................................IP-SENATE
   2.     ..................................................Discussion
   3.     ...............................................IMSS Overview
   3.1    ...............................................Network Model
   3.2    ....................................................CONGRESS
   3.2.1  ...............................................CONGRESS' API
   3.3    ...................................................IP-SENATE
   4.     ................................................Architecture
   4.1    .......................................CONGRESS Architecture
   4.2    ......................................IP-SENATE Architecture
   4.3    ...........................................IMSS Architecture
   5.     ...........................................CONGRESS Protocol
   5.1    .............................................Data Structures
   5.2    .......................IMSS Router Joining/Leaving a D-group
   5.3    ...........Reception of Incremental Membership Notifications
   5.4    ...............................Resolution of D-Group Address
   5.5    ........................................Handling of Failures
   5.5.1  .........................................IMSS Router Failure
   5.5.2  ..............................................Domain Failure
   5.5.3  .............................................Domain Recovery
   6.     ..........................................IP-SENATE Protocol
   6.1    ........................................Main Data Structures
   6.2    .....................................Maintenance of D-groups
   6.2.1  ............................................Joining D-Groups
   6.2.2  ............................................Leaving D-Groups
   6.2.3  .........................Client and Server Operational Roles
   6.2.4  ...............................Regular and Sender-Only Modes
   6.3    ........................................Forwarding Decisions
   6.3.1  ..................A Server Receives a Datagram from a Client
   6.3.2  ............A Server Receives a Datagram from another Server
   6.3.3  .........A Client Receives a Datagram from an IDMR Interface
   6.3.4  .........A Server Receives a Datagram from an IDMR Interface
   6.3.5  ...........................................Pruning Mechanism
   7.     .............................................Fault Tolerance
   8.     .....................................Security Considerations
   9.     .............................................Message Formats
   9.1    ...........................................CONGRESS Messages
   9.2    ..........................................IP-SENATE Messages
   10.    ..................................................References
   11.    .............................................Acknowledgments
   12.    .......................................List of Abbreviations


1. Introduction

   As was noted in VENUS [3]: "The development of NHRP [21], a protocol
   for discovering and managing unicast forwarding paths that bypass IP
   routers, has led to some calls for an IP multicast equivalent.
   Unfortunately, the IP multicast service is a rather different beast
   to the IP unicast service." The problems correctly identified by
   VENUS can be divided into two broad categories: 1) problems
   associated with multicast group membership maintenance and resolution
   and 2) problems concerned with multicast routing. Although VENUS
   "...focuses exclusively on the problems associated with extending the
   MARS model to cover multiple clusters or clusters spanning more than
   one subnet", most of the discussed problems are, in fact, intrinsic
   to any cut-through routing solution. The main conclusion that one can
   draw from VENUS is that these problems cannot be solved by a
   straightforward extension of MARS to cover multiple LISs. This memo
   presents a solution that relies on MARS for intra-LIS multicast
   communication and uses an alternative methodology to provide an
   inter-LIS multicast shortcut service that scales to large ATM clouds.
   It is assumed that the reader is familiar with the classical LIS
   model [8], MARS [2], and the basics of the Inter-Domain Multicast
   Routing (IDMR) protocols [4,5,9,10,11].

   This document has two goals:

      o To provide a generic protocol for dynamic mapping
        of any IP class D address onto the set of multicast routers
        that have ATM (or any other SVC-based Data Link subnetwork)
        connectivity and have either directly attached hosts or
        downstream routers (w.r.t. a specific multicast tree) that need
        to receive the corresponding multicast traffic. The resolved
        addresses are used to establish the shortcut ATM connections
        among the multicast routers. The mapping protocol should be
        independent of any underlying IP multicast protocol. It should
        be specifically noted that this document proposes the use of
        shortcut multicast connections on a per-source basis. This is
        motivated by the fact that the shortcut connections will be
        used mainly by multicast applications that need guaranteed QoS.
        For all other multicast applications, the current IP over ATM
        paradigm would probably suffice. Multicast applications that
        require QoS, such as video-conferencing, transmission of
        high-quality video streams, interactive games, etc., will
        usually involve a small number of sources and will require
        source-specific multicast trees in order to achieve the
        required QoS.

      o To provide a solution for the generic interoperability and
        routing problems that arise when any cut-through routing
        protocol is deployed in conjunction with the existing IDMR
        protocols.


   This document proposes an architectural separation between the two
   problem domains above, so that each one of them can be tackled with
   the most appropriate methodology and in the most generic manner.

1.1 Background

   The classical IP network over an ATM cloud consists of multiple
   Logical IP Subnets (LISs) interconnected by IP routers [8]. The
   standardized solution for IP Multicast over ATM, the Multicast
   Address Resolution Server (MARS) [2], follows the classical model. In
   the MARS approach, each LIS is served by a single MARS server and is
   termed a "MARS cluster". MARS can be viewed "as an extended analog of
   the ATM ARP server [8]".

   From the IP multicast perspective, MARS is functionally equivalent to
   IGMP [1]. Similarly to IGMP, a MARS server registers the hosts that
   are directly attached to a multicast router and are interested in
   receiving multicast traffic targeted to a specific IP class D
   address. The important difference, however, is that MARS is aware of
   the connection-oriented nature of the underlying network. For each
   relevant IP class D address, the MARS server maintains a set
   (membership) of the hosts that belong to the same LIS and have
   registered to receive IP datagrams sent to this address.

   The process of mapping an IP class D address onto a set of ATM end-
   point addresses is termed "multicast address resolution". Each such
   set is used to establish native ATM connections between an IP
   multicast router and the local members of the IP multicast group. The
   IP multicast datagrams targeted to a specific class D address are
   propagated over these connections. The layout of the ATM connections
   within a MARS cluster may be based either on a mesh of
   point-to-multipoint (ptmpt) Virtual Circuits (VCs) [6,7] or on a
   Multicast Server (MCS).

   Work is in progress to distribute the MARS server in order to
   provide load balancing and fault tolerance [17]. A group of
   redundant MARS servers will constitute a single logical entity that
   provides the same functionality as a non-distributed MARS
   server.

   Another work in progress, EARTH [12], intends to extend the scope of
   the services provided by MARS to multiple LISs. EARTH defines a
   Multicast LIS (MLIS) that is composed of a number of LISs and is
   served by a single EARTH server. Due to the centralized approach
   taken by EARTH, very large MLISs would ultimately look like very
   large MARS clusters. Thus the discussion and the conclusions
   provided in VENUS are equally applicable to EARTH.

   In the classical LIS model, a LIS has the following properties:

      o All members of a LIS have the same IP network/subnet number
        and address mask;

      o All members of a LIS are directly connected to the same NBMA
        subnetwork;

      o All hosts and routers outside the LIS are accessed via a router;

      o All members of a LIS access each other directly (without
        routers).
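
   The first, third and fourth properties can be illustrated with a
   minimal sketch. The addresses, the prefix and the function names
   below are hypothetical examples chosen for illustration; they are
   not part of the LIS specification [8].

```python
import ipaddress


def same_lis(addr_a: str, addr_b: str, lis_prefix: str) -> bool:
    """Return True if both addresses fall inside the given LIS prefix."""
    lis = ipaddress.ip_network(lis_prefix)
    return (ipaddress.ip_address(addr_a) in lis
            and ipaddress.ip_address(addr_b) in lis)


def next_hop(src: str, dst: str, lis_prefix: str, router: str) -> str:
    """Classical LIS rule: members of one LIS talk directly,
    everything outside the LIS is reached via the LIS router."""
    return dst if same_lis(src, dst, lis_prefix) else router


# Hypothetical LIS 10.1.0.0/24 served by router 10.1.0.1.
print(next_hop("10.1.0.5", "10.1.0.9", "10.1.0.0/24", "10.1.0.1"))
print(next_hop("10.1.0.5", "10.2.0.9", "10.1.0.0/24", "10.1.0.1"))
```

   In the second call the destination lies outside the LIS, so the
   datagram is handed to the router even if both hosts are attached to
   the same ATM cloud.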


   In the MARS model, which retains the LIS model, it is assumed that
   all multicast communication outside the LISs is performed via
   multicast routers that run some IDMR protocol. As explained in [13],
   the classical LIS model may be too restrictive for networks based on
   switched virtual circuit technology, e.g., ATM. Obviously, if LISs
   share the same physical ATM network (ATM cloud), the LIS
   internetworking model may introduce extra routing hops. This mismatch
   between the IP and ATM topologies complicates full utilization of the
   capabilities provided by the ATM network (e.g., QoS).

   In addition, the extra routing hops impose unnecessary segmentation
   and reassembly overhead, because every IP datagram must be
   reassembled at every router so that the router can make its routing
   decisions. The "short-cut" (or "cut-through") paradigm seeks to
   eliminate the mismatch between the topology of IP and that of the
   underlying ATM network. Unfortunately, as stated above, bypassing the
   extra routing hops is not a trivial task.



1.2 CONGRESS

   The purpose of cut-through routing is to establish direct
   communication links among the multicast group members. The discovery
   of the multicast group members' addresses is performed by a multicast
   group address resolution and maintenance service. Generally, this
   service maps some application-defined character string, a multicast
   group address, onto a set of identifiers of the group members.

   Since a multicast group address resolution and maintenance service is
   crucial to any multicast routing short-cut solution over NBMA
   networks, it is appropriate to ask whether it should be implemented
   once as a generic stand-alone service or tailored specifically to
   each multicast short-cut service. The tradeoff here is between
   generality and efficiency w.r.t. a specific multicast routing
   protocol. In the IMSS approach, a general multicast address
   resolution service, CONGRESS, is used.

   CONGRESS is a multicast address resolution and maintenance service
   for NBMA networks that is independent of the underlying multicast
   protocol. It is a generic stand-alone service. Although CONGRESS
   may be exploited by native ATM applications as well as by the
   network layer (IP), this document focuses only on the aspects of
   CONGRESS related to IP. In fact, a reduced version of CONGRESS with
   a minimal set of features is presented in this memo. The interested
   reader is encouraged to refer to [14] for more information.

   CONGRESS operates in the native ATM environment. Its purpose is to
   provide a multicast address resolution and maintenance service that
   scales to a large ATM WAN. The CONGRESS design is based on the
   following principles:

      o No flooding: CONGRESS does not flood the WAN on every multicast
        group membership change.

      o Hierarchical design: CONGRESS services are provided to
        applications by multiple hierarchically organized servers.

      o Robustness: Due to network failures and/or network
        reconfiguration and re-planning, some CONGRESS servers may
        temporarily disconnect and later reconnect. CONGRESS withstands
        such transient failures by providing a best-effort service to
        applications.


   It is important to stress that CONGRESS is not concerned with the
   actual data transfer. Its functionality is limited to the resolution
   of multicast group addresses upon requests from the applications.

   An overview of CONGRESS is provided in Section 3.2.

1.3 IP-SENATE

   IP-SENATE is the second component of IMSS. It is concerned with the
   actual IP datagram transmission over the short-cut communication
   links, establishment of these links, routing decisions and the
   interoperability with the existing IDMR protocols. IP-SENATE provides
   a solution for the problems arising from bypassing of the multicast
   routers.  Most of these problems are general and independent of the
   underlying IDMR protocols. The design philosophy of IP-SENATE is
   based on the following principles:

      o IP-SENATE is a best effort service. IP-SENATE does not
        guarantee that a short-cut is always possible, but it attempts
        to perform the short-cut wherever possible.

      o Short-cuts are performed only among the multicast routers and
        not directly among hosts.

      o IP-SENATE facilitates (a) communication based on a full mesh of
        ptmpt connections, (b) communication based on multicast
        servers, and (c) a hybrid form of communication based on the
        previous two.

      o IP-SENATE facilitates migration from a mesh of ptmpt
        connections to multicast server-based connections, as well as
        load balancing among the multicast servers, without a need for
        global reconfiguration.

      o IP-SENATE uses CONGRESS services to resolve multicast addresses
        into sets of addresses of the relevant multicast routers and to
        maintain these sets. IP-SENATE may use any other service
        providing the same functionality as CONGRESS.

      o IP-SENATE is an inter-LIS protocol. It extends only the IDMR
        routers. Host interface to IP multicast services [19] is not
        changed.

      o IP-SENATE relies on MARS to facilitate all the intra-LIS IP
        multicast traffic.

      o IP-SENATE does not assume a single multicast routing domain.
        IP-SENATE is designed to operate in a heterogeneous network
        that consists of multiple interconnected multicast routing
        domains. Consequently, IP-SENATE is not tailored to any
        specific multicast routing protocol, but can be dynamically
        configured to interwork with different multicast protocols.

      o IP-SENATE is to be implemented as an extension to the existing
        multicast routing software.


2. Discussion


   A designer of a short-cut multicast routing solution is faced with
   multiple non-trivial problems. The more prominent problems are
   discussed below.

      o If hosts are allowed to communicate directly with other
        hosts (as in [3]), bypassing the multicast routers, then each
        host must maintain membership information about all other hosts
        scattered all over the internet and belonging to the same IP
        multicast group. This scheme does not scale well because:

           - The hosts must maintain large amounts of data that must
             be kept consistent and up to date.
           - Considerable traffic and signalling overhead is
             introduced when membership changes, e.g., join or leave
             events, are flooded over the network.
           - As was noted in RFC2121 [18], an ATM Network
             Interface Card (NIC) is capable of supporting only a
             limited number of connections (i.e., VCs originating from
             or terminating at the NIC). If a full mesh of ptmpt VCs is
             used for cut-through communication within a multicast
             group, NICs might not be able to support all the
             simultaneous connections.

      o To solve the NIC limitation problem, the current IETF IP
        multicast over ATM solution, MARS, supports a migration
        function that allows switching from a mesh of ptmpt
        connections to multicast server based communication within a
        single MARS cluster. It is not clear how to extend this
        functionality to a large ATM cloud. Such switching makes the
        membership information kept at hosts scattered throughout the
        internet obsolete. As a result, some currently active
        connections may become stale or terminate abruptly.
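
   The NIC limitation can be made concrete with a small calculation.
   The sketch below assumes a group of n members in which every member
   both sends and receives, one ptmpt VC rooted at each sender, and a
   single multicast server that is not itself a group member. These
   assumptions and the function names are illustrative only and are not
   taken from RFC2121 [18].

```python
from typing import Tuple


def vcs_per_nic_full_mesh(n_members: int) -> int:
    """VCs seen by one NIC in a full mesh of ptmpt VCs.

    Each member roots one outgoing ptmpt VC and is a leaf of the
    ptmpt VC rooted at every other member.
    """
    return 1 + (n_members - 1)


def vcs_per_nic_with_server(n_members: int) -> Tuple[int, int]:
    """Return (VCs at an ordinary member, VCs at the multicast server).

    A member keeps one VC towards the server and is a leaf of the
    server's single outgoing ptmpt VC. The server terminates one VC
    per member and roots one ptmpt VC.
    """
    member = 2
    server = n_members + 1
    return member, server


for n in (10, 100, 1000):
    print(n, vcs_per_nic_full_mesh(n), vcs_per_nic_with_server(n))
```

   Under these assumptions the per-NIC load of a full mesh grows
   linearly with the group size, whereas with a multicast server only
   the server's NIC carries a load that grows with the group.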


   The IMSS solution presented in this memo performs cut-through only
   among the multicast routers, reducing the problems above to a certain
   extent. The NIC limitation problem is not completely eliminated,
   however. Hence, IMSS facilitates deployment of "multicast servers"
   for other routers that are termed "clients". In IMSS some of the
   multicast routers may also function as multicast servers.

   Cut-through mechanisms may have a negative impact on the conventional
   IDMR protocols. To discuss the interoperability issues, we divide the
   IDMR protocols into two large families: "broadcast & prune"-based
   [10] and "explicit join"-based [4,5,9,11]. In the first model,
   periodic flooding of the network and subsequent pruning of the
   irrelevant branches of the multicast propagation trees are employed.
   In the second model, explicit information about the topology of the
   IP multicast groups is exchanged among the multicast routers.

   As we see it, a cut-through solution will have to co-exist with a
   regular Inter-Domain Multicast Routing protocol in the same routing
   domain. One of the reasons for deploying an IDMR protocol in
   addition to the cut-through mechanism in the same ATM cloud is that
   cut-through connections are not guaranteed to reach all the
   relevant targets in the ATM cloud.

       ===============================================================

         |------------|
         | IP cloud   |
         |  (DVMRP)   |
         |            |
         | S #######> R ##   |-----------|
         |------------|  #   |           |----------|
                         ##> CTR xxxxxxx>CTR        |
                             | IP/ATM    |# IP cloud|
         S - source          | cloud     |# (DVMRP) |------------|
         D - Destination     |-----------|#         |            |
         R - DVMRP router                |#########>R########> D |
         CTR - Cut-through router        |----------|            |
         x - Cut-through connection                 | IP cloud   |
         # - DVMRP branch                           |  (DVMRP)   |
                                                    |------------|


       Figure 1.
       ===============================================================


   Another important reason is that if a "broadcast & prune" IDMR
   protocol is used in some non-ATM based IP subnetworks connected to
   the ATM cloud, the border routers that connect these subnetworks to
   the ATM cloud do not receive explicit notifications that some
   downstream routers could be part of an IDMR multicast propagation
   tree (as depicted in Figure 1). Thus, the broadcast & prune mechanism
   of the IDMR protocol should be exploited periodically by the
   cut-through multicast routers in order to learn about the downstream
   routers that depend on them. The discovery process is based on
   analysis of the prune messages that a multicast router receives
   from its neighboring routers.

   On the other hand, the co-existence of IDMR protocols with the
   cut-through solution raises several problems:

      o Routing decisions are normally made at the multicast
        routers. If hosts can bypass a multicast router, the latter
        should be aware of all the hosts in its own LIS (and in all of
        the downstream LISs) that participate in the cut-through
        connections. Otherwise the IDMR protocols would not be able to
        construct the multicast propagation trees correctly and the
        multicast datagrams may be lost.

      o If a multicast cut-through mechanism is deployed in
        conjunction with some IDMR protocol, then conflicts with
        Reverse Path Forwarding (RPF) [20] may occur. The RPF mechanisms
        prevent routing loops and are crucial for the correct operation
        of IDMR protocols. Thus, the cut-through traffic should be
        treated carefully in order not to confuse the IDMR protocol.

      o A multicast distribution tree of an IDMR protocol may span
        non-ATM based IP subnetworks and contain more than one border
        router connecting these subnetworks to the ATM cloud, as shown
        in Figure 2. If these border routers maintain cut-through
        ATM connections to all other relevant border routers, undesired
        datagram duplication may result.

      o Another scenario that may lead to routing loops and
        undesired datagram duplication arises when both a
        cut-through mechanism and some conventional IDMR protocol are
        deployed in the same ATM cloud. This means that an IDMR tree
        spans some routers within the ATM cloud and not only the border
        routers.

     ===============================================================
                                               S
                                               |
          CTR xxxxxxxxxxx CTR(a) ##############R
           xx          x  x                    #
           x x        x   x                  # # #
           x  x      x    x                #   #   #
           x   x    x     x               #    #    #
           x    x  x      x              R     R    R
           x     xx       x             #
           x     xx       x            #
           x    x  x      x           #       ....
           x   x    x     x          #
           x  x      x    x         #
           x x        x   x        #
           xx          x  x       #
           x            x x      #
          CTR xxxxxxxxxxx x CTR(b)

         IP/ATM + Shortcut Domain         DVMRP Domain

     S - the source
     R - IP router
     CTR - cut-through router
     x - cut-through connection
     # - DVMRP branch

     Figure 2.
     ===============================================================
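
   The RPF conflict described in this section can be sketched as
   follows. This is a simplified reverse-path check and not the
   procedure of any particular IDMR protocol; the routing table, the
   interface names ("eth0", "atm0") and the function names are
   hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def rpf_interface(unicast_routes: Dict[str, str],
                  source: str) -> Optional[str]:
    """Longest-prefix match of the source address in a table that maps
    prefixes to outgoing interfaces; returns the reverse-path interface."""
    src = ipaddress.ip_address(source)
    best = None
    for prefix, iface in unicast_routes.items():
        net = ipaddress.ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, iface)
    return best[1] if best else None


def rpf_accept(unicast_routes: Dict[str, str], source: str,
               arrival_iface: str) -> bool:
    """Accept a multicast datagram only if it arrived on the interface
    that this router would use to reach the datagram's source."""
    return rpf_interface(unicast_routes, source) == arrival_iface


# Hypothetical router: the source subnet is reached via eth0, while
# everything else is reached via the ATM interface atm0.
routes = {"192.0.2.0/24": "eth0", "0.0.0.0/0": "atm0"}
print(rpf_accept(routes, "192.0.2.7", "eth0"))   # conventional IDMR path
print(rpf_accept(routes, "192.0.2.7", "atm0"))   # arrival over a shortcut VC
```

   In the second case the datagram arrives over a cut-through
   connection rather than on the reverse-path interface, so a plain RPF
   check would discard it. This is why cut-through traffic has to be
   treated separately from regular IDMR traffic.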


3. IMSS Overview

   IMSS organizes IP multicast routers into logical groups, where each
   group corresponds to some class D IP address and contains routers
   that have members of this IP multicast group or senders to it in
   their domain. These groups are termed "D-groups".  D-groups will be
   further discussed in Section 4.2. The resolution and management of
   these multicast router groups is performed through the CONGRESS
   services described later in Section 3.2.

3.1 Network Model

   In this memo, the physical layer is assumed to consist of
   different interconnected Data Link subnetworks: ATM, Ethernet,
   Switched Ethernet, Token Ring, etc. IMSS facilitates IP multicast
   data transfer over a large-scale Non-Broadcast Multi-Access (NBMA)
   network. We assume that ATM is the underlying NBMA network. We call a
   single ATM Data Link subnetwork an ATM cloud. For administrative and
   policy reasons a single ATM cloud may be partitioned into several,
   disjoint logical ATM clouds, so that the direct connectivity is
   allowed only within the same logical cloud. Hereafter, unless
   otherwise specified, we use the term ATM cloud to mean logical ATM
   cloud.

   We assume that the network layer is IP. The topology of the IP
   network consists of hosts (that may be either ATM based or non-ATM
   based) and IP routers. IP multicast traffic (which is our focus) is
   routed using IP multicast routers running some (potentially
   different) IDMR protocols.

   The internals of IP implementation may vary from one IP subnetwork to
   another. The differences are due to the usage of different Data Link
   layers. If the underlying network is ATM, then the IP subnetwork's
   implementation can be based either on LAN Emulation, or Classical IP
   and ARP over ATM (RFC1577) [8] standards.

   We differentiate between two types of IP multicast routers: a)
   routers that run only an IDMR protocol and b) those that run both an
   IDMR protocol and the IP-SENATE protocol. We refer to the latter
   routers as "border routers". A border router connects either an ATM
   based LIS or a conventional IP subnetwork to an ATM cloud.

   An important assumption is that only one IDMR protocol is allowed
   _inside_ (including the border routers) the same logical ATM cloud.
   Having multiple IDMR protocols in the same logical ATM cloud
   considerably complicates the task of avoiding the datagram
   duplication that may occur, as explained in Section 2. If multiple
   IDMR protocols need to be deployed in an ATM cloud, then each of the
   respective multicast routing domains will constitute a distinct
   logical ATM cloud.

   It should be noted that we use the term border router in a slightly
   different manner than usual. Namely, if upon receiving an IP
   multicast datagram via an IDMR protocol, a border router for some
   reason cannot forward it using a cut-through connection, it may use
   an IDMR protocol for the next hop forwarding. As one may note, the
   border router behaves just as a regular router in this case. For this
   reason, we will sometimes refer to a border router simply as an
   "IP-SENATE router", to stress the fact that it may make either IDMR
   routing decisions or IP-SENATE routing decisions at any given time
   w.r.t. the same network interface.

   Depending on the direction of the IP multicast traffic, a border
   router may be called "ingress router" (if the traffic is directed to
   the IP subnetwork), or "egress router" (if the traffic is directed
   outside the IP subnetwork).

   All IDMR protocols make use of multicast distribution trees over
   which IP multicast datagrams are propagated. Multicast routers that
   comprise a specific tree receive datagrams from the upstream routers
   and forward them to the downstream routers.

   For the sake of simplicity, we assume that each border router has
   only one ATM interface that participates in the IP-SENATE protocol.

3.2 CONGRESS

   CONGRESS is a native ATM protocol that provides multicast group
   address (name) resolution and dynamic membership monitoring services
   to higher-level applications. Multicast group names are
   application-defined character strings. CONGRESS does not deal with
   actual data transmission. The address resolution services provided by
   CONGRESS are used by applications in order to open and maintain
   native ATM connections for data transmission. Although CONGRESS is
   much more than just an auxiliary service for IP-SENATE, in this
   document we concentrate only on those CONGRESS features that are
   relevant to IP-SENATE (the interested reader is advised to read the
   full version of the CONGRESS protocol presented in [14]). From the
   CONGRESS perspective, IP-SENATE is one of the applications that
   utilize its services.

3.2.1 CONGRESS' API

   We refer to a client that uses CONGRESS services by a generic term
   end-point (in the context of this document, an end-point is always an
   IP-SENATE router). An end-point may become a group member by joining
   a group or cease its membership by leaving a group. Each join or
   leave request of an end-point leads to the generation of an
   Incremental Membership Notification w.r.t. a specific group. An
   incremental membership notification reflects only the difference
   between the new membership and the previously reported one. The full
   membership of a group may be constructed by resolving a group name
   once upon joining and then applying the incremental membership
   notifications as they arrive. Incremental membership notifications
   may also be triggered by various asynchronous network events, e.g.,
   host or communication link crash/recovery.

   The CONGRESS services are provided by a library that includes the
   following basic functions:

      o join(G, id, id_len): Makes the invoking end-point a
        registered member of a multicast group G. id is the identifier
        of the new member (a pointer to some application-specific
        structure). id_len is the size of this application specific
        structure.

      o leave(G, id, id_len): Unregisters the invoking end-point
        from G.

      o resolve(G): A multicast group name G is resolved into
        a set of ATM end-point identifiers. This set includes all
        the end-points that joined G and have not disconnected due to a
        network failure or a host crash.

      o set_flag(G, imn_flag): Enables or disables the reception
        of incremental membership notifications w.r.t. G by the
        invoking end-point.
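
   The library functions listed above can be illustrated with a minimal
   in-memory sketch. It is not the CONGRESS implementation: the class
   name ToyCongress, the callback argument, the explicit member_id
   argument of set_flag, and the ("join"/"leave", id) tuple that stands
   for an incremental membership notification are all assumptions made
   for illustration, and a single process stands in for the whole server
   hierarchy. The sketch shows how the full membership can be built from
   one resolve plus the notifications that follow.

```python
from collections import defaultdict


class ToyCongress:
    """Single-process stand-in for the CONGRESS library (illustrative only)."""

    def __init__(self):
        self.members = defaultdict(set)   # group name -> member identifiers
        self.callbacks = {}               # (group, id) -> notification callback
        self.enabled = defaultdict(set)   # group -> ids with notifications on

    def join(self, group, member_id, callback=None):
        self.members[group].add(member_id)
        self._notify(group, ("join", member_id), skip=member_id)
        if callback is not None:
            # Hypothetical: a callback models delivery of notifications.
            self.callbacks[(group, member_id)] = callback
            self.enabled[group].add(member_id)

    def leave(self, group, member_id):
        self.members[group].discard(member_id)
        self.callbacks.pop((group, member_id), None)
        self.enabled[group].discard(member_id)
        self._notify(group, ("leave", member_id), skip=member_id)

    def resolve(self, group):
        return set(self.members[group])

    def set_flag(self, group, member_id, imn_flag):
        if imn_flag and (group, member_id) in self.callbacks:
            self.enabled[group].add(member_id)
        else:
            self.enabled[group].discard(member_id)

    def _notify(self, group, notification, skip):
        for member_id in sorted(self.enabled[group]):
            if member_id != skip:
                self.callbacks[(group, member_id)](notification)


def apply_notification(view, notification):
    """Apply one incremental notification to a locally kept membership view."""
    kind, member_id = notification
    if kind == "join":
        view.add(member_id)
    else:
        view.discard(member_id)


congress = ToyCongress()
group = "224.1.2.3"               # a class D address used as the group name
view = set()
congress.join(group, "router-A", lambda n: apply_notification(view, n))
view |= congress.resolve(group)   # resolve once upon joining
congress.join(group, "router-B")
congress.join(group, "router-C")
congress.leave(group, "router-B")
```

   After these calls the local view of router-A matches what a fresh
   resolve would return, without router-A having issued a second
   resolve request.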


   In the context of this memo, a multicast group is always a D-group.

3.3 IP-SENATE

   An IP-SENATE extension at a multicast router uses the group
   membership information that it receives from CONGRESS in order to
   open ATM connections that bypass the IP routing mechanism. Since the
   number of multicast routers is considerably lower than the overall
   number of the ATM-based destinations (both hosts and multicast
   routers), IP-SENATE reduces the number of potential short-cut
   connections compared with straightforward host-to-host cut-through
   routing. It may still be the case, however, that the number of
   multicast routers participating in a mesh of ptmpt connections is
   very large. Using the address resolution services of CONGRESS, IP-
   SENATE can support both hierarchies of multicast servers and meshes
   of ptmpt connections, and can switch back and forth between these two
   layouts as required. This will be described in Subsection 6.2.3.

   In order to avoid stable routing loops, an IP-SENATE router never
   routes IP multicast datagrams using cut-through connections if they
   were received from another IP-SENATE router. In addition, an RPF-like
   mechanism is deployed by IP-SENATE in order to prevent the extensive
   duplication of IP multicast datagrams. Such duplication may result
   from multiple IP-SENATE routers setting up multiple cut-through
   connections to the same destinations (see Figure 2).


   We assume that IP-SENATE will be used along with conventional IDMR
   protocols and that not all of the multicast routers within an ATM
   cloud will run IP-SENATE. As was explained in Section 2, this
   deployment mode may lead to unnecessary datagram duplication when a
   datagram is propagated over some multicast distribution tree and,
   simultaneously, over a cut-through connection.

   IP-SENATE provides a pruning mechanism that cuts the branches of an
   IDMR multicast distribution tree so that an IP-SENATE multicast
   router that receives datagrams via a cut-through connection does not
   receive duplicates via IDMR.

4. Architecture

4.1 CONGRESS Architecture

   CONGRESS services are provided by a set of servers. There are two
   kinds of CONGRESS servers: Local Membership Servers (LMSs) and Global
   Membership Servers (GMSs). An LMS resides on the same host as a
   multicast router and constitutes this router's interface to the
   CONGRESS services. GMSs are organized in a hierarchical structure
   throughout the network, and may run either on dedicated machines or
   in switches. Logically, the location of an LMS is independent of the
   router's host.

   CONGRESS views the network as a hierarchy of domains, where each
   domain is serviced by a CONGRESS server (the CONGRESS hierarchy can
   be readily mapped onto a peer group hierarchy provided by the native
   ATM network layer, PNNI). Note that there is no relationship between
   a CONGRESS domain and a LIS. At the lowest level, a domain consists
   of a single multicast router. Such a domain is called a "host domain"
   and is serviced by the LMS of the router's host. The LMS is called
   the "representative" of a host domain. Higher level domains consist
   of a set of lower level domain representatives. Thus, a single GMS
   may serve a domain that consists of either several LMSs, or several
   GMSs that are representatives of their respective lower level
   domains.

   A CONGRESS `domain identifier' is the longest common address prefix
   of the domains it is built of. The domain identifier of a host domain
   is the ATM address of the host itself. Figure 3 illustrates the
   CONGRESS domain layout.

   Note that there is no relation whatsoever between the addresses in
   the figure below and IP addresses. The IP-like addresses were chosen
   to illustrate the idea of the hierarchy in the simplest way.

========================================================================

             GMS --------------------------------- GMS
             1.1                                   1.7
             /  \                                  /  \
            /    \                                /    \
           /      \                              /      \
          /        \                            /        \
         /          \                          /          \
        /            \                        /            \
       /              \                      /              \
      /                \                    /                \
     GMS ------------ GMS                 GMS --------------- GMS
    1.1.1             1.1.2              1.7.4               1.7.2
    /  \              /  \                /  \                /  \
   /    \            /    \              /    \              /    \
 LMS     LMS       LMS    LMS          LMS    LMS          LMS    LMS
1.1.1.2 1.1.1.5  1.1.2.1  1.1.2.3   1.7.4.8   1.7.4.9   1.7.2.1  1.7.2.6

Figure 3.

========================================================================
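
   The rule that a domain identifier is the longest common address
   prefix of its constituent domains can be checked against the
   illustrative addresses of Figure 3. The sketch below is only an
   assumption about how such a prefix could be computed: it compares
   dot-separated components, whereas real ATM addresses are not dotted
   strings, and the function name domain_identifier is hypothetical.

```python
def domain_identifier(addresses):
    """Longest common prefix, component by component, of dotted addresses."""
    split = [address.split(".") for address in addresses]
    prefix = []
    for parts in zip(*split):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return ".".join(prefix)


# Two host domains of Figure 3 grouped under one GMS.
lms_level = domain_identifier(["1.1.1.2", "1.1.1.5"])
# Two GMS domains of Figure 3 grouped under a higher-level GMS.
gms_level = domain_identifier(["1.1.1", "1.1.2"])
# A host domain: the identifier is the address of the host itself.
host_level = domain_identifier(["1.7.4.8"])
```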




   In order to avoid flooding the whole network upon every membership
   change occurring in every D-group, membership notifications
   pertaining to a D-group are propagated using a distributed spanning
   tree for this group. This spanning tree is a sub-tree of the CONGRESS
   server hierarchy. The CONGRESS servers comprising the sub-tree
   corresponding to a D-group are the servers that have multicast
   routers from this group in their domains. Each server in the CONGRESS
   hierarchy maintains only a part of the spanning tree that consists of
   its immediate neighbours. The spanning tree is constructed and
   maintained according to the multicast routers' join/leave requests
   issued through their LMSs. In addition, asynchronous network events
   such as crashes/recoveries of end-points (multicast routers) or
   CONGRESS servers, and/or failures of communication links, change the
   topology of the spanning tree (such events are detected by a
   best-effort fault detector module). Since CONGRESS operates in an
   asynchronous environment, the spanning tree of a group can only be a
   best-effort approximation.

4.2 IP-SENATE Architecture

   In Figure 4, the architecture of an IP-SENATE router is presented. An
   IP-SENATE router is, by definition, a border router that connects a
   cut-through routing domain to some IDMR routing domain(s). As shown
   in the figure, IP-SENATE extends a multicast router's software. The
   D-groups of IP-SENATE are managed through CONGRESS. We employ an LMS
   at each IP-SENATE router in order to provide the interface to the
   CONGRESS services. In order to make routing decisions and to open
   cut-through connections, IP-SENATE communicates with the CONGRESS
   protocol, which supplies group address resolution and maintenance
   services.

   ==================================================================

     |----\  /---------|----------|   |-------|----------\  /-----|
     | IP  \/ IP-SENATE|  IDMR    |   | IDMR  | IP-SENATE \/  IP  |
     |      |____ _____|__________|   |_______|____________|      |
     |      | ^   ^         ^     |   |     ^        ^   ^ |      |
     |      | | |---|  |--------| |   | |--------| |---| | |      |
     |      | | |CGS|  |RFC+MARS| |   | |RFC+MARS| |CGS| | |      |
     |      | | |if |  |1577 if | |   | |1577 if | |if | | |      |
     | |----| | |---|  |--------| |   | |--------| |---| | | |----|
     | |IDMR| v   v         v     |   |     v        v   v | |IDMR|
     |-|----|---------------------|   |------------------- |-|----|
     |MAC   | ATM   ______________|   |___________    ATM  |      |
     |      |       |signalling   |   |signalling|         |      |
     |------|---------------------|   |------------------- |------|
     |phy.  | phy. layer          |   | phy. layer         |phy.  |
     |layer |                     |   |                    |layer |
     |------|---------------------|   |------------------- |------|
         |                    |          |              |
    ... ==                    ============              ===== ...




   CGS if  - CONGRESS interface
   MARS if - MARS interface

   Figure 4.

   ==================================================================

   In the classical IP multicast model [19], a host does not have to
   become a registered member of a multicast group in order to send
   datagrams to this group. A sender does not see any difference between
   sending a datagram to a multicast IP address and sending one to a
   unicast IP address. The difference lies in the multicast router,
   which has to participate in some IDMR protocol that builds a
   multicast propagation tree. In this model, a multicast router usually
   needs to know only about its immediate neighbours that belong to the
   propagation tree, and not about the whole tree (an exception to this
   is MOSPF [11]).

   IP-SENATE provides the hosts with the same interface for IP multicast
   service as in the classical model. A border IP-SENATE router that
   forwards IP multicast datagrams from a particular source residing in
   a non-ATM cloud into the ATM cloud, or from an ATM-based host
   residing in the router's LIS, is termed an injector for the
   corresponding <source, group> pair. (Note that the same router may
   function as an injector for multiple <source, group> pairs.)

   Injectors for any specific class D address must know the identifiers
   of all other IP-SENATE routers that must receive the traffic targeted
   to this class D address. For any <source, group> pair, shortcut
   connections should be opened by the corresponding injectors to these
   IP-SENATE routers. Ideally, only a single injector should be active
   w.r.t. any source in order to avoid datagram duplication. The set of
   IP-SENATE router identifiers that has to be maintained per IP class
   D address includes the identifiers of the IP-SENATE routers that
   have either

      o directly connected hosts that registered (e.g., using IGMP)
        to receive IP multicast traffic pertaining to a specific class D
        address, or

      o some downstream multicast routers (w.r.t. some source) that have
        receivers in their LISs.


   This set of IP-SENATE routers is termed a D-group. In order to obtain
   the membership of a D-group, an IP-SENATE router joins this group via
   CONGRESS. The name associated with this multicast group is just a
   class D address interpreted as a character string. The details of how
   D-groups are formed and managed are provided in Subsection 6.2.



   It may seem that an IP-SENATE router that does not have any
   downstream receivers (neither routers nor hosts) w.r.t. any source
   does not need to be a member of a D-group, because it does not need
   to receive any traffic. Such a router could use the CONGRESS resolve
   operation each time it needs to learn the membership of the
   corresponding D-group (for example, when it needs to send a datagram
   originated by a sender in its domain). In this scheme, however,
   CONGRESS would be heavily used and unnecessary overhead would be
   imposed on the network. In our approach, an IP-SENATE router joins
   the relevant D-group even if it does not have to receive the
   multicast traffic. In this case, it will receive incremental
   membership notifications concerning the D-group. This scheme is less
   costly. In order to prevent such a router from being added as a leaf
   to the cut-through connections within the D-group, special
   sub-identifiers are added to the IP-SENATE router's identifier. This
   is explained in Subsection 6.1.

   In order to overcome the previously mentioned NIC limitations on the
   number of simultaneously open connections, some IP-SENATE routers
   may act as multicast servers, serving other IP-SENATE routers that
   are termed clients.

   It is important to stress that an IP-SENATE router acting as a server
   in one D-group may act as a client in another one. Moreover, as will
   be explained in Subsection 6.2.3, the operational roles of the IP-
   SENATE routers may dynamically change within the same D-group.

   It is important to understand that maintaining a distinct multicast
   group simultaneously for every possible IP class D address is
   technically infeasible. Fortunately, there is no real need to do
   this, because only a part of these addresses is actually in use at
   any given time. It is also unlikely that the same multicast router
   would belong to ALL the D-groups. In IP-SENATE's approach, membership
   of D-groups is formed on-demand using CONGRESS, as will be explained
   in Subsection 6.2.

   Another very important property of the IP-SENATE solution is that
   IP-SENATE can tear down the cut-through connections among the members
   of a D-group when no multicast data is transmitted over these
   connections for a sufficiently long period of time. The cut-through
   connections may be resumed later on-demand, using CONGRESS to obtain
   updated membership information. Note that when an IP-SENATE router
   terminates the inactive connections within a D-group, this does not
   affect CONGRESS, which may continue to monitor the membership of the
   group "in the background". Thus, when the cut-through connections
   need to be resumed, the membership information is instantly
   available.




   For a variety of reasons that were explained in Section 2, IP-SENATE
   may have to co-exist with some IDMR protocol in the same ATM cloud.
   This implies that an IP-SENATE router may receive IP multicast
   datagrams both via an IDMR protocol and the cut-through connections
   on the same network interface. For the correct operation of the
   IP-SENATE protocol, it is necessary to differentiate between these
   two cases. One way to do this is to use the protocol field of the IP
   datagram header. The IP-SENATE protocol should be assigned a unique
   protocol number. Each time an IP-SENATE router forwards a datagram
   over a cut-through connection, the original protocol number is
   extracted and appended to the end of the datagram. The IP-SENATE
   protocol number is inserted into the protocol field, and all other
   relevant fields of the IP datagram header (total length, header
   checksum, etc.) are updated appropriately. The reverse operations
   are performed by the IP-SENATE routers on the other side of the
   cut-through connections. A more detailed description of this
   encapsulation technique is to be provided.
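
   Pending that detailed description, the steps named above can be
   sketched as follows for an IPv4 header. This is only one possible
   reading of the text: the value 253 is an assumption (an experimental
   protocol number, not a number assigned to IP-SENATE), the original
   protocol number is assumed to occupy a single trailing octet, and the
   function names are hypothetical. The buffer is assumed to hold
   exactly one datagram with no link-layer padding.

```python
import struct

IP_SENATE_PROTO = 253  # assumption: experimental value, not an assigned number


def checksum(header):
    """Internet checksum of an IPv4 header given as bytes."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def _rewrite_header(datagram, header_len, proto, total_len):
    header = bytearray(datagram[:header_len])
    struct.pack_into("!H", header, 2, total_len)   # total length field
    header[9] = proto                              # protocol field
    struct.pack_into("!H", header, 10, 0)          # zero checksum before summing
    struct.pack_into("!H", header, 10, checksum(bytes(header)))
    return bytes(header)


def encapsulate(datagram):
    """Append the original protocol number and mark the datagram as IP-SENATE."""
    header_len = (datagram[0] & 0x0F) * 4
    trailer = bytes([datagram[9]])
    header = _rewrite_header(datagram, header_len, IP_SENATE_PROTO,
                             len(datagram) + 1)
    return header + datagram[header_len:] + trailer


def decapsulate(datagram):
    """Reverse operation performed at the far end of a cut-through connection."""
    header_len = (datagram[0] & 0x0F) * 4
    original_proto = datagram[-1]
    header = _rewrite_header(datagram, header_len, original_proto,
                             len(datagram) - 1)
    return header + datagram[header_len:-1]


def sample_datagram():
    """A 20-octet header (protocol 17, UDP) followed by 8 octets of payload."""
    header = bytearray(struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 16, 17, 0,
        bytes([10, 0, 0, 1]), bytes([224, 1, 2, 3])))
    struct.pack_into("!H", header, 10, checksum(bytes(header)))
    return bytes(header) + b"payload!"


original = sample_datagram()
wrapped = encapsulate(original)
restored = decapsulate(wrapped)
```

   A receiving router can tell the two arrival paths apart by testing
   the protocol field of the incoming datagram against the IP-SENATE
   value before deciding whether to call the reverse operation.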

4.3 IMSS Architecture

   In Figure 5 the architecture of IMSS is summarized. IMSS does not
   change a MARS server's functionality. An IP-SENATE router interacts
   with the MARS server in order to carry out IP multicast transmission
   within the LIS. An LMS serves as a CONGRESS front-end to the IP-
   SENATE router. An IP-SENATE router communicates with an LMS in order
   to handle the membership of the relevant D-groups. An LMS
   communicates with the GMS as was explained in Section 4.1. In
   Figure 5, an LMS is shown to run on the same machine as the IP-SENATE
   router. This layout is the most reliable, since the LMS monitors the
   IP-SENATE router's liveness using IPC tools. It is possible, however,
   to run an LMS on a different machine.

    ===================================================================

                      --------------------------------
                      |                              |
                 |    |                              |
    ---------    |    | -------------        ------- |         -------
    | MARS  | <--|--> | | IP-SENATE | <----> | LMS | | <-----> | GMS |
    | Server|    |    | | Router    |        ------- |         -------
    ---------    |    | -------------                |
                      |                              |
              LIS     --------------------------------
              border

    Figure 5.

    ===================================================================



5. CONGRESS Protocol

5.1 Data Structures

   In this subsection we summarize the main data structures used by both
   LMS and GMS types of CONGRESS servers.

   Each LMS maintains a Local Membership List. This list contains the
   D-group addresses that the multicast router local to the LMS has
   joined through CONGRESS.

   In order to avoid constant flooding of the network with excess
   messages, the GMSs maintain for each D-group G a distributed CONGRESS
   "group control tree", T(G), that is a sub-tree of the CONGRESS
   hierarchy tree. The vertices of T(G) are the LMSs and GMSs that have
   members of G in their respective domains (the LMSs are the leaves of
   T(G)). All CONGRESS protocol messages concerning G are confined to
   T(G).


   Each GMS maintains only a local part of T(G) for each D-group G in a
   vector GT(G). GT(G) holds an entry for each neighbour (i.e., parent,
   sibling or child) of the GMS in T(G). The value of an entry in this
   vector can be either `resolve' or `all'. In the case of `resolve',
   only `resolve' requests are forwarded to the corresponding neighbour
   (because no member of G in its domain has set the on-line flag). A
   value of `all' means that all CONGRESS protocol messages concerning G
   should be forwarded to that neighbour.

   When a GMS first creates a vector for a group, all its entries are
   initialized to `all' for each of the GMS's neighbours.

   Each GMS also keeps track of the liveness of its neighbours through
   updates supplied by its fault-detector module.

5.2 IMSS Router Joining/Leaving a D-group

   When an IMSS router wishes to join a D-group G, it issues a `join'
   request to its LMS, L, using some local IPC mechanism. Next, L
   informs its GMS about the new member of G by forwarding it a `join'
   message.

   The `join' message must be propagated to all members of G that have
   requested incremental membership notifications. As will be explained
   later, a multicast router that acts as a client of a multicast server
   does not require constant reception of incremental membership
   notifications.




   When a `join' message travels in the CONGRESS hierarchy, GMSs can
   learn about the new member of G and update their GT(G) accordingly in
   order to ensure the correct operation of the future `resolve'
   operations.

   If a GMS receives the `join' notification from one of its children C,
   and GT(G) does not exist (i.e., the new member of G is the first one
   in the GMS's domain), then the GMS initializes it, and forwards this
   message to all its live siblings and the parent.

   If GT(G) exists, the GMS sets GT(G,C) to `all' and forwards the
   notification to all its live siblings and the parent that have `all'
   in their corresponding entries of GT(G). As a special case, upon the
   reception of the join notification directly from an LMS, a GMS
   forwards it also to all of its children (i.e. LMSs) that are alive
   and have `all' in their corresponding entries of GT(G).

   If a `join' notification w.r.t. G was received by a GMS from its
   parent or a sibling, X, and GT(G) does not exist, the notification is
   ignored. Otherwise, the entry GT(G,X) is set to `all' and the GMS
   forwards the notification to all its live children that have `all' in
   the corresponding entries of GT(G).

   Upon the reception of the notification about a new router joining G
   from its GMS, an LMS delivers a corresponding incremental membership
   notification to the local IMSS router.
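
   The forwarding rules that a GMS applies to a `join' notification can
   be summarized in a short sketch. It is a simplified reading of the
   rules above rather than the protocol implementation: the class name
   Gms, the representation of GT(G) as a dictionary, and the neighbour
   names (taken from the layout of Figure 3) are assumptions. The
   handler returns the list of neighbours to which the notification
   would be forwarded instead of sending any message.

```python
ALL, RESOLVE = "all", "resolve"


class Gms:
    def __init__(self, parent, siblings, children, lms_children=()):
        self.upward = set(siblings) | ({parent} if parent is not None else set())
        self.children = set(children)
        self.lms_children = set(lms_children)   # children that are LMSs
        self.alive = self.upward | self.children
        self.gt = {}                             # G -> {neighbour: ALL or RESOLVE}

    def _targets(self, group, candidates, exclude):
        entries = self.gt[group]
        return sorted(n for n in candidates
                      if n != exclude and n in self.alive
                      and entries.get(n) == ALL)

    def on_join(self, group, sender):
        """Return the neighbours that receive the forwarded `join'."""
        if sender in self.children:
            if group not in self.gt:
                # First member of G in this domain: every entry starts as `all'.
                self.gt[group] = {n: ALL for n in self.upward | self.children}
                return self._targets(group, self.upward, sender)
            self.gt[group][sender] = ALL
            targets = self._targets(group, self.upward, sender)
            if sender in self.lms_children:
                # Special case: also inform the other local LMSs.
                targets += self._targets(group, self.children, sender)
            return targets
        if group not in self.gt:
            return []                            # no members of G here: ignore
        self.gt[group][sender] = ALL
        return self._targets(group, self.children, sender)


lms = {"LMS-1.1.1.2", "LMS-1.1.1.5"}
gms = Gms(parent="GMS-1.1", siblings={"GMS-1.1.2"},
          children=lms, lms_children=lms)
first = gms.on_join("224.1.2.3", "LMS-1.1.1.2")
second = gms.on_join("224.1.2.3", "LMS-1.1.1.5")
third = gms.on_join("224.1.2.3", "GMS-1.1.2")
ignored = gms.on_join("224.9.9.9", "GMS-1.1")
```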

   In order to maintain T(G) accurately, GMSs should prune from their
   GT(G) entries all the neighbours that do not have members of G in
   their respective domains. This keeps the message overhead linear in
   the size of G. Immediately after a new router registers in a new
   D-group G, the local LMS issues a `resolve' request w.r.t. G on its
   behalf. This request is handled as described in Subsection 5.4. The
   CONGRESS servers that reply with an empty list of members (routers)
   are removed from GT(G) by the GMSs throughout the hierarchy. Note
   that if a parent GMS replies with an empty list to its child (in the
   CONGRESS hierarchy), the child does not remove the corresponding
   entry of the parent from its GT(G).

   An IMSS router leaves a D-group by issuing a `leave' request to its
   LMS. The propagation of the `leave' notification corresponding to
   this request is exactly the same as that of the `join' notification
   described above. In addition, if a GMS S discovers that there are no
   more members of a group G in its domain, it deletes the GT(G) vector
   from its GT. After that, S informs all the neighbours to which it
   forwarded the corresponding `leave' notification that they should
   remove the GT(G, S) entry from their GTs (the set of these
   neighbours does not include neighbouring LMSs). Note that an LMS
   knows that a group should be deleted by directly monitoring the
   membership of its local IMSS router. A GMS knows that a group should
   be deleted when all of its children have reported that there are no
   more members of the group G in their domains.

5.3 Reception of Incremental Membership Notifications

   Whenever an IMSS router wishes to start or stop receiving incremental
   membership notifications w.r.t. a D-group G of which it is a member,
   all the GMSs that have members of G in their domains must know this.
   This is necessary for accurate propagation of future membership
   changes of G occurring in their domains. However, a GMS need not
   receive a notification of this request if the requesting router is
   not the first inside (or outside) the GMS's domain to request
   incremental membership notifications. The same is true if the router
   is not the last inside (or outside) the GMS's domain to request to
   stop receiving incremental membership notifications. An IMSS router
   may wish to stop the reception of incremental membership
   notifications if it decides to operate in a `client' role, as will be
   explained in Section 6.2.3.

   Let G' be the set of members of G that requested to receive
   incremental membership notifications. When an IMSS router R desires
   to receive incremental membership notifications w.r.t. a D-group G,
   it issues a `set_flag' request with the `online_flag' parameter set
   to TRUE to its LMS. The LMS forwards the `set_flag' request message m
   to its GMS. Similarly, when R desires to stop receiving incremental
   membership notifications, it issues a `set_flag' request with the
   `online_flag' parameter set to FALSE to its LMS. When a GMS receives
   m from a neighbour, it sets the entry of this neighbour in GT(G) to
   `all' if `online_flag' is TRUE, and to `resolve' otherwise. If R is
   the first member of G' in the GMS's domain or G' has no more members
   in this domain, m is forwarded to all the siblings that are listed in
   GT(G), and to the parent. If R is the first member of G' outside the
   GMS's domain or G' has no more members outside this domain, then m is
   forwarded to all the children of the GMS that are listed in GT(G).

   It should be noted that each CONGRESS server always marks its parent
   as `all'.

5.4 Resolution of D-Group Address

   An IMSS router that is a member of a D-group G can resolve G's name
   into a list of the live registered members by issuing an appropriate
   `resolve' request to its LMS. The LMS then generates an appropriate
   message m from the request, and forwards m to its GMS.

   When a GMS receives m from one of its children, it forwards m to all
   the live siblings and the parent that are listed in GT(G). As a
   special case, if m was received from an LMS, the GMS also forwards m
   to the live LMSs that are listed in GT(G). If m was received by the
   GMS from either its parent or a sibling, it forwards it to all the
   live children that are listed in GT(G). The GMS then collects the
   responses to m until all the neighbours have responded or have become
   disconnected. Then the GMS sends the aggregated response to the
   neighbour from which the request was received.

   When an LMS receives a `resolve' request message m w.r.t. G, it
   responds with the address of the local router. If the local router
   is not a member of G, the LMS responds with an `empty' message.

   This way, the `resolve' request is propagated to the relevant LMSs
   that are the leaves of T(G). The responses of these LMSs are
   aggregated by the GMSs, the intermediate nodes of T(G). The final
   response is received from its GMS by the LMS that originated the
   `resolve' request, and is delivered to the requesting IMSS router.
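
   The downward half of this procedure (a request arriving from a
   parent or a sibling) can be sketched as a recursive aggregation. The
   sketch makes several assumptions: a synchronous method call stands
   in for the exchange of request and reply messages, GT(G) is reduced
   to the set of children listed for G, and the class names Lms and
   GmsNode are hypothetical. Forwarding towards siblings and the parent
   is omitted.

```python
class Lms:
    def __init__(self, router_address, groups):
        self.router_address = router_address
        self.groups = set(groups)

    def resolve(self, group):
        # Reply with the local router's address, or with an `empty' reply.
        return {self.router_address} if group in self.groups else set()


class GmsNode:
    def __init__(self, children, alive=None):
        self.children = dict(children)     # name -> Lms or GmsNode
        self.alive = set(self.children) if alive is None else set(alive)
        self.gt = {}                       # G -> names of children listed for G

    def resolve(self, group):
        members = set()
        for name in sorted(self.gt.get(group, ())):
            if name in self.alive:         # no reply is awaited from dead ones
                members |= self.children[name].resolve(group)
        return members


gms_111 = GmsNode({"1.1.1.2": Lms("1.1.1.2", {"G"}),
                   "1.1.1.5": Lms("1.1.1.5", set())})
gms_111.gt["G"] = {"1.1.1.2", "1.1.1.5"}

# The LMS 1.1.2.3 is a member of G but is currently disconnected.
gms_112 = GmsNode({"1.1.2.1": Lms("1.1.2.1", {"G"}),
                   "1.1.2.3": Lms("1.1.2.3", {"G"})},
                  alive={"1.1.2.1"})
gms_112.gt["G"] = {"1.1.2.1", "1.1.2.3"}

top = GmsNode({"1.1.1": gms_111, "1.1.2": gms_112})
top.gt["G"] = {"1.1.1", "1.1.2"}
result = top.resolve("G")
```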

5.5 Handling of Failures

   The CONGRESS handling of failures focuses on asynchronous host
   crashes/recoveries and communication link failures/recoveries. In
   order to handle these failures, each CONGRESS server interacts with a
   local "fault detector" module that monitors the liveness of this
   CONGRESS server's neighbours. All the messages that are sent or
   received by a CONGRESS server pass through the fault detector first.
   Thus, a message received from a CONGRESS server is interpreted by the
   fault detector as evidence of the sender's liveness. If a server's
   neighbour was suspected by the fault detector of this server, and
   later a message from the presumably failed neighbour is received, the
   fault detector delivers the notification about the neighbour's
   liveness before the delivery of its message.

5.5.1 IMSS Router Failure

   When an IMSS router fails, the local LMS discovers this using
   internal IPC mechanisms. This event is handled by the LMS as if the
   failed router had issued a `leave' request w.r.t. all the D-groups
   that it was a member of.

5.5.2 Domain Failure

   When a CONGRESS server disconnects from the rest of the hierarchy due
   to a communication link failure or a host crash, this event is
   interpreted by its neighbours as if all the IMSS routers that reside
   in its domain have left their respective D-groups. Instead of sending
   multiple `leave' notifications, each GMS that detects a failure of a
   neighbouring CONGRESS server generates a `domain leave' notification
   message that contains the domain identifier of the failed domain and
   a list of all the D-groups that had members in this domain. The
   latter is obtained from the local GT table. The `domain leave'
   notification is propagated and processed throughout the CONGRESS
   hierarchy in the same way as a `join'/`leave' notification. An IMSS
   router outside the failed domain can compute the new membership of a
   D-group from the `domain leave' notification by discarding all the
   IMSS routers that have the same address prefix as the failed domain
   identifier. Similarly, an IMSS router within the failed domain
   discards all the IMSS routers whose address prefix differs from that
   of the failed domain.
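
   The membership computation performed on a `domain leave' notification
   can be sketched as a prefix filter. The dotted addresses follow the
   illustrative notation of Figure 3 and are compared component by
   component, which is an assumption of this sketch; the function names
   are hypothetical.

```python
def in_domain(router_address, domain_id):
    """True if the dotted address lies under the dotted domain identifier."""
    address, domain = router_address.split("."), domain_id.split(".")
    return address[:len(domain)] == domain


def apply_domain_leave(membership, failed_domain, local_address):
    """New view of a D-group after a `domain leave' for failed_domain."""
    if in_domain(local_address, failed_domain):
        # Inside the failed domain: keep only routers sharing its prefix.
        return {r for r in membership if in_domain(r, failed_domain)}
    # Outside: discard every router that has the failed domain's prefix.
    return {r for r in membership if not in_domain(r, failed_domain)}


membership = {"1.1.1.2", "1.1.2.1", "1.7.4.8", "1.7.4.9"}
outside_view = apply_domain_leave(membership, "1.7", "1.1.1.2")
inside_view = apply_domain_leave(membership, "1.7", "1.7.4.8")
```

   Both sides of the partition thus converge on disjoint views of the
   same D-group without any per-router `leave' notification.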

5.5.3 Domain Recovery

   A GMS and its respective domain are considered recovered whenever the
   GMS re-connects or re-starts execution. For each D-group G in the
   recovered domain, group membership information must be updated
   throughout the re-merged T(G). A recovered GMS initializes its data
   structures from scratch as described in Section 5.1.

   When a GMS detects (through the fault detector) a recovery of one of
   its siblings in the CONGRESS hierarchy, it resolves all the D-groups
   that are present in its GT by issuing `resolve' requests to its
   children. The aggregated replies of these `resolve' requests are sent
   as ordinary `join' notifications to the recovered sibling.

   When a GMS detects a recovery of one of its children, it does not
   perform any actions except marking this server as alive. The
   necessary actions will be initiated by the recovered child as
   described below.

   When a CONGRESS server detects a recovery of its parent, it generates
   `resolve' requests to its children w.r.t. all the D-groups known to
   this CONGRESS server. The aggregated results are sent to the parent
   as special `join' notifications. These notifications are forwarded as
   ordinary `join' notifications, but are also marked with a special
   flag. When such a message w.r.t. a D-group G is received by some GMS
   from its sibling, the GMS resolves the membership of G within its
   domain and sends back the aggregated result as an ordinary `join'
   notification.

6. IP-SENATE Protocol

   In this section we provide a detailed description of the IP-SENATE
   protocol. For the sake of simplicity we divide the protocol into two
   parts: a) D-groups' formation and maintenance, b) datagram forwarding



Anker, Breitgand et. al    Expires July 1998                   [Page 24]


Internet Draft       <draft-anker-congress-00.txt>           1 July 1997


   decisions. The IP-SENATE routers are event-driven. This means that in
   the core of the program, there is a main event-dispatching loop, and
   when a certain event occurs, an appropriate event handler function is
   invoked. After an event has been processed, the control is returned
   to the main loop. It is important to stress that event handling is
   atomic, i.e., a pending event is not handled until the current
   event has been fully processed. For the sake of simplicity, we
   provide all the explanations for a single IP multicast group (i.e., a
   single class D address).
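
   A minimal Python sketch of such a run-to-completion dispatcher is
   given below. The class and method names are hypothetical and serve
   only to illustrate the atomicity property: an event posted while a
   handler is running is queued and is handled only after the current
   handler returns.

```python
from collections import deque


class EventLoop:
    """Run-to-completion dispatcher: one event is fully handled before the next."""

    def __init__(self):
        self.pending = deque()
        self.handlers = {}

    def register(self, kind, handler):
        self.handlers[kind] = handler

    def post(self, kind, payload=None):
        # Events raised while a handler runs are only queued, never run inline.
        self.pending.append((kind, payload))

    def run(self):
        while self.pending:
            kind, payload = self.pending.popleft()
            self.handlers[kind](self, payload)
```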

6.1 Main Data Structures

   This subsection depicts the main data structures used by the IP-
   SENATE routers.

      o RAV[G]: Each IP-SENATE router R maintains a Redundancy Avoidance
        Vector (RAV) for each D-group G with which R is involved. RAV[G]
        has an entry for each source (originator) of the IP multicast
        datagrams that were forwarded to R by other IP-SENATE routers
        (i.e., via short-cut connections).

        We define "remoteness" of an injector to be an estimation of the
        distance of a router from a datagram source. This estimation can
        be based, for instance, on the TTL value of the packet received
        from the source (the higher the value of the TTL field, the
        closer the injector is to the source). Another method for
        measuring the remoteness is piggybacking the routing metrics
        derived from the routing tables of the injector on the packets
        forwarded over the short-cut connections. It should be noted
        that using TTL as a measure for remoteness may cause some
        problems, as will be explained later. We will use a function
        denoted as remoteness(m, R) where m is a datagram received by an
        IP-SENATE router R in order to calculate the remoteness of R
        from the source of m. We use regular mathematical notation to
        compare two remoteness values. The meaning of remoteness(m, R) <
        remoteness(m, R') is that R is closer to m's source than R'.

        The entry RAV[G][S] holds the name of the IP-SENATE router that
        has the minimal remoteness value w.r.t. the source S and is
        forwarding datagrams from S to R through short-cut connections.
        The value of remoteness is kept in the same entry with the
        router's identifier. The information kept in RAV[G] is temporary
        and is refreshed regularly, as will be explained later.

      o eif: expected network interface variable. This variable is
        concerned with the RPF techniques that are used by the IDMR
        protocols in order to break routing loops that may occur in
        multicast distribution trees. When a multicast IP datagram
        arrives at a multicast router, the router checks whether it
        received it from the "expected" network interface. The expected
        network interface for a multicast datagram originated at some
        source S, is the interface that would be used to forward unicast
        datagrams to S by this multicast router. If a multicast datagram
        arrives from an unexpected interface, it is silently discarded,
        because it was not propagated over the optimal branch. Obviously
        for each IP multicast datagram originated at some source S, the
        value of this variable depends on the IDMR routing tables. It is
        important to understand that an actual implementation is not
        required to support eif explicitly.  This variable is used by us
        in order to simplify the presentation of the algorithms.

      o id: identifier of an IP-SENATE router. This is a structure
        containing the following fields:

         - physical address: an ATM address of the IP-SENATE router;

         - operational role: `client' or `server';

         - mode: `sender-only' or `regular'.

      o Membership[G]: group membership table. For each D-group
        of which an IP-SENATE router is a member, there is a row in this
        table. Each item in the row is an id structure, as explained
        above. These memberships are maintained through CONGRESS'
        incremental membership notifications.
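
   For illustration only, the structures above could be laid out as in
   the following Python sketch. The type and field names are
   hypothetical, remoteness is assumed to be an integer for which a
   smaller value means "closer to the source", and eif is omitted since
   an implementation is not required to support it explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class RouterId:
    atm_address: str  # physical (ATM) address of the IP-SENATE router
    role: str         # 'client' or 'server'
    mode: str         # 'sender-only' or 'regular'


@dataclass
class RavEntry:
    injector: RouterId  # router with minimal remoteness w.r.t. the source
    remoteness: int     # assumed: smaller means closer to the source


@dataclass
class RouterState:
    # rav[G][S] -> RavEntry; membership[G] -> set of RouterId
    rav: Dict[str, Dict[str, RavEntry]] = field(default_factory=dict)
    membership: Dict[str, Set[RouterId]] = field(default_factory=dict)
```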

6.2 Maintenance of D-groups

   In this subsection we explain in a more detailed manner how IP-SENATE
   routers build and manage D-groups.

6.2.1 Joining D-Groups

   The code below deals with handling of four kinds of events that cause
   an IP-SENATE router to join a D-group.

      C1. explicitly requested join:

         C1.1

         An IP-SENATE router R finds out (e.g., through processing of
         IGMP "join_group" request or "MARS_JOIN" request) that there
         exists some destination within its LIS, that needs to receive
         IP multicast datagrams that are sent to some IP class D
         address.

         C1.2

         An IP-SENATE router R learns via some mechanism (e.g., via some
         control messages) that there exist downstream multicast routers
         that depend on it for receiving multicast datagrams for some
         group.

      C2. traffic-driven join:

         C2.1

         An IP-SENATE router R receives an IP multicast datagram via
         some IDMR propagation tree from some neighbouring multicast
         router.

         C2.2

         An IP-SENATE router R receives an IP multicast datagram from
         some directly attached host.

   In cases C2.1 and C2.2 an IP-SENATE router should decide whether it
   will forward a multicast datagram further. Moreover, if it decides to
   forward, it should also decide which protocol it will use, i.e., via
   IP-SENATE cut-through connections or via some IDMR multicast
   distribution tree. The IP-SENATE approach is to use cut-through
   wherever possible. In order to open the cut-through connections to
   all other relevant IP-SENATE routers, an IP-SENATE router joins an
   appropriate D-group.

   As was explained in Section 4.2, an IP-SENATE router may join a D-
   group assuming either a server or a client operational role. The
   operational role of an IP-SENATE router is indicated by its
   identifier. Further explanations about the operational roles are
   provided in Subsection 6.2.3.

   If an IP-SENATE router joins a D-group as a sender-only, it schedules
   a timer-related event handler that will terminate the membership of
   this router in the D-group, if no directly attached host emits
   multicast datagrams for a sufficiently long time. This timer is
   referred to below as the D-timer.

   --------------------------------------------------------

   if R is a member of G /* go to forwarding decisions */
      go to the table of forwarding decisions (Figure 6);

   else
      if case C1.1 or case C1.2 or case C2.1
         decide on the operational role according to local conditions;
         id = {R, role, regular};
         join(G, id, ...);    /* Join via CONGRESS  */
      else /* case C2.2 */
         decide on role according to local conditions;
         id = {R, role, sender-only};
         join(G, id, ...);    /* Join via CONGRESS  */
         Reset D-timer;

   go to the table of forwarding decisions (Figure 6);

   --------------------------------------------------------
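
   The join logic above can also be sketched in Python, as shown below.
   This is an illustration under stated assumptions, not a normative
   rendering: the identifier is modelled as a (router, role, mode)
   tuple, and the callbacks join, choose_role and reset_d_timer are
   hypothetical stand-ins for the CONGRESS join request, the local role
   decision and the D-timer.

```python
def handle_join_event(router, group, case, members, join, choose_role,
                      reset_d_timer):
    """Return the id used to join, or None if `router` is already a member."""
    if any(m[0] == router for m in members.get(group, ())):
        return None  # already a member: proceed to forwarding decisions
    role = choose_role()  # local decision: 'client' or 'server'
    # Only case C2.2 (traffic from a directly attached host) joins as
    # sender-only; C1.1, C1.2 and C2.1 join in regular mode.
    mode = "sender-only" if case == "C2.2" else "regular"
    router_id = (router, role, mode)
    join(group, router_id)  # join via CONGRESS
    if mode == "sender-only":
        reset_d_timer(group)
    return router_id
```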

   Note that if downstream routers participate in a "broadcast &
   prune"-based IDMR protocol, case C1.2 is problematic, since no
   explicit information about these routers is available. This is a
   generic problem that does not pertain to cut-through routing only.
   The same problem arises when any "broadcast & prune"-based routing
   protocol works in conjunction with a protocol based on "explicit
   join" messages. As an example consider PIM [4,5] and DVMRP [10]
   interoperability issues [15]. Another work in progress that attempts
   to classify the inter-operability issues that arise from deployment
   of various IDMR protocols, is given in [16].

   In the IP-SENATE approach we solve this problem as follows.

   Since we allow IP-SENATE to coexist with some other IDMR protocols
   (see Section 4.2) on the same NIC, an IP-SENATE router may
   periodically propagate datagrams using both an IDMR protocol and
   cut-through connections. This way a multicast propagation tree of an
   IDMR protocol will be preserved, and all IP-SENATE routers that are
   also nodes in some IDMR propagation tree (see case C2.1) will join
   the relevant D-group. As will be explained in the following
   subsection, an IP-SENATE router leaves this D-group when it receives
   "prune" messages from all of its neighbouring downstream multicast
   routers and no directly attached hosts desire to receive multicast
   traffic for this class D address.



6.2.2 Leaving D-Groups

   This subsection depicts the part of an IP-SENATE router's algorithm
   that deals with leaving D-groups.

   Generally, an IP-SENATE router may leave a D-group corresponding to
   some class D IP address, when this router has neither directly
   attached hosts, nor downstream routers that need to receive the IP
   multicast traffic pertaining to the multicast IP address, or need to
   send datagrams to it. This happens when

      o all directly attached hosts performed IGMP/MARS leave, and

      o all neighboring multicast routers (of attached networks),
        running some IDMR protocol, have sent `prune' or `leave'
        messages (depending on the IDMR protocol) for this group, or

      o the router is a `sender-only' member, and its D-timer for this
        group has expired.
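
   One possible reading of the conditions above is sketched below in
   Python. The grouping of the three conditions is an assumption made
   for this illustration: a regular member may leave once no attached
   host and no neighbouring router still needs the traffic, while a
   sender-only member may leave once its D-timer has expired. The
   function and parameter names are hypothetical.

```python
def may_leave(mode, hosts_joined, unpruned_neighbors, d_timer_expired):
    """Leave test for one D-group under the reading stated above."""
    if mode == "sender-only":
        # A sender-only member has no receivers; only sender idleness counts.
        return d_timer_expired
    # Regular member: every host has left and every neighbour has pruned.
    return hosts_joined == 0 and unpruned_neighbors == 0
```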

6.2.3 Client and Server Operational Roles

   An IP-SENATE router locally decides whether it will assume a client
   or a server role upon joining the relevant D-group. The decision
   depends on the number of connections already supported by the
   IP-SENATE router's NIC and the number of additional connections that
   need to be supported, if the router decides to assume a specific
   operational role.

   When an IP-SENATE router joins a D-group, assuming the client
   operational role, it expects that some server will take care of it.
   If no server takes care of this client for a certain period of time,
   this client starts using an IDMR protocol for the forwarding of IP
   multicast traffic. The IP-SENATE routers that act as servers, learn
   through the CONGRESS' incremental membership notifications about the
   new client. Based on the load of the server's NICs and CPU, physical
   distance, administrative policies etc., each server locally decides
   whether to take care of the new client. If a server decides to serve
   a client, it tries to open a native ATM VC to this client (or to add
   this client as a leaf to an already opened ptmpt connection). If the
   client has already accepted some other server's connection set-up
   request, it may either refuse to accept the new connection, or tear
   down the previous connection and switch to the new one. In both
   cases this is a local decision of the client.

   In case of some server's failure, all its clients should re-join the
   relevant D-group. This will once again trigger the procedure
   described above.

   It should be noted that the operational roles are not fixed "once and
   for all". Depending on the size of a D-group and the local NIC and
   CPU load, an IP-SENATE router may desire to change its operational
   role.  In order to do this, an IP-SENATE router should simply leave
   its D-group and then re-join it with the appropriately updated
   identifier that indicates its new operational role (see Section 6.1).

6.2.4 Regular and Sender-Only Modes

   An IP-SENATE router may operate either in `regular' or `sender-only'
   mode, as was explained in Section 4.2.  An IP-SENATE router may wish
   to change its mode from sender-only to regular if it learns about
   some downstream host or router that needs to receive the multicast
   traffic pertaining to a specific class D address.  In order to
   perform this transition, an IP-SENATE router should leave the
   relevant D-group and re-join it with the updated identifier
   indicating that it is acting in the regular mode.

   Note that there is no need for a transition in the opposite
   direction, i.e., from regular to sender-only mode.
   Indeed, if an IP-SENATE router does not have any downstream hosts or
   routers that desire to receive multicast traffic, this IP-SENATE
   router will simply leave the relevant D-group (see Subsection 6.2.2).
   If there exist some down-stream senders, this IP-SENATE router will
   re-join the group on-demand later, as was explained in Subsection
   6.2.1.

6.3 Forwarding Decisions

   This subsection depicts the forwarding algorithm executed by the IP-
   SENATE routers. Due to the assumed heterogeneous network model, there
   are multiple cases that should be handled carefully. By using
   CONGRESS membership services and the encapsulation/decapsulation
   technique described in Section 4.2, an IP-SENATE router can
   differentiate between the multicast traffic that it receives from
   other IP-SENATE routers via the cut-through connections and traffic
   received via an IDMR propagation tree. An IP-SENATE server decides
   how to forward an incoming multicast packet according to the identity
   and operational role of the sending router and according to its own
   operational role. For each possible pair of sender and receiver, the
   table in Figure 6 provides a pointer to the subsection that describes
   the relevant part of the pseudo-code. The short parts of the pseudo-
   code are shown directly in the table.

   ============================================================

     -------------------------------------------------------
     |   \ Sender| Multicast      | IP-SENATE  | IP-SENATE |
     |    \      | Router (via    |            |           |
     |     \     | IDMR protocol) |   CLIENT   |   SERVER  |
     |      \    | or a directly  |            |           |
     |       \   | attached host  |            |           |
     |        \  |                |            |           |
     |Receiver \ |                |            |           |
     |-----------------------------------------------------|
     |           |                |            | Forward m |
     |           |                |            |   using   |
     |IP-SENATE  |    6.3.3       |    X       |   IDMR    |
     |           |                |            | protocol. |
     | Client    |                |            |           |
     |           |                |            |           |
     |-----------------------------------------------------|
     |IP-SENATE  |                |            |           |
     |           |    6.3.4       |   6.3.1    |   6.3.2   |
     | Server    |                |            |           |
     |           |                |            |           |
     -------------------------------------------------------

   Figure 6.
   ============================================================

   For the sake of simplicity and shorter representation, we assume that
   the involved IP-SENATE  routers have already joined the relevant D-
   groups, according to the algorithm explained in Subsection 6.2.1.

   In all of the following cases we depict the steps taken by an IP-
   SENATE router R, upon a reception of an IP multicast datagram m
   originated at some source S and targeted to some multicast group G.


6.3.1 A Server Receives a Datagram from a Client

   An IP-SENATE router acting as a server is responsible for the
   propagation of the multicast traffic that it receives from its
   clients, to all the relevant multicast routers and directly attached
   hosts.

   In order to avoid undesired duplication of IP multicast datagrams, an
   IP-SENATE router should check whether some other IP-SENATE router(s)
   might propagate the IP multicast datagrams originating at the same
   source S. This may happen when a multicast distribution tree of some
   IDMR protocol contains more than one egress router that connect the
   branches of the propagation tree to the ATM cloud. Figure 2 provides
   a graphical representation of this scenario. In such a case, it is
   obviously preferable that only one of the egress routers, closest to
   the source, would transmit the datagrams.

   In cases such as described above, IP-SENATE routers belonging to the
   same D-group, can deterministically choose a router that will perform
   forwarding of IP multicast datagrams by using the CONGRESS membership
   services. This is done by inspecting RAV[G][S]. Initially RAV[G][S]
   is set to this server's identifier and the remoteness value is
   derived from either the server's routing table or from the TTL field
   of the datagram seen by this router. With the passage of time,
   however, the server may find out that other servers are forwarding to
   it datagrams (over shortcut connections) originated at the same
   source, and that these routers are located closer to the source than
   itself (as seen from the piggybacked remoteness value). In this case
   RAV[G][S] is set to the name and remoteness of the router that is
   closest to the source S.  This server (router) will be a designated
   injector for the datagrams originated at S and targeted to G.
   Obviously, when a router receives a datagram from source S over a
   non-shortcut connection, it may update its RAV[G][S] if its own
   remoteness value is better than that of the current injector.  If
   this is the case, the router becomes a new designated injector.

   Since we assume an asynchronous network model, it is possible that at
   some point multiple IP-SENATE routers belonging to the same D-group,
   will consider themselves as the ones that must forward datagrams. As
   time passes, however, the IP-SENATE routers will learn about this
   redundancy, because it will be reflected by RAV[G]. In the following
   subsection more details about RAV maintenance are provided.

   In Section 6.1, two examples of measuring remoteness were provided.
   It should be noted that TTL is not always a reliable measure since a
   source may change its value arbitrarily. In this case, due to the
   asynchronous nature of the network, oscillations between multiple
   injectors may occur.  Since source initiated changes of TTL may occur
   considerably more often than changes of the network topology, these
   oscillations may present a serious problem.

   The information kept in RAV[G] is temporary. Each time an IP-SENATE
   router enters information into a row S of RAV[G], it resets a timer
   associated with the source S. We refer to this timer as S-timer. If
   no traffic from S is encountered during the time window defined by
   the S-timer, the IP-SENATE router discards the row in RAV[G]
   associated with S.

   When RAV[G] becomes empty, the IP-SENATE router starts another timer,
   called G-timer. In case no multicast traffic is encountered within G
   during the G-timer, an IP-SENATE router tears down the cut-through
   connections within the corresponding D-group. These cut-through
   connections may be resumed on-demand later.
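
   The interplay of the S-timer and the G-timer can be illustrated by
   the following Python sketch, which keeps RAV[G] for a single group.
   It is a simplification under stated assumptions: time is passed in
   explicitly as a number, timers are polled through tick() rather than
   scheduled, and all names are hypothetical.

```python
class RavSoftState:
    """RAV[G] for one group, with per-source expiry (S-timer) and a G-timer."""

    def __init__(self, s_timeout, g_timeout):
        self.s_timeout, self.g_timeout = s_timeout, g_timeout
        self.rows = {}           # source -> (injector, remoteness, last_refresh)
        self.empty_since = None  # time at which RAV[G] became empty

    def refresh(self, source, injector, remoteness, now):
        self.rows[source] = (injector, remoteness, now)  # resets the S-timer
        self.empty_since = None                          # cancels the G-timer

    def tick(self, now):
        """Expire stale rows; return True when cut-through VCs may be torn down."""
        for source in [s for s, row in self.rows.items()
                       if now - row[2] >= self.s_timeout]:
            del self.rows[source]
        if self.rows:
            return False
        if self.empty_since is None:
            self.empty_since = now  # RAV[G] just became empty: start G-timer
        return now - self.empty_since >= self.g_timeout
```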

   --------------------------------------------------------

   if exists an entry RAV[G][m.S]
      if remoteness(m, R) <=  RAV[G][m.S]
           /* The closest router to S is responsible for the
            * cut-through propagation, so that R is the injector
            */

           update RAV[G][m.S] to hold R and remoteness(m, R);
           update eif to be the correct one;
           forward m using IDMR protocol;
           forward m to all other servers that are members of G
                   that act in regular mode (directly);
           forward m to all clients that are members of G that act
                   in regular mode excluding the sender (directly);
      else
           /* m will be sent by the router nearest to source. */
           discard m;

   else /* The source of the datagram is not in the RAV[G] yet */
      Create a new entry for m.S in RAV[G];
      update RAV[G][m.S] to hold R and remoteness(m, R);

      forward m using IDMR protocol;
      forward m to all other servers that are members of G
              that act in regular mode (directly);
      forward m to all clients that are members of G that act in
              regular mode excluding the sender (directly);

   --------------------------------------------------------
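
   The injector test at the heart of the pseudo-code above is sketched
   below in Python. The sketch covers only the RAV[G] bookkeeping and
   the forward/discard decision; the actual forwarding over IDMR and
   short-cut connections is left out. RAV[G] is modelled, as an
   assumption, by a dictionary mapping a source to a (router,
   remoteness) pair, and the function name is hypothetical.

```python
def server_handles_client_datagram(rav_g, me, source, my_remoteness):
    """Return True if `me` should inject the datagram, False to discard it.

    rav_g maps source -> (router, remoteness) and is updated in place.
    """
    entry = rav_g.get(source)
    if entry is None or my_remoteness <= entry[1]:
        # Unknown source, or `me` is at least as close as the recorded injector.
        rav_g[source] = (me, my_remoteness)
        return True
    return False  # a closer router is expected to forward the datagram
```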

6.3.2 A Server R Receives a Datagram from another Server R'

   If a server receives multicast traffic from another server belonging
   to the same D-group, the sending server believes that it is the one
   closest to the source (i.e., it receives packets from the source with
   a lower remoteness value than all the other IP-SENATE routers).
   Otherwise it would not have been sending the datagrams. If RAV[G][S]
   holds no entry for the sending server (e.g., because RAV was
   refreshed), the receiving server should record the sending server
   and the remoteness value carried by the received packet in
   RAV[G][S]. Note that this operation may change
   the local notion of the IP-SENATE router with the lowest remoteness
   value, at the receiving IP-SENATE router.

   An IP-SENATE router acting as a server, is responsible for the
   propagation of the IP multicast traffic to all its clients belonging
   to the same D-group and to all the relevant IDMR interfaces. The
   latter case should be treated especially carefully because IDMR
   routers use RPF mechanisms in order to break stable routing loops.
   When a multicast IP datagram arrives at an IDMR router, the router
   checks whether it received it from the "expected" network interface.
   An IDMR router expects to receive multicast datagrams originated at
   some source S, from the same network interface that this router would
   use in order to forward unicast datagrams to S. If a multicast
   datagram arrives from an unexpected interface, it is silently
   discarded, because it was not propagated over the optimal branch of
   the IDMR multicast propagation tree.
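
   The RPF check described above can be sketched in Python as follows.
   The sketch assumes, for brevity, that the unicast routing table is an
   exact-match dictionary from a source address to an interface name; a
   real router would perform a longest-prefix match instead. The
   function name is hypothetical.

```python
def rpf_accepts(unicast_routes, source, arrival_interface):
    """Accept a multicast datagram only if it arrived on the expected interface."""
    expected = unicast_routes.get(source)  # interface used to reach `source`
    return expected is not None and arrival_interface == expected
```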

   As seen from the code below, an IP-SENATE router updates the variable
   eif to be as expected by the IDMR interface. Otherwise, the RPF
   mechanism might erroneously discard datagrams that should not be
   discarded.

   Obviously, there is no need to forward the IP multicast datagram that
   came from an IP-SENATE router acting as a server to other servers
   belonging to the same D-group. These servers are supposed to be the
   leaves of the same ptmpt connection as the receiving server.

   --------------------------------------------------------

   update eif to be the expected interface;
   forward m using IDMR protocol;
   forward m to all clients that are members of G
           that act in regular mode;

   /* There is no need to forward to other servers, since
    * they are supposed to be handled by the same IP-SENATE
    * server that sent m.
    */

   if the entry RAV[G][m.S] does not exist
      Create a new entry for m.S in RAV[G];
      /* Since this is the first datagram originated at S that this
       * router (R) sees, it is assumed that the forwarder is the
       * designated injector for (S,G)
       */
      update RAV[G][m.S] to hold the identifier of the
             forwarder R' and remoteness(m, R');
      return;

   if (remoteness(m, R') < RAV[G][m.S])
      update RAV[G][m.S] to hold the identifier of the
         datagram forwarder R' and remoteness(m, R');

   --------------------------------------------------------
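
   The RAV[G] update performed when a datagram arrives from another
   server is sketched below in Python. As before, RAV[G] is modelled by
   a dictionary from a source to a (router, remoteness) pair, which is
   an assumption of this illustration, and the function name is
   hypothetical. Note the strict comparison: the recorded injector is
   replaced only by a forwarder that is strictly closer to the source.

```python
def note_datagram_from_server(rav_g, source, forwarder, forwarder_remoteness):
    """Update RAV[G] after server `forwarder` sent a datagram from `source`.

    Returns the router currently recorded as the designated injector.
    """
    entry = rav_g.get(source)
    if entry is None or forwarder_remoteness < entry[1]:
        rav_g[source] = (forwarder, forwarder_remoteness)
    return rav_g[source][0]
```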


6.3.3 A Client Receives a Datagram from an IDMR Interface

   When an IP-SENATE router acting as a client receives an IP multicast
   datagram from an IDMR interface, it should forward it to all other
   involved IDMR interfaces. In order to propagate the datagram to all
   the relevant IP-SENATE routers using short-cut, a client should
   forward the datagram to its server. The latter will forward it
   further according to the algorithm described in Subsection 6.3.1.

   As will be explained in Subsection 6.3.5, IP-SENATE routers that also
   participate in some "broadcast & prune"-based IDMR protocol, prune
   the redundant branches of an IDMR multicast propagation tree.

   --------------------------------------------------------

   forward m using IDMR protocol;

   forward m to Multicast_Server over a point-to-point SVC;

   --------------------------------------------------------


6.3.4 A Server Receives a Datagram from an IDMR Interface

   If an IP-SENATE router acting as a server receives an IP multicast
   datagram via an IDMR multicast propagation tree, it is responsible
   for forwarding it to all the relevant non-IP-SENATE multicast routers
   and to the relevant clients. In case this IP-SENATE router is the
   designated injector for (S,G), it should also forward the multicast
   datagram to all the IP-SENATE routers acting as servers (over
   short-cut connections).

   --------------------------------------------------------

   if exists an entry RAV[G][m.S]
      if remoteness(m, R) <= RAV[G][m.S]
           /* The closest router to S is responsible for the
            * cut-through propagation, so that R is the injector
            */

           update RAV[G][m.S] to hold R and remoteness(m, R);
           forward m using IDMR protocol;
           forward m to all other servers that are members of G
                   that act in regular mode (directly);
           forward m to all clients that are members of G
                   that act in regular mode (directly);
      else
           /* m was received or will be received from the
            * IP-SENATE router nearest to the source.
            */
           discard m;
   else
      Create a new entry for m.S in RAV[G];
      /* Since this is the first datagram originated at S that
       * this router (R) sees, it is assumed that R is the
       * designated injector for (S,G).
       */
      update RAV[G][m.S] to hold R and remoteness(m, R);
      forward m using IDMR protocol;
      forward m to all other servers that are members of G
              that act in regular mode (directly);
      forward m to all clients that are members of G
              that act in regular mode (directly);

   --------------------------------------------------------

6.3.5 Pruning Mechanism

   As mentioned earlier, IP-SENATE uses an IDMR mechanism along with
   short-cutting. An IP-SENATE router that must forward multicast
   traffic of a group G to directly attached hosts or to multicast
   routers, joins the relevant D-group upon reception of datagrams (or
   explicit join) from an IDMR interface. Consequently, shortcut
   connections will be formed between the members of the D-group. At
   this point the router may receive traffic both from shortcut
   connections and from the existing IDMR interface. In order to avoid
   this redundancy, the router prunes the upstream IDMR interface,
   hereafter accepting upstream traffic only from the shortcut
   connection.

   ==================================================================

                                          S
                                         x \
                                        x   \
              On the left -            <     R1     On the right
              the cut-through         x       \     side - the IDMR
              connection from        x         ...  propagation tree
              S to R'               x           \   branch
                                   x             R
              Here, R' should     x             /
              send prune to      <             /
              R.                R'<_______<___<
                               /
                     _________R2________________
                    /                          \
                    |  A DVMRP routing domain  |
                    |                          |
                    |                          |
                    |                          |
                    \_______R''________________/
                            |
                            |
                            |
                    ------------------
                    |    ....        |
                    |                |
                    H                H

             directly attached hosts that want to receive
             datagrams targeted to G

   "\" - an IDMR propagation tree branch
   "x" - the shortcut link

   Figure 7.

   ==================================================================

   Figure 7 depicts a scenario in which a downstream multicast router
   requests a prune in spite of having downstream routers and directly
   attached hosts that are dependent on it. Since IP-SENATE router R'
   receives the IP multicast traffic targeted to a group G both via a
   cut-through connection and an IDMR propagation tree, R' sends a prune
   message to R. Note, however, that the rest of the IDMR multicast
   propagation tree located beneath the multicast router R' continues to
   function as usual. If all the downstream IDMR interfaces of an IP-
   SENATE router R have been pruned and the router has no directly
   attached hosts who are registered in G or are senders in G (no D-
   Timer is set), R leaves the relevant D-group through CONGRESS.

   It should be noted that if the IDMR protocol that runs inside the ATM
   cloud is based on a broadcast-and-prune model, e.g., DVMRP, then
   extensive signalling overhead may be introduced by shortcutting. This
   is because a multicast propagation tree of DVMRP is reconstructed
   periodically by flooding of multicast traffic to all the routers
   residing inside the ATM cloud. At the beginning, all routers will
   join the relevant D-group in order to make themselves available for
   shortcut connections. Later a considerable part of these routers will
   leave this D-group since their respective downstream routers will
   send them prune messages. This way shortcut connections may be opened
   to routers that, in fact, do not need to receive multicast traffic at
   all.  These connections will be later teared down. Obviously, it is
   possible to introduce some optimizations that will try to minimize
   the signalling overhead, but, generally speaking, we believe that
   broadcast-and-prune IDMR protocols do not go well with shortcutting.
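
   To give a feeling for the size of this overhead, the following
   sketch computes a rough upper bound on the number of shortcut
   connections that are opened and then torn down. It assumes a worst
   case that this memo does not quantify: on every flood, each router
   in the cloud joins the D-group and receives a shortcut connection,
   and every router without interested receivers is later pruned and
   leaves. The function name and parameters are hypothetical.

```python
def wasted_shortcut_setups(routers_in_cloud: int,
                           routers_with_receivers: int,
                           flood_cycles: int) -> int:
    """Worst-case count of shortcut connections opened and later torn down."""
    if not 0 <= routers_with_receivers <= routers_in_cloud:
        raise ValueError("routers_with_receivers must be within 0..routers_in_cloud")
    if flood_cycles < 0:
        raise ValueError("flood_cycles must be non-negative")
    # Routers that join on a flood but are pruned afterwards waste one setup each.
    return (routers_in_cloud - routers_with_receivers) * flood_cycles
```

   For example, with 100 routers in the cloud of which 10 have
   receivers, a single flood may cause up to 90 unnecessary connection
   setups under these assumptions.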


   In some cases, a shortcut connection may be mistakenly established
   from a downstream multicast router to its upstream multicast
   routers. Such a shortcut connection would contradict the
   orientation of the IDMR propagation tree. If the upstream router
   were to blindly prune its upstream IDMR branches just because it has
   a shortcut connection, it could destroy the connectivity of the IDMR
   propagation tree. In order to avoid such situations, an IP-SENATE
   router requests pruning of its upstream IDMR interfaces only if the
   remoteness value of a datagram received over the shortcut connection
   is lower than that of the datagram received over the IDMR tree.

   As explained in Sections 6.3.1 and 6.3.4, a downstream router's
   cut-through connection would be suppressed by some other IP-SENATE
   router that is located closer to the source in terms of remoteness.
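
   The pruning rule based on remoteness can be sketched as follows.
   The representation is an assumption made for illustration only:
   remoteness values are modelled as integers, and None stands for
   "no datagram has been received over that path yet". Neither the
   function name nor this encoding is defined by this memo.

```python
from typing import Optional


def should_prune_upstream(shortcut_remoteness: Optional[int],
                          idmr_remoteness: Optional[int]) -> bool:
    """Decide whether to request pruning of the upstream IDMR interfaces."""
    # Without traffic on both paths there is nothing to compare, so the
    # IDMR branch is kept to preserve connectivity.
    if shortcut_remoteness is None or idmr_remoteness is None:
        return False
    # Prune only if the shortcut delivers datagrams with a strictly lower
    # remoteness value than the IDMR propagation tree does.
    return shortcut_remoteness < idmr_remoteness
```

   Under this sketch, a shortcut that was mistakenly established from
   a downstream router carries datagrams whose remoteness is not
   lower than that seen on the IDMR tree, so the upstream branch is
   not pruned.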

7. Fault Tolerance

   Currently, each GMS is a single point of failure in its domain,
   i.e., when a GMS fails, its domain is disconnected from the rest of
   the CONGRESS hierarchy. Note that this situation resembles the
   failure of a single DNS server in its domain. Using a distributed
   GMS, composed of a primary server and backup servers acting as a
   single logical entity, can make the CONGRESS protocol more robust.
   Another way to increase robustness is to elect a new GMS from the
   lower level of the CONGRESS hierarchy to take over the failed
   server's responsibilities.

   This subject is for further study.



8. Security Considerations

   Security issues are not discussed in this document.


9. Message Formats

9.1 CONGRESS Messages

   To be supplied.

9.2 IP-SENATE Messages

   To be supplied.

10. References

      [1]  Fenner, W., "Internet Group Management Protocol,
           Version 2", Internet Draft, September 1995.

      [2]  Armitage, G., "Support for Multicast over UNI
           3.0/3.1 based ATM Networks", RFC 2022, November 1996.

      [3]  Armitage, G., "VENUS - Very Extensive Non-Unicast
           Service", Internet Draft,
           draft-armitage-ion-venus-03.txt, June 1997.

      [4]  Estrin, D., et al., "Protocol Independent Multicast
           Sparse Mode (PIM-SM): Protocol Specification", Internet
           Draft, draft-ietf-idmr-PIM-SM-spec-09.ps, October 1996.

      [5]  Estrin, D., et al., "Protocol Independent Multicast
           Dense Mode (PIM-DM): Protocol Specification", Internet
           Draft, draft-ietf-idmr-PIM-DM-spec-04.ps, September 1996.

      [6]  ATM Forum, "ATM User-Network Interface Specification
           Version 3.1", 1994.

      [7]  ATM Forum, "ATM User-Network Interface Specification
           Version 4.0", 1996.

      [8]  Laubach, M., "Classical IP and ARP over ATM", RFC 1577,
           Hewlett-Packard Laboratories, December 1993.

      [9]  Ballardie, A., "Core Based Tree (CBT) Multicast
           Architecture", Internet Draft,
           draft-ietf-idmr-cbt-spec-10.txt, 1997.

      [10] Pusateri, T., "Distance Vector Multicast Routing
           Protocol", Internet Draft,
           draft-ietf-idmr-dvmrp-v3-03.[txt,ps], September 1996.

      [11] Moy, J., "Multicast Extensions to OSPF", RFC 1584,
           July 1993.

      [12] Smirnov, M., "EARTH - EAsy IP multicast Routing
           THrough ATM clouds", Internet Draft,
           draft-smirnov-ion-earth-02.txt, 1997.

      [13] Rekhter, Y. and D. Kandlur, ""Local/Remote" Forwarding
           Decision in Switched Data Link Subnetworks", RFC 1937.

      [14] Anker, T., Breitgand, D., Dolev, D., and Z. Levy,
           "CONGRESS: CONnection-oriented Group-address RESolution
           Service", Technical Report CS96-23, The Hebrew
           University, Jerusalem, Israel, December 1996.
           http://www.cs.huji.ac.il/labs/transis/transis.html

      [15] Estrin, D., Helmy, A., and D. Thaler, "PIM Multicast
           Border Router (PMBR) specification for connecting PIM-SM
           domains to a DVMRP Backbone", Internet Draft,
           draft-ietf-mboned-pmbr-spec-00.txt, February 1997.

      [16] Thaler, D., "Interoperability Rules for Multicast
           Routing Protocols", Internet Draft,
           draft-ietf-mboned-imrp-some-issues-02.txt, May 1996.

      [17] Armitage, G., "A Distributed MARS Protocol", Internet
           Draft, draft-armitage-ion-distmars-spec-00.txt,
           January 1997.

      [18] Armitage, G., "Issues affecting MARS Cluster Size",
           RFC 2121, March 1997.

      [19] Deering, S., "Host Extensions for IP Multicasting",
           RFC 1112, August 1989.

      [20] Semeria, C., "Introduction to IP Multicast Routing",
           Internet Draft,
           draft-ietf-mboned-intro-multicast-00.txt, January 1997.

      [21] Luciani, J., et al., "NBMA Next Hop Resolution Protocol
           (NHRP)", Internet Draft, draft-ietf-rolc-nhrp-11.txt,
           February 1997.



11. Acknowledgments

   We would like to thank Prof. Israel Cidon from the Technion
   Institute, Israel. We also thank Yoav Kluger and Benny Rodrig from
   Madge Networks (Israel), for their helpful comments and their
   precious time.


12. List of Abbreviations

   o IMSS      - IP Multicast Shortcut Service
   o CONGRESS  - CONnection-oriented Group address RESolution Service
   o IP-SENATE - IP multicast SErvice for Non-broadcast Access
                 Networking TEchnology
   o LMS       - Local Membership Server
   o GMS       - Global Membership Server
   o MCS       - Multicast Server


Authors' Addresses

   Tal Anker
   The Hebrew University of Jerusalem
   Computer Science Dept
   Givat-Ram, Jerusalem
   Israel, 91904

   Phone: (972) 6585706

   EMail: anker@cs.huji.ac.il


   David Breitgand
   The Hebrew University of Jerusalem
   Computer Science Dept
   Givat-Ram, Jerusalem
   Israel, 91904

   Phone: (972) 6585706

   EMail: davb@cs.huji.ac.il


   Danny Dolev
   The Hebrew University of Jerusalem
   Computer Science Dept
   Givat-Ram, Jerusalem
   Israel, 91904

   Phone: (972) 6584116

   EMail: dolev@cs.huji.ac.il


   Zohar Levy
   The Hebrew University of Jerusalem
   Computer Science Dept
   Givat-Ram, Jerusalem
   Israel, 91904

   Phone: (972) 6585706

   EMail: zohar@cs.huji.ac.il