J. Bion
Internet Draft                                            Cisco Systems
Document: draft-shand-erm-00.txt                           D. Farinacci
Category: Informational                                Procket Networks
                                                               M. Shand
                                                          Cisco Systems
                                                             A. Tweedly
                                                          Cisco Systems
                                                              June 2000



                     Explicit Route Multicast (ERM)


Status of this Memo


   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet- Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   Conventional multicast builds a tree structure from the source of a
   multicast stream to the destinations of that stream. Multicast
   packets are forwarded from the sources down the tree by each
   intervening router. Each router contains state information to
   determine the next hop forwarding destination(s) for a packet. Where
   there are a large number of groups, this gives rise to a scaling
   problem. The amount of state to be stored in the routers becomes
   impossibly large.

   It is attractive to consider the use of multicast to provide
   applications such as n-way voice and video conferencing, which would
   require a very large number (perhaps millions) of relatively small
   (between 3 and 10 members) multicast groups.



Shand, et al               Expires Dec 2000                   [Page 1]


                       Explicit Route Multicast              June 2000


   This proposal replaces the need for state in the intervening routers
   with a description of the delivery tree contained in the packet
   itself. This allows a multicast service to be deployed for very
   large numbers of small groups. The trade off is that the packet size
   is increased proportionate to the size of the delivery tree, which
   imposes an effective upper limit on the size of the group.

   The encapsulation with the delivery tree, and the subsequent
   decapsulation are performed by the ingress and egress routers. The
   process is transparent to the hosts, which continue to operate
   conventional IP multicast data protocols.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC-2119 [2].

3. Table of Contents

1. Abstract...........................................................1

2. Conventions used in this document..................................2

3. Table of Contents..................................................2

4. Overview...........................................................5

5. Design Constraints.................................................6

6. Packet encapsulation and forwarding................................8

 6.1 ERM header format ...............................................8

  6.1.1 Tree list format .............................................9

 6.2 Forwarding .....................................................11

  6.2.1 Interpreting the tree list ..................................11

 6.3 Building the multicast tree ....................................12

  6.3.1 Changing routes .............................................13

  6.3.2 Loops .......................................................13

  6.3.3 Removing non-branching nodes ................................14

  6.3.4 Building the packet headers .................................14

  6.3.5 Removing a destination ......................................14


Shand, et al.              Expires Dec 2000                   [Page 2]


                       Explicit Route Multicast              June 2000


  6.3.6 Memory scaling issues in the encapsulating router ...........16

 6.4 Detecting and recovering from failures .........................16

  6.4.1 Unicast routing topology changes ............................16

  6.4.2 Intermediate ERM router failures ............................16

  6.4.3 Destination ERM router failures .............................18

  6.4.4 Source ERM router failures ..................................19

7. Route discovery...................................................19

 7.1 Trace Packets ..................................................19

  7.1.1 Choosing a unique IP address ................................20

  7.1.2 Acknowledgement of trace packets ............................20

  7.1.3 Becoming the source ERM router ..............................21

  7.1.4 Ceasing to be the source ERM router .........................21

  7.1.5 Incongruent unicast and multicast topologies ................22

 7.2 Trace packets ..................................................22

 7.3 Locating the source ERM router(s) ..............................23

  7.3.1 Remote source detection. ....................................24

 7.4 Changing the source ERM router .................................25

 7.5 Failures in the multicast delivery to the source ERM router etc.25

 7.6 Summary of timers ..............................................26

  7.6.1 Periodic timer  t1 ..........................................27

  7.6.2 Error recovery timer  t2 ....................................27

  7.6.3 Trace Repeat Interval .......................................27

8. Multicast capable subnetworks.....................................27

 8.1 Modified trace packets .........................................27

 8.2 Modified tree building algorithm ...............................28

 8.3 An example .....................................................29


Shand, et al.              Expires Dec 2000                   [Page 3]


                       Explicit Route Multicast              June 2000


 8.4 Modified forwarding algorithm ..................................29

  8.4.1 Setting the TTL .............................................30

9. Further Optimisations?............................................31

 9.1 Multiple groups with the same source ...........................31

 9.2 Minimising the source impact of trace packets ..................31

 9.3 Multiple groups with the same source ERM router, but different
 sources ............................................................32

 9.4 Minimal Encapsulation ..........................................32

  9.4.1 Benefits ....................................................34

  9.4.2 Costs. ......................................................34

  9.4.3 Problems with Fragmentation. ................................34

 9.5 Final hop optimisation .........................................35

10. Limiting the size of a group.....................................35

 10.1 Limiting the number of members ................................36

 10.2 Limiting the number of destination routers ....................36

  10.2.1 Multiple source ERM routers ................................36

  10.2.2 Topology changes with a single source ERM router ...........37

  10.2.3 Failure of a destination router ............................37

 10.3 Effect of tree encoding length on MTU .........................37

11. Source Discovery.................................................38

12. Security Considerations..........................................38

13. References.......................................................38

14. Acknowledgments..................................................38

15. Intellectual Property Rights Notice..............................38

16. Author's Addresses...............................................39





Shand, et al.              Expires Dec 2000                   [Page 4]


                       Explicit Route Multicast              June 2000


4. Overview



                       D5
                       ^
                       |
                       R8-->R9-->D6
                       ^
                       |
              S-->R1-->R2--R4-->R5-->R7-->D4
                       |        |
                       v        v
                       R3-->D1  R6-->D3
                                |
                                v
                                D2



   In the diagram above, R1 is the ingress (or encapsulation) router,
   and R3, R6, R7, R8 and R9 are the egress (or destination) routers.
   The delivery tree to reach all the destinations (D1 to D6) is
   indicated by the arrows. When R1 receives a packet addressed to the
   multicast group from S, it encapsulates the multicast packet in a
   unicast packet addressed to the first hop router R2 and includes in
   the packet a header which describes the required delivery tree. Note
   that the delivery tree need only include the routers that are either
   branch points in the tree, or delivery points. An intervening
   router, such as R4 above, need not be included. On receiving such a
   unicast packet, a router inspects the header to determine the next
   hop routers and duplicates the packet, adjusting the unicast
   destination address of each packet to be the next hop IP address.

   R1 forwards to R2

   R2 forwards to R3, R5, and R8

   R5 forwards to R6 and R7

   R8 forwards to R9

   On reaching a destination router, the packet is decapsulated and the
   original multicast packet is forwarded to the final multicast
   destination(s) using normal multicast methods.

   The details of the packet encoding and forwarding are described in
   Section 6

   Only the encapsulating router (R1) is required to maintain state
   concerning the delivery tree. The remaining intervening routers
   merely forward based on the information in the header. The

Shand, et al.              Expires Dec 2000                   [Page 5]


                       Explicit Route Multicast              June 2000


   information to build the delivery tree is acquired by the
   encapsulating router, by each destination router sending a 'trace'
   packet (as discussed in section 7.1) towards the source. At it
   traverses the network it records the address of each ERM capable
   router traversed. So in the above example, R1 would receive the
   following packets.

   (D1) R3, R2, R1

   (D2, D3) R6, R5, R4, R2, R1

   (D4) R7, R5, R4, R2, R1

   (D5) R8, R2, R1

   (D6) R9, R8, R2, R1

   By combining this information (as described in section 6.3), it is a
   simple matter to build the delivery tree for inclusion in the
   packet, eliminating any redundant, non-branching nodes (such as R4
   above).

   In order to send the trace packets, the destination routers must
   know the source(s) of the group. Techniques to acquire this
   information are discussed in section 11.

   Destination routers handle recovery from failures by re-sending
   trace packets when no traffic for the group has arrived after some
   period. In the absence of genuine traffic the encapsulating router
   sends periodic heartbeat traffic to inhibit the trace packets from
   still connected nodes. Details of these mechanisms are described in
   section 6.4.

5. Design Constraints

   The following constraints were applied to the design of this
   protocol: -

   1. It must require zero state in the intermediate routers.

      a)  State in the 'ingress' and 'egress' routers is permitted.

      b)  Solutions that require short duration dynamic per-group state
          in intermediate routers during tree building may be
          considered, but solutions that require no state at all are
          far preferable.

   2. It must support multicast group of up to 10 members.

      a)  While ten group members is a minimum requirement, the actual
          constraints on a design concern the number of 'egress'


Shand, et al.              Expires Dec 2000                   [Page 6]


                       Explicit Route Multicast              June 2000


          delivery points, each of which may deliver to any number of
          group members.

   3. It must support at least 100,000 simultaneously active groups
      (source group pairs).

   4. It must require NO changes to hosts.

      a)  No changes in the operation of the network layer (and below)
          multicast functions are permitted.

      b)  Solutions relying on novel application functions (for example
          to discover sources) are permitted.

   5. Forwarding performance shall not be dramatically worse than
      conventional multicast.

   6. It shall not impact the forwarding performance for other unicast
      or multicast traffic.

   7. Bandwidth utilisation shall not be dramatically worse than
      conventional multicast.

      a)  This means that any encapsulation overhead shall be kept to a
          minimum.

      b)  The delivery tree shall be genuinely multicast. For example,
          simply unicasting a separate packet from the ingress router
          directly to each egress router would not be an acceptable
          general solution (except where the topology so dictates).

      c)  The additional 'control' traffic shall be minimal. In
          particular a good solution should scale to support a large
          number of 'simultaneous' group instantiations.

   8. Delivery to the group should survive topology changes in the
      network.

      a)  Unicast routing changes that leave the optimum delivery tree
          unchanged should have no effect.

      b)  Changes that make the delivery tree sub-optimal may be
          ignored, but solutions that allow re-optimisation of the
          delivery tree with minimal control overhead would be
          preferred.

      c)  Changes which would prevent delivery to some or all groups
          should be recovered from in a timely manner.

   9. Leaving group members (egress points) should be removed from the
      delivery tree in a timely manner.


Shand, et al.              Expires Dec 2000                   [Page 7]


                       Explicit Route Multicast              June 2000


      a)  Where groups leave gracefully, the tree should be quickly
          pruned back to prevent unnecessary delivery.

      b)  Failed or unreachable egress points should be detected in a
          timely manner, and pruned from the tree (if recovery is not
          feasible).

   10.       Join latency should be comparable to that of conventional
      multicast.

6. Packet encapsulation and forwarding

   Multicast packets are encapsulated in a standard unicast packet as
   follows: -

   o  Destination IP address = IP address of first hop router

   o  Source IP address = IP address of encapsulating router

   o  Protocol = ERM (a new protocol type value TBA)

   o  TTL = TTL from multicast packet (minus 1).

   o  ToS = copied from multicast packet

   o  The data portion of the unicast packet contains an ERM header as
      described below, followed by the original multicast packet.

6.1 ERM header format

   The ERM header contains the following fields

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ERM type     |  List Size    |   Offset      |      TTL      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Checksum             |   Tree list 1 |   Tree list 2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Tree list n |  Pad 1        |   Pad 2       |   Pad 3       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list 1                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list 2                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list n                            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |  Multicast Data Packet . . .
   +-+-+-+-+-+-+-+-+-+-+-+-+




Shand, et al.              Expires Dec 2000                   [Page 8]


                       Explicit Route Multicast              June 2000


   o  ERM type -- Type of ERM packet 128 = (encapsulated multicast
      data). Note that the high order bit is set to indicate that the
      packet contains an ERM route and should be processed using ERM
      forwarding.

   o  List Size -- Number of addresses in the address list (1 byte).
      The offset to the start of the address list (a) is therefore
      ceiling((6 + n)/4) 32 bit words and hence the total length of the
      ERM header (i.e. the offset to the start of the encapsulated
      multicast data packet) is (a + n) .

   o  Offset -- The numerical offset of the receiving node's entry in
      the tree list (1 byte). This is initialised to 0 for delivery to
      the first hop router, since the first hop router's address does
      not appear in the list.

   o  TTL -- Normally indeterminate, but used when forwarding over a
      layer 2 multicast capable subnetwork. (see Section 8.4.1)

   o  Checksum -- Ones compliment checksum of ERM header excluding the
      first four bytes (to avoid the necessity to recalculate when the
      Offset or TTL fields are modified).

   o  Tree list -- The list describing the delivery tree (n bytes,
      where n is the number of entries in the tree list).

   o  Padding -- To start Address list on 4-byte boundary.

   o  Address list -- The list of IP addresses for the delivery tree
      (4n bytes)

6.1.1 Tree list format

   The tree list is encoded as a list of parents. Each entry in the
   list describes the entry number of that entry's parent. At any point
   in the tree, the set of nodes to which the packet should be
   replicated on the next hop can be identified as the set of nodes
   having the present node as their common parent. Thus the example
   tree above would be represented as

   0, 1, 2, 2, 4, 4, 2, 7

   where the address list is

   R1, R2, R3, R5, R6, R7, R8, R9

   An entry of zero indicates that this node's parent is the root of
   the tree.

   This is interpreted as follows



Shand, et al.              Expires Dec 2000                   [Page 9]


                       Explicit Route Multicast              June 2000


   Entry    Parent     Entry Address     Parent Address

   1        0          R1                (root)

   2        1          R2                R1

   3        2          R3                R2

   4        2          R5                R2

   5        4          R6                R5

   6        4          R7                R5

   7        2          R8                R2

   8        7          R9                R8

   In reality, it is not necessary to include information about R1 in
   the tree, since R1 is the encapsulation node. Similarly it is not
   necessary to include information about R2 in the tree, since the
   packet is unicast addressed to that node. Hence the information
   actually included in the packet would be just.

   0, 0, 2, 2, 0, 5

   with an address list of

   R3, R5, R6, R7, R8, R9

   This is interpreted as

   Entry    Parent     Entry Address     Parent Address

   1        0          R3                (root)

   2        0          R5                (root)

   3        2          R6                R5

   4        2          R7                R5

   5        0          R8                (root)

   6        5          R9                R8








Shand, et al.              Expires Dec 2000                  [Page 10]


                       Explicit Route Multicast              June 2000



6.2 Forwarding

   On receipt of an ERM encapsulated packet, an ERM router performs the
   following actions.

   1. Checks whether it is a delivery (egress) point for this multicast
      group, by examining the multicast destination address in the
      encapsulated multicast packet, and if so takes a decapsulated
      copy of the multicast packet, updates the TTL of that packet to
      be the TTL in the received ERM packet -1, and forwards the packet
      via normal multicast.

   2. Determines the next hop forwarding destination if any (see
      section 6.2.1 below) and forwards a copy of the entire
      encapsulated packet to each of those destinations with the
      following changes

   o  Set the destination IP address in the unicast header to the new
      destination

   o  Decrement the TTL in the unicast header

   o  Update the offset field of the ERM header

   o  Adjust the unicast header checksum accordingly

      Note that the source IP address in the unicast header is
      unchanged by the forwarding process. I.e. it remains set as the
      IP address of the encapsulating source ERM router.

6.2.1 Interpreting the tree list

   The children of node n are found by scanning the list looking for
   the value n. So in the example

   0, 0, 2, 2, 0, 5

   with an address list of

   R3, R5, R6, R7, R8, R9

   At the first hop node the offset has the value 0, so we look for
   that in the list and find entries 1 (corresponding to R3), 2
   (corresponding to R5) and 5 (corresponding to R8).

   At R3 the offset has the value 1, and since there are no entries
   with the value 1, we do not forward any further.

   At R5 the offset has the value 2, and we find entries 3
   (corresponding to node R6) and 4 (corresponding to node R7).


Shand, et al.              Expires Dec 2000                  [Page 11]


                       Explicit Route Multicast              June 2000


   At R6 the offset has the value 3, and there are no entries with
   value 3.

   At R7 the offset has the value 4, and there are no entries with
   value 4.

   At R8 the offset has the value 5, and we find entry 6 (corresponding
   to node R9).

   Note that while a generalized parent tree does not require any
   particular ordering, the forwarding algorithm can be improved
   slightly by requiring that it be in preorder form. This allows the
   search for the offset value to start at entry offset + 1 instead of
   having to scan the entire list.

6.3 Building the multicast tree

   Given a set of 'trace' lists such as those in the example above

   (D1) R3, R2, R1

   (D2,D3) R6, R5, R4, R2, R1

   (D4) R7, R5, R4, R2, R1

   (D5) R8, R2, R1

   (D6) R9, R8, R2, R1

   A parent tree can be constructed by processing each trace list in
   turn (in the order in which they arrived - see Section 6.3.1 below)
   and assigning sequential Ids to each unique router address
   encountered. The parent of each node can then be entered by taking
   the ID of the next router in the list. We don't need the address of
   the encapsulating router, so we enter zero for it's ID.

   So after processing the first trace we have

   2, 0

   With an address list of

   R3, R2

   After the second trace we have

   2, 0, 4, 5, 2

   With an address list of

   R3, R2, R6, R5, R4,


Shand, et al.              Expires Dec 2000                  [Page 12]


                       Explicit Route Multicast              June 2000


   And after all traces have been processed

   2, 0, 4, 5, 2, 4, 2, 7

   With an address list of

   R3, R2, R6, R5, R4, R7, R8, R9

   Note that this example is NOT in pre-order form. The actual tree-
   list in an ERM packet MUST be in pre-order form (see section 6.3.4).

   Because trace packets may be processed sequentially, it is a simple
   matter to accommodate a new receiver, merely by 'adding' its trace
   packet to the existing tree.

   To permit correct identification of non-branching and dead nodes
   (see below) it is necessary to record which nodes are terminators
   (i.e. R3, R6, R7, R8 & R9 in the example. In particular R8 is
   necessary to prevent it from being removed as a non-branching node,
   and hence failing to deliver packets to D5.)

6.3.1 Changing routes

   The algorithm above will always build a tree incorporating the most
   recent route from the root to any particular node, overriding any
   previous routes. This seems to be reasonable behaviour given that
   the most recently received trace packet probably reflects the most
   recent routing information.

   When routes change, it is likely that some portion of the tree will
   no longer reach any destination. Such 'dead' portions must be pruned
   off to avoid unnecessary bandwidth wastage. There are two obvious
   ways to deal with this.

   1. Detect that the parent of a node was already set and is being
      changed to a new value, then follow up the chain of old parents
      until a node is reached with more than one child (found by
      scanning the list looking for nodes with parents pointing to this
      node).

   2. Alternatively, the dead branches can be left in place, then
      pruned by performing a depth first exploration the entire tree
      from the root looking for nodes that do not lead to a delivery
      point.

6.3.2 Loops

   Partially looping trace packets (as a result of dynamic routing
   changes) will be dealt with naturally by the above algorithm. When
   the trace packet crosses its own path the loop will be removed from
   the tree just as if it had been a new route.


Shand, et al.              Expires Dec 2000                  [Page 13]


                       Explicit Route Multicast              June 2000


   Clearly, persistently looping trace packets will not arrive at their
   destination and will be treated as dropped trace packets. It is
   possible that such a packet may overflow the trace list before the
   hop count is exhausted. When there is no room in a trace packet to
   add an ERM entry, the packet should be discarded.

   Note that since a Trace Packet has the DF bit set in its outer IP
   header, a trace packet may also be discarded if it exceeds some link
   MTU.

6.3.3 Removing non-branching nodes

   The tree built by the above algorithm may include non-branching
   nodes (such as R4 in the example). These can be removed by
   performing a depth first exploration of the tree from the root and
   removing nodes that have exactly one child (a node which is also a
   terminator is never removed). Note that this must be done after any
   dead branches have been pruned, since removal of dead branches may
   generate further single child nodes.

   It is possible to perform the dead branch removal and non-branching
   node removal during the same exploration. However, this may not be
   desirable since a new trace packet can be added to the tree after
   dead branch removal, but NOT after non-branching node removal (since
   the new trace path may merge with the old at a node that was
   previously non-branching).

   Performing dead branch removal after each (set of) trace packet(s)
   may be desirable since it allows the memory used to store the dead
   nodes to be recovered.

6.3.4 Building the packet headers

   The tree lists in all ERM packet headers MUST be in pre-order form
   (in order to permit the previously described forwarding
   optimization). Packet headers in preorder form are easily built from
   the complete parent tree by performing a depth first exploration and
   reassigning node IDs. Note that for these purposes the trees are
   built with the first hop router(s) as the root. If there are
   multiple first hop routers (i.e. the encapsulating router is a
   branch point), there will be multiple distinct trees.

6.3.5 Removing a destination

   When a destination router detects that it has no more members of the
   group, it unicasts a 'prune-leave' message directly to the current
   source ERM router (current_source_ERM_router see section 7.4) and
   sets current_source_ERM_router to NO_MEMBERS. (The value NO_MEMBERS
   is returned when there is no record for the group at the destination
   router. I.e. there is no need to retain 'negative' state for the
   group after its members have gone away.) The prune message is


Shand, et al.              Expires Dec 2000                  [Page 14]


                       Explicit Route Multicast              June 2000


   carried in an IP packet with protocol type ERM, and has the
   following data format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ERM Type     |   n Groups    |       Reserved                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Source Address                            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                     Group Address 1                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Group Address _                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Group Address n                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

   o  ERM Type (1 byte) - Type of ERM packet = 2 (prune-leave) or 3
      (prune-change)

   o  n Groups (1 byte) - Number of Group Addresses

   o  Source Address - Multicast source IP Address

   o  Group Addresses (4 * n Groups bytes) - List of Group Addresses to
      be pruned.

   The only information required in the prune message is the [S, G] and
   the address of the destination router. The latter is obtained from
   the Source IP address of the IP packet. The encapsulating router can
   then mark the destination router as no longer being a terminator,
   and remove the dead branch by either of the techniques outlined
   above. Note that it is NOT necessary to have access to the original
   trace packet in order to remove it.

   Prunes are not acknowledged. If the prune message is lost, unwanted
   multicast data may still arrive. The value of NO_MEMBERS in
   current_source_ERM_router is deemed to match no source ERM router
   address, and hence results in the (rate limited) re-transmission of
   prune-leave messages as described in section 7.4.

6.3.5.1. Timer based destination removal

   A destination router may die without being able to send a prune-
   leave message, or it may become partitioned from the rest of the
   network. In these circumstances, we want the destination to be
   eventually removed from the delivery tree. This is achieved by the
   source ERM router maintaining a timer (n*t1) (on the order of a few
   minutes) with each destination. This is reset by the arrival of a
   trace packet from that destination. On expiry of the timer, the
   destination is removed as if a prune-leave message had been
   received.

Shand, et al.              Expires Dec 2000                  [Page 15]


                       Explicit Route Multicast              June 2000


   Destination routers maintain a timer with the value t1, and send a
   trace packet when it expires (irrespective of the arrival of
   multicast data - unlike the t2 timer described in section 6.4.2.1).

6.3.6 Memory scaling issues in the encapsulating router

   As a minimum the encapsulating router needs to store

   o  The set of all the unique ERM router addresses mentioned in trace
      packets it receives. Addresses of nodes pruned because they are
      on dead branches may be safely forgotten, but addresses of non-
      branching nodes must be retained in case they are subsequently
      needed.

   o  For each group (identified by [S, G] ), a node list of length N,
      (where N is the number of unique addresses in the set of trace
      lists for that group), consisting of offsets into the address
      list.

   o  For each group, a parent list of length N.

   o  Note that there is no need to keep the trace lists themselves.

6.4 Detecting and recovering from failures

6.4.1 Unicast routing topology changes

   ERM encapsulated packets are unicast between branch point ERM
   routers. Changes in unicast topology between ERM routers that do not
   affect reachability will simply be accommodated by normal unicast
   routing. ERM encapsulated packets will still be delivered to the
   next ERM router.

   Where the topology changes such that the existing delivery tree is
   no longer optimum (but is still connected), the old sub-optimal
   delivery tree will continue to be used until such time as it is re-
   evaluated as the result of receiving new trace packets. This may
   occur as a result of new receivers joining the group on destination
   routers that were not previously receiving the group, or as a result
   of delivery failure (see section 6.4.2.1). It will also occur
   periodically at an interval of t1 as described in section 6.3.5.1.
   Hence, the maximum time for which a non-optimal delivery topology
   will persist is t1, and it will usually be much less, especially in
   the part of the tree near the root, where multiple traces contribute
   towards the topology discovery.

6.4.2 Intermediate ERM router failures

6.4.2.1. Router Failures

   Failure of an intermediate ERM router on the delivery tree will
   cause all destinations below it to stop receiving data. Each

Shand, et al.              Expires Dec 2000                  [Page 16]


                       Explicit Route Multicast              June 2000


   destination router runs a timer (n*t2, where t2 is the expected
   maximum interval between data packets, and n is the number of lost
   packets which can be tolerated before recovery is initiated.) which
   is reset by the receipt of data for the corresponding [S, G]. On
   expiry of the timer, a new trace packet is sent towards the source,
   which will discover a new delivery path (should one exist). Since
   trace packet delivery is unreliable it is necessary to allow
   multiple trace packet attempts to be made until a trace-ACK is
   received (see section 7.1.2). However it may be that the source
   really is unreachable, and no acknowledgement will ever be received.
   It would be wasteful to continue trace attempts under those
   circumstances.

   A counter C is maintained per [S, G] and is incremented on each
   transmission of a trace packet containing G towards S. Receipt of a
   trace-ACK referring to [S, G] resets the counter. If the counter
   exceeds some limit L, no further trace attempts are made for [S, G]
   until the process is re-initiated by the application and somehow
   that fact is reported to the application.

   It is envisaged that t2 would be of the order of a second (perhaps
   less) to allow recovery of a voice over IP connection within a few
   seconds. However, sending trace packets at this frequency would be
   expensive. Therefore, in the absence of any real data for [S, G] for
   a period t2, the encapsulating router sends a dummy 'heart-beat' ERM
   encapsulated packet carrying no data packet. These have ERM type
   130, with a standard ERM tree header followed by [S, G].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ERM type     |  List Size    |   Offset      |      TTL      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Checksum             |   Tree list 1 |   Tree list 2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Tree list n |  Pad 1        |   Pad 2       |   Pad 3       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                     Address list 1                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list 2                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list n                            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                Multicast Source Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Multicast Group Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Receipt of such a packet by the destination router causes the timer
   to be reset in the same way as a normal data packet and hence
   inhibits the recovery attempt, but no output multicast packet is
   generated.

Shand, et al.              Expires Dec 2000                  [Page 17]


                       Explicit Route Multicast              June 2000


   Provided ALL the destination routers that are no longer reachable
   succeed in re-establishing a working delivery path, any 'dead'
   branches will be completely pruned by the mechanisms described in
   section 6.3.1. However, if one or more destinations remain
   unreachable the branches leading to them will not be pruned until
   the expiry of the destination holding timer.

6.4.2.2. Silent sources

   The heartbeat mechanism described above is satisfactory for dealing
   with short periods of silence, but becomes somewhat inefficient when
   sources exhibit long periods of silence. This can be alleviated by
   defining a holding time in the heartbeat, which would normally be
   n*t2. This is the period after which the receiver will begin to
   trace at t2. Setting this to a longer time allows the heartbeat rate
   to be reduced or even ceased altogether. However, this introduces a
   problem when transmission is resumed, since there is no mechanism to
   force a trace if there has been a failure during the period of
   silence.

   A solution to this problem is as follows. On resumption of
   transmission the source ERM router sets a timer per destination
   router, and ERM multicasts a heartbeat packet with a zero holding
   time, thus triggering traces from all the destination routers.
   Receipt of the trace from a particular destination router clears the
   timer, but if no trace is received from a particular router, expiry
   of the timer causes a heartbeat packet to be unicast directly to the
   destination router concerned. If the destination is reachable at
   all, this packet (or possibly a subsequent one if there is random
   packet loss), will cause the destination to begin tracing to repair
   the ERM path. Failure to receive a response after (some number of)
   retries will cause the destination to timeout completely.

6.4.2.3. Router reachability failures

   ERM router reachability failures are indistinguishable from router
   failures, and are dealt with by the same mechanism.

6.4.3 Destination ERM router failures

6.4.3.1. Router Failures

   Failure of the destination router causes state for the [S, G] to be
   lost. If there is only one ERM router to which the multicast
   receiver can join, then recovery is impossible (until the router in
   question is re-booted). The branch of the tree leading to the
   unreachable destination will eventually be pruned by the expiry of
   the destination holding timer as described in section 6.3.5.1.

   If there are multiple routers, then normal multicast operation will
   result in another router receiving the IGMP joins, and beginning the
   trace registration process in its own right. However the source ERM

Shand, et al.              Expires Dec 2000                  [Page 18]


                       Explicit Route Multicast              June 2000


   router will treat this as a completely distinct delivery point, and
   will continue to attempt delivery to the old DR until its holding
   timer expires as above. This will result in a period of unnecessary
   packet transmission, but this will usually be restricted to the last
   hop.

6.4.3.2. Router reachability failures

   These are dealt with as for intermediate ERM router reachability
   failures as described in section 6.4.2.3. If another route exists,
   recovery will be complete. If not, the destination will eventually
   be pruned by the expiration of the destination holding timer as
   described in section 6.3.5.1.

6.4.4 Source ERM router failures

   If the source ERM router fails, all the tree state will be lost.
   Normal recovery mechanisms will result in destination nodes re-
   sending trace packets towards the source. If another route to the
   source exists, this may result in a new router becoming the
   encapsulating router and building a new tree as usual.

   If the source ERM router doesn't fail, but is partitioned from the
   rest of the network, a new source ERM router may be initiated while
   the old source ERM router eventually (n*t1) prunes off its delivery
   tree as a result of the failure of periodic destination refresh.

7. Route discovery

   The information for building the delivery tree is obtained from
   trace packets sent from the destination nodes towards the source of
   the group. The exact form of the trace packet mechanism is described
   in section 7.1. The way the delivery nodes discover the source
   address is described in section 11.

   In the first instance we will assume that the source ERM router is
   the (single) router adjacent to the source. I.e. the ERM router
   knows that it is the encapsulating source ERM router by its
   proximity to the source. Later we will discuss how to extend this to
   permit the source to be separated from the encapsulating router(s)
   by a multicast cloud. Details of this are described in section 7.2

7.1 Trace Packets

   A trace packet is sent from a destination when the first member of a
   group joins, and periodically as described above. The trace packet
   builds a list of ERM capable routers traversed in order to reach the
   source ERM router.





Shand, et al.              Expires Dec 2000                  [Page 19]


                       Explicit Route Multicast              June 2000


7.1.1 Choosing a unique IP address

   There needs to be some guarantee about which IP address goes in the
   list. If the same node puts different IP addresses in the list for
   the same path, it will prevent correct identification of the
   delivery tree. If it uses the IP address of the outbound (from the
   source) interface, then a single node may appear as different nodes
   in two different traces, hence causing packet duplication over the
   upstream hop. If it uses the inbound (from the source) interface,
   (i.e. the interface address the trace packet is forwarded over),
   crossing paths won't be detected, since they will appear to be
   different ERM routers. Always choosing the lowest interface address
   avoids these problems, but it may cease to be available if the
   interface fails, causing perturbation of the ERM routes, even the
   infterface was not directly involved.

7.1.2 Acknowledgement of trace packets

   The source ERM router acknowledges receipt of a trace packet, by
   sending an ERM encapsulated packet to a sub-tree of the optimised
   multicast delivery tree, which contains only the relevant
   destination router. No additional optimisation is performed on the
   tree, which may therefore contain multiple hops. Thus the
   acknowledgement packet is delivered over the same path which will be
   used for the delivery of multicast traffic and 'shares fate' with
   that traffic. Note that the acknowledgement of the first trace
   packet for the group will be delivered directly to the destination
   router, since the multicast tree will consist entirely of that one
   hop. As more destination routers are added, the tree will approach
   the final multicast delivery tree.

   The ERM type is 129 (trace-ACK - the high order bit indicating that
   it contains an ERM route and should be forwarded using standard ERM
   forwarding) and the 'encapsulated data' consists only of [S, G] and
   the two byte sequence number of the trace packet being ACKed.


















Shand, et al.              Expires Dec 2000                  [Page 20]


                       Explicit Route Multicast              June 2000


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ERM type     |  List Size    |   Offset      |      TTL      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Checksum             |   Tree list 1 |   Tree list 2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Tree list n |  Pad 1        |   Pad 2       |   Pad 3       |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                     Address list 1                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list 2                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list n                            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                Multicast Source Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Multicast Group Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Sequence Number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   On triggering a trace packet for a group, the destination router
   sets the value of current_source_ERM_router for that group to zero.
   The trace packet is re-sent every TRACE_REPEAT_INTERVAL seconds,
   incrementing the sequence number on each transmission until a trace
   ACK with the current sequence number is received. (The value of
   TRACE_REPEAT_INTERVAL needs to be guaranteed to be greater than the
   normal round trip time for trace /trace-ACK packets between the
   destination and the source). The IP address of the source ERM router
   for that multicast source (from the IP source address of the trace
   ACK packet) is then recorded in current_source_ERM_router. This is
   used to detect changes in the Source ERM router (see section 7.4).

7.1.3 Becoming the source ERM router

   When an ERM router determines that it is the source ERM router (as
   described elsewhere), it performs the actions associated with a
   member of that group sending an IGMP register. I.e. it does a PIM
   join or whatever, to pull down the multicast traffic for that [S,
   G].

7.1.4 Ceasing to be the source ERM router

   When the last destination for [S, G] is removed from a source ERM
   router (either as a result of receiving an ERM prune, or as a result
   of the destination holding timer expiring), the router performs the
   appropriate multicast leave operations and purges all state for [S,
   G].



Shand, et al.              Expires Dec 2000                  [Page 21]


                       Explicit Route Multicast              June 2000


7.1.5 Incongruent unicast and multicast topologies

   The ERM trace packet follows the unicast topology towards the
   source, and hence the distribution tree branch points will be
   determined by the unicast routing topology. However, where the
   unicast and multicast topologies are incongruent (as provided for by
   MBGP, for example), an ERM capable router can choose to forward the
   ERM trace packet according to its multicast forwarding information.
   Provided ERM capable routers are located at the critical divergence
   points between the two topologies, this will allow some degree of
   separation of the two topologies. The separation will not be
   complete, however for two reasons.

   1. When forwarding ERM trace packets a non-ERM capable router will
      choose the unicast forwarding path even if it has a multicast
      path (since it has no knowledge that the ERM trace packet has
      anything to do with multicast).

   2. When forwarding ERM data between ERM branch points, any router
      (whether or not it is ERM capable) will forward according to the
      unicast forwarding path to the next branch point.

   There is a risk of looping trace packets, where the unicast and
   multicast topologies are sufficiently diverse and the trace packet
   is forwarded according to the multicast topology by some routers and
   according to the unicast topology by others. The capability of ERM
   routers to forward trace packets according to the multicast topology
   should therefore be used only with some caution.

7.2 Trace packets

   A unicast packet containing the router alert option is addressed to
   the source address. It is forwarded normally by non-ERM capable
   routers (but it will incur the penalty of RA processing to determine
   it is not interesting). ERM capable routers append their IP address
   to the list, update the offset and re-forward it towards the source
   address.

   The packets have the following format (see also the modifications
   proposed in sections 7.3.1 and 8.1):-

   o  Normal unicast IP header with RA option and "Don't Fragment" set.

   o  Destination address = source address of multicast group

   o  Source Address = Destination router ID

   o  Total Length = IP Header length + (max trace length +1)*4

   o  Protocol  = ERM

   o  Data, comprising

Shand, et al.              Expires Dec 2000                  [Page 22]


                       Explicit Route Multicast              June 2000


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  n Groups     |     Offset    |       Sequence Number         |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                     Group Address 1                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Group Address _                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Group Address n                           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                     Address list 1                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list _                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list m                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



   o  n Groups (1 byte) = Number of Group Addresses

   o  Offset (1 byte) = Offset (in 4 byte units) of next free position
      (initialised to zero)

   o  Sequence number (2 bytes)

   o  Group Address (n Groups * 4 bytes) = List of Group Addresses to
      which this trace refers

   o  Address List (max trace length * 4 bytes) = List of ERM router
      addresses (initialised to zero)

7.3 Locating the source ERM router(s)

   Up to now, it has been assumed that the source ERM router is
   adjacent to the source host and can identify itself as such. The
   source ERM router

   1. Records the state information from the trace packets, and builds
      the delivery tree.

   2. Sends a trace-ACK to the originator of the trace packet.

   3. Encapsulates subsequent multicast data packets for [S, G].

   4. Performs whatever actions are necessary to pull down multicast
      traffic for the group (see section 7.1.3)

   An ERM router that is NOT the source ERM router retains no state
   from the trace packets it forwards (in accordance with design
   constraint 1).

Shand, et al.              Expires Dec 2000                  [Page 23]


                       Explicit Route Multicast              June 2000


   However, this assumption places the considerable constraint on the
   design that all sources must be adjacent to an ERM capable router.
   In reality the source may be separated from the 'first' ERM capable
   router by a conventional multicast domain because

   1. It is not feasible to deploy ERM capable routers adjacent to
      every host.

   2. It is required for administrative reasons to use conventional
      multicast for that portion of the delivery tree.

7.3.1 Remote source detection.

   A small modification to the format of the trace packet allows the
   above constraint to be relaxed. The data portion of the 'trace'
   packet is carried as the data portion of an ICMP echo request packet
   with destination address the group source IP address, source address
   the destination router IP address (initially) and an RA option.  The
   ICMP echo request identifier is set to the protocol type assigned to
   ERM to provide some protection against aliasing with genuine 'ping'
   traffic.

   An ERM router intercepting the packet updates the trace information
   with its own unique IP address (including adjusting the offset
   pointer and the IP header total length field) and also sets the
   source IP address of the ICMP packet to be its own unique IP
   address. Both the ICMP and IP header checksums must be modified as
   appropriate.

   Any source host receiving the packet acts on it as a normal echo
   request and returns it to the last ERM router (i.e. the most recent
   value inserted as the ICMP source IP address) as an echo reply with
   the trace data intact. On receiving such a packet, the ERM router
   establishes itself as source ERM router, builds the initial part of
   the tree from the enclosed trace list, and sends a trace-ACK to the
   initiator of the trace (i.e. the first address in the trace list).

   There's an obscure case that needs discussion. Since the returned
   echo response packet will, presumably, still have the RA option set,
   it will be examined by all routers between the source and the source
   ERM router. It is possible, perhaps as a result of dynamic topology
   changes, or asymmetric routing, that one or more of these routers
   may be ERM capable. This gives rise to the strange situation that an
   ERM router has been found that appears to be 'closer' to the source
   than the router which had previously been identified as the source
   ERM router. However, this is 'closer' in the source to destination
   sense. Since multicast routing uses RPF, the original, which is
   closer in the destination to source sense, is preferred. If it turns
   out that the dynamic routing changes converge such that the second
   router really is 'closer' in the required direction, then the source
   ERM router change procedures described in section 7.4 will
   ultimately resolve the situation. Hence any echo responses seen by

Shand, et al.              Expires Dec 2000                  [Page 24]


                       Explicit Route Multicast              June 2000


   an ERM capable router, which are not directly addressed to it, can
   safely be ignored.

7.4 Changing the source ERM router

   When the source ERM router is not an immediate neighbour of the
   multicast source, routing changes may result in a different source
   ERM router being identified by subsequent trace packets. The new
   source ERM router will begin to encapsulate data packets down the
   delivery tree, but the original source ERM router will also continue
   to encapsulate packets down its delivery tree, until the destination
   router holding timer (see section 6.3.5.1) expires. Thus multicast
   data will be duplicated for the period of the destination router
   holding timer.

   In order to minimise the period of duplication, a destination router
   checks the source address (i.e. the address of the encapsulating
   source ERM router) of each ERM encapsulated packet received,
   including heartbeat packets. If it does not match the value of
   current_source_ERM_router corresponding to the IP source address of
   the encapsulated multicast packet (or that of the heartbeat packet),
   it indicates that duplicate data may be being received. The data (if
   any) is delivered in any case (a short duration of duplication is
   preferable to the risk of dropping data erroneously), but an ERM
   prune-change is triggered, to be unicast directly to the
   unrecognised source ERM router. These prunes are rate limited.

   A value of zero in current_source_ERM_router (indicating that the
   current source ERM router is unknown because a trace is in progress,
   as described in section 7.1.2), is deemed to match any source ERM
   router. No prunes are sent until the correct source ERM router has
   been identified, by receiving a trace-ACK.

   A value of NO_MEMBERS in current_source_ERM_router (indicating that
   the destination router no longer has members for the group as
   described in section 6.3.5) is deemed to match no source ERM router.
   Hence, rate limited prune-leaves are sent to the source address of
   the encapsulated packets in response to ERM encapsulated data for
   the group from any source ERM router.

7.5 Failures in the multicast delivery to the source ERM router etc.

   Since the source ERM router is sending heartbeats towards the
   destinations to suppress traces even in the absence of multicast
   data, only periodic traces will be generated while the delivery path
   between the source ERM router and the destinations remains intact.
   This is true even if there is a multicast delivery failure between
   the source and the source ERM router(s). If the traces from the
   destinations had not been suppressed, they might have been able to
   discover a new source ERM router, which had connectivity to the
   source.


Shand, et al.              Expires Dec 2000                  [Page 25]


                       Explicit Route Multicast              June 2000


   It is not possible to use a heartbeat from the source to the source
   ERM router(s) to detect this type of failure, since this would
   require co-operation from the source host. Once a source ERM router
   has joined the conventional multicast delivery tree, it is the
   conventional multicast protocols which will (attempt to) maintain
   the delivery path from the source to the source ERM router(s).
   Failures of intervening routers and links should (if connectivity
   still exists at all) not affect the reliable delivery of multicast
   data to the source ERM router(s). If multicast routing fails to
   deliver multicast data to a particular source ERM router, then it is
   possible that the ERM router has become partitioned from the
   conventional multicast network. If this is the only feasible source
   ERM router, then recovery is impossible. But it may be that some
   other potential source ERM router still has multicast connectivity.

   Each destination router is sending periodic traces at the rate of
   once per t1 seconds. In the steady state these will all converge on
   the source ERM router in question. Thus, there are m opportunities
   per t1 seconds for a periodic trace packet to discover an
   alternative source ERM router, where m is the number of destination
   routers associated with both the multicast source and the source ERM
   router in question.

   When such a trace packet discovers an alternative source ERM router,
   the mechanisms described in section 7.4 will cause a prune-change
   message to be sent to the original source ERM router. On receipt of
   such a prune-change message, the source ERM router performs the
   normal prune-leave action of removing the associated destination
   router from the delivery tree. In addition, it ceases transmitting
   downstream heartbeat packets to all destination routers associated
   with the source. Sending of heartbeat packets is not resumed until a
   period of (N+1)*t2 seconds has elapsed and a new trace packet for
   the source has been received. In the absence of genuine multicast
   traffic, this will cause the remaining destination routers served by
   this source ERM router to begin non-periodic tracing, and hence
   rapidly discover the new source ERM router if appropriate. If, on
   the other hand, multicast data is still arriving at the source ERM
   router, then this confirms that the conventional multicast delivery
   tree is still intact, and there is no harm in the non-periodic trace
   messages continuing to be suppressed.

   The effect of these mechanisms is that such a failure will be
   repaired for the first destination in an average time of about t1/m
   seconds, and the remaining destinations should catch up in a further
   period of n*t2 seconds.

7.6 Summary of timers

   The previous sections have identified a number of timers. Their use
   is summarised here for clarity.



Shand, et al.              Expires Dec 2000                  [Page 26]


                       Explicit Route Multicast              June 2000


7.6.1 Periodic timer  t1

   This timer is used for periodic functions to discover more optimal
   topology. Destination routers send periodic trace packets every t1
   seconds, and failure to receive such a packet from a destination
   router for a period of n*1 results in the state for the destination
   being pruned.

   A plausible value for t1 is 60 seconds.

7.6.2 Error recovery timer  t2

   This timer is used for protocol functions associated with the
   recovery from errors such as failed routers and links. The source
   ERM router guarantees to send ERM encapsulated data (genuine
   multicast traffic, heartbeats, or track-ACKs) at least once per t2
   interval. Failure to receive such data for a period of n*t2 results
   in the destination router initiating a trace. In the absence of a
   trace-ACK, such traces are repeated up to TRACE_FAILURE_COUNT times
   at an interval of TRACE_REPEAT_INVERVAL. Once the count has been
   exceeded, the destination router abandons further attempt to join
   that [S, G] (until when?).

   Service interruption as a result of router or link failure will be
   at least n*t2 seconds, rising in increments of TRACE_REPEAT_INTERVAL
   seconds if trace packets are lost.

   A plausible value for t2 is 1 second.

7.6.3 Trace Repeat Interval

   The interval between non-periodic trace attempts. This is probably
   n*t2.

8. Multicast capable subnetworks

   The algorithms described so far will not take advantage of a
   subnetwork that has layer 2 multicast capability. A separate copy of
   the data packet will be unicast to each child router on the subnet.
   This compares poorly with conventional IP multicast which will
   (usually) multicast a single copy of each data packet to all the
   downstream routers on the subnet. The following sections describe
   enhancements to permit a similar optimisation for ERM.

8.1 Modified trace packets

   When transmitting a trace packet, the source IP address of the
   enclosing ICMP packet is set, not to the ERM 'unique' IP address,
   but to the actual transmitting interface IP address. The ERM
   'unique' address is still inserted in the trace list as before.



Shand, et al.              Expires Dec 2000                  [Page 27]


                       Explicit Route Multicast              June 2000


   When receiving a trace packet over a layer 2 multicast capable LAN
   (only), a check is made to determine whether the source IP address
   of the enclosing ICMP packet is a direct neighbour over that
   interface. If so an additional pseudonode identifier is inserted
   into the trace list before also inserting the ERM 'unique' address
   as normal. The pseudonode identifier is assigned uniquely to the
   interface within the scope of that router, and has the top 3 bits
   (i.e. class D) set to allow it to be distinguished from a genuine
   node identifier (since a class D address will never be used as the
   ERM 'unique' address).

   The additional cost of this is 4 bytes per multicast capable
   subnetwork traversed by the trace packet. In the worst case this
   could double the number of entries.

8.2 Modified tree building algorithm

   When processing the trace packet the pseudonode identifier is
   removed (i.e. it never appears in the tree address list placed in
   the ERM data packet header.). However its presence is noted. If the
   set of logical children of a particular pseudonode (i.e. the
   children of the parent of the pseudonode whose traces include the
   pseudonode) has two or more members, those children are retained
   even if they themselves have only one child. This ensures that the
   routers that will receive the multicast ERM data packet will always
   appear in the address list even if they are not a branch point. This
   is necessary to enable them to identify their position in the
   delivery tree, since the multicast packet is of necessity the same
   for all recipients. In addition each pseudonode of a particular
   router is given a unique identifier in the range 1-15. This may or
   may not correspond to the original interface number, which may or
   may not be encoded in the pseudonode identifier carried in the trace
   packet. This identifier is encoded in the top 4 bits of the tree
   list entry for each child (the bottom 4 bits being the offset of the
   parent).

   In order to maximise the size of distribution tree which can be
   accommodated within the 4 bit parent offset restriction, it may be
   observed that no leaf node (i.e. one with no children, whether or
   not it is a delivery point), by definition, is ever referenced as a
   parent. By arranging that all leaf nodes appear at the end of the
   trace list (this can be done without breaking the pre-ordering
   requirement), it can be guaranteed that all of the 15 available
   parent offset Ids are assigned to nodes which are referenced as
   parents.








Shand, et al.              Expires Dec 2000                  [Page 28]


                       Explicit Route Multicast              June 2000


8.3 An example


            R1 R11
           A \ |  B
   R5--R3--|  \|  |-R9
   R6--/   |--R2--|
           |   |  |-R8
   R7--R4--|   |
               |
          C----------
                 |

                 R10

   We would expect the following trace packets towards router 1.

   5,3,A,2,1
   6,3,A,2,1
   7,4,A,2,1
   10,C,2,1
   8,B,2,1
   9,B,2,1
   11,2,1

   We keep 4, even though it is not a branch point, because A has
   multiple children (3 and 4).

   The resulting trace list would be (using hex to make the top and
   bottom 4 bits clearer)

   00,01,12,12,03,03,04,22,22,02,02

   with an address list of

   r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11

8.4 Modified forwarding algorithm

   When searching for children, only the bottom 4 bits of the trace
   list entry are compared to the parents index. Thus in the above
   example, R2 will treat R3, R4, R8, R9, R10 and R11 all as its
   children. However, it performs the following additional tests on the
   set of children.

   1. If the top 4 bits are zero, a copy is unicast to the child as
      normal. Thus R10 and R11 receive unicast copies.

   2. If the top 4 bits are non-zero, it performs a next hop lookup on
      the corresponding IP address, to determine the output interface,
      and multicasts, to all-ERM routers, a copy over that interface.
      I.e. the destination address of the outer IP header is set to the

Shand, et al.              Expires Dec 2000                  [Page 29]


                       Explicit Route Multicast              June 2000


      all-ERM routers IP multicast address. It then marks all
      subsequent entries in the children set as having been processed.
      Thus R3 has an entry of 12, hence it looks up R3, determines that
      interface A is the output interface, multicasts a copy over that
      interface and sets the remaining entries for 12 (R4) as
      processed. The next child of R2 is then R8, and the process is
      repeated over interface C.

   On receipt of a multicast copy, an ERM router needs to find its own
   position in the address list. This could be achieved by always
   scanning the address list looking for one's own address, but that
   involves n 4 byte compares. The number of compares can be restricted
   by the following algorithm. Note that when sending the multicast
   copy, the pointer is always adjusted for the first of the children.

   If the entry corresponding to the pointer has the top 4 bits zero no
   address check is required and the algorithm works as before. If the
   top 4 bits are non zero, the address corresponding to the pointer is
   checked against the receiving router's own address. If it matches,
   the current location is correct and forwarding can proceed as
   normal. If not, it searches along the tree list (starting from the
   current location) looking for the same value (including the top 4
   bits) as the current entry. If it find it, it again checks the
   corresponding address, and if it matches, the search terminates and
   the packet is forwarded as normal, otherwise the search continues.
   If the search fails to find an entry the packet is discarded, since
   this must have been a multicast packet received by virtue of a
   router which is not a member of the tree being on the same LAN as
   members of the tree.

   To proceed with the example, when R3 receives the multicast copy it
   finds the pointer at 3, and the top 4 bits of the entry have the
   value 1. It therefore checks its own address against 3, finds a
   match and forwards normally. When R4 receives the packet, it
   performs the same checks, but this time the address doesn't match,
   so it searches for the next entry (4) with the value 12. The address
   check of 4 now succeeds and forwarding proceeds normally. If there
   were another router (R12 say) on LAN A, which was not part of the
   delivery tree, it too would receive the multicast packet, but both
   checks would fail, and so it would be discarded. Unfortunately this
   means that on a LAN with n downstream members of the tree, not only
   do each of those routers have to perform up to n checks (overall a
   total of n(n+1)/2 checks per packet), but every router on the LAN
   which is not a member of the tree must also perform n (failing)
   checks.

8.4.1 Setting the TTL

   Transmitting to the all-ERM routers multicast with a TTL greater
   than one could potentially allow these packets to be multicast
   forwarded off the LAN. Therefore the TTL in the outer IP header of
   such packets MUST be set 1. In order to preserve the 'real' TTL,

Shand, et al.              Expires Dec 2000                  [Page 30]


                       Explicit Route Multicast              June 2000


   this is first copied from the outer IP header to the TTL field of
   the ERM header. On reception of such a multicast packet, the
   original TTL (suitably decremented) is restored in the outer IP
   header.

9. Further Optimisations?

   Further optimisations may be possible, taking advantage of the fact
   that there may be multiple groups that share the same source or
   source ERM router(s). On balance, they may not be worth the extra
   complexity they introduce, but they are discussed below for
   completeness.

9.1 Multiple groups with the same source

   A destination router may have multiple active groups that share the
   same multicast source (and implicitly, the same source ERM router).
   In this case a single trace MAY be sent for the entire set of
   groups, containing a list of the group addresses to which it refers.
   Not only does this reduce the bandwidth requirements for trace
   packets, but it may also reduce the memory requirements in the
   source and destination ERM router(s). In this case the destination
   ERM router need maintain current_source_ERM_router only per source,
   and not per [S, G].

   However, separate trace-ACKs will still be required, since the ERM
   delivery trees for the groups may be different, owing to their
   different membership. For similar reasons, it is not possible to use
   traffic or ERM heartbeats arriving at a destination router for one
   group to imply correct operation of any other group even though they
   share the same source and source ERM router.

9.2 Minimising the source impact of trace packets

   The algorithms described above require the multicast source to
   process at least one trace packet per periodic time t1 for every
   distinct destination router served by all groups to which it is
   transmitting. Under normal conditions the traces will all converge
   at one (or more) source ERM router(s). The portion of the trace from
   the source ERM router to the multicast source and back is only
   necessary to detect the arrival of a new ERM capable router closer
   to the source (once the initial detection of the source ERM router
   has been accomplished). Limiting the number of such packets can
   therefore reduce the load on the multicast source.

   When a router sees a trace packet travelling to the source S, and it
   already has encapsulation state for [S, *] (I.e. it is a source ERM
   router for [S, *]) it can 'short circuit' the delivery of such trace
   packets provided at least one such packet per t1 interval is allowed
   to pass unimpeded. Such short-circuited packets must be processed as
   if they had been received as responses from the source. This reduces
   the load on the multicast source to one trace packet per t1

Shand, et al.              Expires Dec 2000                  [Page 31]


                       Explicit Route Multicast              June 2000


   interval, but still maintains the possibility of discovery of a
   closer source ERM router within that interval.

   Note that this optimisation does not necessarily conflict with the
   mechanisms described in section 7.5 for recovering from failures
   which require the detection of a new source ERM router. Only those
   trace packets that actually pass through the current source ERM
   router are affected. Where the path from the destination router to
   the potential source ERM router does not pass through the existing
   source ERM router there will still be m opportunities per t1
   seconds.

9.3 Multiple groups with the same source ERM router, but different
    sources

   Since the sources are different, it is necessary to use separate
   traces, as they may subsequently identify different source ERM
   routers for the groups. The same arguments as above preclude the use
   of common trace-ACKs or heartbeats unless the delivery trees of the
   groups are identical.

9.4 Minimal Encapsulation

   In the existing encoding for an ERM data packet, several fields are
   duplicated between the original multicast header and the IP header
   of the ERM data packet. By using similar techniques to RFC 2004 [3
   "Minimal Encapsulation within IP", this duplication can be avoided.
   The modified ERM data packet is as follows:-

























Shand, et al.              Expires Dec 2000                  [Page 32]


                       Explicit Route Multicast              June 2000


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ERM type     |  No of nodes  |   Offset      |      TTL      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Checksum             |   Prot        |   Tree list 1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Tree list 2 |  Tree list n  |   Pad 1       |   Pad 2       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list 1                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list 2                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Address list n                            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            Original Multicast Source Address                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Original Multicast Destination Address             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Original Multicast Data ...
   +-+-+-+-+-+-+-+-+-+-+-+-+


   The original multicast source and destination addresses from the
   multicast packet (together with the data of course) are kept intact,
   but the preceding parts of the original header are stripped out.

   The original Prot is stored in the ERM header (the outer IP header
   prot, being set to ERM).

   ToS, ident, frags and TTL are copied into the outer IP header (and
   copied back on decapsulation).

   The ERM header TTL field is still required for multicast capable
   LANs as described above in section 8.

   The ERM checksum covers the original multicast source and
   destination addresses as well as the other ERM header fields.
   However if all the ERM header fields were covered it would be
   necessary to incrementally adjust the checksum when the offset (and
   TTL) are changed. Therefore, the ERM checksum is defined to start at
   the checksum field. Omitting the offset exposes the venerability to
   corruption of the offset but this can only cause the following
   errors: -

   1. If it is corrupted to an earlier value, the packet may be
      returned to a previous point in the delivery tree, causing
      duplication along the branches not leading to the current router.
      The branch below the current router will receive a single copy of
      the packet. No data loss will occur, and a single instance of
      corruption will only cause a single instance of duplication,
      since the contents of the offset field will be reset.

Shand, et al.              Expires Dec 2000                  [Page 33]


                       Explicit Route Multicast              June 2000


   2. If it is corrupted to a later (but still valid) value, the
      intervening branches will suffer packet loss (since the packet
      will appear to 'jump' to the later point in the tree).

   3. If it is corrupted to an invalid value, the branch below the
      current router will suffer packet loss.

9.4.1 Benefits

   This saves 11 bytes (or thereabouts, because of alignment issues)
   compared to the full encapsulation. For example, in what might be a
   common case of three delivery points from a common fan out (i.e. 3
   addresses in the address list), the minimal encapsulation would cost
   a total of 32 bytes encapsulation overhead, compared to 44 bytes
   with full encapsulation headers. For a typical uncompressed VoIP
   packet, which is about 50 bytes, that's a 64% overhead compared to
   88%. This should be compared to the 'overhead' of using separate
   multicast packets, which in the above case would be 200%, since 3
   unicast packets would be required.

9.4.2 Costs.

   This form of encapsulation is less efficient for encapsulation and
   decapsulation. It also precludes the possibility of using a separate
   ToS value for the ERM encapsulation, since this must be copied from
   the original multicast packet.

9.4.3 Problems with Fragmentation.

   Fragmentation is an issue with ERM, whatever encapsulation is used,
   since an ERM packet may be fragmented by intermediate non-ERM
   routers (i.e. by performing normal IP fragmentation on the outer IP
   header). Since not all fragments will contain the ERM header, and
   hence cannot be ERM forwarded, it is necessary for ERM routers (i.e.
   the destination of the outer IP header) to perform re-assembly
   before ERM forwarding.

   The situation is complicated if the 'minimal encapsulation' ERM
   header is used, since there is then only one set of fragmentation
   information. If the multicast packet were already fragmented before
   ERM encapsulation, it would invoke re-assembly at each ERM hop.
   Presumably, since it required fragmentation in the first place, it
   would then need to be re-fragmented for transmission.

   Given that (potential) re-assembly at every ERM router is highly
   undesirable, the best solution may be to set the "don't fragment"
   bit in the outer IP header, and hence never do any re-assembly at an
   ERM router. (This would be desirable even with a full header
   encapsulation). In the case of 'minimal encapsulation' it would be
   necessary to find a single bit somewhere in the ERM header to carry
   the original value of the DF bit. However, if the original multicast
   packet had previously been fragmented this could result in a packet

Shand, et al.              Expires Dec 2000                  [Page 34]


                       Explicit Route Multicast              June 2000


   with DF set AND non-zero values for either or both of MF and
   fragment offsets! It is not clear whether this would be treated as
   an error by any IP implementation. If it would, then it would be
   necessary to store the whole 16 bits of the fragmentation fields in
   the ERM header, which makes the 'minimal' somewhat less attractive.

9.5 Final hop optimisation

   The ERM tree information is never required for the final hops (that
   is, from the last fan out point to the delivery router - except of
   course for the cases where the last fan out router IS a delivery
   router). By stripping this out of the packet before the final
   forwarding, another 20 bytes could be saved (for the 3 way example)
   reducing the overhead to 24% for those hops. (The ERM type, prot,
   checksum and the SA and DA. are still required, giving a 12 byte
   overhead = 12 bytes). Such a packet shrinking operation is likely to
   be rather costly, but could perhaps be justified by the fact that
   the last hop is likely to be at the edge of the network and hence
   have lower bandwidth capable links.

   Another way of looking at this is to say that for any particular
   hop, sending a single ERM packet is roughly comparable to sending 2
   unicast packets (for 50 byte packets and small tree lists). So it is
   only on the final hops (where there would be no packet duplication
   even in the multiple unicast) that ERM is at a serious disadvantage.
   On hops which would require 3 or more unicast packets ERM almost
   always wins (except for large tree lists > 15 nodes, which is
   outside the design scope).

   Of course ERM can never do better than true multicast.

10. Limiting the size of a group

   Explicit Route Multicast is designed to operate with a 'small'
   group. However, the limiting factor is not the size of the group
   (i.e. number of group members) per se, but rather the size of the
   encoded delivery tree. This depends on the number of destination
   routers (which may each serve multiple group members) and also on
   the topology of the delivery tree.

   N delivery nodes require an encoded tree of length 2N-1. The best
   case (ignoring the trivial case where the encapsulating router sends
   a packet directly to each delivery node, requiring a zero length
   encoded tree) is where each node on the tree is itself a delivery
   node, which requires an encoded tree of length N.  Thus for any set
   of N delivery nodes, the encoded length may vary between N and 2N-1
   depending on the topology. Since the topology may change during the
   lifetime of the group, the encoded tree length may also change
   between these limits even if the number of delivery nodes remains
   constant. Equally, the number of delivery nodes may change as nodes
   join and leave the group.


Shand, et al.              Expires Dec 2000                  [Page 35]


                       Explicit Route Multicast              June 2000


   A number of possible strategies for controlling the size of the
   encoded tree are discussed below.

10.1 Limiting the number of members

   The absolute worst case is when each delivery node serves a single
   member, and the topology requires worst case encoding. Hence, if a
   maximum of 5 group members were permitted, the encoded tree would
   never exceed 9 addresses. This would be a simple strategy to explain
   to users, but is severely restrictive. It is also hard to police,
   since the actual number of group members is unknown to the ERM
   protocols.

10.2 Limiting the number of destination routers

   Since each destination router is required to perform a trace and
   receive a trace-ACK when it 'joins' the group, it is possible for
   the source ERM router to check the current number of destination
   routers, and reject the join attempt (by sending a trace-NACK) for a
   new destination router. As above a hard worst case limit can be
   chosen which will guarantee an upper limit on the size of the
   delivery tree irrespective of any topology changes.

10.2.1 Multiple source ERM routers

   Where there are multiple source ERM routers, each ERM router will
   independently acquire a set of destination routers, and limit the
   size of only that subset. Subsequent topology changes could then
   make one or more source ERM routers redundant which may in turn
   cause one or more of the remaining source ERM routers to exceed the
   limit. Possible approaches to solving this problem are:-

   1. Dynamically remove one or more destination routers - not very
      friendly to existing users, but at least its simple! The normal
      algorithm of rejecting those above the threshold could be used.
      Doing anything else, such as LIFO, would be difficult. Since the
      state for the 'old' source ERM routers will be lost, if the new
      set of source ERM routers has no overlap with the old set, ALL
      the destinations will appear to be new. The first destinations to
      send traces will then be (arbitrarily) accepted and the remainder
      rejected.

   2. Allow the encoded tree size (and hence the packet size) to exceed
      the desirable limit - but since the number of source ERM routers
      is unbounded, so too is the size of the encoded tree.

   3. Adjust the delivery tree to remove one or more intermediate nodes
      at the expense of making the delivery tree less efficient, since
      multiple copies of a packet would be sent over some links. A
      similar effect could be obtained by splitting the delivery tree
      into two (or more) parts, each of which is below the critical
      limit. Some intelligence into exactly which nodes to eliminate

Shand, et al.              Expires Dec 2000                  [Page 36]


                       Explicit Route Multicast              June 2000


      could be introduced by having the ERM trace message include the
      current value the ICMP message hop count for each entry (from
      which the number of unicast hops corresponding to each ERM hop
      can be deduced.) This could be used as a weighting when
      evaluating the modified delivery tree.

   4. Communicate the number of destination nodes associated with each
      source ERM router and enforce a limit for the entire set.

10.2.2 Topology changes with a single source ERM router

   Even when there is only a single source ERM router, (or where there
   exist multiple source ERM routers, but their sub-trees do not become
   merged), a topology change can potentially result in a factor of 2
   (actually (2N_1)/N) increase in the size of the encoded tree. This
   can be contained by limiting the maximum number of destination
   routers assuming the worst case topology. Alternatively, a more
   optimistic assumption about the topology could be made, and worse
   cases dealt with by using the techniques outlined in bullets 1-3
   above.

10.2.3 Failure of a destination router

   As has already been described in Section 6.4.3.1, it is possible
   that a destination router may fail and be replaced by the election
   of a new designated router. The failed router would still appear as
   a destination router until the expiry of the timeout period. There
   may be other circumstances where a router is being unnecessarily
   included in the delivery tree while awaiting timeout. This could
   cause the tree size to exceed the limit, and result in the 'new'
   router being rejected.

   Under these circumstances the source ERM router MAY send one or more
   heartbeat packets with a zero holding time (as described in Section
   6.4.2.2) and reduce the timer associated with each of the
   destination routers. Any destination routers that do not quickly
   respond with a non-periodic trace can then be eliminated in a timely
   manner, thus freeing up resources for the 'new' router.

10.3 Effect of tree encoding length on MTU

   The space required in the ERM header for the encoded tree is
   unpredictable, and may vary during the lifetime of a group (as a
   result of topology changes, or joining and leaving of destinations).
   If the header size is kept to the minimum capable of containing the
   current delivery tree, then the header size, and hence the available
   MTU, will also vary. Conversely, if sufficient space in the header
   is always allocated to contain the worst case encoded tree, the MTU
   will remain constant, but there will be significant wasted
   bandwidth. The variation is approximately 5(N-1). So for N = 5 it is
   about 20 bytes, and for N = 10 about 45 bytes - a significant
   fraction of the total packet size for small payloads.

Shand, et al.              Expires Dec 2000                  [Page 37]


                       Explicit Route Multicast              June 2000


   The best compromise is to calculate MTU assuming worst case tree
   length, but adjust the header length to reflect the current encoded
   tree length requirements.

11. Source Discovery

   ERM requires that the multicast source(s) for a particular group are
   known by all the egress routers (so that they can send traces
   towards the source(s)). The means by which the source addresses for
   a particular group are assigned are outside the scope of this draft.

12. Security Considerations


   Security considerations will be addressed in a future version of
   this draft.

13. References


   1  Bradner, S., "The Internet Standards Process -- Revision 3", BCP
      9, RFC 2026, October 1996.

   2  Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997

   3  Perkins, C.,"Minimal Encapsulation within IP", RFC 2004 October
      1996.



14. Acknowledgments

   The authors would like to acknowledge contributions made by Liming
   Wei, Richard Edmonstone and Brian Singerman.

15. Intellectual Property Rights Notice

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.  Copies of
   claims of rights made available for publication and any assurances
   of licenses to be made available, or the result of an attempt made
   to obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification
   can be obtained from the IETF Secretariat.


Shand, et al.              Expires Dec 2000                  [Page 38]


                       Explicit Route Multicast              June 2000


   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

16. Author's Addresses

   Joel Bion
   Cisco Systems
   375 East Tasman Drive
   San Jose
   California
   USA
   Phone: 408 527-9376
   Email: jpbion@cisco.com

   Dino Farinacci
   Procket Networks
   Phone: 408-954-7909
   Email: dino@procket.com

   Mike Shand
   Cisco Systems
   4, The Square,
   Stockley Park,
   UXBRIDGE,
   Middlesex
   UB11 1BN, UK
   Phone: +44 20 8756 8690
   Email: mshand@cisco.com

   Alex Tweedly
   Cisco Systems
   4, The Square,
   Stockley Park
   UXBRIDGE
   Middlesex
   UB11 1BN, UK
   Email: agt@cisco.com













Shand, et al.              Expires Dec 2000                  [Page 39]