Network Working Group                                          S. Bryant
Internet-Draft                                             Cisco Systems
Expires: April 19, 2006                                       R. Perlman
                                                        Sun Microsystems
                                                                A. Atlas
                                                                  Google
                                                                D. Fedyk
                                                         Nortel Networks
                                                        October 16, 2005


         TRILL using Pseudo-Wire Emulation (PWE) Encapsulation
                draft-bryant-perlman-trill-pwe-encap-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 19, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   A new layer of encapsulation is required with RBridges.  This layer
   must contain at least a time-to-live and an RBridge identifier field.
   This document proposes that the reuse of the encapsulation defined by



Bryant, et al.           Expires April 19, 2006                 [Page 1]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


   PWE3 for encapsulation of Ethernet frames over an MPLS packet
   switched network.


Table of Contents

   1.  Motivation . . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Forwarding Considerations  . . . . . . . . . . . . . . . . . .  4
     2.1.  Forwarding Table Population  . . . . . . . . . . . . . . .  5
     2.2.  QoS Treatment  . . . . . . . . . . . . . . . . . . . . . .  5
     2.3.  Load Balancing . . . . . . . . . . . . . . . . . . . . . .  6
     2.4.  Multicast and Broadcast Frames . . . . . . . . . . . . . .  6
   3.  Dynamic Assignment of 19-bit Nicknames . . . . . . . . . . . .  7
   4.  Security Considerations  . . . . . . . . . . . . . . . . . . .  8
   5.  References . . . . . . . . . . . . . . . . . . . . . . . . . .  8
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . .  9
   Intellectual Property and Copyright Statements . . . . . . . . . . 10


































Bryant, et al.           Expires April 19, 2006                 [Page 2]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


1.  Motivation

   The TRILL encapsulation requires a TTL and an RBridge ID, which could
   be the ingress or the egress depending upon the particular packet.
   There are four encapsulation mechanism that TRILL could use:

   a.  It could design its own encapsulation from scratch.

   b.  It could use an Ethernet based encapsulation.

   c.  It could use an IP based encapsulation.

   d.  It could use an MPLS based encapsulation.

   Adding, or removing an encapsulation, or forwarding a packet based on
   an encapsulation is one of the most time critical operation in any
   networking equipment, and usually requires hardware support.  The use
   of a new network encapsulation type is always problematic because new
   hardware is usually required.  This is expensive to design and
   deploy, and frequently has a significant time and risk impact on the
   market acceptance of a new network architecture.  The use of a new,
   TRILL specific, encapsulation should therefore, if possible, to be
   avoided.

   TRILL could opt to use an Ethernet based encapsulation.  The nesting
   of 802.x tags is a well understood technology and suitable hardware
   is widely deployed.  However the absence of a TTL field in the header
   means that a controlled convergence technology needs to be used
   [CCONV] to avoid the collateral damage caused by microlooping packets
   during network convergence.  Although convergence control
   technologies are now available, they are not well understood by the
   networking industry, and their use by TRILL may not be accepted by
   the industry.

   TRILL could use an IP encapsulation, but using an IP header for this
   purpose has issues (see Section 5.5 in [RBRIDGE]).  Such issues
   include the encapsulation overhead, the complexity of providing L2
   services within the L3 subnet, and the additional potential work for
   fragmentation and reassembly.

   The simplest existing encapsulation that meets the TRILL requirement
   is that defined by PWE3 for the encapsulation of Ethernet frames over
   an MPLS packet switched network [PWE3-ETHER].  The forwarding
   functionality required by TRILL is very similar to that needed to
   implement virtual private lan service (VPLS [VPLS]).  Equipment
   capable of encapsulating Ethernet packets for carriage over an MPLS
   core is widely available, and the modifications necessary to support
   TRILL would reside primarily in the control plane.



Bryant, et al.           Expires April 19, 2006                 [Page 3]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


   The encapsulation described in [PWE3-ETHER] consists of an MPLS label
   stack [RFC3032] plus an OPTIONAL four byte control word.  At least
   one MPLS label stack entry (LSE) will be present in the TRILL packet.
   In addition to containing the label (delivery address), the LSE also
   contains the TTL field required by TRILL, and a QoS field (exp bits)
   that may also be of use.

   The control word carries some information that prevents the packet
   being mistaken for an IP packet in an MPLS network and incorrectly
   being subjected to ECMP.  This functionality is not required in a
   TRILL network.  The control word also contains a sequence number
   which is used to prevent the out of order delivery of PWE3 Ethernet
   payloads.  If order preservation is required the control word MUST be
   used, otherwise a TRILL implementation MAY omit the PWE3 control
   word.

   The use of the PWE3 Ethernet over MPLS encapsulation by TRILL would
   facilitate the integration of TRILL and MPLS networking.


2.  Forwarding Considerations

   As described in Section 3, each RBridge can obtain two 19-bit
   nicknames.  The first nickname can be used for the RBridge when
   unicast traffic is directed to it; it is the egress RBridge nickname.
   The second nickname can be used for multicast and broadcast traffic
   from the RBridge; it will be the ingress RBridge nickname.

   An MPLS shim header contains a 20-bit label field.  The same format
   can be used for the TRILL shim header; the labels will be distributed
   via the link-state protocol used between RBridges; those labels will
   be unique within this RBridge network instance.  The Ethertype will
   indicate that it is a TRILL frame; this will be used to provide the
   correct forwarding context for the label space.  The bottom-most bit
   of the label field can indicate whether the top 19 bits indicate a
   unicast nickname or a multicast and broadcast nickname.  The
   forwarding behavior will differ based upon this.

   In the unicast case, when an Ethernet frame is received without the
   new TRILL ethertype, the ingress RBridge will lookup the egress
   RBridge, as specified in [RBRIDGE], and obtain its egress RBridge
   nickname.  The ingress RBridge will also determine if the Ethernet
   frame has a priority specified as in 802.1p and will extract that
   3-bit priority field.  Then the original Ethernet frame will be
   encapsulated as follows:






Bryant, et al.           Expires April 19, 2006                 [Page 4]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Egress Nickname                 |0| Exp |S|       TTL     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |              Received Ethernet Frame                        ///
   ///                                                             |
   |                                                               |
   +---------------------------------------------------------------+

                        Exp:    Indicates Priority
                        S:      Bottom of Stack, 1 bit
                        TTL:    Time to Live, 8 bits


   Figure 1: Unicast Encapsulation

   Traditional bridges avoid misordering; it is an Ethernet invarient.
   During a traditional network convergence using a link-state protocol,
   it is possible for packets to be misordered.  The PWE3 control word
   can be used for this purpose with pseudo-wires (Section 3.7 in [PWE3-
   ETHER]); such use might require too much hardware state due to the
   desired load-balancing of flows.

   This gives the encapsulated frame the same format as an Ethernet
   pseudo-wire [PWE3-ETHER].  The forwarding path can be exactly the
   same as that used for an Ethernet pseudo-wire.

2.1.  Forwarding Table Population

   When an RBridge X learns a new egress nickname A, on each interface,
   the top 19 bits of the label are filled out with the new nickname and
   the bottom bit (the unicast/other) is set to 0; an insegment for that
   label is created (usually by adding an entry into the input label
   mapping (ILM) table.)  A corresponding outsegment is installed for
   each interface that is on the shortest path tree from the RBridge X
   to the RBridge indicated by A. That out-segment does a label swap
   operation, where the label swapped to is the same constructed label.
   The created in-segment is connected to the created out-segments with
   load balancing specified; only one out-segment will be used for a
   particular frame.

2.2.  QoS Treatment

   The encapsulation preserves the priority, if specified, of the frame
   without requiring intermediate RBridges to examine the encapsulated
   frame.  The ingress RBridge extracts the priority from the 802.1p



Bryant, et al.           Expires April 19, 2006                 [Page 5]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


   field and stores that in the EXP field of the shim header.

   When an RBridge adds the outer Ethernet frame to an TRILL
   encapsulated frame, the RBridge can specify an 802.1p field with a
   priority equal to that stored in the EXP field of the shim header.
   If the EXP field is 0, then no 802.1p field is necessary.

2.3.  Load Balancing

   Load balancing between multiple equal cost paths is a concern for
   RBridges.  To properly load balance TRILL encapsulated frames, an
   RBridge should identify TRILL encapsulated frames and implement a
   specific hashing algorithm for this ethertype.  A specific Ethertype
   would be used for TRILL frames, making them trivial to identify.

   The load balancing that would be provided by current mechanisms is
   not sufficient.  Without the PWE3 control word, either the TRILL
   encapsulated frame would appear as non-IP and would be load balanced
   based on a hash of the label stack (known as LABEL ECMP [MPLS-ECMP])
   or it would be mis-identified as IP and load balanced based on the
   bits located where IP addresses would be if the encapsulated Ethernet
   frame were an IP packet.  The former case would provide no flow
   diversity, since all TRILL encapsulated frames would have the same
   label, corresponding to the same egress RBridge nickname.  The latter
   case could risk packet re-ordering.  Current mechanisms seeing the
   PWE3 control-word would use LABEL EMP and thus provide no flow
   diversity.

2.4.  Multicast and Broadcast Frames

   For multicast/broadcast frames, the ingress RBridge nickname
   indicates the spanning tree which should be used.  As with the
   unicast case, a label is formed of the nickname field and the
   unicast/other field (label[19:1] = nickname[18:0] and label[0] = 1).
   The treatment of the TTL field and the EXP fields are the same.

   When an RBridge learns of a new ingress RBridge nickname, an ILM
   entry corresponding to the label is created.  An out-segment is
   created for each interface that is in the SPT rooted at the ingress
   RBridge.  The in-segment is connected to the created out-segments
   with multicasting specified; subject to filtering, each frame will be
   sent out each out-segment.  Except for the egress filtering, the
   above forwarding behavior is already part of MPLS; it is used to
   support point-to-multipoint MPLS LSPs.

   Filtering may be applied based upon the frame and the outgoing
   interface's membership.  For instance, if a frame is being broadcast
   along a VLAN and an interface is marked as not being connected to any



Bryant, et al.           Expires April 19, 2006                 [Page 6]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


   bridges or RBridges with VLAN membership, then the frame need not be
   sent out that interface.  Similarly, if a frame is being multicasted,
   the RBridge could decide to filter the frame if the interface is
   explicitly known to not be part of the multicast tree.


3.  Dynamic Assignment of 19-bit Nicknames

   We assume each RBridge has a unique 6-byte system ID, which it uses
   as its IS-IS ID.  In order to use the compressed MPLS-like encoding
   of the shim header, we need to create an identifier which is 19-bits.
   This gives a space of half a million nicknames, large enough that
   there will be enough nicknames.  We do, however, need a method for
   assigning nicknames to RBridges so that the nicknames are unique
   within the RBridge domain.

   We will assign a new type value to be carried in LSPs.  The TLV will
   carry the nickname the LSP source wishes to use.  The TLV will be:

               +------+--------+-----------------------+
               | type | length | value=19 bit nickname |
               +------+--------+-----------------------+

   Figure 2: Nickname TLV

   Each RBridge chooses its own nickname.  However, each RBridge is also
   responsible for ensuring that its nickname is unique.  If R1 chooses
   nickname x, and R1 discovers, through receipt of R2's LSP, that R2
   has also chosen x, then the RBridge with the lower system ID keeps
   the nickname, and the other one must choose a new nickname.

   If two RBridge domains merge, then there might be a lot of nickname
   collisions for a short time, but as soon as each side receives the
   link state packets of the other, the RBridges that need to change
   nicknames will quickly become aware of this, and choose new nicknames
   that do not, to the best of their ability, collide with any existing
   nicknames.

   To minimize the probability of nickname collisions, each RBridge
   chooses its nickname randomly from the set of assigned nicknames.
   Alternatively, we could use some sort of hash algorithm (such as the
   bottom 19 bits of the MD5 of the RBridge's system ID), to choose the
   first nickname, and then if there is a collision, go to the next 19
   bits of the MD5, and so on, until all 128 bits of the MD5 hash are
   exhausted, in which case the RBridge hashes its own system ID again,
   this time together with the constant "1".

   There is no reason for all RBridges to use the same algorithm for



Bryant, et al.           Expires April 19, 2006                 [Page 7]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


   choosing nicknames.  Picking them at random, or using a hash, are an
   attempt to avoid collisions when the network starts up, but that is
   only an optimization.  Even if all RBridges used the same algorithm,
   say as a worst case, they all start with "1" and count up
   sequentially until they find an uncontested nickname, the network
   will eventually stabilize.  And once it is stable, nicknames should
   remain stable even as routers go up or down.

   To minimize the probability of a new RBridge usurping a nickname
   already in use, an RBridge should wait to acquire the link state
   database from a neighbor before it announces its own nickname.


4.  Security Considerations

   The security implications of selecting this format have not yet been
   considered.

5.  References

   [CCONV]    Bryant, S. and M. Shand, "Applicability of Loop-free
              Convergence", draft-bryant-shand-lf-conv-frmwk-00.txt
              (work in progress), June 2005.

   [MPLS-ECMP]
              Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal
              Cost Multipath Treatment in MPLS Networks",
              draft-ietf-mpls-ecmp-bcp-01.txt (work in progress),
              July 2005.

   [PWE3-ETHER]
              Martini, L., Rosen, E., and G. Heron, "Encapsulation
              Methods for Transport of Ethernet Over MPLS Networks",
              draft-ietf-pwe3-ethernet-encap-10.txt (work in progress),
              June 2005.

   [RBRIDGE]  Perlman, R., Touch, J., and A. Yegin, "RBridges:
              Transparent Routing", draft-perlman-rbridge-03.txt (work
              in progress), May 2005.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, January 2001.

   [VPLS]     Lasserre, M. and V. Kompella, "Virtual Private LAN
              Services over MPLS", draft-ietf-l2vpn-ldp-07.txt (work in
              progress), July 2005.




Bryant, et al.           Expires April 19, 2006                 [Page 8]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


Authors' Addresses

   Stewart Bryant
   Cisco Systems
   250, Longwater, Green Park
   Reading  RG2 6GB
   United Kingdom

   Email: stbryant@cisco.com


   Radia Perlman
   Sun Microsystems

   Email: Radia.Perlman@sun.com


   Alia K. Atlas
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA  94043
   USA

   Email: akatlas@alum.mit.edu


   Don Fedyk
   Nortel Networks
   600 Technology Park
   Billerica, MA  01821
   USA

   Phone: +1 978 288 3041
   Email: dwfedyk@nortelnetworks.com

















Bryant, et al.           Expires April 19, 2006                 [Page 9]


Internet-Draft   draft-bryant-perlman-trill-pwe-encap-00    October 2005


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.




Bryant, et al.           Expires April 19, 2006                [Page 10]