INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                          Xudong Zhang
Expires: April 26, 2012                                           Huawei
                                                        October 24, 2011

                      TRILL IS-IS MTU Negotiation
                draft-zhang-trill-mtu-negotiation-00.txt

Abstract

   The IETF TRILL protocol provides least cost pair-wise layer 2 data
   forwarding by using IS-IS link state routing. This document defines a
   new link MTU size negotiation mechanism to update the TRILL documents
   "Routing Bridges (RBridges): Base Protocol Specification" and
   "Routing Bridges (RBridges): Adjacency".

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright and License Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect



Mingui Zhang             Expires April 26, 2012                 [Page 1]


INTERNET-DRAFT              MTU Negotiation             October 24, 2011


   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 3
      1.1. Content . . . . . . . . . . . . . . . . . . . . . . . . . . 3
      1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . 3
   2. Issues of Link MTU Testing . . . . . . . . . . . . . . . . . . . 3
      2.1. Global Dependence . . . . . . . . . . . . . . . . . . . . . 4
      2.2. Concealing Wrong Configuration  . . . . . . . . . . . . . . 4
      2.3. "2-Way Trap"  . . . . . . . . . . . . . . . . . . . . . . . 5
   3. TRILL IS-IS MTU Negotiation  . . . . . . . . . . . . . . . . . . 6
      3.1. Definition of Link-Wide Sz  . . . . . . . . . . . . . . . . 6
      3.3. Link MTU Size Testing Algorithm . . . . . . . . . . . . . . 7
      3.4. Re-determining Campus-Wide Sz . . . . . . . . . . . . . . . 8
      3.5. Relationship between Port MTU and Sz  . . . . . . . . . . . 8
      3.6. When Link MTU Testing is Failed . . . . . . . . . . . . . . 9
      3.7. LSP Synchronization . . . . . . . . . . . . . . . . . . . . 9
   4. Determining Link Traffic MTU Size  . . . . . . . . . . . . . . . 9
   5. Security Considerations  . . . . . . . . . . . . . . . . . . . . 9
   6. IANA Considerations  . . . . . . . . . . . . . . . . . . . . .  10
   7. References . . . . . . . . . . . . . . . . . . . . . . . . . .  10
      7.1. Normative References  . . . . . . . . . . . . . . . . . .  10
      7.2. Informative References  . . . . . . . . . . . . . . . . .  10
   Author's Addresses  . . . . . . . . . . . . . . . . . . . . . . .  11


1. Introduction

   The base TRILL protocol specifies how RBridges determine the minimum
   inter-RBridge link size for the whole campus (campus-wide Sz), which
   is needed for the proper operation of TRILL IS-IS. According to
   [RFCtrill], RBridges need to know the campus-wide Sz before they
   perform link MTU size testing. Link MTU size testing therefore
   depends on the collection of the campus-wide Sz.

   [TRILLadj] defines the state transition diagram of an adjacency.
   The event "link MTU size is successfully tested" (A6) is the key
   transition between the "2-way" state and the "Report" state of an
   adjacency. However, it is not clear from [TRILLadj] when an
   adjacency should start to synchronize the LSP database.

   This document analyzes the issues that arise because link MTU size
   testing depends on campus-wide Sz collection, and provides a new
   link MTU size negotiation mechanism to solve them. According to the
   analysis, if LSP database synchronization is done only in the
   "Report" state of an adjacency, a chicken-and-egg problem can
   result. Therefore, this draft argues that LSP database
   synchronization should start in the "2-way" state of an adjacency.

1.1. Content

   Section 2 analyzes the issues caused by the dependence on campus-wide
   Sz for link MTU size testing.

   Section 3 defines a new IS-IS MTU negotiation mechanism to update
   [RFCtrill].

   Section 4 provides a method for link traffic MTU determination.

1.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Issues of Link MTU Testing

   Link MTU size testing is defined in Section 4.3.2 of [RFCtrill]. The
   link MTU size should not be smaller than the campus-wide value of
   Sz, which is the smallest value of Sz advertised by any RBridge in
   its LSP [RFCtrill]. If the link MTU size X of an adjacency is
   successfully tested (X >= campus-wide Sz), its state will move from
   2-way to Report, as defined in [TRILLadj]. Link MTU size testing
   thus depends on the value of campus-wide Sz, which can be
   problematic. The issues caused by this dependence are described in
   the following subsections.

2.1. Global Dependence

                        Sz:1700           Sz:1600           Sz:1500
                         +---+             +---+             +---+
                         |RB1|(1650)-(1650)|RB2|(1650)-(1550)|RB3|
                         +---+      ^      +---+      ^      +---+
                                    |                 |
                                  Report            2-way
                                             |
                                             v

      Sz:1470           Sz:1700           Sz:1600           Sz:1500
       +---+             +---+             +---+             +---+
       |RB4|(1750)-(1750)|RB1|(1650)-(1650)|RB2|(1650)-(1550)|RB3|
       +---+      ^      +---+      ^      +---+      ^      +---+
                  |                 |                 |
                Report            Report            Report

                Figure 2.1: Adjacency global dependence

   Take Figure 2.1 as an example. Before RB4 joins the campus, the
   adjacency between RB2 and RB3 is trapped in the 2-way state. If RB4
   joins and its Sz value is advertised to RB2 and RB3, the campus-wide
   Sz will be updated to 1470. The link MTU test between RB2 and RB3
   will then succeed, so the adjacency between RB2 and RB3 will move to
   the Report state. The state of an adjacency can thus be determined
   by another, remote adjacency, which can severely harm the stability
   and maintainability of the campus.

2.2. Concealing Wrong Configuration

   Take Figure 2.2 as an example. The Sz value of RB3 is mistakenly
   configured to be greater than its port MTU. Link MTU testing
   nevertheless succeeds because the campus-wide Sz, 1470, is smaller
   than both port MTUs of the adjacency between RB2 and RB3, so the
   adjacency will be in the "Report" state. However, when RB4 leaves
   the campus and the campus-wide Sz is updated to 1550, the link MTU
   test of link RB2-RB3 can no longer succeed.

      Sz:1470           Sz:1700           Sz:1600           Sz:1550
       +---+             +---+             +---+             +---+
       |RB4|(1750)-(1750)|RB1|(1650)-(1650)|RB2|(1650)-(1500)|RB3|
       +---+      ^      +---+      ^      +---+      ^      +---+
                  |                 |                 |
                Report            Report            Report
                                             |
                                             v

                        Sz:1700           Sz:1600           Sz:1550
                         +---+             +---+             +---+
                         |RB1|(1650)-(1650)|RB2|(1650)-(1500)|RB3|
                         +---+      ^      +---+      ^      +---+
                                    |                 |
                                  Report            2-way

              Figure 2.2: Concealing wrong configuration
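
   The effect shown in Figure 2.2 can be reproduced with a small
   sketch. It assumes a simplified model in which a link MTU test
   succeeds exactly when both port MTUs of the adjacency can carry a
   frame of size campus-wide Sz; the function names are illustrative
   and are not taken from [RFCtrill].

```python
from typing import Dict, Tuple


def campus_wide_sz(sz_by_rbridge: Dict[str, int]) -> int:
    """Campus-wide Sz: the smallest Sz advertised by any RBridge."""
    return min(sz_by_rbridge.values())


def link_test_passes(port_mtus: Tuple[int, int], campus_sz: int) -> bool:
    """Assumed model: the test passes if both ports can carry campus_sz."""
    return min(port_mtus) >= campus_sz


# Values from Figure 2.2. RB3's Sz (1550) exceeds its port MTU (1500).
sz = {"RB1": 1700, "RB2": 1600, "RB3": 1550, "RB4": 1470}
rb2_rb3_ports = (1650, 1500)

# With RB4 present the campus-wide Sz is 1470 and the error is hidden.
with_rb4 = link_test_passes(rb2_rb3_ports, campus_wide_sz(sz))

# After RB4 leaves, the campus-wide Sz rises to 1550 and the test fails.
del sz["RB4"]
without_rb4 = link_test_passes(rb2_rb3_ports, campus_wide_sz(sz))

print(with_rb4, without_rb4)
```

   Under this model the outcome for link RB2-RB3 changes only because
   a remote RBridge left the campus, not because anything changed on
   the link itself.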

2.3. "2-Way Trap"

             Sz:1500           Sz:1600           Sz:1800
              +---+             +---+             +---+
              |RB1|(1650)-(1650)|RB2|(1650)-(1850)|RB3|
              +---+      ^      +---+      ^      +---+
                         |                 |
                       Report            2-way

               Figure 2.3: Adjacency got into 2-way trap

   The DRB of a LAN link is elected as early as the "Detect" state of
   an adjacency. Once elected, the DRB begins to send CSNPs to
   synchronize the LSP databases of the RBridges attached to this LAN
   link when the adjacency between an RBridge and the DRB moves to the
   2-way state. If a non-DRB RBridge receives such a CSNP and finds
   that LSPx is not in its own LSP database, it will send a PSNP to
   request LSPx from the DRB. If a non-DRB RBridge receives such a CSNP
   and finds that LSPx is not in the LSP database of the DRB, it will
   send LSPx to the DRB. Neither [RFCtrill] nor [TRILLadj] makes clear
   when LSP synchronization starts. One reading is that
   synchronization should be done when an adjacency is in the Report
   state. If this is the case, any LSP sent through an adjacency in
   the 2-way state will be discarded and the LSP database will not be
   synchronized.

   It is then likely that an RBridge does not receive all the values of
   Sz from other RBridges while there are "2-way" adjacencies in the
   campus, so the campus-wide Sz cannot be determined. Take Figure 2.3
   as an example, and suppose the adjacency between RB2 and RB3 is in
   the 2-way state. RB3 cannot receive LSPs from RB2 because the
   adjacency is not in the Report state, so it cannot determine the
   campus-wide Sz. Since link MTU testing depends on the value of
   campus-wide Sz, the test cannot succeed. The adjacency therefore
   remains in the 2-way state, which forms a chicken-and-egg problem.

3. TRILL IS-IS MTU Negotiation

   It is improper to use campus-wide Sz in link MTU testing and LSP
   database synchronization. In order to solve the problems described
   in Section 2, a new value, called the link-wide Sz, replaces
   campus-wide Sz in link MTU size testing and LSP database
   synchronization. After the link MTU size is successfully tested, the
   adjacency moves to the "Report" state.

3.1. Definition of Link-Wide Sz

   RBridges on a LAN link should exchange their local Sz through TRILL
   Hellos using the originatingLSPBufferSize TLV (TLV #14). The
   smallest of these Sz values is the minimum acceptable size on this
   LAN link. This value is called "link-wide Sz" to distinguish it from
   the campus-wide Sz, which is determined by having each RBridge in
   the campus advertise its own assumption of the value of Sz in LSPs
   as defined in Section 4.3.1 of [RFCtrill].

   The maximum size of some types of PDUs should be confined by link-
   wide Sz rather than campus-wide Sz because they are exchanged only
   between neighbors, not across the whole campus. CSNPs and PSNPs are
   PDUs of this kind. They are exchanged only on the link, after a DRB
   has been elected on the link.

                         Sz:1800             Sz:1700
                          +---+       |       +---+
                          |RB1|(2000)-|-(2000)|RB2|
                          +---+       |       +---+
                                      |
                  Sz:1600             |
                   +---+             +--+
                   |RB3|(2000)-(1500)|B1|
                   +---+             +--+
                                      |

               Figure 3.1: Link MTU has to be negotiated

   Even if all RBridges on a specific LAN link have reached consensus
   on the link-wide Sz, it does not mean that these RBridges can safely
   exchange PDUs with each other. Take Figure 3.1 as an example. RB1,
   RB2 and RB3 are three RBridges on the same LAN link and their Sz
   values are 1800, 1700 and 1600 respectively, so the link-wide Sz of
   this LAN link is 1600. There is a bridge (say B1) between RB2 and
   RB3 whose port MTU size is 1500. If RB2 sends a PDU of size 1600, it
   will be discarded by B1. Therefore the link MTU size has to be
   tested. Only after the link MTU size of an adjacency has been
   successfully tested can CSNPs and PSNPs be formatted to be no
   greater than the tested link MTU size and be safely transmitted on
   this link.
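
   The numbers of Figure 3.1 can be checked with a short sketch. It
   assumes that a frame is delivered only if it fits every port MTU on
   the path between two RBridges; the helper names and the list of
   ports on the RB2-RB3 path are illustrative.

```python
from typing import Iterable


def link_wide_sz(hello_sz_values: Iterable[int]) -> int:
    """Link-wide Sz: the smallest Sz learned from Hellos on the link."""
    return min(hello_sz_values)


def frame_delivered(frame_size: int, path_port_mtus: Iterable[int]) -> bool:
    """Assumed model: a frame is dropped by any port it does not fit."""
    return frame_size <= min(path_port_mtus)


# Sz values advertised by RB1, RB2 and RB3 in Figure 3.1.
sz = link_wide_sz([1800, 1700, 1600])

# Ports crossed between RB2 and RB3: RB2 (2000), B1 (1500), RB3 (2000).
rb2_to_rb3 = [2000, 1500, 2000]

print(sz, frame_delivered(sz, rb2_to_rb3), frame_delivered(1500, rb2_to_rb3))
```

   Agreement on the link-wide Sz (1600) is therefore not sufficient: a
   PDU of that size does not survive B1, while one of size 1500 does.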

3.3. Link MTU Size Testing Algorithm

   The link MTU size testing method given by the last paragraph of
   Section 4.3.2 of [RFCtrill] is updated by the following Binary Search
   algorithm in which the link-wide Sz is used in the testing instead of
   campus-wide Sz.

   Step 0: RB1 sends an MTU-probe padded to the size of link-wide Sz.

   1) If RB1 successfully receives the MTU-ACK to the probe of size
      link-wide Sz from RB2, then the link MTU size is set to the
      link-wide Sz and the algorithm stops.

   2) Otherwise, RB1 tries to send an MTU-probe padded to the size
      1470.

      a) If RB1 fails to receive an MTU-ACK from RB2 after k tries
         (where k is a configurable parameter whose default is 3), RB1
         sets the "failed minimum MTU test" flag for RB2 in RB1's Hello
         and the algorithm stops.

      b) Otherwise: link MTU size <-- 1470, X1 <-- 1470, X2 <--
         link-wide Sz, X <-- [(X1 + X2)/2] (the operation "[...]"
         rounds a fractional value up to the next integer). Go to
         Step 1.

   Step 1: RB1 tries to send an MTU-probe padded to the size X.

   1) If RB1 fails to receive an MTU-ACK from RB2 after k tries, then:

         X2 <-- X and X <-- [(X1 + X2)/2]

   2) If RB1 receives an MTU-ACK to a probe of size X from RB2, then:

         link MTU size <-- X, X1 <-- X and X <-- [(X1 + X2)/2]

   3) If X1 >= X2 or Step 1 has been repeated n times (where n is a
      configurable parameter whose default is 5), stop. Otherwise, go
      to Step 1.
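
   The steps above can be sketched as follows. This is an illustrative
   transcription, not a reference implementation: send_probe is a
   hypothetical callback that returns True when an MTU-ACK is received
   for a probe of the given size, and the sketch assumes that every
   probe, including the first one in Step 0, is retried up to k times.

```python
from typing import Callable, Optional

MIN_SZ = 1470  # fallback probe size used in Step 0


def probe_ok(send_probe: Callable[[int], bool], size: int, k: int) -> bool:
    """Try a probe of the given size up to k times."""
    return any(send_probe(size) for _ in range(k))


def negotiate_link_mtu(send_probe: Callable[[int], bool],
                       link_wide_sz: int,
                       k: int = 3,
                       n: int = 5) -> Optional[int]:
    """Return the tested link MTU size, or None if even 1470 fails."""
    # Step 0, case 1: the link carries link-wide Sz as is.
    if probe_ok(send_probe, link_wide_sz, k):
        return link_wide_sz
    # Step 0, case 2a: the minimum MTU test failed.
    if not probe_ok(send_probe, MIN_SZ, k):
        return None
    # Step 0, case 2b: initialise the search interval.
    link_mtu = x1 = MIN_SZ
    x2 = link_wide_sz
    x = (x1 + x2 + 1) // 2          # [(X1 + X2)/2], rounded up
    # Step 1, repeated at most n times.
    for _ in range(n):
        if probe_ok(send_probe, x, k):
            link_mtu = x1 = x       # X was acknowledged
        else:
            x2 = x                  # X was lost
        x = (x1 + x2 + 1) // 2
        if x1 >= x2:
            break
    return link_mtu


# Simulated link that silently drops anything larger than 1500 octets.
print(negotiate_link_mtu(lambda size: size <= 1500, 1600))
```

   With the default n = 5 the simulated link above yields 1499 rather
   than 1500, because the search stops before the interval has fully
   converged. A larger n narrows the result further, with the
   repetition limit n bounding the number of probes.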

   Since executing the above algorithm can be resource-consuming, it is
   recommended that the DRB take responsibility for the testing. If,
   when the testing is finished, the tested link MTU size is smaller
   than both the original link-wide Sz and the minimum Sz that has been
   advertised to the DRB, the DRB should send the tested link MTU size
   as its local Sz in LSP0 and in its TRILL Hellos. This will trigger
   the other RBridges on the link to update their link-wide Sz to the
   tested link MTU size. Then the CSNPs, PSNPs and LSPs used for
   synchronization can be correctly resized and successfully exchanged
   on the link.

3.4. Re-determining Campus-Wide Sz

   RBridges may join or leave the campus from time to time, so the
   campus-wide Sz can become outdated. Section 4.3.1 of [RFCtrill] does
   not define when to re-determine the campus-wide Sz. The following
   suggestions are given for campus-wide Sz re-determination.

   1) When a new RB whose Sz is smaller than the current campus-wide Sz
      joins the campus, it MUST report its Sz in an LSP, which will
      cause other RBridges to update their campus-wide Sz. The LSPs in
      the campus will be resized to be no greater than the new campus-
      wide Sz.

   2) When an RB whose Sz equals the campus-wide Sz leaves the campus,
      the LSPs generated by this RBridge are purged from the remaining
      campus after reaching MaxAge [ISO10589], and the campus-wide Sz
      ought to be resized as well. Frequent LSP resizing is harmful to
      the stability of the whole campus, so it should be dampened. Of
      the two kinds of resizing actions, only upward resizing is
      dampened. When an upward resizing event happens, a timer is set
      (its duration is a configurable parameter whose default value is
      300 seconds). Before this timer expires, all subsequent upward
      resizing is dampened.

   3) An RBridge may generate multiple LSPs. It is recommended that each
      RBridge carry its Sz in LSP number 0 (abbreviated as LSP0)
      [ISO10589]. Otherwise, if Sz is absent from LSP0, the campus-wide
      Sz will be set to the small value 1470 at the receiving RBridge
      [RFCtrill]. When subsequent LSPs carrying Sz arrive, the campus-
      wide Sz will be resized again.
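
   The dampening described in item 2 can be sketched as below. The
   text leaves some details open, so the sketch makes two labeled
   assumptions: the first upward resize is applied immediately and
   starts the timer, and later upward resizes are simply ignored until
   the timer expires. The class and method names are hypothetical.

```python
class CampusSzDampener:
    """Tracks the campus-wide Sz in use, dampening upward resizing."""

    def __init__(self, initial_sz: int, hold_seconds: float = 300.0):
        self.sz = initial_sz
        self.hold_seconds = hold_seconds
        self._hold_until = float("-inf")  # no timer running yet

    def observe(self, computed_sz: int, now: float) -> int:
        """Feed the newly computed campus-wide Sz; return the Sz in use."""
        if computed_sz < self.sz:
            # Downward resizing is never dampened.
            self.sz = computed_sz
        elif computed_sz > self.sz and now >= self._hold_until:
            # Upward resizing is applied and (re)starts the timer.
            self.sz = computed_sz
            self._hold_until = now + self.hold_seconds
        return self.sz


d = CampusSzDampener(initial_sz=1470)
history = [
    d.observe(1500, now=0),    # upward, applied, timer runs until t=300
    d.observe(1600, now=10),   # upward, dampened
    d.observe(1400, now=20),   # downward, applied at once
    d.observe(1600, now=100),  # upward, still dampened
    d.observe(1600, now=300),  # timer expired, applied
]
print(history)
```

   Whether a dampened upward resize should be replayed when the timer
   expires, rather than waiting for the next recomputation as here, is
   a design choice that the text does not settle.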

3.5. Relationship between Port MTU and Sz

   The value of the local Sz of an RBridge should not be greater than
   any of its port MTU sizes. When the MTU size of a port is smaller
   than the local Sz of the RBridge, this port should be disabled from
   the TRILL campus.

   When an RBridge receives an LSP with a size greater than its local
   Sz or the campus-wide Sz, this LSP should be processed normally
   rather than discarded. If an LSP is larger than the MTU of a port
   over which it is to be propagated, then no attempt shall be made to
   propagate this LSP over the port and an LSPTooLargeToPropagate alarm
   shall be generated [ISO10589].
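
   A minimal sketch of the propagation rule follows. It only models
   the size check; the port names and the representation of the alarm
   as a list of port names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def propagate_lsp(lsp_size: int,
                  port_mtus: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Split ports into those that carry the LSP and those that alarm.

    Returns (ports_to_send_on, ports_raising_LSPTooLargeToPropagate).
    """
    send_on = [p for p, mtu in port_mtus.items() if lsp_size <= mtu]
    alarms = [p for p, mtu in port_mtus.items() if lsp_size > mtu]
    return send_on, alarms


# A 1600-octet LSP is accepted and then flooded only where it fits.
sent, alarmed = propagate_lsp(1600, {"port1": 1650, "port2": 1500})
print(sent, alarmed)
```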

3.6. When Link MTU Testing is Failed

   When the MTU testing of a link fails, LSP synchronization through
   this link is unreliable, which may cause forwarding loops, black
   holes and so on. Therefore,

   o  An adjacency that is not in the Report state must not be
      advertised in an LSP [RFCtrill].

   o  The overload bit of LSP0 of the pseudonode on this link should be
      set.

3.7. LSP Synchronization

   The DRB and non-DRB RBridges on a link should start to synchronize
   their LSP databases with a neighbor, using CSNPs and PSNPs, when the
   adjacency between them moves to the 2-way state. The CSNPs and PSNPs
   should be formatted in chunks of size at most link-wide Sz. Since
   the link MTU size has not yet been tested, link-wide Sz may be
   greater than the actual link MTU size. In that case, a CSNP or PSNP
   may be discarded if its size is greater than the link MTU size.
   After the link MTU size is successfully tested, the RBridges will
   format these PDUs to be no greater than the tested size, so they
   will get through successfully.

4. Determining Link Traffic MTU Size

   Campus-wide Sz is used to confine the size of TRILL link state
   information messages (LSPs). This value is different from the MTU
   size that restricts the size of TRILL data frames. A TRILL data
   frame forwarded by an RBridge can be larger than the campus-wide Sz
   or link-wide Sz; its size is restricted only by the physical links
   and devices.

   The algorithm defined for link MTU size testing can also be used for
   TRILL traffic MTU size testing, except that the link-wide Sz used in
   that algorithm is replaced with the port MTU of the RBridge sending
   MTU probes. The successfully tested size X can be advertised as an
   attribute of this link using the MTU sub-TLV defined in Section 2.4
   of [TRILLisis]. An end station may collect these values by TRILL
   ping or traceroute. The path MTU is the smallest tested link MTU on
   the path.
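
   As a small illustration of the last sentence, the path MTU can be
   derived from the tested link MTU values collected along a path. The
   function name and the sample values are hypothetical.

```python
from typing import Sequence


def path_mtu(tested_link_mtus: Sequence[int]) -> int:
    """Path MTU: the smallest tested link MTU along the path."""
    if not tested_link_mtus:
        raise ValueError("path has no tested links")
    return min(tested_link_mtus)


# Hypothetical tested link MTU values for a three-hop path.
print(path_mtu([1750, 1650, 1500]))
```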

5. Security Considerations

   This document raises no new security issues for IS-IS.

6. IANA Considerations

   This document changes the following IS-IS TLV that is listed in the
   IS-IS TLV codepoint registry:

   Type        Description                    IIH   LSP   SNP
   ----        ----------------------------   ---   ---   ---
    14         originatingLSPBufferSize TLV    y     y     n

7. References

7.1. Normative References

   [RFCtrill]  R. Perlman, D. Eastlake, et al, "RBridges: Base Protocol
               Specification", RFC 6325, July 2011.

   [TRILLaf]   R. Perlman, D. Eastlake, et al, "RBridges: Appointed
               Forwarders", draft-ietf-trill-rbridge-af-05.txt, work in
               progress.

   [TRILLadj]  D. Eastlake, R. Perlman, et al, "Routing Bridges
               (RBridges): Adjacency", RFC 6327, July 2011.

   [TRILLisis] D. Eastlake, A. Banerjee, et al, "Transparent
               Interconnection of Lots of Links (TRILL) Use of IS-IS",
               RFC 6326, July 2011.

7.2. Informative References

   [ISO10589]  ISO, "Intermediate system to Intermediate system routeing
               information exchange protocol for use in conjunction with
               the Protocol for providing the Connectionless-mode
               Network Service (ISO 8473)," ISO/IEC 10589:2002.


Author's Addresses


   Mingui Zhang
   Huawei Technologies Co.,Ltd
   Huawei Building, No.156 Beiqing Rd.
   Z-park ,Shi-Chuang-Ke-Ji-Shi-Fan-Yuan,Hai-Dian District,
   Beijing 100095 P.R. China

   Email: zhangmingui@huawei.com

   Xudong Zhang
   Huawei Technologies Co.,Ltd
   Huawei Building, No.156 Beiqing Rd.
   Z-park ,Shi-Chuang-Ke-Ji-Shi-Fan-Yuan,Hai-Dian District,
   Beijing 100095 P.R. China

   Email: zhangxudong@huawei.com