TEAS Working Group                                           Ravi Singh
 Internet Draft                                            Yakov Rekhter
 Intended status: Informational                      Vishnu Pavan Beeram
                                                        Juniper Networks
                                                              Rob Shakir
                                                         British Telecom
                                                              Tarek Saad
                                                           Cisco Systems
 
 Expires: September 09, 2015                              March 09, 2015
 
 
                       RSVP Setup Retry - BCP
              draft-ravisingh-teas-rsvp-setup-retry-00
 
 
 Status of this Memo
 
    This Internet-Draft is submitted in full conformance with the
    provisions of BCP 78 and BCP 79.
 
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as Internet-
    Drafts.
 
    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."
 
    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt
 
    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html
 
    This Internet-Draft will expire on September 09, 2015.
 
 Copyright Notice
 
    Copyright (c) 2015 IETF Trust and the persons identified as the
    document authors. All rights reserved.
 
    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document. Please review these documents
    carefully, as they describe your rights and restrictions with
 
 
 
 
 Ravi Singh            Expires September 09, 2015               [Page 1]


 Internet-Draft             RSVP Setup Retry                  March 2015
 
 
    respect to this document.  Code Components extracted from this
    document must include Simplified BSD License text as described in
    Section 4.e of the Trust Legal Provisions and are provided without
    warranty as described in the Simplified BSD License.
 
 Abstract
 
    This document discusses best current practices for implementing the
    RSVP setup-retry timer.
 
 Conventions used in this document
 
    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in RFC 2119 [RFC2119].
 
 
 Table of Contents
 
    1. Introduction...................................................2
    2. Setup-Retry Timer..............................................3
    3. Possible ill-effects due to implementation choices.............3
    4. Causes of the above ill-effects................................5
    5. Solution to the implementation issues..........................5
    6. Security Considerations........................................6
    7. IANA Considerations............................................6
    8. Normative References...........................................6
    9. Acknowledgments................................................6
    10. Authors' Addresses............................................6
    Contributors......................................................7
 
 1. Introduction
 
    In an RSVP-TE network with a very large number of LSPs, link/node
    failure(s) may produce a noticeable increase in RSVP-TE control
    traffic. As a result, RSVP-TE messages might be delayed because they
    are stuck in a queue that is overwhelmed with messages waiting to be
    sent, or they might be lost altogether. For example, a Path message
    that a transit router intends to send might be stuck in the output
    queue toward the next hop. Alternatively, it might have been dropped
    on the receive side due to queue overflow. The same could happen to
    a Resv message in the reverse direction. Also, in the absence of
    reliable delivery of PathErr messages [RFC2961], an error generated
    at a transit or egress router for an LSP that is in the process of
    being set up may never reach the ingress.
 
 
 
 
    Lost or delayed RSVP-TE messages raise the following questions for
    an ingress router:
    - In the absence of an error indication, how is the ingress to know
      whether an LSP for which signaling was (re-)initiated, but for
      which no Resv has yet been received, is ever going to come up?
    - In the absence of any indication, what action should the ingress
      take to support low-latency LSP setup?

    These questions essentially boil down to one: how long should the
    ingress continue to wait before giving up on its attempt to bring up
    the LSP and taking some alternative course of action (e.g., trying
    to bring up the LSP on an alternate path)? To mitigate this problem,
    some implementations use a setup-retry timer mechanism. This
    document discusses the issues associated with a particular
    implementation of this timer and makes specific recommendations to
    avoid these issues.
 
 2. Setup-Retry Timer
 
    The setup-retry timer is usually a configurable timer that (in the
    absence of an error indication) fires when an LSP with a given
    LSP-ID has not received a Resv in response to its Path within a
    pre-configured duration after its first Path was sent.
 
    Use of the setup-retry timer is based on the presumption that if
    signaling for a given LSP has not been completed within an
    "expected" duration, it is not going to be completed at all. The
    intent in the use of this timer is to expeditiously take some
    alternative course of action when an LSP has not yet completed its
    signaling within an "expected" duration of time.
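
    The timer semantics described above can be sketched as follows. This
    is a minimal illustration, not a definitive implementation: the
    names LspSetupAttempt, setup_retry_expired, and setup_retry_interval
    are hypothetical, and the clock is assumed to be any monotonic time
    source expressed in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LspSetupAttempt:
    """Hypothetical ingress-side record for one signaling attempt."""
    tunnel_id: int
    lsp_id: int
    first_path_sent_at: float            # time the first Path was sent
    resv_received_at: Optional[float] = None
    error_indicated: bool = False        # e.g., a PathErr reached the ingress


def setup_retry_expired(attempt: LspSetupAttempt, now: float,
                        setup_retry_interval: float) -> bool:
    """Return True if the setup-retry timer should fire at time `now`."""
    if attempt.resv_received_at is not None:
        return False    # signaling completed; nothing to retry
    if attempt.error_indicated:
        return False    # assumption: error indications are handled elsewhere
    return now - attempt.first_path_sent_at >= setup_retry_interval


attempt = LspSetupAttempt(tunnel_id=1, lsp_id=7, first_path_sent_at=100.0)
print(setup_retry_expired(attempt, now=110.0, setup_retry_interval=30.0))
print(setup_retry_expired(attempt, now=130.0, setup_retry_interval=30.0))
```

    Note that the timer is measured from the first Path for the given
    LSP-ID, so refreshes of the same Path state do not restart it in
    this sketch.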
 
 3. Possible ill-effects due to implementation choices
 
    As mentioned in the previous section, the intent in the use of this
    timer is to take some alternative course of action when an LSP has
    not yet completed its signaling within an "expected" duration of
    time. One such course of action is for the ingress router to tear
    down the path that is still being signaled by sending a PathTear,
    run CSPF, and use the outcome of this CSPF to signal a brand-new
    path for the tunnel with a different LSP-ID (typically the previous
    LSP-ID incremented by 1). This section describes the problems caused
    by such a course of action.
 
    As mentioned in Section 1, in a network with a very large number of
    RSVP-TE LSPs, link/node failure(s) may produce a noticeable increase
    in the volume of RSVP-TE control traffic, which in turn might cause
    a router either to drop RSVP-TE messages or to send them excessively
    late.
 
    As a result, the following problems can occur:
    - LSP setup latency might be excessively high.
    - Error messages that indicate failure in LSP setup might not make
      it to the ingress router.
 
    A mix of the above problems can cause the setup-retry timer for a
    given LSP (at the ingress router) to fire repeatedly over a period
    of time. When that happens, the ingress gets stuck in the cycle
    illustrated below for some or many LSPs:
 
    --------------------------------------------------------------------
    Ingress Timeline        | [Ingress]---[]---[]...[Transit]...[]---[]-
    ------------------------|
    1. Trigger LSP setup    | Path
              :             |   TNL-ID=X
              :             |   LSP-ID=Y
              :             | -------->
       <No Resv (X, Y)>     |          ------------> Path (X, Y)
              :             |                        -------> --------->
              :             |                  :
              :             |                  :
    2. Setup-Retry Timer    |                  :
       fires; Recompute     |                  :
       path;                |                  :
    3. Trigger Teardown     | PathTear
                            |   TNL-ID=X
                            |   LSP-ID=Y
                            | -------->
                            |          ------------> PathTear (X, Y)
                            |                        -------> --------->
    4. Trigger setup for new| Path
       instance of the LSP  |   TNL-ID=X
       (same ERO)           |   LSP-ID=Y+1
              :             | -------->
              :             |          ------------> Path (X, Y+1)
              :             |                        -------> --------->
              :             |                                 Resv
       <No Resv (X, Y+1)>   |                                   TNL-ID=X
              :             |                                   LSP-ID=Y
              :             |                                 <---------
              :             |                                 ResvError
              :             |                                   No Path
              :             |                                 --------->
    5. Repeat loop through  |                  :
       2-4                  |                  :
    --------------------------------------------------------------------
 
    In the above illustration, notice how the transit router never gets
    to completely process the "current" LSP-ID: by the time it responds
    for LSP-ID Y, the ingress has already moved on to Y+1 (see [RShakir]
    for more). The implementation recommendations made in this document
    help avoid this snowball effect.
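
    The cycle illustrated above can be reproduced with a toy model. The
    sketch below is illustrative only and rests on loud assumptions: the
    function name and parameters are hypothetical, the congested network
    is assumed to need a constant resv_delay seconds before a Resv for
    any given Path reaches the ingress, and only messages originated by
    the ingress are counted.

```python
from typing import Tuple


def simulate_teardown_and_resignal(resv_delay: int, retry_interval: int,
                                   horizon: int) -> Tuple[bool, int, int]:
    """Toy model of steps 1-5 in the illustration above.

    resv_delay     -- seconds before a Resv for a given Path reaches the
                      ingress (assumed constant)
    retry_interval -- setup-retry timer value, in seconds
    horizon        -- how long to run the simulation, in seconds

    Returns (came_up, last_lsp_id, messages_originated_by_ingress).
    """
    now = 0
    lsp_id = 1                        # stands in for LSP-ID=Y
    messages = 1                      # step 1: initial Path
    while now + retry_interval <= horizon:
        if resv_delay <= retry_interval:
            return True, lsp_id, messages   # Resv arrives before expiry
        now += retry_interval         # step 2: setup-retry timer fires
        messages += 2                 # steps 3-4: PathTear plus new Path
        lsp_id += 1                   # new instance uses LSP-ID + 1
    return False, lsp_id, messages


print(simulate_teardown_and_resignal(45, 30, 300))   # (False, 11, 21)
print(simulate_teardown_and_resignal(20, 30, 300))   # (True, 1, 1)
```

    Under these assumptions, whenever the Resv needs longer than the
    timer interval, the LSP never comes up, while the LSP-ID and the
    number of Path and PathTear messages grow with every expiry.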
 
 4. Causes of the above ill-effects
 
    The implementation issues listed in Section 3 end up increasing the
    control plane load on a network whose control plane is already under
    stress. This increase is caused by unnecessarily doing the
    following, even when there is no change in the computed path:
 
    - Sending PathTears causes excessive and unjustifiable work on those
      downstream routers on the "previous ERO path" that had managed to
      bring the LSP UP. In other words, the slowness of a given transit
      router should not be the cause to penalize all other transit
      routers downstream of it, as doing so just increases the overall
      network stress.
 
    - Sending Path for LSP-ID=Y+1 causes unnecessary work for all
      routers on the ERO path, including those that were already running
      slow and were the real cause of the Resv for LSP-ID=Y not having
      been received by the ingress in a timely manner.
 
 5. Solution to the implementation issues
 
    To eliminate causes of the ill-effects listed in the previous
    section and thus to eliminate the ill-effects, this document makes
    the following recommendations.
 
    When the setup-retry timer fires:
 
    If there is no change in the computed path (i.e., no error
    indication for that LSP has been received via a PathErr or a TE
    update indicating a failure):
    - Do not send a PathTear for LSP-ID=Y.
    - Just let the Path state for LSP-ID=Y continue to be refreshed.
 
    The recommended default behavior is to keep retrying until the path
    changes or the user intervenes. Implementations MAY choose to
    provide the user with an option to override this default behavior
    and specify a policy to determine when to stop retrying.
 
    Implementations SHOULD use the recommendations listed in this
    section to avoid getting stuck in the LSP signaling cycle described
    in Section 3.
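
    The recommendations in this section can be sketched as a decision
    function invoked when the setup-retry timer fires. This is a minimal
    sketch under stated assumptions, not a definitive implementation:
    the names Action, Lsp, on_setup_retry_expiry, and max_expiries are
    hypothetical; "no change in the computed path" is approximated by
    comparing the newly computed ERO with the one being signaled; and,
    because this document only specifies the unchanged-path case, the
    sketch assumes that a changed path is handled with the tear-down and
    re-signal procedure from Section 3.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Action(Enum):
    REFRESH_PATH = "keep refreshing Path state for the current LSP-ID"
    SEND_PATH_TEAR = "send PathTear for the current LSP-ID"
    SEND_NEW_PATH = "send Path for a new LSP-ID"
    STOP_RETRYING = "stop retrying (user-specified policy)"


@dataclass
class Lsp:
    tunnel_id: int
    lsp_id: int
    ero: Tuple[str, ...]     # hops of the path currently being signaled
    expiries: int = 0        # setup-retry expiries seen for this LSP-ID


def on_setup_retry_expiry(lsp: Lsp, computed_ero: Tuple[str, ...],
                          max_expiries: Optional[int] = None) -> List[Action]:
    """Decide what the ingress does when the setup-retry timer fires."""
    lsp.expiries += 1
    # Optional user policy (hypothetical knob); the default is to keep retrying.
    if max_expiries is not None and lsp.expiries > max_expiries:
        return [Action.STOP_RETRYING]
    if tuple(computed_ero) == lsp.ero:
        # Path unchanged: no PathTear, same LSP-ID, just keep refreshing.
        return [Action.REFRESH_PATH]
    # Assumption: a changed path falls back to tear-down and re-signal.
    lsp.ero = tuple(computed_ero)
    lsp.lsp_id += 1
    lsp.expiries = 0
    return [Action.SEND_PATH_TEAR, Action.SEND_NEW_PATH]


lsp = Lsp(tunnel_id=1, lsp_id=7, ero=("A", "B", "C"))
print(on_setup_retry_expiry(lsp, ("A", "B", "C")), lsp.lsp_id)
print(on_setup_retry_expiry(lsp, ("A", "D", "C")), lsp.lsp_id)
```

    In the first call the LSP-ID stays at 7 and no PathTear is
    generated, so routers downstream of the slow node are not asked to
    redo work they have already completed.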
 
 6. Security Considerations
 
    This document does not introduce any new security concerns.
 
 7. IANA Considerations
 
    None.
 
 8. Normative References
 
    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.
 
    [RShakir] Shakir, R., "The next spring forward",
              http://rob.sh/files/the-next-spring-forward_rjs120314.pdf,
              March 2014.
 
    [RFC2961] Berger, L., "RSVP Refresh Overhead Reduction Extensions",
             RFC 2961, April 2001.
 
 9. Acknowledgments
 
    The authors would like to thank Raveendra Torvi for his inputs.
 
 10. Authors' Addresses
 
    Ravi Singh
    Juniper Networks
    Email: ravis@juniper.net
 
    Yakov Rekhter
    Juniper Networks
    Email: yakov@juniper.net
 
    Rob Shakir
    British Telecom
    Email: rob.shakir@bt.com
 
    Tarek Saad
    Cisco Systems
    Email: tsaad@cisco.com
 
    Vishnu Pavan Beeram
    Juniper Networks
    Email: vbeeram@juniper.net
 
 Contributors
 
    Markus Jork
    Juniper Networks
    Email: mjork@juniper.net
 
    Aman Kapoor
    Juniper Networks
    Email: amanka@juniper.net