Early Review of draft-ietf-idr-error-handling-15
review-ietf-idr-error-handling-15-opsdir-early-kumari-2014-11-18-00

Request Review of draft-ietf-idr-error-handling
Requested revision No specific revision (document currently at 19)
Type Early Review
Team Ops Directorate (opsdir)
Deadline 2015-03-10
Requested 2014-10-28
Authors Enke Chen, John Scudder, Prodosh Mohapatra, Keyur Patel
I-D last updated 2014-11-18
Completed reviews Genart Last Call review of -18 by Tom Taylor
Secdir Last Call review of -18 by Paul E. Hoffman
Opsdir Early review of -15 by Warren "Ace" Kumari
Rtgdir Early review of -13 by Joel M. Halpern
Rtgdir Early review of -15 by Mach Chen
Assignment Reviewer Warren "Ace" Kumari
State Completed
Request Early review on draft-ietf-idr-error-handling by Ops Directorate Assigned
Reviewed revision 15 (document currently at 19)
Result Has nits
Completed 2014-11-18
Hi all,

Be ye not afraid.
I was asked to do an early ops-dir review of this document. Y'all have
seen directorate reviews before and know what they are for...

Summary:
I see significant operational impact if this is deployed -- however,
it is all positive. IMO this document is (after a bunch of nits have
been addressed) ready for publication.

TL;DR:
Checklist:
Much of the checklist does not apply as this is not a "new" protocol,
rather changes to the handling of errors in an existing (and widely
deployed) protocol.
Much of this has to do with the "Has the impact on network operation
been discussed?" item, and it all seems good :-)



A pile o' nits (they end at Section 6 because there were no changes
after that). These really are just nits, ignore them if you like....:


Internet Engineering Task Force                             E. Chen, Ed.
Internet-Draft                                       Cisco Systems, Inc.
Updates: 1997, 4271, 4360, 4456, 4760,                   J. Scudder, Ed.
         5543, 5701, 6368 (if approved)                 Juniper Networks
Intended status: Standards Track                            P. Mohapatra
Expires: April 27, 2015                                 Sproute Networks
                                                                K. Patel
                                                     Cisco Systems, Inc.
                                                        October 24, 2014

             Revised Error Handling for BGP UPDATE Messages
                    draft-ietf-idr-error-handling-15

Abstract

   According to the base BGP specification, a BGP speaker that receives
   an UPDATE message containing a malformed attribute is required to
   reset the session over which the offending attribute was received.
   This behavior is undesirable as a session reset would impact not only
[O] This behavior is undesirable as a session reset would impact not only
[P] This behavior is undesirable, because a session reset would impact not only
[R] Readability

   routes with the offending attribute, but also other valid routes
[O] also other valid routes
[P] also other, valid routes
[R] clarification; otherwise sounds like the first is also valid.

   exchanged over the session.  This document partially revises the
   error handling for UPDATE messages, and provides guidelines for the
[O] UPDATE messages, and provides guidelines
[P] UPDATE messages and provides guidelines
[R] grammar
   authors of documents defining new attributes.  Finally, it revises
   the error handling procedures for a number of existing attributes.

   This document updates error handling for RFCs 1997, 4271, 4360, 4456,
   4760, 5543, 5701 and 6368.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 27, 2015.

Chen, et al.             Expires April 27, 2015                 [Page 1]
Internet-Draft       Revised Error Handling for BGP         October 2014

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   4
   2.  Error-Handling Approaches . . . . . . . . . . . . . . . . . .   4
   3.  Revision to BGP UPDATE Message Error Handling . . . . . . . .   4
   4.  Attribute Length Fields . . . . . . . . . . . . . . . . . . .   6
   5.  Parsing of NLRI Fields  . . . . . . . . . . . . . . . . . . .   7
     5.1.  Encoding NLRI . . . . . . . . . . . . . . . . . . . . . .   7
     5.2.  Missing NLRI  . . . . . . . . . . . . . . . . . . . . . .   7
     5.3.  Syntactic Correctness of NLRI Fields  . . . . . . . . . .   8
     5.4.  Typed NLRI  . . . . . . . . . . . . . . . . . . . . . . .   8
   6.  Operational Considerations  . . . . . . . . . . . . . . . . .   9
   7.  Error Handling Procedures for Existing Attributes . . . . . .  10
     7.1.  ORIGIN  . . . . . . . . . . . . . . . . . . . . . . . . .  10
     7.2.  AS_PATH . . . . . . . . . . . . . . . . . . . . . . . . .  10
     7.3.  NEXT_HOP  . . . . . . . . . . . . . . . . . . . . . . . .  11
     7.4.  MULTI_EXIT_DISC . . . . . . . . . . . . . . . . . . . . .  11
     7.5.  LOCAL_PREF  . . . . . . . . . . . . . . . . . . . . . . .  11
     7.6.  ATOMIC_AGGREGATE  . . . . . . . . . . . . . . . . . . . .  11
     7.7.  AGGREGATOR  . . . . . . . . . . . . . . . . . . . . . . .  12
     7.8.  Community . . . . . . . . . . . . . . . . . . . . . . . .  12
     7.9.  ORIGINATOR_ID . . . . . . . . . . . . . . . . . . . . . .  12
     7.10. CLUSTER_LIST  . . . . . . . . . . . . . . . . . . . . . .  12
     7.11. MP_REACH_NLRI . . . . . . . . . . . . . . . . . . . . . .  13
     7.12. MP_UNREACH_NLRI . . . . . . . . . . . . . . . . . . . . .  13
     7.13. Traffic Engineering path attribute  . . . . . . . . . . .  13
     7.14. Extended Community  . . . . . . . . . . . . . . . . . . .  13
     7.15. IPv6 Address Specific BGP Extended Community Attribute  .  14
     7.16. ATTR_SET  . . . . . . . . . . . . . . . . . . . . . . . .  14
   8.  Guidance for Authors of BGP Specifications  . . . . . . . . .  14
   9.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  15
   10. Security Considerations . . . . . . . . . . . . . . . . . . .  15
   11. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  15
   12. References  . . . . . . . . . . . . . . . . . . . . . . . . .  15
     12.1.  Normative References . . . . . . . . . . . . . . . . . .  16
     12.2.  Informative References . . . . . . . . . . . . . . . . .  17
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  17

1.  Introduction

   According to the base BGP specification [RFC4271], a BGP speaker that
   receives an UPDATE message containing a malformed attribute is
   required to reset the session over which the offending attribute was
   received.  This behavior is undesirable as a session reset would
   impact not only routes with the offending attribute, but also other
   valid routes exchanged over the session.  In the case of optional
[O] This behavior is undesirable as a session reset would
   impact not only routes with the offending attribute, but also other
   valid routes exchanged over the session.
[P] This behavior is undesirable, because a session reset impacts not
only routes with the offending attribute, but also other, valid routes
exchanged over the session.
[R] grammar and readability

   transitive attributes, the behavior is especially troublesome and may
   present a potential security vulnerability.  The reason is that such
   attributes may have been propagated without being checked by
   intermediate routers that do not recognize the attributes -- in
   effect the attribute may have been tunneled, and when they do reach a
   router that recognizes and checks them, the session that is reset may
   not be associated with the router that is at fault.  To make matters
[O] The reason is that such
   attributes may have been propagated without being checked by
   intermediate routers that do not recognize the attributes -- in
   effect the attribute may have been tunneled, and when they do reach a
   router that recognizes and checks them, the session that is reset may
   not be associated with the router that is at fault.
[P] This is because attributes may have been propagated without being
checked by intermediate routers that don't recognize the attributes.
In effect, the attributes may have been tunneled; and when they reach
a router that recognizes and checks the attributes, the session that
is reset may not be associated with the router that is at fault.
[R] Grammar, readability


   worse, in such cases although the problematic attributes may have
   originated with a single update transmitted by a single BGP speaker,
   by the time they encounter a router that checks them they may have
   been replicated many times, and thus may cause the reset of many
   peering sessions.  Thus the damage inflicted may be multiplied
   manyfold.

   The goal for revising the error handling for UPDATE messages is to
   minimize the impact on routing by a malformed UPDATE message, while
   maintaining protocol correctness to the extent possible.  This can be
   achieved largely by maintaining the established session and keeping
   the valid routes exchanged, but removing the routes carried in the
   malformed UPDATE from the routing system.

   This document partially revises the error handling for UPDATE
   messages, and provides guidelines for the authors of documents
   defining new attributes.  Finally, it revises the error handling
   procedures for a number of existing attributes.  Specifically, the
   error handling procedures of [RFC1997], [RFC4271], [RFC4360],
   [RFC4456], [RFC4760], [RFC5543], [RFC5701], and [RFC6368] are
   revised.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Error-Handling Approaches

   In this document we refer to four different approaches to handling
   errors found in BGP path attributes.  They are as follows (listed in
   order, from the one with the "strongest" action to the one with the
   "weakest" action):

   o  Session reset: This is the approach used throughout the base BGP
      specification [RFC4271], where a NOTIFICATION is sent and the
      session terminated.

   o  AFI/SAFI disable: [RFC4760] specifies a procedure for disabling a
      particular AFI/SAFI.

   o  Treat-as-withdraw: In this approach, the UPDATE message containing
      the path attribute in question MUST be treated as though all
      contained routes had been withdrawn just as if they had been
      listed in the WITHDRAWN ROUTES field (or in the MP_UNREACH_NLRI
      attribute if appropriate) of the UPDATE message, thus causing them
      to be removed from the Adj-RIB-In according to the procedures of
      [RFC4271].

   o  Attribute discard: In this approach the malformed attribute MUST
      be discarded and the UPDATE message continues to be processed.
      This approach must not be used except in the case of an attribute
      that has no effect on route selection or installation.
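
Illustration (not part of the draft): the ordering of the four
approaches above can be sketched as an ordered enumeration, so that
the "strongest" of several triggered approaches is simply the maximum.
The names `Approach` and `strongest` are assumptions made for this
sketch only.

```python
from enum import IntEnum


class Approach(IntEnum):
    """Error-handling approaches; a larger value means a stronger action."""
    ATTRIBUTE_DISCARD = 1
    TREAT_AS_WITHDRAW = 2
    AFI_SAFI_DISABLE = 3
    SESSION_RESET = 4


def strongest(approaches):
    """Return the strongest approach among those triggered, or None if there were no errors."""
    approaches = list(approaches)
    return max(approaches) if approaches else None
```

For example, an UPDATE that triggers both attribute discard and
treat-as-withdraw would resolve to `Approach.TREAT_AS_WITHDRAW`.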

3.  Revision to BGP UPDATE Message Error Handling

   This specification amends [RFC4271] Section 6.3 in a number of ways.
   See also Section 7 for treatment of specific path attributes.

   a.  The first paragraph is revised as follows:

          Old Text:

             All errors detected while processing the UPDATE message
             MUST be indicated by sending the NOTIFICATION message with
             the Error Code UPDATE Message Error.  The error subcode
             elaborates on the specific nature of the error.

          New Text:

             An error detected while processing the UPDATE message for
             which a session reset is specified MUST be indicated by
             sending the NOTIFICATION message with the Error Code UPDATE
             Message Error.  The error subcode elaborates on the
             specific nature of the error.

   b.  Error handling for the following case remains unchanged:

             If the Withdrawn Routes Length or Total Attribute Length is
             too large (i.e., if Withdrawn Routes Length + Total
             Attribute Length + 23 exceeds the message Length), then the
             Error Subcode MUST be set to Malformed Attribute List.

   c.  Attribute Flag error handling is revised as follows:

          Old Text:

             If any recognized attribute has Attribute Flags that
             conflict with the Attribute Type Code, then the Error
             Subcode MUST be set to Attribute Flags Error.  The Data
             field MUST contain the erroneous attribute (type, length,
             and value).

          New Text:

             If the value of either the Optional or Transitive bits in
             the Attribute Flags is in conflict with their specified
             values, then the attribute MUST be treated as malformed and
             the treat-as-withdraw approach used, unless the
             specification for the attribute mandates different handling
             for incorrect Attribute Flags.

   d.  If any of the well-known mandatory attributes are not present in
       an UPDATE message, then "treat-as-withdraw" MUST be used.  (Note
       that [RFC4760] reclassifies NEXT_HOP as what is effectively
       discretionary.)

   e.  "Treat-as-withdraw" MUST be used for the cases that specify a
       session reset and involve any of the attributes ORIGIN, AS_PATH,
       NEXT_HOP, MULTI_EXIT_DISC, or LOCAL_PREF.

   f.  "Attribute discard" MUST be used for any of the cases that
       specify a session reset and involve ATOMIC_AGGREGATE or
       AGGREGATOR.

   g.  If the MP_REACH_NLRI attribute or the MP_UNREACH_NLRI [RFC4760]
       attribute appears more than once in the UPDATE message, then a
       NOTIFICATION message MUST be sent with the Error Subcode
       "Malformed Attribute List".  If any other attribute (whether
       recognized or unrecognized) appears more than once in an UPDATE
       message, then all the occurrences of the attribute other than the
       first one SHALL be discarded and the UPDATE message continue to
       be processed.

   h.  When multiple attribute errors exist in an UPDATE message, if the
       same approach (either "session reset", "treat-as-withdraw" or
       "attribute discard") is specified for the handling of these
       malformed attributes, then the specified approach MUST be used.
       Otherwise the approach with the strongest action MUST be used.

   i.  The Withdrawn Routes field MUST be checked for syntactic
       correctness in the same manner as the NLRI field.  This is
       discussed further below, and in Section 5.3.

   j.  Finally, we observe that in order to use the approach of "treat-
       as-withdraw", the entire NLRI field and/or the MP_REACH_NLRI and
       MP_UNREACH_NLRI attributes need to be successfully parsed -- what
       this entails is discussed in more detail in Section 5.  If this
       is not possible, the procedures of [RFC4271] and/or [RFC4760]
       continue to apply, meaning that the "session reset" approach (or
       the "AFI/SAFI disable" approach) MUST be followed.
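
Illustration (not part of the draft): item (g) above could be sketched
as follows. This assumes the attributes have already been split into
(type code, value) pairs in wire order; the function and exception
names are hypothetical. Type codes 14 and 15 are the MP_REACH_NLRI and
MP_UNREACH_NLRI codes from [RFC4760].

```python
MP_REACH_NLRI = 14
MP_UNREACH_NLRI = 15


class MalformedAttributeList(Exception):
    """Hypothetical signal that a NOTIFICATION (Malformed Attribute List) must be sent."""


def drop_duplicate_attributes(attributes):
    """Keep only the first occurrence of each attribute type.

    `attributes` is a list of (type_code, value) tuples in wire order.
    A repeated MP_REACH_NLRI or MP_UNREACH_NLRI is not discarded; it raises instead.
    """
    seen = set()
    kept = []
    for type_code, value in attributes:
        if type_code in seen:
            if type_code in (MP_REACH_NLRI, MP_UNREACH_NLRI):
                raise MalformedAttributeList(type_code)
            continue  # discard later occurrences, keep processing the UPDATE
        seen.add(type_code)
        kept.append((type_code, value))
    return kept
```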

4.  Attribute Length Fields

   There are two error cases in which the Total Attribute Length value
   can be in conflict with the enclosed path attributes, which
   themselves carry length values.  In the "overrun" case, as the
   enclosed path attributes are parsed, the length of the last
   encountered path attribute would cause the Total Attribute Length to
   be exceeded.  In the "underrun" case, as the enclosed path attributes
   are parsed, after the last successfully-parsed attribute, fewer than
   three octets remain, or fewer than four octets, if the Attribute
   Flags field has the Extended Length bit set -- that is, there remains
   unconsumed data in the path attributes but yet insufficient data to
   encode a single minimum-sized path attribute.  In either of these
   cases an error condition exists and the treat-as-withdraw approach
[O] cases an error condition exists
[P] cases, an error condition exists
[R] Readability

   MUST be used (unless some other, more severe error is encountered
   dictating a stronger approach), and the Total Attribute Length MUST
   be relied upon to enable the beginning of the NLRI field to be
   located.

   For all path attributes other than those specified as having an
   attribute length that may be zero it SHALL be considered a syntax
[O] length that may be zero it SHALL be considered
[P] length that may be zero, it SHALL be considered
[R] grammar
   error for the attribute to have a length of zero.  (Of the path
   attributes considered in this specification, only AS_PATH and
   ATOMIC_AGGREGATE may validly have an attribute length of zero.)
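
Illustration (not part of the draft): a minimal sketch of how a parser
might detect the "overrun" and "underrun" cases and the zero-length
rule while walking the path attributes. It assumes `data` holds exactly
Total Attribute Length octets; the function names and the string error
labels are assumptions. The 0x10 Extended Length flag and type codes 2
(AS_PATH) and 6 (ATOMIC_AGGREGATE) come from [RFC4271].

```python
EXTENDED_LENGTH = 0x10   # Attribute Flags bit: two-octet length field
ZERO_LENGTH_OK = {2, 6}  # AS_PATH, ATOMIC_AGGREGATE


def walk_path_attributes(data: bytes):
    """Return (attributes, error); error is None, "underrun" or "overrun".

    Each parsed attribute is a (flags, type_code, value) tuple.
    """
    attributes = []
    pos = 0
    while pos < len(data):
        remaining = len(data) - pos
        flags = data[pos]
        header_len = 4 if flags & EXTENDED_LENGTH else 3
        if remaining < header_len:
            # Unconsumed data, but too little for a minimum-sized attribute.
            return attributes, "underrun"
        type_code = data[pos + 1]
        length = int.from_bytes(data[pos + 2:pos + header_len], "big")
        end = pos + header_len + length
        if end > len(data):
            # This attribute's length would exceed Total Attribute Length.
            return attributes, "overrun"
        attributes.append((flags, type_code, data[pos + header_len:end]))
        pos = end
    return attributes, None


def has_bad_zero_length(attributes):
    """True if any attribute other than AS_PATH/ATOMIC_AGGREGATE has length zero."""
    return any(len(value) == 0 and type_code not in ZERO_LENGTH_OK
               for _flags, type_code, value in attributes)
```

In both error cases the caller would fall back to treat-as-withdraw
and use Total Attribute Length, not the per-attribute lengths, to find
the NLRI field.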

5.  Parsing of NLRI Fields
[O] 5. Parsing of NLRI Fields
[P] 5. Parsing of Network Layer Reachability Information (NLRI) Fields
[R] First use of acronym

5.1.  Encoding NLRI

   To facilitate the determination of the NLRI field in an UPDATE with a
   malformed attribute:

   o  The MP_REACH_NLRI or MP_UNREACH_NLRI attribute (if present) SHALL
      be encoded as the very first path attribute in an UPDATE.

   o  An UPDATE message MUST NOT contain more than one of the following:
      non-empty Withdrawn Routes field, non-empty Network Layer
      Reachability Information field, MP_REACH_NLRI attribute, and
      MP_UNREACH_NLRI attribute.

   Since older BGP speakers may not implement these restrictions, an
   implementation MUST still be prepared to receive these fields in any
   position or combination.

   If the encoding of [RFC4271] is used, the NLRI field for the IPv4
   unicast address family is carried immediately following all the
   attributes in an UPDATE.  When such an UPDATE is received, we observe
   that the NLRI field can be determined using the "Message Length",
   "Withdrawn Route Length" and "Total Attribute Length" (when they are
   consistent) carried in the message instead of relying on the length
   of individual attributes in the message.
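
Illustration (not part of the draft): a sketch of locating the NLRI
field from the three length values alone, without trusting any
individual attribute length. It assumes `message` is one complete BGP
message including the 19-octet header of [RFC4271]; `locate_nlri` is a
hypothetical name, and returning None for inconsistent lengths is a
choice made for this sketch.

```python
import struct

HEADER_LEN = 19  # 16-octet marker, 2-octet length, 1-octet type


def locate_nlri(message: bytes):
    """Return (withdrawn, path_attrs, nlri) slices, or None if the lengths are inconsistent."""
    if len(message) < HEADER_LEN + 4:
        return None
    (msg_len,) = struct.unpack_from("!H", message, 16)
    if msg_len != len(message):
        return None
    (withdrawn_len,) = struct.unpack_from("!H", message, HEADER_LEN)
    attrs_len_at = HEADER_LEN + 2 + withdrawn_len
    if attrs_len_at + 2 > msg_len:
        return None
    (attrs_len,) = struct.unpack_from("!H", message, attrs_len_at)
    nlri_at = attrs_len_at + 2 + attrs_len
    # Equivalent to: Withdrawn Routes Length + Total Attribute Length + 23 > Length
    if nlri_at > msg_len:
        return None
    withdrawn = message[HEADER_LEN + 2:attrs_len_at]
    path_attrs = message[attrs_len_at + 2:nlri_at]
    nlri = message[nlri_at:msg_len]
    return withdrawn, path_attrs, nlri
```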

5.2.  Missing NLRI

   [RFC4724] specifies an End-of-RIB message ("EoR") that can be encoded
   as an UPDATE message that contains only a MP_UNREACH_NLRI attribute
   that encodes no NLRI (it can also be a completely empty UPDATE
   message in the case of the "legacy" encoding).  In all other well-
   specified cases, an UPDATE either carries only withdrawn routes
   (either in the Withdrawn Routes field, or the MP_UNREACH_NLRI
   attribute), or it advertises reachable routes (either in the Network
   Layer Reachability Information field, or the MP_REACH_NLRI
   attribute).

   Thus, if an UPDATE message is encountered that does contain path
   attributes other than MP_UNREACH_NLRI and doesn't encode any
   reachable NLRI, we cannot be confident that the NLRI have been
   successfully parsed as Section 3 (j) requires.  For this reason, if
   any path attribute errors are encountered in such an UPDATE message,
   and if any encountered error specifies an error-handling approach
   other than "attribute discard", then the "session reset" approach
   MUST be used.

5.3.  Syntactic Correctness of NLRI Fields

   The NLRI field or Withdrawn Routes field SHALL be considered
   "syntactically incorrect" if either of the following are true:

   o  The length of any of the included NLRI is greater than 32,

   o  When parsing NLRI contained in the field, the length of the last
      NLRI found exceeds the amount of unconsumed data remaining in the
      field.

   Similarly, the MP_REACH_NLRI or MP_UNREACH_NLRI attribute of an
   update SHALL be considered to be incorrect if any of the following
   are true:

   o  The length of any of the included NLRI is inconsistent with the
      given AFI/SAFI (for example, if an IPv4 NLRI has a length greater
      than 32 or an IPv6 NLRI has a length greater than 128),

   o  When parsing NLRI contained in the attribute, the length of the
      last NLRI found exceeds the amount of unconsumed data remaining in
      the attribute.

   o  The attribute flags of the attribute are inconsistent with those
      specified in [RFC4760].

   o  The length of the MP_UNREACH_NLRI attribute is less than 3, or the
      length of the MP_REACH_NLRI attribute is less than 5.
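
Illustration (not part of the draft): the two length checks above can
be sketched for a field of <length, prefix> tuples as below. The
function name is hypothetical; `max_bits` would be 32 for the IPv4
NLRI and Withdrawn Routes fields and 128 for IPv6 unicast NLRI carried
in MP_REACH_NLRI. Other AFI/SAFIs use different NLRI encodings that
this sketch does not cover.

```python
def parse_prefixes(field: bytes, max_bits: int = 32):
    """Return a list of (bit_length, prefix_octets), or None if syntactically incorrect."""
    prefixes = []
    pos = 0
    while pos < len(field):
        bits = field[pos]
        if bits > max_bits:
            return None  # length inconsistent with the address family
        octets = (bits + 7) // 8
        if pos + 1 + octets > len(field):
            return None  # last NLRI exceeds the unconsumed data
        prefixes.append((bits, field[pos + 1:pos + 1 + octets]))
        pos += 1 + octets
    return prefixes
```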

5.4.  Typed NLRI

   Certain address families, for example MCAST-VPN [RFC6514], MCAST-VPLS
   [RFC7117] and EVPN [I-D.ietf-l2vpn-evpn] have NLRI that are typed.
   Since supported type values within the address family are not
   expressed in the MP-BGP capability [RFC4760], it is possible for a
   BGP speaker to advertise support for the given address family and
   sub-address family while still not supporting a particular type of
   NLRI within that AFI/SAFI.

   A BGP speaker advertising support for such a typed address family
   MUST handle routes with unrecognized NLRI types within that address
   family by discarding them, unless the relevant specification for that
   address family specifies otherwise.

6.  Operational Considerations

   Although the "treat-as-withdraw" error-handling behavior defined in
   Section 2 makes every effort to preserve BGP's correctness, we note
   that if an UPDATE received on an IBGP session is subjected to this
   treatment, inconsistent routing within the affected Autonomous System
   may result.  The consequences of inconsistent routing can include
   long-lived forwarding loops and black holes.  While lamentable, this
   issue is expected to be rare in practice, and more importantly is
   seen as less problematic than the session-reset behavior it replaces.
[O] and more importantly
[P] and, more importantly,
[R] grammar


   When a malformed attribute is indeed detected over an IBGP session,
   we RECOMMEND that routes with the malformed attribute be identified
   and traced back to the ingress router in the network where the routes
   were sourced or received externally, and then a filter be applied on
   the ingress router to prevent the routes from being sourced or
   received.  This will help maintain routing consistency in the
   network.

   Even if inconsistent routing does not arise, the "treat-as-withdraw"
   behavior can cause either complete unreachability or sub-optimal
   routing for the destinations whose routes are carried in the affected
   UPDATE message.

   Note that "treat-as-withdraw" is different from discarding an UPDATE
   message.  The latter violates the basic BGP principle of incremental
   update, and could cause invalid routes to be kept.

   Because of these potential issues, a BGP speaker MUST provide
   debugging facilities to permit issues caused by a malformed attribute
   to be diagnosed.  At a minimum, such facilities MUST include logging
   an error listing the NLRI involved, and containing the entire
   malformed UPDATE message when such an attribute is detected.  The
   malformed UPDATE message SHOULD be analyzed, and the root cause
   SHOULD be investigated.
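
Illustration (not part of the draft): one way an implementation might
meet the minimum logging requirement above, using Python's standard
`logging` module. The logger name, the message layout, and the
function names are assumptions for this sketch; the draft only
requires that the NLRI involved and the entire malformed UPDATE be
logged.

```python
import logging

logger = logging.getLogger("bgp.update")  # hypothetical logger name


def format_malformed_update(peer: str, prefixes, raw_update: bytes) -> str:
    """Build a log line listing the NLRI involved and the full UPDATE as hex."""
    nlri = ", ".join(prefixes) or "(none parsed)"
    return (f"malformed UPDATE from {peer}: NLRI [{nlri}]; "
            f"raw message {raw_update.hex()}")


def log_malformed_update(peer: str, prefixes, raw_update: bytes) -> None:
    """Log the malformed UPDATE at error level."""
    logger.error(format_malformed_update(peer, prefixes, raw_update))
```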

   Section 8 mentions that attribute discard should not be used in cases
   where "the attribute in question has or may have an effect on route
   selection."  Although all cases that specify attribute discard in
   this document do not affect route selection by default, in principle
   routing policies could be written that affect selection based on such
   an attribute.  Operators should take care when writing such policies
   to consider the possible consequences of an attribute discard.  (In
   general, as long as such policies are only applied to external BGP
   sessions, correctness issues are not expected to arise.)

-- 
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.
   ---maf