Network Working Group                                            J. Dong
Internet-Draft                                                  X. Zhang
Intended status: Informational                       Huawei Technologies
Expires: May 4, 2017                                               Z. Li
                                                            China Mobile
                                                        October 31, 2016


                  OSPF LSA Flushing Problem Statement
           draft-dong-ospf-maxage-flush-problem-statement-01

Abstract

   In the OSPF protocol, Link State Advertisements (LSAs) are exchanged
   in Link State Update (LSU) packets to achieve link state database
   (LSDB) synchronization and consistent route calculation.  The OSPF
   protocol specifies several scenarios in which an LSA is flushed with
   the LS age field set to MaxAge.  In some cases, the flushing of
   MaxAge LSAs may cause a flooding storm of OSPF packets and severely
   impact the services provided by the network.

   This document describes the problem of OSPF LSA flushing and asks
   for solutions to this problem.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 4, 2017.





Dong, et al.               Expires May 4, 2017                  [Page 1]


Internet-Draft     OSPF MaxAge Flush Problem Statement      October 2016


Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Typical Scenarios of LSA Flushing
   3.  Consequence of LSA Flushing
   4.  Requirements on Potential Solutions
     4.1.  Solution for Problem Localization
     4.2.  Solution for Impact Mitigation
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   In the OSPF protocol [RFC2328], Link State Advertisements (LSAs) are
   exchanged in Link State Update (LSU) packets to achieve link state
   database (LSDB) synchronization and consistent route calculation.
   OSPF specifies several scenarios in which an LSA is flushed with the
   LS age field set to MaxAge.  In some cases, the flushing of MaxAge
   LSAs may cause a flooding storm of OSPF packets and severely impact
   the services in the network.  Since a MaxAge LSA may be flushed by
   any OSPF router, troubleshooting usually takes a long time, and in
   the meantime the problem could cause huge damage to both the network
   provider and its customers.

2.  Typical Scenarios of LSA Flushing

   [RFC2328] specifies several scenarios in which an LSA should be
   flushed with the LS age field set to MaxAge.  Under normal
   circumstances, LSA flushing happens when the LS age of an LSA
   naturally reaches MaxAge; this can be done by any OSPF router.
   Since an OSPF router generates a new instance of each self-originated
   LSA when its LS age reaches LSRefreshTime, which is typically half of
   MaxAge (LSRefreshTime is 30 minutes and MaxAge is 1 hour in
   [RFC2328]), an LSA naturally ages to MaxAge only when its originator
   is not reachable in the network and cannot refresh the LSA.
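   The aging behavior described above can be illustrated with the
   following Python sketch.  It is a simplified model rather than an
   implementation: ls_age() and its unreachable_at parameter are
   hypothetical names, the originator is assumed to refresh the LSA at
   every multiple of LSRefreshTime while it is reachable, and
   InfTransDelay is ignored.

```python
# Architectural constants from RFC 2328 Appendix B (seconds).
LS_REFRESH_TIME = 1800
MAX_AGE = 3600


def ls_age(t: int, unreachable_at: int) -> int:
    """LS age, at time t, of the LSA copy held in a remote LSDB.

    Hypothetical model: the LSA is originated at t=0 and refreshed
    with LS age 0 at every multiple of LS_REFRESH_TIME strictly
    before unreachable_at (which must be >= 1); afterwards nobody
    refreshes it.  The age stops at MAX_AGE, when it is flushed.
    """
    last_reachable = min(t, unreachable_at - 1)
    last_refresh = last_reachable - last_reachable % LS_REFRESH_TIME
    return min(t - last_refresh, MAX_AGE)


# Originator unreachable from t=2000: its last refresh was at t=1800,
# so remote copies naturally reach MaxAge at t=1800+3600.
first = next(t for t in range(100000) if ls_age(t, 2000) == MAX_AGE)
print(first)  # 5400
```

   While the originator remains reachable, the modelled age cycles
   between 0 and LSRefreshTime - 1 seconds and never reaches MaxAge.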

   Another case of LSA flushing is "premature aging", in which a router
   sets the LS age of a self-originated LSA to MaxAge and then floods
   the LSA.  Premature aging is used when the sequence number of a
   self-originated LSA is about to wrap, or when the external route
   previously advertised by the LSA is no longer reachable.  Premature
   aging and flushing of LSAs can also happen when a router changes
   from being the Designated Router (DR) to a non-DR, or, in some rare
   cases, when the router's Router ID is changed.
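   The sequence-number case can be sketched as follows.  This Python
   fragment only illustrates the rule in Section 12.1.6 of [RFC2328]:
   refresh_instances() is a hypothetical helper, LS sequence numbers
   are modelled as signed 32-bit integers, and the wait for the flush
   to be acknowledged before re-origination is not modelled.

```python
from __future__ import annotations

# Constants from RFC 2328, with sequence numbers written as signed
# 32-bit values (InitialSequenceNumber is 0x80000001).
INITIAL_SEQUENCE_NUMBER = -0x7FFFFFFF
MAX_SEQUENCE_NUMBER = 0x7FFFFFFF
MAX_AGE = 3600  # seconds


def refresh_instances(seq: int) -> list[tuple[int, int]]:
    """(LS sequence number, LS age) pairs flooded when a router
    refreshes a self-originated LSA whose current number is seq."""
    if seq == MAX_SEQUENCE_NUMBER:
        # The number cannot be incremented further: prematurely age
        # the current instance to flush it, then (once the flush has
        # been acknowledged, not modelled here) restart the sequence.
        return [(seq, MAX_AGE), (INITIAL_SEQUENCE_NUMBER, 0)]
    # Normal case: originate the next instance with LS age 0.
    return [(seq + 1, 0)]


print(refresh_instances(MAX_SEQUENCE_NUMBER))
# [(2147483647, 3600), (-2147483647, 0)]
```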

   Field experience has shown several circumstances in which MaxAge LSA
   flushing may be generated by a misbehaving router in the network.
   For example, the LS age may be corrupted so that it reaches MaxAge
   much earlier than expected.  This is difficult to detect with the
   existing OSPF checksum mechanism, as the LS age field is excluded
   from the LSA checksum calculation.  In addition, OSPF cryptographic
   authentication cannot detect corruption of the LS age field if it
   happens before the LSA is assembled into an LSU packet.
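   The following Python sketch illustrates why a corrupted LS age goes
   unnoticed.  Section 12.1.7 of [RFC2328] specifies a Fletcher
   checksum computed over the whole LSA except the LS age field; the
   sketch computes it that way and shows that an LSA whose LS age has
   been overwritten with MaxAge still passes verification.  The helper
   names and the header-only LSA built by build_lsa() are hypothetical
   illustrations, not taken from any implementation.

```python
from __future__ import annotations

import struct

MAX_AGE = 3600  # seconds, RFC 2328 Appendix B


def fletcher_sums(data: bytes) -> tuple[int, int]:
    """Fletcher running sums C0 and C1, modulo 255."""
    c0 = c1 = 0
    for octet in data:
        c0 = (c0 + octet) % 255
        c1 = (c1 + c0) % 255
    return c0, c1


def lsa_checksum(lsa: bytes) -> int:
    """LS checksum of an LSA, skipping the 2-octet LS age field.

    The covered octets start at offset 2 of the LSA, so the LS
    checksum field sits at offset 14 within them (16 in the LSA).
    """
    data = bytearray(lsa[2:])
    off = 14
    data[off:off + 2] = b"\x00\x00"  # checksum field counts as zero
    c0, c1 = fletcher_sums(data)
    k = len(data) - off
    # Check octets chosen so that both sums over the result are zero.
    x = ((k - 1) * c0 - c1) % 255 or 255
    y = (c1 - k * c0) % 255 or 255
    return (x << 8) | y


def lsa_checksum_ok(lsa: bytes) -> bool:
    """A stored checksum is valid when both sums are zero."""
    return fletcher_sums(lsa[2:]) == (0, 0)


def build_lsa(age: int, seq: int) -> bytes:
    """Header-only (20-octet) LSA with hypothetical field values."""
    hdr = struct.pack("!HBBIIIHH", age, 0x02, 1, 0x0A000001,
                      0x0A000001, seq, 0, 20)
    csum = struct.pack("!H", lsa_checksum(hdr))
    return hdr[:16] + csum + hdr[18:]


fresh = build_lsa(age=1, seq=0x80000001)
aged = struct.pack("!H", MAX_AGE) + fresh[2:]  # corrupt LS age only
print(lsa_checksum_ok(fresh), lsa_checksum_ok(aged))  # True True
```

   Because the LS age octets never enter the sums, any value written
   into them, including MaxAge, leaves the stored checksum valid.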

3.  Consequence of LSA Flushing

   While MaxAge LSA flushing is important for fast convergence and for
   the consistency of the LSDB across all OSPF routers, several
   incidents in production networks have shown that improper LSA
   flushing can have a severe impact on the network and the services
   it provides.  This section evaluates the impact of MaxAge LSA
   flushing.

   According to Section 14 of [RFC2328], a MaxAge LSA can be flushed by
   any router, regardless of whether the LSA is self-originated.
   Depending on the flooding scope of the LSA, the MaxAge LSA is
   flooded either throughout the whole routing domain or within a
   specific area.  On every router that receives the MaxAge LSA, the
   old LSA instance is replaced, which in turn triggers route
   calculation and installation.  When the MaxAge LSA is received by
   the originating router, that router advances the LS sequence number
   to one past the received LS sequence number and originates a new
   instance of the LSA.  If the LSA flushing is due to a systematic
   problem that does not recover automatically, this cycle of flooding
   and processing continues indefinitely, which severely impacts
   network reachability and stability.  Since OSPF is a fundamental
   protocol on which other protocols such as BGP and LDP, as well as
   the various services provided by the network, are built, this can
   cause huge damage to both the network provider and its customers.
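   The flush and re-origination cycle described above can be modelled
   with the following Python sketch.  more_recent() follows the
   instance comparison rules of Section 13.1 of [RFC2328], which is why
   a MaxAge copy with the same sequence number and checksum displaces
   the existing instance.  Instance and reoriginations() are
   hypothetical names, and the loop is a hypothetical model: it assumes
   that a faulty router immediately flushes every new instance and that
   the originator is throttled only by MinLSInterval, ignoring
   MinLSArrival, implementation-specific throttling, and checksum
   changes.

```python
from dataclasses import dataclass, replace

# Architectural constants from RFC 2328 Appendix B (seconds).
MAX_AGE = 3600
MAX_AGE_DIFF = 900
MIN_LS_INTERVAL = 5
INITIAL_SEQUENCE_NUMBER = -0x7FFFFFFF  # 0x80000001 as signed 32-bit


@dataclass(frozen=True)
class Instance:
    seq: int       # LS sequence number (signed 32-bit)
    checksum: int  # LS checksum (16-bit unsigned)
    age: int       # LS age in seconds


def more_recent(a: Instance, b: Instance) -> bool:
    """True if a is more recent than b (RFC 2328 Section 13.1)."""
    if a.seq != b.seq:
        return a.seq > b.seq
    if a.checksum != b.checksum:
        return a.checksum > b.checksum
    if (a.age == MAX_AGE) != (b.age == MAX_AGE):
        return a.age == MAX_AGE
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return a.age < b.age
    return False  # the two are considered identical


def reoriginations(duration: int) -> int:
    """Re-originations forced within `duration` seconds.

    Hypothetical model: a faulty router flushes every new instance at
    once, and only MinLSInterval throttles the originator.  The
    checksum is held constant for simplicity.
    """
    current = Instance(INITIAL_SEQUENCE_NUMBER, 0x1234, 0)
    count, t = 0, 0
    while t + MIN_LS_INTERVAL <= duration:
        flushed = replace(current, age=MAX_AGE)  # improper flush
        if not more_recent(flushed, current):
            break  # flushed copy would be ignored; loop ends
        # The originator receives its own LSA at MaxAge and, after
        # MinLSInterval, originates an instance one past its seq.
        t += MIN_LS_INTERVAL
        current = Instance(flushed.seq + 1, current.checksum, 0)
        count += 1
    return count


print(reoriginations(3600))  # 720
```

   Under these assumptions, one hour of improper flushing forces 720
   re-originations of a single LSA, each flooded within the LSA's
   flooding scope and each replacing the copy held in every LSDB.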

   As a MaxAge LSA may be flushed by any OSPF router, locating the
   misbehaving router during troubleshooting usually takes a long time,
   and during this time the LSA flushing may already have caused huge
   damage to both the network provider and its customers.

4.  Requirements on Potential Solutions

   Considering the importance of the OSPF protocol to networks and the
   services they carry, and the potentially severe impact of MaxAge LSA
   flooding, this document calls for solutions to protect against, or
   mitigate the impact of, improper MaxAge LSA flushing.

   The potential solutions can be classified into two categories, and
   the requirements for each are provided in the following sections.

4.1.  Solution for Problem Localization

   Since OSPF allows the flushing of non-self-originated LSAs, a
   mechanism to quickly identify the misbehaving router is needed for
   troubleshooting and problem localization.  If improper MaxAge LSA
   flushing is caused by a systematic problem, operators need to locate
   the misbehaving router and shut it down to stop the flooding storm.

   [RFC6232] defines the Purge Originator Identification (POI) TLV,
   which is added to IS-IS purge LSPs to identify the originator of an
   IS-IS purge.  Although a similar TLV could be added to the OSPF
   extended LSAs defined in [RFC7684] and
   [I-D.ietf-ospf-ospfv3-lsa-extend], the legacy OSPF LSAs defined in
   [RFC2328] are not TLV-based, so such a mechanism does not apply to
   them.  A problem localization solution that is backward compatible
   and applicable to all OSPF LSAs would be preferred.

4.2.  Solution for Impact Mitigation

   Since the flooding storm caused by improper LSA flushing can have a
   severe impact on network stability and the services provided by the
   network, it is important to alleviate such impact even before the
   root cause or the misbehaving router can be identified.  In
   addition, some problem localization mechanisms may rely on the
   availability of the network, so an impact mitigation mechanism is
   necessary to ensure that the problem localization mechanisms still
   work when a severe flooding storm caused by LSA flushing happens in
   the network.

   It is important that the impact mitigation solution be backward
   compatible and support incremental deployment.  Preferably, the
   mitigation solution should not delay the route convergence triggered
   by normal LSA flushing.

5.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

6.  Security Considerations

   This document describes the problem of MaxAge LSA flushing, which in
   some cases is due to the lack of integrity protection for the LS age
   field.  The LS age field may be altered as a result of a software or
   hardware problem, and such modification cannot be detected by the
   LSA checksum, nor by OSPF packet cryptographic authentication if it
   occurs before the LSA is assembled into an LSU packet.  LSA flushing
   could have a severe impact on network stability and the services
   provided by the network.  This may be considered a security
   vulnerability.

7.  Acknowledgements

   The authors would like to thank Bruno Decraene, Acee Lindem, and Les
   Ginsberg for their discussions on this topic.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998,
              <http://www.rfc-editor.org/info/rfc2328>.

8.2.  Informative References

   [I-D.ietf-ospf-ospfv3-lsa-extend]
              Lindem, A., Mirtorabi, S., Roy, A., and F. Baker, "OSPFv3
              LSA Extendibility", draft-ietf-ospf-ospfv3-lsa-extend-13
              (work in progress), October 2016.

   [RFC6232]  Wei, F., Qin, Y., Li, Z., Li, T., and J. Dong, "Purge
              Originator Identification TLV for IS-IS", RFC 6232,
              DOI 10.17487/RFC6232, May 2011,
              <http://www.rfc-editor.org/info/rfc6232>.

   [RFC7684]  Psenak, P., Gredler, H., Shakir, R., Henderickx, W.,
              Tantsura, J., and A. Lindem, "OSPFv2 Prefix/Link Attribute
              Advertisement", RFC 7684, DOI 10.17487/RFC7684, November
              2015, <http://www.rfc-editor.org/info/rfc7684>.

Authors' Addresses

   Jie Dong
   Huawei Technologies
   Huawei Campus, No.156 Beiqing Rd.
   Beijing  100095
   China

   Email: jie.dong@huawei.com


   Xudong Zhang
   Huawei Technologies
   Huawei Campus, No.156 Beiqing Rd.
   Beijing  100095
   China

   Email: zhangxudong@huawei.com


   Zhenqiang Li
   China Mobile
   No.32 Xuanwumenxi Ave., Xicheng District
   Beijing  100032
   China

   Email: li_zhenqiang@hotmail.com