Network Working Group                                            H. Chen
Internet-Draft                                                 Futurewei
Intended status: Standards Track                                 A. Wang
Expires: September 9, 2020                                 China Telecom
                                                                  L. Liu
                                                                 Fujitsu
                                                                  X. Liu
                                                          Volta Networks
                                                           March 8, 2020


                   PCE for Network High Availability
                   draft-chen-pce-ctr-availability-00

Abstract

   This document describes extensions to the Path Computation Element
   (PCE) Communication Protocol (PCEP) for improving the reliability or
   availability of a network controlled by a controller cluster.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 9, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.




Chen, et al.            Expires September 9, 2020               [Page 1]


Internet-Draft             PCE for Network HA                 March 2020


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminologies
   3.  PCE for Controller Cluster Reliability
     3.1.  Overview of Mechanism
     3.2.  Example
   4.  Extensions to PCEP
     4.1.  Capability
     4.2.  Controllers Object
   5.  Recovery Procedure
   6.  IANA Considerations
   7.  Security Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   More and more networks are controlled by central controllers or
   controller clusters.  Externally, a controller cluster appears as a
   single controller.  Internally, it normally consists of two or more
   controllers that work together to control a network, i.e., every
   network element (NE) in the network.  The reliability or
   availability of a network therefore depends heavily on its
   controller cluster: issues or failures in the cluster may greatly
   impact the reliability or availability of the network.

   For a controller cluster comprising two or more controllers (i.e.,
   primary controller, secondary controller, and so on), failures in
   the cluster may split it into several separate controller groups.
   These groups are unaware of one another and may be out of
   synchronization.  Two or more groups may then be elected as primary
   groups and control the network at the same time, which may lead to
   conflicting control of the network.

   This document proposes procedures and extensions to PCEP that allow
   the separated controllers or controller groups to learn about each
   other and thus correctly elect one new primary controller or
   controller group when the cluster is split because of failures in
   the cluster.

2.  Terminologies

   The following terms are used in this document.

   PCE:  Path Computation Element

   PCEP:  PCE Communication Protocol

   PCC:  Path Computation Client

   NE:  Network Element

   CE:  Customer Edge

   PE:  Provider Edge

3.  PCE for Controller Cluster Reliability

   This section outlines the mechanism for controller cluster
   reliability or availability using PCEP and illustrates some details
   through a simple example.

3.1.  Overview of Mechanism

   When a cluster of controllers is split into several separate groups
   because of failures in the cluster, the live controllers are still
   connected to the network (i.e., to network elements).  Through some
   of these connections, each group can obtain information about the
   other groups.  Based on this information, a new primary controller
   or controller group is correctly elected to control the network.

   Each controller has a PCEP session with each of a given number of
   NEs in the network; all controllers use the same NEs.  Each session
   is established and maintained over an IP path between the
   controller and the NE, and is a session of PCEP with the extensions
   described in this document.

   In one example configuration, the given number of NEs is one: the
   NE with the highest node ID.  Suppose that node PE2 is the NE with
   the highest ID.  The session between the primary controller (e.g.,
   A) and the NE (e.g., PE2) is a session of PCEP with extensions.
   Each of the non-primary controllers (e.g., B, C, ...) also creates
   and maintains a PCEP session with this NE (e.g., PE2).

   In normal operations, all controllers in the cluster are connected.
   They are the primary controller, which controls the network, the
   secondary controller, and so on, with current positions 1, 2, and
   so on respectively.  The primary controller advertises the
   information about the controllers via its PCEP sessions to the
   given number of NEs.

   For example, it sends the information in a PCEP message to the NE
   (e.g., PE2), which forwards the information to each of the other
   controllers via its PCEP sessions with them.

   When the cluster is split into several separate groups of
   controllers, each group elects an intent primary controller, an
   intent secondary controller, and so on from the group, which have
   intent positions 1, 2, and so on respectively.  The intent primary
   controller in each group advertises the information about the
   controllers in its group.

   The information advertised by the (intent) primary controller
   includes its current (intent) position, its old position, its
   priority to become a primary controller, the number of controllers
   in its group or cluster, and the IDs of the controllers, ordered
   according to their (intent) positions.  In addition, it includes a
   flag C indicating whether the controller is Controlling the
   network: C is 1 for the current primary controller and 0 for an
   intent primary controller whose group has not (yet) been elected as
   the primary group.
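
   The advertised information listed above can be pictured as a small
   record.  The following Python sketch is illustrative only: the class
   name, field names, controller IDs, and priority value are
   assumptions made for this example and are not defined by this
   document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControllerInfo:
    """Information advertised by a (intent) primary controller.

    Field names are hypothetical; they mirror the items listed in the text.
    """
    controlling: bool           # flag C: True if controlling the network
    position: int               # current (intent) position, 1 = primary
    old_position: int           # position before the cluster was split
    priority: int               # priority to become a primary controller
    controller_ids: List[int]   # IDs ordered by (intent) position

    @property
    def no_controllers(self) -> int:
        # Number of controllers in the group or cluster.
        return len(self.controller_ids)


# Normal operation in a two-controller cluster.
# The IDs and the priority value below are made up for illustration.
A_ID, B_ID = 0x0A000001, 0x0A000002
normal = ControllerInfo(controlling=True, position=1, old_position=1,
                        priority=100, controller_ids=[A_ID, B_ID])
```

   Deriving the number of controllers from the ID list keeps the two
   values consistent by construction.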

3.2.  Example

   Figure 1 shows a controller cluster comprising two controllers: the
   primary controller and the secondary controller.  Each controller has
   a PCEP session with the same NE, which is NE4.

      +---------------------------------------------------+
      | Controller Cluster                                |
      |                                                   |
      |    +------------+               +------------+    |
      |    |Controller A|  Synchronize  |Controller B|    |
      |    |(Primary)   +---------------+(Secondary) |    |
      |    +------------+               +-----------++    |
      |           ^                                 |     |
      |           |_______________                  |     |
      |                          |                  |     |
      |                          v                  |     |
      +-----------------Channels to Network---------|-----+
                            /       \               |
       PCEP session---->   /         \____          |
       between            /           \   \____     | <--PCEP session
       A and NEi         /\  .---. .---+       \    |    between
       (i=1,2,..)       |  \(     '    |'.---. |    |    B and NE4
                        |---\  Network |      '+.   |
                       (o NE1\         |       | ) /
                        (     |        |       o) /
                         (    |        |       ) NE4
                          (   o NE2    o NE3.-'
                           '               )
                            '---._.-.     )
                                     '---'

               Figure 1: Controller Cluster of 2 Controllers

   The primary PCE controller (i.e., A) has a PCEP session with each NE
   in the network, including NE4.  The secondary controller (i.e., B)
   has a PCEP session with the same NE4 in the network and the session
   is established and maintained over an IP path between B and NE4.

   In normal operations, controller A (Primary) sends NE4 a PCEP message
   containing the information about the controllers connected to it.
   NE4 transfers the information to controller B (Secondary).  The
   information includes:

   C = 1, A's current Position = 1, A's OldPosition = 1, A's Priority,
   NoControllers = 2, A's ID, B's ID

   When failures happen in the cluster, the live controllers act as
   follows:

   For the primary controller (e.g., A), if it is alive, it continues to
   be the primary controller.

   For the secondary controller (e.g., B), if it is alive and the
   primary controller is dead, it promotes itself to be the new
   primary controller; if the primary controller is alive but
   separated from the secondary controller, the secondary controller
   does not promote itself.

   With the extensions to PCEP, the secondary controller can determine
   the status of the primary controller based on the information
   received about the primary controller.  The situation in which the
   primary controller is alive but separated from the secondary
   controller consists of two conditions: (a) the connection between
   the primary controller and the secondary controller in the cluster
   has failed, and (b) both controllers are alive.  The secondary
   controller determines these conditions as follows:

   For condition a, when the heartbeat from the primary controller
   stops, the secondary controller knows that the connection between
   them has failed.

   For condition b, it checks whether the information about the
   primary controller has been updated within a given time.  If so,
   the primary controller is alive; otherwise, it is dead.
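
   The decision made by the secondary controller can be sketched as
   follows.  This is a minimal illustration, not a normative
   algorithm: the function name, the parameter names, and the use of a
   single "freshness window" to model "a given time" are assumptions
   of this example.

```python
import time
from typing import Optional


def secondary_should_promote(heartbeat_alive: bool,
                             last_info_update: Optional[float],
                             freshness_window: float,
                             now: Optional[float] = None) -> bool:
    """Return True if the secondary should promote itself to primary.

    heartbeat_alive:  True while the in-cluster heartbeat is received.
    last_info_update: time at which information about the primary was
                      last received via the NE, or None if never.
    freshness_window: hypothetical name for the "given time" in the text.
    """
    if heartbeat_alive:
        # The primary is reachable inside the cluster: nothing to do.
        return False
    if now is None:
        now = time.monotonic()
    # Condition b: was the information about the primary updated recently?
    primary_alive = (last_info_update is not None
                     and now - last_info_update <= freshness_window)
    # Promote only when the primary is considered dead.
    return not primary_alive
```

   With a window of 5 seconds, an update received 2 seconds ago keeps
   the secondary from promoting itself, while an update received 10
   seconds ago (or none at all) leads to promotion.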

4.  Extensions to PCEP

   This section describes extensions to PCEP.

4.1.  Capability

   During PCEP session establishment, PCEP speakers (PCE or PCC)
   advertise their support for PCEP extensions for network
   reliability, in particular the High Availability of Controller
   cluster (HAC).  A new Controller HA Support Capability TLV is
   defined for HAC below.  A PCEP speaker that supports HAC indicates
   this by including the TLV in the OPEN object of its OPEN message.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Type (TBD1)         |            Length (4)         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             Flags                           |C|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 2: Controller HA Support Capability TLV

   Type (16 bits):  TBD1 is to be assigned by IANA.

   Length (16 bits):  It indicates the length of the value portion of
      the TLV in octets, which is 4.

   Flags (32 bits):  One flag bit, the C-bit, is defined.  When it is
      set to one, it indicates that the PCEP speaker supports the high
      availability of controller cluster as a controller.  When it is
      set to zero, it indicates that the PCEP speaker supports the high
      availability of controller cluster as a network element (NE).
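
   As an illustration of the layout in Figure 2, the following Python
   sketch encodes and decodes the TLV in network byte order.  The TLV
   type value used here is a hypothetical placeholder, since TBD1 has
   not been assigned by IANA; the function names are likewise
   assumptions of this example.

```python
import struct

# Hypothetical placeholder: TBD1 has not been assigned by IANA.
HAC_CAPABILITY_TLV_TYPE = 0xFF01
C_BIT = 0x00000001  # rightmost bit of the 32-bit Flags field (Figure 2)


def encode_hac_capability(is_controller: bool) -> bytes:
    """Encode the Controller HA Support Capability TLV."""
    flags = C_BIT if is_controller else 0
    # Type (16 bits), Length (16 bits, always 4), Flags (32 bits).
    return struct.pack("!HHI", HAC_CAPABILITY_TLV_TYPE, 4, flags)


def decode_hac_capability(data: bytes) -> bool:
    """Return True if the sender is a controller, False if it is an NE."""
    tlv_type, length, flags = struct.unpack("!HHI", data[:8])
    if tlv_type != HAC_CAPABILITY_TLV_TYPE or length != 4:
        raise ValueError("not a Controller HA Support Capability TLV")
    return bool(flags & C_BIT)
```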

   When two PCEP speakers establish a PCEP session between them, each
   speaker that supports HAC includes a Controller HA Support
   Capability TLV in the OPEN object of its OPEN message.

   A PCEP speaker supporting HAC that receives the Controller HA
   Support Capability TLV in the OPEN message from the other PCEP
   speaker records that the other speaker (i.e., the remote end of the
   session) supports HAC; otherwise, it records that the other speaker
   does not.  Thus, for each of its PCEP sessions, it knows whether
   the remote PCEP speaker supports HAC.  If the C-bit in the received
   TLV is set to one, the remote PCEP speaker is a controller;
   otherwise, it is an NE.

   A PCE acting as a controller that supports HAC handles the
   information about the controllers in its cluster or group as
   follows:

   It sends the information in a PCEP message to each of a given set
   of NEs that run PCEP with HAC support whenever the information
   changes.  The given set of NEs may consist of the one NE with the
   highest ID.

   It adjusts the positions of the controllers accordingly whenever
   there is a change in the information about the controllers received
   from an NE supporting HAC.

   An NE running PCEP with HAC support receives the information about
   the controllers from a PCE acting as a controller that supports
   HAC, and sends the information to every other such PCE with which
   it has a PCEP session, i.e., to all of them except the one from
   which the information was received.
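
   The forwarding rule at the NE can be sketched as a simple filter
   over its PCEP sessions.  The session table layout, the peer names,
   and the function name below are assumptions made for this example;
   the capability entries stand for what the NE recorded from each
   peer's Controller HA Support Capability TLV.

```python
from typing import Dict, List


def relay_targets(sessions: Dict[str, dict], sender: str) -> List[str]:
    """Return the peers to which an NE forwards controller information.

    `sessions` maps a peer name to the capability recorded at session
    establishment: {"hac": bool, "controller": bool}.
    """
    return [
        peer for peer, cap in sessions.items()
        # Forward only to HAC-capable controllers other than the sender.
        if peer != sender and cap["hac"] and cap["controller"]
    ]


# Hypothetical session table at the NE.
sessions = {
    "A": {"hac": True, "controller": True},
    "B": {"hac": True, "controller": True},
    "X": {"hac": False, "controller": False},  # peer without HAC support
}
```

   With this table, information received from A is forwarded only to
   B, and information received from B is forwarded only to A.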

4.2.  Controllers Object

   A new object, called the Controllers Object, is defined to contain
   the information about controllers.  A controller in a cluster may
   advertise the information in a PCEP Report message containing a
   Controllers Object of the following format.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Object-Class |   OT  |Res|P|I|     Object Length (bytes)     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +                             TLVs                              +
    |                  (including Controllers TLV)                  |
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 3: Controllers Object

   Object-Class (8 bits):  It is to be assigned by IANA.  It identifies
      the PCEP object class.

   OT (4 bits):  It is to be assigned by IANA.  It identifies the PCEP
      object type.

   Res flags (2 bits):  Reserved field.  This field MUST be set to zero
      on transmission and MUST be ignored on receipt.

   P flag and I flag:  Refer to Section 7.2 of [RFC5440].

   Object Length (16 bits):  It specifies the total object length
      including the header, in bytes.

   TLVs:  This field includes one TLV, the Controllers TLV defined
      below.
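
   The object header in Figure 3 can be illustrated with the following
   Python sketch, which prepends the 4-byte header to an already
   encoded body.  The Object-Class and OT values are hypothetical
   placeholders, because neither has been assigned by IANA, and the
   function names are assumptions of this example.

```python
import struct

# Hypothetical placeholders: Object-Class and OT are not yet assigned.
CONTROLLERS_OBJECT_CLASS = 0xF0
CONTROLLERS_OBJECT_TYPE = 1


def encode_object(body: bytes, p_flag: bool = False,
                  i_flag: bool = False) -> bytes:
    """Prepend the 4-byte object header to `body` (the encoded TLVs)."""
    # Second byte: OT (4 bits) | Res (2 bits, zero) | P (1 bit) | I (1 bit).
    second = (CONTROLLERS_OBJECT_TYPE << 4) | (int(p_flag) << 1) | int(i_flag)
    length = 4 + len(body)  # total object length including the header
    return struct.pack("!BBH", CONTROLLERS_OBJECT_CLASS, second, length) + body


def decode_object_header(data: bytes) -> dict:
    """Split the first 4 bytes of an object into its header fields."""
    obj_class, second, length = struct.unpack("!BBH", data[:4])
    return {
        "class": obj_class,
        "type": second >> 4,
        "p": bool(second & 0x02),
        "i": bool(second & 0x01),
        "length": length,
    }
```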

   Under the Controllers Object, a new TLV, called the Controllers TLV,
   is defined to contain the information about controllers.  It has
   the following format.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Type (TBD2)         |             Length            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |   Flags     |C|    Position   |  OldPosition  |   Priority    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                 Reserved                      | NoControllers |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  Connected Controller 1 ID                    |
    :                              :                                |
    |                  Connected Controller n ID                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 4: Controllers TLV

   Type (16 bits):  TBD2 is to be assigned by IANA.

   Length (16 bits):  It indicates the length of the value portion in
      octets.

   Flags (8 bits):  One flag bit, the C-bit, is defined.  When set, it
      indicates that the controller advertising the TLV is the current
      active primary controller controlling the network.  In this
      case, C = 1 and Position = 1.

   Position (8 bits):  It indicates the current/intent position of the
      controller in the controller cluster or group.  1: primary (first)
      controller, 2: secondary controller, 3: third controller, and so
      on (i.e., Controller Position of value n: n-th controller in the
      cluster or group).

   OldPosition (8 bits):  It indicates the old position of the
      controller in the controller cluster before it is split.

   Priority (8 bits):  It indicates the priority of the controller to be
      elected as a primary controller.

   Reserved (24 bits):  Reserved field.  It MUST be set to zero on
      transmission and MUST be ignored on receipt.

   NoControllers (8 bits):  It indicates the number of controllers in
      the cluster or group of the controller advertising the TLV,
      including that controller (i.e., the number n of Connected
      Controller IDs that follow).

   Connected Controller i ID (32 bits):  It represents the identifier
      (ID) of the controller at position i (i = 1, ..., n) in the
      cluster or group.
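
   The layout in Figure 4 can be illustrated by the following Python
   sketch, which encodes and decodes the TLV in network byte order.
   The TLV type value is a hypothetical placeholder for TBD2, and the
   function names, controller IDs, and priority value are assumptions
   of this example.

```python
import struct
from typing import List, Tuple

CONTROLLERS_TLV_TYPE = 0xFF02  # hypothetical placeholder for TBD2
C_BIT = 0x01                   # rightmost bit of the 8-bit Flags field


def encode_controllers_tlv(c: bool, position: int, old_position: int,
                           priority: int, ids: List[int]) -> bytes:
    if len(ids) > 255:
        raise ValueError("NoControllers is an 8-bit field")
    # Flags, Position, OldPosition, Priority: one octet each.
    value = struct.pack("!BBBB", C_BIT if c else 0,
                        position, old_position, priority)
    # Reserved (24 bits, zero) followed by NoControllers (8 bits).
    value += struct.pack("!I", len(ids))
    # One 32-bit ID per controller, ordered by position.
    value += b"".join(struct.pack("!I", cid) for cid in ids)
    # Length covers the value portion only.
    return struct.pack("!HH", CONTROLLERS_TLV_TYPE, len(value)) + value


def decode_controllers_tlv(data: bytes) -> Tuple[bool, int, int, int, List[int]]:
    tlv_type, _length = struct.unpack("!HH", data[:4])
    if tlv_type != CONTROLLERS_TLV_TYPE:
        raise ValueError("not a Controllers TLV")
    flags, position, old_position, priority = struct.unpack("!BBBB", data[4:8])
    count = data[11]  # NoControllers is the last octet of the second word
    ids = list(struct.unpack(f"!{count}I", data[12:12 + 4 * count]))
    return bool(flags & C_BIT), position, old_position, priority, ids
```

   For a two-controller cluster the value portion is 16 octets: 8
   octets of fixed fields followed by two 4-octet controller IDs.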

5.  Recovery Procedure

   This section describes the recovery procedure for a controller
   cluster of n (n > 2) controllers, which are the primary controller A,
   the secondary controller B, ..., the n-th controller N.

   When failures happen in the cluster, it may be split into several
   separate groups of controllers.  Under one policy, the group with
   the maximum number of controllers is responsible for controlling
   the network as the primary group of the cluster, and the new
   primary controller, secondary controller, and so on are elected
   within it.

   For each separate group of controllers, the intent primary
   controller, secondary controller, and so on are elected.  The
   intent primary controller of the group advertises the information
   about its group.  The information includes its intent position, its
   old position, its priority to become a primary controller, the
   number of controllers in the group, and the identifiers of the
   controllers in the group.  The identifiers of the controllers are
   ordered according to their positions: the identifier of the intent
   primary controller, which has position 1, is the first one; the
   identifier of the intent secondary controller, which has position
   2, is the second one; and so on.  Thus, every separate group has
   the information about the other groups and can determine which
   group has the maximum number of controllers.

   In the case of a tie (i.e., two or more groups have the same
   maximum number of controllers), one policy is that the group
   containing the controller with the highest old position (i.e., the
   smallest OldPosition value, e.g., the old primary controller) wins.
   Another policy is that the group with the highest priority
   controller wins.
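
   The election policies described above can be sketched as follows.
   This is an illustration under stated assumptions, not a normative
   algorithm: the type and function names are hypothetical, the old
   position compared is the one advertised by each group's intent
   primary controller, and a larger Priority value is assumed to mean
   a higher priority (this document does not define the ordering).

```python
from typing import List, NamedTuple


class GroupAdvert(NamedTuple):
    leader_id: str       # ID of the group's intent primary controller
    old_position: int    # its OldPosition (1 = old primary)
    priority: int        # its Priority
    no_controllers: int  # number of controllers in its group


def elect_primary_group(adverts: List[GroupAdvert],
                        tie_policy: str = "old_position") -> GroupAdvert:
    """Pick the primary group from the advertisements of all groups."""
    # The group with the maximum number of controllers wins.
    largest = max(a.no_controllers for a in adverts)
    candidates = [a for a in adverts if a.no_controllers == largest]
    if tie_policy == "old_position":
        # Highest old position means the smallest OldPosition value.
        return min(candidates, key=lambda a: a.old_position)
    if tie_policy == "priority":
        # Assumption: a larger Priority value means a higher priority.
        return max(candidates, key=lambda a: a.priority)
    raise ValueError(f"unknown tie policy: {tie_policy}")
```

   Because every group receives the same set of advertisements and
   applies the same policy, each group reaches the same result
   independently.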

   Some details of the recovery procedure at the current or intent
   primary controller in a controller cluster or group are as follows.

   In normal operations, the primary controller advertises the
   information about controllers containing:

   C = 1, Position = 1, OldPosition = 1, Primary Controller's priority,
   NoControllers = n, Primary Controller's ID, secondary controller's
   ID, ..., and n-th Controller's ID.

   When failures cause the cluster to split, the intent primary
   controller of a group advertises the information about controllers
   containing:

   C = 0, Position = 1, OldPosition = its old position (1 if it was
   the primary controller), Intent Primary Controller's priority,
   NoControllers = m (m is the number of controllers in the group to
   which the intent primary controller belongs after the failures),
   Intent Primary Controller's ID, IDs of the other controllers
   connected.

   Then, after a given time, it checks whether its group has been
   elected as the primary group.  If so, it advertises the information
   about controllers containing:

   C = 1, Position = 1, OldPosition = its old position, its Priority,
   NoControllers = m, the IDs of the controllers in the group.
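
   The change between the advertisement sent right after the split and
   the one sent after election can be sketched as follows.  The
   function name, the dictionary keys, and the controller names and
   priority value in the usage note are assumptions made for this
   example.

```python
from typing import Dict, List


def build_advert(my_id: str, old_position: int, priority: int,
                 group_ids: List[str], elected: bool) -> Dict[str, object]:
    """Build the information an intent primary controller advertises.

    `group_ids` is assumed to be ordered by intent position already.
    """
    # The advertising controller is listed first, at position 1.
    ordered = [my_id] + [cid for cid in group_ids if cid != my_id]
    return {
        "C": 1 if elected else 0,   # set only once the group is elected
        "Position": 1,
        "OldPosition": old_position,
        "Priority": priority,
        "NoControllers": len(ordered),
        "IDs": ordered,
    }
```

   For instance, build_advert("A", 1, 10, ["A", "C"], elected=False)
   yields C = 0 with NoControllers = 2, and the same call with
   elected=True differs only in C = 1.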

   As an example, suppose that failures split the cluster into two
   separate groups: group 1 comprising A and C, and group 2 comprising
   B and N.  Each group elects its intent primary controller,
   secondary controller, and so on.  Suppose that controllers A and C
   are elected as the intent primary and secondary controllers
   respectively in group 1, and controllers B and N are elected as the
   intent primary and secondary controllers respectively in group 2.

   Each of the intent primary controllers A and B advertises the
   information about the controllers in its group.  The information
   advertised by A includes:

   C = 0, Position = 1, OldPosition = 1, A's Priority, NoControllers =
   2, A's ID, C's ID.

   The information advertised by B includes:

   C = 0, Position = 1, OldPosition = 2, B's Priority, NoControllers =
   2, B's ID, N's ID.

   Groups 1 and 2 have the same number of controllers, which is 2.
   However, the old position advertised for group 1 (OldPosition = 1)
   is higher than that for group 2 (OldPosition = 2).  Group 1 is
   therefore elected as the primary group, and the intent primary
   controller A in the primary group becomes the current primary
   controller.  After this determination, the information about the
   controllers in group 1 (i.e., the primary group) changes.  The
   updated information advertised by A includes:

   C = 1, Position = 1, OldPosition = 1, A's Priority, NoControllers =
   2, A's ID, C's ID.

6.  IANA Considerations

   TBD

7.  Security Considerations

   TBD

8.  Acknowledgements

   TBD

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC5440]  Vasseur, JP., Ed. and JL. Le Roux, Ed., "Path Computation
              Element (PCE) Communication Protocol (PCEP)", RFC 5440,
              DOI 10.17487/RFC5440, March 2009,
              <https://www.rfc-editor.org/info/rfc5440>.

9.2.  Informative References

   [RFC8231]  Crabbe, E., Minei, I., Medved, J., and R. Varga, "Path
              Computation Element Communication Protocol (PCEP)
              Extensions for Stateful PCE", RFC 8231,
              DOI 10.17487/RFC8231, September 2017,
              <https://www.rfc-editor.org/info/rfc8231>.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018, <https://www.rfc-editor.org/info/rfc8402>.

Authors' Addresses

   Huaimo Chen
   Futurewei
   Boston, MA
   USA

   Email: Huaimo.chen@futurewei.com


   Aijun Wang
   China Telecom
   Beiqijia Town, Changping District
   Beijing  102209
   China

   Email: wangaj3@chinatelecom.cn


   Lei Liu
   Fujitsu
   USA

   Email: liulei.kddi@gmail.com


   Xufeng Liu
   Volta Networks
   McLean, VA
   USA

   Email: xufeng.liu.ietf@gmail.com