Policy Guided Prefixes with Routing In Fat Trees
draft-atlas-rift-pgp-01

Document Type Active Internet-Draft (individual)
Last updated 2019-04-25
Stream (None)
Intended RFC status (None)
Formats plain text pdf html bibtex
Stream Stream state (No stream defined)
Consensus Boilerplate Unknown
RFC Editor Note (None)
IESG IESG state I-D Exists
Telechat date
Responsible AD (None)
Send notices to (None)
RIFT                                                            A. Atlas
Internet-Draft                                                Individual
Intended status: Standards Track                                Z. Zhang
Expires: October 27, 2019                               Juniper Networks
                                                          April 25, 2019

            Policy Guided Prefixes with Routing In Fat Trees
                        draft-atlas-rift-pgp-01

Abstract

   In a fat tree, it can be sometimes desirable to guide traffic to
   particular destinations or keep specific flows to certain paths.  In
   RIFT, this traffic steering/engineering is done by using policy-
   guided prefixes with their associated communities.  Routes based on
   policy-guided prefixes are preferred over regular routes.  Any node
   can originate a policy-guided prefix and advertise it in both north
   and south directions, and the calculation in both directions are
   distance vector based.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 27, 2019.

Atlas & Zhang           Expires October 27, 2019                [Page 1]
Internet-Draft                  rift-pgp                      April 2019

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Specification . . . . . . . . . . . . . . . . . . . . . . . .   3
     2.1.  Ingress Filtering . . . . . . . . . . . . . . . . . . . .   4
     2.2.  Applying Policy . . . . . . . . . . . . . . . . . . . . .   4
     2.3.  Store Policy-Guided Prefix for Route Computation and
           Regeneration  . . . . . . . . . . . . . . . . . . . . . .   5
     2.4.  Re-origination  . . . . . . . . . . . . . . . . . . . . .   6
     2.5.  Reachability Computation with PGP Consideration . . . . .   6
   3.  Security Considerations . . . . . . . . . . . . . . . . . . .   7
   4.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   7
   5.  Normative References  . . . . . . . . . . . . . . . . . . . .   7
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   8

1.  Introduction

   In a fat tree, it can be sometimes desirable to guide traffic to
   particular destinations or keep specific flows to certain paths.  In
   RIFT, this is done by using policy-guided prefixes with their
   associated communities.  Each community is an abstract value whose
   meaning is determined by configuration.  It is assumed that the
   fabric is under a single administrative control so that the meaning
   and intent of the communities is understood by all the nodes in the
   fabric.  Any node can originate a policy-guided prefix.

   Since RIFT uses distance vector concepts in a southbound direction,
   it is straightforward to add a policy-guided prefix to an S-TIE.  For
   easier troubleshooting, the approach taken in RIFT is that a node's
   southbound policy-guided prefixes are sent in its S-TIE and the
   receiver does inbound filtering based on the associated communities
   (an egress policy is imaginable but would lead to different S-TIEs
   per adjacency possibly which is not considered in RIFT protocol

Atlas & Zhang           Expires October 27, 2019                [Page 2]
Internet-Draft                  rift-pgp                      April 2019

   procedures).  A southbound policy-guided prefix can only use links in
   the south direction.  If an PGP S-TIE is received on an East-West or
   northbound link, it must be discarded by ingress filtering.

   Conceptually, a southbound policy-guided prefix guides traffic from
   the leaves up to at most the north-most level.  It is also necessary
   to to have northbound policy-guided prefixes to guide traffic from
   the north-most level down to the appropriate leaves.  Therefore, RIFT
   includes northbound policy-guided prefixes in its N PGP-TIE and the
   receiver does inbound filtering based on the associated communities.
   A northbound policy-guided prefix can only use links in the northern
   direction.  If an N PGP TIE is received on an East-West or southbound
   link, it must be discarded by ingress filtering.

   By separating southbound and northbound policy-guided prefixes and
   requiring that the cost associated with a PGP is strictly
   monotonically increasing at each hop, the path cannot loop.  Because
   the costs are strictly increasing, it is not possible to have a loop
   between a northbound PGP and a southbound PGP.  If East-West links
   were to be allowed, then looping could occur and issues such as
   counting to infinity would become an issue to be solved (if complete
   generality of path - such as including East-West links and using both
   north and south links in arbitrary sequence - then a Path Vector
   protocol or a similar solution must be considered).

   Besides the usage for traffic engineering, PGPs can also be used to
   ensure nodes are administratively reachable for debugging purpose
   after certain failures.  For example, a node looses all its
   northbound adjacencies but is not at the top of the fabric.  If it
   detects that some other members at its level are advertising
   northbound adjacencies MAY inject its loopback address into
   southbound PGP TIE and become reachable "from the south" that way.
   Further, a solution may be implemented where based on e.g. a "well
   known" community such a southbound PGP is reflected at level 0 and
   advertised as northbound PGP again to allow for "reachability from
   the north" at the cost of additional flooding.

2.  Specification

   PGPs are advertised in PGPrefixTIEs included in PGP N/S-TIEs.  S-PGPs
   are propagated in south direction only and N-PGPs follow northern
   direction strictly.  THRIFT schema in the base RIFT specification
   needs to be updated.  For example:

   o  TIEElement needs to add "7: optional PGPrefixElement
      pog_prefixes;"

Atlas & Zhang           Expires October 27, 2019                [Page 3]
Internet-Draft                  rift-pgp                      April 2019

   o  "struct PGPrefixElement" needs to be defined.  Should
      PrefixAttributes be used for PGPrefixElement (do all defined
      fields in PrefixAttributes apply to PGPrefixElement)?

   o  "struct Community" needs to be referenced in PGPrefixElement

   Future revisions of this document and the base RIFT specification
   will coordinate the THRIFT schema.

2.1.  Ingress Filtering

   The set of policy-guided prefixes received in a TIE is subject to
   ingress filtering and then re-originated to be sent out in the
   receiver's appropriate TIE.  Both the ingress filtering and the re-
   origination use the communities associated with the policy-guided
   prefixes to determine the correct behavior.  The cost on re-
   advertisement MUST increase in a strictly monotonic fashion.

   When a node X receives a PGP S-TIE or a PGP N-TIE that is originated
   from a node Y which does not have an adjacency with X, all PGPs in
   such a TIE MUST be filtered.  Similarly, if node Y is at the same
   level as node X, then X MUST filter out PGPs in such S- and N-TIEs to
   prevent loops.

   Next, policy can be applied to determine which policy-guided prefixes
   to accept.  Since ingress filtering is chosen rather than egress
   filtering and per-neighbor PGPs, policy that applies to links is done
   at the receiver.  Because the RIFT adjacency is between nodes and
   there may be parallel links between the two nodes, the policy-guided
   prefix is considered to start with the next-hop set that has all
   links to the originating node Y.

   A policy-guided prefix has or is assigned the following attributes:

   cost:   This is initialized to the cost received

   community_list:   This is initialized to the list of the communities
      received.

   next_hop_set:   This is initialized to the set of links to the
      originating node Y.

2.2.  Applying Policy

   The specific action to apply based upon a community is deployment
   specific.  Here are some examples of things that can be done with
   communities.  The length of a community is a 64 bits number and it
   can be written as a single field M or as a multi-field (S = M[0-31],

Atlas & Zhang           Expires October 27, 2019                [Page 4]
Internet-Draft                  rift-pgp                      April 2019

   T = M[32-63]) in these examples.  For simplicity, the policy-guided
   prefix is referred to as P, the processing node as X and the
   originator as Y.

   Prune Next-Hops: Community Required:   For each next-hop in
      P.next_hop_set, if the next-hop does not have the community, prune
      that next-hop from P.next_hop_set.

   Prune Next-Hops: Avoid Community:   For each next-hop in
      P.next_hop_set, if the next-hop has the community, prune that
      next-hop from P.next_hop_set.

   Drop if Community:   If node X has community M, discard P.

   Drop if not Community:   If node X does not have the community M,
      discard P.

   Prune to ifIndex T:   For each next-hop in P.next_hop_set, if the
      next-hop's ifIndex is not the value T specified in the community
      (S,T), then prune that next-hop from P.next_hop_set.

   Add Cost T:   For each appearance of community S in P.community_list,
      if the node X has community S, then add T to P.cost.

   Accumulate Min-BW T:   Let bw be the sum of the bandwidth for
      P.next_hop_set.  If that sum is less than T, then replace (S,T)
      with (S, bw).

   Add Community T if Node matches S:   If the node X has community S,
      then add community T to P.community_list.

2.3.  Store Policy-Guided Prefix for Route Computation and Regeneration

   Once a policy-guided prefix has completed ingress filtering and
   policy, it is almost ready to store and use.  It is still necessary
   to adjust the cost of the prefix to account for the link from the
   computing node X to the originating neighbor node Y.

   There are three different policies that can be used:

   Minimum Equal-Cost:   Find the lowest cost C next-hops in
      P.next_hop_set and prune to those.  Add C to P.cost.

   Minimum Unequal-Cost:   Find the lowest cost C next-hop in
      P.next_hop_set.  Add C to P.cost.

   Maximum Unequal-Cost:   Find the highest cost C next-hop in
      P.next_hop_set.  Add C to P.cost.

Atlas & Zhang           Expires October 27, 2019                [Page 5]
Internet-Draft                  rift-pgp                      April 2019

   The default policy is Minimum Unequal-Cost but well-known communities
   can be defined to get the other behaviors.

   Regardless of the policy used, a node MUST store a PGP cost that is
   at least 1 greater than the PGP cost received.  This enforces the
   strictly monotonically increasing condition that avoids loops.

   Two databases of PGPs - from N-TIEs and from S-TIEs are stored.  When
   a PGP is inserted into the appropriate database, the usual tie-
   breaking on cost is performed.  Observe that the node retains all PGP
   TIEs due to normal flooding behavior and hence loss of the best
   prefix will lead to re-evaluation of TIEs present and re-
   advertisement of a new best PGP.

2.4.  Re-origination

   A node must re-originate policy-guided prefixes and retransmit them.
   The node has its database of southbound policy-guided prefixes to
   send in its S-TIE and its database of northbound policy-guided
   prefixes to send in its N-TIE.

   Of course, a leaf does not need to re-originate southbound policy-
   guided prefixes.

2.5.  Reachability Computation with PGP Consideration

   During reachability computation, after prefixes are attached as
   specified in section 5.2.6 "Attaching Prefixes" of the RIFT base
   specification, PGPs are considered.

   Each policy-guided prefix P has its cost and next_hop_set already
   stored in the associated database, as specified in Section 2.3; the
   cost stored for the PGP is already updated to considering the cost of
   the link to the advertising neighbor.  By definition, a policy-guided
   prefix is preferred to a regular prefix.

Atlas & Zhang           Expires October 27, 2019                [Page 6]
Internet-Draft                  rift-pgp                      April 2019

    for each policy-guided prefix P:
      if P not in route_database:
         add (P, type=PolicyGuided, P.cost, next_hop_set)
         end if
      if P in route_database :
          if (route_database[P].type is not PolicyGuided) or
             (route_database[P].cost > P.cost):
            update route_database[P] with (P, PolicyGuided, P.cost, next_hop_set)
          else if route_database[P].cost == P.cost
            update route_database[P] with (P, PolicyGuided, P.cost,
               merge(next_hop_set, route_database[P].next_hop_set))
          else
            // Not preferred route so ignore
            end if
          end if
      end for

            Figure 1: Adding Routes from Policy-Guided Prefixes

   Notice that a policy-guided prefix is always preferred to a regular
   prefix, even if the policy-guided prefix has a larger cost.

   PGPs may overlap with prefixes introduced by automatic de-
   aggregation.  The topic is under further discussion.  The break in
   connectivity that leads to infeasibility of a PGP is mirrored in
   adjacency tear-down and according removal of such PGPs.
   Nevertheless, the underlying link-state flooding will be likely
   reacting significantly faster than a hop-by-hop redistribution and
   with that the preference for PGPs may cause intermittent black-holes.

3.  Security Considerations

   To be provided.

4.  Acknowledgements

5.  Normative References

   [I-D.ietf-rift-rift]
              Team, T., "RIFT: Routing in Fat Trees", draft-ietf-rift-
              rift-05 (work in progress), April 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

Atlas & Zhang           Expires October 27, 2019                [Page 7]
Internet-Draft                  rift-pgp                      April 2019

Authors' Addresses

   Alia Atlas
   Individual

   EMail: akatlas@gmail.com

   Zhaohui Zhang
   Juniper Networks

   EMail: zzhang@juniper.net

Atlas & Zhang           Expires October 27, 2019                [Page 8]