Skip to main content

TRILL: Parent node Shifts in Tree Construction, Mitigation.
draft-rp-trill-parent-selection-02

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Replaced".
Author R. Parameswaran
Last updated 2017-02-23
Replaced by draft-ietf-trill-parent-selection, draft-ietf-trill-parent-selection
RFC stream (None)
Formats
Stream Stream state (No stream defined)
Consensus boilerplate Unknown
RFC Editor Note (None)
IESG IESG state I-D Exists
Telechat date (None)
Responsible AD (None)
Send notices to (None)
draft-rp-trill-parent-selection-02
Trill WG                                                R. Parameswaran,
INTERNET-DRAFT                              Brocade Communications, Inc.
Intended Status:Experimental                          February 23, 2017
Expires: August 27, 2017

         TRILL: Parent node Shifts in Tree Construction, Mitigation.
           <draft-rp-trill-parent-selection-02.txt>
Abstract 

This draft documents a known problem in the Trill tree construction 
mechanism and offers two different approaches to solve the problem.

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the authors or the TRILL working group mailing list:
   trill@ietf.org.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Terminology and Acronyms.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

Table of Contents

        1. Introduction...............................................1
        2. Tree construction in Trill.................................2
        3. Issues with the Trill tree construction algorithm..........2
        4. Alternative A - Using the Affinity sub-TLV.................4
        5. Alternative B - Variant of the Djikstra SPF algorithm......7
            5.1 Choice of the policy function........................11
        6. Network wide selection of computation algorithm...........13
        7. Relationship to draft-ietf-trill-resilient-trees..........14
        8. Security Considerations...................................16
        9. IANA Considerations.......................................16
       10. Informative References....................................16

1. Introduction.

Trill is a data center technology that uses link-state routing 
mechanisms in a layer 2 setting, and serves as a replacement 
for spanning-tree.  Trill uses trees rooted at pre-determined nodes 
as a way to distribute multi-destination traffic. Multi-destination 
traffic includes traffic such as layer-2 broadcast frames, unknown 
unicast flood frames, and layer 2 traffic with multicast MAC 
addresses (collectively referred to as BUM traffic). Multi-destination 
traffic is typically hashed onto one of the available trees and sent 
over the tree, potentially reaching all nodes in the network (hosts 
behind which may own/need the packet in question).

2. Tree construction in Trill.

Tree construction in Trill is defined by [RFC6325], with additional
corrections defined in [RFC7780].

The tree construction mechanism used in Trill codifies 
certain tree construction steps which make the resultant trees
very brittle. Specifically, the parent selection mechanism in Trill
causes problems in case of node failures. Trill uses the following rule
- when constructing an SPF tree, if there are multiple possible
parents for a given node (i.e. if multiple upstream nodes can
potentially pull in a given node during SPF, all at the same 
cumulative cost, then the parent selection is imposed in the
following manner):

[RFC6325]:
"When building the tree number j, remember all possible
equal cost parents for node N.  After calculating the entire 'tree'
(actually, directed graph), for each node N, if N has 'p' parents,
then order the parents in ascending order according to the
7-octet IS-IS ID considered as an unsigned integer, and number them
starting at zero. For tree j, choose N's parent as choice j mod p."

There is an additional correction posted to this in [RFC7780]:

[RFC7780], Section 3.4:

   "Section 4.5.1 of [RFC6325] specifies that, when building 
   distribution tree number j, node (RBridge) N that has multiple 
   possible parents in the tree is attached to possible parent 
   number j mod p.  Trees are numbered starting with 1, but possible 
   parents are numbered starting with 0.  As a result, if there are 
   two trees and two possible parents, then in tree 1 parent 1 will 
   be selected, and in tree 2 parent 0 will be selected.

   This is changed so that the selected parent MUST be (j-1) mod p.  As
   a result, in the case above, tree 1 will select parent 0, and tree 2
   will select parent 1.  This change is not backward compatible with
   [RFC6325].  If all RBridges in a campus do not determine distribution
   trees in the same way, then for most topologies, the RPFC will drop
   many multi-destination packets before they have been properly
   delivered."

3. Issues with the Trill tree construction algorithm.

With this tree construction mechanism in mind,let's look at
the Spine-Leaf topology presented below and consider the
calculation of Tree number 2 in Trill.  Assume all the links in the tree 
are at the same cost.

    A--   --B
   / \ \/   /\
  /   \/\ _/_ \
 /__ _/\  /   \\
//      \/     \\
1        2       3 
 \       |      /
  \      |     /
   \     |    /
    \    |   /
     \   |  /
      \  | /
       \ |/
         C

Assume that in the above topology, when ordered by 7-octet ISIS-id,
1 < 2 < 3 holds and that the root for Tree number 2 is A. Given the
ordered set {1, 2, 3} , these nodes have the following indices (with a
starting index of 0):

Node    Index
 1       0
 2       1
 3       2

Given the SPF constraint and that the tree root is A,  the parent for
nodes 1,2, and 3 will be A. However, when the SPF algorithm tries to
pull B or C into the tree, we have a choice of parents, namely 1, 2,
or 3. 

Given that this is tree 2, the parent will be the one with index
(2-1) mod 3 (which is equal to 1). Hence the parent for node B will be
node 2.
                A
               /|\
              / | \
             /  |  \
            1   2   3 
                /\
               /  \
              B    C

However, due to Trill's parent selection algorithm, the sub-tree
rooted at Node 2 will be impacted even if Node 1 or Node 3
go down.

Take the case where Node 1 goes down. Tree 2 must now be
re-computed (this is normal) - but now, when the SPF computation is
underway, when the SPF process tries to pull in B, the list of
potential parents for B now are {2  and  3}. So, after ordering these
by ISIS-Id as {2, 3} (where 2 is considered to be at index of 0 and 3
is considered to be at index 1), for tree 1, we apply Trill's formula
of:

Parent's index = (TreeNumber-1) mod Number_of_parents.
= (2-1) mod 2
= 1 mod 2
= 1 (which is the index of  Node 3) 

The re-calculated tree now looks as shown below. The shift in
parent nodes (for B) may cause disruption to live traffic in the
network, and is unnecessary in absolute terms because the existing
parent for node B, node 2, was not perturbed in any way. 

                A
               / \
              /   \
             /     \
            2       3 
                    /\
                   /  \
                  B    C
                                    

Aside from the disruption posed by the change in the tree links,
depending upon how the concerned rbridges stripe vlans/FGLs across
trees and how they may prune these, additional disruption is possible
if the forwarding state on the new parent rbridge is not primed to 
match the new tree structure. This churn could simply be avoided 
with a better algorithm/approach.

The parent shift issue noted above can be solved in at least two
different ways:

a) by means of the affinity sub-TLV. 

b) by means of a network wide policy based parent selection
mechanism.

While the techniques identified in this draft have an immediate 
benefit when applied to spine/leaf networks popular in data-center 
designs, nothing in either of the approaches outlined below assumes a 
spine-leaf network. The techniques outlined below will work on any 
connected graph.  Furthermore, no symmetry in link-cost is assumed.

4. Alternative A - Using the Affinity sub-TLV.

At a high level, this problem can be solved by having the affected 
parent send out an affinity sub-TLV identifying the children for 
which it wants to preserve the parent-child relationship, subject to 
network events which may change the structure of the tree. The 
preferred parent node would send out an affinity sub-TLV with
multiple affinity records, one per child node, listing the 
concerned tree number.

It would be sufficient to have a local configuration option (e.g. 
a CLI) at one of the nodes which is deemed to be the preferred parent. 
In such case, the following steps may provide a way to implement this 
proposal:

  a. The operator locally configures the preferred parent to indicate 
     its stickiness in tree construction for a specific tree number 
     and tree root via the affinity sub-TLV. This can be done before
     tree construction if the operator consults the 7 octet ISIS-ID
     relative ordering of the concerned nodes and infers upfront which
     of the potential parent nodes would become the parent node for a
     given set of children on that tree number under the Trill tree
     calculation mechanism. The operator assumes the responsibility
     of configuring the preferred parent stickiness on only one node
     amongst a set of sibling nodes relative to the tree root for 
     that tree number.
     
  b. The very first tree calculation uses the default Trill specified
     parent selection rules. The configured node advertises its parent 
     preference via the affinity sub-TLV when it completes a normal 
     Trill specified tree calculation, and finds itself the parent of 
     multiple child nodes per the calculation. The affinity sub-TLV 
     must reflect the appropriate tree number and the child nodes for 
     which the concerned node is a parent node. The affinity sub-TLV
     should be published when the tree is deemed to have converged
     and it may be necessary to start a timer when triggering the 
     tree calculation, to track the convergence. Alternately, the
     publication of the affinity sub-TLV may be triggered by the
     download of the routes to the (L2) RIB, which typically happens
     after successful computation of the entire tree (however, see 
     vii. below).

  c. When any change event happens in the network, one which forces a
     tree re-calculation for the concerned tree,  the configured node
     should run through the normal Trill tree calculation agnostic of 
     the fact that it has published an affinity sub-TLV i.e the 
     node's own affinity sub-TLV should not be directly used in 
     establishing parent relationships.  During the tree
     calculation, if the node turns out to be on the list of potential
     parents for some or all of the child nodes for which it
     published the affinity sub-TLV (disregarding its possible 
     exclusion as parent due to 7 octet ISIS ID ordering logic), or
     for any other child nodes for which it was not previously a 
     parent node, then it should react in the following manner:

     i. If the node is still a potential parent for some of the 
        children identified in the existing affinity sub-TLV, it should 
        start a timer upon whose expiration, the node would intend to 
        send out an updated affinity sub-TLV identifying the correct 
        sub-set of children for which the node aspires to continue the 
        parent relationship. This case would also apply if there are
        new child nodes for which the node is now a parent (however,
        see the conflicted affinity sub-TLV rules below, and in vii
        further below).

        For its own tree computation, the node should use itself as 
        parent in order to pull the set of children identified during 
        the SPF run into the tree unless another affinity sub-TLV is
        seen for the same tree number with an overlap in the set of
        child nodes such that the other affinity sub-TLV would win
        according to the conflict resolution rules in section 5.3
        of [RFC7783]. In such case, this node should not publish
        an affinity sub-TLV (and retract it if it is already 
        published), and honor the other node's affinity sub-TLV
        instead.

    ii. if the tree structure changes such that it is no longer a
        potential parent for any of the child nodes in the advertised
        affinity sub-TLV, then it must start a timer upon whose 
        expiration, it would intend to retract the affinity sub-TLV. 
        In this case, the default Trill tie-break rule would need to 
        be used during SPF construction for the vertices that were 
        children of this node previously.

   iii. If there are any additional tree computations while the timer is
        running, the timer should be re-started/extended in order to 
        absorb the interim network events. It is possible that the 
        intended action at the expiration of the timer may change
        meanwhile. The timer needs to be large enough to absorb
        multiple network events that may happen due to a change in the
        physical state of the network, and yet short enough to
        avoid delaying the update of the affinity sub-TLV.

    iv. At the expiration of the timer, the existing state of the tree 
        should be compared with the existing affinity sub-TLV and the 
        intended change in the status of the affinity sub-TLV is 
        carried out e.g. an update to the list of children, or a 
        retraction.

     v. Alternately, the above steps (re-examination of the affinity 
        sub-TLV and update) may be tied to/triggered from the download 
        of the tree routes to the RIB, since that typically happens 
        upon a successful computation of the complete tree. An 
        additional stabilization timer could be used to counteract
        back-to-back computations of the tree due to a burst of 
        network events.

    vi. Note that this approach may cause an additional tree computation 
        at remote nodes once the updated affinity sub-TLV (or lack of 
        it) is received/perceived, beyond the network events which led
        up to the change in the tree. 

   vii. The preferred parent-node should not originate an affinity
        sub-TLV if it sees an affinity sub-TLV from some other node
        for the same tree number and for some or all of the same
        child-nodes, but only if the other node's affinity sub-TLV would 
        win using the conflict tie-break rules in section 5.3 of 
        [RFC7783]. Any existing affinity sub-TLV already published by 
        this node in such a situation should be retracted.

  d. Remote nodes should compute their trees honoring the latest 
     affinity sub-TLV or the lack thereof, sent from this node.

  e. Situations where the node advertising the Affinity sub-TLV dies
     or restarts should be handled using the normal handling for such
     scenarios relating to the parent Router Capability TLV, and as 
     specified in [RFC4971].

  f. Conflicts in the Affinity sub-TLV from different originators for 
     the same tree number and child node should be handled as 
     specified in section 5.3 of [RFC7783].

  g. Situations where a link constantly flaps, should be handled
     by having the preferred parent node retract the affinity 
     sub-TLV, if it affects the parent child relationships in 
     consideration. The long-term state of the affinity sub-TLV can 
     be monitored by the preferred parent node to see if it is being
     published and retracted repeatedly in multiple iterations or
     if a specific set of children are being constantly added and
     removed. The preferred parent may resume publication of the
     affinity sub-TLV once it perceives the network to be stable
     again in the future.

If the node is forced to retract its affinity sub-TLV due to a change, 
in the tree structure, it can then repeat these steps in a subsequent 
tree construction, if the same node becomes a parent again, so long 
as it perceives the network to be stable (free of link/node flaps). 
In terms of nodes that do not support this algorithm, they are 
expected to seamlessly inter-operate with this scheme, so long as 
they understand and honor the affinity sub-TLV. The draft assumes 
that most implementations now support the affinity sub-TLV.

5. Alternative B - Using a variant of the Djikstra SPF algorithm.

While alternative A may be preferable, there may be another way in
which the same goal can be achieved - the classic SPF algorithm may be
modified to address the above problem. However, this approach should
be regarded as experimental, and is not advocated for use in a 
production environment until further evaluation is possible. The 
approach requires all nodes in the network to use the same modified 
SPF algorithm and same tie-break policy, and may work best in small 
sized networks.

When pulling vertices from TENT to PATH, if a given vertex can
can be pulled in by more than one parent at the same optimal cost,
the modified SPF algorithm presented below allows a policy filter to
be placed during the parent selection process, which can identify a
preferred parent for the vertex being pulled in.

Initially, tree computation starts off with the Trill parent 
selection rules. For subsequent tree computations, the policy function 
references stable snap-shots of the prior tree computation and tries to 
match existing parent-child relationships on a best-effort basis.

The algorithm models the SPF algorithm in traditional terms,
referencing two sets of vertices, PATH, and TENT.  PATH contains 
vertices known to be reachable at optimal cost from the
root vertex v. TENT contains vertices being examined for optimal
reachability.  TENT contains satellite information for each vertex,
which is the immediate parent that caused the vertex to be moved into
TENT, and the cost of the link between them. The algorithm is
presented in a pseudocode that is similar in syntax to the 'C' 
language.

<CODE BEGINS>
policy_parent_selection_optimized_spf_run(vertex root, 
                                          PolicyFunction PF,
                                          Tree PreviousTree)
                                          /* representation of the
                                           *  same tree as computed 
                                           * in the previous iteration,
                                           * as a reference for the
                                           * policy tie-breaker
                                           * function below.
                                           */
{
    
    /* L, LL are lists of tuples of the following form */
    List<(vertex, parent, link(parent, vertex), 
         link-cost(parent, vertex))>  L, LL;

    /* Heap is ordered in ascending order of link-costs */ 
     HEAP tmpHEAP;

    /* 
     * map_vertex2parents_heap maps a vertex to a HEAP of tuples, each
     * identifying a parent relationship. The heap is ordered in
     * ascending order of link-cost. It also serves as a representation 
     * of the set TENT.
     */

    Mapping map_vertex2parents_heap{vertex -> HEAP of tuples of type 
                                             (vertex, parent,
                                                    link(parent,vertex)
                                          link-cost(parent, vertex)) };
    initialize_to_empty(L);

    initialize_to_empty(LL);

    initialize_to_empty(map_vertex2parents_heap);

    set distance_from_root(root) = 0;
    root.in_PATH = TRUE;
    for each vertex in graph {
        if vertex is not root then {
            set distance_from_root(vertex) = infinity;
            vertex.in_PATH = FALSE;
        }
    } 
    

    for each link (root,w) that is attached to root {
        if (distance_from_root(w) > link-cost(root, w)) {
            map_vertex2parents_heap{w}.add_to_heap(
                           (w, root, link(root,w), link-cost(root,w)));
        }
    }

    /* 
     * Iterate while there are vertices (tuples) in TENT - TENT is
     * realized via map_vertex2parents_heap which is a mapping table 
     * with each entry of the table representing a (per-vertex) HEAP
     * of tuples. Each tuple represents a link i.e. a parent-child
     * relationship along with the identity of the link and its 
     * associated cost.
     */
    while (not_empty(map_vertex2parents_heap)) {

        /* There may be more than one vertex in TENT at 
         * minimal cost, and this modified SPF algorithm considers all
         * of them simultaneously, processing them as a list, in terms
         * of their addition to PATH, and in terms of equally 
         * considering them parents of as-yet undiscovered vertices 
         * that may be reachable at the same total cost from members 
         * of the list. 
         * priority_heap_minimum_multiple_dequeue_as_list takes the 
         * heap and returns a list of all the vertices that are at 
         * minimum cost from v. The list is returned as a list of 
         * tuples, where each tuple is of the form:
         * 
         * (vertex, parent, link(parent, vertex), 
         *  link-cost(vertex, parent))
         */

         initialize_to_empty(tmpHEAP);

         /*
          * Iterate over the vertices comprising the key-space of
          * map_vertex2parents_heap. The minimum from TENT is
          * derived in two steps - find the minimum cost for each vertex
          * from its potential parents (find the tuples), and then find
          * the minimum of the above minimums using a temporary HEAP.
          */
         foreach (vertex v in valid-keys(map_vertex2parents_heap)) {
             /* tuples of the type
              * (vertex, parent,link(vertex, parent),
              * link-cost(vertex, parent)) at a minimum cost per hash
              * table vertex entry are inserted to tmpHEAP. Each entry
              * in tmpHEAP is a tuple.
              */
             if (is-empty(map_vertex2parents_heap{v})
               continue; /* Move on to the next vertex */

             /* 
              * heap_dequeue_multiple_minimum_as_list() is assumed to
              * dequeue multiple entries at the top of the heap, if 
              * they are all at the same minimum value of the heap 
              * metric (cumulative reachability cost for this 
              * application).
              */
             tmpHEAP.add_to_heap(heap_dequeue_multiple_minimum_as_list(
                                         map_vertex2parents_heap{v}));
         }
        
         L = heap_dequeue_multiple_minimum_as_list(tmpHEAP);

        /*
         * L is a list of tuples.
         * L could potentially have only a single tuple. Iterate over
         * tuples in L. Tuples in L are optimally reachable from nodes
         * already in PATH.
         */
        foreach tuple (y, y_parent, link(y_parent, y), 
                       link-cost(y_parent, y)) in L {

            /* 
             * Identify tuples relating to y's potential parents, 
             * all leading to optimal reachability for y, collect
             * them in LL.
             */
            LL = 
            heap_dequeue_multiple_minimum_as_list(
                                         map_vertex2parents_heap{y});

             /* 
              * Policy function below takes a vertex, and its list of
              * eligible parents, and chooses one parent for the vertex.
              * It can be simply thought of as a tie-breaker that 
              * chooses one parent out of the list of eligible parents 
              * of y, but based on the consideration specified by the
              * policy, and being able to do so in a way that 
              * minimizes tree churn.
              * 
              * ASSUMPTION: Policy Function needs to be able
              *  to deal with trivial case of the list of eligible
              * parents having only one parent, it must select that one
              * parent unconditionally in that case for the given child
              * node.
              *
              * PolicyFunction is not explicitly defined here, it may
              * take other parameters such as a representation of the
              * previous computation of the tree to help it make a 
              * selection that helps preserve links in the tree as they 
              * existed prior to this computation.
              */ 

             if (y_parent == 
                   PolicyFunction(y, LL, PreviousTree, TreeNumber)) {

                 /*
                  * if y is allowed by the policy function, it can get 
                  * pulled into PATH here. However, if y is excluded 
                  * at this stage, then it must have another potential 
                  * instance in TENT (but see assumption above), which
                  * will pull it in thru a different parent - 
                  * correctness of the SPF algorithm guarantees this, 
                  * and this other instance with other parent must be 
                  * in the list L.
                  */

                    y.in_PATH = TRUE; 
                    distance_from_root(y) = distance_from_root(y_parent)
                                            + link-cost(y_parent,y);
                    if (y_parent == root) 
                      /* directly connected */
                      nexthop_link_at_root(y) = link (root ,y);
                    else
                      nexthop_link_at_root(y) = 
                          nexthop_link_at_root(y_parent);

                 /* y is now in PATH. 
                  * identify_vertex_as_parent() is defined further 
                  * down below, it marks y as parent for its 
                  * downstream children, by updating the 
                  * map_vertex2parents_heap entry for nodes 
                  * immediately downstream of y. Note
                  * that we do not need to iterate over all instances
                  * of y in LL - this can be done by simply examining 
                  * all links on y and pulling in those not in PATH.
                  */

                 identify_vertex_as_parent(y, map_vertex2parents_heap);
             }
        }
    }

    for each vertex w in graph {
        if (w is not root) {
            print (distance_from_root(w), nexthop_link_at_root(w));
        }
    }
}

identify_vertex_as_parent(vertex y, Mapping map_vertex2parents_heap)
{
    for each link (y,x) that is attached to y {
        if (x.in_PATH == FALSE) // Ignore nodes already in PATH.
        {
            /* 
             * Routine is only called on vertices y that are already
             * in PATH, so child x will be reachable optimally from y.
             *
             * y is a potential parent of x, because x is optimally 
             * reachable from y. But we need to see if there are
             * other potential parents of x at the same optimal cost,
             * the policy function will help pick the right parent,
             * during its invocation in the SPF run for x's selection
             * into path.
             *
             */
            if (distance_from_root(x) > (distance_from_root(y) 
                                         + cost(y,x)) {
                map_vertex2parents_heap{x}.add_to_heap(x, y, (y,x),
                                       distance_from_root(y)+cost(y,x));
            }
        }
    }
}
<CODE ENDS>

5.1 Choice of the policy function. 

In order to solve the discussed problem, the policy function must
preserve parent-child links as they existed in previous stable 
(non-transient) computations of the tree and it can do this by 
consulting a a stable snapshot of the previous tree computation, 
to identify and preserve parent-child relationships in the prior 
SPF tree. Text further below discusses how a stable snap-shot of 
the tree may be collected.

The policy is applied on a best-effort basis and is consulted 
during the normal SPF tree construction mechanism if a given child 
node can be pulled in from a choice of multiple parents, and if one 
of those parents had pulled in that child node in a previous 
stable snap-shot of the tree.

When used in this manner, if a parent node in the previous
tree computation is no longer alive/reachable, it should not be used 
as a reference in preserving any parent-child relationships. A 
secondary policy or fall-back tie-break may be needed to identify
a parent for a given child in this situation, and this is detailed
below. Also, as noted in the code section, in the case where a child
node has only one parent, the policy function must unconditionally
select that parent node to pull in the child to the SPF tree. The
policy function may also take the tree number as an input
parameter to determine a fallback tie-break.

When the policy function is set up in the manner described above,
it helps maintain parent affinity by explicitly breaking ties
preferentially so that prior parent relationships are maintained in a
new computation of the tree. However, the challenge of synchronizing
the application of the policy across rBridges in the network needs to
be addressed. In order to do this, the following rules may need to be
adhered to when using this approach:

a. The first tree computation is carried out using the Trill
   specified tree construction mechanism.

b. So long as the root node for that tree number does not
   change physically, the second and subsequent tree computations
   for the same tree can use this algorithm and try to feed in a
   representation of the previous steady-state tree computation as 
   a guide to the policy function, subject to the following caveats:

   i. if a non-root node that was a parent in a previous computation 
      is no longer reachable/alive, then the computation should fall 
      back to the default Trill parent selection rule for the set of
      remaining parent candidates and the associated children. Note 
      that if a sibling of the parent node goes away or changes its 
      link connectivity, it does not impact the concerned parent's
      child relationships.

  ii. If a link between a parent node and child node (in the previous
      computation of that tree) goes down, then that child alone should
      be pulled in to the SPF tree using vertices that are upstream of 
      it during tree computation, subject to the Trill specified 
      tie-break rule. Note that this may result in some children being 
      pulled in through the old parent and some pulled in through a 
      different node, subject to the Trill tie-break rule.

c. If the root node for that tree number changes to a physically
   different node, then the first tree computation for that tree number
   with that new tree root should be carried out using the default
   Trill parent-selection rules. The second and subsequent computations
   can leverage this approach, each feeding in a representation of the
   previous stable snap-shot as a reference for the current computation.

d. There may be cases during tree construction with this approach
   where more than one parent finds a match in the representation
   of the previous tree - in this case the tie should be broken
   according to the default Trill parent selection rule. This can
   happen, when a node that was a parent in a previous computation
   becomes a child node of its former child in the current tree,
   due to a link going down, for example. In this case, the 
   directional nature of the (parent, child) link in the prior
   snapshot may need to be ignored while trying to find a match in
   the prior snapshot.
   
e. The policy function must always be given the representation of the
   most recent stable snapshot of the prior tree computation for that 
   tree number. Ideally, a stable snap-shot should reflect a converged 
   tree state after all recent network churn events have been absorbed. 
   A timer may be started when any tree computation starts. The timer 
   would need to be long enough to capture a converged tree state. 
   Any interim network events received while the timer is running 
   extend the timer. When the timer expires the tree is deemed to have
   converged, and a snap-shot of the tree may be collected. 

f. Alternatively, snap-shot collection may be triggered when the tree 
   routes are being programmed in the RIB, since most implementations 
   would download routes to the RIB only when the tree calculation has 
   completed successfully. An additional stabilization timer may be
   used, after the RIB trigger, to capture cases where multiple tree
   computations run back to back. The snapshot must collect all links 
   in the tree, reflecting parent child relationships in the tree.

g. To handle situations where a link or node is flapping constantly,
   nodes in the network should turn off snap-shot matching on the 
   child nodes connected to the affected link or node
   (if the node happens to be a parent node, then snapshot matching
    should be turned of for all the children of that node) and fallback
   to default Trill parent selection rules for the affected children. 
   Nodes in the network may examine parent-child relationships in the 
   successive collected snapshots to see if a specific sub-tree is 
   toggling within a pre-configured interval of time.

6. Network wide selection of computation algorithm.

Alternative A does not need any operational change to the
Trill protocol, beyond the usage of the affinity sub-TLV (which is
already in the proposed standard) for the use case identified in 
this draft.

For alternative B, this draft takes no position on how rBridges in 
the network may negotiate and select the modified algorithm as the 
preferred tree computation mechanism at this time.

7. Relationship to draft-ietf-trill-resilient-trees.

Given that both draft-ietf-trill-resilient-trees, and
draft-rp-trill-parent-selection-02 drafts use the affinity sub-TLV,
it is worthwhile to examine if there is any functional overlap 
between the two drafts.

At a high level, draft-ietf-trill-resilient-trees relates to link
protection, and defines the notion of a primary distribution tree 
and a backup distribution tree (DT), where these trees are
intentionally kept link disjoint to the extent possible, and the 
backup tree is pre-programmed in the hardware, and activated either
upfront or upon failure of the primary distribution tree. 

On the other hand, draft-rp-trill-parent-selection-02 protects
parent-child relationships of interest on the primary DT, and has
no direct notion of a backup DT.

draft-ietf-trill-resilient-trees considers the following algorithmic 
approaches to the building the backup distribution tree (section
numbers listed are from that draft):

1. Pure operator configuration for links on the backup DT/manual 
   generation of affinity sub-TLV - this is very tedious and unlikely
   to scale or be implemented in practice, and hence is disregarded 
   in the analysis here.

2. Section 3.2.1.1a: Use of MRT algorithms (which will produce conjugate 
   trees - link disjoint trees with roots for primary and backup trees
   that are coincident on the same physical rBridge).

3. Section 3.2.1.1b: Once the primary DT is constructed, the links 
   used in the primary DT are additively cost re-weighted, and a 
   second SPF is run to derive the links comprising the backup DT. 
   Affinity sub-TLV is used to mark links on the back-up DT which are 
   not also on the primary DT. This approach can handle conjugate 
   trees as well as non-conjugate trees (link disjoint trees that are 
   rooted at different physical nodes).

4. Section 3.2.2: A variation on the section 3.2.1.1b approach, but 
   without affinity sub-TLV advertisement. Once the primary DT is 
   constructed, costs for links on the primary DT are multiplied by a 
   fixed multiplier to prevent them from being selected in a 
   subsequent SPF run, unless there is no other choice, and the 
   subsequent SPF yields links on the backup DT.

All of the approaches above yield maximally link disjoint trees, 
when applied as prescribed.

Approach 4 above does not seem to use affinity sub-TLVs and instead
seems to depend upon a network wide agreement on the alternative
tree computation algorithm being used.

Approaches 2 and 3 use affinity sub-TLV on the backup DT, for links 
that are not already on the primary DT. The primary DT does not 
appear to use affinity sub-TLVs. Additionally, from an end-to-end 
perspective the backup DT comes into picture when the primary DT 
fails (this is effectively true even in the 1+1 protection mechanism 
and in the local protection case), and then again, only until the 
primary DT is recalculated. Once the primary DT is recalculated, the 
backup DT is recalculated as well, and can change corresponding to 
the new primary DT.

draft-ietf-trill-resilient-trees cannot directly prevent/mitigate a
parent node shift on the primary DT at a given parent node, and while 
usage of the affinity sub-TLV on the backup DT might confer a parent 
affinity on some nodes on the backup DT, these are not necessarily 
the nodes on which the network operator may want/prefer an explicit 
parent affinity. Further, the backup DT is only used on a transient 
basis, from a forwarding perspective, until the primary DT is 
recomputed.

In situations involving a node failure, there is no direct functional 
overlap between draft-ietf-trill-resilient-trees, and the 
draft-rp-trill-parent-selection-02 draft. The two drafts have different
goals and appear to solve unrelated problems, in this situation.

Sometimes, a parent shift can be triggered by link failure. In a
situation where both drafts are active in the implementation, failure
of a specific link may cause the backup DT to kick in, but when the
primary DT is re-calculated, draft-rp-trill-parent-selection-02 can be
used to preserve parent-child relationships on the primary DT, to the 
extent possible, during the re-calculation. So, in this case also, 
there does not appear to be a direct functional overlap in the 
simultaneous usage of these drafts.

8. Security Considerations.

The proposal primarily influences tree construction and tries to
preserve parent-child relationships in the tree from prior computations
of the same tree, without changing any of operational aspects of the 
protocol. Hence, no new security considerations for Trill are raised 
by this proposal.

9. IANA Considerations.

No new registry entries are requested to be assigned by IANA. The
Affinity Sub-TLV has been defined in [RFC7176], and this proposal
does not change its semantics in any way.

10. Informative References.

    [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, 
              DOI 10.17487/RFC2119, March 1997, 
              <http://www.rfc-editor.org/info/rfc2119>.

    [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
              Ghanwani, "Routing Bridges (RBridges): Base Protocol
              Specification", RFC 6325, DOI 10.17487/RFC6325, July 2011,
              <http://www.rfc-editor.org/info/rfc6325>.

    [RFC7780] - Eastlake 3rd, D., Zhang, M., Perlman, R., Banerjee, A.,
             Ghanwani, A., and S. Gupta, "Transparent Interconnection of
             Lots of Links (TRILL): Clarifications, Corrections, and
             Updates", RFC 7780, DOI 10.17487/RFC7780, February 2016,
             <http://www.rfc-editor.org/info/rfc7780>.

    [RFC7783] Senevirathne, T., Pathangi, J., Hudson, J., "Coordinated 
              Multicast Trees (CMT) for Transparent Interconnection of 
              Lots of Links (TRILL)", RFC 7783, February 2016,
              <http://datatracker.ietf.org/doc/rfc7783>

    [RFC4971] Vasseur, JP., Shen, N., Aggarwal, R., "Intermediate 
              System to Intermediate System (IS-IS) Extensions for 
              Advertising Router Information", RFC 4971, July 2007,
              <http://datatracker.ietf.org/doc/rfc4971>

Author's Address:

R. Parameswaran,
Brocade Communications, Inc.
120 Holger Way, 
San Jose, CA 95134.

Email: parameswaran.r7@gmail.com

Copyright and IPR Provisions

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.  The definitive version of
   an IETF Document is that published by, or under the auspices of, the
   IETF. Versions of IETF Documents that are published by third parties,
   including those that are translated into other languages, should not
   be considered to be definitive versions of IETF Documents. The
   definitive version of these Legal Provisions is that published by, or
   under the auspices of, the IETF. Versions of these Legal Provisions
   that are published by third parties, including those that are
   translated into other languages, should not be considered to be
   definitive versions of these Legal Provisions.  For the avoidance of
   doubt, each Contributor to the IETF Standards Process licenses each
   Contribution that he or she makes as part of the IETF Standards
   Process to the IETF Trust pursuant to the provisions of RFC 5378. No
   language to the contrary, or terms, conditions or rights that differ
   from or are inconsistent with the rights and licenses granted under
   RFC 5378, shall have any effect and shall be null and void, whether
   published or posted by such Contributor, or included with or in such
   Contribution.