Skip to main content

Early Review of draft-ietf-rift-rift-06

Request Review of draft-ietf-rift-rift-06
Requested revision 06 (document currently at 24)
Type Early Review
Team Routing Area Directorate (rtgdir)
Deadline 2019-07-19
Requested 2019-06-23
Requested by Zhaohui (Jeffrey) Zhang
Authors Tony Przygienda , Jordan Head , Alankar Sharma , Pascal Thubert , Bruno Rijsman , Dmitry Afanasiev
I-D last updated 2019-07-23
Completed reviews Intdir Telechat review of -21 by Dave Thaler (diff)
Rtgdir Last Call review of -20 by Loa Andersson (diff)
Secdir Early review of -04 by Scott G. Kelly (diff)
Rtgdir Early review of -06 by Russ White (diff)
Genart Early review of -08 by Robert Sparks (diff)
Secdir Early review of -08 by Scott G. Kelly (diff)
Opsdir Early review of -08 by Nagendra Kumar Nainar (diff)
Rtgdir Early review of -08 by Jonathan Hardwick (diff)
We're planning to issue WGLC soon after your review.
Assignment Reviewer Russ White
State Completed
Request Early review on draft-ietf-rift-rift by Routing Area Directorate Assigned
Posted at
Reviewed revision 06 (document currently at 24)
Result Has issues
Completed 2019-07-23

I have been selected to do a routing directorate “early” review of this draft.

The routing directorate will, on request from the working group chair, perform
an “early” review of a draft before it is submitted for publication to the
IESG. The early review can be performed at any time during the draft’s lifetime
as a working group document. The purpose of the early review depends on the
stage that the document has reached.

As this document has recently been adopted by the working group, my focus for
the review is on providing a new perspective on the work, with the intention of
catching any issues early on in the document's life cycle.

For more information about the Routing Directorate, please see

Document: draft-ietf-rift-rift-06
Reviewer: Russ White
Review Date: 23 July 2019
Intended Status: Standards Track


I have significant concerns about this document. It needs more work before
being submitted to the IESG.


Overall this draft is very readable, although I have suggested wording changes
below. The protocol described is useful and interesting. I largely have
questions about the structure and wording of the draft, although there are a
couple of technical questions in the comments below (probably just things I
missed the explanation for, however).

Structurally, this document almost feels like two different documents. The
first specifies a set of use cases and requirements, while the second specifies
a protocol. I wonder if it wouldn't be better to split this document into two
pieces, making each one more specific, and possibly a bit easier to read? We
might too late in the process to make such a change -- if so, that's okay, just
thought it might be worth considering.

Throughout -- "fat tree," "Clos," and other terms are used in various places.
As a reader, given the different meanings for these terms, I found this a bit
confusing. It might be better to use "spine and leaf" throughout.

Throughout -- the term "folded" has two very different meanings... The first is
a multistage fabric on which traffic flows bi-directionally, the second is a
way of drawing spine and leaf fabrics in blocks. It might be useful to clarify
this from the beginning somehow, so readers who are expecting one meaning don't
misread things because a different meaning is intended (?). I think it's always
used here in the "way of drawing a spine and leaf fabric" here.

Page 9

The definitions of the various forms of TIE might be better if they clearly
differentiated between topology and reachability information (?). The OSPF
router LSA, for instance, contains both kinds of information. In IS-IS,
adjacencies, which are called Node TIES (I think) are tlv22, while the Prefix
TIE would be a tlv128 and 236. I don't think there is an equivalent separation
between topology (reachable neighbors) and reachability (reachable prefixes) in

REQ13: "Taking a path through the spine in cases where a shorter path is
available is highly undesirable."

This doesn't seem to be related to the previous statements in the
requirement... Not certain this needs to be included? Or maybe it wants to be
someplace else in the list?

REQ10 and REQ13 both seem to say the same thing -- although REQ13 is better
worded. Maybe both of these are not needed?

REQ15: "...links of a single node having same addresses." -- might be better as
"...links of a single node sharing the same address."

REQ18: " mechanisms that allow to restrict nodes, especially leafs
without proper credentials from forming three-way adjacencies."

Might be better as:

" mechanisms that allow the operator to restrict nodes, especially
leaf nodes without proper credentials, from forming a three-way adjacency."

PEND2: 500,000 seems way too small for the scale numbers I've heard in the past
-- especially if we are including the underlay protocol running on the hosts in
the mix. If there are 300k ports is at least 1 prefix per edge port, and
potentially 10-15 prefixes per edge port. If you guess around 20% of the
workloads attached to the fabric might be mobile, then the number here is more
likely 1 million at the base, with a top end of around 2-3 million prefixes. It
might be worth adjusting this number a bit larger (?).

Section 5, paragraph after the header

"...when "pointing north" and path vector [RFC4271] protocol when "pointing
south". While this is an unusual combination,..."

Elsewhere in the document "distance-vector" is used. It might be better to be

Section 5.1.1

In the first paragraph ("The most singular property..."), omitting east-west
control plane information flow is mentioned twice; could probably just be once.

The first sentence in the second paragraph is a bit of a run-on; might be best
to omit some of this, or break it into two sentences.

Second paragraph: "The application of those principle lead to RIFT..." doesn't
seem right. Might be better as: "The application of these principles lead to

Section 5.1.2, paragraph beginning: "Given the difficulty of visualizing multi
plane design which are..." First, this is one long sentence. Second, what's
illustrated looks like a crossbar, but it's not a crossbar. I find this a
little confusing (?). The problem is -- I don't know of another term to use

Section 5.1.2, paragraph beginning: "The typical topology for which RIFT is
defined is built of a number P..." This seems to describe a five stage
butterfly or pod-and-spine fabric, but does not seem to describe a seven stage

Overall -- I'm not certain trying to describe these topologies at this level in
a protocol specification draft is all that useful. The terminology is fluid
(used by different people in different ways), and this depth of explanation
does not seem to be useful to describe the protocol operation. This is one
place where splitting the document into two pieces might be helpful.

Paragraph beginning: "It is critical at this junction that the concept and the
picture of..." What is shown is not really a crossbar fabric... I'm not certain
calling this a crossbar is helpful here (?).

Section 5.1.4: "This happens naturally in single plane design but needs
additional considerations in multiplane fabrics." When aggregation is used, as
well -- but not when no aggregation of reachability information is deployed in
the fabric. It might be worth adding that qualification here.

Section 5.1.4

Sentence beginning: "In more detail, by reserving two ports on each
Top-of-Fabric node it is possible to connect them together in an interplane
bi-directional ring as illustrated in Figure 13..."

I might have missed it, but this document does not seem to address how traffic
will be prevented from flowing through these links. It's probably important to
note either that traffic will flow along these links, hence they need to be
sized appropriately, or how traffic is prevented from flowing along these links.

Section 5.1.5

Sentence beginning: "The effect of a positive disaggregation..." If positive
deaggregation takes place, it could draw all the traffic to the deaggregated
destinations along a small set of links, causing a hot spot in the fabric. This
might be taken care of "naturally" in the way RIFT does positive deaggregation,
but it might be good to explain this in the document.

A more general observation: positive deaggregation, as described in the
document, can potentially cause cascading failures where a group of links or
nodes fail, causing a lot of new information to be pushed into lower levels
rather quickly, which could potentially cause those nodes to overrun their
table or processing space and fail in turn, causing more information to be
dumped into the control plane, etc. It may be worth connsidering having some
form of "hysteresis," timer or other mechanism to prevent cascading failures of
this kind.

Section 5.2.2

"All RIFT routers MUST support IPv4 forwarding and MAY support IPv6..."

So an IPv6 only fabric is not possible?

Section 5.2.6

"After the SPF is run, it is necessary to attach according prefixes."

I think this might want to mean --

"After SPF is run, it is necessary to attach the resulting reachability
information in the form of prefixes."


Section 5.2.6

It seems like the first and third paragraphs describe the same procedure? Are
both descriptions necessary?

Section 5.2.6

"An exemplary implementation..." should probably be "an example
implementation..." ??

Section 5.2.6

"After the positive prefixes are attached and tie-broken, negative prefixes are
attached and used in case of northbound computation, ideally from the shortest
length to the longest."

It seems like the prefixes should be "tie-broken" before being attached. This
sentence is generally confusing, perhaps:

"After the positive prefixes have been attached, the negative prefixes will be
attached in their prefix-length order (from the shortest to the longest). These
negative prefixes will only be used when computing northbound reachability."

Section 5.2.6

"Observe that despite seeming quite computationally expensive the operations
are only necessary if the only available advertisements for a prefix are
negative since tie-breaking always prefers positive information for the prefix
which stops any kind of recursion since positive information never inherits
next hops."

Just because something is not done all that often does not mean it's not
computationally expensive. :-) Maybe something like this would be better:

"Although these operations can be computationally expensive, the overall load
on devices in the network is low because these computations are not run very
often, as positive route advertisements are always preferred over negative
ones. This prevents recursion in most cases because positive reachability
information never inherits next hops."

Or something like this?

Section 5.2.6

"Negative 2001:db8::/32 entry inherits from"

Probably wants to be "The negative 2001:db8::/32 entry..."

Section 5.2.6

"Finally, let us look at the case where S3 becomes available again as default
gateway, ..."

"...the default gateway..." or "...a default gateway..."

Section 5.2.7

"Each RIFT node can optionally operate in zero touch provisioning..."

"optionally" is not needed here


"RIFT identifies each node via a SystemID which is a 64 bits wide integer. It
is relatively simple to derive a, for all practical purposes collision free,
value for each node on startup. For that purpose, a node MUST use as system ID
EUI-64 MA-L format [EUI64] where the organizationally governed 24 bits can be
used to generate system IDs for multiple RIFT instances running on the system."

This seems too wordy -- maybe:

"RIFT nodes require a 64 bit SystemID in the EUI-64 MA-L format formed as
described in [EUI64]. The organizationally goverened portion of this ID  (24
bits) can be used to generate multiple IDs if required to indicate more than
one RIFT instance."

Section 5.3.1

"Overload Bit MUST be respected..."

should be

"The overload bit MUST be respected..."

Section 5.3.2

"Since the leafs do see only "one hop away" they do not need to run a full SPF
but can simply gather prefix candidates from their neighbors and build the
according routing table."

Seems like this should be

"Since leaf nodes only have one hop of topology information, they do not need
to run SPF. Instead, they can gather the available prefix candidates and build
the routing table accordingly."

Section 5.3.3

"time stamp: With this method, the infrastructure memorizes the..."

I think the correct word here is "records," rather than "memorizes" (?)

Section 5.3.3

"sequence counter: With this method, a mobile node notifies its point..."

Another problem here is the node might not "know" it has been moved from one
location to another in the fabric, particularly if a layer 2 only attached
process is moved along with it's ARP cache, and something like the EVPN default
MAC is used to enable the move. I don't know if this is worth mentioning, but
it does seem like a case that needs to be thought about to make certain it
works with the process described.

Section 5.3.3

"The leaf node MUST advertise a time stamp of the latest sighting..."

Is this for all prefixes, or just for prefixes that are somehow marked or
configured as potentially mobile on the fabric?

Section 5.3.3

"RIFT also defines an abstract negative clock (ANSC) that compares as less than
any other clock."

Does this mean lower priority, or always older than any other clock? It seems
like this means "older" based on, but the language could probably be

Section 5.3.3

One thing not covered here is what happens if some workload moves between two
RIFT fabrics interconnected via some other protocol, such as BGP. It seems the
most obvious solution is to have RIFT treat redistributed destinations the same
as directly attached destinations in terms of clocks, etc., but this is not
called out in the document. It might be worth mentioning.


This section seems a little ... muddled? You want to both have the ability to
use only the most recently available versions of any anycast destination, while
also not requiring all the anycast advertisements to have the same time stamp
(?). I'm not certain this section entirely solves that problem. This sentence
might be creating the confusion in my reading of the section, as I don't really
understand what it is saying:

"An anycast prefix does not carry a clock or all clock attributes MUST be the
same under the rules of Section"

MUST be treated the same as prefixes advertised under the rules of I'm
not certain.


"RIFT is agnostic whether any overlay technology like [MIP, LISP, VxLAN, NVO3]
and the associated signaling is deployed over it. But it is expected that leaf
nodes, and possibly Top-of-Fabric nodes can perform the according

"according" should be "correct," I think.


"All the multiplications and additions are saturating, i.e. when exceeding
range of the bandwidth type are set to highest possible value of the type."

Maybe better:

"If a calculation produces a result exceeding the range of the type, the result
is set to the highest possible value for that type."