Last Call Review of draft-ietf-rtgwg-bgp-pic-12
review-ietf-rtgwg-bgp-pic-12-genart-lc-enghardt-2021-01-10-00

Request	Review of	draft-ietf-rtgwg-bgp-pic-12
	Requested revision	12 (document currently at 20)
	Type	Last Call Review
	Team	General Area Review Team (Gen-ART) (genart)
	Deadline	2021-01-17
	Requested	2020-12-09
	Requested by	Jeff Tantsura
	Authors	Ahmed Bashandy , Clarence Filsfils , Prodosh Mohapatra
	I-D last updated	2021-01-10
	Completed reviews	Rtgdir Early review of -00 by Bruno Decraene; Secdir Last Call review of -12 by Tero Kivinen; Genart Last Call review of -12 by Reese Enghardt; Iotdir Last Call review of -12 by Ines Robles; Rtgdir Last Call review of -12 by Bruno Decraene; Tsvart Last Call review of -12 by Brian Trammell
	Comments	The chairs are about to start WGLC and appreciate your reviews and comments.
Assignment	Reviewer	Reese Enghardt
	State	Completed
	Request	Last Call review on draft-ietf-rtgwg-bgp-pic by General Area Review Team (Gen-ART) Assigned
	Posted at	https://mailarchive.ietf.org/arch/msg/gen-art/RILa_ZQmuEEYfDvWsK2ImE7Ypfo
	Reviewed revision	12 (document currently at 20)
	Result	Ready w/issues
	Completed	2021-01-10
I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other review comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-rtgwg-bgp-pic-12
Reviewer: Theresa Enghardt
Review Date: 2021-01-10
IETF LC End Date: None
IESG Telechat date: Not scheduled for a telechat

Summary: The draft is basically ready for publication as an Informational RFC,
but it has some context, clarity, and editorial issues that need to be fixed
before publication.

Major issues: None.

Minor issues:

Abstract:

"In the network comprising thousands of iBGP peers exchanging millions
of routes, many routes are reachable via more than one next-hop.
Given the large scaling targets, it is desirable to restore traffic
after failure in a time period that does not depend on the number of
BGP prefixes."
This part is missing a logical step in the argumentation between these two
sentences. Is the first statement a prerequisite for restoring traffic, and
then the question is how to make it scalable? Is the first statement the reason
for things not being scalable? Please rephrase to make the relationship between
these statements and the overall argumentation clear. Is "depending on the
number of BGP prefixes" an inherent feature of BGP, or are you making any
implicit assumptions? If so, please state them.

"In this document we proposed an architecture […]"
What does architecture mean in this context? Without any further qualification,
in a networking context, as a reader I assume that "architecture" means
"network architecture", i.e., something that involves multiple nodes such as
multiple BGP speakers. But it appears that the document is only about the
internals of each individual BGP speaker, i.e., how information is organized
within the router. So maybe it's "router architecture" or "software
architecture" or such? Please rephrase to make this clear in the abstract.

Please clarify your scope. As the abstract specifically mentions iBGP, is this
solution only about iBGP? Or is it about eBGP as well?

Introduction:

The introduction is missing a clear problem statement. Perhaps it's implicitly
stated by saying that "convergence speed is limited by the time taken to
serially propagate reachability information from the point of failure to the
device that must re-converge.", but please be specific. Is this convergence
speed that depends on information propagation time considered "too long", and
therefore it needs to be reduced? Is it "too long" specifically in certain
contexts, e.g., networks of a certain size? As the document actually appears to
focus on speeding up changes within a single node, it's not clear how this
relates to propagation time. Does the node-internal speedup also speed up how
fast propagated information converges? Why? As the statement about reachability
information being exchanged is the first sentence of the introduction, this
makes it seem like it's fundamental to your document. If this is not the case,
please consider starting the introduction with a clear problem statement that
is actually fundamental to your document, such as "The way that information is
currently organized within a BGP speaker [under … circumstances] is inefficient
[for … reason] and leads to long convergence times."

In the next sentence, "BGP speakers exchange reachability information about
prefixes […]", the relationship to the problem statement is still not clear. Is
this reachability information insufficient? Or is there already enough
information to converge faster, and your solution is what makes that possible?
Or something else?

"[…] for labeled address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128
[…]" - Please expand these acronyms on first use and provide a reference.

"[…] an edge router assigns local labels to prefixes and associates the
local label with each advertised prefix […]"
Does this apply to incoming advertisements, outgoing advertisements, or both?
Please make the context clear here.

"[…] such as L3VPN [7], 6PE
[8], and Softwire [6] using BGP label unicast technique[3]."
The "such as" is not entirely clear: If these are examples of the technique
that the rest of the sentence describes, perhaps "using technologies such as"
would be clearer. However, as the entire sentence is already very long,
please consider splitting it and making the relationship between the
statements clear.

Please expand NLRI on first use and perhaps provide a definition or reference.

How does the proposal in this document relate to the techniques you mention,
i.e., L3VPN, 6PE, and Softwire? Does it require them? Is their usage optional
for your solution, but helpful (and if so, why)? Please make the relationship
of your solution to these techniques explicit and state the prerequisites of
your solution, if any.

"This document proposes a hierarchical and shared forwarding chain
organization […]"
What is your solution an alternative to? How has information previously been
organized? How does the concept of a forwarding chain relate to the details you
already gave, which were about a BGP speaker exchanging reachability
information and applying path selection - where does the forwarding chain come
in? As this appears to be a fundamental concept to your solution, please
introduce it in the first paragraph.

"incrementally deployed and enabled with zero operator intervention"
Well, deploying and enabling any solution does require operator intervention,
e.g., a software update, correct? So perhaps that's zero other operator
intervention? Minimal operator intervention? Or not requiring a specific type
of operator intervention that would otherwise be needed? Later in Section 3.1,
the draft says "It is noteworthy to mention that the forwarding chain is
constructed without any operator intervention at all.", so perhaps it's
possible to further qualify what kind of operator intervention would otherwise
be necessary, but is not necessary with your solution - e.g., no operator
intervention is required to reconfigure routes when a link fails.

1.1 Terminology

Please expand on first usage and consider defining: AFI/SAFI, PE, CE, NLRI,
forwarding plane, VPN RD's (probably VPN RDs), LSR, ASBRs, BGP-LU, FIB manager
(is this a particular entity? A software component?) You don't have to define
all BGP terms that you use, but please expand them once to make it easier to
guess what they stand for or to look them up.

For "Leaf", "IP leaf", "Label leaf": Why is it called leaf? In graph theory,
isn't the leaf of a tree the node with no children and only one parent? In your
figures, the "IP leaf" appears to have no parent and instead two children. So
isn't it more of a root in the tree? Later, you mention the pathlist being "the
parent" of the IP leaf, but in Figure 2, you have an arrow from the IP leaf
pointing to the Pathlist, so to me that looks like the Pathlist is the child of
the IP leaf. Is this a BGP convention? If so, perhaps a sentence stating that
would help, and/or a reference.

"OutLabel-List: Each labeled prefix is associated with an
          OutLabel-List. The OutLabel-List is an array of one or more
          outgoing labels and/or label actions where each label or label
          action has 1-to-1 correspondence to a path in the pathlist.
          Label actions are: push the label, pop the label, swap the
          incoming label with the label in the Outlabel-Array entry, or
          don't push anything at all in case of "unlabeled". The prefix
          may be an IGP or BGP prefix"
What are labels/label actions in this context? Are labels the same labels
mentioned in the introduction, i.e., local labels that are assigned to
prefixes? Are "outgoing labels" still local? Maybe here a brief explanation of
how labels are defined and how they work would help.
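
For illustration, here is how I currently read the quoted definition, as a
minimal Python sketch. The class and function names, the numeric label values,
and the choice to model the label stack as a list are my assumptions and do not
come from the draft; if the intended semantics differ, that is exactly what a
short explanation in the terminology section should clear up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LabelAction(Enum):
    # The four actions named in the quoted definition.
    PUSH = "push"
    POP = "pop"
    SWAP = "swap"
    UNLABELED = "unlabeled"


@dataclass
class OutLabelEntry:
    action: LabelAction
    label: Optional[int] = None  # unused for POP and UNLABELED


def apply_entry(stack: List[int], entry: OutLabelEntry) -> List[int]:
    """Return a new label stack; the last element is the top of the stack.

    POP and SWAP assume a non-empty stack (hypothetical simplification).
    """
    new_stack = list(stack)
    if entry.action is LabelAction.PUSH:
        new_stack.append(entry.label)
    elif entry.action is LabelAction.POP:
        new_stack.pop()
    elif entry.action is LabelAction.SWAP:
        new_stack[-1] = entry.label
    # UNLABELED: nothing is pushed, the stack stays as it is.
    return new_stack


def entry_for_path(pathlist: List[str],
                   outlabel_list: List[OutLabelEntry],
                   path_index: int) -> OutLabelEntry:
    """Look up the label entry for a path, relying on the 1-to-1 correspondence."""
    if len(pathlist) != len(outlabel_list):
        raise ValueError("OutLabel-List must have one entry per path")
    return outlabel_list[path_index]


# Hypothetical values: 1011 and 2021 merely stand in for VPN-L11 and VPN-L21.
pathlist = ["BGP-NH1", "BGP-NH2"]
outlabels = [OutLabelEntry(LabelAction.PUSH, 1011),
             OutLabelEntry(LabelAction.PUSH, 2021)]
stack = apply_entry([], entry_for_path(pathlist, outlabels, 0))
```

In this reading, the label belongs to the path that was chosen, not to the
prefix alone. If that is not what is meant, please say so in the definition.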

2. Overview:

"A forwarding plane that supports multiple levels of indirection:
A forwarding that starts with a destination and ends with an
outgoing interface is not a simple flat structure."
What is "A forwarding"? Do you mean a forwarding entry? Is this the same thing
as a route? Please consider adding a definition to the terminology. Is a
forwarding plane the same as a forwarding chain (mentioned in the abstract)? If
so, please unify your terminology. If not, please define the terms and explain
what the differences are.

2.1.2. Availability of more than one BGP next-hops

"The existence of a secondary next-hop is clear for the following
reason: a service caring for network availability will require two
disjoint network connections hence two BGP next-hops."

By "the existence is clear", do you mean "The existence is clearly required",
"It is clear whether a secondary next-hop exists", or something else?

2.2 BGP-PIC Illustration

"We can see that the BGP
pathlist consisting of BGP-NH1 and BGP-NH2 is shared by all NLRIs
reachable via ePE1 and ePE2."
How can we see that? ePE1 and ePE2 do not show up in Figure 2. I assume they
map to something that is shown, but it's not clear what.

3.2. Example: Primary-Backup Path Scenario

Comparing Figure 3 to Figure 2, there are a couple of differences in terminology:
Figure 2 has an "IP Leaf" and Figure 3 has an "IP prefix leaf" called VPN-IP1.
Are "IP Leaf" and "IP prefix leaf" the same concept? If so, please unify your
terminology. Same question for VPN-L11 being "OutLabel-List" (Figure 2) and
"Label-leaf" (Figure 3), VPN-L21 being part of an "OutLabel-List" (Figure 2)
and "BGP OutLabel Array" (Figure 3), and BGP-NH1 being part of a "Pathlist"
(Figure 2) and "BGP Pathlist" (Figure 3). Figure 3 does not appear to show any
Adjacency - why? Figure 2 does not appear to show any label actions - why?
Furthermore, making the figures more similar stylistically (e.g., having "IP
prefix leaf" always underlined or always in brackets) would help for comparing
the two figures.

4. Forwarding Behavior

"apply the label action of the label on the packet"
What does this mean? Does "push" mean that the forwarding engine will add the
label to the packet? How will this label be used? Will it be removed from the
packet later? Will it be sent in a BGP advertisement? Please make this clearer
here, and/or please explain what labels and label actions are earlier, and how
they are used.

"the forwarding engine applies a hashing algorithm to choose the path and
the hashing at the BGP level yields path 0 while the hashing at the
IGP level yields path 1"
This sounds like ECMP, i.e., there are multiple paths and each packet is hashed
and then sent through a path based on the hash. But the earlier sections
sounded like your solution was more about primary paths and secondary failover
paths. Are these two general approaches, and does your solution work for either?
Please make this explicit, possibly early in the document.
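
To make the ECMP reading concrete, the following is a minimal Python sketch of
per-level hashing as I understand the quoted sentence. The function names, the
use of CRC32, the per-level salt, and the IGP next-hop names are my assumptions;
the draft does not specify a hashing algorithm, and only BGP-NH1 and BGP-NH2
appear in its figures.

```python
import zlib
from typing import Dict, List, Tuple


def pick_index(flow_key: bytes, level_salt: bytes, n_paths: int) -> int:
    """Hash a flow onto one of n_paths; the salt makes each level independent."""
    return zlib.crc32(level_salt + flow_key) % n_paths


def resolve(flow_key: bytes,
            bgp_pathlist: List[str],
            igp_pathlists: Dict[str, List[str]]) -> Tuple[str, str]:
    """Walk two levels of the chain, hashing once at each level."""
    # BGP level: choose one BGP next-hop from the BGP pathlist.
    bgp_nh = bgp_pathlist[pick_index(flow_key, b"bgp", len(bgp_pathlist))]
    # IGP level: choose one IGP path towards that BGP next-hop.
    igp_paths = igp_pathlists[bgp_nh]
    igp_nh = igp_paths[pick_index(flow_key, b"igp", len(igp_paths))]
    return bgp_nh, igp_nh


bgp_pathlist = ["BGP-NH1", "BGP-NH2"]
igp_pathlists = {
    "BGP-NH1": ["IGP-NH1", "IGP-NH2"],  # hypothetical IGP next-hops
    "BGP-NH2": ["IGP-NH3", "IGP-NH4"],
}
chosen = resolve(b"10.0.0.1->192.0.2.1", bgp_pathlist, igp_pathlists)
```

In a primary/backup scenario I would instead expect the index to be fixed to the
primary path as long as it is usable, with no hashing involved, which is why it
would help to state which of the two modes each section assumes.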

5.1. Flattening the Forwarding Chain

"Suppose the platform cannot support the number of hierarchy levels
in the forwarding chain. FIB needs to reduce the number of hierarchy
levels. […]"
When in the process does this flattening happen? Only when a packet is
forwarded, like in the above steps, or does it happen when the chain is first
constructed? Does the flattening happen after a specific step in the above
process, e.g., step 3, or is it independent? If it happens for each forwarded
packet, this seems like a lot of steps. How is the overall efficiency still
maintained?

6.1. BGP-PIC core

"When a remote link or node fails, IGP on the ingress PE receives
advertisement indicating a topology change so IGP re-converges to
either find a new next-hop and/or outgoing interface or remove the
path completely from the IGP prefix used to resolve BGP next-hops."
Why IGP, when this document is about BGP?
Is this implied by the scope "when a core link or node fails but the BGP next-hop
remains reachable"? If so, please make this explicit.

"As soon as the IGP convergence is
complete for the BGP next-hop IGP route, all its BGP depending
routes benefit from the new path."
What would happen in a scenario where BGP-PIC is not used? Would it take longer
until the BGP routes can use the new path, and why?
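
One way to picture the contrast this question is asking about is the Python
sketch below. It reflects only my understanding of the shared, hierarchical
organization; the flat baseline, the class and function names, and the
interface names are my assumptions and are not taken from the draft.

```python
from typing import Dict, List, Tuple


class IgpPathList:
    """Outgoing interfaces towards one BGP next-hop."""
    def __init__(self, interfaces: List[str]):
        self.interfaces = list(interfaces)


class BgpPathList:
    """BGP next-hops, each pointing at an IGP pathlist."""
    def __init__(self, next_hops: Dict[str, IgpPathList]):
        self.next_hops = next_hops


def build_hierarchical(num_prefixes: int) -> Tuple[Dict[str, BgpPathList], IgpPathList]:
    """All prefixes point at the same BGP pathlist object."""
    igp = IgpPathList(["if0"])
    bgp = BgpPathList({"BGP-NH1": igp})
    fib = {f"prefix-{i}": bgp for i in range(num_prefixes)}
    return fib, igp


def build_flat(num_prefixes: int) -> Dict[str, List[str]]:
    """Assumed baseline: every prefix stores its own copy of the interfaces."""
    return {f"prefix-{i}": ["if0"] for i in range(num_prefixes)}


def resolve_interfaces(fib: Dict[str, BgpPathList], prefix: str) -> List[str]:
    """Follow prefix -> BGP pathlist -> IGP pathlist -> interfaces."""
    return fib[prefix].next_hops["BGP-NH1"].interfaces


def reconverge_hierarchical(igp: IgpPathList, new_interfaces: List[str]) -> int:
    """Update the one shared IGP pathlist; returns the number of writes."""
    igp.interfaces = list(new_interfaces)
    return 1


def reconverge_flat(fib: Dict[str, List[str]], new_interfaces: List[str]) -> int:
    """Rewrite every prefix entry; returns the number of writes."""
    writes = 0
    for prefix in fib:
        fib[prefix] = list(new_interfaces)
        writes += 1
    return writes
```

If this is roughly the intended comparison, stating it in Section 6.1 would
answer the question: in the sketch, the hierarchical update is one write
regardless of the number of prefixes, while the flat one grows with it.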

6.2.2
"the edge node attached to the failed
link performs next-hop self" - What does "perform next-hop self" mean? Is there
a word missing here, e.g., "lookup"?

"The main observation is that the loss of convergence speed due to
the loss of hierarchy depth"
Does convergence depend on the exchange of BGP messages between BGP peers, or
is the concept of convergence defined differently here? It seems like here
convergence means something related to how information is stored/updated
locally on the router, which is not what I would think about when I read "BGP
convergence". (Related to the comment at the beginning of the introduction:
What is your problem statement, i.e., what is the type of convergence you are
talking about and that your solution speeds up?)

8. Security Considerations

Are you sure that there are no security considerations?
For example, if there is a bug in the implementation of this technique, could
this make BGP prefix hijacking easier given a specific use of BGP labels?

Nits/editorial comments:

Abstract:

"In the network comprising thousands of iBGP peers" -> "In a network comprising
thousands of iBGP peers"

Please expand BGP-PIC on first use.

1.1 Terminology

"A prefix P/m (of any AFI/SAFI) that is learnt via
an Interior Gateway Protocol, such as OSPF and ISIS, has a path
for." - Is this sentence missing a subject for the "has a path for"? If this is
"A prefix that an IGP has a path for", then the "is learnt via" does not fit in
the sentence.

"one or more prefix" -> "one or more prefixes"

"a IP prefix" -> "an IP prefix"

There's a stray ") in the "Pathlist" item.

"may not necessarily has" -> "may not necessarily have"

"the forwarding engine must visits" -> "the forwarding engine must visit"

Please make all your terminology items consistent, i.e., sentences ending with
a full stop or not.

"A pathlist may contain a mix of primary and backup paths" - why is this its
own item? Isn't it about the previous item, "Pathlist", and should be part of
the same bullet point item?

2.2.1 Hierarchical Hardware FIB

"the number of memory lookup's" -> "the number of memory lookups"

5.1. Flattening the Forwarding Chain

Please unify how you write your terms, e.g., "OutLabel-list" vs.
"outlabel-list" (Section 5.1)

Please unify whether you capitalize all words in your headings or just some.