Skip to main content

Last Call Review of draft-ietf-lsvr-bgp-spf-31
review-ietf-lsvr-bgp-spf-31-rtgdir-lc-farrel-2024-07-05-00

Request Review of draft-ietf-lsvr-bgp-spf-31
Requested revision 31 (document currently at 51)
Type IETF Last Call Review
Team Routing Area Directorate (rtgdir)
Deadline 2024-07-08
Requested 2024-06-26
Requested by Ketan Talaulikar
Authors Keyur Patel , Acee Lindem , Shawn Zandi , Wim Henderickx
I-D last updated 2025-01-29 (Latest revision 2025-01-23)
Completed reviews Opsdir Early review of -01 by Fred Baker (diff)
Rtgdir Early review of -02 by Dan Frost (diff)
Rtgdir IETF Last Call review of -13 by Yingzhen Qu (diff)
Rtgdir IETF Last Call review of -31 by Adrian Farrel (diff)
Genart IETF Last Call review of -39 by Joel M. Halpern (diff)
Secdir IETF Last Call review of -39 by David Mandelberg (diff)
Rtgdir IETF Last Call review of -39 by Alvaro Retana (diff)
Comments
This document has been through previous RTGDIR reviews for earlier versions. This request is for the latest WGLC issued for this document.
Assignment Reviewer Adrian Farrel
State Completed
Request IETF Last Call review on draft-ietf-lsvr-bgp-spf by Routing Area Directorate Assigned
Posted at https://mailarchive.ietf.org/arch/msg/rtg-dir/of_zsmbYHv2SbqpSHWj6ZSMP9ds
Reviewed revision 31 (document currently at 51)
Result Not ready
Completed 2024-07-05
review-ietf-lsvr-bgp-spf-31-rtgdir-lc-farrel-2024-07-05-00
Hello

I have been selected to do a routing directorate "early" review of this draft.
https://datatracker.ietf.org/doc/draft-ietf-lsvr-bgp-spf/

The routing directorate will, on request from the working group chair, perform
an "early" review of a draft before it is submitted for publication to the
IESG. The early review can be performed at any time during the draft’s lifetime
as a working group document. The purpose of the early review depends on the
stage that the document has reached.

This document appears to have completed WG last call, been passed to the AD for
publication, received several Directorate reviews (at around revision -28), and
had a review by the AD. It seems to have been returned to the WG for further
work. Thus, this review comes before the document is returned to the AD for
publication. In that respect, the review is for consideration by the authors
and working group rather than for attention of the AD.

For more information about the Routing Directorate, please see
https://wiki.ietf.org/en/group/rtg/RtgDir

Document: draft-ietf-lsvr-bgp-spf-31.txt
Reviewer: Adrian Farrel
Review Date: date
Intended Status: Standards Track

Summary:

I have concerns about this document. I think it needs more work before being
submitted to the IESG.

Comments:

This draft is well written from a technical perspective. I believe it
could be implemented with a high probability of interoperability modulo
the issues noted below.

As noted in the comments below, some attention to the explanation in the
Introduction and the overview of the function of the protocol might be
helpful, and a little work on the English usage could be beneficial.

The document describes the use of BGP as an interior gateway protocol
for SPF calculations in a network environment that is assumed to be
highly stable and which might have a large degree of ECMP. The approach
builds on th e information elements developed for BGP-LS, but is not a
re-use of BGP-LS as a new SAFI is used.

I have a number of comments which I believe should be discussed before
this document progresses to publication.

= Major

While BGP-LS was originally invented as an East-West protocol, that idea
was discarded because of severe scaling and stability concerns and RFC
7752 was shaped as a Northbound protocol allowing a BGP speaker in the
network to report to a controller, management station of route reflector
(and subject to filters, aggregation, and policy) the full link-state
database of the network.

This work obviously decides to change that approach and to use BGP-LS
with additional information to distribute the full set of routing
information between all nodes in the network.

I think that this document needs to more clearly set out this change in
philosophy, and (more importantly) discuss the assumptions of network
scale and stability that are necessary for this protocol to be used.
While a data centre environment may be adequately stable, this
definition of a protocol usage would be very unwise in a WAN. All you
have in this respect is a statement at the top of section 4 that
implies that there are choices based on the network scale, and an
observation in the Introduction that the solution would be beneficial
if the topology is very stable.

The minimum needed for this is an upfront statement along the lines of
   The protocol described in this document is intended for deployments
   where the network topology is very stable so that topology update
   advertisements will be rare. It's use in other networks, and in
   particular in the Internet, is NOT RECOMMENDED because the protocol
   might not be stable or scale well under high load caused by rapid or
   frequent changes in the topology.
(Your choice of words may vary.)

It might even help to not call this BGP-LS-SPF but BGP-DC-SPF or
something similar. Since it is a new SAFI, there is no need to persist
with the BGP-LS name and that might help make it clear what the intended
or presumed deployment scenarios are.

Of course, the wrinkle comes in 5.2.2.1 where you want to define a new
TLV that only has use in BGP SPF, but you want to assign it from the
BGP-LS registry. Have you thought of looking for a way to keep the SPF
information separate? Or would you expect a regular BGP-LS speaker to
also share this new TLV if it knows the information?

= Medium

I find the reference to draft-psarkar-lsvr-bgp-spf-impl unsatisfactory.
Not only did that draft expire a good while ago, but this draft has been
updated six times since that implementation statement was posted. Have
there been no changes to the technical content in all those revisions?

The implementation status covers a bit more than 1.5 implementations.
It's good information, but is it enough for changes to BGP?

---

The text at the top of section 4 says that there are deployment choices
based on the network scale, but gives no advice (or pointers to advice)
on how to select between deployment options, how to ensure that the same
option is used by all nodes in the network, and what to do if two peers
discover that they are using different options.

---

In the RR mode of section 4.3, it is not clear what information the RR
readvertises and to which nodes.

---

8.2 names a registry that does not exist (with the given name).
Please don't leave IANA guessing.

---

5.2.1.1, 5.2.2.2, and 5.2.3.1 describe "SPF Status" fields of
the various TLVs.

It might help if the fields had different names to help not confuse
anything (e.g., Node SPF Status).

Adding in the Address Family field from 5.2.2.1, it is unclear whether
you intend the "undefined" settings to be available for use in future
specifications.
  - If you do, then you probably need to create registries, and while
    you describe how to handle the receipt of undefined statuses, you
    are missing a description of how to handle undefined address
    families. (Note, however, that in describing the handling of
    undefined status you say they SHOULD be readvertised. This means
    that there can be holes and discrepencies in the databases that
    different nodes see if new bits are defined in the future. So you
    probably need to make this MUST.)
  - If you don't then it is interesting to ask why you need to specify
    a difference between "reserved" and "undefined", and why you need
    more than a single bit flag (in fact, in the case of 5.2.2.2 and
    5.2.3.1 it would appear that the very presence of the TLV is enough
    and no status field is needed).

---

5.2.2.1 has

   the Address Family (AF) Link
   Descriptor SHOULD be included with the Link Local/Remote Identifiers
   TLV so that the link can be used in the respective address family
   SPF.  If the Address Family Link Descriptor is not present for an
   unnumbered link, the link will not be used in the SPF computation for
   either address family.

The use of SHOULD means that an implementation MAY leave it out. Now, in
the case of an unnumbered link, it is clear that that is fine, but in
the case of a numbered link, leaving the link descriptor out would seem
to make the link advertisement a bit pointless. So why might an
implementation make that choice?

---

In 5.2.2.2
   If a BGP SPF speaker received the Link NLRI but the SPF Status TLV is
   not received, then any previously received information is considered
   as implicitly withdrawn and the update is propagated to other BGP SPF
   speakers.

"is not received" in what time scale?

Ditto 5.2.3.1

---

9.

Given that this protocol is targetted at stable environments, one might
consider that making a part of the network fragile could destabilise the
whole. So there is a physical attack of causing a link to flap that
could result in many updates being flooded through the network. Is that
a realistic attack? How can it be mitigated?

= Minor

There is a "MAY" in 4.1 that is a bit odd. "MAY be expected..." Does
that mean that an implementation is allowed to expect, but that is
exceptional behaviour? Or does it mean that the condition might happen?
Or what? And what is the alternative to the "MAY"?

See also 4.2

---

4.2 has

   Hence, this peering model is
   RECOMMENDED over the single-hop peering model Section 4.1.

"RECOMMENDED" has the same weight as "SHOULD". So you need to say why a
deployment would ever consider option 1.

---

Since the value of 266 is only suggested to IANA for the BGP-LS Address
Family Link Descriptor TLV in section 8.2, section 5.2.2.1 should really
use a "TBD" so that the RFC editor can spot and fix any issues that
might arise if IANA assigns a different value.

---

Why are the new registries in 8.4, 8.5, and 8.6 created as IETF Review?
If you follow the thrust of BGP-LS, then it would be reasonable to make
them Expert Review (and to write some advice for the Designated
Experts). I *guess* that since you ask for this to be in a new registry
group, you are free to set your own rules, but I do wonder.

(By the way, since you call the new registry group "BGP SPF" this seems
to agree with my major point at the top of this review: this document
defines BGP-SPF not BGP-LS-SPF.

---

6.1

       NLRI originated by directly connected BGP SPF peers are
       preferred.

No objection to you setting this rule, but have you considered that NLRI
coming from remote peers are likely to convey updates relating to remote
resources faster than those that have been forwarded hop-by-hop with
each hop performing processing?

---

7.1 has

   When a BGP SPF speaker receives a BGP Update containing a malformed
   Node NLRI SPF Status TLV in the BGP-LS Attribute
   [I-D.ietf-idr-rfc7752bis], the corresponding Node NLRI is considered
   as malformed and MUST be handled as 'Treat-as-withdraw'.  An
   implementation SHOULD log an error (subject to rate-limiting) for
   further analysis.

But 5.2.4 has

   If the Sequence-Number TLV is not received, then the corresponding
   NLRI is considered as malformed and MUST be handled as 'Treat-as-
   withdraw'.  An implementation MAY log an error for further analysis.

SHOULD or MAY?

---

7.1

   Node attribute TLVs and their
   error handling rules are either defined in [I-D.ietf-idr-rfc7752bis]
   or derived from [RFC5305] and [RFC6119].

I suspect that this means that some TLVs are in one source, and some are
in another, not that there is a choic!

---

7.2 tells us what checks to perform, but not what to do if a check
fails. Compare with 7.3.

---

7.4

While it will be clear to one skilled in the art what it means to reset
a BGP session, a pointer to the procedure would be valuable.

---

Section 9 appears to use lower case rather than upper case BCP 14 terms.
Possibly worth resolving.

---

Thanks for including section 10, it's important.

10.1 says that being in a single administrative domain "allows for
consistent configuration". This is true, but I don't think it in any way
guarantees it. What measures can be taken to ensure consistency?

---

10.2 makes a RECOMMENDATION. Looks like a wise one. Why would a
deployment deviate from this, and what are the consequences?

---

In 10.2 you say...

   For
   non-loopback prefixes, the setting of the metric is a local matter
   and beyond the scope of this document.

...and that seems good to me. Why do you then go on to discuss a
possible setting of the metric? i suggest deleting the paragraph...

   Algorithms such as setting the metric inversely to the link speed as
   supported in some IGP implementations MAY be supported.  However, the
   details of how the metric is computed are beyond the scope of this
   document.

---

10.2 has a paragraph with three cases of "SHOULD"

   Within a BGP SPF Routing Domain, the IGP metrics for all advertised
   links SHOULD be configured or defaulted consistently.  For example,
   if a default metric is used for one router's links, then a similar
   metric should be used for all router's links.  Similarly, if the link
   metric is derived from using the inverse of the link bandwidth on one
   router, then this SHOULD be done for all routers and the same
   reference bandwidth SHOULD be used to derive the inversely
   proportional metric.  Failure to do so will result in incorrect
   routing based on link metric.

There is some explanation of the consequences of not following these
SHOULDs (although the cascaded SHOULDs make it hard to determine where
the choice goes wrong). Since you are allowing variation from this
advice, you need to explain the alternative "MAY" : what can be done and
why would it be done?

---

10.3 has a reasonable RECOMMENDED. But, again, you need to talk about
the reasons to vary this and the consequences.

---

10.4 has two SHOULDs. This allows variation that you need to explain.

= Nits

The English usage in the document could really do with some work. I know
that the RFC Editor is paid to fix this sort of thing, but there is
really quite a lot of work needed (more than I felt like dealing with in
this review). Perhaps you can get someone from the working group to take
a pass on it.

---

draft-ietf-idr-rfc7752bis is now RFC 9552