Last Call Review of draft-ietf-grow-ix-bgp-route-server-operations-03
review-ietf-grow-ix-bgp-route-server-operations-03-rtgdir-lc-scudder-2014-10-01-00

Request
Review of: draft-ietf-grow-ix-bgp-route-server-operations
Requested revision: No specific revision (document currently at 05)
Type: Last Call Review
Team: Routing Area Directorate (rtgdir)
Deadline: 2014-09-22
Requested: 2014-09-15
Authors: Nick Hilliard, Elisa Jasinska, Robert Raszuk, Niels Bakker
Draft last updated: 2014-10-01
Completed reviews:
    Genart Last Call review of -03 by Dan Romascanu
    Genart Telechat review of -03 by Dan Romascanu
    Secdir Last Call review of -03 by Catherine Meadows
    Opsdir Last Call review of -03 by Niclas Comstedt
    Rtgdir Last Call review of -03 by John Scudder

Assignment
Reviewer: John Scudder
State: Completed
Review: review-ietf-grow-ix-bgp-route-server-operations-03-rtgdir-lc-scudder-2014-10-01
Reviewed revision: 03 (document currently at 05)
Result: Has Issues
Completed: 2014-10-01

Hello,

I have been selected as the Routing Directorate reviewer for this draft. The
Routing Directorate seeks to review all routing or routing-related drafts as
they pass through IETF last call and IESG review, and sometimes on special
request. The purpose of the review is to provide assistance to the Routing ADs.
For more information about the Routing Directorate, please see
http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Although these comments are primarily for the use of the Routing ADs, it would
be helpful if you could consider them along with any other IETF Last Call
comments that you receive, and strive to resolve them through discussion or by
updating the draft.

Thanks,

--John

Document: draft-ietf-grow-ix-bgp-route-server-operations-03.txt
Reviewer: John Scudder
Review Date: 2014-09-18
IETF LC End Date: 2014-09-22
Intended Status: Informational

Summary:

        • I have some minor concerns about this document that I think should be
        resolved before publication.

Comments:

This is overall a good document and worth publishing, although I have found a
number of minor issues I would like the authors to address before the document
progresses. I initially flagged the first two issues as "major" but on
consideration I've moved them to the "minor" list. With the noted exceptions, I
think the document is very good in terms of its readability and fitness for
publication without major editing.

Major Issues:

- None identified.

Minor Issues:

- Throughout the document, various terms are used to describe what RFC 4271
calls a "route". The definition given in RFC 4271 is:

   Route
      A unit of information that pairs a set of destinations with the
      attributes of a path to those destinations.  The set of
      destinations are systems whose IP addresses are contained in one
      IP address prefix carried in the Network Layer Reachability
      Information (NLRI) field of an UPDATE message.  The path is the
      information reported in the path attributes field of the same
      UPDATE message.

That is, one NLRI plus its path attributes, as carried in an UPDATE, is a
"route". I would suggest adopting this term, or "BGP route" if you prefer,
instead of terms such as "NLRI UPDATE message", "NLRI message", "prefix UPDATE
message", and even just plain "NLRI" and "message". The same goes for some, but
not all, of the uses of "prefix". I think doing so will make the document clearer, more
readable, and more technically accurate. A simple search for the terms I've
called out should show most of them so I won't enumerate them here unless you
ask me to (feel free, if you want).
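
As a rough illustration of the RFC 4271 definition quoted above, the sketch
below models a route as one prefix paired with the path attributes of the
UPDATE that carried it. The names (PathAttributes, Route, routes_from_update)
are hypothetical, and the attribute set is deliberately reduced to two fields;
this is not a BGP parser.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network
from typing import List, Tuple


@dataclass(frozen=True)
class PathAttributes:
    # Reduced, illustrative subset of the path attributes in an UPDATE.
    as_path: Tuple[int, ...]
    next_hop: str


@dataclass(frozen=True)
class Route:
    # One prefix from the NLRI field plus the attributes of the same UPDATE.
    prefix: IPv4Network
    attributes: PathAttributes


def routes_from_update(nlri: List[str], attributes: PathAttributes) -> List[Route]:
    # An UPDATE may list several prefixes; each one yields a separate route
    # that shares the UPDATE's path attributes.
    return [Route(IPv4Network(prefix), attributes) for prefix in nlri]


attrs = PathAttributes(as_path=(64500, 64501), next_hop="192.0.2.1")
routes = routes_from_update(["198.51.100.0/24", "203.0.113.0/24"], attrs)
print(len(routes))
```

Under this model, "NLRI" names only the prefix field and "message" names the
whole UPDATE, which is why neither is a good synonym for "route".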

- Reference [RS-ARCH] is a dead link. I found a live copy at
<http://www.cs.usc.edu/assets/003/83191.pdf>. It might be worth checking with
the authors of RS-ARCH to ask what a good archival reference is.

- S. 4.2 talks about scaling. I'm trying to make sense of the analysis:

   Regardless of any Loc-RIB optimization technique is implemented, the
   route server's control plane bandwidth requirements will scale
   according to O(P * N), where P is the total number of unique paths
   received by the route server and N is the total number of route
   server clients.

So far so good. (Except nit: there seems to be a word missing, such as
"whether" as in "Regardless of whether any Loc-RIB...")

   In the case where P_avg (the arithmetic mean number
   of unique paths received per route server client) remains roughly
   constant even as the number of connected clients increases, this
   relationship can be rewritten as O((P_avg * N) * N) or O(N^2).

I don't see where the second factor of N comes from. You're basically expanding
the P in the first expression as P_avg * N -- but why? I think this would only
apply if add-path all-paths was chosen as the path hiding mitigation strategy
-- but this is not touched on in route-server-operations, only in
ix-bgp-route-server, and besides that the beginning of the paragraph implies
you're analyzing the multiple Loc-RIB strategy, so I doubt all-paths is what
you were thinking of. If you're not doing all-paths, the O(N^2) analysis is
wrong AFAICT. To see this, consider that the inbound routes require O(P_avg *
N), which is just O(N), while the number of routes you're going to advertise to
each client is bounded by the size of the Internet routing table, which is a
constant for purposes of this analysis, so the outbound total is also O(N). In
and out are summed, not multiplied, so the whole thing works out to be O(N),
not O(N^2).
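
To make the linear-versus-quadratic distinction concrete, here is a minimal
sketch of the two counting models. The function names and the values chosen
for P_avg and the routing table size are assumptions for illustration only;
the best-path model uses the table size as an upper bound on what each client
is sent.

```python
def best_path_only_routes(n_clients: int, p_avg: int, table_size: int) -> int:
    # Inbound: every client sends p_avg unique paths to the route server.
    inbound = p_avg * n_clients
    # Outbound: each client is sent at most one route per destination, so the
    # per-client count is bounded by the routing table size (a constant).
    outbound = table_size * n_clients
    return inbound + outbound


def all_paths_routes(n_clients: int, p_avg: int) -> int:
    # Inbound is the same as above.
    inbound = p_avg * n_clients
    # Outbound: with add-path all-paths, every client may be sent every path.
    outbound = inbound * n_clients
    return inbound + outbound


P_AVG = 1_000         # assumed mean unique paths per client
TABLE_SIZE = 500_000  # assumed size of the full routing table

for n in (100, 200, 400):
    print(n, best_path_only_routes(n, P_AVG, TABLE_SIZE), all_paths_routes(n, P_AVG))
```

Doubling N doubles the first total and roughly quadruples the second, which is
the difference between O(N) and O(N^2).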

So I think this needs to either be corrected, or the assumptions need to be
better explained. Moving on:

   This
   quadratic upper bound on the network traffic requirements indicates
   that the route server model will not scale to arbitrarily large
   sizes.

If you continue to think this sentence is warranted, I think it should be
better quantified. Of course nothing can scale to *arbitrarily* large sizes,
but that still leaves a lot to the imagination. I would think it would be
beneficial for an IX operator reading this document to be able to have some
idea of how practical the limitation is. Since the analysis in question is
looking at control traffic bandwidth consumption, it wouldn't be too onerous to
throw some simple assumptions up against it -- for example, "if we suppose an RS
receives on average 100,000 routes from each client with a rate of change of 10
routes/second, sends on average 1,000,000 routes to each client with a rate of
change of 100 routes/second, and that each route consumes on average 50 bytes
in a BGP UPDATE message, simple arithmetic shows that a GigE connection to that
RS will be fully saturated by the time the number of clients reaches 25,000."
(That does not seem like a very practical limitation; the RS will hit a CPU or
memory bottleneck first.)
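
The arithmetic behind that figure can be sketched as follows. This assumes a
full-duplex link, so that each direction has its own 1 Gbit/s, and it ignores
TCP/IP and BGP header overhead as well as the initial table transfer; the
constants are the hypothetical ones from the example, and the function names
are mine.

```python
GIGE_BITS_PER_SECOND = 1_000_000_000
BYTES_PER_ROUTE = 50
INBOUND_CHURN = 10    # routes/second received from each client
OUTBOUND_CHURN = 100  # routes/second sent to each client


def per_client_bits_per_second(routes_per_second: int) -> int:
    # Steady-state update traffic for one client in one direction.
    return routes_per_second * BYTES_PER_ROUTE * 8


def clients_at_saturation(link_bps: int, routes_per_second: int) -> int:
    # Number of clients whose combined churn fills the link in one direction.
    return link_bps // per_client_bits_per_second(routes_per_second)


outbound_limit = clients_at_saturation(GIGE_BITS_PER_SECOND, OUTBOUND_CHURN)
inbound_limit = clients_at_saturation(GIGE_BITS_PER_SECOND, INBOUND_CHURN)
print(outbound_limit, inbound_limit)
```

The outbound direction is the bottleneck: 100 routes/second at 50 bytes each is
40 kbit/s per client, which fills 1 Gbit/s at 25,000 clients.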

Anyway, maybe you will decide on reconsideration of the big-O analysis that
this bit is not needed at all, which would be OK with me.

- S. 4.2.2.1,

   If the route server
   operator has prior knowledge of interconnection relationships between
   route server clients, then the operator may configure separate Loc-
   RIBs only for route server clients with unique outbound routing
   policies.

It wasn't obvious to me what "outbound" applies to -- the client? The RS? --
and for that matter why an inbound policy (on the RS) might not apply. Possibly
this could be remedied by simply dropping the adjective "outbound".

- S. 4.2.1.2,

   destination splitting would require significant co-ordination
   between the route server operator and each route server client

It's not clear to me why it would "require significant co-ordination",
depending on what resource you're trying to conserve. Two examples of how you
could avoid coordination while still getting benefit: You could have clients
send all their routes to all the RSes, but have RSes filter out the prefixes
they don't care about. This gives the RS most of the CPU benefit it would have
gotten had the client done the filtering (prefix filtering is cheap), almost
all the memory benefit (the filtered routes need not be retained in the
Adj-RIB-In), and around half the control traffic bandwidth benefit. The client
incurs cost to send duplicate routes that are going to be discarded by the RS,
but the client is presumably not the bottleneck resource. Or better still, the
RS could use ORF towards the clients to control what routes the clients will
send.
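
The first option (clients send everything, each RS discards what it does not
serve) could look roughly like the sketch below. The two-way split of the IPv4
space, the route server names and the function name are all hypothetical; a
real deployment would express this in the route server's own filter language.

```python
from ipaddress import ip_network

# Hypothetical split of the IPv4 address space between two route servers.
RS_SLICES = {
    "rs1": ip_network("0.0.0.0/1"),
    "rs2": ip_network("128.0.0.0/1"),
}


def accept_on(route_server: str, prefix: str) -> bool:
    # Inbound filter: keep a route only if its prefix falls in this RS's slice.
    return ip_network(prefix).subnet_of(RS_SLICES[route_server])


# Every client sends the same routes to both route servers, with no
# per-client configuration or coordination.
announced = ["10.0.0.0/8", "198.51.100.0/24", "203.0.113.0/24"]
kept = {rs: [p for p in announced if accept_on(rs, p)] for rs in RS_SLICES}
print(kept)
```

Routes rejected by the filter never need to be stored, which is where the
memory saving comes from.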

- S. 4.6.1,

OLD:
   Prefixes sent to the route server are tagged with specific [RFC1997]
   or [RFC4360] BGP community attributes

I don't think the naked references scan well as adjectives in this context. I
suggest

NEW:
   Prefixes sent to the route server are tagged with specific standard [RFC1997]
   or extended [RFC4360] BGP community attributes

- Also in S. 4.6.1,

OLD:
   As both standard and extended BGP communities values are restricted
   to 6 octets

Actually standard communities are restricted to less than that. Perhaps reword
as

NEW:
   As both standard and extended BGP communities values are restricted
   to 6 octets or fewer

- Also in S. 4.6.1,

   route server operator should take care to ensure
   that the predefined BGP community values mechanism used on their
   route server is compatible with [RFC4893] 4-octet autonomous system
   numbers.

I suspect an RS operator reading this might be left scratching his or her head
and asking "what does it mean for me to be compatible with RFC4893 in this
context"? It would be kind to offer them some guidance, since after all this is
a guidance document.
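
One way to read the concern, sketched under assumptions: an RFC 1997 community
is 4 octets, conventionally treated as two 16-bit halves, so a scheme that
encodes a client's AS number in one half cannot represent a 4-octet AS number.
The "action:peer_asn" layout and the function name below are hypothetical and
are not taken from the draft.

```python
import struct


def encode_standard_community(action: int, peer_asn: int) -> bytes:
    # An RFC 1997 community is 4 octets; each half holds at most 16 bits.
    for half in (action, peer_asn):
        if not 0 <= half <= 0xFFFF:
            raise ValueError(f"{half} does not fit in a 16-bit community half")
    return struct.pack("!HH", action, peer_asn)


# A 2-octet AS number fits.
print(encode_standard_community(0, 64500).hex())

# A 4-octet AS number does not.
try:
    encode_standard_community(0, 4200000000)
except ValueError as exc:
    print("rejected:", exc)
```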

- S. 4.7: Where you say "non-commutative" I think you mean "non-transitive".

- S. 4.7:

   Problems of this form can be dealt with using [RFC5881] bidirectional
   forwarding detection.

It's not clear to me how certain non-transitive forwarding failures can be
dealt with using BFD. To take an example, suppose clients A, B and C peer with
RS. The IX fabric has a failure such that A and B can both reach RS, but not
each other. C has connectivity to everyone. Prefix X is advertised to RS by
both B and C. For whatever reason, RS selects X via B to advertise to A. Even
if A runs BFD towards B, at best A can determine that the route from RS can't
be used. A isn't able to fail over to C's route as it would in the full-mesh
case, since it's not aware of it. Depending on A's other connectivity, this may
result in sub-optimal routing towards X, or complete loss of connectivity to X.
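
The scenario can be modelled in a few lines. This is a toy sketch, not a BFD
implementation: the liveness check simply consults a set of broken client
pairs, and all names are hypothetical.

```python
# Fabric failure: A and B cannot reach each other; everything else works.
BROKEN_PAIRS = {frozenset({"A", "B"})}


def reachable(x: str, y: str) -> bool:
    # Stand-in for the result of a BFD session between two clients.
    return frozenset({x, y}) not in BROKEN_PAIRS


def usable_next_hops(known_next_hops: list) -> list:
    # A can only use routes whose next hop passes the liveness check.
    return [nh for nh in known_next_hops if reachable("A", nh)]


announcers_of_x = ["B", "C"]

# Route server case: RS sends A only its selected path for X, via B.
usable_via_rs = usable_next_hops(["B"])

# Full-mesh case: A learns X directly from both B and C.
usable_full_mesh = usable_next_hops(announcers_of_x)

print(usable_via_rs, usable_full_mesh)
```

In the route server case A is left with no usable route for X even though C
could carry the traffic; in the full mesh A falls back to C.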

It's beyond the scope of the draft to solve this problem, but the text could be
made more accurate. A minimal fix would be

   Problems of this form can be partially mitigated using [RFC5881]
   bidirectional forwarding detection.

although you might want to go on a bit longer to explain what problems can't be
mitigated.

- S. 4.8:

   This problem is not specific to route servers and it can also be
   implemented using bilateral peering sessions.  However, the potential
   damage is amplified by route servers because a single BGP session can
   be used to affect many networks simultaneously.

This is true, but there is a more severe way RSes aggravate the problem: In a
full mesh, a router can (and usually does) directly enforce a "no third-party
next hops" policy against its peers. An RS peer by definition cannot enforce
this policy against the RS, so the RS is the only place it can be enforced.

- S. 4.8:

   Route server operators SHOULD check that the BGP NEXT_HOP attribute
   for NLRIs received from a route server client matches the interface
   address of the client.  If the route server receives an NLRI where
   these addresses are different

so far so good (modulo my first comment about the use of "NLRI", of course),
but:

   and where the announcing route server
   client is in a different autonomous system to the route server client
   which uses the next hop address,

Is the RS sincerely expected to enforce the above? I suppose it could be
implemented automatically although imperfectly, by noticing that multiple
clients are in the same neighbor AS and noticing when they use each other as
third-party next hops, but AFAIK people generally don't try to figure this out,
they just do what you've said in the preceding sentence -- make sure the NH
matches the interface address. If you really do propose that the RS should
allow third-party next hops but only from clients in a common AS, I think you
should talk about it specifically and in more detail. If you didn't really mean
that, then I suggest you drop the clause.
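
For reference, the simple check from the first quoted sentence amounts to
something like the sketch below. The function name and addresses are
hypothetical, and it deliberately implements only the strict "NEXT_HOP must
equal the client's interface address" rule, with no same-AS exception.

```python
from ipaddress import ip_address


def next_hop_matches_client(next_hop: str, client_address: str) -> bool:
    # Compare parsed addresses so that textual variants of the same address
    # (for example compressed and uncompressed IPv6) are treated as equal.
    return ip_address(next_hop) == ip_address(client_address)


client = "192.0.2.10"
print(next_hop_matches_client("192.0.2.10", client))  # client's own address
print(next_hop_matches_client("192.0.2.20", client))  # third-party next hop
```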

- S. 5:

   On route server installations which do not employ path hiding
   mitigation techniques, the path hiding problem outlined in section
   Section 4.1 can be used in certain circumstances to proactively block
   third party prefix announcements from other route server clients.

I don't understand what this means. Specifically, I don't know what it means to
"proactively block third party prefix announcements" or for that matter, even
what you mean by "third party prefix announcements" in this context. (As a term
of art, I normally understand "third party announcement" in a BGP context to
mean announcing a third-party next hop as you discuss in S. 4.8). I also don't
know what the "certain circumstances" are; these should probably be given at
least a little color, if not entirely spelled out.

Also, a nit -- the xref expansion has put "section section" into your text.

- S. 7:

   BIRD, OpenBGPD and Quagga, whose open source BGP implementations
   include route server capabilities

Great, cool, but:

   which are compliant with this
   document.

I'm not sure what it actually means to be "compliant" with a document that
"describes operational considerations". Perhaps just drop the phrase?

Nits:

- In S. 2,
OLD:
        BGP sessions between each participant router
NEW:
        BGP sessions between each pair of participant routers

- In S. 4.2.1.1,

OLD:
   In
   this situation, the multiple Loc-RIB views required by each client
   are merged into a single view.

As written, this implies that each client requires multiple Loc-RIB views,
which I don't think is what was intended. I suggest:

NEW:
   In
   this situation, multiple Loc-RIB views
   are merged into a single view.

- I personally am strongly put off by the neologism "granular" to mean
"fine-grained" and suggest the latter instead. I realize it's not an unusual
usage so by all means disregard if you feel strongly about it.

- S. 4.6.2:

OLD:
   server operators to implement construct per-client routing policies.
NEW:
   server operators to construct per-client routing policies.