Early Review of draft-ietf-lsvr-l3dl-03
review-ietf-lsvr-l3dl-03-tsvart-early-ott-2020-05-05-00

Request Review of draft-ietf-lsvr-l3dl
Requested revision No specific revision (document currently at 12)
Type Early Review
Team Transport Area Review Team (tsvart)
Deadline 2020-04-28
Requested 2020-02-10
Requested by Wesley Eddy
Authors Randy Bush, Rob Austein, Russ Housley, Keyur Patel
I-D last updated 2020-05-05
Completed reviews Tsvart Early review of -03 by Joerg Ott (diff)
Comments
Requested via Erik Kline.
Assignment Reviewer Joerg Ott
State Completed
Request Early review on draft-ietf-lsvr-l3dl by Transport Area Review Team Assigned
Posted at https://mailarchive.ietf.org/arch/msg/tsv-art/rLHOXu0fRbxVOIKXrBsE-SHQdmE
Reviewed revision 03 (document currently at 12)
Result Ready w/issues
Completed 2020-05-05
The draft describes a peer/neighbour discovery mechanism for large-scale L2/L3
topologies in data centres. The aim is to provide a protocol by means of which
the involved nodes can learn about other nodes connected to their (broadcast or
point-to-point) L2 links and about their respective supported encapsulation
schemes, identifiers, L2/L3 addresses, etc. This information is then provided
to a higher layer for further processing.

The document is well written and fairly easy to follow, but the introduction
could benefit from a bit of extra context on the target application domain,
e.g., by explaining explicitly who would talk L3DL to whom.

From a transport perspective, I see three potential issues that deserve
clarification or reconsideration:

1. Section 10 spells out a default HELLO interval of 60 seconds. With a large
broadcast domain, this may create quite a bit of traffic. While this may not be
an issue in well-provisioned data center networks, a remark about sensible
value ranges and their implications may be worthwhile, just to provide some
guidelines to implementers (who want to offer choices) and operators (who pick
them).

2. Section 10 also suggests that, in response to HELLO messages, nodes will
issue OPEN PDUs to newly discovered peers. This appears to bear a clear risk of
an OPEN implosion when many systems come up at the same time. Shouldn't guidance
be given to avoid repeated traffic surges and possible losses and thus
unnecessary delays? (I noted that other places foresee exponential backoff when
retransmitting OPEN and other ACKed PDUs.)

3. When the protocol applies fragmentation, should there be a note on
preventing bursts?
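
To make points 1 and 2 concrete, here is a minimal sketch of the kind of
guidance I have in mind. It is not taken from the draft: the function names,
the 1000-node domain, the jitter window and the backoff base/cap are all
assumptions chosen for illustration only. Only the 60-second HELLO default
comes from Section 10.

```python
import random


def hello_rate(nodes: int, interval_s: float) -> float:
    """Aggregate HELLO PDUs per second on one broadcast domain,
    assuming every node sends one HELLO per interval."""
    return nodes / interval_s


def jittered_open_delay(max_jitter_s: float, rng: random.Random) -> float:
    """Random wait before the first OPEN to a newly discovered peer,
    so that nodes booting together do not all send at once."""
    return rng.uniform(0.0, max_jitter_s)


def backoff_delay(attempt: int, base_s: float, cap_s: float,
                  rng: random.Random) -> float:
    """Exponential backoff with full jitter for retransmission number
    `attempt` (0-based); the upper bound doubles until it hits cap_s."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical domain size; 60 s is the default HELLO interval.
    print("HELLO PDUs/s:", hello_rate(1000, 60.0))
    print("first OPEN after (s):", jittered_open_delay(2.0, rng))
    for attempt in range(5):
        print("retry", attempt, "after (s):",
              backoff_delay(attempt, base_s=1.0, cap_s=30.0, rng=rng))
```

The same random spreading could be applied between fragments (point 3), but
whether that is appropriate depends on details the draft would need to state.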

Other notes:
Section 7 on the checksum needs more detail. It also talks about a "suggested"
algorithm, but this should either be clearly mandated, or a way to choose one by
means of configuration for a complete data centre would need to be made
explicit. I also assume that the pseudo code on p.11 would benefit from a
leading '0' in 0xffffffff -> 0x0ffffffff; otherwise expansion to 64 bits might
fill the high order bits with '1's, which is clearly not intended.
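
The widening concern can be illustrated with a small sketch. This does not
reproduce the draft's pseudo code; the helper name is hypothetical, and whether
the problem actually arises depends on the type the constant is given in the
implementation language. The sketch only shows how a 32-bit all-ones pattern
looks after expansion to 64 bits when it is treated as signed versus unsigned.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def widen_32_to_64(value32: int, signed: bool) -> int:
    """Return the 64-bit pattern obtained by widening a 32-bit value,
    with sign extension if `signed` is true, zero extension otherwise."""
    raw = (value32 & MASK32).to_bytes(4, "big")
    as_int = int.from_bytes(raw, "big", signed=signed)
    # Masking a negative Python int yields its two's-complement pattern.
    return as_int & MASK64


if __name__ == "__main__":
    print(hex(widen_32_to_64(0xFFFFFFFF, signed=True)))   # high bits set
    print(hex(widen_32_to_64(0xFFFFFFFF, signed=False)))  # high bits clear
```

Only the unsigned case yields a mask that keeps the low 32 bits and clears the
rest, which is presumably what the pseudo code intends.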

Section 11, p.17, second to last para ("If a properly authenticated..."). From
the text, it is unclear what is meant by an "OPEN with the Serial Number of the
last data received".

I am curious about the error code, providing 16 bits for additional
explanation. Why not a text field? Also wondering if repeated retries (due to
failure, not lost packets) could yield fast repeated transmissions.

Section 15: should the KEEPALIVE interval have suggested (lower) bounds?
At the top of p.26, it says "One per second is the default", whereas the bottom
of the previous page refers to an inter-KEEPALIVE interval of ten seconds. I am
not sure if the two are the same, but I suppose so. If they are, the numbers
should match. If they are not, we'll need some extra text to explain the
difference.

Nits:
There are two spellings of "Encapsulation", capitalised and lower case. Use one
consistently.
p10, first para: comprise -> comprising