
Early Review of draft-ietf-lsr-isis-fast-flooding-06
review-ietf-lsr-isis-fast-flooding-06-tsvart-early-kuehlewind-2024-02-02-00

Request Review of draft-ietf-lsr-isis-fast-flooding
Requested revision No specific revision (document currently at 09)
Type Early Review
Team Transport Area Review Team (tsvart)
Deadline 2024-02-09
Requested 2024-01-25
Requested by John Scudder
Authors Bruno Decraene, Les Ginsberg, Tony Li, Guillaume Solignac, Marek Karasek, Gunter Van de Velde, Tony Przygienda
I-D last updated 2024-02-02
Completed reviews Tsvart Last Call review of -07 by Mirja Kühlewind (diff)
Secdir Last Call review of -07 by Barry Leiba (diff)
Genart Last Call review of -07 by Ines Robles (diff)
Rtgdir Last Call review of -05 by Loa Andersson (diff)
Tsvart Early review of -06 by Mirja Kühlewind (diff)
Comments
Section 6, "Congestion and Flow Control", is likely to be of particular interest. Document is currently in AD Review and I anticipate sending it for IETF LC in the near future.
Assignment Reviewer Mirja Kühlewind
State Completed
Request Early review on draft-ietf-lsr-isis-fast-flooding by Transport Area Review Team
Posted at https://mailarchive.ietf.org/arch/msg/tsv-art/fpIRoIHQXuHpthkMUjhlygZS5Yw
Reviewed revision 06 (document currently at 09)
Result Not ready
Completed 2024-02-02
First of all I have a clarification question: the use of the Flags TLV with the
O flag is not clear to me. Is it meant as a configuration parameter, or is it
supposed to be a sub-TLV that has to be sent together with the PSNP? If it is a
configuration, doesn't the receiver need to confirm that the configuration is
used, and how does that work in the LAN scenario where multiple configurations
are used? If it has to be sent together with the PSNP, this needs to be
clarified, and it seems a bit strange to me that it is part of the same TLV. Or
maybe I'm missing something completely about the flag?

Then, generally, thank you for considering overload and congestion carefully.
Please see my many comments below; however, I think one important part is to
ensure that the network/link doesn't normally get overloaded with the
parameters selected. You give some recommendations about which parameter values
to use, but not for all of them, and, more importantly, it would be good to
define boundaries for safe use. What's a "safe" min or max value? I know this
question is often not easy to answer; however, if you as the experts don't give
the right recommendations, how should a regular implementer make the right
choice?

Please see further comments below.

Section 4.7:
“NOTE: The focus of work used to develop the example algorithms discussed later
in this document focused on operation over point-to-point interfaces. A full
discussion of how best to do faster flooding on a LAN interface is therefore
out of scope for this document.”

Actually this is quite important and also not clear to me. You do discuss how
to interpret parameters in a LAN scenario, but then you say you only give
proper guidance on how to adjust the sending rate for non-LAN interfaces. But
what's the right thing to do on a LAN then? Why is LAN out of scope? If you
don't give guidance, I think you also have to say that the mechanism in this
document that enables using higher values MUST NOT be used on LAN setups.

Section 5.1:
“The receiver SHOULD reduce its partialSNPInterval. The choice of this lower
value is a local choice. It may depend on the available processing power of the
node, the number of adjacencies, and the requirement to synchronize the LSDB
more quickly. 200 ms seems to be a reasonable value.”

Giving some recommended value is fine; however, to ensure safe operation it
would be more important to give a range or at least a minimum value.

Also on the use of normative language: just saying "The receiver SHOULD reduce
its partialSNPInterval." is a bit meaningless without saying when and to what
value/by how much. I guess you should say something like "partialSNPInterval
SHOULD be set to 200ms and MUST NOT be lower than X."

“The LPP SHOULD also be less than or equal to 90 as this is the maximum number
of LSPs that can be acknowledged in a PSNP at common MTU sizes, hence waiting
longer would not reduce the number of PSNPs sent but would delay the
acknowledgements. Based on experimental evidence, 15 unacknowledged LSPs is a
good value assuming that the Receive Window is at least 30 and that both the
transmitter and receiver have reasonably fast CPUs.”

Why is the first SHOULD a SHOULD and not a MUST? What is a reasonably fast CPU?
Why would the Receive Window be 30? Is that also the value that you would
recommend? Or do you maybe more generally aim to recommend setting the LPP to
half the Receive Window (or does it have to be those specific values)?
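To make my suggestion concrete, here is a minimal sketch (Python; names such as
rwin and lpp are mine, not from the draft) of the relationship I assume is
intended, i.e. capping the LPP at what one PSNP can acknowledge and otherwise
tying it to the Receive Window. If the intended rule is different (e.g. the two
values are independent), please state it explicitly.

    # Hypothetical illustration only: tie LPP to the Receive Window instead of
    # fixing it at 15 unconditionally, and cap it at what one PSNP can carry.
    MAX_LSP_ENTRIES_PER_PSNP = 90  # the draft's stated maximum at common MTU sizes

    def choose_lpp(rwin: int) -> int:
        """Pick an acknowledgement threshold (LPP) for a given Receive Window."""
        # Half the Receive Window keeps PSNPs flowing well before the window is
        # exhausted; never exceed what a single PSNP can acknowledge.
        return min(MAX_LSP_ENTRIES_PER_PSNP, max(1, rwin // 2))

    print(choose_lpp(30))  # -> 15, matching the draft's recommended value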

Section 5.2:

“Therefore implementations SHOULD prioritize the receipt of Hellos and then
SNPs over LSPs. Implementations MAY also prioritize IS-IS packets over other
less critical protocols.”

What do you mean by prioritize, exactly? I find the second sentence here
meaningless when you only say "less critical protocols". What do you mean by
this? How should I as an implementer decide which protocols are more or less
critical?
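For example, one possible reading of "prioritize" is to drain the receive queue
in priority order rather than FIFO, roughly as in the sketch below (Python, all
names mine). If this is what is meant, saying so would help; if something else
is meant (e.g. dedicated queues or drop precedence), that should be spelled out
instead.

    import heapq

    # One possible reading of "prioritize": process Hellos first, then SNPs,
    # then LSPs, FIFO within each class. Illustrative only.
    PRIORITY = {"IIH": 0, "CSNP": 1, "PSNP": 1, "LSP": 2}

    class ReceiveQueue:
        def __init__(self):
            self._heap = []
            self._seq = 0  # preserves FIFO order within the same priority

        def enqueue(self, pdu_type: str, pdu) -> None:
            heapq.heappush(self._heap, (PRIORITY[pdu_type], self._seq, pdu))
            self._seq += 1

        def dequeue(self):
            return heapq.heappop(self._heap)[2] if self._heap else None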

Section 6.1:
“Congestion control creates multiple interacting control loops between multiple
transmitters and multiple receivers to prevent the transmitters from
overwhelming the overall network.”

This is an editorial comment: I think I know what you mean, but the sentence is
not clear, as there is always only one congestion control loop between one
transmitter and one receiver.

Section 6.2.1:
“If no value is advertised, the transmitter should initialize rwin with its own
local value.”

I think you need to give more guidance anyway on what a good buffer size might
be. However, if you don't know the other end's capability, I'm not sure your
own value is a good idea, or whether it would be better to be rather
conservative and select a low value that still provides reasonable performance.

Section 6.2.1.1:
“The LSP transmitter MUST NOT exceed these parameters. After having sent a full
burst of un-acknowledged LSPs, it MUST send the following LSPs with an LSP
Transmission Interval between LSP transmissions. For CPU scheduling reasons,
this rate may be averaged over a small period, e.g., 10-30ms.”

I'm not sure I fully understand what you mean by "averaged over a small
period". What exactly is averaged, and how?
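For instance, one interpretation would be to enforce the configured rate over a
short window instead of spacing every single LSP, roughly like the following
sketch (Python; the 20 ms window and all names are my assumptions). If that is
the intent, stating it explicitly would help.

    import time

    class AveragedPacer:
        """Enforce an average rate of 1/lsp_tx_interval over a short window
        instead of strict per-packet spacing. Illustrative sketch only."""

        def __init__(self, lsp_tx_interval_s: float, window_s: float = 0.020):
            self.rate = 1.0 / lsp_tx_interval_s  # LSPs per second on average
            self.window = window_s
            self.window_start = time.monotonic()
            self.sent_in_window = 0

        def may_send(self) -> bool:
            now = time.monotonic()
            if now - self.window_start >= self.window:
                self.window_start = now
                self.sent_in_window = 0
            if self.sent_in_window < self.rate * self.window:
                self.sent_in_window += 1
                return True
            return False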

Section 6.2.1.2:
“If no PSNPs have been generated on the LAN for a suitable period of time, then
an LSP transmitter can safely set the number of un-acknowledged LSPs to zero.
Since this suitable period of time is much higher than the fast acknowledgment
of LSPs defined in Section 5.1, the sustainable transmission rate of LSPs will
be much slower on a LAN interface than on a point-to-point interface.”

What is a suitable period of time? Can you be more concrete?

Section 6.2.2.1

- As a side note, I don’t think Figure 1 is useful at all…
- cwin = LPP + 1: Why is LPP selected as the start/minimum value? Earlier on
you say that LPP must be less than or equal to 90 and recommend a value of 15.
These values already seem large.

Section 6.2.2.2:
“This value should include a margin of error to avoid false positives (e.g.,
estimated MAT measure variance) which would have a significant impact on
performance.”

First, you call the first congestion signal “Timer”; however, I think it should
be called “Loss”, where the loss detection algorithm you are proposing is based
on a timer. In TCP the retransmission (and therefore loss detection) timer is
initially set to 3xRTT and then adapted with more measurements (see RFC 6298).
The text above, however, seems really too vague to implement this correctly. I
guess you can take the simple approach and just set it to 3xRTT. However, given
that the delays on a point-to-point link are really small, I’m not sure a timer
based on the RTT is useful at all. Is the system even able to maintain a timer
with that granularity? My understanding, however, is that the LSP transmitter
already has a way to detect loss in order to retransmit; therefore it would
make more sense to simply always reset the cwin when you retransmit a PDU. Or
how does the LSP transmitter decide to send a retransmission?
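If you do keep a timer-based signal, I would at least point to a concrete
computation. A minimal RFC 6298-style sketch would be the following (Python;
the minimum-RTO floor and its use for LSP acknowledgements are my assumptions):

    # RFC 6298-style retransmission timer (SRTT/RTTVAR smoothing).
    ALPHA, BETA, K, MIN_RTO = 1 / 8, 1 / 4, 4, 0.2  # MIN_RTO floor chosen arbitrarily

    class RtoEstimator:
        def __init__(self):
            self.srtt = None
            self.rttvar = None
            self.rto = 1.0  # conservative value before any RTT measurement

        def on_rtt_sample(self, r: float) -> None:
            if self.srtt is None:
                self.srtt, self.rttvar = r, r / 2  # first measurement
            else:
                self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
                self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
            # After the first sample this yields roughly 3x the measured RTT.
            self.rto = max(MIN_RTO, self.srtt + K * self.rttvar)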

“Reordering: a sender can record its sending order and check that
acknowledgements arrive in the same order as LSPs. This makes an additional
assumption and should ideally be backed up by a confirmation by the receiver
that this assumption stands.”

Regarding reordering as an input: if a packet is “just” reordered but still
received, it should not be taken as a congestion signal. However, can that even
happen on a point-to-point link? If you mean that the packet was never received
and there is a gap in the packet numbers, that is again called loss and not
reordering, just detected with a packet-number-based mechanism instead of a
timer. However, based on the description above it is not fully clear to me how
you think this would work and what you mean by “additional assumption”…? I
think you need to clarify this further.
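For comparison, here is how I would expect an ordering/gap-based detector to
look (Python sketch; the threshold of 3 mirrors TCP’s duplicate-ACK heuristic
and is my assumption). If your intent differs, the text should describe it at
this level of detail.

    from collections import OrderedDict

    class GapLossDetector:
        """Record the sending order; declare an LSP lost once several LSPs sent
        after it have been acknowledged. Illustrative sketch only."""

        REORDER_THRESHOLD = 3

        def __init__(self):
            self.outstanding = OrderedDict()  # lsp_id -> count of later acks seen

        def on_send(self, lsp_id) -> None:
            self.outstanding[lsp_id] = 0

        def on_ack(self, lsp_id) -> list:
            lost = []
            if lsp_id not in self.outstanding:
                return lost
            for earlier in list(self.outstanding):
                if earlier == lsp_id:
                    break
                self.outstanding[earlier] += 1
                if self.outstanding[earlier] >= self.REORDER_THRESHOLD:
                    lost.append(earlier)
                    del self.outstanding[earlier]
            del self.outstanding[lsp_id]
            return lost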

Sec 6.2.2.3:

You call this refinement “Fast recovery”, which I guess is inspired by TCP.
However, it’s a bit confusing because in TCP it’s a different algorithm. In
TCP’s fast recovery, you do not reduce your congestion window to the initial
value but only halve it (or decrease it by a different factor depending on the
congestion control algorithm). This is done if only some packets are lost while
most packets still arrive. In TCP, resetting to the initial value only happens
if a timeout occurs, meaning no data/ACKs have arrived for some time.
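To illustrate the distinction (Python sketch, names mine):

    def on_loss_detected_by_acks(cwin: int) -> int:
        """Fast-recovery-style reaction: some LSPs were lost but acknowledgements
        still arrive, so only halve the window (multiplicative decrease)."""
        return max(1, cwin // 2)

    def on_retransmission_timeout(initial_cwin: int) -> int:
        """Timeout-style reaction: nothing has been acknowledged for a while,
        so restart from the initial/minimum window."""
        return initial_cwin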

Sec 6.2.2.4:
“The rates of increase were inspired from TCP [RFC5681], but it is possible
that a different rate of increase for cwin in the congestion avoidance phase
actually yields better results due to the low RTT values in most IS-IS
deployments.”

I don’t think this is really a refinement but rather a consideration. However,
saying this without any further guidance doesn’t seem really helpful and might
even be harmful.

However, more generally, all in all I’m also not sure fine-grained congestion
control is really needed on a point-to-point link, as there is only one
receiver that could be overloaded, and in that case you should rather adapt
your flow control. I think what you want is to set the flow control parameters
in a reasonable range in the first place. If there then is actual congestion
for some reason, meaning you detect a constant or increasing loss rate, maybe
you want to implement some kind of circuit breaker by stopping sending or
falling back to a minimum rate for a while (e.g. see RFC 8085, Section 3.1.10).
However, why would that even happen on a point-to-point link?
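Such a circuit breaker could be as simple as the following sketch (Python; the
threshold and cool-down are placeholders of mine, not recommendations):

    import time

    class FloodingCircuitBreaker:
        """Fall back to a minimum LSP rate for a cool-down period whenever a
        sustained loss rate is observed (in the spirit of RFC 8085, Section
        3.1.10). Illustrative sketch only."""

        LOSS_RATE_THRESHOLD = 0.10  # placeholder: 10% loss trips the breaker
        COOLDOWN_S = 30.0           # placeholder cool-down period

        def __init__(self, normal_rate: float, minimum_rate: float):
            self.normal_rate = normal_rate
            self.minimum_rate = minimum_rate
            self.tripped_until = 0.0

        def current_rate(self, sent: int, lost: int) -> float:
            now = time.monotonic()
            if sent and lost / sent >= self.LOSS_RATE_THRESHOLD:
                self.tripped_until = now + self.COOLDOWN_S
            return self.minimum_rate if now < self.tripped_until else self.normal_rate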

Sec 6.2.3
“Senders SHOULD limit bursts to the initial congestion window.“
I don’t think this is a requirement, because based on your specified algorithm
this happens automatically. The initial window is LPP, which is also the
maximum number of PDUs that can be acknowledged in one PSNP, so this is also
the maximum number of packets you can send out on receipt of the PSNP (given
you don’t let the cwin grow beyond what’s ready to send). However, you have
this new parameter for the LSP Burst Window; that is what should limit the
burst size (and it should rather be smaller than LPP/the initial window). Also,
pacing is used over Internet links to avoid overloading small buffers on the
path, as the buffer sizes of the network elements are unknown. This is not the
case in your point-to-point scenario. If you know all buffer sizes, it is
probably sufficient to limit the burst size accordingly.
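In other words, I would expect the burst limit to look roughly like this
(Python sketch, names mine):

    def allowed_burst(lsp_burst_window: int, cwin: int, rwin: int, in_flight: int) -> int:
        """Back-to-back LSPs are bounded by the advertised LSP Burst Window
        rather than the initial cwin, and never exceed what the congestion and
        receive windows currently permit. Illustrative sketch only."""
        return max(0, min(lsp_burst_window, cwin - in_flight, rwin - in_flight))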

Sec 6.2.4
“If the values are too low then the transmitter will not use the full bandwidth
or available CPU resources.” As a side note, I hope that fully utilising the
link or CPU just with LSP traffic is not the goal here. This is another reason
why you might not need fine-grained congestion control: congestion control has
two goals, avoiding congestion but also fully utilising the link -> I think you
only need the former.

Sec 6.3
This section is entirely unclear to me. There is no algorithm described, and I
would not know how to implement this, also because you use the terms congestion
control and flow control interchangeably. Further, you say that no input signal
from the receiver is needed; however, if you want to send at the rate at which
acknowledgements are received, that is an input from the receiver. Moreover,
the window-based TCP-like algorithm actually does implicitly exactly that: it
only sends new data if an acknowledgement is received. It further takes into
account the number of PDUs that are acknowledged, because that can be
configured. If you don’t do that, your sending rate will get lower and lower.

Some small nits:
- Sec 4: advertise flooding-related parameters parameters -> advertise flooding-related parameters
- Sec 5.1: PSNP PDUs -> PSNPs or PSN PDUs
- Sec 5.2: Sequence Number Packets (SNPs) -> probably: Sequence Number PDUs (SNPs)?
- Sec 6.2.1.1.: leave space for CSNP and PSNP (SNP) PDUs -> leave space for CSNPs and PSNPs?