Telechat Review of draft-ietf-trill-mtu-negotiation-06
review-ietf-trill-mtu-negotiation-06-tsvart-telechat-westerlund-2017-07-05-00

Request Review of draft-ietf-trill-mtu-negotiation
Requested rev. no specific revision (document currently at 08)
Type Telechat Review
Team Transport Area Review Team (tsvart)
Deadline 2017-07-04
Requested 2017-06-14
Draft last updated 2017-07-05
Completed reviews Rtgdir Early review of -02 by Julien Meuric (diff)
Secdir Last Call review of -05 by Vincent Roca (diff)
Genart Last Call review of -05 by Brian Carpenter (diff)
Genart Telechat review of -06 by Brian Carpenter (diff)
Tsvart Telechat review of -06 by Magnus Westerlund (diff)
Assignment Reviewer Magnus Westerlund
State Completed
Review review-ietf-trill-mtu-negotiation-06-tsvart-telechat-westerlund-2017-07-05
Reviewed rev. 06 (document currently at 08)
Review result Not Ready
Review completed: 2017-07-05

Review
review-ietf-trill-mtu-negotiation-06-tsvart-telechat-westerlund-2017-07-05

This TSV-ART review is influenced by that I did the review of draft-ietf-trill-over-ip. 

1. So draft-ietf-trill-over-ip-10 has MTU discovery needs for 
determining if the UDP encapsulation will work or not. It references  in 
Section 8.4 the old RFC, i.e. RFC 6325, which is updated by 
draft-ietf-trill-mtu-negotiation.

    TRILL IS-IS MTU PDUs, as specified in Section 5 of [RFC6325] and in
    [RFC7177], can be used to obtain added assurance of the MTU of a
    link.

However, this is not quite true, as if the IP path MTU is below 1470 
bytes, which is not unheard of, the algorithm in the MTU negotiation 
draft can't determine it. It will only report the IP path as having an 
MTU to small when the 1470 bytes probe fail.

So, if the trill-over-ip authors want to use this as a mechanism, then the MTU 
negotiation draft needs to be expanded to have more flexible lower 
boundaries. However, that appear to affect MTU negotiation quite 
significant as it needs to separate algorithm for finding MTU, from the 
different usage of the algorithm with different starting points. Where 
the normal will have a lower bound of 1470, and be more tightly coupled 
to Sz when finding Lz. While the Trill-over-IP has a different usage.

I think the trill WG needs to decide on how to slice this. If the MTU-negotiation 
only targets the explicit targets in the current draft and goes forward now. Or if
they want to meet trill-over-ip's goals which will require restructuring. 

2. Another issue, is that I think the algorithm is a bit short on 
transmission scheduling recommendations:

    1) If RB1 successfully receives the MTU-ack from RB2 to the probe of
       the value of link-wide Lz within k tries (where k is a
       configurable parameter whose default is 3), link MTU size is set
       to the size of link-wide Lz and stop.

If I do this test with all three packets back to back at line rate, I could
potentially get all probes lost in the same burst loss in router queue or switch fabric. 
What I think is needed here is a specification on how these probes are transmitted.
Spaced in a particular way, or at least minimal distance, and are the additional probes only 
sent after the previous has been judged to have been lost, which makes it interact with the next issue. 

3. This is also unclear on what the criteria is for determining that something is lost:

      a) If RB1 fails to receive an MTU-ack from RB2 after k tries, RB1
          sets the "failed minimum MTU test" flag for RB2 in RB1's Hello
          and stop.

I fail to see any specification for the criteria when an MTU-ack should be considered to have failed
to reach the probing entity. So this appear to require a timeout, and thus a timeout interval. Is the 
RTT known so that one can define something as lost after N*RTT? Are there possible delays in sending
the MTU-ack that are considered okay that can affect this? 

4. Section 3, the algorithm in Step 1 is unable to reach the first termination condition (3) "If lowerBound >= upperBound" in some cases.

  Step 1: RB1 tries to send an MTU-probe padded to the size x.   

   1) If RB1 fails to receive an MTU-ack from RB2 after k tries: 

         upperBound is set to x and x is set to [(lowerBound +
         upperBound)/2], rounded up to the nearest integer.

   2) If RB1 receives an MTU-ack to a probe of size x from RB2: 

         link MTU size is set to x, lowerBound is set to x and x is set
         to [(lowerBound + upperBound)/2], rounded up to the nearest
         integer.

   3) If lowerBound >= upperBound or Step 1 has been repeated n times
      (where n is a configurable parameter whose default value is 5),
      stop.

   4) Repeat Step 1.

I run this on the input data: Lower bound = 1470, Upper bound = 9216 and with an MTU of 7935 and gets the following sequence:

Lower	Upper	X
1470	9216	5343
5343	9216	7280
7280	9216	8248
7280	8248	7764
7764	8248	8006
7764	8006	7885
7885	8006	7946
7885	7946	7916
7916	7946	7931
7931	7946	7939
7931	7939	7935
7935	7939	7937
7935	7937	7936
7935	7936	7936
7935	7936	7936

Thus, the termination condition needs to change. The second I notice is that having a limitation on number of steps as 5, results in quite a large gap between upper and lower bound in which the MTU exists in. 

5. I frankly gets confused by the application of the binary search. First it will in many case not be run to termination where the actual MTU is determined. Then the result of the upper and lower bound are just used to confirm the Sz value. There are no discussion about using the MTU search to determine a new possible value for Sz. The text is not even explicit that lower bound is the highest known to work Transmission unit size at the time of testing. I think section 3, should conclude in determine some TU value, and if that is Sz or something other appears quite relevant for what to do in the later sections.