Ballot for draft-ietf-ippm-capacity-metric-method

Discuss

Magnus Westerlund

Yes

Martin Duke

No Objection

Deborah Brungard
Alissa Cooper
Roman Danyliw
Benjamin Kaduk
Erik Kline
Murray Kucherawy
Warren Kumari
Barry Leiba
Alvaro Retana
Martin Vigoureux
Robert Wilton

No Record

Éric Vyncke

Summary: Has a DISCUSS. Has enough positions to pass once DISCUSS positions are resolved.

Magnus Westerlund Discuss

Discuss (2021-02-25)
A) Section 8. Method of Measurement

I think the metrics are fine; what makes me quite worried here is the measurement method. My concerns with it are the following.

1. The application of this measurement method is not clearly scoped. Therefore I will assume that measurements across the Internet are possible. However, in that context I think the definition of, and protection against, severe congestion has significant shortcomings. The main reason is that during a configurable time period (default 1 s) the sender will attempt to send at a rate specified by a table, independently of what happens during that second.

2. The algorithm for adjusting the rate is table driven, but the document gives no guidance on how to construct the table or on limits for value changes in the table. In addition, the algorithm discusses larger steps in the table without any reflection on what these step sizes may represent in offered load.

3. The algorithm's reaction to any sequence number gaps depends on delay and how it relates to unspecified delay thresholds. Also, no text discusses how these thresholds should be configured for safe operation.
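
To illustrate concern 2, here is a minimal sketch of how the same step in table rows can translate into very different changes in offered load. The table layout (1 Mbit/s steps up to 1 Gbit/s, then 100 Mbit/s steps), the 10-row step, and all names are hypothetical and chosen only for illustration; none of them come from the draft.

```python
# Hypothetical rate table in Mbit/s (an assumption, not taken from the draft):
# 1 Mbit/s steps from 1 to 1000, then 100 Mbit/s steps from 1100 to 10000.
RATE_TABLE_MBPS = list(range(1, 1001)) + list(range(1100, 10001, 100))


def load_change_mbps(table, row, step):
    """Return the change in offered load when moving `step` rows from `row`.

    The target row is clamped to the bounds of the table.
    """
    target = max(0, min(len(table) - 1, row + step))
    return table[target] - table[row]


# The same 10-row increase, applied in two different parts of the table.
low = load_change_mbps(RATE_TABLE_MBPS, row=0, step=10)     # starting at 1 Mbit/s
high = load_change_mbps(RATE_TABLE_MBPS, row=999, step=10)  # starting at 1000 Mbit/s
print(low, high)
```

With this assumed table, a 10-row step adds 10 Mbit/s near the bottom of the table but 1000 Mbit/s just above the 1 Gbit/s row, which is why guidance on table construction matters for safe operation.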

B) Section 8. Method of Measurement

There is no specification here of the measurement protocol that provides sequence numbers, the feedback channel, and the control channel. Is this intended to use TWAMP?

From my perspective this document defines the metrics at standards track level. However, the method for actually running the measurements is not specified at standards track level. No one can build an implementation. And if the section is intended to provide requirements on a protocol that performs these measurements, I think several aspects are missing. There appear to be several ways forward to resolve this: one is to split out the method of measurement and define it separately at standards track level using a particular protocol; another is to write it purely as requirements on a measurement protocol.

Martin Duke Yes

Deborah Brungard No Objection

Alissa Cooper No Objection

Roman Danyliw No Objection

Comment (2021-02-22)
Thank you to Rifaat Shekh-Yusef for the SECDIR review.

Benjamin Kaduk No Objection

Comment (2021-02-24)
Section 3

   5.  Less emphasis on ISP gateway measurements, possibly due to less
       traffic crossing ISP gateways in future.

nit: sentence fragment.

Section 5

   This section sets requirements for the following components to
   support the Maximum IP-layer Capacity Metric.

editorial/nit: I don't think I understand what "the following
components" are.  Is this referencing some preexisting template for a
metric definition?  (Same for §6 and §7.)

Section 5.3

   The number of these IP-layer bits is designated n0[dtn,dtn+1] for a
   specific dt.

(editorial) I'm a little confused by this notation for n0.  In Section 4
we say that "dtn" references a specific sub-interval, but the brackets
and two items look like they should be the start and end of an interval,
for which perhaps just the 'n' would be more appropriate (but we'd want
a half-open interval, and would need to worry about 0- vs 1-indexing for
n, etc.).

   Anticipating a Sample of Singletons, the interval dt SHOULD be set to
   a natural number m so that T+I = T + m*dt with dtn+1 - dtn = dt and
   with 1 <= n <= m.

nit: "dt [...] set to m" doesn't make any sense.
nit: this looks like more of a "for 1 <= n <= m" than an "and with".

   Mathematically, this definition can be represented as:

                                      ( n0[dtn,dtn+1] )
                      C(T,dt,PM) = -------------------------
                                             dt

(nit) the "n" appears (in 'dtn') only on the right side but there is no
operation applied over it, so this "equation" seems unbalanced without
also specifying 'n'.  I think the introductory text should mention
something about "for each n" or "for a given interval dtn" or similar.

   o  n0 is the total number of IP-layer header and payload bits that
      can be transmitted in Standard Formed packets [RFC8468] from the

(nit) RFC 8468 spells it "standard-formed".

   o  C(T,dt,PM) the IP-Layer Capacity, corresponds to the value of n0
      measured in any sub-interval ending at dtn (meaning T + n*dt),
      divided by the length of sub-interval, dt.

If it's supposed to be "ending at dtn", then why is the "dtn+1" in the
picture at all?

   o  PM represents other performance metrics [see section 5.4 below];
      their measurement results SHALL be collected during measurement of
      IP-layer Capacity and associated with the corresponding dtn for
      further evaluation and reporting.

(nit) this seems to duplicate (be a subset of) the paragraph before
"Mathematically, this definition can be represented".

   o  The bit rate of the physical interface of the measurement device
      must be higher than that of the link whose C(T,I,PM) is to be
      measured.

(nit) I thought we were measuring a path, not a link.

Section 5.4

   RTD[dtn-1,dtn] is defined as a sample of the [RFC2681] Round-trip

Do we really want to be using n0[dtn,dtn+1] but RTD[dtn-1,dtn]?  Can we
pick a consistent sub-interval notation?

Section 6.3

   Define the Maximum IP-layer capacity, Maximum_C(T,I,PM), to be the
   maximum number of IP-layer bits n0[dtn,dtn+1] that can be transmitted
   in packets from the Src host and correctly received by the Dst host,

(nit) the relevant formulae include a dt divisor, but I don't see
anything in this prose that would correspond to such a divisor.

   The interval dt SHOULD be set to a natural number m so that T+I = T +
   m*dt with dtn+1 - dtn = dt and with 1 <= n <= m.

nit: "dt [...] set to m" doesn't make any sense.
nit: this looks like more of a "for 1 <= n <= m" than an "and with".

   Mathematically, this definition can be represented as:

                                      max  ( n0[dtn,dtn+1] )
                                     [T,T+I]
                Maximum_C(T,I,PM) = -------------------------
                                               dt
               where:
                  T                                      T+I
                  _________________________________________
                  |   |   |   |   |   |   |   |   |   |   |
              dtn=1   2   3   4   5   6   7   8   9  10  n+1
                                                     n=m

(nit) as mentioned previously, the definition of "dtn" lists it as being
the sub-interval, not the boundary point/time of a sub-interval.
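
As a sketch of one consistent reading of the notation questioned above, the code below indexes sub-intervals by an integer n (1 <= n <= m), with sub-interval n spanning [T + (n-1)*dt, T + n*dt], and applies the dt divisor explicitly. The function names and the sample bit counts are made up for illustration and are not taken from the draft.

```python
def capacities_bps(bits_per_subinterval, dt):
    """Per-sub-interval capacity: n0 for sub-interval n, divided by dt in seconds."""
    return [n0 / dt for n0 in bits_per_subinterval]


def maximum_capacity_bps(bits_per_subinterval, dt):
    """Maximum of n0 over all m sub-intervals of [T, T+I], divided by dt."""
    if not bits_per_subinterval:
        raise ValueError("at least one sub-interval is required")
    return max(bits_per_subinterval) / dt


# Made-up sample: m = 5 sub-intervals with dt = 1 s, so I = m * dt = 5 s.
n0 = [940_000_000, 955_000_000, 948_000_000, 951_000_000, 930_000_000]
per_interval = capacities_bps(n0, dt=1.0)
max_c = maximum_capacity_bps(n0, dt=1.0)
print(per_interval, max_c)
```

Indexing by n alone avoids the question of whether "dtn" names a sub-interval or a boundary time.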

Section 6.5

   If traffic conditioning (e.g., shaping, policing) applies along a
   path for which Maximum_C(T,I,PM) is to be determined, different
   values for dt SHOULD be picked and measurements be executed during
   multiple intervals [T, T+I].  A single constant interval dt SHOULD be
   chosen so that is an integer multiple of increasing values k times
   serialisation delay of a path MTU at the physical interface speed
   where traffic conditioning is expected.  [...]

nit: "so that is an integer multiple" seems to be missing a word.

Also, this seems to say that different values for dt SHOULD be picked,
but also that a constant dt SHOULD be chosen.  How can those both be
recommended and be consistent with each other?  (I mean, I assume that
the intent is to do multiple runs with different (fixed) dt, but that's not
what the text seems to say.)
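
A minimal sketch of the reading I assume is intended: several runs, each with its own fixed dt, where each dt is an integer multiple k of the serialisation delay of one path-MTU packet at the physical interface speed. The MTU, link speed, multipliers, and function names are assumptions for illustration only.

```python
def serialization_delay_s(mtu_bytes, link_bps):
    """Time in seconds to serialise one MTU-sized packet at the given bit rate."""
    return mtu_bytes * 8 / link_bps


def candidate_dt_values(mtu_bytes, link_bps, multipliers):
    """One fixed dt per run: k times the serialisation delay, for each k."""
    base = serialization_delay_s(mtu_bytes, link_bps)
    return [k * base for k in multipliers]


# Assumed example: 1500-byte MTU on a 1 Gbit/s interface, three runs.
base = serialization_delay_s(1500, 1_000_000_000)
dts = candidate_dt_values(1500, 1_000_000_000, [1_000, 10_000, 100_000])
for dt in dts:
    print(f"run with fixed dt = {dt:.3f} s")
```

Each run keeps its dt constant over the whole interval [T, T+I]; only the value of k changes between runs.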

Section 7.3

   Define the IP-layer Sender Bit Rate, B(S,st), to be the number of IP-
   layer bits (including header and data fields) that are transmitted
   from the Source during one contiguous sub-interval, st, during the
   test interval S (where S SHALL be longer than I), and where the
   fixed-size packet count during that single sub-interval st also
   provides the number of IP-layer bits in any interval: n0[stn-1,stn].

(1) there doesn't seem to be any restriction that the observed packets
list Dst as the destination address, so formally it seems this would
count *all* traffic generated by Sender, not just the traffic relevant
for the (path capacity) test.
(2) It seems a little unfortunate that we reuse the 'n0' symbol here for
a different meaning than in the earlier capacity metrics.
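
To illustrate point (1), here is a sketch of a sender bit rate computation that counts only packets addressed to Dst within one sub-interval st. The packet representation (a pair of destination address and IP-layer length in bytes), the addresses, and the function name are hypothetical.

```python
def sender_bit_rate_bps(packets, dst, st):
    """IP-layer bits sent to `dst` during one sub-interval of length `st` seconds.

    `packets` is an iterable of (destination_address, ip_length_bytes) pairs
    observed during that sub-interval; packets to other destinations are ignored.
    """
    bits = sum(length * 8 for addr, length in packets if addr == dst)
    return bits / st


# Assumed sample: 100 test packets to Dst plus 20 unrelated packets.
observed = [("192.0.2.10", 1250)] * 100 + [("198.51.100.7", 1250)] * 20
rate = sender_bit_rate_bps(observed, dst="192.0.2.10", st=0.05)
print(rate)
```

Without the destination filter, the 20 unrelated packets would be counted as test traffic and inflate the reported rate by 20%.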

   Measurements according to these definitions SHALL use the UDP
   transport layer.  Any feedback from Dst host to Src host received by
   Src host during an interval [stn-1,stn] MUST NOT result in an
   adaptation of the Src host traffic conditioning during this interval
   (rate adjustment occurs on st boundaries).

Hmm, this "MUST NOT" is interesting, as it seems to imply extremely
tight coordination between the measurement point for this metric and the
Source itself.  (Note that the toplevel §7 admits the possibility that
measurement will occur at a location other than the Src host to network
path interface, via "(or as close as practical)".)

Section 8.1

   At the beginning of a test, the sender begins sending at rate R1 and
   the receiver starts a feedback timer at interval F (while awaiting

It's a little hard to search for, but I didn't find any previous mention
of 'F' or it being defined as a parameter or term.  Should it be a
listed parameter somewhere?

   If the feedback indicates that sequence number anomalies were
   detected OR the delay range was above the upper threshold, the
   offered load rate is decreased.  Also, if congestion is now confirmed
   by the current feedback message being processed, then the offered
   load rate is decreased by more than one rate (e.g., Rx-30).  [...]

Does "congestion is now confirmed" mean that "congestion confirmed" is
like a one-way latch and this transition only occurs at most once over
the course of a test?  Or could the Rx-30 happen multiple times?
(The pseudocode indicates the former.)

   If the feedback indicates that there were no sequence number
   anomalies AND the delay range was above the lower threshold, but
   below the upper threshold, the offered load rate is not changed.

The way this is written suggests that there will always be a lower and
an upper threshold for delay, but the rest of the document so far didn't
give me that impression.  E.g., we talk about PM only as "at least one
fundamental metric and target performance threshold MUST be supplied",
and to me having both upper and lower thresholds would be two
thresholds, not one.
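
To make the two questions above concrete, here is a sketch of how I read the quoted prose, with "congestion confirmed" modelled as a one-way latch and with both a lower and an upper delay threshold. Several details are my assumptions and are not stated in the quoted text: the rule for confirming congestion (three consecutive impaired reports), the increase branch and its step of 10 rows, the threshold values, and all names. Only the 30-row decrease mirrors the "Rx-30" example.

```python
from dataclasses import dataclass


@dataclass
class SearchState:
    row: int = 0                        # current index into the rate table
    impaired_count: int = 0             # consecutive impaired feedback reports
    congestion_confirmed: bool = False  # one-way latch, never reset in a test


def adjust(state, seq_anomalies, delay_range, lower, upper, max_row,
           confirm_after=3, fast_up=10, fast_down=30):
    """Apply one feedback report and return the new table row."""
    if seq_anomalies > 0 or delay_range > upper:
        state.impaired_count += 1
        newly_confirmed = (not state.congestion_confirmed
                           and state.impaired_count >= confirm_after)
        if newly_confirmed:
            state.congestion_confirmed = True
            step = -fast_down           # large decrease happens at most once
        else:
            step = -1
    elif delay_range < lower:
        state.impaired_count = 0
        step = 1 if state.congestion_confirmed else fast_up
    else:
        state.impaired_count = 0        # between thresholds: hold the rate
        step = 0
    # Clamp so that a decrease near the bottom of the table cannot underflow.
    state.row = max(0, min(max_row, state.row + step))
    return state.row


state = SearchState(row=50)
rows = [adjust(state, seq_anomalies=1, delay_range=0.0,
               lower=0.03, upper=0.09, max_row=1000) for _ in range(4)]
print(rows)
```

Under this reading the large decrease fires once, when the latch is set, and later impaired reports step down by a single row. If the intent is instead that the large decrease can repeat, the prose should say so.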

Section 8.2

   Here, as with any Active Capacity test, the test duration must be
   kept short. 10 second tests for each direction of transmission are
   common today.  The default measurement interval specified here is I =
   10 seconds).  In combination with a fast search method and user-
   network coordination, the concerns raised in RFC 6815[RFC6815] are
   alleviated.  [...]

I skimmed RFC 6815 and had a bit of a hard time making the connection
for why combining a 10-second interval, fast search method, and
user-network coordination alleviate the concerns of RFC 6815.  There
doesn't seem to be much in 6815 itself about how testing in production
can be done safely, so my current working assumption is that the
conclusion presented here reflects the results of "new work" being
recorded for the first time (in the RFC series) in this document.  If
that assumption is correct, I'd suggest spending some more words to
support the conclusion, e.g., making analogies to other "normal" traffic
patterns and how the benchmarking setup is not qualitatively different
from them.

Section 8.3

   As testing continues, implementers should expect some evolution in
   the methods.  The ITU-T has published a Supplement (60) to the
   Y-series of Recommendations, "Interpreting ITU-T Y.1540 maximum IP-
   layer capacity measurements", [Y.Sup60], which is the result of
   continued testing with the metric and method described here.

I pulled up the [Y.Sup60] reference, and it does not seem to reference
this draft by name.  On what basis do we conclude that it "is the result
of continued testing with the metric and method described here"?
Skimming/searching, I do see many similar formulae and methods
presented, but how do we conclude they are precisely the same?

Section 10

Should we say something about making sure that I is reasonably bounded?
IIRC we say so elsewhere in the text but not exactly here.

   2.  A REQUIRED user client-initiated setup handshake between
       cooperating hosts and allows firewalls to control inbound
       unsolicited UDP which either go to a control port [expected and
       w/authentication] or to ephemeral ports that are only created as
       needed.  [...]

nit: the grammar is odd in the first part of this sentence; the part
before the "and" doesn't seem like it can join up with anything after
the "and".  Is the intent something like "It is REQUIRED to have a user
client-initiated setup handshake between cooperating hosts that allows
firewalls to [...]"?

   3.  Integrity protection for feedback messages conveying measurements
       is RECOMMENDED.

(In some sense you want authentication as well as integrity protection.)

   5.  Senders MUST be rate-limited.  This can be accomplished using the
       pre-built table defining all the offered load rates that will be
       supported (Section 8.1).  The recommended load-control search
       algorithm results in "ramp up" from the lowest rate in the table.

nit: since (effectively) each implementation will have their own
pre-built table, I think it should be "using a pre-built table".

Appendix 13

If we start at Rx (row) 1, is it going to cause problems when we drop
down to Rx = 0 in the loss/congestion cases?

The mechanism in the pseudocode to stop taking large increments in
sending rate above the "hSpeedThresh" does not seem to be described in
the prose in §8.1.  (That said, it seems like a good idea, given the
likely table composition.)

(Also, indenting one tab for the outer conditionals and two more for the
inner ones looks a bit unusual.)

Section 14

It's not entirely clear to me why RFC 2330 is classified as normative
but RFC 7312 is informative, just based on the locations where they are
referenced.

Erik Kline No Objection

Comment (2021-02-23)
[[ comments ]]

[ section 4 ]

* RFC 6438 isn't only about flow label treatment at Tunnel End Points, since
  other devices in the network can do ECMP where the flow label is part of
  the flow.

  Maybe just a full stop after "when routers have complied with RFC6438
  guidelines."?


[[ questions ]]

[ sections 5.6/6.6/7.4 ]

* Should the "megabit = 1,000,000 bits" text from section 6.6 be also used
  in sections 5.6 and 7.4, or even called out separately earlier on?

  (Maybe it's just my own experience, but I've found that, while "mebi" 100%
  of the time means base 2, "mega" only mostly means base 10 and occasionally
  can still be interpreted as base 2.)
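
  As a quick worked example of why the definition matters, the sketch below
  reports the same measured bit count in both units. The constant and
  function names are illustrative only.

```python
MEGABIT = 10**6   # base-10 definition, as in the section 6.6 text
MEBIBIT = 2**20   # base-2 unit, 1,048,576 bits


def to_megabits_per_s(bits, seconds):
    """Bit rate in Mbit/s using 1 megabit = 1,000,000 bits."""
    return bits / seconds / MEGABIT


def to_mebibits_per_s(bits, seconds):
    """Bit rate in Mibit/s using 1 mebibit = 1,048,576 bits."""
    return bits / seconds / MEBIBIT


bits = 1_000_000_000  # bits received in one second
mbps = to_megabits_per_s(bits, 1)
mibps = to_mebibits_per_s(bits, 1)
relative_gap = (MEBIBIT - MEGABIT) / MEGABIT
print(mbps, mibps, relative_gap)
```

  The two readings differ by roughly 4.9%, which is large enough to matter
  when comparing reported capacities.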


[[ nits ]]

[ section 2 ]

* "Also, to foster the development..."

  This appears to be a sentence fragment rather than a grammatically correct
  sentence.  I assume the point is to say that fostering the development ...
  is also a goal of the document.

* "The supporting" -> "Supporting the", might read more naturally?

[ section 5.4 ]

* "The statistics used to to summarize" -> "The statistics used to summarize"

[ section 6.5 ]

* "so that is" -> "so that it is"?

Murray Kucherawy (was Discuss) No Objection

Comment (2021-02-25)
Since a SHOULD leaves an implementer with a choice, it's preferable to see prose explaining why one might deviate from the SHOULD advice.  Thus, the SHOULDs in Sections 5.3 and 6.3 leave me wondering under what circumstances an implementer might legitimately choose to do something else.  If there are none, should it be a MUST?

Warren Kumari No Objection

Comment (2021-02-25)
I'm stealing Robert's ballot text, because it perfectly explains my position/views:
"Thank you for this document, I found it interesting to read.

I have no comments that haven't already been raised by other AD reviews.
"

Barry Leiba No Objection

Alvaro Retana No Objection

Martin Vigoureux No Objection

Robert Wilton No Objection

Comment (2021-02-25)
Thank you for this document, I found it interesting to read.

I have no comments that haven't already been raised by other AD reviews.

Éric Vyncke No Record