Alternate-Marking Method

Note: This ballot was opened for revision 02 and is now closed.

Éric Vyncke
(was Discuss) Yes
Comment (2022-08-25) Sent
# Éric Vyncke, INT AD, comments for draft-ietf-ippm-rfc8321bis-02
CC @evyncke

Thank you for the work put into this document and for addressing my previous DISCUSS and most of the COMMENTs. Those are kept below for archiving and not for action.

Please note that Tim Winters is the Internet directorate reviewer (at my request), and you may want to consider this int-dir review as well once Tim completes it (no need to wait for it though).

Special thanks to Tommy Pauly for the shepherd's detailed write-up including the WG consensus and the justification of the intended status. 

I hope that this review helps to improve the document.

## Previous DISCUSS (kept for archiving)

A DISCUSS ballot is a request to have a discussion on the following topics:

### Section 5

Unsure whether I understand correctly:
   Color switching is the reference for all the network devices, and the
   only requirement to be achieved is that all network devices have to
   recognize the right batch along the path.
Why do *all network devices* have to recognize the right batch? Isn't this transparent for them?

## Previous COMMENTS (kept for archiving)

### Roman's DISCUSS

Just to let you know that I support Roman Danyliw's DISCUSS point.

But, I also wonder why there is a recommendation to use this method only within controlled domains (except to falsify measurements).

### Changes of reference types between RFC 8321 and the -bis

Why did some references (e.g., RFC 3393) move from the normative section (in RFC 8321) to the informative section (in this document)?

### Section 1

   RFC 7799 [RFC7799] defines Passive and Hybrid Methods of Measurement.
   In particular, Passive Methods of Measurement are based solely on
   observations of an undisturbed and unmodified packet stream of
   interest; Hybrid Methods are Methods of Measurement that use a
   combination of Active Methods and Passive Methods.

This short summary would benefit from a definition of "Active Methods".

### Section 3.1

In `A safe choice is to wait L/2 time units`, some experimental feedback or theoretical reasoning would be welcome. (I am not a transport expert, but a packet delayed a lot is probably worse than a packet loss).

### Section using the mean

Just wondering whether the authors have experimented with other statistical metrics, e.g., the median (more 'complex' to compute of course), or with taking the standard deviation into account?

Also, what is the impact of the arrival rate distribution on using the mean?

### Section 3.2.2

While this section answers my previous comment, may I suggest moving the description of "double-marking" earlier in the flow? It now appears "out of the blue" ;-)

Moreover, the description is rather opaque, e.g., some examples would be welcome.

### Section 4.3 telemetry

Is there a YANG model specified (or under specification) for data collection?

### Section 5

   Additionally, in practice, besides clock
   errors, packet reordering is also very common in a packet network due
   to equal-cost multipath (ECMP). 
Unsure whether ECMP really causes a "very common packet re-ordering". Suggest to s/very common/common/ at least ;-)

### Section 5 bound

   The network delay between the network devices can be represented as a
   data set and 99.7% of the samples are within 3 standard deviation of
   the average
Does the above assume a specific packet distribution?
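
To make the question concrete: the 99.7% figure holds when the delay samples are normally distributed, whereas for an arbitrary distribution with finite variance Chebyshev's inequality only guarantees roughly 88.9% of samples within 3 standard deviations. The sketch below compares the two values; the function names are illustrative and not taken from the draft.

```python
import math


def normal_fraction_within(k: float) -> float:
    """Fraction of a normal (Gaussian) population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


def chebyshev_lower_bound(k: float) -> float:
    """Distribution-free lower bound on the fraction within k standard deviations (k > 0)."""
    return max(0.0, 1.0 - 1.0 / (k * k))


if __name__ == "__main__":
    print(f"normal distribution, 3 sigma: {normal_fraction_within(3):.4f}")
    print(f"any distribution, 3 sigma:    >= {chebyshev_lower_bound(3):.4f}")
```

If the delay distribution is heavy-tailed, a bound derived from the 3-sigma rule could therefore be noticeably optimistic.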

### Section 6 fragmentation

Should there be a note about:

* IPv6 routers never fragment
* use of DF bit for IPv4 


### Capitalized Passive

Unsure whether "Passive" needs to be capitalized in the text.

### Section 3.2

s/There are three alternatives, as described hereinafter./There are three methodologies, as described hereinafter./ ? (notably because there can only be *TWO* alternatives AFAIK)

## Notes

This review is in the ["IETF Comments" Markdown format][ICMF]. You can use the
[`ietf-comments` tool][ICT] to automatically convert this review into
individual GitHub issues.

Erik Kline
No Objection
John Scudder
No Objection
Comment (2022-07-11 for -02) Sent
Thanks for this quite readable document.

I support Roman's DISCUSS. I also have some questions and comments:

1. In §2,

                                                   Each change of color
   represents a sort of auto-synchronization signal that guarantees the
   consistency of measurements taken by different devices along the
I realize this is fairly picky, but isn't "guarantee" over-promising? In situations like this there's usually some low-probability chance that something will go awry, and that does seem to be possible in this case. Presumably some slightly weaker language like "maximizes the probability of" or similar would be more accurate?

2. In §3.2.1,

        Since the timestamps refer to specific packets (the first packet
   of each block), we are sure that timestamps compared to compute delay
   refer to the same packets.
But two paragraphs down you explain how in real life, we are actually not sure, because of the potential for packet loss and misordering. It might, then, be worthwhile to correct the sentence? E.g., "... in the ideal case where no packet loss or misordering exists, we would be sure that timestamps compared to compute the delay refer to the same packets."

3. In §4.3, I must be missing something:

                                                          In any case,
      the NMS has to collect all the relevant values from all the
      routers within one cycle of the timer.

"Within one cycle of the timer" seems to assume that the routers don't maintain a table of past values that can be collected at some less demanding cadence, as long as entries aren't allowed to age out. Is there some reason that's not so? (I'm assuming that the timer you're referring to is the fixed color switching timer. If that's not what it is, clarification is needed here.)

4. Thank you very much for reporting results of the earlier experiment, in §7. However, I don't understand why a section titled "Results of the Alternate Marking Experiment" uses RFC 2119 keywords. I guess they'd make more sense if the part that includes 2119 keywords were called something like "Recommendations for Deployment"... or you could just retitle the section as "Results of the Alternate Marking Experiment, with Recommendations for Deployment" as a minimal patch.

5. In §7.1,

   For security reasons, the Alternate Marking Method is RECOMMENDED
   only for controlled domains.

Do you mean, "the Alternate Marking Method is NOT RECOMMENDED other than in controlled domains"? Because surely you aren't saying all controlled domains SHOULD deploy alt-mark, which is what a strict reading of this sentence, considering the formal definition of RECOMMENDED, means.

Also, what exactly is the controlled domain supposed to control? Is it only the scope of marked packets? Or is it also the scope of the data derived from the marked packets? 

6. In §10 I find myself bemused by,

   This document specifies a method to perform measurements in the
   context of a Service Provider's network and has not been developed to
   conduct Internet measurements, so it does not directly affect
   Internet security nor applications that run on the Internet.

The Internet is nothing other than the concatenation of many networks, many of which are Service Provider networks, so I don't see how your premise leads to your conclusion?


7. In §4.2, where you write "(included)" although the meaning is clear, I think "(inclusive)" is more standard usage.
Paul Wouters
No Objection
Comment (2022-07-14 for -02) Not sent
I do support the DISCUSSes filed, especially those by Roman and Warren.
Roman Danyliw
(was Discuss) No Objection
Comment (2022-08-05) Sent for earlier
Thank you for addressing my DISCUSS feedback.
Warren Kumari
(was Discuss) No Objection
Comment (2022-08-26) Sent for earlier
Many of my original concerns around the controlled / limited domains text still remain; the document has attempted to address them, and, while not perfect, I think it is good enough (and the impact of escaping packets is IMO sufficiently low) that I am clearing my DISCUSS. 

Much thanks to the authors, and apologies for my somewhat pissey / snarky tone in the original DISCUSS.
Zaheduzzaman Sarker
(was Discuss) No Objection
Comment (2022-08-29) Sent
Thanks for addressing my discuss point.
Martin Duke Former IESG member
Yes (for -02) Unknown

Alvaro Retana Former IESG member
(was Discuss) No Objection
No Objection (2022-08-02) Sent for earlier
[Thanks for addressing my DISCUSS.]
Lars Eggert Former IESG member
(was Discuss) No Objection
No Objection (2022-07-12) Sent for earlier
## Comments

### Section 1, paragraph 5
     [RFC7276] provides a good overview of existing Operations,
     Administration, and Maintenance (OAM) mechanisms defined in the IETF,
     ITU-T, and IEEE.  In the IETF, a lot of work has been done on fault
     detection and connectivity verification, while little has been thus
     far dedicated to performance monitoring.  The IETF has defined
     standard metrics to measure network performance; however, its methods
     mainly focus on Active measurement techniques.For example, [RFC6374]
     defines mechanisms for measuring packet loss, one-way and two-way
     delay, and delay variation in MPLS networks, but its applicability to
     Passive measurements has some limitations, especially for connection-
     less networks.
This statement on the current state of work in the IETF might not age well in an
RFC - maybe remove?

### Section 1.2, paragraph 0
  1.2.  Requirements Language
Like draft-ietf-ippm-rfc8889bis, this document is very light on RFC2119
language, which leaves it quite unclear what the actually specified technique is
intended to be. There are a lot of options for various pieces, but not enough
guidance on when an option MUST/SHOULD/MAY be chosen and why.

### Section 3.1, paragraph 17
     Using a fixed timer for color switching offers better control over
     the method: the (time) length of the blocks can be chosen large
     enough to simplify the collection and the comparison of measures
     taken by different network devices.  It's preferable to read the
     value of the counters not immediately after the color switch: some
     packets could arrive out of order and increment the counter
     associated with the previous block (color), so it is worth waiting
     for some time.  A safe choice is to wait L/2 time units (where L is
     the duration for each block) after the color switch, to read the
     counter of the previous color.  The drawback is that the longer the
     duration of the block, the less frequently the measurement can be
Example of a paragraph that can be removed or moved, since the "fixed timer"
option isn't being standardized.
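
For reference, the timing described in the quoted paragraph can be sketched as follows, assuming a fixed block duration L and a counter read L/2 after each color switch. The function names and the example values are hypothetical; the sketch only shows the trade-off between block duration and how old a result is when it becomes available.

```python
def counter_read_time(switch_time: float, block_duration: float) -> float:
    """Time at which the previous block's counter is read: L/2 after the color switch."""
    return switch_time + block_duration / 2.0


def worst_case_result_age(block_duration: float) -> float:
    """Age of the first packet of a block at the moment that block's counter is read.

    The block lasts L, and the read happens L/2 after the block ends.
    """
    return block_duration + block_duration / 2.0


if __name__ == "__main__":
    L = 300.0       # assumed block duration, in seconds
    switch = 600.0  # assumed time of a color switch, in seconds
    print(counter_read_time(switch, L))   # 750.0
    print(worst_case_result_age(L))       # 450.0
```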

### Section 4.3, paragraph 9
     The Alternate-Marking Method described in this document literally
     splits the packets of the measured flow into different measurement
     blocks.  An implementation MAY use a Block Number (BN) for data
     correlation.  The BN MAY be assigned to each measurement block and
     associated with each packet count and timestamp reported to or pulled
     by other nodes or NMSs.  
Shouldn't the second MAY be a MUST? It MAY be done, and if it's done it MUST be
done in this specific way?
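
A minimal sketch of the BN-based correlation the quoted text describes, assuming each node reports a mapping from Block Number to packet count. The function name, the data layout, and the sample values are hypothetical; the draft does not define a reporting format here.

```python
def loss_per_block(upstream: dict, downstream: dict) -> dict:
    """Packet loss per Block Number, for blocks reported by both nodes.

    Blocks present on only one side are skipped, since they cannot be correlated yet.
    """
    common = sorted(upstream.keys() & downstream.keys())
    return {bn: upstream[bn] - downstream[bn] for bn in common}


if __name__ == "__main__":
    upstream_counts = {7: 1000, 8: 980, 9: 1010}
    downstream_counts = {7: 1000, 8: 977}  # block 9 not reported yet
    print(loss_per_block(upstream_counts, downstream_counts))  # {7: 0, 8: 3}
```

Without a common key such as the BN, the collector would have to infer which counts belong together from report timing alone, which is why the requirement level of the second MAY matters.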

### Section 5, paragraph 8
     It is assumed that all network devices are synchronized to a common
     reference time with an accuracy of +/- A/2.  Thus, the difference
     between the clock values of any two network devices is bounded by A.
This should be made an RFC2119 deployment requirement, because if this cannot be
guaranteed, wouldn't the mechanism break down?
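
The assumption in the quoted text can be expressed as a simple check: if every clock is within +/- A/2 of the reference, then any two clocks differ by at most A. The sketch below uses hypothetical function names and made-up offsets in milliseconds; it is an illustration of the bound, not a proposed conformance test.

```python
def within_sync_requirement(offsets_ms, accuracy_a_ms) -> bool:
    """True if every clock offset from the reference time is within +/- A/2."""
    return all(abs(offset) <= accuracy_a_ms / 2.0 for offset in offsets_ms)


def max_pairwise_skew(offsets_ms):
    """Largest difference between the clock values of any two devices."""
    return max(offsets_ms) - min(offsets_ms)


if __name__ == "__main__":
    A = 10                 # assumed accuracy parameter A, in milliseconds
    offsets = [-4, 1, 5]   # assumed per-device offsets from the reference
    print(within_sync_requirement(offsets, A))  # True
    print(max_pairwise_skew(offsets))           # 9
```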

### Section 5, paragraph 17
     If the time duration L of each block is not so small, the
     synchronization requirement could be satisfied even with a relatively
     inaccurate synchronization method.  This is true for packet loss and
     two-way delay measurement, but not for one-way delay measurement,
     where clock synchronization must be accurate.  Therefore, a system
     that uses only packet loss and two-way delay measurement may not
     require a very precise synchronization.  This is because the value of
     the clocks of network devices does not affect the computation of the
     two-way delay measurement.
This paragraph does not give actionable guidance on what clock sync scheme to
use for a given deployment.

### Section 6, paragraph 5
        Measurement points MAY simply ignore unmarked fragments and count
        marked fragments as full packets.  However, if resources allow,
        measurement points MAY make note of both marked and unmarked
        initial fragments and only increment the corresponding counter if
        (a) other fragments are also marked, or (b) it observes all other
        fragments and they are unmarked.
Shouldn't this more strongly RECOMMEND noting marked fragments, and give
resource consumption as a reason to not follow the recommendation?

### Section 7, paragraph 2
     Either one or two flag bits might be available for marking in
     different deployments:
The combination of SHOULDs and MAYs in the two following paragraphs leaves
things underspecified.

### Section 7.1, paragraph 3
     For security reasons, the Alternate Marking Method is RECOMMENDED
     only for controlled domains.
Shouldn't this be a MUST-level requirement? I thought the security
considerations were only valid inside a limited domain.

### Inclusive language

Found terminology that should be reviewed for inclusivity:

 * Term `man`; alternatives might be `individual`, `people`, `person`

## Nits

All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools, so there will likely be some false positives. There is no
need to let me know what you did with these suggestions.

### Boilerplate

Document still refers to the "Simplified BSD License", which was corrected in
the TLP on September 21, 2021. It should instead refer to the "Revised BSD

### Grammar/style

#### Section 3.1, paragraph 1
packets is an additional possibility but its detailed specification is out o
Use a comma before "but" if it connects two independent clauses (unless they
are closely connected and short).

#### Section 3.1, paragraph 18
 where load-balancing is seldom used and static joins are frequently used to
Use a comma before "and" if it connects two independent clauses (unless they
are closely connected and short).

#### Section 3.1, paragraph 22
rent to intermediate nodes and independent from the path followed by traffic
The usual collocation for "independent" is "of", not "from". Did you mean
"independent of"?

#### Section 13.2, paragraph 4
ut not detailed. o Explanation of the the intrinsic error in section 3.3.1 on
Possible typo: you repeated a word.

## Notes

This review is in the ["IETF Comments" Markdown format][ICMF]. You can use the
[`ietf-comments` tool][ICT] to automatically convert this review into
individual GitHub issues. Review generated by the [`ietf-reviewtool`][IRT].

Robert Wilton Former IESG member
(was Discuss) No Objection
No Objection (2022-09-08) Sent for earlier
Thanks for updating the text.