Alternate-Marking Method
draft-ietf-ippm-rfc8321bis-03
Yes
(Martin Duke)
No Objection
Erik Kline
Note: This ballot was opened for revision 02 and is now closed.
Éric Vyncke
(was Discuss)
Yes
Comment
(2022-08-25)
Sent
# Éric Vyncke, INT AD, comments for draft-ietf-ippm-rfc8321bis-02
CC @evyncke

Thank you for the work put into this document and for addressing my previous DISCUSS and most of the COMMENTs. Those are kept below for archiving and not for action.

Please note that Tim Winters is the Internet directorate reviewer (at my request) and you may want to consider this int-dir review as well when Tim completes the review (no need to wait for it though): https://datatracker.ietf.org/doc/draft-ietf-ippm-rfc8321bis/reviewrequest/16061/

Special thanks to Tommy Pauly for the shepherd's detailed write-up including the WG consensus and the justification of the intended status.

I hope that this review helps to improve the document,

Regards,

-éric

## Previous DISCUSS (kept for archiving)

As noted in https://www.ietf.org/blog/handling-iesg-ballot-positions/, a DISCUSS ballot is a request to have a discussion on the following topics:

### Section 5

Unsure whether I understand correctly:
```
Color switching is the reference for all the network devices, and the only requirement to be achieved is that all network devices have to recognize the right batch along the path.
```
Why do *all network devices* have to recognize the right batch? Isn't this transparent for them?

## Previous COMMENTS (kept for archiving)

### Roman's DISCUSS

Just to let you know that I support Roman Danyliw's DISCUSS point. But, I also wonder why there is a recommendation to use this method only within controlled domains (except to falsify measurements).

### Changes of reference types between RFC 8321 and the -bis

What is the reason why some references (e.g., RFC 3393) moved from the normative section (in RFC 8321) to the informative section (in this document)?

### Section 1

```
RFC 7799 [RFC7799] defines Passive and Hybrid Methods of Measurement. In particular, Passive Methods of Measurement are based solely on observations of an undisturbed and unmodified packet stream of interest; Hybrid Methods are Methods of Measurement that use a combination of Active Methods and Passive Methods.
```
This short summary would benefit from an "active methods" definition.

### Section 3.1

In `A safe choice is to wait L/2 time units`, some experimental feedback or theoretical reasoning would be welcome. (I am not a transport expert, but a packet delayed a lot is probably worse than a packet loss.)

### Section 3.2.1.1 using the mean

Just wondering whether the authors have experimented with other statistical metrics, e.g., the median (more 'complex' to compute of course) or taking into account the standard deviation? Also, what is the impact of the arrival rate distribution on using the mean?

### Section 3.2.2

While this section answers my previous comment, may I suggest moving the description of "double-marking" earlier in the flow? It now appears "out of the blue" ;-)

Moreover, the description is rather opaque; some examples would be welcome.

### Section 4.3 telemetry

Is there a YANG model specified (or under specification) for data collection?

### Section 5

```
Additionally, in practice, besides clock errors, packet reordering is also very common in a packet network due to equal-cost multipath (ECMP).
```
Unsure whether ECMP really causes a "very common packet re-ordering". Suggest s/very common/common/ at least ;-)

### Section 5 bound

```
The network delay between the network devices can be represented as a data set and 99.7% of the samples are within 3 standard deviation of the average
```
Does the above assume a specific packet distribution?

### Section 6 fragmentation

Should there be a note about:
* IPv6 routers never fragment
* use of DF bit for IPv4

## NITS

### Capitalized Passive

Unsure whether "Passive" needs to be capitalized in the text.

### Section 3.2

s/There are three alternatives, as described hereinafter./There are three methodologies, as described hereinafter./ ? (notably because there can only be *TWO* alternatives AFAIK)

## Notes

This review is in the ["IETF Comments" Markdown format][ICMF]. You can use the [`ietf-comments` tool][ICT] to automatically convert this review into individual GitHub issues.

[ICMF]: https://github.com/mnot/ietf-comments/blob/main/format.md
[ICT]: https://github.com/mnot/ietf-comments
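The "Section 5 bound" question can be made concrete. The 99.7% figure matches the three-sigma rule for a normal (Gaussian) distribution, whereas Chebyshev's inequality guarantees only 1 - 1/k^2 (about 88.9% for k = 3) for an arbitrary distribution with finite variance. The minimal sketch below compares the two using only the Python standard library; the function names are illustrative, not taken from the draft.

```python
import math


def normal_within_k_sigma(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k: float) -> float:
    """Distribution-free lower bound (Chebyshev) on the fraction within k standard deviations."""
    return 1.0 - 1.0 / (k * k)


print(f"normal, k=3:    {normal_within_k_sigma(3):.4f}")  # 0.9973
print(f"Chebyshev, k=3: {chebyshev_lower_bound(3):.4f}")  # 0.8889
```

In other words, the 99.7% statement holds only if the delay samples are (approximately) normally distributed; without that assumption, the guaranteed fraction within three standard deviations is noticeably lower.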
Erik Kline
No Objection
John Scudder
No Objection
Comment
(2022-07-11 for -02)
Sent
Thanks for this quite readable document. I support Roman's DISCUSS. I also have some questions and comments:

1. In §2,

      Each change of color represents a sort of auto-synchronization signal that guarantees the consistency of measurements taken by different devices along the path.

   I realize this is fairly picky, but isn't "guarantee" over-promising? In situations like this there's usually some low-probability chance that something will go awry, and that does seem to be possible in this case. Presumably some slightly weaker language like "maximizes the probability of" or similar would be more accurate?

2. In §3.2.1,

      Since the timestamps refer to specific packets (the first packet of each block), we are sure that timestamps compared to compute delay refer to the same packets.

   But two paragraphs down you explain how in real life, we are actually not sure, because of the potential for packet loss and misordering. It might, then, be worthwhile to correct the sentence? E.g., "... in the ideal case where no packet loss or misordering exists, we would be sure that timestamps compared to compute the delay refer to the same packets."

3. In §4.3, I must be missing something:

      In any case, the NMS has to collect all the relevant values from all the routers within one cycle of the timer.

   "Within one cycle of the timer" seems to assume that the routers don't maintain a table of past values that can be collected at some less demanding cadence, as long as entries aren't allowed to age out. Is there some reason that's not so? (I'm assuming that the timer you're referring to is the fixed color switching timer. If that's not what it is, clarification is needed here.)

4. Thank you very much for reporting results of the earlier experiment, in §7. However, I don't understand why a section titled "Results of the Alternate Marking Experiment" uses RFC 2119 keywords. I guess they'd make more sense if the part that includes 2119 keywords were called something like "Recommendations for Deployment"... or you could just retitle the section as "Results of the Alternate Marking Experiment, with Recommendations for Deployment" as a minimal patch.

5. In §7.1,

      For security reasons, the Alternate Marking Method is RECOMMENDED only for controlled domains.

   Do you mean, "the Alternate Marking Method is NOT RECOMMENDED other than in controlled domains"? Because surely you aren't saying all controlled domains SHOULD deploy alt-mark, which is what a strict reading of this sentence, considering the formal definition of RECOMMENDED, means.

   Also, what exactly is the controlled domain supposed to control? Is it only the scope of marked packets? Or is it also the scope of the data derived from the marked packets?

6. In §10 I find myself bemused by,

      This document specifies a method to perform measurements in the context of a Service Provider's network and has not been developed to conduct Internet measurements, so it does not directly affect Internet security nor applications that run on the Internet.

   The Internet is nothing other than the concatenation of many networks, many of which are Service Provider networks, so I don't see how your premise leads to your conclusion?

Nits:

7. In §4.2, where you write "(included)" although the meaning is clear, I think "(inclusive)" is more standard usage.
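Point 3 above asks whether routers could keep a table of past per-block values that an NMS collects at a slower cadence. A minimal Python sketch of such a table follows, assuming counts are keyed by the Block Number (BN) that Section 4.3 of the draft mentions. The `BlockCounterHistory` class and its fixed retention depth are hypothetical; this illustrates the reviewer's question and is not a mechanism the draft specifies.

```python
from collections import OrderedDict


class BlockCounterHistory:
    """Hypothetical per-router table of packet counts keyed by Block Number (BN).

    Retains the last `max_blocks` completed blocks, so an NMS may poll less
    often than once per color-switching cycle, as long as it polls before the
    entries it needs are aged out.
    """

    def __init__(self, max_blocks):
        if max_blocks < 1:
            raise ValueError("max_blocks must be at least 1")
        self.max_blocks = max_blocks
        self._counts = OrderedDict()  # BN -> packet count, oldest first

    def record_block(self, block_number, packet_count):
        """Store the final count of a completed block and age out old entries."""
        self._counts[block_number] = packet_count
        self._counts.move_to_end(block_number)
        while len(self._counts) > self.max_blocks:
            self._counts.popitem(last=False)  # drop the oldest block

    def collect(self, since_block):
        """Return every retained block whose BN is >= since_block."""
        return {bn: count for bn, count in self._counts.items() if bn >= since_block}


history = BlockCounterHistory(max_blocks=4)
for bn, count in enumerate([100, 98, 101, 99, 97, 102], start=1):
    history.record_block(bn, count)

print(history.collect(since_block=1))  # {3: 101, 4: 99, 5: 97, 6: 102}
```

With a retention depth of four blocks, an NMS polling every fourth cycle still sees every block; polling less often than that would lose the oldest entries, which is the trade-off the question implies.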
Paul Wouters
No Objection
Comment
(2022-07-14 for -02)
Not sent
I do support the DISCUSSes filed, especially those by Roman and Warren.
Roman Danyliw
(was Discuss)
No Objection
Comment
(2022-08-05)
Sent for earlier
Thank you for addressing my DISCUSS feedback.
Warren Kumari
(was Discuss)
No Objection
Comment
(2022-08-26)
Sent for earlier
Many of my original concerns around the controlled / limited domains text still remain; the document has attempted to address them, and, while not perfect, I think it is good enough (and the impact of escaping packets is IMO sufficiently low) that I am clearing my DISCUSS. Much thanks to the authors, and apologies for my somewhat pissy / snarky tone in the original DISCUSS.
Zaheduzzaman Sarker
(was Discuss)
No Objection
Comment
(2022-08-29)
Sent
Thanks for addressing my discuss point.
Martin Duke Former IESG member
Yes
(for -02)
Unknown
Alvaro Retana Former IESG member
(was Discuss)
No Objection
(2022-08-02)
Sent for earlier
[Thanks for addressing my DISCUSS.]
Lars Eggert Former IESG member
(was Discuss)
No Objection
(2022-07-12)
Sent for earlier
## Comments

### Section 1, paragraph 5
```
[RFC7276] provides a good overview of existing Operations, Administration, and Maintenance (OAM) mechanisms defined in the IETF, ITU-T, and IEEE. In the IETF, a lot of work has been done on fault detection and connectivity verification, while little has been thus far dedicated to performance monitoring. The IETF has defined standard metrics to measure network performance; however, its methods mainly focus on Active measurement techniques. For example, [RFC6374] defines mechanisms for measuring packet loss, one-way and two-way delay, and delay variation in MPLS networks, but its applicability to Passive measurements has some limitations, especially for connection-less networks.
```
This statement on the current state of work in the IETF might not age well in an RFC - maybe remove?

### Section 1.2, paragraph 0
```
1.2. Requirements Language
```
Like draft-ietf-ippm-rfc8889bis, this document is very light on RFC2119 language, which leaves it quite unclear what the actually specified technique is intended to be. There are a lot of options for various pieces, but not enough guidance on when an option MUST/SHOULD/MAY be chosen and why.

### Section 3.1, paragraph 17
```
Using a fixed timer for color switching offers better control over the method: the (time) length of the blocks can be chosen large enough to simplify the collection and the comparison of measures taken by different network devices. It's preferable to read the value of the counters not immediately after the color switch: some packets could arrive out of order and increment the counter associated with the previous block (color), so it is worth waiting for some time. A safe choice is to wait L/2 time units (where L is the duration for each block) after the color switch, to read the counter of the previous color. The drawback is that the longer the duration of the block, the less frequently the measurement can be taken.
```
Example of a paragraph that can be removed or moved, since the "fixed timer" option isn't being standardized.

### Section 4.3, paragraph 9
```
The Alternate-Marking Method described in this document literally splits the packets of the measured flow into different measurement blocks. An implementation MAY use a Block Number (BN) for data correlation. The BN MAY be assigned to each measurement block and associated with each packet count and timestamp reported to or pulled by other nodes or NMSs.
```
Shouldn't the second MAY be a MUST? It MAY be done, and if it's done it MUST be done in this specific way?

### Section 5, paragraph 8
```
It is assumed that all network devices are synchronized to a common reference time with an accuracy of +/- A/2. Thus, the difference between the clock values of any two network devices is bounded by A.
```
This should be made an RFC2119 deployment requirement, because if this cannot be guaranteed, wouldn't the mechanism break down?

### Section 5, paragraph 17
```
If the time duration L of each block is not so small, the synchronization requirement could be satisfied even with a relatively inaccurate synchronization method. This is true for packet loss and two-way delay measurement, but not for one-way delay measurement, where clock synchronization must be accurate. Therefore, a system that uses only packet loss and two-way delay measurement may not require a very precise synchronization. This is because the value of the clocks of network devices does not affect the computation of the two-way delay measurement.
```
This paragraph does not give actionable guidance on what clock sync scheme to use for a given deployment.

### Section 6, paragraph 5
```
Measurement points MAY simply ignore unmarked fragments and count marked fragments as full packets. However, if resources allow, measurement points MAY make note of both marked and unmarked initial fragments and only increment the corresponding counter if (a) other fragments are also marked, or (b) it observes all other fragments and they are unmarked.
```
Shouldn't this more strongly RECOMMEND noting marked fragments, and give resource consumption as a reason to not follow the recommendation?

### Section 7, paragraph 2
```
Either one or two flag bits might be available for marking in different deployments:
```
The combination of SHOULDs and MAYs in the two following paragraphs leaves things underspecified.

### Section 7.1, paragraph 3
```
For security reasons, the Alternate Marking Method is RECOMMENDED only for controlled domains.
```
Shouldn't this be a MUST-level requirement? I thought the security considerations were only valid inside a limited domain.

### Inclusive language

Found terminology that should be reviewed for inclusivity; see https://www.rfc-editor.org/part2/#inclusive_language for background and more guidance:

* Term `man`; alternatives might be `individual`, `people`, `person`

## Nits

All comments below are about very minor potential issues that you may choose to address in some way - or ignore - as you see fit. Some were flagged by automated tools (via https://github.com/larseggert/ietf-reviewtool), so there will likely be some false positives. There is no need to let me know what you did with these suggestions.

### Boilerplate

Document still refers to the "Simplified BSD License", which was corrected in the TLP on September 21, 2021. It should instead refer to the "Revised BSD License".

### Grammar/style

#### Section 3.1, paragraph 1
```
packets is an additional possibility but its detailed specification is out o
                                     ^^^^
```
Use a comma before "but" if it connects two independent clauses (unless they are closely connected and short).

#### Section 3.1, paragraph 18
```
where load-balancing is seldom used and static joins are frequently used to
                                    ^^^^
```
Use a comma before "and" if it connects two independent clauses (unless they are closely connected and short).

#### Section 3.1, paragraph 22
```
rent to intermediate nodes and independent from the path followed by traffic
                               ^^^^^^^^^^^^^^^^
```
The usual collocation for "independent" is "of", not "from". Did you mean "independent of"?

#### Section 13.2, paragraph 4
```
ut not detailed. o Explanation of the the intrinsic error in section 3.3.1 on
                                  ^^^^^^^
```
Possible typo: you repeated a word.

## Notes

This review is in the ["IETF Comments" Markdown format][ICMF]. You can use the [`ietf-comments` tool][ICT] to automatically convert this review into individual GitHub issues. Review generated by the [`ietf-reviewtool`][IRT].

[ICMF]: https://github.com/mnot/ietf-comments/blob/main/format.md
[ICT]: https://github.com/mnot/ietf-comments
[IRT]: https://github.com/larseggert/ietf-reviewtool
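The Section 3.1 text quoted in the review above gives the rationale for reading the previous color's counter L/2 after the color switch. The minimal Python sketch below illustrates that rationale under stated assumptions: a single flow, two colors (0 and 1), a fixed block duration L, and made-up arrival times. `count_color` and `block_loss` are hypothetical helpers, not part of the draft. A color-0 packet delayed past the switch is counted as lost if the counter is read immediately, but not if the read is deferred by L/2.

```python
# Hypothetical sketch: one flow, two colors (0 and 1), fixed block duration L.
# Each downstream arrival is recorded as (color, arrival_time).


def count_color(arrivals, color, read_at):
    """Count packets of `color` that reached the downstream counter by `read_at`."""
    return sum(1 for c, t in arrivals if c == color and t <= read_at)


def block_loss(upstream_count, arrivals, color, switch_time, block_duration, wait_fraction=0.5):
    """Loss for the block of `color`, reading the counter wait_fraction * L after the switch."""
    read_at = switch_time + wait_fraction * block_duration
    return upstream_count - count_color(arrivals, color, read_at)


L = 10.0            # block duration (arbitrary time units)
switch_time = 10.0  # color switches from 0 to 1 at t = L
upstream_sent = 5   # color-0 packets counted upstream during block 0

# Downstream arrivals: the fifth color-0 packet is delayed past the switch.
arrivals = [(0, 1.0), (0, 3.0), (0, 5.0), (0, 7.0), (0, 10.5),
            (1, 11.0), (1, 12.0)]

print(block_loss(upstream_sent, arrivals, 0, switch_time, L, wait_fraction=0.0))  # 1 (false loss)
print(block_loss(upstream_sent, arrivals, 0, switch_time, L))                     # 0
```

The trade-off named in the quoted text is visible here: a longer wait tolerates more reordering, but results for each block become available later.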
Robert Wilton Former IESG member
(was Discuss)
No Objection
(2022-09-08)
Sent for earlier
Thanks for updating the text.