
Early Review of draft-ietf-pce-sr-p2mp-policy-13
review-ietf-pce-sr-p2mp-policy-13-rtgdir-early-ceccarelli-2025-11-20-00

Request Review of draft-ietf-pce-sr-p2mp-policy
Requested revision No specific revision (document currently at 13)
Type Early Review
Team Routing Area Directorate (rtgdir)
Deadline 2025-11-21
Requested 2025-11-06
Requested by Dhruv Dhody
Authors Hooman Bidgoli , Daniel Voyer , Anuj Budhiraja , Rishabh Parekh (editor) , Siva Sivabalan
I-D last updated 2025-10-19 (Latest revision 2025-10-19)
Completed reviews Rtgdir Early review of -13 by Daniele Ceccarelli
Opsdir Early review of -13 by Yingzhen Qu
Assignment Reviewer Daniele Ceccarelli
State Completed
Request Early review on draft-ietf-pce-sr-p2mp-policy by Routing Area Directorate Assigned
Posted at https://mailarchive.ietf.org/arch/msg/rtg-dir/opLbTSmYFtwpnp5U5wMIBVjbgTA
Reviewed revision 13
Result Has nits
Completed 2025-11-20
Hello WG and authors,

I've been selected as the RTG-DIR reviewer for this draft.
I think the draft is mature and well written. There are some operational and
backward-compatibility considerations to address and some nits to fix. The
draft is well aligned with the SR Policy architecture, reuses existing PCEP
constructs, and extends them in a logical way. The use of replication segments
is clean and matches existing SR semantics.

Pardon my inclination to consider the draft from an operational point of view,
but I wanted to share a couple of thoughts on complexity and scaling. Given that
P2MP policies involve replication segments on possibly many transit nodes, the
state (on the PCE and PCCs) may grow significantly. Could this be an issue? Have
the authors considered it? Can PCE/PCC platforms scale to support a large number
of Path Instances and replication segments, especially in large multicast trees?
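
To make the scaling question concrete, here is a rough back-of-the-envelope
sketch in Python. It assumes, purely for illustration, that every node in a
tree holds one replication segment per Path Instance. The function names, the
example topology, and the instances_per_tree parameter are hypothetical and
not taken from the draft.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a tree is an adjacency map from a node
# to the downstream nodes it replicates to.
Tree = Dict[str, List[str]]


def count_tree_nodes(tree: Tree, root: str) -> int:
    """Count the distinct nodes reachable from the root of one tree."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(tree.get(node, []))
    return len(seen)


def estimate_state(trees: List[Tuple[Tree, str]], instances_per_tree: int = 2) -> int:
    """Estimate replication-segment entries a PCE would track.

    Assumption (not from the draft): one entry per tree node per Path
    Instance, with the same number of instances for every tree.
    """
    return sum(count_tree_nodes(tree, root) * instances_per_tree
               for tree, root in trees)


# Example: root R, transit nodes T1 and T2, leaves L1..L3.
example_tree = {"R": ["T1", "T2"], "T1": ["L1", "L2"], "T2": ["L3"]}
total_entries = estimate_state([(example_tree, "R")], instances_per_tree=2)
```

Under this assumption the state grows with the product of tree size and the
number of Path Instances kept per tree, which is why I am asking about large
multicast trees.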

Another potential concern is fragmentation (Section 4.3.9), which adds
operational complexity: dealing with fragmentation of PCEP messages might lead
to interoperability risks if different implementations fragment differently.

I'm not a P2MP expert, but when it comes to global optimization, the draft says
“make before break” is supported. I'm wondering whether traffic could be lost
or mis-replicated, especially for critical leaf nodes. Maybe not; I just wanted
to make sure that is not the case.

Manageability & Monitoring: The draft’s manageability considerations (Section
9) mention liveness detection, but it may be helpful to add more detailed
guidance: how to monitor per-PTI health, replication correctness, leaf
reachability, and detect silent failures (e.g., a replication node dropping
traffic).

Stateful vs Stateless PCE: The draft explicitly scopes only stateful PCE;
stateless PCE is out of scope. This avoids backward-compatibility issues in
stateless deployments, but operators migrating from stateless setups will need
to move to a stateful PCE, which is potentially a big lift. Is it possible to
have legacy environments with a stateless setup? Would they need to be migrated
to stateful?

A few typos/nits, plus some unclear text that would benefit from rephrasing to
improve readability:

- Section 4.3 s/Prodecures/Procedures
- Section 5.5.1, TLV “SRPOLICY-CPATH-PREFRENCE” — “PREFRENCE” should be
“PREFERENCE.”

- Introduction (first paragraph):
“A SR P2MP Policy is constructed using one or more Replication segments … from
a Root node to a set of Leaf nodes, optionally through a set of intermediate
transit nodes that perform replication.” This sentence is dense. Suggest
splitting:

“An SR P2MP Policy uses one or more replication segments (per RFC 9524) to
deliver data from a Root node to multiple Leaf nodes. Optionally, intermediate
(transit) nodes may be used to replicate data, forming a tree.”

- Section 4.3.4: The draft describes “make-before-break” but does not clearly
articulate the timing or coordination: perhaps rephrase to clarify that “PCE
may compute a new globally optimized PTI, but only signal it to the PCC once
ready; the switch occurs once the new PTI is installed on all relevant nodes,
thereby minimizing disruption.”
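
As an illustration of the ordering I have in mind for make-before-break, here
is a minimal Python sketch. It is not the draft's procedure: the PathInstance
and Policy classes, the ack() callback, and the node names are all
hypothetical, and the sketch assumes the PCE learns of each node's successful
installation individually.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PathInstance:
    """Hypothetical model of one path instance of a P2MP tree."""
    instance_id: int
    nodes: Set[str]                                   # nodes needing state
    installed: Set[str] = field(default_factory=set)  # nodes that confirmed

    def ready(self) -> bool:
        # Ready only when every node of the instance has confirmed.
        return self.installed == self.nodes


@dataclass
class Policy:
    """Hypothetical PCE-side view of a policy with an active instance."""
    active: PathInstance
    candidate: Optional[PathInstance] = None

    def stage(self, new: PathInstance) -> None:
        # "Make": install the new instance alongside the active one.
        self.candidate = new

    def ack(self, node: str) -> None:
        # Record that a node has installed its part of the candidate.
        if self.candidate is not None and node in self.candidate.nodes:
            self.candidate.installed.add(node)

    def try_switch(self) -> bool:
        # "Break": switch only once the candidate is fully installed.
        if self.candidate is None or not self.candidate.ready():
            return False
        self.active, self.candidate = self.candidate, None
        return True


old = PathInstance(1, {"R", "T1", "L1"}, {"R", "T1", "L1"})
policy = Policy(active=old)
policy.stage(PathInstance(2, {"R", "T2", "L1"}))
policy.ack("R")
policy.ack("T2")
switched_early = policy.try_switch()   # L1 has not confirmed yet
policy.ack("L1")
switched_late = policy.try_switch()    # all nodes confirmed
```

The point of the sketch is only that the old instance keeps carrying traffic
until every node of the new one has confirmed; text that states this ordering
explicitly would address my comment.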

- Manageability (Section 9): The phrase “Verify Correct Operations” is vague.
It would help to define more precisely what “correct operations” means: e.g.,
leaf reachability, replication fidelity, path instance liveness.

Thanks
Daniele