
Early Review of draft-ietf-cats-usecases-requirements-06
review-ietf-cats-usecases-requirements-06-rtgdir-early-robles-2025-05-18-00

Request
Review of: draft-ietf-cats-usecases-requirements
Requested revision: No specific revision (document currently at 07)
Type: Early Review
Team: Routing Area Directorate (rtgdir)
Deadline: 2025-05-18
Requested: 2025-04-26
Requested by: Adrian Farrel
Authors: Kehan Yao, Luis M. Contreras, Hang Shi, Shuai Zhang, Qing An
I-D last updated: 2025-06-10 (Latest revision 2025-06-10)
Completed reviews: Rtgdir Early review of -06 by Ines Robles
Comments:
This is a request for an early review. The document is approaching being considered "ready".
It is not certain whether this document will be pursued to RFC, but the WG will still need to call consensus on the document.

Assignment
Reviewer: Ines Robles
State: Completed
Request: Early review on draft-ietf-cats-usecases-requirements by Routing Area Directorate Assigned
Posted at: https://mailarchive.ietf.org/arch/msg/rtg-dir/KWa9ftJnOgqWxHTrDA8Kr_dCmao
Reviewed revision: 06 (document currently at 07)
Result: Not ready
Completed: 2025-05-18
This is a rtg-dir review of draft-ietf-cats-usecases-requirements-06

Summary:

This document provides the problem statement and typical scenarios for
Computing-Aware Traffic Steering (CATS), which show the need to consider more
factors when steering traffic to the appropriate computing resource in order to
better meet customer expectations. The document also describes the CATS
requirements.

The document presents good examples and requirements. My comments and
suggestions follow.

Suggestions/Comments:

1- It would be useful to add a definition of a "CATS system".

2-Section 5.1:
"R1: MUST provide a discovery and resolving method for the mapping of a service
identifier to a specific address.
 R2: MUST provide a method to determine the availability of a service instance."

2.1- As currently written, R1 and R2 do not explicitly guarantee that
discovery and availability assessment are dynamic.

2.2- R1 does not say whether this mapping is done once at boot, updated
periodically, or resolved dynamically on request. A static mapping (e.g., DNS or
a configuration file) would satisfy this requirement, which contradicts the CATS
goal of reacting to changing conditions. What about something like: "R1: MUST
provide a dynamic discovery and resolution method for mapping a service
identifier to one or more current service instance addresses, based on
real-time system state."?

2.3- In R2, "determine the availability" is vague: a one-time health check
would satisfy it. It does not guarantee continuous or reactive monitoring, which
is essential for dynamic steering. What about something like: "R2: MUST provide
a method to dynamically assess the availability and readiness of service
instances, based on up-to-date status metrics (e.g., health, load,
reachability)."?
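For illustration only, the difference between a static mapping and the dynamic resolution suggested in 2.2 and 2.3 can be sketched in Python. This is not a mechanism from the draft; the names `InstanceStatus`, `ServiceResolver`, `report`, `resolve`, and the `max_age_s` threshold are hypothetical. The resolver answers each request from the latest status reports and returns only instances that are healthy and recently reported.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InstanceStatus:
    """Latest status reported by one service instance (hypothetical schema)."""
    address: str
    healthy: bool
    load: float         # e.g., utilization in [0, 1]
    reported_at: float  # seconds since the epoch


@dataclass
class ServiceResolver:
    """Maps a service identifier to currently available instance addresses."""
    max_age_s: float = 5.0
    _status: Dict[str, Dict[str, InstanceStatus]] = field(default_factory=dict)

    def report(self, service_id: str, status: InstanceStatus) -> None:
        # A new report replaces the previous one for the same instance.
        self._status.setdefault(service_id, {})[status.address] = status

    def resolve(self, service_id: str, now: Optional[float] = None) -> List[str]:
        # Evaluated on every request against current state, unlike a
        # mapping fixed at boot time or in a configuration file.
        now = time.time() if now is None else now
        candidates = [
            s for s in self._status.get(service_id, {}).values()
            if s.healthy and now - s.reported_at <= self.max_age_s
        ]
        # Least-loaded instances first.
        return [s.address for s in sorted(candidates, key=lambda s: s.load)]
```

An instance that stops reporting silently drops out of the answer once its last report is older than `max_age_s`, which a one-time health check or a static DNS entry would not capture.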

3- Section 5.2:"R3: MUST agree on using metrics.. and their representation
among service instances..."

3.1- It is not clear who must agree: the operators? the implementations? at
design time or runtime?

3.2- What do you think about adding a requirement regarding freshness
indicators and staleness handling for metrics? In my view, when a system
"understands" a metric, semantic interoperability goes beyond just knowing what
the metric represents and how it is encoded. It also includes understanding how
recent or valid the data is.
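As a hedged sketch of what freshness indicators could look like, each metric value below carries the time it was collected and a validity period set by its producer, and a consumer refuses to steer on an expired value. The fields `collected_at` and `valid_for_s` and the helper `usable_value` are hypothetical, not defined in the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricSample:
    """A metric value plus freshness information (hypothetical schema)."""
    name: str
    value: float
    collected_at: float  # when the producer measured the value (seconds)
    valid_for_s: float   # how long the producer considers the value usable

    def age(self, now: float) -> float:
        return now - self.collected_at

    def is_fresh(self, now: float) -> bool:
        # A negative age (timestamp in the future, e.g., clock skew) is
        # treated as not fresh.
        return 0.0 <= self.age(now) <= self.valid_for_s


def usable_value(sample: MetricSample, now: float) -> Optional[float]:
    """Return the value if it is still valid, otherwise None, so that the
    consumer can fall back (e.g., to a default policy) instead of steering
    on stale data."""
    return sample.value if sample.is_fresh(now) else None
```

What the fallback should be (ignore the instance, use a default, request an update) is exactly the kind of staleness handling a requirement could ask the solution to specify.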

3.3- "R8: There MUST set up metric information that can be understood by CATS
components...."

3.3.1- The requirement lacks a clear subject. It is unclear who is responsible
for setting up the metric information: the service instance, the ingress node,
a controller, or the operator?

3.3.2- The phrase "set up metric information" is vague. It is not evident what
this setup entails: does it refer to configuring metric schemas, encoding data,
or runtime registration? Also, this requirement is not verifiable; that is, how
would someone test or confirm that the requirement has been met in practice?

3.3.4- What about something like: "All metric information used in CATS MUST be
produced and encoded in a format that is understood by participating CATS
components. CATS components will ignore any metrics that they do not understand
or support."?
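The "ignore what you do not understand" behavior in the R8 wording proposed above can be sketched as follows. This is an illustration under assumptions: the metric names and expected value types in `SUPPORTED_METRICS` are hypothetical, and a real encoding would be defined by the solution documents.

```python
from typing import Dict, List, Mapping, Tuple

# Metric types this hypothetical CATS component understands, with the value
# type it expects for each. Names are illustrative, not from the draft.
SUPPORTED_METRICS: Dict[str, type] = {
    "cpu_utilization": float,
    "queue_length": int,
    "network_delay_ms": float,
}


def accept_metrics(received: Mapping[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Split received metrics into (accepted, ignored).

    A metric with an unknown name, or whose value does not have the expected
    type, is ignored rather than treated as an error.
    """
    accepted: Dict[str, object] = {}
    ignored: List[str] = []
    for name, value in received.items():
        expected = SUPPORTED_METRICS.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if expected is not None and isinstance(value, expected) and not isinstance(value, bool):
            accepted[name] = value
        else:
            ignored.append(name)
    return accepted, ignored
```

Returning the ignored names, rather than dropping them silently, also gives an operator something observable, which helps with the verifiability concern raised in 3.3.2.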

3.4- "R9: The computing metrics in CATS MUST be simple, that is distributing
metrics and selecting path based on these metrics will not cause routing loops
and route oscillation."

"Simple" is subjective and vague. What about replacing "simple" with the actual
intent of the requirement (freedom from loops and oscillations), with something
like: "R9: The computation and use of metrics in CATS MUST be designed to avoid
introducing routing loops or path oscillations when metrics are distributed and
used for path selection."?
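One well-known way to damp oscillation, which the draft does not prescribe, is hysteresis: keep the current instance unless an alternative is better by a clear margin. The sketch below assumes a lower-is-better cost per instance; the function name and the 20% default margin are hypothetical. It addresses oscillation only; loop freedom depends on the forwarding model and is not covered here.

```python
from typing import Dict, Optional


def select_with_hysteresis(
    costs: Dict[str, float],
    current: Optional[str],
    margin: float = 0.2,
) -> str:
    """Pick the lowest-cost instance, but keep the current one unless the
    best alternative is cheaper by more than `margin` (a fraction, e.g.,
    0.2 = 20%). `costs` must be non-empty.

    The margin damps flapping between instances whose advertised costs are
    close to each other or fluctuate around each other.
    """
    best = min(costs, key=lambda k: costs[k])
    if current is None or current not in costs:
        return best
    if costs[best] < costs[current] * (1.0 - margin):
        return best
    return current
```

For example, with costs `{"a": 10.0, "b": 9.5}` and current instance `"a"`, traffic stays on `"a"`; it moves to `"b"` only once `"b"` drops below 8.0.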

3.5- What about adding a requirement related to the negotiation or discovery of
metric types or capabilities? Perhaps something like: "R#: CATS components
SHOULD support a mechanism to advertise or negotiate supported metric types and
encodings to ensure compatibility across implementations"?
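A minimal sketch of such a negotiation, assuming each side advertises the metric types it supports and, for each, the encodings it can handle in preference order. The function `negotiate` and the encoding labels are hypothetical; the result is the set of metric types both sides can use, each with one common encoding.

```python
from typing import Dict, List


def negotiate(
    local: Dict[str, List[str]],
    remote: Dict[str, List[str]],
) -> Dict[str, str]:
    """Return the metric types both sides support, each mapped to one common
    encoding. `local` lists encodings in local preference order; the first
    one the peer also supports is chosen. Metric types with no common
    encoding are left out."""
    agreed: Dict[str, str] = {}
    for metric, encodings in local.items():
        peer = set(remote.get(metric, []))
        for enc in encodings:
            if enc in peer:
                agreed[metric] = enc
                break
    return agreed
```

For example, if the local side offers `cpu_utilization` as `float32` or `percent_u8` and `queue_length` as `uint32`, while the peer offers `cpu_utilization` as `percent_u8` or `float32` and `queue_length` only as `uint16`, the agreed set is `{"cpu_utilization": "float32"}`.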

4- Section 5.3:

4.1- The draft states: "It has to be determined at what interval or based on
what events such information needs to be distributed." It is unclear who is
responsible for making this determination.

4.2- The draft states: "thanks to the comprehensive load...". It is not clear
to me what "comprehensive load" means here.

4.3- The draft states: "While existing routing protocols may serve as a
baseline for signaling metrics, other means to convey the metrics can equally
be considered and even be realized." It may be helpful to briefly mention some
examples of alternative dissemination mechanisms and to clarify the scenarios
in which such alternatives may be more appropriate than routing protocols.
Likewise, it would be useful to include examples of situations where routing
protocols are suitable for metric dissemination.

4.4- In "R11: MUST declare the entity that collect metrics.", what about
rephrasing it as "R11: MUST specify which entity is responsible for collecting
metrics."?

4.5- In "R14: MUST be clear of the update frequency of CATS metrics and its
corresponding distribution method.", what about rephrasing it as something like
"R14: MUST specify the update frequency of CATS metrics and its corresponding
distribution method."?
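To show what "specifying the update frequency" might involve, here is a sketch of a policy that combines periodic and event-triggered distribution, with a rate limit. It is one possible design, not something the draft defines; the class `UpdatePolicy` and its parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdatePolicy:
    """Decides when a metric should be re-advertised (hypothetical policy)."""
    min_interval_s: float    # never advertise more often than this
    max_interval_s: float    # always refresh at least this often
    change_threshold: float  # relative change that triggers an early update
    last_value: Optional[float] = None
    last_sent_at: Optional[float] = None

    def should_send(self, value: float, now: float) -> bool:
        if self.last_value is None or self.last_sent_at is None:
            return True  # nothing advertised yet
        elapsed = now - self.last_sent_at
        if elapsed < self.min_interval_s:
            return False  # rate limit to damp churn
        if elapsed >= self.max_interval_s:
            return True  # periodic refresh
        base = abs(self.last_value) or 1.0  # avoid dividing by zero
        return abs(value - self.last_value) / base >= self.change_threshold

    def record_sent(self, value: float, now: float) -> None:
        self.last_value = value
        self.last_sent_at = now
```

The three parameters correspond directly to the interval and event questions raised in 4.1, so a solution that states its values for them would make R14 testable.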

4.6- The draft states: "Sometimes, a metric that is chosen is not accurate for
service instance selection, in such case, a desirable system..." It would be
helpful to include a reference or guidance on how metric accuracy is defined in
this context, and how it can be measured or evaluated.

5- Section 5.4:

5.1- The draft states: "The decision logic of the instance selection are
subject to the normal packet level communication..." What is meant by "normal
packet level communication"?

5.2- The draft states: "...the access point might change and successively lead
to the same result of the change of service instance..." The phrase
"successively lead to the same result of the change of" is hard to follow;
kindly rephrase it.

5.3- The draft states: "If execution changes from one (e.g., virtualized)
service instance to another, state/context needs transfer to another." "needs
transfer to another" --> "needs to be transferred to the new instance" ?

5.4- In "R16: Instance affinity MUST be maintained when the transaction is
stateful", note that state may persist not only within a single transaction but
also across a session involving multiple transactions. Therefore, the
requirement should possibly refer to both stateful sessions and transactions.
What about something like: "R16: CATS systems MUST maintain instance affinity
for stateful sessions or transactions"?

5.5- In "R17: Instance affinity MUST be maintained for service requests or
transactions that belong to the same flow.", the term "flow" is ambiguous in
this context. It is unclear whether it refers to a transport-layer flow or to an
application-layer flow, such as a session or a user interaction. Kindly add a
definition in draft-ietf-cats-framework and reference it here, if applicable.

5.6- In "R18: MUST avoid keeping fine runtime-state granularity in network
nodes for providing instance affinity. For example, as mentioned above,
maintaining per-flow states for a specific APP.", what does "fine" granularity
mean? How fine is too fine?
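R17 and R18 pull in opposite directions: affinity per flow, but no per-flow state. One known technique that reconciles them, not proposed in the draft, is rendezvous (highest-random-weight) hashing: each node computes the instance from the flow identifier itself, so nodes that see the same instance set agree without storing anything per flow. The sketch assumes a flow is the transport 5-tuple, which is only one of the interpretations raised in 5.5; `pick_instance` and `_score` are hypothetical names.

```python
import hashlib
from typing import Iterable, Tuple

# Assumed flow identifier: the transport 5-tuple
# (source address, destination address, protocol, source port, destination port).
FlowId = Tuple[str, str, int, int, int]


def _score(flow: FlowId, instance: str) -> int:
    # Deterministic hash of (flow, instance). hashlib is used instead of the
    # built-in hash(), which is randomized per process for strings.
    key = repr((flow, instance)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick_instance(flow: FlowId, instances: Iterable[str]) -> str:
    """Rendezvous hashing: map the flow to the instance with the highest
    score. No per-flow state is kept; removing an instance only remaps the
    flows that were mapped to it."""
    return max(instances, key=lambda inst: _score(flow, inst))
```

The trade-off is that affinity is only as stable as the instance set: adding an instance moves the flows for which the new instance scores highest, which may be unacceptable for stateful sessions (R16) unless state transfer is supported.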

5.7- In "R20: SHOULD support the UE and service instance mobility."
It is unclear whether "support" refers to session continuity, seamless
handover, detection only, or some other behavior. What about "R20: SHOULD
support service continuity in the presence of UE or service instance mobility."?

6- Section 5.5:

6.1- The draft states: "Exposing the information of computing resources to the
network may lead to the leakage of computing domain and application privacy."
The phrase "the information of computing resources" is a bit awkward, and it is
unclear what "computing domain privacy" refers to. Kindly add a definition in
draft-ietf-cats-framework and reference it here, if applicable.

6.2- The draft states: "In order to prevent it, it need to consider the methods
to process the sensitive information related to computing domain." "it need to
consider" --> "it is necessary to consider"; "to computing domain"-->"to the
computing domain" ?

6.3- The draft states: "At the same time, when anonymity is achieved, it is
also necessary to consider whether the computing information exposed in the
network can help make full use of traffic steering". The phrase "help make full
use of" is ambiguous. What about: "At the same time, when anonymity is achieved,
it is important to ensure that the exposed computing information remains
sufficient to enable effective traffic steering."?

7- Section 6

7.1- The draft states: "Some security issues need to be considered..." -->
"Security issues need to be considered..."?

7.2- In "R22: service data MUST be protected from interception.", "service
data" is undefined: is it application-layer data, control plane data, or
metrics?

7.3- In "R23: the nature of user's activities SHOULD be hidden as much as
possible.", "hidden as much as possible" is vague and not measurable. What about
something like "R23: The nature of a user's activities SHOULD be protected to
preserve user privacy, including minimizing the exposure of identifying or
behavioral patterns."?

7.4- In "R24: secure advertisements are REQUIRED to prevent rogue nodes from
participating in the network" Secure how? Authenticated? Encrypted?
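To illustrate why the distinction matters: authentication lets a receiver reject advertisements from nodes that do not hold a valid key, whereas encryption only hides the content. The sketch below shows authentication alone, using HMAC-SHA256 from the Python standard library over a canonical JSON encoding. The shared key, the message layout, and the function names are assumptions for illustration; real deployments would depend on key management and on whatever authentication the chosen dissemination protocol provides.

```python
import hashlib
import hmac
import json
from typing import Optional


def _canonical(payload: dict) -> bytes:
    # Stable encoding so that sender and receiver MAC the same bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_advertisement(payload: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 tag to a (hypothetical) metric advertisement."""
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_advertisement(message: dict, key: bytes) -> Optional[dict]:
    """Return the payload if the tag verifies, otherwise None (drop it)."""
    expected = hmac.new(key, _canonical(message["payload"]), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the tag.
    if hmac.compare_digest(expected, message.get("tag", "")):
        return message["payload"]
    return None
```

Note that the payload still travels in clear text, so this alone would not meet a confidentiality requirement such as R22.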

7.5- In "R25: When making service decisions, the security status of computing
resources SHOULD be taken into consideration.", what does "security status"
mean? Is it a trust level? A threat score?

8- Nits:

8.1- Define PE in Figure 1.

8.2- "specificly" --> "specifically"

8.3- "Qualty of Service" --> "Quality of Service"

8.4- "anonymous methods" --> "anonymization methods"

8.5- "data of services maybe stolen," --> "...data of services may be
stolen,..."?

8.6- "round trip network delay(network), which should be bounded to
20-1.5-5.5-7.9 = 5.1ms." --> "Round-trip network delay: the remaining latency
budget is 5.1 ms, calculated as 20 - 1.5 - 7.9 - 5.5 = 5.1 ms."?

8.7- "Each instance provides equivalent service functionality to their
respective clients." --> "Each instance provides equivalent service
functionality to its respective clients."

Thanks for this document,

Ines.