
IETF Last Call Review of draft-ietf-cats-usecases-requirements-10
review-ietf-cats-usecases-requirements-10-tsvart-lc-sarker-2025-12-17-00

Request Review of draft-ietf-cats-usecases-requirements
Requested revision No specific revision (document currently at 14)
Type IETF Last Call Review
Team Transport Area Review Team (tsvart)
Deadline 2025-12-17
Requested 2025-12-03
Authors Kehan Yao, Luis M. Contreras, Hang Shi, Shuai Zhang, Qing An
I-D last updated 2026-05-20 (Latest revision 2026-02-02)
Completed reviews Rtgdir Early review of -07 by Ines Robles (diff)
Tsvart IETF Last Call review of -10 by Zaheduzzaman Sarker (diff)
Dnsdir IETF Last Call review of -10 by Jim Reid (diff)
Genart IETF Last Call review of -10 by Roni Even (diff)
Artart IETF Last Call review of -10 by Tim Bray (diff)
Secdir IETF Last Call review of -11 by Daniel Migault (diff)
Rtgdir IETF Last Call review of -10 by Linda Dunbar (diff)
Opsdir IETF Last Call review of -12 by Samier Barguil (diff)
Artart Telechat review of -12 by Tim Bray (diff)
Tsvart Telechat review of -12 by Zaheduzzaman Sarker (diff)
Assignment Reviewer Zaheduzzaman Sarker
State Completed
Request IETF Last Call review on draft-ietf-cats-usecases-requirements by Transport Area Review Team Assigned
Posted at https://mailarchive.ietf.org/arch/msg/tsv-art/CYYBVJSyrupO0YWRE407avzXj8c
Reviewed revision 10 (document currently at 14)
Result Not ready
Completed 2025-12-17
This document has been reviewed as part of the transport area review team's
ongoing effort to review key IETF documents. These comments were written
primarily for the transport area directors, but are copied to the document's
authors and WG to allow them to address any issues raised and also to the IETF
discussion list for information.

When done at the time of IETF Last Call, the authors should consider this
review as part of the last-call comments they receive. Please always CC
tsv-art@ietf.org if you reply to or forward this review.

I didn't find any particular issues related to transport protocols. However,
this document made me wonder about a number of things, hence I am not sure I
understood the context and requirements properly. With better understanding my
view on transport protocol issues might change.

As I had to read and understand this document, I encountered lots of issues,
which I have noted below. I believe addressing those would make this document more
understandable. From that point of view I don't think this document is ready to
be published.

By reading this document it seems CATS is trying to solve issues from the
application layer to the routing layer. However, it does not give clear hints
about where it should converge. Many times it describes an application
requirement and tries to justify network responsibilities, which confused me a
number of times. It does not provide a clear description of the ingestion
points in the network and assumes the CATS service providers have full control
of the service instance deployment.

# Introduction

     It says

              Offloading compute intensive processing to the user devices is
              not acceptable, since it would place pressure on local resources
              such as the battery and incur some data privacy issues if the
              needed data for computation is not provided locally.

     Is that even an option for CATS or other network resources deployments? If
     not, then why is it mentioned here?

     What is an "edge site"? What is the "edge of a network"? I haven't found
     any definition or description of those in this document to fully
     understand the meaning in the context.

     The introduction gives an impression that CATS is only for edge computing.
     Is that the intention?

# Definition of terms

      The difference between "Network edge" and the "provider network" that
      is mentioned in the CATS framework draft is not very clear to me.
      What are the differences? And why is "Network edge" defined here and not
      used in the rest of the document?

      What kind of service identity are we talking about? Is this a service
      identity that can be obtained by a TLS client? Or an ALTO endpoint
      identifier? Or something else?

# Problem statement

       It says -
             , a number of representative cities have deployed multi-edge sites
             and the typical applications, and there are more edge sites to be
             deployed in the future.

       I find this unnecessary, especially when no information is provided
       to verify this claim.

       Section 3.1 mentions one expired ALTO draft and the ALTO protocol, but
       it does not really say, if ALTO helps pick the best node in the
       network to reach, what is left to be fixed and what differentiates the
       ALTO solution from CATS.

       As per charter CATS is for network nodes to pick the right place to
       deploy the services. But I fail to see that part in this problem
       statement.

       What is the difference between an "edge node" and an "edge site"? And
       what is their relation to "service site" and "service instance" as
       defined in the CATS framework? If they are supposed to mean the same,
       why aren't the CATS framework defined terms used here?

        It says -

           If the resources are insufficient to support new instances, the
           operator can be informed to increase the hardware resources.

        OK, fine. Does CATS do that? Do we need a protocol to inform the
        operator? Who tells them about the need to invest in hardware? Sorry,
        but why are we talking about this here in the problem statement?

         We have a section in the problem statement that talks about
         multi-deployment of the edge sites and services, but then it ends
         saying "where to locate service instances and when to create new ones
         in order to provide the right levels of resource to support user
         demands" is out of scope of CATS. So, what are really the problems
         here?

         ## Section 3.2 says -

            Traffic is steered to an edge site that is "closest" or to one of a
            few "close" sites using load-balancing

         Who is steering this, and is this load-balancing static? Is there
         support for mid-session steering and load-balancing? If these are
         dynamic and only done at the beginning of the client sessions, then it
         should already be possible to pick the right edge site rather than the
         closest one.

        "we assume": who are "we"? The authors, the WG, or the IETF?

        Please describe an "edge router".

        It says
              selection of one of candidate service instances is done using
              traffic steering methods, where the steering decision may take
              into account pre-planned policies (assignment of certain clients
              to certain service instances), realize shortest-path to the
              'closest' service instance, or utilize more complex and possibly
              dynamic metric information, such as load of service instances,
              latency experienced or similar, for a more dynamic selection of a
              suitable service instance.

         Why can't anycast routing be used here? Or is that the idea here?

         It says

                 It is important to note that clients may move. This means that
                 the service instance that was "best" at one moment might no
                 longer be best when a new service request is issued.

        OK, will CATS solve the issue of mid-session mobility? The service
        instances will have state, and that state needs to be migrated to the
        new site, so it is not a plug-and-play solution unless the service is
        completely stateless. Does this mean the services that CATS can
        entertain need to be stateless? This seems like a requirement on the
        service instances. I am asking this because, as far as I know, the big
        use cases written in the document (AR, VR, or vehicles) maintain a lot
        of state in the servers and at the client, and they even need
        stickiness. Just moving to a lightly loaded service instance might not
        be the ideal solution. So, what is the main problem CATS is going to
        solve in this context?

# Section 4.1

        The meaning of "dynamically steer traffic" is not clear to me. Does
        it mean steering at the start of a service session, mid-session
        steering, or both? Can this be clarified?

# Section 4.2

        The first paragraph describes the need to increase the
        transmission capacity, video processing and network bandwidth. I fail
        to see how this relates to CATS.

        It says -

            The notion of sending the request to the "nearest" edge node is
            important for being able to collate the video information of
            "nearby" vehicles, using relative location information among the
            vehicles. Furthermore, data privacy may lead to a requirement to
            process the data by an edge node (or an adjacent vehicle as a
            cluster node ) as close to the source as possible to limit the
            data's spread across many network components in the network

        So, here the video should not be processed anywhere else, so it is kind
        of fixed by the "nearest edge node" policy. Do we need CATS for this?

        The 3rd paragraph starts to talk about "closest", but the previous
        paragraph was discussing "nearest". Do they mean the same
        thing? If yes, then please use the same terminology.

        According to my understanding, these scenarios can be satisfied by the
        ALTO protocol, as there the network provides information to the clients
        to pick the right service instance or even change. So, it is not clear
        to me why this is a CATS use case.

        Also, I didn't find any discussion of moving speed. Vehicles usually
        move fast, which means it is possible that by the time CATS realizes the
        compute or bandwidth is loaded, the vehicle has moved to another
        base station that might need a new PE/edge site altogether. What are
        the considerations on the requirements regarding this?

# Section 4.3

        This section is clear about the use case of having decentralized
        storage, but previous use cases are not clear about how they relate to
        CATS.

# Requirements

       R1 : is this to inform the clients accessing the service instances? Or
       is this for the CATS system to decide where to send a client request?
       The discussion on applications throughout the document so far made me
       wonder about this. What is "real-time system state"?

       R2 : see comments for R1. What is the periodicity for the "up-to-date
       status"? Without a proper understanding of that, it is hard to
       understand the requirement.

       R3 : are these service instances in the participating edges
       administered, implemented and deployed by several entities in the
       service provider's network? If not, then why is this a requirement? Now,
       even if we agree on a certain metric, say "CPU load", and my instance has
       5 fully loaded CPUs and 5 idle ones, then my instance is 50% loaded.
       Will that be an understandable metric for another instance about my
       situation?

       R4 : Who are "we" again? And I am not sure I understand
       the requirements (there are actually two requirements here). What are
       the requirements on the CATS system?

       R5 : it seems like a requirement on the resource model, not on CATS.
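
To make the "CPU load" question under R3 concrete, here is a minimal sketch. It assumes per-CPU utilization is reported as a value between 0.0 and 1.0; the instances, the helper names, and the idle threshold are hypothetical and are not taken from the draft. It only shows that one aggregate number can describe two quite different situations.

```python
from statistics import mean


def average_load(per_cpu_load):
    """Collapse per-CPU utilization (0.0 to 1.0) into one aggregate value."""
    return mean(per_cpu_load)


def idle_cpus(per_cpu_load, threshold=0.05):
    """Count CPUs at or below an (assumed) idle threshold."""
    return sum(1 for load in per_cpu_load if load <= threshold)


# Hypothetical instance A: 5 fully loaded CPUs and 5 idle ones.
instance_a = [1.0] * 5 + [0.0] * 5
# Hypothetical instance B: 10 CPUs, each half loaded.
instance_b = [0.5] * 10

# Both instances report the same aggregate "CPU load" ...
print(average_load(instance_a), average_load(instance_b))
# ... but only instance A has whole CPUs free for a new task.
print(idle_cpus(instance_a), idle_cpus(instance_b))
```

Whether the receiver of such a metric can tell these two cases apart depends on what the metric definition says about aggregation, which is the point of the question above.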

       R6 : not clear at all. What is an "agent" here?

       R7 : not executable unless we have someone/something deciding on the
       usefulness. My resource model may be very useful to me, but can be
       garbage to anyone else. I picked up my last car just because it has the
       best sound system, my friend didn't find that useful at all.

       R8 : not sure this is a system requirement; rather, it seems like a
       requirement on how the working group should decide if they ever create
       metrics.

       R9 : again we have more than one requirement here, and I simply
       don't get the reason for a SHOULD in the applicability for a non-CATS
       network. By the way, is CATS a system or a network?

       R10 and R11 are good requirements as they are easily understandable and
       executable. However, I am not sure how the fast changing compute load
       can be handled to avoid path oscillation. How do they work together?
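
The oscillation concern can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are mine and not the draft's: two hypothetical sites "A" and "B", all new traffic steered to the site with the lowest reported load, and the load of a site following the traffic by the next metric update.

```python
def steer(loads):
    """Send all new traffic to the site with the lowest reported load."""
    return min(loads, key=loads.get)


def simulate(rounds, traffic=0.6, base=0.2):
    """Return the site chosen in each metric-update round."""
    # Reported load at the start: site "A" is busy, site "B" is not.
    loads = {"A": base + traffic, "B": base}
    choices = []
    for _ in range(rounds):
        chosen = steer(loads)
        choices.append(chosen)
        # By the next update the chosen site carries the traffic and the
        # other site has drained back to its base load.
        loads = {
            site: base + (traffic if site == chosen else 0.0)
            for site in loads
        }
    return choices


print(simulate(6))  # ['B', 'A', 'B', 'A', 'B', 'A']
```

In this toy model the steering decision flips on every update, which is the kind of interaction between R10 and R11 that I would like to see discussed.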

       R12 to R17, you can simply create two requirements from all of them -
       one for metric collection and one for metric distribution.

           This section uses "service provider" along with GPU utilization. Is
           this a CATS service provider or a cloud service provider? If it is
           CATS, then I suggest referencing the CATS framework definition.

       R18 : it says
            the affinity to a particular service instance may span more than
            one request, as in the AR/VR use case, where the previous client
            input is needed to render subsequent frames.

          To me this is more stickiness than affinity. That means when there is
          strict stickiness the CATS system MUST NOT migrate the service
          instance, but rather try to pick a site where the stickiness can be
          preserved. How does CATS know if the session/transaction is stateful?
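
A minimal sketch of what strict stickiness would imply for instance selection. Everything here is hypothetical and not from the draft: the function, the instance names, and in particular the `stateful` flag, which simply assumes that something tells the selector whether a session carries state. Where that knowledge would come from is exactly the open question.

```python
def select_instance(session_id, stateful, loads, pinned):
    """Pick a service instance for one request.

    loads:  dict mapping instance name to its reported load
    pinned: dict mapping session id to the instance it is stuck to
    """
    # A stateful session stays on its pinned instance while it is reachable.
    if stateful and pinned.get(session_id) in loads:
        return pinned[session_id]
    # Otherwise fall back to the least loaded instance.
    chosen = min(loads, key=loads.get)
    if stateful:
        pinned[session_id] = chosen
    return chosen


pinned = {}
first = select_instance("s1", True, {"edge-1": 0.3, "edge-2": 0.7}, pinned)
# The load picture changes, but the stateful session must not move.
later = select_instance("s1", True, {"edge-1": 0.9, "edge-2": 0.1}, pinned)
# A stateless request is free to follow the load.
other = select_instance("s2", False, {"edge-1": 0.9, "edge-2": 0.1}, pinned)
print(first, later, other)  # edge-1 edge-1 edge-2
```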

       R19 : what is the difference between R18 and this one?

       R21 : is this a requirement on CATS or the application clients?

# Appendix A:

       This might be helpful to some, but I am not sure why they are kept here
       while the text says - It is a temporary and procedural section which
       might be deleted or merged in future updates.

# Appendix B:

        I find it strange that Appendix B yields some normative requirements
        but is not part of the main body. I would suggest either removing
        the normative reference, or moving them into the main body of the
        document's use cases (and rechartering the working group to work on
        this, as they are so interesting and important that they cannot be
        removed altogether).