Minutes IETF118: cats: Fri 08:30
minutes-118-cats-202311100830-00

Meeting Minutes Computing-Aware Traffic Steering (cats) WG
Date and time 2023-11-10 08:30
Title Minutes IETF118: cats: Fri 08:30
State Active
Last updated 2023-11-15

CATS Agenda for IETF 118 (Prague and Online)
Friday 10 November 2023
Room: Congress Hall 3
09:30 - 11:30 Prague

See Materials: https://datatracker.ietf.org/meeting/118/session/cats
Note taking: https://notes.ietf.org/notes-ietf-118-cats
Meetecho:
https://meetings.conf.meetecho.com/ietf118/?group=cats&short=cats&item=1

Zulip https://zulip.ietf.org/#narrow/stream/cats

Note takers: Cheng Li, Adrian Farrel, Daniel King

#1 09:30 5mins Title: Intro, WG Status
Presenter: Chairs
[Peng] Welcome to the third CATS meeting. All drafts are welcome.
Please submit the drafts before the meeting and discuss them on the
mailing list. It is premature for solutions drafts (in other WGs) to
cite CATS as motivation.

Working Group Status:
Milestone #1 Adopting Use Case draft has been achieved.

Total Attendees: 87(10:14) 95(10:48) 97(10:56)

#2 09:35 65mins Title: Use cases and requirements

#2a AI4ME and BBC CATS use cases 30mins
Presenter: Rajiv Ramdhany

[Tom Hill] The network on slide 9 looks a lot like the one I run. Are
you talking to BT? Contact me if you would like to continue the
discussion.
[Rajiv] Yes, we are talking to BT (at least Andy Gower), about the
object-based media preparation or content preparation side.
[Tom Hill] How do you see metric sharing, what kind of metric sharing
are you expecting to have? Do you see it as over the top or out of band,
or do you expect some change to Internet protocols?
[Rajiv] Right now, the answer is we do not know. We have multiple types
of deployments now, and we are trying to figure out what architecture
and metric collection is good for us. It could be inband, and it could
be CDN-like solutions.
[Daniel King] The architecture is fluid, we expect changes. We didn't
want to talk about what the underlay looks like. We want to see if CATS
sees commonality of required metrics and job scheduler, can we
reuse/develop together? We see opportunity for working with CATS on how
to expose service instance information from the provider to the
application orchestrator.

[Daniel Huang] There are two scenarios: Service Deployment and traffic
steering. Regarding Service deployment, some metrics may be collected
but they will not be dynamic, and there is no traffic steering.
Regarding Traffic steering, the metric will be collected from cloud side
and network side, and the metrics are far more dynamic than Service
Deployment. I am not really sure that we can find some common metrics
for these two scenarios.
[Rajiv] In order to provide suitable quality to end users to ensure
the user experience, we also monitor the experience of end users in real
time, and we need to collect the network metrics like latency, jitter,
bandwidth among service instances. We try to differentiate the metrics
in different phases. We also adapt the content using quality ladders.

[Cheng] Many thanks for the presentation, this is an interesting use
case. Hope to have more discussion on the mailing list and see if we can
bring this use case into the use case draft, because this seems to be a
use case where CATS might be used in the short term.
[Rajiv] Happy to do that.

[Weiqiang] From the architecture, the network nodes seem to not need
to be aware of the compute metrics. Right? But the orchestrator does
need to know it.
[Rajiv] You might need the orchestrator to do the work, and you might
need an SDN controller to do the job. But we haven't explored that
deeply.

[Adrian] Thank you for the presentation, hope CATS can work for your
requirements.

Comments from Meetecho Chat during the AI4ME presentation.

[Joel Halpern] While the presentation describes some really
interesting problems, I am having trouble understanding the relationship
to CATS. While there is some commonality of metrics, service
provisioning is not within scope of CATS as I understand it.
[Daniel King] The metrics are where we see overlap.
[Adrian Farrel] The job scheduler in that figure might be overlap.
[Daniel King] We are looking at framework for AI4ME and seeing if some
of the functional components defined by CATS could also be reused.
[Joel Halpern] @Daniel, there are metrics that overlap. I have no
problem making sure that the metrics are defined only once. I even have
no problem with other entities subscribing to the metric distribution.
But most of the metrics (like how much memory on the server is actually
in use, for deciding where to deploy new instances) are distinctly
outside CATS scope. I know that this also relates to the next
presentation.
@Adrian, as far as I can tell, no, the job scheduler is not within our
scope. The result of the job scheduler is instances that are within CATS
scope.
[Daniel King] The BBC has its own network, but it also interconnects
with BT in multiple locations. How we get CATS instance info could be a
challenge.
[Joel Halpern] Indeed, getting CATS-needed metrics across operational
boundaries is an interesting problem that does seem to fall within our
scope.
[Jim Guichard] @Joel, as I understand it, CATS is essentially
looking at the problem of how to use compute metrics (whatever they may
be) and network metrics to decide how to steer traffic through a
selected set.
[Joel Halpern] Trying to think where there may be overlap, it does
occur to me that there is one quasi-metric I have not seen referenced
that may be in the intersection. An indication of "please stop sending
new sessions to instance A", presumably as a step towards removing that
instance.
@Jim, what has become clear is that there are relevant compute metrics for
steering, and an overlapping but largely distinct set of metrics that
are important for compute instance placement.
[Daniel King] @Joel [Re: composite metric] Oh, I like that.
[Joel Halpern] (For folks who seem to be wondering why I am harping on
this; in my experience WGs that solve their chartered problem do MUCH
better than WGs that take on all the neighboring problems.)
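
The "please stop sending new sessions to instance A" indication that
Joel raises in the chat can be sketched as follows. This is only an
illustration under stated assumptions: the `Instance` type, the `load`
metric, and the `draining` flag are hypothetical names, not taken from
any CATS draft, and picking the least-loaded instance is just one
possible local policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instance:
    name: str
    load: float             # hypothetical compute metric, lower is better
    draining: bool = False  # hypothetical "no new sessions" indication


def pick_instance(session_id: str,
                  instances: List[Instance],
                  sessions: Dict[str, str]) -> Optional[str]:
    """Return the instance name for a session, honouring the drain flag."""
    # Existing sessions stay where they are, even on a draining instance.
    if session_id in sessions:
        return sessions[session_id]
    # New sessions only consider instances that are not draining.
    candidates = [i for i in instances if not i.draining]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda i: i.load)
    sessions[session_id] = chosen.name
    return chosen.name


instances = [Instance("A", 0.1, draining=True), Instance("B", 0.6)]
sessions = {"s1": "A"}
print(pick_instance("s1", instances, sessions))  # existing session
print(pick_instance("s2", instances, sessions))  # new session
```

In this sketch the flag affects only the placement of new sessions, so
an instance can be emptied gradually before it is removed.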

#2b Compute-Aware Metrics Working with ALTO
Presenter: Jordi Ros Giralt 10mins
Draft:
https://datatracker.ietf.org/doc/draft-rcr-opsawg-operational-compute-metrics/

[Adrian] Three questions for the room (unscientific)

  1. Who in the room would review compute metrics drafts?
     Maybe 30 hands

  2. Who in the room would work hard to produce text for compute metrics
     drafts?
     Maybe 20 hands

  3. Who in the room is so interested in compute metrics that they would
     persuade their employers to give up time and travel budget to attend
     a meeting to discuss them?
     No hands.

[Hang Shi] You mentioned two phases: deployment phase and traffic
steering phase. Do you think they will use a unified metric model or
different ones?
[Jordi] Some metrics may be different, and some may be common. They
need to be in agreement. We need to have a common language between
these two phases.

[Peng Liu] One point. The presentation mentioned that the set of
metrics may differ between use cases. But we need people to discuss and
reach consensus on a common set of metrics. Different use cases may
use a subset of this common set, and the subsets may differ between use
cases.
[Jordi] Yes, I think so.

[Adrian] I was getting depressed about the metric discussion in the WG
until recently we saw some discussions on the mailing list. Let's do
more and move faster.

#2c CATS Problem Statement, Use Cases, and Requirements
Presenter: Kehan Yao (Qing An) 15mins
Draft:
https://datatracker.ietf.org/doc/draft-ietf-cats-usecases-requirements/

Draft: https://datatracker.ietf.org/doc/draft-an-cats-usecase-ai/

[Kehan] Will merge the use case of the Computing-Aware AI model after this
meeting. Will update the use cases and requirements, and keep the discussion
going on the requirement for the metrics definition. Will update security
requirements and considerations.

[Jim Guichard as individual] From the WG perspective, we do have some
high level discussion of compute metrics and deciding which service
instance to use, but we do not have any specific discussion of service
identifiers. For example, how to map the service ID to a specific
service instance in the real network, and what the exact metrics are for
this service. We had some discussions in SFC WG but haven't solved it
yet.
[Adrian] Yes. The problem is even worse for CATS. In SFC all packets
in a flow will be forwarded to the same service instance, but in CATS
little clusters of packets go to a specific service instance, so packets
might be forwarded to different service instances. The problem should be
solved.

[Joel Halpern] You had a requirement for "default interpretation of
metrics" that does not match our agreement. Hope to revise it. How to
use the metric to steer the traffic is a local matter so saying how a
metric will be used is out of scope. Should add a note to tell people
who would like to define new metrics that if a node does not support an
unknown metric, the node will ignore it.
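
A minimal sketch of the "ignore unknown metrics" behaviour Joel
describes, assuming a node that keeps a set of metric names it
understands. The metric names and the `filter_known_metrics` helper are
hypothetical, not taken from any CATS draft. How the remaining metrics
are then used for steering stays a local matter.

```python
from typing import Dict

# Hypothetical set of metric names this node understands.
SUPPORTED_METRICS = {"cpu_load", "queue_delay_ms"}


def filter_known_metrics(advertised: Dict[str, float]) -> Dict[str, float]:
    """Keep only the metrics this node supports; silently drop the rest."""
    return {name: value
            for name, value in advertised.items()
            if name in SUPPORTED_METRICS}


# An advertisement that carries one metric this node does not know.
advert = {"cpu_load": 0.4, "gpu_temp_c": 71.0, "queue_delay_ms": 3.5}
usable = filter_known_metrics(advert)
print(usable)
```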

[Daniel Huang] Two comments for the requirements part. 1. Talking about
traffic steering, there are two criteria: one is the metrics that we have,
the other is the requirements from the service (latency, b/w etc) for
E2E. I do not see those requirements in that document. 2. About
discovering services, it will be better to change the text to say
mapping a service ID to a service instance, instead of mapping a service
ID to an IP address, because a service ID might not be an IP address.
[Kehan] We had some discussions already about the first comment. We hope
people can focus on the discussion on the mailing list, so we can move
forward better.
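
Daniel's second comment, mapping a service ID to service instances
rather than to an IP address, can be sketched as below. Everything here
is an assumption for illustration: the `ServiceInstance` type, the
`REGISTRY` table, the ID format, and the example locators are
hypothetical and do not come from the requirements draft.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ServiceInstance:
    instance_id: str  # hypothetical instance name
    locator: str      # how to reach it; need not be an IP address


# Hypothetical registry keyed by an opaque service ID (not an IP address).
REGISTRY: Dict[str, List[ServiceInstance]] = {
    "svc:transcode": [
        ServiceInstance("transcode-edge-1", "2001:db8::10"),
        ServiceInstance("transcode-dc-1", "site-7/rack-3/pod-12"),
    ],
}


def instances_for(service_id: str) -> List[ServiceInstance]:
    """Map a service ID to its known service instances (empty if unknown)."""
    return list(REGISTRY.get(service_id, []))


for inst in instances_for("svc:transcode"):
    print(inst.instance_id, inst.locator)
```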

#2d Problem Statement and Requirements of end-to-end CATS 10mins
Presenter: Yuan Dongyu
Draft:
https://datatracker.ietf.org/doc/draft-yuan-cats-end-to-end-problem-requirement/

[Adrian speaking as contributor, not chair] I think you make some good
points. Please go and look at the most recent version of the ldbc
framework draft. In San Francisco, comments made led to an update that
introduced the Service Contact Instance that provides aggregation of
Service Instances. I think this is probably the function you want to
see.

[Cheng] Similar comment. The requirements are similar to the
requirements in the existing requirement draft. Hope to have more
cooperation between the two drafts, and see if we can merge the text into
the existing gap analysis and requirement draft. Invitation to work with
the co-authors.
[Yuan] We would like to have some cooperation.

[Xing Zhao From chat window] What is the difference between this draft
and the existing requirement draft? Hope to merge them.
[Adrian In chat window] We will have a single requirement draft. But
it is NOT unreasonable to propose new ideas by submitting new drafts.
They can then merge into the requirements draft as you suggested.

#3 10:40 20mins Title: Framework and architecture

#3a Hybrid Computing and Network Awareness and Routing Solution for CATS

Presenter: Xinxin Yi 10mins
Draft: https://datatracker.ietf.org/doc/draft-yi-cats-hybrid-solution/

[Xinxin] This is my first presentation at the IETF. Please forgive my
nervousness.
[Adrian] That went OK, Xinxin. No need to be nervous!
[Adrian] Please read the discussion in the chat for the meeting.

[Adrian] I think it is fine and good that people write up their ideas
in new Internet-Drafts for discussion. But please do not expect that all
parallel drafts will advance separately. Your long-term objective should
be to work out how your ideas can be merged into the existing WG draft
and the framework draft. A merger will not take 100% of the text
from both drafts. So please look at those drafts and suggest how some of
your ideas can be included as text in the other drafts. This applies to
other people posting drafts as well.

[Dongyu Yuan] For delay-sensitive services and applications, the
distributed architecture is suggested, and for resource allocation, the
centralized architecture is suggested, is that right?
[Xinxin] Yes.
[Dongyu] For different service flows, I'd like to see a unified
architecture for devices, orchestrators, users that share similar
functionality for different service flows. Maybe this could be next
steps for the work.
[Xinxin] A centralised system can include a network controller.
[Peng] Please move the discussion to the mailing list.

#3b Computing and Network Information Awareness (CNIA) system
architecture for CATS
Presenter: Daniel Huang 10mins
Draft:
https://datatracker.ietf.org/doc/draft-yao-cats-awareness-architecture/

[Daniel Huang] We do not intend to push the draft alone. Instead, we
would like to add some components into the existing draft. The idea in
this draft is complementary to the current architecture draft, and we
would like to discuss with the authors of the framework draft to see
what can be added into that draft. Agree with Adrian's comment on
cooperating between drafts.

[Peng Liu] I see some overlap of the authors between the drafts,
so please go ahead to discuss and try to merge the drafts.

[Julien Maisonneuve From chat window] I support Adrian's view: there
are far too many drafts to properly focus. Ultimately we should strictly
target the WG's milestones.

#4 11:00 10mins Title: Gap analysis
Presenter: Kehan Yao
Draft: https://datatracker.ietf.org/doc/draft-yao-cats-gap-analysis/

[Adrian] This is charter work, but can be background work as the
understanding of use cases and requirements develops. Please try to move
your requirements into the WG draft. Then you can just refer to that
draft (with requirement number). Do not introduce new requirements as a
surprise in this document.

[Julien] When we create some requirements, we should motivate them in
some form. Like, linking each one to specific use cases (where it comes
from). Currently, these requirements don't have background.
[Julien] Some solutions here are unfairly criticized. E.g., hyperscalers
use LB in the Internet and it works quite well. Should be careful when
criticizing a solution and provide detailed evidence. Single point of
failure of LB is not proved. When talking about DNS, what you are
presenting is not quite DNS, it is a system that you have based on
some DNS aspects (using the DNS protocol in some way). And then you
describe the shortcoming of your solution, but not DNS. Similarly for
ALTO you describe a way to use ALTO and then describe the shortcomings
of the way you use ALTO. Please provide precise arguments to prove the
analysis. We can build some solutions based on DNS and ALTO to avoid the
shortcoming described in the document. Please try to focus on the point
that is problematic.

[Luis] Like Julien's comments, you said the shortcoming of ALTO is the
signalling latency, and we should work on it with more details to prove
the conclusion.

#5 11:10 20mins Title: Open Discussion and Next Steps
Chairs and ALL

[Adrian] 3 things.

  1. Draft merging: if you meet some challenges, please let the chairs
    know, and we can help.
  2. Slide format errors happen because your ppt is translated to pdf.
    Please check your slides after they have been uploaded.
  3. Two liaisons from ITU-T arrived very recently. The chairs plan to look
    at these soon. Please feel free to comment on the list if you have any
    input.

[Julien] I have a concern about the way the WG is working. The list has
been very quiet except for the last two weeks. I wish any work was done
in the open and we saw more traffic on the mailing list. Too many drafts -
we have two milestones, why do we have so many drafts? There is a need
to work together and this working together needs to be done on the list
and not behind closed doors.

[Adrian] See you next time.