CATS Agenda for IETF 118 (Prague and Online)
Friday 10 November 2023
Room: Congress Hall 3
09:30 - 11:30 Prague

See Materials: https://datatracker.ietf.org/meeting/118/session/cats
Note taking: https://notes.ietf.org/notes-ietf-118-cats
Meetecho: https://meetings.conf.meetecho.com/ietf118/?group=cats&short=cats&item=1
Zulip: https://zulip.ietf.org/#narrow/stream/cats
Note takers: Cheng Li, Adrian Farrel, Daniel King

\#1 09:30 5mins Title: Intro, WG Status
Presenter: Chairs
\[Peng\] Welcome to the third CATS meeting. All drafts are welcome. Please submit the drafts before the meeting and discuss them on the mailing list. It is premature for solutions drafts (in other WGs) to cite CATS as motivation.
Working Group Status: Milestone #1 Adopting Use Case draft has been achieved.
Total Attendees: 87 (10:14), 95 (10:48), 97 (10:56)

\#2 09:35 65mins Title: Use cases and requirements

\#2a AI4ME and BBC CATS use cases 30mins
Presenter: Rajiv Ramdhany
\[Tom Hill\] The network on slide 9 looks a lot like the one I run. Are you talking to BT? Contact me if you would like to continue the discussion.
\[Rajiv\] Yes, we are talking to BT (at least Andy Gower) about the object-based media preparation or content preparation side.
\[Tom Hill\] How do you see metric sharing? What kind of metric sharing are you expecting to have? Do you see it as over the top or out of band, or do you expect some change to Internet protocols?
\[Rajiv\] Right now, the answer is we do not know. We have multiple types of deployments now, and we are trying to figure out what architecture and metric collection are good for us. It could be in-band, and it could be CDN-like solutions.
\[Daniel King\] The architecture is fluid, and we expect changes. We didn't want to talk about what the underlay looks like. We want to see if CATS sees commonality of required metrics and job scheduler. Can we reuse/develop together?
We see an opportunity for working with CATS on how to expose service instance information from the provider to the application orchestrator.
\[Daniel Huang\] There are two scenarios: service deployment and traffic steering. For service deployment, some metrics may be collected but they will not be dynamic, and there is no traffic steering. For traffic steering, the metrics will be collected from the cloud side and the network side, and they are far more dynamic than for service deployment. I am not really sure that we can find common metrics for these two scenarios.
\[Rajiv\] In order to provide suitable quality to end users and ensure the user experience, we also monitor the experience of end users in real time, and we need to collect network metrics like latency, jitter, and bandwidth among service instances. We try to differentiate the metrics in different phases. We also adapt the content using quality ladders.
\[Cheng\] Many thanks for the presentation, this is an interesting use case. Hope to have more discussion on the mailing list and see if we can include this use case in the use case draft, because this seems to be a use case for which CATS might be used in the short term.
\[Rajiv\] Happy to do that.
\[Weiqiang\] From the architecture, the network nodes do not seem to need to be aware of the compute metrics. Right? But the orchestrator does need to know them.
\[Rajiv\] You might need the orchestrator to do the work, and you might need an SDN controller to do the job. But we haven't explored that deeply.
\[Adrian\] Thank you for the presentation, hope CATS can work for your requirements.

Comments from Meetecho Chat during the AI4ME presentation.
\[Joel Halpern\] While the presentation describes some really interesting problems, I am having trouble understanding the relationship to CATS. While there is some commonality of metrics, service provisioning is not within scope of CATS as I understand it.
\[Daniel King\] The metrics are where we see overlap.
\[Adrian Farrel\] The job scheduler in that figure might be an overlap.
\[Daniel King\] We are looking at a framework for AI4ME and seeing if some of the functional components defined by CATS could also be reused.
\[Joel Halpern\] @Daniel, there are metrics that overlap. I have no problem making sure that the metrics are defined only once. I even have no problem with other entities subscribing to the metric distribution. But most of the metrics (like how much memory on the server is actually in use, for deciding where to deploy new instances) are distinctly outside CATS scope. I know that this also relates to the next presentation. @Adrian, as far as I can tell, no, the job scheduler is not within our scope. The result of the job scheduler is instances that are within CATS scope.
\[Daniel King\] The BBC has its own network, but it also interconnects with BT in multiple locations. How we get CATS instance info could be a challenge.
\[Joel Halpern\] Indeed, getting CATS-needed metrics across operational boundaries is an interesting problem that does seem to fall within our scope.
\[Jim Guichard\] @joel from how I understand it CATS is essentially looking at the problem of how to use compute metrics (whatever they may be) and network metrics to decide how to steer traffic through a selected set.
\[Joel Halpern\] Trying to think where there may be overlap, it does occur to me that there is one quasi-metric I have not seen referenced that may be in the intersection: an indication of "please stop sending new sessions to instance A", presumably as a step towards removing that instance. @Jim, what has become clear is that there are relevant compute metrics for steering, and an overlapping but largely distinct set of metrics that are important for compute instance placement.
\[Daniel King\] @Joel \[Re: composite metric\] Oh, I like that.
\[Joel Halpern\] (For folks who seem to be wondering why I am harping on this: in my experience WGs that solve their chartered problem do MUCH better than WGs that take on all the neighboring problems.)

\#2b Compute-Aware Metrics Working with ALTO
Presenter: Jordi Ros Giralt 10mins
Draft: https://datatracker.ietf.org/doc/draft-rcr-opsawg-operational-compute-metrics/
\[Adrian\] Three questions for the room (unscientific):
1. Who in the room would review compute metrics drafts? Maybe 30 hands.
2. Who in the room would work hard to produce text for compute metrics drafts? Maybe 20 hands.
3. Who in the room is so interested in compute metrics that they would persuade their employers to give up time and travel budget to attend a meeting to discuss them? No hands.
\[Hang Shi\] You mentioned two phases: the deployment phase and the traffic steering phase. Do you think they will use a unified metric model or different ones?
\[Jordi\] Some metrics may be different, and some may be common. They need to be in agreement. We need to have a common language between these two phases.
\[Peng Liu\] One point. It was mentioned that a common set of metrics may be different in different use cases. But we need people to discuss and reach consensus on a common set of metrics. Different use cases may use a subset of this common set, and the subsets may differ between use cases.
\[Jordi\] Yes, I think so.
\[Adrian\] I was getting depressed about the metric discussion in the WG until recently, when we saw some discussions on the mailing list. Let's do more and move faster.

\#2c CATS Problem Statement, Use Cases, and Requirements
Presenter: Kehan Yao (Qing An) 15mins
Draft: https://datatracker.ietf.org/doc/draft-ietf-cats-usecases-requirements/
Draft: https://datatracker.ietf.org/doc/draft-an-cats-usecase-ai/
\[Kehan\] Will merge the use case of Computing-Aware AI model after this meeting. Will update the use case and requirements, and keep the discussion going on the requirements for the metrics definition.
Will update the security requirements and considerations.
\[Jim Guichard as individual\] From the WG perspective, we do have some high-level discussion of compute metrics and deciding which service instance to use, but we do not have any specific discussion of service identifiers. For example, how to map the service ID to a specific service instance in the real network, and what the exact metrics are for this service. We had some discussions in the SFC WG but haven't solved it yet.
\[Adrian\] Yes. The problem is even worse for CATS. In SFC, all packets in a flow will be forwarded to the same service instance, but in CATS little clusters of packets go to a specific service instance, so packets might be forwarded to different service instances. The problem should be solved.
\[Joel Halpern\] You had a requirement for "default interpretation of metrics" that does not match our agreement. Hope to revise it. How to use the metric to steer the traffic is a local matter, so saying how a metric will be used is out of scope. The draft should add a note to tell people who would like to define new metrics that if a node receives a metric it does not support, the node will ignore it.
\[Daniel Huang\] Two comments on the requirements part.
1. Talking about traffic steering, there are two criteria: one is the metrics that we have, and the other is the requirements from the service (latency, b/w, etc.) for E2E. I do not see those requirements in the document.
2. About discovering services, it would be better to change the text to say mapping a service ID to a service instance, instead of mapping a service ID to an IP address, because a service ID might not be an IP address.
\[Kehan\] We have already had some discussions about the first comment. We hope people can focus on the discussion on the mailing list, so that we can move forward better.
\#2d Problem Statement and Requirements of end-to-end CATS 10mins
Presenter: Yuan Dongyu
Draft: https://datatracker.ietf.org/doc/draft-yuan-cats-end-to-end-problem-requirement/
\[Adrian speaking as contributor, not chair\] I think you make some good points. Please go and look at the most recent version of the ldbc framework draft. In San Francisco, comments made led to an update that introduced the Service Contact Instance that provides aggregation of Service Instances. I think this is probably the function you want to see.
\[Cheng\] Similar comment. The requirements are similar to the requirements in the existing requirement draft. Hope to have more cooperation between the two drafts, and see if we can merge the text into the existing gap analysis and requirement draft. Invitation to work with the co-authors.
\[Yuan\] We would like to have some cooperation.
\[Xing Zhao from chat window\] What is the difference between this draft and the existing requirement draft? Hope to merge them.
\[Adrian in chat window\] We will have a single requirement draft. But it is NOT unreasonable to propose new ideas by submitting new drafts. They can then merge into the requirements draft as you suggested.

\#3 10:40 20mins Title: Framework and architecture

\#3a Hybrid Computing and Network Awareness and Routing Solution for CATS
Presenter: Xinxin Yi 10mins
Draft: https://datatracker.ietf.org/doc/draft-yi-cats-hybrid-solution/
\[Xinxin\] This is my first presentation at the IETF. Please forgive my nervousness.
\[Adrian\] That went OK, Xinxin. No need to be nervous!
\[Adrian\] Please read the discussion in the chat for the meeting.
\[Adrian\] I think it is fine and good that people write up their ideas in new Internet-Drafts for discussion. But please do not expect that all parallel drafts will advance separately. Your long-term objective should be to work out how your ideas can be merged into the existing WG draft and the framework draft.
A merger will not take 100% of the text from both drafts. So please look at those drafts and suggest how some of your ideas can be included as text in the other drafts. This applies to other people posting drafts as well.
\[Dongyu Yuan\] For delay-sensitive services and applications, the distributed architecture is suggested, and for resource allocation, the centralized architecture is suggested. Is that right?
\[Xinxin\] Yes.
\[Dongyu\] For different service flows, I'd like to see a unified architecture for devices, orchestrators, and users that share similar functionality for different service flows. Maybe this could be a next step for the work.
\[Xinxin\] A centralised system can include a network controller.
\[Peng\] Please move the discussion to the mailing list.

\#3b Computing and Network Information Awareness (CNIA) system architecture for CATS
Presenter: Daniel Huang 10mins
Draft: https://datatracker.ietf.org/doc/draft-yao-cats-awareness-architecture/
\[Daniel Huang\] We do not intend to push the draft alone. Instead, we would like to add some components into the existing draft. The idea in this draft is complementary to the current architecture draft, and we would like to discuss with the authors of the framework draft to see what can be added into that draft. Agree with Adrian's comment on cooperating between drafts.
\[Peng Liu\] I see some overlap of the authors between the drafts, so please go ahead to discuss and try to merge the drafts.
\[Julien Maisonneuve from chat window\] I support Adrian's view: there are far too many drafts to properly focus. Ultimately we should strictly target the WG's milestones.

\#4 11:00 10mins Title: Gap analysis
Presenter: Kehan Yao
Draft: https://datatracker.ietf.org/doc/draft-yao-cats-gap-analysis/
\[Adrian\] This is charter work, but can be background work as the understanding of use cases and requirements develops. Please try to move your requirements into the WG draft.
Then you can just refer to that draft (with requirement number). Do not introduce new requirements as a surprise in this document.
\[Julien\] When we create some requirements, we should motivate them in some form, such as linking each one to specific use cases (where it comes from). Currently, these requirements don't have background.
\[Julien\] Some solutions here are unfairly criticized. E.g., hyperscalers use LB in the Internet and it works quite well. You should be careful when criticizing a solution and provide detailed evidence. The single point of failure of LB is not proved. When talking about DNS, what you are presenting is not quite DNS; it is a system that you have based on some DNS aspects (using the DNS protocol in some way). And then you describe the shortcomings of your solution, but not of DNS. Similarly for ALTO, you describe a way to use ALTO and then describe the shortcomings of the way you use ALTO. Please provide precise arguments to prove the analysis. We can build some solutions based on DNS and ALTO that avoid the shortcomings described in the document. Please try to focus on the point that is problematic.
\[Luis\] As with Julien's comments: you said the shortcoming of ALTO is the signalling latency, and we should work on it in more detail to prove the conclusion.

\#5 11:10 20mins Title: Open Discussion and Next Steps
Chairs and ALL
\[Adrian\] 3 things.
1. Draft merging: if you meet some challenges, please let the Chairs know, and we can help.
2. Slide format errors, because your PPT is converted to PDF: please check your slides after they have been uploaded.
3. Two liaisons from ITU-T arrived very recently. The chairs plan to look at these soon. Please feel free to comment on the list if you have any input.
\[Julien\] I have a concern about the way the WG is working. The list has been very quiet except for the last two weeks. I wish any work was done in the open and we saw more traffic on the mailing list. Too many drafts - we have two milestones, why do we have so many drafts?
There is a need to work together and this working together needs to be done on the list and not behind closed doors.
\[Adrian\] See you next time.