CATS Metrics Definition
draft-ysl-cats-metric-definition-02
This document is an Internet-Draft (I-D).
Anyone may submit an I-D to the IETF.
This I-D is not endorsed by the IETF and has no formal standing in the
IETF standards process.
The information below is for an old version of the document.
| Field | Value |
|---|---|
| Document type | Internet-Draft; this is an older version of an Internet-Draft whose latest revision state is "Replaced". |
| Authors | Kehan Yao, Hang Shi, Cheng Li, Luis M. Contreras, Jordi Ros-Giralt |
| Last updated | 2024-11-08 (Latest revision 2024-11-06) |
| Replaced by | draft-ietf-cats-metric-definition |
| RFC stream | Internet Engineering Task Force (IETF) |
| WG state | Candidate for WG Adoption |
| Document shepherd | (None) |
| IESG state | I-D Exists |
| Consensus boilerplate | Unknown |
| Telechat date | (None) |
| Responsible AD | (None) |
| Send notices to | (None) |
Computing-Aware Traffic Steering                                Y. Kehan
Internet-Draft                                              China Mobile
Intended status: Informational                                    H. Shi
Expires: 10 May 2025                                               C. Li
                                                     Huawei Technologies
                                                         L. M. Contreras
                                                              Telefonica
                                                           J. Ros-Giralt
                                                   Qualcomm Europe, Inc.
                                                         6 November 2024
CATS Metrics Definition
draft-ysl-cats-metric-definition-02
Abstract
This document defines a set of computing metrics used for Computing-
Aware Traffic Steering (CATS).
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 10 May 2025.
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved.
Kehan, et al. Expires 10 May 2025 [Page 1]
Internet-Draft CATS Metrics November 2024
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
2. Conventions and Definitions
3. Definition of Metrics
  3.1. Level 0: Raw Metrics
  3.2. Level 1: Normalized Metrics in Categories
  3.3. Level 2: Fully Normalized Metric.
4. Representation of Metrics
  4.1. Level 0 Metric Representation
    4.1.1. Compute Raw Metrics
    4.1.2. Storage Raw Metrics
    4.1.3. Network Raw Metrics
    4.1.4. Delay Raw Metrics
    4.1.5. Considerations on the Sources of Metrics and the Statistics
  4.2. Level 1 Metric Representation
    4.2.1. Normalized Compute Metrics
    4.2.2. Normalized Storage Metrics
    4.2.3. Normalized Network Metrics
    4.2.4. Normalized Delay
    4.2.5. Considerations on the Sources of Metrics and the Statistics
  4.3. Level 2 Metric Representation
5. Comparison of three layers of metric
6. Security Considerations
7. IANA Considerations
8. References
  8.1. Normative References
  8.2. Informative References
Authors' Addresses
1. Introduction
Many computing services are deployed in a distributed way. In such a
deployment mode, multiple service instances are deployed in multiple
service sites to provide an equivalent service to end users. In order
to provide better service to end users, a framework called Computing-
Aware Traffic Steering (CATS) [I-D.ietf-cats-framework] is defined.
CATS is a traffic engineering approach that takes into account the
dynamic nature of computing resources and network state to optimize
service-specific traffic forwarding towards a given service contact
instance [I-D.ietf-cats-framework]. Various metrics may be used to
enforce such computing-aware traffic steering policies.
To steer traffic to a service contact instance, CATS components (C-PS,
C-Forwarders, etc.) need information about the service instance's
computing status. In addition to network-related metrics, a common
definition of relevant computing metrics is essential for effective
coordination between network devices and compute instances.
Standardized metrics enable precise traffic steering decisions that
optimize resource utilization and improve overall system performance.
Various considerations for metric definition are proposed in
[I-D.du-cats-computing-modeling-description], which are useful in
defining computing metrics.
Based on the considerations defined in
[I-D.du-cats-computing-modeling-description], this document defines
relevant computing metrics for CATS by categorizing the metrics into
three levels based on their complexity and granularity.
2. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
This document uses the following terms defined in
[I-D.ietf-cats-framework]:
* Computing-Aware Traffic Steering (CATS).
* Service.
* Service contact instance.
3. Definition of Metrics
Definition and usage of specific metrics are related to the intended
use case. However, when considering disseminating compute metrics to
network devices, appropriate categorization and abstraction of CATS
metrics is required in order to avoid introducing extra complexity
into the network.
This document defines three abstract metric levels to meet the
different requirements of the use cases listed in
[I-D.ietf-cats-usecases-requirements]:
* Level 0 (L0): Raw metrics. The metrics are not abstracted, so
different metrics use their own unit and format as used within a
compute orchestration domain.
* Level 1 (L1): Normalized metrics in categories. The metrics are
categorized into multiple dimensions, such as network, computing,
and storage. Each category metric is normalized into a value or a
set of values with a range of scores.
* Level 2 (L2): Fully normalized metric. Metrics are normalized
into a single value. The category information or raw metrics
information cannot be interpreted from the value directly.
3.1. Level 0: Raw Metrics
Level 0 metrics encompass detailed, raw metrics, including but not
limited to:
* CPU: Base Frequency, Number of Cores, Boosted Frequency, Memory
Bandwidth, Memory Size, Utilization Ratio, Core Utilization Ratio,
Power Consumption.
* GPU: Frequency, Number of Render Units, Memory Bandwidth, Memory
Size, Memory Utilization Ratio, Core Utilization Ratio, Power
Consumption.
* NPU: Computing Power, Utilization Ratio, Power Consumption.
* Network: Bandwidth, Capacity, Throughput, TXBytes, RXBytes,
HostBusUtilization.
* Storage: Available Space, Read Speed, Write Speed.
* Delay: Time taken to process a request.
Detailed information about an L0 metric can be exposed through an
Application Programming Interface (API), e.g., a RESTful API, and
different services have their own metrics with different information
elements. L0 metrics are used widely in IT systems.
Regarding network-related raw metrics, the IPPM WG has defined many
types of metrics in [performance-metrics]. [RFC9439] also defines many
packet performance and throughput/bandwidth metrics. Regarding
computing metrics, [I-D.rcr-opsawg-operational-compute-metrics] lists
a set of cloud resource metrics.
3.2. Level 1: Normalized Metrics in Categories
The metrics in L1 are organized into categories, and abstraction is
applied to each category. L0 raw metrics can be
classified into multiple categories, such as computing, networking,
storage, and delay. In each category, the metrics are normalized
into a value that represents the state of a resource. Potential
categories are shown below:
* Computing: A normalized value generated from the computing-related
L0 metrics, such as CPU/GPU/NPU L0 metrics.
* Networking: A normalized value generated from the network-related
L0 metrics.
* Storage: A normalized value generated from the storage L0
metrics.
* Delay: A normalized value generated from computing/networking/
storage metrics, reflecting the processing delay of a request.
Editor note: detailed categories can be updated according to the CATS
WG discussion.
The L0 metrics, such as the ones defined in [performance-metrics],
[RFC9439], and [I-D.rcr-opsawg-operational-compute-metrics], can be
classified into the above categories. Each category uses its own
method (weighted sum, etc.) to generate the normalized value. In
this way, the protocol only needs to carry the metric categories and
their normalized values, and avoids processing the detailed metrics.
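As an illustration of the per-category normalization described above,
the following Python sketch derives a computing score from a few L0
values using a weighted sum. The metric names, the weights, and the 0
to 100 score range are assumptions chosen for the example; this
document does not mandate a particular method.

```python
# Hypothetical weights for the computing category; the names and values
# are illustrative assumptions, not defined by this document.
COMPUTE_WEIGHTS = {
    "cpu_utilization": 0.5,
    "gpu_utilization": 0.3,
    "npu_utilization": 0.2,
}


def normalize_category(raw, weights, scale=100):
    """Collapse L0 utilization ratios (0.0 to 1.0) into one L1 score.

    Returns an integer in [0, scale]. A higher score means a busier
    resource, which is an assumption made for this sketch.
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * raw[name] for name in weights)
    return round(scale * weighted / total_weight)


l0_compute = {
    "cpu_utilization": 0.80,
    "gpu_utilization": 0.50,
    "npu_utilization": 0.00,
}
compute_score = normalize_category(l0_compute, COMPUTE_WEIGHTS)
print(compute_score)
```

Other categories (networking, storage, delay) could reuse the same
function with their own weight tables.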
3.3. Level 2: Fully Normalized Metric.
The L2 metric is a one-dimensional value derived from a weighted sum
of L1 metrics or from L0 metrics directly. Services may have their own
normalization methods, which might use different metrics with
different weights. Some implementations may support configuring Ingress
CATS-Forwarders with the metric normalization method so that they can
interpret the influence of the L1 or L0 metrics on the value.
The L2 metric reduces the complexity of transmitting and managing
multiple metrics by consolidating them into a single, unified
measure.
Figure 1 shows the logic of metrics in Level 0, Level 1, and Level 2.
                           +--------------+
Level 2          +---------| Normalized M |------------+
                 |         +--------------+            |
                 |                |                    |
                 |                | Normalizing        |
          +-------------+   +------------+       +------------+
Level 1   | Category M1 |   | Category M2|       | Category M3| ...
          +-------------+   +------------+       +------------+
            |        |            |                |        |
            |        |            |Normalizing     |        |
         +------+ +------+     +------+         +------+ +------+
Level 0  |Raw M1| |Raw M2|.....|Raw M3|.........|Raw M4| |Raw M5| ...
         +------+ +------+     +------+         +------+ +------+
Figure 1: Logic of CATS Metrics in levels
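The top step of Figure 1, from Level 1 to Level 2, can be sketched in
Python as a weighted sum. The category names, the per-service weights,
the 0 to 100 input range, and the scaling to a one-octet result are
assumptions of this sketch, not requirements of this document.

```python
def fully_normalize(l1_scores, weights, max_value=255):
    """Combine L1 category scores (each 0 to 100) into one L2 value.

    The result is a non-negative integer no larger than max_value.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    combined = sum(weights[c] * l1_scores[c] for c in weights) / total_weight
    return min(max_value, max(0, round(combined * max_value / 100)))


# Hypothetical L1 scores and service-specific weights.
l1_scores = {"computing": 55, "networking": 20, "storage": 10, "delay": 40}
service_weights = {"computing": 4, "networking": 2, "storage": 1, "delay": 3}
l2_value = fully_normalize(l1_scores, service_weights)
print(l2_value)
```

A different service could supply a different weight table and obtain a
different L2 value from the same L1 scores.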
4. Representation of Metrics
A hierarchical view of metrics has been shown in Figure 1. This
section includes the detailed representation of metrics.
[RFC9439] provides a representation for some network
metrics that are used to expose network capabilities to
applications. This document further describes the representation of
CATS metrics.
At each metric level and for each metric, there are
some common representation fields, including metric type, unit,
and precision. The metric type is a label that lets network devices
recognize what the metric is. The unit and precision are usually
associated with the metric. The number of bits that a metric occupies
in a protocol must also be specified.
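The common fields listed above could be modeled as a small record. The
following Python sketch is illustrative only: the class name, the
field names, and the range check are assumptions, and the example
values are taken from the bandwidth example in Section 4.1.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDescriptor:
    """Common representation fields for a CATS metric (illustrative)."""

    metric_type: str   # label that lets a network device recognize the metric
    fmt: str           # value format, e.g. "integer"
    unit: str          # unit of the value; empty for unitless metrics
    octets: int        # number of octets the value occupies in a protocol

    def fits(self, value: int) -> bool:
        """Return True if a non-negative integer fits in the declared size."""
        return 0 <= value < 2 ** (8 * self.octets)


bandwidth = MetricDescriptor("network type_Bandwidth", "integer", "Gb/s", 2)
print(bandwidth.fits(100))      # True: 100 fits in 2 octets
print(bandwidth.fits(70000))    # False: 70000 exceeds 65535
```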
Beyond these basic representations, the source of the metrics must
also be declared, since there are multiple levels of metrics and
their sources are different. As defined in [RFC9439], there are
three cost-sources: nominal, sla, and estimation. This document
further divides the estimation type into three sub-types (direct
measurement, aggregation, and normalization), since different levels
of metrics are acquired from different sources.
Directly measured metrics have physical meanings and units without
any processing. Aggregated metrics may or may not be physically
meaningful, but they keep the same meaning as the
directly measured metrics from which they are derived. Normalized
metrics may or may not have physical
meanings, but they do not have units; they are just
numbers that are used for routing decision making.
For finer granularity, this document reuses the metric statistics
defined in [RFC9439].
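The source labels discussed above can be captured as an enumeration.
In the following Python sketch, the per-level sets mirror the sources
named in Sections 4.1.5 and 4.2.5 and shown in Figure 10; treating
them as a strict allow-list, as well as all identifier names, is an
assumption of the sketch.

```python
from enum import Enum


class Source(Enum):
    NOMINAL = "nominal"
    SLA = "sla"
    # Sub-types of the "estimation" cost-source introduced by this document.
    DIRECT_MEASUREMENT = "direct measurement"
    AGGREGATION = "aggregation"
    NORMALIZATION = "normalization"


# Sources used per metric level in the examples of this document.
# Enforcing them as an allow-list is an assumption of this sketch.
ALLOWED_SOURCES = {
    0: {Source.NOMINAL, Source.DIRECT_MEASUREMENT, Source.AGGREGATION},
    1: {Source.NORMALIZATION},
    2: {Source.NORMALIZATION},
}


def source_is_valid(level: int, source: Source) -> bool:
    """Check whether a source label is plausible for a metric level."""
    return source in ALLOWED_SOURCES.get(level, set())


print(source_is_valid(0, Source.NOMINAL))             # True
print(source_is_valid(1, Source.DIRECT_MEASUREMENT))  # False
```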
4.1. Level 0 Metric Representation
Raw metrics have exact physical meanings and units. They are
directly measured from the underlying computing resource providers.
Many metrics at this level have already been defined by the IT
industry and other standards organizations [DMTF], so this document only
shows some examples of different categories of metrics for reference.
4.1.1. Compute Raw Metrics
The metric type of a compute resource is named “compute_type: CPU”
or “compute_type: GPU”. The frequency unit is GHz, and the compute
capability unit is FLOPS. The format should support integer and FP8.
The metric occupies 4 octets. Example:
Basic fields:
      Metric type: “compute type_CPU”
      Format: integer, FP8
      Bits occupation: 4 octets
Special fields:
      Frequency unit: GHz
      Compute capabilities unit: FLOPS
Source:
      Direct measurement
Statistics:
      Mean
Figure 2: An Example for Compute Raw Metrics
4.1.2. Storage Raw Metrics
The metric type of a storage resource such as an SSD is named
“storage_type: SSD”. The storage space unit is megabytes (MB).
The format is integer. The metric occupies 2 octets. The unit of read or
write speed is MB per second. Example:
Basic fields:
      Metric type: “storage type_SSD”
      Format: integer
      Unit: GB
      Bits occupation: 2 octets
Source:
      nominal
Statistics:
      cur
Figure 3: An Example for Storage Raw Metrics
4.1.3. Network Raw Metrics
The metric type of a network resource such as bandwidth is named
“network_type: Bandwidth”. The unit is gigabits per second (Gb/s).
The format is integer. The metric occupies 2 octets. The unit of TXBytes and
RXBytes is MB per second. Example:
Basic fields:
      Metric type: “network type_Bandwidth”
      Format: integer
      Unit: Gb/s
      Bits occupation: 2 octets
Source:
      nominal
Statistics:
      cur
Figure 4: An Example for Network Raw Metrics
4.1.4. Delay Raw Metrics
Delay is a synthesized metric that is influenced by
computing, storage access, and network transmission. It is named
“delay_raw”. The format should support integer and FP8. Its unit is
microseconds. The metric occupies 4 octets. Example:
Basic fields:
      Metric type: “delay_raw”
      Format: integer, FP8
      Unit: Microsecond(us)
      Bits occupation: 4 octets
Source:
      aggregation
Statistics:
      max
Figure 5: An Example for Delay Raw Metrics
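To make the "4 octets" field concrete, the following Python sketch
packs an integer "delay_raw" value in microseconds into four octets
and unpacks it again. Network byte order and an unsigned
interpretation are assumptions of the sketch, and the function names
are hypothetical; the FP8 format is not covered.

```python
import struct


def encode_delay_raw(delay_us: int) -> bytes:
    """Encode an integer delay in microseconds as 4 octets.

    Uses network byte order and an unsigned 32-bit integer, both of
    which are assumptions of this sketch.
    """
    if not 0 <= delay_us <= 0xFFFFFFFF:
        raise ValueError("delay does not fit in 4 octets")
    return struct.pack("!I", delay_us)


def decode_delay_raw(data: bytes) -> int:
    """Decode the 4-octet value produced by encode_delay_raw()."""
    (delay_us,) = struct.unpack("!I", data)
    return delay_us


encoded = encode_delay_raw(1500)
print(encoded.hex())              # 000005dc
print(decode_delay_raw(encoded))  # 1500
```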
4.1.5. Considerations on the Sources of Metrics and the Statistics
The sources of L0 metrics can be nominal, directly measured, or
aggregated. Nominal L0 metrics are provided initially by resource
providers. Dynamic L0 metrics are measured and updated during the
service stage. L0 metrics also support aggregation when
there are multiple service instances.
The statistics of L0 metrics follow the definitions in
Section 3.2 of [RFC9439].
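As a sketch of how such statistics might be derived from repeated
measurements, the following Python code computes the statistics that
appear in the examples of this document (cur, max, mean) together
with min and median. The function name and the sample values are
hypothetical, and only a subset of possible statistics is shown.

```python
import statistics


def summarize(samples):
    """Compute a few statistics over a series of L0 samples.

    `samples` is ordered oldest to newest, so the last entry is "cur".
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return {
        "cur": samples[-1],
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
    }


# Hypothetical processing delays in microseconds from one service instance.
delay_samples = [1200, 1500, 900, 2100, 1300]
stats = summarize(delay_samples)
print(stats["max"], stats["mean"], stats["median"])
```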
4.2. Level 1 Metric Representation
Normalized metrics in categories have physical meanings, but they do
not have units. They are numbers produced by some form of abstraction,
but they still indicate their category, since some use cases require
more attention to specific types of metrics.
4.2.1. Normalized Compute Metrics
The metric type of normalized compute metrics is “compute_norm”, and
its format is integer. It has no unit. It occupies one octet.
Example:
Basic fields:
      Metric type: “compute_norm”
      Format: integer
      Bits occupation: an octet
      Score: 1
Source:
      normalization
Figure 6: An Example for Normalized Compute Metrics
4.2.2. Normalized Storage Metrics
The metric type of normalized storage metrics is “storage_norm”, and
its format is integer. It has no unit. It occupies one octet.
Example:
Basic fields:
      Metric type: “storage_norm”
      Format: integer
      Bits occupation: an octet
      Score: 1
Source:
      normalization
Figure 7: An Example for Normalized Storage Metrics
4.2.3. Normalized Network Metrics
The metric type of normalized network metrics is “network_norm”, and
its format is integer. It has no unit. It occupies one octet.
Example:
Basic fields:
      Metric type: “network_norm”
      Format: integer
      Bits occupation: an octet
      Score: 1
Source:
      normalization
Figure 8: An Example for Normalized Network Metrics
4.2.4. Normalized Delay
The metric type of normalized delay metrics is “delay_norm”, and
its format is integer. It has no unit. It occupies one octet.
Example:
Basic fields:
      Metric type: “delay_norm”
      Format: integer
      Bits occupation: an octet
      Score: 1
Source:
      normalization
Figure 9: An Example for Normalized Delay Metrics
4.2.5. Considerations on the Sources of Metrics and the Statistics
The source of L1 metrics is normalization. Based on L0 metrics,
service providers design their own algorithms to normalize metrics,
for example, by assigning a cost value to each raw metric and
summing the costs. L1 metrics do not need further statistical values.
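The cost-and-sum approach mentioned above could look like the
following Python sketch, which also encodes the resulting score as a
single octet. The thresholds, the cost values, and the metric names
are hypothetical and would be chosen by each service provider.

```python
# Hypothetical cost table: (metric name, threshold, cost added when the
# raw value exceeds the threshold). All entries are illustrative.
COST_RULES = [
    ("cpu_utilization", 0.70, 3),
    ("gpu_utilization", 0.70, 2),
    ("memory_utilization", 0.80, 1),
]


def compute_norm(raw):
    """Sum per-metric costs into a "compute_norm" score."""
    return sum(
        cost for name, limit, cost in COST_RULES if raw.get(name, 0.0) > limit
    )


def encode_score(score):
    """Encode a normalized score as a single octet."""
    if not 0 <= score <= 255:
        raise ValueError("score does not fit in one octet")
    return score.to_bytes(1, "big")


raw = {"cpu_utilization": 0.85, "gpu_utilization": 0.40, "memory_utilization": 0.90}
score = compute_norm(raw)
print(score, encode_score(score).hex())  # 4 04
```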
4.3. Level 2 Metric Representation
A fully normalized metric is a single value that does not have any
physical meaning or unit. Each provider may have its own methods to
derive the value, but all providers must follow the definition in
this section to represent the fully normalized value.
The metric type is “norm_fi”. The format of the value is non-negative
integer. It has no unit. It occupies one octet. Example:
Basic fields:
      Metric type: “norm_fi”
      Format: non-negative integer
      Bits occupation: an octet
      Score: 1
Source:
      normalization
Figure 10: An Example for Fully Normalized Metric
The fully normalized value also supports aggregation when
multiple service instances provide fully normalized values.
When providing fully normalized values, service instances do not need
to provide further statistics.
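This document does not define how fully normalized values from
several service instances are aggregated. The following Python sketch
shows three simple possibilities (mean, min, max) under the assumption
that the aggregate must also fit in one octet; the function name and
the modes are hypothetical.

```python
def aggregate_l2(values, mode="mean"):
    """Aggregate per-instance L2 values into one value for a service site."""
    if not values:
        raise ValueError("no instance values to aggregate")
    if mode == "mean":
        result = round(sum(values) / len(values))
    elif mode == "min":
        result = min(values)
    elif mode == "max":
        result = max(values)
    else:
        raise ValueError(f"unknown aggregation mode: {mode}")
    # Keep the aggregate within the one-octet range used for "norm_fi".
    return min(255, max(0, result))


# Hypothetical L2 values reported by three instances at one site.
site_values = [40, 90, 65]
print(aggregate_l2(site_values), aggregate_l2(site_values, "max"))  # 65 90
```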
5. Comparison of three layers of metric
From L0 to L1 to L2, the computing metrics are progressively
consolidated. Different
levels of abstraction can meet the requirements of different
services. Table 1 shows the comparison among metric levels.
+=========+=============+===============+===========+==========+
| Level   | Encoding    | Extensibility | Stability | Accuracy |
|         | Complexity  |               |           |          |
+=========+=============+===============+===========+==========+
| Level 0 | Complicated | Bad           | Bad       | Good     |
+---------+-------------+---------------+-----------+----------+
| Level 1 | Medium      | Medium        | Medium    | Medium   |
+---------+-------------+---------------+-----------+----------+
| Level 2 | Simple      | Good          | Good      | Medium   |
+---------+-------------+---------------+-----------+----------+
Table 1: Comparison among Metrics Levels
Because Level 0 metrics are raw metrics, different services
may have their own metrics, resulting in hundreds or thousands of
metrics in total. This brings huge complexity to protocol encoding
and standardization. Therefore, such metrics are typically used
in customized IT systems on a case-by-case basis. In Level 1, metrics
are grouped into several categories and each category is
normalized into a value, so they can be encoded in a
protocol and standardized. In Level 2, all the
metrics are normalized into one single metric, which is even easier to
encode in a protocol and standardize. Therefore, from the encoding
complexity aspect, Level 2 and Level 1 metrics are suggested.
Similarly, when considering extensibility, new services can define
their own new L0 metrics, which requires the protocol to be extended as
needed. Too many metric types can add a lot of overhead to the
protocol, resulting in poor extensibility. Level 1
introduces only a few metric categories, which is acceptable for
protocol extension. Level 2 needs only one single metric, so
it places the least burden on the protocol. Therefore, from the
extensibility aspect, Level 2 and Level 1 metrics are suggested.
Regarding stability, new Level 0 raw metrics may require new
protocol extensions, which makes the protocol format unstable.
Therefore, this document does not recommend standardizing Level 0
metrics in a protocol. Level 1 metrics require only a few categories,
and the Level 2 metric introduces only one metric to the protocol, so
they are preferred from the stability aspect.
In conclusion, for computing-aware traffic steering, it is
recommended to use the L2 metric due to its simplicity. If advanced
scheduling is needed, L1 metrics can be used. L0 metrics are the most
comprehensive and dynamic; therefore, transferring them to network
devices is discouraged due to their high overhead.
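The recommendation above can be illustrated with a small selection
routine: use the L2 value by default and fall back to a single L1
category when more advanced scheduling is needed. In this Python
sketch, the site names, the scores, and the convention that a lower
value is better are all assumptions; this document does not define the
steering policy.

```python
def select_instance(candidates, category=None):
    """Pick the service contact instance with the lowest score.

    Each candidate maps to a dict with an "l2" value and an "l1" dict
    of category scores. Lower is treated as better, which is an
    assumption of this sketch. Ties are broken by name.
    """
    def key(item):
        name, metrics = item
        if category is not None:
            return (metrics["l1"][category], name)
        return (metrics["l2"], name)

    return min(candidates.items(), key=key)[0]


# Hypothetical metrics advertised for two service sites.
candidates = {
    "site-a": {"l2": 99, "l1": {"computing": 55, "delay": 40}},
    "site-b": {"l2": 80, "l1": {"computing": 30, "delay": 70}},
}
print(select_instance(candidates))                    # site-b
print(select_instance(candidates, category="delay"))  # site-a
```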
Editor notes: this draft can be updated according to the discussion
of metric definition in CATS WG.
6. Security Considerations
TBD
7. IANA Considerations
TBD
8. References
8.1. Normative References
[I-D.ietf-cats-framework]
Li, C., Du, Z., Boucadair, M., Contreras, L. M., and J.
Drake, "A Framework for Computing-Aware Traffic Steering
(CATS)", Work in Progress, Internet-Draft, draft-ietf-
cats-framework-04, 17 October 2024,
<https://datatracker.ietf.org/doc/html/draft-ietf-cats-
framework-04>.
[I-D.ietf-cats-usecases-requirements]
Yao, K., Contreras, L. M., Shi, H., Zhang, S., and Q. An,
"Computing-Aware Traffic Steering (CATS) Problem
Statement, Use Cases, and Requirements", Work in Progress,
Internet-Draft, draft-ietf-cats-usecases-requirements-04,
21 October 2024, <https://datatracker.ietf.org/doc/html/
draft-ietf-cats-usecases-requirements-04>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.
8.2. Informative References
[DMTF] "DMTF", n.d., <https://www.dmtf.org/>.
[I-D.du-cats-computing-modeling-description]
Du, Z., Yao, K., Li, C., Huang, D., and Z. Fu, "Computing
Information Description in Computing-Aware Traffic
Steering", Work in Progress, Internet-Draft, draft-du-
cats-computing-modeling-description-03, 6 July 2024,
<https://datatracker.ietf.org/doc/html/draft-du-cats-
computing-modeling-description-03>.
[I-D.rcr-opsawg-operational-compute-metrics]
Randriamasy, S., Contreras, L. M., Ros-Giralt, J., and R.
Schott, "Joint Exposure of Network and Compute Information
for Infrastructure-Aware Service Deployment", Work in
Progress, Internet-Draft, draft-rcr-opsawg-operational-
compute-metrics-08, 21 October 2024,
<https://datatracker.ietf.org/doc/html/draft-rcr-opsawg-
operational-compute-metrics-08>.
[performance-metrics]
"performance-metrics", n.d.,
<https://www.iana.org/assignments/performance-metrics/
performance-metrics.xhtml>.
[RFC9439] Wu, Q., Yang, Y., Lee, Y., Dhody, D., Randriamasy, S., and
L. Contreras, "Application-Layer Traffic Optimization
(ALTO) Performance Cost Metrics", RFC 9439,
DOI 10.17487/RFC9439, August 2023,
<https://www.rfc-editor.org/rfc/rfc9439>.
Authors' Addresses
Kehan Yao
China Mobile
China
Email: yaokehan@chinamobile.com
Hang Shi
Huawei Technologies
China
Email: shihang9@huawei.com
Cheng Li
Huawei Technologies
China
Email: c.l@huawei.com
L. M. Contreras
Telefonica
Email: luismiguel.contrerasmurillo@telefonica.com
Jordi Ros-Giralt
Qualcomm Europe, Inc.
Email: jros@qti.qualcomm.com