NSIS Working Group                                          Attila Bader
INTERNET-DRAFT                                             Lars Westberg
                                                                Ericsson
Expires: 23 April 2007                              Georgios Karagiannis
                                                    University of Twente
                                                        Cornelia Kappler
                                                                 Siemens
                                                              Tom Phelan
                                                                   Sonus
                                                        October 23, 2006

        RMD-QOSM - The Resource Management in Diffserv QOS Model
                      <draft-ietf-nsis-rmd-08.txt>

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 23, 2006.

Copyright Notice

Copyright (C) The Internet Society (2006).

Bader, et al.                                                   [Page 1]
INTERNET-DRAFT                  RMD-QOSM

Abstract

This document describes an NSIS QoS Model for networks that use the
Resource Management in Diffserv (RMD) concept. RMD is a technique for
adding admission control and preemption functions to Differentiated
Services (Diffserv) networks. The RMD QoS Model allows devices
external to the RMD network to signal reservation requests to edge
nodes in the RMD network. The RMD Ingress edge nodes classify the
incoming flows into traffic classes and signal resource requests for
the corresponding traffic class along the data path to the Egress
edge nodes for each flow. Egress nodes reconstitute the original
requests and continue forwarding them along the data path towards the
final destination. In addition, RMD defines notification functions to
indicate overload situations within the domain to the edge nodes.

Table of Contents

1. Introduction
2. Terminology
3. Overview of RMD and RMD-QOSM
   3.1 RMD
   3.2 Basic features of RMD-QOSM
       3.2.1 Role of the QNEs
       3.2.2 RMD-QOSM signaling
       3.2.3 RMD-QOSM Applicability and considerations
4. RMD-QOSM, Detailed Description
   4.1 RMD-QSpec Definition
       4.1.1 RMD-QOSM QoS Description
       4.1.2 PHR RMD-QOSM control information
       4.1.3 PDR RMD-QOSM control information
   4.2 Message format
   4.3 RMD node state management
       4.3.1 Aggregated versus per flow reservations at the
             QNE edges
       4.3.2 Measurement-based method
       4.3.3 Reservation-based method
   4.4 Transport of RMD-QOSM messages
   4.5 Edge discovery and addressing of messages
   4.6 Operation and sequence of events
       4.6.1 Basic unidirectional operation
           4.6.1.1 Successful reservation
           4.6.1.2 Unsuccessful reservation
           4.6.1.3 RMD refresh reservation
           4.6.1.4 RMD modification of aggregated reservation
           4.6.1.5 RMD release procedure
           4.6.1.6 Severe congestion handling
           4.6.1.7 Admission control using congestion
                   notification based on probing
       4.6.2 Bidirectional operation
           4.6.2.1 Successful and unsuccessful reservation
           4.6.2.2 Refresh reservation
           4.6.2.3 Modification of aggregated reservation
           4.6.2.4 Release procedure
           4.6.2.5 Severe congestion handling
           4.6.2.6 Admission control using congestion
                   notification based on probing
   4.7 Handling of additional errors
5. Security Consideration
6. IANA Considerations
7. Acknowledgments
8. Authors' Addresses
9. Normative References
10. Informative References

1. Introduction

This document describes a Next Steps In Signaling (NSIS) QoS model
for networks that use the Resource Management in Diffserv (RMD)
framework ([RMD1], [RMD2], [RMD3], [RMD4]). RMD adds admission
control to Diffserv networks and allows nodes external to the
networks to dynamically reserve resources within the Diffserv
domains.

The Quality of Service NSIS Signaling Layer Protocol (QoS-NSLP)
[QoS-NSLP] specifies a generic protocol for carrying Quality of
Service (QoS) signaling information end-to-end in an IP network.
Each network along the end-to-end path is expected to implement a
specific QoS Model (QOSM) specified by the QSpec template [QSP-T]
that interprets the requests and installs the necessary mechanisms,
in a manner that is appropriate to the technology in use in the
network, to ensure the delivery of the requested QoS.

This document specifies an NSIS QoS Model for RMD networks
(RMD-QOSM), and an RMD-specific QSpec (RMD-QSpec) for expressing
reservations in a suitable form for simple processing by internal
nodes. They are used in combination with the QoS-NSLP to provide QoS
signaling service in an RMD network. Figure 1 shows an RMD network
with the respective entities.

                 Stateless or reduced state                Egress
Ingress          RMD nodes                                 Node
Node             (Interior Nodes; I-Nodes)                 (Stateful
(Stateful           |             |              |         RMD QoS
RMD QoS NSLP        |             |              |         NSLP Node)
Node)               V             V              V
+-------+  Data  +------+      +------+       +------+     +------+
|-------|--------|------|------|------|-------|------|---->|------|
|       |  Flow  |      |      |      |       |      |     |      |
|Ingress|        |I-Node|      |I-Node|       |I-Node|     |Egress|
|       |        |      |      |      |       |      |     |      |
+-------+        +------+      +------+       +------+     +------+
         =================================================>
         <=================================================
                          Signaling Flow

            Figure 1: Actors in the RMD-QOSM

Internally to the RMD network, RMD-QOSM defines a scalable QoS
signaling model in which per-flow QoS-NSLP and NTLP states are not
stored in Interior nodes but per-flow signaling is performed (see
[QoS-NSLP]).

In the RMD-QOSM, only routers at the edges of a Diffserv domain
(Ingress and Egress nodes) support the (QoS-NSLP) stateful operation,
see Section 4.7 of [QoS-NSLP]. Interior nodes support either the
(QoS-NSLP) stateless operation, or a reduced-state operation with
coarser granularity than the edge nodes.

The remainder of this draft is structured following the suggestions
in Appendix A of [QSP-T] for the description of QoS Models and QSPECs
and their relation. After the terminology in Section 2, we give an
overview of RMD and the RMD-QOSM in Section 3. In Section 4 we give a
detailed description of the RMD-QOSM, including the role of QNEs, the
definition of the QSpec, mapping of QSpec generic parameters onto
RMD-QOSM parameters, state management in QNEs, and operation and
sequence of events. Section 5 discusses security issues.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

The terminology defined by GIST [GIST] and QoS-NSLP [QoS-NSLP]
applies to this draft.

In addition, the following terms are used:

Edge node: A QoS-NSLP node on the boundary of some administrative
domain.

Ingress node: An edge node that handles the traffic as it enters the
domain.

Egress node: An edge node that handles the traffic as it leaves the
domain.

Interior nodes: The set of QoS-NSLP nodes which form an
administrative domain, excluding the edge nodes.

3. Overview of RMD and RMD-QOSM

3.1. RMD

The Differentiated Services (Diffserv) architecture ([RFC2475],
[RFC2638]) was introduced as a result of efforts to avoid the
scalability and complexity problems of Intserv [RFC1633].
Scalability is achieved by offering services on an aggregate rather
than per-flow basis and by forcing as much of the per-flow state as
possible to the edges of the network. The service differentiation is
achieved using the Differentiated Services (DS) field in the IP
header and the Per-Hop Behavior (PHB) as the main building blocks.
Packets are handled at each node according to the PHB indicated by
the DS field in the message header.

The Diffserv architecture does not specify any means for devices
outside the domain to dynamically reserve resources or receive
indications of network resource availability. In practice, service
providers rely on short active time Service Level Agreements (SLAs)
that statically define the parameters of the traffic that will be
accepted from a customer.

RMD was introduced as a method for dynamic reservation of resources
within a Diffserv domain. It describes a method that is able to
provide admission control for flows entering the domain and a
congestion handling algorithm that is able to terminate flows in case
of congestion due to a sudden failure (e.g., link, router) within the
domain.

In RMD, scalability is achieved by separating a fine-grained
reservation mechanism used in the edge nodes of a Diffserv domain
from a much simpler reservation mechanism needed in the Interior
nodes. Typically it is assumed that edge nodes support per-flow QoS
states in order to provide QoS guarantees for each flow. Interior
nodes use only one aggregated reservation state per traffic class or
no states at all. In this way it is possible to handle large numbers
of flows in the Interior nodes. Furthermore, due to the limited
functionality supported by the Interior nodes, this solution allows
fast processing of signaling messages. The possible RMD-QOSM
applicabilities are described in Section 3.2.3.

Two main basic admission control modes are supported:
reservation-based and measurement-based admission control, each of
which can be used in combination with a severe congestion handling
solution. The severe congestion handling solution is used when a link
or node becomes severely congested because the traffic supported by a
failed link or node is rerouted and has to be processed by this link
or node. Furthermore, RMD-QOSM supports both uni-directional and
bi-directional reservations.

Another important feature of RMD-QOSM is that the intra-domain
sessions supported by the edges can be either per flow sessions or
per aggregate sessions. In case of the per flow intra-domain
sessions, the maintained per flow intra-domain states have a
one-to-one dependency to the per flow end-to-end states supported by
the same edge. In case of the per-aggregate sessions, the maintained
per-aggregate states have a one-to-many relationship to the per flow
end-to-end states supported by the same edge.

In the reservation-based method, each Interior node maintains only
one reservation state per traffic class. The Ingress edge nodes
aggregate individual flow requests into PHB traffic classes, and
signal changes in the class reservations as necessary. The
reservation is quantified in terms of resource units (or bandwidth).
These resources are requested dynamically per PHB and reserved on
demand in all nodes in the communication path from an Ingress node to
an Egress node.

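As an illustration of the reservation-based method, the following
minimal sketch keeps one reservation counter per PHB traffic class in
each Interior node and requests resource units on every node of a
path. The class name, the per-class capacity threshold, and the
rollback on failure are assumptions made for this example only; the
rollback in particular is a simplification and not the RMD release
procedure.

```python
class InteriorNodeState:
    """Reduced-state node: one aggregated reservation counter per PHB class."""

    def __init__(self, capacity_per_phb):
        # capacity_per_phb is a hypothetical mapping:
        # PHB class name -> admission threshold in resource units.
        self.capacity = dict(capacity_per_phb)
        self.reserved = {phb: 0 for phb in self.capacity}

    def reserve(self, phb, units):
        """Add units to the class counter if the threshold allows it."""
        if self.reserved[phb] + units > self.capacity[phb]:
            return False
        self.reserved[phb] += units
        return True

    def release(self, phb, units):
        """Subtract units from the class counter (never below zero)."""
        self.reserved[phb] = max(0, self.reserved[phb] - units)


def reserve_along_path(nodes, phb, units):
    """Request units on every node from Ingress to Egress.

    Returns True only if every node admits the request. On failure the
    units already reserved upstream are given back (a simplification).
    """
    admitted = []
    for node in nodes:
        if not node.reserve(phb, units):
            for earlier in admitted:
                earlier.release(phb, units)
            return False
        admitted.append(node)
    return True
```

Note that no per-flow information is stored: a node only knows the
sum of the units reserved for each class.
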
The measurement-based algorithm continuously measures traffic levels
and the actual available resources, and admits flows whose resource
needs are within what is available at the time of the request. Once
an admission decision is made, no record of the decision need be
kept. The advantage of measurement-based resource management
protocols is that they require neither pre-reservation state nor
explicit release of the reservations. Moreover, when the user traffic
is variable, measurement-based admission control could provide higher
network utilization than, e.g., peak-rate reservation. However, this
can introduce an uncertainty in the availability of the resources.

Two types of measurement-based admission control schemes are
possible:

* Congestion notification function based on probing: This method can
  be used to implement a simple measurement-based admission control
  within a Diffserv domain. In this scenario the interior nodes are
  not NSIS aware nodes. In these interior nodes, thresholds are set
  for the traffic belonging to different PHBs in the measurement-based
  admission control function. An end-to-end NSIS message is used as a
  probe packet, meaning that the DSCP field in the header of the IP
  packet that carries the NSIS message is re-marked when the
  predefined congestion threshold is exceeded. Note that when the
  predefined congestion threshold is exceeded, all packets are
  re-marked by a node, including NSIS messages. In this way the edges
  can admit or reject flows that are requesting resources. The rate of
  the re-marked data packets is used to detect a congestion situation
  that can influence the admission control decisions.

* NSIS measurement-based admission control: In this case the
  measurement-based admission control functionality is implemented in
  NSIS aware stateless routers. The main difference between this type
  of admission control and the congestion notification based on
  probing is that this type of admission control is applied mainly on
  NSIS aware nodes. This makes it possible to apply measuring
  techniques (see, e.g., [JaSh97], [GrTs03]) that use current and past
  information on NSIS sessions that requested resources from an NSIS
  aware interior node. The admission decision is positive if the
  currently carried traffic, as characterized by the measured
  statistics, plus the requested resources for the new flow exceeds
  the system capacity with a probability smaller than some alpha.
  Otherwise, the admission decision is negative.

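The admission rule of the NSIS measurement-based scheme can be
sketched as follows. This document does not prescribe how the
overflow probability is estimated; the sketch assumes, purely for
illustration, that the measured load is approximately normally
distributed and is summarized by the mean and standard deviation of
recent samples. All function and parameter names are hypothetical.

```python
import math
import statistics


def overflow_probability(samples, requested, capacity):
    """Estimate P(measured load + requested > capacity).

    Assumption of this sketch: the load is modeled as a normal
    distribution fitted to the measured samples.
    """
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    headroom = capacity - requested - mean
    if std == 0:
        # Constant load: the capacity is either exceeded or it is not.
        return 0.0 if headroom >= 0 else 1.0
    z = headroom / std
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


def admit(samples, requested, capacity, alpha):
    """Positive decision if the overflow probability is below alpha."""
    return overflow_probability(samples, requested, capacity) < alpha
```

With samples of 40, 50, 60 and 50 units, a capacity of 100 units and
alpha set to 0.01, a request for 10 units is admitted under this
model, whereas a request for 45 units is rejected.
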
RMD describes the following procedures:

* Classification of an individual resource reservation or a resource
  query into Per Hop Behavior (PHB) groups at the Ingress node of the
  domain.

* Hop-by-hop admission control based on a PHB within the domain.
  There are two possible modes of operation for internal nodes to
  admit requests. One mode is the stateless or measurement-based
  mode, where the resources within the domain are queried. Another
  mode of operation is the reduced-state reservation or
  reservation-based mode, where the resources within the domain are
  reserved.

* A method to forward the original requests across the domain up to
  the Egress node and beyond.

* A congestion control algorithm that notifies the Egress edge nodes
  about congestion. It is able to terminate the appropriate number of
  flows in case of congestion due to a sudden failure (e.g., link or
  router failure) within the domain.

3.2. Basic features of RMD-QOSM

3.2.1 Role of the QNEs

The protocol model of the RMD-QOSM is shown in Figure 2. The figure
shows QNI and QNR nodes, not part of the RMD network, that are the
ultimate initiator and receiver of the QoS reservation requests. It
also shows QNE nodes that are the Ingress and Egress nodes in the RMD
domain (QNE Ingress and QNE Egress), and QNE nodes that are Interior
nodes (QNE Interior).

All nodes of the RMD domain are usually QoS-NSLP aware nodes.
However, in the scenarios where the congestion notification function
based on probing is used, the interior nodes are not NSIS aware.
Edge nodes store and maintain QoS-NSLP and NTLP states and therefore
are stateful nodes. The NSIS aware Interior nodes are NTLP stateless.
Furthermore, they are either QoS-NSLP stateless (for NSIS
measurement-based operation), or are reduced state nodes storing per
PHB aggregated QoS-NSLP states (for reservation-based operation).

|------|   |-------|                           |------|   |------|
| e2e  |<->| e2e   |<------------------------->| e2e  |<->| e2e  |
| QoS  |   | QoS   |                           | QoS  |   | QoS  |
|      |   |-------|                           |------|   |------|
|      |   |-------|   |-------|   |-------|   |------|   |      |
|      |   | local |<->| local |<->| local |<->| local|   |      |
|      |   |  QoS  |   |  QoS  |   |  QoS  |   |  QoS |   |      |
|      |   |       |   |       |   |       |   |      |   |      |
| NSLP |   | NSLP  |   | NSLP  |   | NSLP  |   | NSLP |   | NSLP |
|st.ful|   |st.ful |   |st.less/   |st.less/   |st.ful|   |st.ful|
|      |   |       |   |red.st.|   |red.st.|   |      |   |      |
|      |   |-------|   |-------|   |-------|   |------|   |      |
|------|   |-------|   |-------|   |-------|   |------|   |------|
------------------------------------------------------------------
|------|   |-------|   |-------|   |-------|   |------|   |------|
| NTLP |<->| NTLP  |<->| NTLP  |<->| NTLP  |<->| NTLP |<->|NTLP  |
|st.ful|   |st.ful |   |st.less|   |st.less|   |st.ful|   |st.ful|
|------|   |-------|   |-------|   |-------|   |------|   |------|
  QNI         QNE         QNE         QNE         QNE       QNR
 (End)     (Ingress)   (Interior)  (Interior)   (Egress)   (End)

st.ful: stateful, st.less: stateless
st.less/red.st.: stateless or reduced state

    Figure 2: Protocol model of stateless/reduced state operation

Note that the RMD domain may contain Interior nodes that are not NSIS
aware nodes (not shown in the figure). These nodes are assumed to
have sufficient capacity for flows that might be admitted.
Furthermore, some of these NSIS unaware nodes may be used for
measuring the traffic congestion level on the data path. These
measurements can be used by RMD-QOSM in the congestion control based
on probing operation and/or severe congestion operation (see Section
4.6.1.6).

3.2.2 RMD-QOSM Signaling

The basic RMD-QOSM signaling is shown in Figure 3. A RESERVE message
is created by a QNI with an Initiator QSpec describing the
reservation and forwarded along the path towards the QNR.

When the original RESERVE message arrives at the Ingress node, an
RMD-QSpec is constructed based on the top-most QSpec in the message
(usually the Initiator QSpec). The RMD-QSpec is sent in an
intra-domain, independent RESERVE message through the Interior nodes
towards the QNR. This intra-domain RESERVE message uses the GIST
datagram signaling mechanism. Note that the RMD-QOSM cannot directly
specify that the GIST datagram mode should be used. This can,
however, be indicated by using the GIST API Transfer-Attributes, such
as unreliable, low level of security and use of local policy.
Meanwhile, the original RESERVE message is sent to the Egress node on
the path to the QNR using the reliable transport mode of NTLP.

        QNE             QNE             QNE            QNE
      Ingress         Interior        Interior        Egress
   NTLP stateful   NTLP stateless  NTLP stateless  NTLP stateful
         |               |               |              |
RESERVE  |               |               |              |
-------->| RESERVE       |               |              |
         +--------------------------------------------->|
         | RESERVE'      |               |              |
         +-------------->|               |              |
         |               | RESERVE'      |              |
         |               +-------------->|              |
         |               |               | RESERVE'     |
         |               |               +------------->|
         |               |               |              | RESERVE
         |               |               |              +------->
         |               |               |              |RESPONSE
         |               |               |              |<-------
         |               |               | RESPONSE     |
         |<---------------------------------------------+
 RESPONSE|               |               |              |
<--------|               |               |              |

   Figure 3: Sender-initiated reservation with Reduced State
             Interior Nodes

Each QoS-NSLP node on the data path processes the intra-domain
RESERVE message and checks the availability of resources with either
the reservation-based or the measurement-based method.

When the message reaches the Egress node, and the reservation is
successful in each Interior node, the original (end-to-end) RESERVE
message is forwarded to the next domain. When the Egress node
receives a RESPONSE message from the downstream end, it is forwarded
directly to the Ingress node.

If an intermediate node cannot accommodate the new request, it
indicates this by marking a single bit in the message, and continues
forwarding the message until the Egress node is reached. From the
Egress node a RESPONSE message is sent directly to the Ingress node.

As a consequence, in the stateless/reduced state domain only
sender-initiated reservation can be performed and functions requiring
per flow NTLP or QoS-NSLP states, like summary refreshes, cannot be
used. If per flow identification is needed, i.e., associating the
flow IDs for the reserved resources, Edge nodes act on behalf of
Interior nodes.

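The hop-by-hop handling of the intra-domain RESERVE message can be
sketched as below. The sketch models only the single marking bit and
a per-node pool of free resource units; the class and function names
are hypothetical, and the assumption that nodes downstream of the
marking node simply forward the message without reserving is made for
this example.

```python
from dataclasses import dataclass


@dataclass
class IntraDomainReserve:
    units: int            # requested resource units
    marked: bool = False  # single bit set by a node that cannot admit


@dataclass
class Node:
    name: str
    free_units: int

    def process(self, msg):
        """Check resources, then let the message continue downstream."""
        if msg.marked:
            return  # assumption: already rejected upstream, forward only
        if msg.units <= self.free_units:
            self.free_units -= msg.units
        else:
            msg.marked = True  # indicate failure, keep forwarding


def signal_across_domain(path, units):
    """Send one intra-domain RESERVE along the path to the Egress node.

    Returns what the Egress would report to the Ingress: whether the
    request was admitted and whether the end-to-end RESERVE should be
    forwarded to the next domain.
    """
    msg = IntraDomainReserve(units)
    for node in path:
        node.process(msg)
    admitted = not msg.marked
    return {"admitted": admitted, "forward_end_to_end_reserve": admitted}
```

In this model a rejection at any hop is visible only at the Egress
node, which matches the behavior described above: the message is
never dropped or answered by an Interior node.
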
3.2.3 RMD-QOSM Applicability and considerations

The RMD-QOSM is a Diffserv-based bandwidth management methodology
that is not able to provide full Diffserv support. The reason for
this is that the RMD-QOSM concept can only support Expedited
Forwarding (EF)-like behavior, where the required bandwidth can be
signaled in the <QoS Desired> parameter. The RMD-QOSM is not able to
support the full set of Assured Forwarding (AF)-like functionality
where multiple PHBs/DSCPs are used. This is because signaling AF in
full generality would require the <QoS Desired> parameter to contain
two token buckets. Note, however, that RMD-QOSM could also support a
single AF PHB, when the traffic or the upper limit of the traffic can
be characterized by a single bandwidth parameter.

A very important consideration on using RMD-QOSM is that within one
RMD domain only one of the following RMD-QOSM schemes can be used at
a time. Thus an RMD router can never process and use two different
RMD-QOSM signaling schemes at the same time. The operator of an RMD
domain has to pre-configure all routers in the domain accordingly.

The available RMD-QOSM signaling schemes are:

* per flow congestion notification based on probing (see Sections
  4.3.2, 4.6.1.7, 4.6.2.6). Note that this scheme uses severe
  congestion handling by proportional data packet marking (see
  Sections 4.6.1.6.2, 4.6.2.5.2).

* per flow RMD NSIS measurement-based admission control (see Sections
  4.3.2, 4.6.1, 4.6.2). Note that this scheme uses severe congestion
  handling by proportional data packet marking (see Sections
  4.6.1.6.2, 4.6.2.5.2).

* per flow RMD reservation-based in combination with severe
  congestion handling by the RMD-QOSM refresh procedure (see Sections
  4.3.3, 4.6.1, 4.6.1.6.1, 4.6.2.5.1). Note that this scheme uses
  severe congestion handling by the RMD-QOSM refresh procedure (see
  Sections 4.6.1.6.1, 4.6.2.5.1).

* per flow RMD reservation-based in combination with severe
  congestion handling by proportional data packet marking procedure
  (see Sections 4.3.3, 4.6.1, 4.6.1.6.2, 4.6.2.5.2). Note that this
  scheme uses severe congestion handling by the proportional data
  packet marking procedure (see Sections 4.6.1.6.2, 4.6.2.5.2).

* per aggregate RMD reservation-based in combination with severe
  congestion handling by the RMD-QOSM refresh procedure (see Sections
  4.3.1, 4.6.1, 4.6.1.6.1, 4.6.2.5.1). Note that this scheme uses
  severe congestion handling by the RMD-QOSM refresh procedure (see
  Sections 4.6.1.6.1, 4.6.2.5.1).

* per aggregate RMD reservation-based in combination with severe
  congestion handling by proportional data packet marking procedure
  (see Sections 4.3.1, 4.6.1, 4.6.1.6.2, 4.6.2.5.2). Note that this
  scheme uses severe congestion handling by the proportional data
  packet marking procedure (see Sections 4.6.1.6.2, 4.6.2.5.2).

4. RMD-QOSM, Detailed Description

This section describes the RMD-QOSM in more detail. In particular, it
defines the role of stateless and reduced-state QNEs, the RMD-QOSM
QSpec Object, the format of the RMD-QOSM QoS-NSLP messages and how
QSpecs are processed and used in different protocol operations.

4.1. RMD-QSpec Definition

The RMD-QOSM uses the QSpec format specified in [QSP-T]. The <QSPEC
Version> and <QoSM ID> used by the RMD-QOSM are assigned by IANA, see
Section 6.

The <QSPEC Control Information> contains the following fields:

<QSPEC Control Information> = <PHR container> <PDR container>

The Per Hop Reservation container (PHR container) and the Per Domain
Reservation container (PDR container) are specified in Sections 4.1.2
and 4.1.3, respectively. The <PHR container> contains the QoS
specific control information for intra-domain communication and
reservation. The <PDR container> contains additional control
information that is needed for edge-to-edge communication. The
parameter IDs used by the <PHR container> and <PDR container> are
assigned by IANA, see Section 6. For clarity, the following temporary
names are assigned to the PHR and PDR containers:

* PHR_1 to PHR_3 for the <PHR container>
* PDR_4 to PDR_10 for the <PDR container>

After IANA assigns the proper values to the PHR and PDR containers,
the above list has to be replaced accordingly.

The <QoS Description> when used with RMD-QOSM contains the <RMD-QOSM
QoS Description> field that is specified in Section 4.1.1. The
<RMD-QOSM QoS Description> field and the <PHR container> are used and
processed by the Edge and Interior nodes. The <PDR container> field
is only processed by Edge nodes.

4.1.1. RMD-QOSM QoS Description

The RMD-QOSM QoS Description carried by the RESERVE message only
contains the QoS Desired object [QSP-T]. The QoS Reserved object is
carried by the RESPONSE message.

<RMD-QOSM QoS Description> = <QoS Desired> for RESERVE
<RMD-QOSM QoS Description> = <QoS Reserved> for RESPONSE

<QoS Desired> = <Bandwidth> <PHB Class> <Admission Priority>
<QoS Reserved> = <Bandwidth> <PHB Class> <Admission Priority>

The bit format of the <Bandwidth>, <PHB Class> (see Figure 4 and
Figure 5) and <Admission Priority> complies with the bit format
specified in [QSP-T].

Note that for the RMD-QOSM a reservation established without an
<Admission Priority> parameter is equivalent to a reservation
established with an <Admission Priority> whose value is 1.

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   DSCP    |0 0 0 0 0 0 0 0 X 0|
+---+---+---+---+---+---+---+---+

     Figure 4: DSCP parameter

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      PHB ID code      |0 0 X 0|
+---+---+---+---+---+---+---+---+

  Figure 5: PHB ID Code parameter

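The two 16-bit layouts of Figures 4 and 5 can be sketched as bit
operations. The sketch reads the figures with bit 0 as the most
significant bit, places the X flag at bit 14, and leaves all other
trailing bits zero. The function names are hypothetical, and the
meaning of X is not interpreted here.

```python
import struct


def encode_dscp_phb_class(dscp, x=0):
    """Figure 4 layout: 6-bit DSCP in bits 0-5, X in bit 14, rest zero."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in 6 bits")
    if x not in (0, 1):
        raise ValueError("X is a single bit")
    value = (dscp << 10) | (x << 1)
    return struct.pack("!H", value)  # 16 bits, network byte order


def encode_phb_id_code(code, x=0):
    """Figure 5 layout: 12-bit PHB ID code in bits 0-11, X in bit 14."""
    if not 0 <= code < 4096:
        raise ValueError("PHB ID code must fit in 12 bits")
    if x not in (0, 1):
        raise ValueError("X is a single bit")
    value = (code << 4) | (x << 1)
    return struct.pack("!H", value)


def decode_dscp_phb_class(data):
    """Inverse of encode_dscp_phb_class: returns (dscp, x)."""
    (value,) = struct.unpack("!H", data)
    return value >> 10, (value >> 1) & 1
```

For example, a DSCP value of 46 with X cleared is encoded as the two
bytes 0xB8 0x00 under this reading of Figure 4.
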
4.1.2. PHR Container

This section describes the parameters used by the PHR container.

<PHR container> = <Overload %> <S> <M> <Admitted Hops> <B> <Hop_U>
                  <Time Lag>

The bit format of the PHR container can be seen in Figure 6. Note
that in Figure 6 <Hop_U> is represented as <U>.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|E|N|R|     Container ID      |r|r|r|r|           1           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|M| Admitted Hops |B|U|   Time Lag    |  Overload %   |       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 6: PHR container

Parameter/Container ID: 8 bit field, indicating the PHR type:
PHR_Resource_Request, PHR_Release_Request, PHR_Refresh_Update.

"PHR_Resource_Request" (Container ID = PHR_1): initiate or update the
traffic class reservation state on all nodes located on the
communication path between the QNE(Ingress) and QNE(Egress) nodes.

"PHR_Refresh_Update" (Container ID = PHR_2): refresh the traffic
class reservation soft state on all nodes located on the
communication path between the QNE(Ingress) and QNE(Egress) nodes
according to a resource reservation request that was successfully
processed during a previous refresh period.

"PHR_Release_Request" (Container ID = PHR_3): explicitly release, by
subtraction, the reserved resources for a particular flow from a
traffic class reservation state.

<S> (Severe Congestion): 1 bit. In case of a route change, refreshing
RESERVE messages follow the new data path, and hence resources are
requested there. If the resources are not sufficient to accommodate
the new traffic, severe congestion occurs. Severely congested
Interior nodes SHOULD notify Edge QNEs about the congestion by
setting the S bit.

<Overload %>: 8 bits. In case of severe congestion the level of
overload is indicated by the Overload %. Overload % SHOULD be higher
than 0 if the S bit is set. If the overload in a node is greater than
the overload in a previous node, then Overload % SHOULD be updated.
For more details see Section 4.6.1.6.1.

<M>: 1 bit. In case of unsuccessful resource reservation or resource
query in an Interior QNE, this QNE sets the M bit in order to notify
the Egress QNE.

<Admitted Hops>: 8 bit field. The <Admitted Hops> counts the number
of hops in the RMD domain where the reservation was successful. The
<Admitted Hops> is set to "0" when a RESERVE message enters a domain
and it MUST be incremented by each Interior QNE, provided that the M
bit is not set. However, when a QNE that does not have sufficient
resources to admit the reservation is reached, the M bit is set, and
the <Admitted Hops> value is frozen.

<Hop_U> (NSLP_Hops unset): 1 bit. The QNE(Ingress) node MUST set the
<Hop_U> parameter to 0. This parameter SHOULD be set to "1" by a node
when the node does not increase the <Admitted Hops> value. This is
the case when an RMD-QOSM reservation-based node is not admitting the
reservation request. When <Hop_U> is set to "1", the <Admitted Hops>
SHOULD NOT be changed. Note that this flag, in combination with the
<Admitted Hops> field, is used to locate the last node that
successfully processed a reservation request, see Section 4.6.1.2.

<B>: 1 bit. Indicates bi-directional reservation.

<Time Lag>: 8 bit field. The time lag used in a sliding window over
the refresh period.

4.1.3. PDR container

This section describes the parameters of the PDR container. The bit
format of the PDR container can be seen in Figure 7.

<PDR container> = <Overload %> <S> <M> <Max Admitted Hops> <B>
                  [<PDR Bandwidth>]

INTERNET-DRAFT RMD-QOSM Note that in Figure 7 <Max Admitted Hops> is represented as <Max Adm Hops>. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0|E|N|R| Container ID |r|r|r|r| 2 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |S|M| Max Adm Hops |B| Overload % | Empty | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |PDR Bandwidth(32-bit IEEE floating p.number) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 7: PDR container Parameter/Container ID: 8-bit field identifying the type of PDR container field. "PDR_Reservation_Request" (Parameter/Container ID = PDR_4): Generated by the QNE(Ingress) node in order to initiate or update the QoS-NSLP per domain reservation state in the QNE(Egress) node "PDR_Refresh_Request" (Parameter/Container ID = PDR_5): generated by the QNE(Ingress) node and sent to the QNE(Egress) node to refresh, in case needed, the QoS-NSLP per domain reservation states located in the QNE(Egress) node "PDR_Release_Request" (Parameter/Container ID = PDR_6): generated and sent by the QNE(Ingress) node to the QNE(Egress) node to release the per domain reservation states explicitly "PDR_Reservation_Report" (Parameter/Container ID = PDR_7): generated and sent by the QNE(Egress) node to the QNE(Ingress) node to report that a "PHR_Resource_Request" and a "PDR_Reservation_Request" control information fields have been received and that the request has been admitted or rejected "PDR_Refresh_Report" (Parameter/Container ID = PDR_8) generated and sent by the QNE(Egress) node in case needed, to the QNE(Ingress) node to report that a "PHR_Refresh_Update" control information field has been received and has been processed Bader, et al. [Page 16]
"PDR_Release_Report" (Parameter/Container ID = PDR_9): generated and sent by the QNE(Egress) node, in case needed, to the QNE(Ingress) node to report that the "PHR_Release_Request" and "PDR_Release_Request" control information fields have been received and have been processed.

"PDR_Congestion_Report" (Parameter/Container ID = PDR_10): generated and sent by the QNE(Egress) node to the QNE(Ingress) node and used for congestion notification.

<S> (PDR Severe Congestion): 1 bit. Specifies if a severe congestion situation occurred. It can also carry the <S> parameter of the "PHR_Resource_Request" or "PHR_Refresh_Update" fields.

<Overload %>: 8 bits. It includes the Overload % of the "PHR_Resource_Request" or "PHR_Refresh_Update" control information fields, indicating the level of overload to the Ingress node. For more details see Section 4.6.1.6.1.

<M> (PDR Marked): 1 bit. Carries the <M> value of the "PHR_Resource_Request" or "PHR_Refresh_Update" control information fields.

<B>: 1 bit. Indicates bi-directional reservation.

<Max Admitted Hops>: 8 bits. The <Admitted Hops> value that has been carried by the PHR container field, used to identify the RMD reservation-based node that admitted or processed a "PHR_Resource_Request".

<PDR Bandwidth>: 32 bits. This field specifies a bandwidth that applies in one of two situations: when the "B" flag is set to "1" and the parameter is carried by a RESPONSE message, or when a severe congestion occurs, the QNE edges maintain an aggregated intra-domain QoS-NSLP operational state, and the parameter is carried by a NOTIFY message. When the "B" flag is set to "1", this parameter specifies the requested bandwidth that has to be reserved by a node in the reverse direction, when the intra-domain signaling procedures require a bi-directional reservation procedure. In the severe congestion situation, this parameter specifies the bandwidth that has to be released.
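As an illustration of the layout in Figure 7, the following Python sketch packs and unpacks the two 32-bit words that follow the container header. It is not normative. It assumes network byte order, that bit 0 in the figure is the most significant bit, and that the 13 "Empty" bits are sent as zero. The function names are hypothetical, and the header word (flags, Container ID, length) is omitted.

```python
import struct

def pack_pdr_body(s, m, max_adm_hops, b, overload, pdr_bandwidth):
    """Pack the S/M/Max Adm Hops/B/Overload % word and the PDR Bandwidth."""
    if not 0 <= max_adm_hops <= 0xFF or not 0 <= overload <= 0xFF:
        raise ValueError("Max Admitted Hops and Overload % are 8-bit fields")
    word = (
        (int(bool(s)) << 31)      # bit 0: S
        | (int(bool(m)) << 30)    # bit 1: M
        | (max_adm_hops << 22)    # bits 2-9: Max Adm Hops
        | (int(bool(b)) << 21)    # bit 10: B
        | (overload << 13)        # bits 11-18: Overload %
    )                             # bits 19-31: Empty (assumed zero)
    # "!" selects network byte order; "I" is 32-bit unsigned, "f" is IEEE float.
    return struct.pack("!If", word, pdr_bandwidth)

def unpack_pdr_body(data):
    """Inverse of pack_pdr_body; returns the fields as a dictionary."""
    word, bandwidth = struct.unpack("!If", data)
    return {
        "S": (word >> 31) & 1,
        "M": (word >> 30) & 1,
        "max_adm_hops": (word >> 22) & 0xFF,
        "B": (word >> 21) & 1,
        "overload": (word >> 13) & 0xFF,
        "pdr_bandwidth": bandwidth,
    }
```

Packing and then unpacking a body returns the original field values, which gives a quick check that the bit offsets agree with the figure.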
4.2. Message Format

The format of the messages used by the RMD-QOSM complies with the QoS-NSLP and QSpec template specifications. The QSpec used by RMD-QOSM is denoted in this document as RMD-QSpec and is described in Section 4.1.

4.3. RMD node state management

The QoS-NSLP state creation and management is specified in [QoS-NSLP]. This section describes the state creation and management functions of the Resource Management Function (RMF) in the RMD nodes.

4.3.1 Aggregated operational and reservation states at the QNE Edges

The QNE Edges maintain both the intra-domain QoS-NSLP operational and reservation states, while the QNE Interior nodes maintain only reservation states. The structure of the intra-domain QoS-NSLP operational state used by the QNE edges is specified in [QoS-NSLP].

Note that the method of selecting the end-to-end sessions that form an aggregate is not specified in this document. One way to accomplish this is to monitor the GIST routing states used by the end-to-end sessions and to group the sessions that use the same <PHB class>, the same QNE Ingress and QNE Egress addresses, and the same priority level value. Note that this priority level should be deduced from the <priority> parameters carried by the end-to-end QSpec object.

The operational state of this aggregated intra-domain session must contain a list with BOUND_SESSION_IDs. The structure of the list depends on whether a unidirectional reservation or a bidirectional reservation is supported.
When the operational state (at QNE ingress and QNE egress) supports unidirectional reservations, this state must contain a list with BOUND_SESSION_IDs maintaining the SESSION_ID values of its bound end-to-end sessions. The BINDING_CODE associated with each BOUND_SESSION_ID is set to code (Aggregated sessions). Thus the operational state maintains a list of BOUND_SESSION_ID entries. Each entry is created when an end-to-end session joins the aggregated intra-domain session and is removed when an end-to-end session leaves the aggregate.

It is important to emphasize that in this case, the operational state (at QNE ingress and QNE egress) that is maintained by each end-to-end session bound to the aggregated intra-domain session must contain, in the BOUND_SESSION_ID, the SESSION_ID value of the bound tunnelled intra-domain (aggregate) session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Aggregated sessions).

When the operational state (at QNE ingress and QNE egress) supports bidirectional reservations, the operational state must contain a list of BOUND_SESSION_ID sets. Each set contains two BOUND_SESSION_IDs. One of the BOUND_SESSION_IDs maintains the SESSION_ID value of one of the bound end-to-end sessions. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Aggregated sessions). The other BOUND_SESSION_ID, within the same set entry, maintains the SESSION_ID of the bidirectional bound end-to-end session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Bi-directional sessions). Note that in each set, a one-to-one relation exists between the BOUND_SESSION_ID with BINDING_CODE set to (Aggregated sessions) and the BOUND_SESSION_ID with BINDING_CODE set to (Bi-directional sessions). Each set is created when an end-to-end session joins the aggregated operational state and is removed when an end-to-end session leaves the aggregated operational state.
It is important to emphasize that in this case, the operational state (at QNE ingress and QNE egress) that is maintained by each end-to-end session bound to the aggregated intra-domain session must contain two types of BOUND_SESSION_IDs. One is the BOUND_SESSION_ID that must contain the SESSION_ID value of the bound tunnelled aggregated intra-domain session, with the BINDING_CODE set to (Aggregated sessions). The other BOUND_SESSION_ID maintains the SESSION_ID of the bound bidirectional end-to-end session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Bi-directional sessions).

When the QNE Edges use aggregated QoS-NSLP reservation states, the PHB class value and the size of the aggregated reservation, e.g., the reserved bandwidth, have to be maintained. Note that this type of aggregation is an edge-to-edge aggregation and is similar to the aggregation type specified in [RFC3175].
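The binding lists described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class and attribute names, and the string constants standing in for the BINDING_CODE values, are assumptions and are not defined by [QoS-NSLP] or by this document.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

AGGREGATED = "Aggregated sessions"         # placeholder for the BINDING_CODE
BIDIRECTIONAL = "Bi-directional sessions"  # placeholder for the BINDING_CODE

@dataclass
class BoundSet:
    """One list entry: a bound end-to-end session and, for bidirectional
    reservations, the reverse-direction end-to-end session paired with it."""
    e2e_session_id: str
    e2e_code: str = AGGREGATED
    reverse_session_id: Optional[str] = None
    reverse_code: Optional[str] = None

@dataclass
class AggregateState:
    """Operational state of an aggregated intra-domain session at a QNE Edge."""
    session_id: str
    phb_class: int
    bound: Dict[str, BoundSet] = field(default_factory=dict)

    def join(self, e2e_id, reverse_id=None):
        # An entry (or set) is created when an end-to-end session joins.
        entry = BoundSet(e2e_id)
        if reverse_id is not None:
            entry.reverse_session_id = reverse_id
            entry.reverse_code = BIDIRECTIONAL
        self.bound[e2e_id] = entry

    def leave(self, e2e_id):
        # The entry is removed when the end-to-end session leaves.
        self.bound.pop(e2e_id, None)
```

In the unidirectional case each entry carries a single BOUND_SESSION_ID; in the bidirectional case the same entry also records the paired session, which keeps the one-to-one relation between the two binding codes explicit.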
The size of the aggregated reservation needs to be greater than or equal to the sum of the bandwidth of the inter-domain (end-to-end) reservations/sessions it aggregates, see e.g., Section 1.4.4 of [RFC3175].

A policy can be used to maintain the amount of required bandwidth on a given aggregated reservation by taking into account the sum of the underlying inter-domain (end-to-end) reservations, while endeavouring to change the reservation less frequently. This MAY require a trend analysis. If there is a significant probability that in the next interval of time the current aggregated reservation is exhausted, the Ingress router MUST predict the necessary bandwidth and request it. If the Ingress router has a significant amount of bandwidth reserved but has very little probability of using it, the policy MAY predict the amount of bandwidth required and release the excess. To increase or decrease the aggregate, the RMD modification procedures SHOULD be used (see Section 4.6.1.4).

The QNE Interior nodes are reduced-state nodes, i.e., they do not store NTLP/GIST states but they do store per PHB-aggregated QoS-NSLP reservation states. These reservation states are maintained and refreshed in the same way as described in Section 4.3.3.

4.3.2 Measurement-based method

The QNE Edges maintain per flow intra-domain QoS-NSLP operational and reservation states that contain data structures similar to those described in Section 4.3.1. The main difference is associated with the different types of the used MRI and the bound end-to-end sessions. The structure of the maintained BOUND_SESSION_IDs depends on whether a unidirectional reservation or a bidirectional reservation is supported.

When unidirectional reservations are supported, the operational state associated with this per flow intra-domain session must contain in the BOUND_SESSION_ID the SESSION_ID value of its bound end-to-end session.
The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Tunnelled and end-to-end sessions).

It is important to emphasize that in this case, the operational state (at QNE ingress and QNE egress) that is maintained by the end-to-end session bound to the per-flow intra-domain session must contain, in the BOUND_SESSION_ID, the SESSION_ID value of the bound tunnelled per-flow intra-domain session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Tunnelled and end-to-end sessions).
When bidirectional reservations are supported, the operational state (at QNE ingress and QNE egress) must contain two types of BOUND_SESSION_IDs. One is the BOUND_SESSION_ID that maintains the SESSION_ID value of the bound tunnelled per-flow intra-domain session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Tunnelled and end-to-end sessions). The other BOUND_SESSION_ID maintains the SESSION_ID of the bound bidirectional end-to-end session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Bi-directional sessions).

It is important to emphasize, in this case, that the operational state (at ingress and egress) that is maintained by the end-to-end session bound to the per-flow intra-domain session must contain two types of BOUND_SESSION_IDs. One of the BOUND_SESSION_IDs must contain the SESSION_ID of its bound end-to-end session, with the BINDING_CODE set to code (Tunnelled and end-to-end sessions). The other BOUND_SESSION_ID maintains the SESSION_ID of the bound bidirectional end-to-end session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Bi-directional sessions).

Furthermore, the QoS-NSLP reservation state maintains the PHB class value, the value of the bandwidth requested by the end-to-end session bound to the intra-domain session, and the value of the priority level.

The measurement-based method can be classified into two schemes:

* Congestion notification based on probing: In this scheme the interior nodes are Diffserv-aware but not NSIS-aware nodes. Each interior node counts the bandwidth that is used by each PHB traffic class. This counter value is stored in an RMD_QOSM state. For the traffic belonging to each PHB traffic class, a predefined congestion notification threshold is set. The predefined congestion notification threshold is set according to an engineered bandwidth limitation based on, e.g.,
an agreed Service Level Agreement or a capacity limitation of specific links. The threshold is usually less than the capacity limit, i.e., the admission threshold, in order to avoid congestion due to errors in estimating the actual traffic load. The value of this threshold SHOULD be stored in another RMD_QOSM state.

In this scenario an end-to-end NSIS message is used as a probe packet. In this case the DSCP field of the GIST message is re-marked when
the predefined congestion notification threshold is exceeded in an interior node. In this way it is ensured that the end-to-end NSIS message passed through the node that is congested. This feature is very useful when ECMP (Equal Cost Multi-Path) based routing is used, because only the flows that are passing through the congested node are detected. Note that in this situation, not only the probe packet is re-marked, but also the data packets passing through the congested node are re-marked. The rate of the re-marked data packets is used to detect a congestion situation that can influence the admission control decisions.

* NSIS measurement-based admission control: The measurement-based admission control is implemented in NSIS-aware stateless routers. In particular, the QNE Interior nodes operating in NSIS measurement-based mode are QoS-NSLP stateless nodes, i.e., they do not support any QoS-NSLP or NTLP/GIST states. These measurement-based nodes store two RMD-QOSM states per PHR group. These states reflect the traffic conditions at the node and are not affected by QoS-NSLP signaling. One state stores the measured user traffic load associated with the PHR group and another state stores the maximum traffic load threshold that can be admitted per PHR group. When a measurement-based node receives an intra-domain RESERVE message, it compares the requested resources to the available resources (maximum allowed minus current load) for the requested PHR group. If there are insufficient resources, it sets the <M> bit in the RMD-QSpec. No change to the RMD-QSpec is made when there are sufficient resources.

4.3.3 Reservation-based method

The QNE Edges maintain intra-domain QoS-NSLP operational and reservation states that contain data structures similar to those described in Section 4.3.2.
The QNE Interior nodes operating in reservation-based mode are QoS-NSLP reduced-state nodes, i.e., they do not store NTLP/GIST states but they do store per PHB-aggregated QoS-NSLP states.

The reservation-based PHR installs and maintains one reservation state per PHB in all the nodes located on the communication path from the QNE Ingress node up to the QNE Egress node. This state is identified by the PHB class value and it maintains the number of currently reserved resource units (or bandwidth). Thus, the QNE Ingress node signals only the resource units requested by each flow. These resource units, if admitted, are added to the currently reserved resources per PHB.

For each PHB a threshold is maintained that specifies the maximum number of resource units that can be reserved. This threshold could, for example, be statically configured.
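The per-PHB state described above, combined with the <M>, <Admitted Hops> and <Hop_U> semantics of Section 4.1.2, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation. The class and field names are hypothetical and resource units are plain integers. Forwarding a message that arrives with <M> already set unchanged, without reserving anything, is a simplification.

```python
from dataclasses import dataclass

@dataclass
class Phr:
    """Subset of the PHR container fields used by this sketch."""
    requested_units: int
    m: int = 0              # <M>
    admitted_hops: int = 0  # <Admitted Hops>
    hop_u: int = 0          # <Hop_U>

class InteriorNode:
    """Reduced-state node: one counter and one threshold per PHB class."""

    def __init__(self, thresholds):
        self.threshold = dict(thresholds)               # PHB class -> max units
        self.reserved = {phb: 0 for phb in thresholds}  # PHB class -> units

    def on_reserve(self, phb, phr):
        if phr.m:
            # Assumption: an already marked request is forwarded unchanged.
            return phr
        if self.reserved[phb] + phr.requested_units <= self.threshold[phb]:
            # Admitted: add the units to the per-PHB reservation state.
            self.reserved[phb] += phr.requested_units
            phr.admitted_hops += 1
        else:
            phr.m = 1       # notify the Egress QNE
            phr.hop_u = 1   # <Admitted Hops> was not increased at this node
        return phr
```

With three nodes whose thresholds for a PHB class are 10, 3 and 10 units, a request for 5 units is admitted at the first node only. The message leaves the second node with <M> and <Hop_U> set and <Admitted Hops> equal to 1, which identifies the last node that admitted the request.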
An example of how the admission control and its maintenance process occur in the interior nodes is described in Section 3 of [CsTa05]. The simplified concept that is used by the per traffic class admission control process in the interior nodes is based on the following equation:

   last + p <= T

where

   p:    the requested bandwidth rate,
   T:    the admission threshold, which reflects the maximum traffic volume that can be admitted in the traffic class,
   last: a counter that records the aggregated sum of the signaled bandwidth rates of previously admitted flows.

The per-PHB group reservation states maintained in the interior nodes are soft states, which are refreshed by sending periodic refresh intra-domain RESERVE messages, which are initiated by the Ingress QNEs. If a refresh message corresponding to a number of reserved resource units (i.e., bandwidth) is not received, the aggregated reservation state is decreased in the next refresh period by the corresponding amount of resources that were not refreshed. The refresh period can be refined using a sliding window algorithm described in [RMD3].

The reserved resources for a particular flow can also be explicitly released from a PHB reservation state by means of an intra-domain RESERVE release/tear message, which is generated by the Ingress QNEs. The usage of explicit release enables the instantaneous release of the resources regardless of the length of the refresh period. This allows a longer refresh period, which also reduces the number of periodic refresh messages.

Note that for both the measurement-based and the (per-flow and aggregated) RMD reservation-based methods, the way in which the maximum bandwidth thresholds are maintained is outside the scope of this document. However, when admission priorities are supported, the Maximum Allocation [RFC4125] or the Russian Dolls [RFC4127] bandwidth allocation model may be used.
In this case three types of priority traffic classes within the same PHB, e.g., Expedited Forwarding, can be differentiated. These three different priority traffic classes, which are associated with the same PHB, are denoted in this document as PHB_low_priority, PHB_normal_priority and PHB_high_priority, and are identified by the PHB class value and the priority value, which is carried in the <Admission priority> RMD-QSpec parameter.

4.4. Transport of RMD-QOSM messages

The intra-domain messages used by the RMD-QOSM should operate in the NTLP/GIST Datagram mode (see [GIST]). Therefore, the NSLP functionality available in all QoS-NSLP nodes that are able to support the RMD-QOSM MUST require the intra-domain GIST functionality available in these nodes to operate in the datagram mode, i.e., require GIST to:
* operate in unreliable mode. This can be satisfied by passing this requirement from the QoS-NSLP layer to the GIST layer via the API Transfer-Attributes.

* not create a message association state. This requirement can be satisfied by a local policy, e.g., the QNE is configured not to create a message association state.

* not create any NTLP routing state in the interior nodes. This can be satisfied by passing this requirement from the QoS-NSLP layer to the GIST layer via the API.

However, between the QNE Egress and QNE Ingress, routing states associated with intra-domain sessions should be created that can be used for the communication of GIST Data messages sent by a QNE Egress directly to a QNE Ingress. This type of routing state associated with an intra-domain session can be generated and used in the following way:

* When the QNE Ingress has to send an initial intra-domain RESERVE message, the QoS-NSLP sends this message by including in the GIST API SendMessage primitive the Unreliable and No security attributes. GIST will then probably send this NSLP message by piggybacking it on a GIST QUERY message. The GIST functionality in each QNE Interior node will receive the GIST QUERY message and, by using the RecvMessage GIST API primitive, it will pass the intra-domain RESERVE message to the QoS-NSLP functionality. At the same time the GIST functionality uses the Routing-State-Check boolean to find out if the QoS-NSLP needs to create a routing state. The QoS-NSLP sets this boolean to inform GIST not to create a routing state and to forward the GIST QUERY further downstream with the modified QoS-NSLP payload, which will include the modified intra-domain RESERVE message. The intra-domain RESERVE is sent in the same way up to the QNE Egress. The QNE Egress needs to create a routing state.
Therefore, at the moment that the GIST functionality passes the intra-domain RESERVE message to the QoS-NSLP via the GIST RecvMessage primitive, the QoS-NSLP sets the Routing-State-Check boolean such that a routing state is created. GIST creates the routing state using normal GIST procedures. After this phase the QNE Ingress and QNE Egress have, for the particular session, routing states that can route traffic directly from QNE Ingress to QNE Egress and from QNE Egress to QNE Ingress. The routing state at the QNE Egress can be used by the QoS-NSLP and GIST to send an intra-domain RESPONSE or intra-domain NOTIFY directly to the QNE Ingress using GIST Data messages. Note that this routing state is refreshed using normal GIST procedures.
* When the QNE Ingress needs to send an intra-domain RESERVE message that is not an initial RESERVE, the QoS-NSLP sends this message by including in the GIST API SendMessage primitive the Unreliable and No security attributes. Furthermore, the Local policy attribute is set such that GIST sends the intra-domain RESERVE message in Q-mode even if there is a routing state at the QNE Ingress. In this way the GIST functionality uses its local policy to send the intra-domain RESERVE message by piggybacking it on a GIST DATA message and sending it in Q-mode even if there is a routing state for this session. The intra-domain RESERVE message is piggybacked on the GIST DATA message that is forwarded and processed by the QNE Interior nodes up to the QNE Egress.

The transport of the original (end-to-end) RESERVE message is accomplished in the following way:

At the QNE ingress the original (end-to-end) RESERVE message is forwarded, but it is ignored by the stateless or reduced-state nodes, see Figure 3.

The intermediate (interior) nodes are bypassed using multiple levels of the router alert option (see [QoS-NSLP]). In that case, interior routers are configured to handle only certain levels of router alert option (RAO) values. This is accomplished by marking the end-to-end RESERVE message, i.e., modifying the QoS-NSLP default NSLP-ID value to another predefined NSLP-ID value. The marking MUST be performed by the ingress. The egress MUST stop this marking process by reassigning the QoS-NSLP default NSLP-ID value to the original (end-to-end) RESERVE message. Note that the assignment of these NSLP-ID values is a QoS-NSLP issue, which should be accomplished via IANA [QoS-NSLP].

4.5 Edge discovery and message addressing

The Egress node discovery can be performed by using the GIST discovery mechanism [GIST], manual configuration, or any other discovery technique.
The addressing of signaling messages depends on the GIST transport mode used. The RMD QoS signaling messages that are processed only by the Edge nodes use the peer-peer addressing of the GIST connection (C) mode. RMD QoS signaling messages that are processed by all nodes of the Diffserv domain, i.e., Edges and Interior nodes, use the end-end addressing of the GIST datagram (D) mode. Note that the RMD-QOSM cannot directly specify that the GIST connection or the GIST datagram mode should be used. This can only be specified by using the GIST API Transfer-Attributes, such as reliable or unreliable, high or low level of
security and by the use of local policies. RMD QoS signaling messages that are addressed to the data path end nodes are intercepted by the Egress nodes. In particular, at the ingress and for downstream intra-domain messages, the RMD-QOSM instructs the GIST functionality, via the GIST API, to use among others:

* unreliable and low level of security Transfer-Attributes
* no creation of a GIST routing state
* the D-mode MRI

The intra-domain RESERVE messages can then be transported by using the Query D-mode, see Section 4.4.

At the QNE Egress and for upstream intra-domain messages, the RMD-QOSM instructs the GIST functionality, via the GIST API, to use among others:

* unreliable and low level of security Transfer-Attributes
* the routing state associated with the intra-domain session to send an upstream intra-domain message directly to the QNE Ingress, see Section 4.4.

4.6. Operation and sequence of events

4.6.1. Basic unidirectional operation

This section describes the basic unidirectional operation and sequence of events of the RMD-QOSM. The following basic operation cases are distinguished:

* Successful reservation (Section 4.6.1.1),
* Unsuccessful reservation (Section 4.6.1.2),
* RMD refresh reservation (Section 4.6.1.3),
* RMD modification of aggregated reservation (Section 4.6.1.4),
* RMD release procedure (Section 4.6.1.5),
* Severe congestion handling (Section 4.6.1.6),
* Admission control using congestion notification based on probing (Section 4.6.1.7).

The QNEs at the Edges of the RMD domain support the RMD QoS Model and end-to-end QoS models, which process the RESERVE message differently. Note that the term end-to-end QoS model applies to any QoS model that is initiated and terminated outside the RMD-QOSM aware domain. However, there might be situations where a QoS model is initiated and/or terminated by the QNE Edges and is considered to be an end-to-end QoS model.
This can occur when the QNE Edges can also operate as either QNI or QNR and at the same time they can operate as either sender or receiver of the data path.

Note that the functionality described in Sections 4.6.1.1, 4.6.1.2, 4.6.1.3, 4.6.1.5, and 4.6.1.6 applies to the
RMD reservation-based and to the NSIS measurement-based admission control methods. The functionality described in Section 4.6.1.7 applies to the admission control procedure that uses the congestion notification based on probing. The QNE Edge nodes maintain either per flow QoS-NSLP operational and reservation states or aggregated QoS-NSLP operational and reservation states.

When the QNE Edges maintain aggregated QoS-NSLP operational and reservation states, the RMD-QOSM functionality may accomplish an RMD modification procedure (see Section 4.6.1.4), instead of the reservation initiation procedure that is described in this subsection. Note that it is recommended that the QNE implementations of RMD-QOSM process the QoS-NSLP signaling messages with a higher priority than data packets. This can be accomplished as described in Section 3.3.4 of [QoS-NSLP].

4.6.1.1. Successful reservation

This section describes the operation of the RMD-QOSM where a reservation is successfully accomplished.

The QNI generates the initial RESERVE message, and it is forwarded by the NTLP as usual [GIST].

4.6.1.1.1. Operation in Ingress node

<<Editor's note: The number of QSPEC-1 parameters and the process of their remapping is based on draft-ietf-nsis-qspec-12.txt, the QSPEC template draft. It is expected that the new version of the QSPEC template draft will be severely revised. This means that the number of QSPEC-1 parameters and their remapping procedures described in this section will also be modified.>>

When an end-to-end reservation request (RESERVE) arrives at the Ingress node (QNE), see Figure 8, it is processed based on the end-to-end QoS model. Note that, according to [QSP-T], when the QOSM ID of the end-to-end QoS model is not known to the Ingress node (QNE), the Ingress MUST interpret at least the QSPEC-1 parameters. If the QSpec object also contains QSPEC-2 parameters that are not used by the RMD-QOSM, then the N-flag of each of these objects MUST be set.
Subsequently, the RMD QoS Description parameters <Bandwidth>, <PHB class> and <Admission priority> are derived from the QoS Description of the end-to-end QSpec. When the <Bandwidth>, <PHB class> and <Admission priority> parameters used by the RMD-QSpec cannot be strictly interpreted from the end-to-end QSpec, the Ingress QNE performs the following functionality.

If the end-to-end QSpec contains, in the <QoS Desired> or <QoS Available> parameter, only the <Token Bucket> parameter, then this parameter can be remapped to the <Bandwidth> parameter used by the intra-domain RMD-QSpec in the following way. The value of the "Peak Data Rate [p]" is copied into the <Bandwidth> parameter.
Note that in this case the "R" flag included in the end-to-end <Token Bucket> parameter carried by the tunnelled end-to-end QSpec MUST be set.

In all cases the QNE egress uses the tunnelled end-to-end <Token Bucket> parameter, which can be strictly interpreted. However, in this case the non-QOSM-hop "Q" flag MUST be set.

If the end-to-end QSpec does not contain any <QoS class> parameter, then the selection of the <PHB class> that is carried by the intra-domain RMD-QSpec is defined by a local policy, see [QSP-T]. For example, in the situation that the end-to-end QSpec is used by the IntServ Controlled Load QOSM, the Expedited Forwarding (EF) PHB is appropriate to set the <PHB class> parameter carried by the intra-domain RMD-QSpec, see [QSP-T].

If the end-to-end QSpec does not contain the <PHB class> parameter, but it contains either the <DSTE QoS class> or the <Y.1541 QoS class> parameter, then the selection of the <PHB class> that is carried by the intra-domain RMD-QSpec is defined in [QSP-T], which remaps the <DSTE QoS class> or <Y.1541 QoS class> parameter to the <PHB class> parameter. Note that in this case the "R" flag in the <QoS class> parameter carried by the tunnelled end-to-end QSpec MUST be set.

Note that in the above described case, the QNE egress uses, if available, the tunnelled end-to-end <QoS Class> parameter, which can be strictly interpreted. However, in this case the non-QOSM-hop "Q" flag MUST be set.

If the end-to-end QSpec does not carry the <Priority> parameter, then the <Admission Priority> parameter in the RMD-QSpec will not be populated.

If the end-to-end QSpec does not carry the <Admission Priority> parameter, but it carries other <priority> parameters, then it is considered that the edges, being stateful nodes, are able to control the priority of the sessions that are entering or leaving the RMD domain in accordance with the <priority> parameter.
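The remapping rules above can be summarised in a short Python sketch. It is illustrative only. The dictionary keys, the function name and the policy table are assumptions, and the EF code point is used as a stand-in for the <PHB class> encoding. The policy table holds a single entry taken from the Controlled Load example in the text. The <DSTE QoS class> and <Y.1541 QoS class> remapping defined in [QSP-T] and the handling of the "R", "Q" and "N" flags are not modelled.

```python
EF = 0b101110  # Expedited Forwarding DSCP, stand-in for the <PHB class> value

# Hypothetical local policy table: end-to-end QOSM name -> PHB class.
LOCAL_POLICY = {"intserv-controlled-load": EF}

def derive_rmd_qos_description(e2e_qspec, e2e_qosm):
    """Derive <Bandwidth>, <PHB class>, <Admission priority> at the Ingress."""
    rmd = {}
    if "bandwidth" in e2e_qspec:
        rmd["bandwidth"] = e2e_qspec["bandwidth"]
    elif "token_bucket" in e2e_qspec:
        # Only a <Token Bucket> is present: copy Peak Data Rate [p].
        rmd["bandwidth"] = e2e_qspec["token_bucket"]["p"]
    else:
        raise ValueError("no bandwidth information in the end-to-end QSpec")

    if "phb_class" in e2e_qspec:
        rmd["phb_class"] = e2e_qspec["phb_class"]
    else:
        # No usable QoS class parameter: the choice is left to local policy.
        rmd["phb_class"] = LOCAL_POLICY[e2e_qosm]

    # <Admission priority> stays unpopulated when the end-to-end QSpec
    # does not carry it.
    if "admission_priority" in e2e_qspec:
        rmd["admission_priority"] = e2e_qspec["admission_priority"]
    return rmd
```

For an end-to-end QSpec that carries only a token bucket with peak rate p, the derived description contains p as <Bandwidth>, the policy-selected <PHB class>, and no <Admission priority>.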
Note that the RMF reservation states, see Section 4.3, in the QNE edges store the value of the <priority> parameter that is used within the RMD domain in case of pre-emption and severe congestion situations, see Section 4.6.1.6.

Note that in the above described case, the QNE egress uses, if available, the tunnelled end-to-end <priority> parameter, which can be strictly interpreted. Therefore, the "R" flag and the non-QOSM-hop "Q" flag MUST NOT be set.

If the end-to-end QSpec carries the <Excess Treatment> parameter, then the QNE ingress and QNE egress nodes MUST control the excess traffic that is entering or leaving the RMD domain in accordance with the <Excess Treatment> parameter. Note that the RMD-QSpec does not carry the <Excess Treatment> parameter. However, by using the <PHB class> parameter the RMD domain uses the excess treatment procedures specified by the particular PHB standard. Therefore, in this case the
"R" flag in the <Excess Treatment> carried by the tunnelled end-to-end QSpec MUST NOT be set.

If the requested <Bandwidth> parameter cannot be satisfied, then an end-to-end RESPONSE message has to be generated. An end-to-end QSpec object MUST be included in the RESPONSE message. The parameters included in the QSpec <QoS Reserved> object are copied from the original <QoS Desired> values. The "E" flag associated with the QSpec <QoS Reserved> object and the "E" flag associated with the <Bandwidth> parameter are set. In addition, the INFO-SPEC object is included in the end-to-end RESPONSE message. The error code used by this INFO-SPEC is:

   Error severity class: Transient Failure
   Error code value: Reservation failure

Furthermore, all the other RESPONSE parameters are set according to the end-to-end QoS model or according to [QoS-NSLP] and [QSP-T].

If the request was satisfied locally (see Section 4.3), the Ingress QNE node generates two RESERVE messages: one intra-domain and one end-to-end RESERVE message. Note, however, that when the aggregated QoS-NSLP operational and reservation states are used by the QNE Ingress, the generation of the intra-domain RESERVE message depends on the availability of the aggregated QoS-NSLP operational state. If this aggregated QoS-NSLP operational state is available, then the RMD modification of aggregated reservations described in Section 4.6.1.4 is used.

If retransmission of both RESERVE messages is needed, it follows the retransmission procedures described in [QoS-NSLP]. Furthermore, if a rerouting takes place, the stateful QNE ingress follows the procedures specified in [QoS-NSLP].

At this point the intra-domain and end-to-end operational states MUST be initiated or modified according to the required binding procedures.
The way in which the BOUND_SESSION_IDs are initiated and maintained in the intra-domain and end-to-end QoS-NSLP operational states is described in Sections 4.3.1 and 4.3.2.

These two messages are bound together in the following way. The end-to-end RESERVE SHOULD contain in the BOUND_SESSION_ID the SESSION_ID of its bound intra-domain session. Furthermore, if the QNE Edge nodes maintain intra-domain per flow QoS-NSLP reservation states, then the value of Binding_Code MUST be set to code "Tunnel and end-to-end sessions", see Section 4.3.2. If the QoS-NSLP edges maintain aggregated intra-domain QoS-NSLP operational states, then the value of Binding_Code MUST be set to code "Aggregated sessions".
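The binding rule above amounts to a two-way choice, which the following minimal Python sketch makes explicit. The function name, the dictionary layout and the string values standing in for the Binding_Code are assumptions and are not values defined by [QoS-NSLP].

```python
TUNNEL_AND_E2E = "Tunnel and end-to-end sessions"  # placeholder code value
AGGREGATED = "Aggregated sessions"                 # placeholder code value

def bind_e2e_reserve(e2e_reserve, intra_session_id, edges_aggregate):
    """Return a copy of the end-to-end RESERVE bound to the intra-domain session."""
    bound = dict(e2e_reserve)
    # The end-to-end RESERVE carries the SESSION_ID of its intra-domain session.
    bound["BOUND_SESSION_ID"] = intra_session_id
    # The Binding_Code depends on whether the edges keep aggregated state.
    bound["BINDING_CODE"] = AGGREGATED if edges_aggregate else TUNNEL_AND_E2E
    return bound
```

The input message is copied rather than modified, so the same end-to-end RESERVE can be rebound if the intra-domain session changes.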
The intra-domain RESERVE message is associated with the (local NTLP) SESSION_ID mentioned above. The selection of the IP source and IP destination address of this message depends on how the different inter-domain (end-to-end) flows are aggregated by the QNE Ingress node (see Section 4.3.1). As described in Section 4.3.1, the QNE Edges maintain either per flow or aggregated QoS-NSLP reservation states for the RMD QoS model, which are identified by (local NTLP) SESSION_IDs (see [GIST]). Note that this NTLP SESSION_ID differs from the SESSION_ID associated with the end-to-end RESERVE message.

If no QoS-NSLP aggregation procedure is supported at the QNE Edges, then the IP source and IP destination addresses of this message MUST be equal to the IP source and IP destination addresses of the data flow. The intra-domain RESERVE message is sent using the NTLP datagram mode (see Sections 4.4 and 4.5). Note that the GIST datagram mode can be selected using the unreliable GIST API Transfer-Attributes. In addition, the intra-domain RESERVE (RMD-QSpec) message MUST include a PHR container (PHR_Resource_Request) and the "RMD QoS Description" field.

The end-to-end RESERVE message includes the end-to-end QSpec and is sent towards the QNE Egress. If the end-to-end RESERVE message does not carry an <RII> object, then an <RII> object has to be generated and included in the end-to-end RESERVE message. Note that after completing the initial discovery phase, the GIST connection mode can be used between the QNE Ingress and QNE Egress. The GIST connection mode can be selected using the reliable GIST API Transfer-Attributes.

The end-to-end RESERVE message is forwarded using the GIST forwarding procedure to bypass the Interior stateless or reduced-state QNE nodes, see Figure 8. The bypassing procedure is described in Section 4.4.
At the QNE Ingress the end-to-end RESERVE message is marked, i.e., the QoS-NSLP default NSLP-ID value is changed to another predefined NSLP-ID value, which corresponds to a RAO value that will be used by the GIST message carrying the end-to-end RESPONSE message to bypass the QNE Interior nodes. Note that the QNE Interior nodes, see [GIST], are configured to handle only certain NSLP-IDs (and their related router alert (RAO) values), see [QoS-NSLP].

Furthermore, note that the initial discovery phase and the process of sending the end-to-end RESERVE message towards the QNE Egress MAY be done simultaneously. This can be accomplished only if the GIST implementation is configured to do so, e.g., via a local policy. However, the discovery procedure cannot be selected by the RMD-QOSM.

The (initial) intra-domain RESERVE message MUST be sent by the QNE Ingress and it MUST contain the following values:
* the value of the <RSN> object is generated and processed as described in [QoS-NSLP];

* the SCOPING flag MUST NOT be set, meaning that a default scoping of the message is used. Therefore, the QNE Edges MUST be configured as RMD boundary nodes and the QNE Interior nodes MUST be configured as Interior (intermediary) nodes;

* if the QNE Ingress uses per flow intra-domain QoS-NSLP operational states, see Sections 4.3.2 and 4.3.3, then the <RII> object MUST NOT be included in this message. If the QNE Edge nodes maintain intra-domain aggregated QoS-NSLP operational states, see Section 4.3.1, then the <RII> object MUST be included in this message, see [QoS-NSLP];

* the flag REPLACE MUST be set to FALSE = 0;

* the value of the <REFRESH_PERIOD> object MUST be calculated and set by the QNE Ingress node as described in Section 4.6.1.3;

* the value of the <PACKET_CLASSIFIER> object is associated with the path-coupled routing MRM, since RMD-QOSM is used with the path-coupled MRM. The flag that has to be set is the flag T (traffic class), meaning that the classification of packets is based on the DSCP value included in the IP header of the packets. Note that the DSCP value used in the MRI can be derived from the value of the <PHB class> parameter. Because the QNE Ingress is a QNI for the intra-domain session, it can pass this value to GIST via the GIST API;

* the PHR resource units MUST be included in the <Bandwidth> parameter of the "RMD QoS Description" field. When the QNE edges use per flow intra-domain QoS-NSLP states, the value of the <Bandwidth> parameter can be obtained by copying/remapping the <Token Bucket/Bandwidth> parameter carried by the end-to-end QSpec into this <Bandwidth> parameter, as described above in this subsection. When the QNE edges use aggregated intra-domain QoS-NSLP operational states, the value of the <Bandwidth> parameter can be obtained by using the bandwidth aggregation method described in Section 4.3.1;

* the value of the <PHB class> parameter can be defined by copying/remapping the <QoS Class> parameter carried by the end-to-end QSpec into the <PHB class> carried by the RMD-QSpec, as described above in this subsection;

* the value of the Parameter/Container ID field of the PHR container MUST be set to PHR_1 (i.e., PHR_Resource_Request);

* the value of the <Admitted Hops> parameter in the PHR container MUST be set to "1". Note that during a successful reservation, each time an RMD-QOSM aware node processes the RMD-QSpec, the <Admitted Hops> parameter is increased by one;

* the value of the <Hop_U> parameter in the PHR container MUST be set to "0";

* if the end-to-end QSpec carried an <Admission Priority> parameter, then this parameter should be copied into the RMD-QSpec and carried by the (initiating) intra-domain RESERVE. Note that for the RMD-QOSM a reservation established without an <Admission Priority> parameter is equivalent to a reservation with Admission Priority value 1. Note that in this case each admission priority is associated with a priority traffic class. The three priority traffic classes (PHB_low_priority, PHB_normal_priority, PHB_high_priority) may be associated with the same PHB, see Section 4.3.3;

* in a single RMD domain case the PDR container MAY be omitted from the message.

Note that the intra-domain RESERVE message does not carry the BOUND_SESSION_ID object. The reason for this is that the end-to-end RESERVE carries in the BOUND_SESSION_ID object the SESSION_ID value of the intra-domain session.

When an end-to-end RESPONSE(PDR) message is received by the QNE Ingress node, which was sent by a QNE Egress node, see Section 4.6.1.1.3, the RMD-QSpec has to be identified, processed and removed from the end-to-end RESPONSE message.
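As an illustration only, the values listed above for the initial intra-domain RESERVE can be collected in a small data model. All type and function names are hypothetical, the model ignores wire encoding, and the initial <M> value of 0 is taken from Figure 9 rather than from the list itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhrContainer:
    container_id: str = "PHR_Resource_Request"  # PHR_1 in the text
    admitted_hops: int = 1                      # MUST start at 1
    hop_u: int = 0                              # MUST start at 0
    m: int = 0                                  # unmarked, as in Figure 9


@dataclass
class IntraDomainReserve:
    bandwidth: float                    # PHR resource units
    phb_class: int
    replace: bool = False               # REPLACE MUST be FALSE = 0
    scoping: bool = False               # SCOPING flag MUST NOT be set
    rii: Optional[int] = None
    admission_priority: Optional[int] = None
    phr: PhrContainer = field(default_factory=PhrContainer)


def build_initial_reserve(bandwidth: float, phb_class: int, aggregated: bool,
                          rii: Optional[int] = None,
                          admission_priority: Optional[int] = None
                          ) -> IntraDomainReserve:
    """Assemble the initial intra-domain RESERVE at the QNE Ingress."""
    if aggregated and rii is None:
        # Aggregated intra-domain states: the RII MUST be present.
        raise ValueError("aggregated states require an RII")
    return IntraDomainReserve(
        bandwidth=bandwidth,
        phb_class=phb_class,
        # Per flow intra-domain states: the RII MUST NOT be included.
        rii=rii if aggregated else None,
        admission_priority=admission_priority,
    )
```

Note that the sketch carries no BOUND_SESSION_ID field, since that object travels only in the end-to-end RESERVE.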
The QoS-NSLP operational states of the intra-domain and end-to-end sessions in the QNE Ingress, see Sections 4.3.1 and 4.3.2, store and maintain the binding between each end-to-end session and each intra-domain session. In this way the QNE Ingress can match the PHR container that has been carried by the intra-domain RESERVE with the received PDR container that has been carried by the end-to-end RESPONSE message.

When an (aggregated) intra-domain RESPONSE(PDR) message is received by the QNE Ingress node, which was sent by a QNE Egress, see Section 4.6.1.1.3, it uses the QoS-NSLP procedures to match it to the earlier sent intra-domain RESERVE message. After this phase, the RMD-QSpec has to be identified and processed.
The RMD QoS model functionality is notified, by reading the <M> parameter of the "PDR RMD control information" container, that the reservation has been successful. Furthermore, the INFO_SPEC object SHOULD be read by the QoS-NSLP functionality. In case of successful reservation the INFO_SPEC object SHOULD have the following values:

* Error Severity Class: Success
* Error Code value: Reservation successful

If the end-to-end RESPONSE message has to be forwarded to a node outside the RMD-QOSM aware domain, then the values of the objects contained in this message (i.e., <RII/RSN>, <INFO_SPEC>, [ *QSPEC ]) MUST be set by the QoS-NSLP protocol functions of the QNE.

4.6.1.1.2 Operation in the Interior nodes

Each QNE Interior node MUST use the QoS-NSLP and RMD-QOSM parameters of the intra-domain RESERVE (RMD-QSpec) message as follows:

* the values of the <RSN>, <RII>, <PACKET_CLASSIFIER>, and <REFRESH_PERIOD> objects MUST NOT be changed. The Interior node is informed by the <PACKET_CLASSIFIER> object that the packet classification should be done on the DSCP value. The flag that has to be set in this case is the flag T (traffic class). Note that the DSCP value MUST be obtained from the MRI values obtained from GIST. The DSCP value SHOULD be obtained via the MRI parameters that the QoS-NSLP receives from GIST. A QNE Interior MUST be able to associate the value carried by the RMD-QSpec <PHB class> parameter with the DSCP value obtained via GIST. This is required because there are situations in which the <PHB class> parameter does not carry a DSCP value, but a "PHB ID code", see Section 4.1.1;

* the flag REPLACE MUST be set to FALSE = 0;

* when the RMD reservation based methods described in Sections 4.3.1 and 4.3.3 are used, the value of the <Bandwidth> parameter of the "RMD QoS Description" field is used by the QNE Interior node for admission control. Furthermore, if the <Admission Priority> parameter is carried by the "RMD QoS Description" field, this parameter is processed as described in the following bullets;

* in case of the RMD reservation-based procedure, if these resources are admitted (see Sections 4.3.1 and 4.3.3), they are added to the currently reserved resources. Furthermore, the value of the <Admitted Hops> parameter in the PHR container has to be increased by one;

* if the bandwidth allocated for the PHB_high_priority traffic is fully utilized and a high priority request arrives, other policies can be used, which are beyond the scope of this document. One example of such a policy is that the high priority session is admitted through preemption of ongoing lower priority sessions, when the bandwidth reserved by the lower priority sessions can satisfy the high priority bandwidth request. When the available bandwidth for the PHB_low_priority and for the PHB_normal_priority is not enough to support the high priority traffic, then it will generate congestion for these PHB traffic classes. This congestion problem can be solved by using the severe congestion detection mechanism specified in Section 4.6.1.6.2.1. The degree of congestion is indicated by using a specific DSCP (see Section 4.6.1.6.2.1) and marking the bytes proportionally to the degree of congestion. Other mechanisms may also be used, such as queuing the new high priority requests until capacity becomes available for the high priority sessions. Note that the process of preemption should take into account the situation that more than one high priority session request could arrive simultaneously at the node. Therefore, a short term history of the amount of preempted traffic might be needed. Note that the three priority traffic classes are associated with the same PHB, see Section 4.3.3;

* in case of the RMD measurement based method (see Section 4.3.2), if the requested value of the <Bandwidth> parameter is admitted using an MBAC algorithm, then the amount of these resources will be used to update the MBAC algorithm according to the operation described in Section 4.3.2.
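The reservation-based bookkeeping at a QNE Interior node can be sketched as follows. This is a minimal illustration, not the admission algorithm of this document: the threshold comparison against a configured capacity is an assumption, and the names TrafficClassState, PhrCounters and try_admit are hypothetical. Priority handling and the rejection marking of Section 4.6.1.2.2 are intentionally left out.

```python
from dataclasses import dataclass


@dataclass
class PhrCounters:
    admitted_hops: int = 1  # value set by the QNE Ingress


class TrafficClassState:
    """Per-PHB reservation state kept by a reduced-state Interior node."""

    def __init__(self, capacity: float) -> None:
        self.capacity = capacity  # assumed per-PHB admission threshold
        self.reserved = 0.0       # currently reserved resource units

    def try_admit(self, bandwidth: float, phr: PhrCounters) -> bool:
        """Admit the requested <Bandwidth> if it fits under the threshold."""
        if self.reserved + bandwidth > self.capacity:
            return False               # rejection marking not modelled here
        self.reserved += bandwidth     # added to the reserved resources
        phr.admitted_hops += 1         # <Admitted Hops> increased by one
        return True
```

A request that does not fit leaves both the reserved amount and the <Admitted Hops> counter untouched.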
4.6.1.1.3 Operation in the Egress node

When the end-to-end RESERVE message is received by the Egress node, it is only forwarded further, towards the QNR, if the processing of the intra-domain RESERVE(RMD-QSpec) message was successful at all nodes in the RMD domain. In this case, the QNE Egress MUST stop the marking process that was used to bypass the QNE Interior nodes by reassigning the QoS-NSLP default NSLP-ID value to the end-to-end RESERVE message, see Section 4.4. Furthermore, the carried BOUND_SESSION_ID object associated with the intra-domain session MUST be removed after processing. Note that the received end-to-end RESERVE was tunneled within the RMD domain. Therefore, the tunneled end-to-end QSpec carried by the end-to-end RESERVE message has to be processed/set according to the [QSP-T] specification. Note that the QNE Ingress might, among others, have set the "R" flag and "Q" flag.

If rerouting takes place, the stateful QNE Egress follows the procedures specified in [QoS-NSLP]. At this point the intra-domain and end-to-end operational states MUST be initiated or modified according to the required binding procedures.
How the BOUND_SESSION_IDs are initiated and maintained in the intra-domain and end-to-end QoS-NSLP operational states is described in Sections 4.3.1 and 4.3.2.

If the processing of the intra-domain RESERVE(RMD-QSpec) was not successful at all nodes in the RMD domain, then the inter-domain (end-to-end) reservation is considered to have failed.

Furthermore, note that when the QNE Egress uses per flow intra-domain QoS-NSLP operational states, see Sections 4.3.2 and 4.3.3, the QNE Egress should maintain a timer, with a pre-configured value, which can be used to synchronize the arrival of the end-to-end RESERVE and the intra-domain RESERVE (RMD-QSpec) messages. If these two messages do not both arrive within the time defined by the timer, then the reservation is considered to have failed. Note that the timer has to be pre-configured and it has to have the same value in the RMD domain. In this case an end-to-end RESPONSE message is sent towards the QNE Ingress with the following INFO_SPEC values:

   Error Class: Transient Failure
   Error Code:  Mismatch synchronization between end-to-end RESERVE and intra-domain RESERVE

When the intra-domain RESERVE(RMD-QSpec) is received by the QNE Egress node, the session associated with the intra-domain RESERVE(RMD-QSpec) (the PHB session) and the session included in its <BOUND_SESSION_ID> object MUST be bound according to the specification given in [QoS-NSLP]. The SESSION_ID included in the BOUND_SESSION_ID parameter stored in the intra-domain QoS-NSLP operational state object is the SESSION_ID of the session associated with the end-to-end RESERVE message(s). Note that if the QNE Edge nodes maintain per flow intra-domain QoS-NSLP operational states, then the value of Binding_Code = (Tunnel and end-to-end sessions) is used. If the QNE Edge nodes maintain aggregated intra-domain QoS-NSLP reservation states, then the value of Binding_Code = (Aggregated sessions) is used, see Sections 4.3.1 and 4.3.2.
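The synchronization timer kept by a per flow QNE Egress can be sketched as below. This is only one possible reading of the timer paragraph above: the class name, the "e2e"/"intra" labels and the choice to start the timer at the first arrival are assumptions, and a real implementation would hook into the QoS-NSLP state machine instead of being polled.

```python
import time
from typing import Callable, Optional, Tuple


class ArrivalSynchronizer:
    """Checks that both RESERVE messages of a flow arrive within a window."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s  # pre-configured, same value domain-wide
        self._clock = clock
        self._first: Optional[Tuple[str, float]] = None  # (kind, timestamp)
        self._matched = False
        self._failed = False

    def on_arrival(self, kind: str) -> None:
        """Record an arrival; kind is 'e2e' or 'intra' (hypothetical labels)."""
        if kind not in ("e2e", "intra"):
            raise ValueError(kind)
        now = self._clock()
        if self._first is None:
            self._first = (kind, now)          # timer starts here (assumed)
        elif kind != self._first[0] and not (self._matched or self._failed):
            if now - self._first[1] <= self.timeout_s:
                self._matched = True
            else:
                self._failed = True            # second message came too late

    def status(self) -> str:
        """Return 'waiting', 'matched' or 'failed'."""
        if self._matched:
            return "matched"
        if self._failed:
            return "failed"
        if (self._first is not None
                and self._clock() - self._first[1] > self.timeout_s):
            return "failed"
        return "waiting"
```

On a "failed" status the Egress would send the end-to-end RESPONSE with the Transient Failure values listed above.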
Note that when the Interior nodes use mechanisms to admit a high priority session through preemption of ongoing lower priority sessions, the QNE Egress may solve the congestion on a low priority traffic PHB by using the solution specified in Section 4.6.1.6.2.2.

The end-to-end RESERVE message is generated/forwarded further upstream according to the [QoS-NSLP] and [QSP-T] specifications. Note that if the tunneled end-to-end QSpec contains one or more parameters with the "R" flag and "Q" flag set, then the "R" flag and "Q" flag contained in the end-to-end QSpec, which is carried by the generated/forwarded end-to-end RESERVE message, MUST also be set. Furthermore, the "B" (BREAK) QoS-NSLP flag in the end-to-end RESERVE message MUST NOT be set.
If the binding between the intra-domain session and the end-to-end session uses the Binding_Code (Tunnel and end-to-end sessions), then the QNE Egress MUST wait for the end-to-end RESPONSE message that has the same SESSION_ID and RII object as the end-to-end RESERVE message forwarded towards the QNR, see [QoS-NSLP]. The non-default values of the objects contained in the end-to-end RESPONSE(PDR) message MUST be used and/or set by the QNE Egress as follows:

* the values of the <RII/RSN>, <INFO_SPEC>, [ QSPEC ] objects are set according to [QoS-NSLP] and/or [QSP-T]. The INFO_SPEC object SHOULD be set by the QoS-NSLP functionality. In case of successful reservation the INFO_SPEC object SHOULD have the following values:

     Error Severity Class: Success
     Error Code value:     Reservation successful

Furthermore, an end-to-end QSpec object MUST be included in the RESPONSE message. The parameters included in the QSpec <QoS Reserved> object are copied from the original <QoS Desired> values.

QNE (Ingress)     QNE (Interior)        QNE (Interior)    QNE (Egress)
NTLP stateful    NTLP stateless        NTLP stateless    NTLP stateful
    |                    |                   |                    |
RESERVE                  |                   |                    |
--->|                    |                   |     RESERVE        |
    |------------------------------------------------------------>|
    |RESERVE(RMD-QSpec)  |                   |                    |
    |------------------->|                   |                    |
    |                    |RESERVE(RMD-QSpec) |                    |
    |                    |------------------>|                    |
    |                    |                   | RESERVE(RMD-QSpec) |
    |                    |                   |------------------->|
    |                    |                   |                 RESERVE
    |                    |                   |                    |-->
    |                    |                   |                 RESPONSE
    |                    |                   |                    |<--
    |                    |RESPONSE(PDR)      |                    |
    |<------------------------------------------------------------|
RESPONSE                 |                   |                    |
<---|                    |                   |                    |

Figure 8: Basic operation of successful reservation procedure used by the RMD-QOSM

In addition to the above, the QNE Egress MUST also generate an RMD-QSpec object that is carried by the end-to-end RESPONSE(PDR) message, see Section 4.2. Note that within a session this method of QSpec stacking is applied only to the initial RESERVE - RESPONSE messages. The refresh scenario uses a different procedure, see Section 4.6.1.3.3.
The following parameters of the RMD-QSpec object MUST be used and/or set in the following way:

* the value of the Parameter/Container ID field of the PDR container MUST be set to "PDR_7" (i.e., PDR_Reservation_Report);

* the value of the <M> field of the PDR container MUST be equal to the value of the <M> parameter of the PHR container that was carried by its associated intra-domain RESERVE(RMD-QSpec) message.

The end-to-end RESPONSE(PDR) message is delivered as normal, i.e., it is addressed and sent to its upstream QoS-NSLP neighbor, i.e., the QNE Ingress node.

If the binding between the intra-domain session and the end-to-end session uses the Binding_Code (Aggregated sessions), and there is no aggregated QoS-NSLP operational state associated with the intra-domain session available, then the QNE Egress MUST generate an (aggregated) intra-domain RESPONSE message. Note that if such an operational and reservation state is already available, then the RMD modification of aggregated reservation procedure described in Section 4.6.1.4 is used.

The intra-domain RESPONSE (RMD-QSpec) message MUST be sent to the QNE Ingress node, i.e., the previous stateful hop, by using the procedures described in Sections 4.4 and 4.5.

The values of the RMD-QSpec that is carried by the (aggregated) intra-domain RESPONSE message are generated and set in a similar way to the method described above for the generation of the RMD-QSpec carried by an end-to-end RESPONSE(PDR) message. Furthermore, the RII object carried by the intra-domain RESERVE message, see Section 4.6.1.1.1, has to be copied and carried by the intra-domain RESPONSE message.

4.6.1.2. Unsuccessful reservation

This section describes the operation where a request for reservation cannot be satisfied by the RMD-QOSM.
The QNE Ingress, the QNE Interior and QNE Egress nodes process and forward the end-to-end RESERVE message and the intra-domain RESERVE(RMD-QSpec) message in a similar way as specified in Section 4.6.1.1. The main difference between the unsuccessful operation and successful operation is that one of the QNE nodes does not admit the request due to lack of resources. This also means that the QNE edge node MUST NOT forward the end-to-end RESERVE message towards the QNR node.
Note that the described functionality applies to the RMD reservation-based methods, see Sections 4.3.1 and 4.3.3, and to the NSIS measurement-based admission control method, see Section 4.3.2.

The QNE Edge nodes maintain either per flow QoS-NSLP reservation states or aggregated QoS-NSLP reservation states. When the QNE edges maintain aggregated QoS-NSLP reservation states, the RMD-QOSM functionality may perform an RMD modification procedure (see Section 4.6.1.4) instead of the reservation initiation procedure that is described in this subsection.

4.6.1.2.1 Operation in the Ingress nodes

When an end-to-end RESERVE message arrives at the QNE Ingress and there are no resources available, the QNE Ingress MUST reject this end-to-end RESERVE message and send an end-to-end RESPONSE message back to the sender, as described in the QoS-NSLP specification, see [QoS-NSLP] and [QSP-T].

When an end-to-end RESPONSE(PDR) message is received by an Ingress node, see Section 4.6.1.2.3, the values of the <RII/RSN>, [<INFO_SPEC>], [<QSPEC>] objects are processed according to the QoS-NSLP procedures. The RMD-QSpec object, see Section 4.2, has to be processed and removed.

The RMD Resource Management Function (RMF) is notified, by reading the <M> parameter of the PDR container, that the reservation has been unsuccessful. If the end-to-end RESPONSE message has to be forwarded upstream to a node outside the RMD-QOSM aware domain, then the values of the objects contained in this message (i.e., <RII/RSN>, <INFO_SPEC>, [ *QSPEC ]) MUST be set by the QoS-NSLP protocol functions of the QNE.

When an (aggregated) intra-domain RESPONSE(PDR) message is received by the QNE Ingress node, which was sent by a QNE Egress, see Section 4.6.1.2.3, it uses the QoS-NSLP procedures to match it to the earlier sent intra-domain RESERVE message. After this phase, the RMD-QSpec has to be identified and processed.
Note that in this case the RMD Resource Management Function (RMF) is notified that the reservation has been unsuccessful by reading the <M> parameter of the PDR container. Note that when the QNE edges maintain a per flow QoS-NSLP reservation state, the RMD-QOSM functionality has to start an RMD release procedure (see Section 4.6.1.5). When the QNE edges maintain aggregated QoS-NSLP reservation states, the RMD-QOSM functionality MAY start an RMD modification procedure (see Section 4.6.1.4).
4.6.1.2.2 Operation in the Interior nodes

In case of the RMD reservation based scenario, if the intra-domain reservation request is not admitted by the QNE Interior node, then the <Hop_U> and <M> parameters of the PHR container MUST be set to "1". The <Admitted Hops> counter MUST NOT be increased. Furthermore, the "E" flag associated with the QSpec <QoS Desired> object and the "E" flag associated with the <Bandwidth> parameter SHOULD be set.

In case of the RMD measurement based scenario, the <M> parameter of the PHR container MUST be set to "1". Furthermore, the "E" flag associated with the QSpec <QoS Desired> object and the "E" flag associated with the <Bandwidth> parameter SHOULD be set.

Note that the <M> flag seems to be set in a similar way to the "E" flag used by the <Bandwidth> parameter. However, the ways in which the two flags are processed by a QNE are different.

In general, if a QNE Interior node receives a QSpec <Bandwidth> parameter with the "E" flag set and a PHR container of type "PHR_Resource_Request" with the <M> parameter set to "1", then this PHR container and the "RMD QoS Description" field MUST NOT be processed.

4.6.1.2.3 Operation in the Egress nodes

In the RMD reservation based scenario, see Section 4.3.3, and the RMD NSIS measurement based scenario, see Section 4.3.2, when the <M> marked intra-domain RESERVE(RMD-QSpec) is received by the QNE Egress node (see Figure 9), the session associated with the intra-domain RESERVE(RMD-QSpec) (the PHB session) and the end-to-end session MUST be bound.

When the QNE Egress uses per flow intra-domain QoS-NSLP operational states, see Sections 4.3.2 and 4.3.3, then the QNE Egress node MUST generate an end-to-end RESPONSE message that has to be sent to its previous stateful QoS-NSLP hop.

* the values of the <RII/RSN>, <INFO_SPEC> objects are set by the standard QoS-NSLP protocol functions. In case of the unsuccessful reservation the INFO_SPEC object SHOULD have the following values:

     Error Severity Class: Transient Failure
     Error Code value:     Reservation failure

The QSpec that was carried by the end-to-end RESERVE belonging to the same session as this end-to-end RESPONSE is included in this message. The parameters included in the QSpec <QoS Reserved> object are copied from the original <QoS Desired> values. The "E" flag associated with the QSpec <QoS Reserved> object and the "E" flag associated with the <Token Bucket> or <Bandwidth> parameter are set.
QNE (Ingress)     QNE (Interior)        QNE (Interior)    QNE (Egress)
NTLP stateful    NTLP stateless        NTLP stateless    NTLP stateful
    |                    |                   |                    |
RESERVE                  |                   |                    |
--->|                    |                   |     RESERVE        |
    |------------------------------------------------------------>|
    |RESERVE(RMD-QSpec:M=0)                  |                    |
    |------------------->|                   |                    |
    |                    |RESERVE(RMD-QSpec:M=1)                  |
    |                    |------------------>|                    |
    |                    |                   | RESERVE(RMD-QSpec:M=1)
    |                    |                   |------------------->|
    |                    |RESPONSE(PDR)      |                    |
    |<------------------------------------------------------------|
RESPONSE                 |                   |                    |
<---|                    |                   |                    |
RESERVE(RMD-QSpec: Tear=1, M=1, <Admitted Hops>=<Max_Admitted Hops>)
    |------------------->|                   |                    |

Figure 9: Basic operation during unsuccessful reservation initiation used by the RMD-QOSM

In addition to the above, similarly to the successful operation, see Section 4.6.1.1.3, the QNE Egress MUST also generate an RMD-QSpec object that is carried by the end-to-end RESPONSE message. The following fields of the RMD-QSpec object MUST be used and/or set in the following way:

* the value of the <PDR Control Type> of the PDR container MUST be set to "PDR_7" (PDR_Reservation_Report);

* the value of the <Admitted Hops> parameter of the PHR container carried by the received <M> marked intra-domain RESERVE(RMD-QSpec) message MUST be included in the <Max_Admitted Hops> parameter of the PDR container;

* the value of the <M> parameter of the PDR container MUST be set to "1".

When the QNE Egress uses per aggregate intra-domain QoS-NSLP operational states, see Section 4.3.1, then the QNE Egress node MUST generate an (aggregated) intra-domain RESPONSE message that has to be sent to its previous stateful QoS-NSLP hop. The values of the <RII/RSN>, <INFO_SPEC> objects are set by the standard QoS-NSLP protocol functions. In case of the unsuccessful reservation the INFO_SPEC object SHOULD have the following values:

   Error Severity Class: Transient Failure
   Error Code value:     Reservation failure
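The marking flow of Figure 9 can be traced with a short sketch covering the Interior rules of Section 4.6.1.2.2 (reservation based case) and the PDR fields listed above. The type and function names are hypothetical. As a simplification, the "E" flag is always set together with <M> (the text only says SHOULD), so that the "MUST NOT be processed" check can test both.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    """PHR_Resource_Request fields plus the <Bandwidth> "E" flag."""
    admitted_hops: int = 1   # set to 1 by the QNE Ingress
    hop_u: int = 0
    m: int = 0
    e_flag: bool = False


@dataclass
class PdrReport:
    control_type: str
    m: int
    max_admitted_hops: int


def interior_process(req: ResourceRequest, admit: bool) -> None:
    """Process the request at one QNE Interior node."""
    if req.m == 1 and req.e_flag:
        return                   # marked upstream: MUST NOT be processed
    if admit:
        req.admitted_hops += 1
    else:
        req.hop_u = 1            # this hop could not admit the request
        req.m = 1                # <Admitted Hops> is not increased
        req.e_flag = True        # SHOULD in the text, always set here


def egress_failure_report(req: ResourceRequest) -> PdrReport:
    """Build the PDR for an <M> marked request at the QNE Egress."""
    if req.m != 1:
        raise ValueError("request is not <M> marked")
    return PdrReport("PDR_Reservation_Report", 1, req.admitted_hops)
```

With two Interior nodes where the second one rejects, the report carries a <Max_Admitted Hops> value of 2, which identifies the last node that admitted the request.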
The values of the RMD-QSpec MUST be set in the same way as described above for the situation in which the QNE Egress uses per flow QoS-NSLP operational states.

4.6.1.3 RMD refresh reservation

In case of the RMD measurement-based method, see Section 4.3.2, QoS-NSLP reservation states in the RMD domain are typically not maintained; therefore, this method typically does not use an intra-domain refresh procedure. However, there are measurement based optimization schemes, see [GrTs03], which may use the refresh procedures described in Sections 4.6.1.3.1 and 4.6.1.3.3. These measurement based optimization schemes can only be applied in the RMD domain if the QNE edges are configured to perform intra-domain refresh procedures and if all the QNE Interior nodes are configured to perform the measurement based optimization schemes.

In the description given in this subsection it is assumed that the RMD measurement based scheme does not use the refresh procedures.

When the QNE edges maintain aggregated or per flow QoS-NSLP operational and reservation states, see Sections 4.3.1 and 4.3.3, the refresh procedures are very similar. If the RESERVE messages arrive within the soft state time-out period, the corresponding number of resource units is not removed. However, the transmissions of the intra-domain and end-to-end (refresh) RESERVE messages are not necessarily synchronized. Furthermore, the generation of the end-to-end RESERVE message by the QNE edges depends on the locally maintained refresh interval (see [QoS-NSLP]).

4.6.1.3.1 Operation in the Ingress node

The Ingress node MUST be able to generate an intra-domain (refresh) RESERVE(RMD-QSpec) at any time defined by the refresh period/timer. Before generating this message, the RMD QoS signaling model functionality uses the RMD traffic class (PHR) resource units for refreshing the RMD traffic class state.
Note that the RMD traffic class refresh periods MUST be equal in all QNE edge and QNE Interior nodes and SHOULD be smaller (default: more than two times smaller) than the refresh period at the QNE Ingress node used by the end-to-end RESERVE message. The intra-domain RESERVE (RMD-QSpec) message MUST include a "RMD QoS Description" field and a PHR container (i.e., PHR_Refresh_Update). An example of this RMD specific refresh operation can be seen in Figure 10.
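The two constraints on the refresh periods can be checked with a small configuration validator. This is a sketch under assumptions: the function name, the node-name mapping and the returned (errors, warnings) pair are hypothetical, and "more than two times smaller" is read as rmd_period < e2e_period / 2.

```python
from typing import Dict, List, Tuple


def check_refresh_periods(rmd_periods_s: Dict[str, float],
                          e2e_period_s: float,
                          factor: float = 2.0
                          ) -> Tuple[List[str], List[str]]:
    """Validate RMD refresh periods against the end-to-end refresh period.

    rmd_periods_s maps a node name to its RMD traffic class refresh
    period in seconds. Returns (errors, warnings): errors cover the
    MUST rule, warnings cover the SHOULD rule.
    """
    errors: List[str] = []
    warnings: List[str] = []
    if len(set(rmd_periods_s.values())) > 1:
        errors.append("RMD refresh periods differ between nodes")
    for node, period in sorted(rmd_periods_s.items()):
        # SHOULD: more than `factor` times smaller than the e2e period.
        if period * factor >= e2e_period_s:
            warnings.append(f"{node}: period {period}s is not more than "
                            f"{factor} times smaller than {e2e_period_s}s")
    return errors, warnings
```

For instance, a uniform 10 second RMD period with a 30 second end-to-end period passes both checks, while a 20 second RMD period triggers the warning.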
QNE (Ingress)     QNE (Interior)        QNE (Interior)    QNE (Egress)
NTLP stateful    NTLP stateless        NTLP stateless    NTLP stateful
    |                    |                   |                    |
    |RESERVE(RMD-QSpec)  |                   |                    |
    |------------------->|                   |                    |
    |                    |RESERVE(RMD-QSpec) |                    |
    |                    |------------------>|                    |
    |                    |                   | RESERVE(RMD-QSpec) |
    |                    |                   |------------------->|
    |                    |                   |                    |
    |                    |RESPONSE(RMD-QSpec)|                    |
    |<------------------------------------------------------------|
    |                    |                   |                    |

Figure 10: Basic operation of RMD specific refresh procedure

Most of the non-default values of the objects contained in this message MUST be used and set by the QNE Ingress in the same way as described in Section 4.6.1.1. The following objects are used and/or set differently:

* the flag REPLACE MUST be set to FALSE = 0;

* the PHR resource units MUST be included in the <Bandwidth> parameter. The value of the <Bandwidth> parameter depends on how the different inter-domain (end-to-end) flows are aggregated by the QNE Ingress node (e.g., the sum of all the PHR requested resources of the aggregated flows), see Section 4.3.1. If no QoS-NSLP aggregation is performed by the QNE Ingress node, the value of the <Bandwidth> parameter SHOULD be equal to the <Bandwidth> parameter of its associated new (initial) intra-domain RESERVE (RMD-QSpec) message, see Section 4.3.3;

* the value of the Parameter/Container field of the "PHR RMD-QOSM control information" container MUST be set to "PHR_2", i.e., "PHR_Refresh_Update";

* in a single-domain case the PDR container field is not needed in the message;

* the <RII> object MUST be included in the intra-domain RESERVE message. Its value is calculated according to [QoS-NSLP].

When the intra-domain RESPONSE (RMD-QSpec) message, see Section 4.6.1.3.3, is received by the QNE Ingress node, then:

* the values of the <RII/RSN>, <INFO_SPEC>, [*QSPEC] objects are processed by the standard QoS-NSLP protocol functions (see Section 4.6.1.1);
* the PDR has to be processed and removed by the RMD-QOSM functionality in the QNE Ingress node. The RMD-QOSM functionality is notified by the <M> parameter of the PDR container whether the refresh procedure has been successful or unsuccessful. All sessions associated with this RMD specific refresh session MUST be informed about the success or failure of the refresh procedure (when aggregated QoS-NSLP operational and reservation states are used, see Section 4.3.1, there will be more than one session). In case of failure, the QNE Ingress node has to generate (in a standard QoS-NSLP way) an error end-to-end RESPONSE message that will be sent towards the QNI.

4.6.1.3.2 Operation in the Interior node

The intra-domain RESERVE (RMD-QSpec) message is received and processed by the QNE Interior nodes. Any QNE edge or QNE Interior node that receives a "PHR_Refresh_Update" control information field MUST identify the traffic class state (PHB) (using the <PHB Class> parameter). Most of the parameters in this refresh intra-domain RESERVE (RMD-QSpec) message MUST be used and/or set by a QNE Interior node in the same way as described in Section 4.6.1.1.

The following objects are used and/or set differently:

* the value of the <Bandwidth> parameter of the "RMD QoS Description" field is used by the QNE Interior node for refreshing the RMD traffic class state. These resources (included in <Bandwidth>), if reserved, are added to the currently reserved resources per PHB and therefore they will become a part of the per traffic class (per-PHB) reservation state, see Sections 4.3.1 and 4.3.3. If the refresh procedure cannot be fulfilled, then the <M> parameter of the PHR container MUST be set to "1". Furthermore, the "E" flag associated with the <QoS Desired> object and the "E" flag associated with the <Bandwidth> parameter SHOULD be set.
Any PHR container of type "PHR_Refresh_Update", and its associated "RMD QoS Description" field (i.e., <Bandwidth>), is always processed, whether it is marked or not and independent of the "E" flag value of the <Bandwidth> parameter, but marked bits are not changed.

4.6.1.3.3 Operation in the Egress node

The intra-domain RESERVE (RMD-QSpec) message is received and processed by the QNE Egress node. A new intra-domain RESPONSE (RMD-QSpec) message is generated by the QNE Egress node and MUST include a PDR (type PDR_Refresh_Report).
The (refresh) intra-domain RESPONSE (RMD-QSpec) message MUST be sent to the QNE Ingress node, i.e., the previous stateful hop, and MUST be explicitly routed to it using the procedures described in Section 4.5.

* the values of the <RII/RSN>, <INFO_SPEC> objects are set by the standard QoS-NSLP protocol functions, see [QoS-NSLP];

* the value of the <PDR Control Type> parameter of the PDR container MUST be set to "PDR_8" (i.e., PDR_Refresh_Report);

* in case of successful reservation the INFO_SPEC object SHOULD have the following values:

     Error Severity Class: Success
     Error Code value: Reservation successful

* in case of unsuccessful reservation the INFO_SPEC object SHOULD have the following values:

     Error Severity Class: Transient Failure
     Error Code value: Reservation failure

The RMD-QSpec that was carried by the intra-domain RESERVE belonging to the same session as this intra-domain RESPONSE is included in the intra-domain RESPONSE message. The parameters included in the QSpec <QoS Reserved> object are copied from the original <QoS Desired> values. If the reservation is unsuccessful then the "E" flag associated with the QSpec <QoS Reserved> object and the "E" flag associated with the <Bandwidth> parameter are set.

4.6.1.4. RMD modification of aggregated reservations

In the case when the QNE edges maintain QoS-NSLP aggregated operational and reservation states and the aggregated reservation has to be modified (see Section 4.3.1), the following procedure is applied:

* When the modification request requires an increase of the reserved resources, the QNE Ingress node MUST include the corresponding value in the <Bandwidth> parameter of the "RMD QoS Description" field, which is sent together with a "PHR_Resource_Request" control information.
If a QNE edge or QNE Interior node is not able to reserve the number of requested resources, the "PHR_Resource_Request" control information that is associated with the <Bandwidth> parameter MUST be marked. In this situation the RMD specific operation for unsuccessful reservation will be applied (see Section 4.6.1.2).
* When the modification request requires a decrease of the reserved resources, the QNE Ingress node MUST include this value in the <Bandwidth> parameter of the "RMD QoS Description" field. Subsequently an RMD release procedure SHOULD be accomplished (see Section 4.6.1.5).

4.6.1.5 RMD release procedure

This procedure is applied to all RMD mechanisms that maintain reservation states. If a refresh RESERVE message does not arrive at a QNE Interior node within the refresh time-out period then the resources associated with this message are removed. This soft state behavior provides a degree of robustness for the system by ensuring that unused resources are not reserved for a long time. Resources can be removed by an explicit release at any time. However, when an end-to-end (tear) RESERVE is retransmitted, see Section 5.2.4 in [QoS-NSLP], this message MUST NOT initiate an intra-domain (tear) RESERVE message. This is because the RMF values related to the end-to-end (tear) RESERVE message have already been released during the processing of the original (initial) end-to-end (tear) RESERVE message.

When the RMD-RMF of a QNE edge or QNE Interior node processes a "PHR_Release_Request" control information it MUST identify the <PHB Class> parameter and estimate the time period that elapsed after the previous refresh, see also Section 3 of [CsTa05]. This MAY be done by having the QNE Ingress node indicate the time lag, say "T_lag", between the last sent "PHR_Refresh_Update" and the "PHR_Release_Request" control information container. The value of "T_lag" is first normalized to the length of the refresh period, say "T_period", i.e., the ratio between "T_lag" and "T_period" is calculated. This ratio is then introduced into the <Time Lag> field of the "PHR_Release_Request" control information. When a node (QNE edge or QNE Interior) receives the "PHR_Release_Request" control information, it MUST store the arrival time.
Then it MUST calculate the time difference, "T_diff", between the arrival time and the start of the current refresh period, "T_period". Furthermore, this node MUST derive the value of "T_lag" from the <Time Lag> parameter. "T_lag" can be found by multiplying the value included in the <Time Lag> parameter by the length of the refresh period, "T_period". If the derived time lag, "T_lag", is smaller than the calculated time difference, "T_diff", then this node MUST decrease the PHB reservation state by the number of resource units indicated in the <Bandwidth> parameter of the "RMD QoS Description" field that has been sent together with the "PHR_Release_Request" control information container, but not below zero.
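The time-lag handling described above can be illustrated with a short, non-normative Python sketch. The names `PhbState`, `encode_time_lag` and `process_release`, and the use of floating-point seconds for all times, are assumptions made for this illustration; they do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class PhbState:
    """Hypothetical per-PHB reservation state kept by a QNE node."""
    reserved_units: int


def encode_time_lag(t_lag: float, t_period: float) -> float:
    """Ingress side: normalize T_lag to the refresh period length.

    The resulting ratio is what would be carried in <Time Lag>.
    """
    return t_lag / t_period


def process_release(state: PhbState, time_lag_field: float, bandwidth: int,
                    arrival_time: float, period_start: float,
                    t_period: float) -> bool:
    """Receiving node: release resources only if T_lag < T_diff.

    Returns True when the PHB reservation state was decreased.
    """
    # T_diff: time between arrival and the start of the current refresh period.
    t_diff = arrival_time - period_start
    # T_lag: recovered by multiplying <Time Lag> by the refresh period length.
    t_lag = time_lag_field * t_period
    if t_lag < t_diff:
        # Decrease the per-PHB state, but never below zero.
        state.reserved_units = max(0, state.reserved_units - bandwidth)
        return True
    return False
```

For example, with a refresh period of 10 seconds, a <Time Lag> value of 0.25 corresponds to a T_lag of 2.5 seconds; a release arriving 5 seconds into the current refresh period therefore decreases the reservation state.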
An RMD specific release procedure can be triggered by an end-to-end RESERVE with the TEAR flag set ON (see Section 4.6.1.5.1), or it can be triggered by an intra-domain RESPONSE, an end-to-end RESPONSE or an end-to-end NOTIFY message that includes a marked (i.e., PDR <M> and/or PDR <S> parameters are set ON) "PDR_Reservation_Report" or "PDR_Congestion_Report" and/or an INFO_SPEC object.

4.6.1.5.1. Triggered by a RESERVE message

This RMD explicit release procedure can be triggered by a tear (TEAR flag set ON) end-to-end RESERVE message. When a tear (TEAR flag set ON) end-to-end RESERVE message arrives at the QNE Ingress, the QNE Ingress node SHOULD process the message in a standard QoS-NSLP way (see [QoS-NSLP]). In addition to this, the RMD RMF is notified, as specified in [QoS-NSLP].

As in the scenario described in Section 4.6.1.1, a bypassing procedure has to be initiated by the QNE Ingress node. The bypassing procedure is performed according to the description given in Section 4.4. At the QNE Ingress the end-to-end RESERVE message is marked, i.e., the QoS-NSLP default NSLP-ID value is modified to another predefined NSLP-ID value, which corresponds to a RAO value that will be used by the GIST message that carries the end-to-end RESPONSE message to bypass the QNE Interior nodes.

The QNE Ingress node then generates an intra-domain RESERVE (RMD-QSpec) message. Before generating this message, the RMD RMF uses the RMD traffic class (PHR) resources (specified in <Bandwidth>) and the PHB type (specified in <PHB Class>) for an RMD release procedure. This can be achieved by subtracting the amount of the requested resources from the total reserved amount of resources stored in the RMD traffic class state.
QNE (Ingress)      QNE (Interior)      QNE (Interior)      QNE (Egress)
NTLP stateful      NTLP stateless      NTLP stateless      NTLP stateful
    |                   |                   |                   |
RESERVE                 |                   |                   |
--->|                   |                   |      RESERVE      |
    |---------------------------------------------------------->|
    |RESERVE(RMD-QSpec:Tear=1)              |                   |
    |------------------>|                   |                   |
    |                   |RESERVE(RMD-QSpec:Tear=1)              |
    |                   |------------------>|                   |
    |                   |                RESERVE(RMD-QSpec:Tear=1)
    |                   |                   |------------------>|
    |                   |                   |                RESERVE
    |                   |                   |                   |-->
    |                   |                   |                   |

Figure 11: Explicit release triggered by RESERVE used by the RMD-QOSM
The intra-domain RESERVE (RMD-QSpec) message MUST include a "RMD QoS Description" field and a PHR container (i.e., "PHR_Resource_Release"), and it MAY include a PDR container (i.e., PDR_Release_Request). An example of this operation can be seen in Figure 11.

Most of the non-default values of the objects contained in the tear intra-domain RESERVE message are set by the QNE Ingress node in the same way as described in Section 4.6.1.1. The following objects are set differently:

* the REPLACE flag MUST be set to FALSE (0);

* the <RII> object MUST NOT be included in this message, because the QNE Ingress node does not need to receive a response from the QNE Egress node;

* the TEAR flag MUST be set to ON;

* the PHR resource units MUST be included in the <Bandwidth> parameter of the "RMD QoS Description" field;

* the value of the <Admitted Hops> parameter MUST be set to "1";

* the value of the <Time Lag> parameter of the PHR container is calculated by the RMD-QOSM functionality (see Section 4.6.1.5);

* the value of the <Control Type> parameter of the PHR container is set to "PHR_3" (i.e., PHR_Resource_Release).

The intra-domain tear RESERVE (RMD-QSpec) message is received and processed by the QNE Interior nodes. Most of the non-default values of the objects contained in this refresh intra-domain RESERVE (RMD-QSpec) message are set by a QNE Interior node in the same way as described in Section 4.6.1.1. The following objects are set and processed differently:

* Any QNE Interior node that receives the combination of the "RMD QoS Description" field and the "PHR_Resource_Release" control information container MUST identify the traffic class (PHB) and release the requested resources included in the <Bandwidth> parameter. This can be achieved by subtracting the amount of RMD traffic class requested resources, included in the <Bandwidth> parameter, from the total reserved amount of resources stored in the RMD traffic class state.
The value of the <Time Lag> parameter of the "PHR_Resource_Release" container is used during the release procedure as explained in Section 4.6.1.5.

The intra-domain tear RESERVE (RMD-QSpec) message is received and processed by the QNE Egress node. The "RMD QoS Description" field and the "PHR RMD-QOSM control information" container (and, if available, the "PDR RMD-QOSM control information" container) are read and processed by the RMD QoS node.
The value of the <Bandwidth> parameter of the "RMD QoS Description" field and the value of the <Time Lag> field of the PHR container MUST be used by the RMD release procedure. This can be achieved by subtracting the amount of RMD traffic class requested resources, included in the <Bandwidth> parameter, from the total reserved amount of resources stored in the RMD traffic class state.

The end-to-end RESERVE message is forwarded by the next hop (i.e., the QNE Egress) only if the intra-domain tear RESERVE (RMD-QSpec) message arrives at the QNE Egress node. Furthermore, the QNE Egress MUST stop the marking process that was used to bypass the QNE Interior nodes by reassigning the QoS-NSLP default NSLP-ID value to the end-to-end RESERVE message, see Section 4.4.

Note that when the QNE edges maintain aggregated QoS-NSLP reservation states, the RMD-QOSM functionality may start an RMD modification procedure (see Section 4.6.1.4) that uses the explicit release procedure described above in this subsection.

4.6.1.5.2 Triggered by a marked RESPONSE or NOTIFY message

This RMD explicit release procedure can be triggered by an end-to-end RESPONSE message with a <M> marked PDR container (see Section 4.6.1.2), an intra-domain RESPONSE message with a <S> marked PDR container (see Section 4.6.1.6.1), or an end-to-end NOTIFY message (see Section 4.6.1.6) with an INFO_SPEC object with the following values:

     Error Severity Class: Informational
     Error Code value: Congestion situation

When the aggregated intra-domain QoS-NSLP operational states are used, an end-to-end NOTIFY message used to trigger an RMD release procedure may contain a "PDR_Congestion_Report" container that carries a set <S> parameter and a bandwidth value in the <PDR Bandwidth> parameter.
The RMD specific release procedure that is triggered by an end-to-end RESPONSE message with a <M> marked PDR container (see Section 4.6.1.2) can be terminated at any QNE edge or any QNE Interior node using the <Max_Admitted Hops> field.
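The termination test that relies on <Max_Admitted Hops> is specified in detail later in this section: each node increases <Admitted Hops> by one and, when <M> is "1" and <S> is "0", compares the result with <Max_Admitted Hops>. A minimal, non-normative Python sketch of that per-node check follows. The function name and the use of Python booleans for the <M> and <S> parameters are assumptions made only for this illustration.

```python
def process_release_hop(admitted_hops: int, max_admitted_hops: int,
                        m: bool, s: bool) -> tuple[int, bool]:
    """Per-node handling of a tear RESERVE carrying PHR_Resource_Release.

    Returns the updated <Admitted Hops> value and a flag telling whether
    the message is terminated at this node (True) or forwarded (False).
    """
    # Each node that processes the PHR container increases the hop count.
    admitted_hops += 1
    # Only an <M> marked, not <S> marked, release is subject to the comparison.
    terminate = m and not s and admitted_hops == max_admitted_hops
    return admitted_hops, terminate
```

With <Max_Admitted Hops> equal to 3, a node that receives the message with <Admitted Hops> equal to 2 raises it to 3 and terminates the message; this node was the last one that successfully processed the initial reservation request.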
The RMD specific explicit release procedure that is terminated at a QNE Interior (or QNE edge) node is denoted as the RMD partial release procedure. This explicit release procedure can be used, for example, during an RMD specific operation for unsuccessful reservation (see Section 4.6.1.2). When the RMD QoS signaling model functionality of a QNE Ingress node receives a <M> or <S> marked PDR container of type "PDR_Reservation_Report" or "PDR_Congestion_Report", it MUST start an RMD partial release procedure.

The QNE Ingress node generates an intra-domain RESERVE (RMD-QSpec) message. Before generating this message, the RMD-QOSM functionality uses the RMD traffic class (PHR) resource units for an RMD release procedure. This can be achieved by subtracting the amount of RMD traffic class requested resources from the total reserved amount of resources stored in the RMD traffic class state.

Furthermore, note that the tear intra-domain RESERVE message is generated as shown in Figure 12 when it is triggered by either an end-to-end NOTIFY message or a RESPONSE message that does not carry a PDR container, but an INFO_SPEC object. The error code values carried by this NOTIFY message are:

     Error Severity Class: Informational
     Error Code value: Congestion situation

An example of this message exchange can be seen in Figure 12.

Most of the non-default values of the objects contained in the tear intra-domain RESERVE (RMD-QSpec) message are set by the QNE Ingress node in the same way as described in Section 4.6.1.1. The following objects MUST be used and/or set differently:

* the REPLACE flag MUST be set to FALSE;

* the value of the <M> parameter of the PHR container MUST be set to "1";

* the value of the <S> parameter of the PHR container MUST be set to "1";

* the RESERVE message MAY include a PDR container. Note that this could be needed when a bi-directional scenario is used, see Section 4.6.2.
QNE (Ingress)     QNE (Interior)     QNE (Interior)    QNE (Egress)
NTLP stateful     NTLP stateless     NTLP stateless    NTLP stateful
    |                  |                  |                  |
    |                NOTIFY               |                  |
    |<-------------------------------------------------------|
    |RESERVE(RMD-QSpec:Tear=1,M=1,S=SET)  |                  |
    | ---------------->|RESERVE(RMD-QSpec:Tear=1,M=1,S=SET)  |
    |                  |                  |                  |
    |                  |----------------->|                  |
    |                  |     RESERVE(RMD-QSpec:Tear=1,M=1,S=SET)
    |                  |                  |----------------->|

Figure 12: Basic operation during RMD explicit release procedure triggered by NOTIFY used by the RMD-QOSM

When the generation of the intra-domain RESERVE (RMD-QSpec) message is triggered by an end-to-end RESPONSE (PDR) message, which carries a <M> marked "PDR_7" (PDR_Reservation_Report) container, then this generated intra-domain RESERVE (RMD-QSpec) message MUST include a "RMD QoS Description" field and a PHR container (i.e., PHR_Resource_Release), and it MAY include a PDR container (i.e., PDR_Release_Request). An example of this operation can be seen in Figure 13.

Most of the non-default values of the objects contained in the tear intra-domain RESERVE (RMD-QSpec) message are set by the QNE Ingress node in the same way as described in Section 4.6.1.1. The following objects MUST be used and/or set differently:

* the REPLACE flag MUST be set to FALSE;

* the value of the <M> parameter of the PHR container MUST be set to "1";

* the RESERVE message MAY include a PDR container;

* when the tear intra-domain RESERVE message is triggered by an intra-domain RESPONSE (RMD-QSpec) message, the value of the <Max Admitted Hops> parameter of the PDR container included in the received <M> or <S> marked intra-domain RESPONSE (PDR) message MUST be included in the <Max Admitted Hops> parameter of the PDR container of the RESERVE message. Note that this procedure is applied for the severe congestion handling by the RMD-QOSM refresh procedure (see Section 4.6.1.6.1). The tear intra-domain RESERVE message propagates in this case until the QNE Egress (similar to Figure 12).
QNE (Ingress)      QNE (Interior)      QNE (Interior)      QNE (Egress)
                                       Node that marked
                                     PHR_Resource_Request
                                        <PHR> object
NTLP stateful      NTLP stateless      NTLP stateless      NTLP stateful
    |                   |                   |                   |
    |                   |                   |                   |
    |             RESPONSE (RMD-QSpec: M=1) |                   |
    |<----------------------------------------------------------|
RESERVE(RMD-QSpec: Tear=1, M=1, <Admitted Hops>=<Max_Admitted Hops>)
    |------------------>|                   |                   |
    |                   |                   |                   |

Figure 13: Basic operation during RMD explicit release procedure triggered by RESPONSE used by the RMD-QOSM

Any QNE edge or QNE Interior node that receives the "RMD QoS Description" field and the PHR container MUST identify the traffic class state (PHB), using the <PHB Class> parameter, and release the requested resources included in the <Bandwidth> field. This can be achieved by subtracting the amount of RMD traffic class requested resources, included in the <Bandwidth> field, from the total reserved amount of resources stored in the RMD traffic class state. The value of the <Time Lag> parameter of the PHR field is used during the release procedure as explained in Section 4.6.1.5.

The <Admitted Hops> value included in the PHR container is increased by one. If the value of the <M> parameter of the "PHR_Resource_Release" control information container is "1" and the value of the <S> parameter is "0", then the <Max_Admitted Hops> value included in the PDR container MUST be compared with the calculated <Admitted Hops> value. When these two values are equal, the intra-domain RESERVE (RMD-QSpec) message has to be terminated and it will not be forwarded downstream. The reason for this is that the QNE node that is currently processing this message was the last QNE node that successfully processed the "RMD QoS Description" field and PHR container of its associated initial reservation request (i.e., initial intra-domain RESERVE (RMD-QSpec) message). Its next downstream QNE node was unable to successfully process the initial reservation request and therefore marked the <M> parameter of the "PHR_Resource_Request" control information.

When the values of the <M> and <S> parameters are set to "0", then this message will not be terminated by a QNE Interior node, but it will be forwarded in the downstream direction. The QNE Egress node will receive and process the PHR_Resource_Release control information. Afterwards, the QNE Egress node MUST terminate the intra-domain RESERVE (RMD-QSpec) message.

Note that the above described procedure applies to the situation where the QNE edges maintain a per flow QoS-NSLP reservation state.

When the QNE edges maintain aggregated intra-domain QoS-NSLP operational states and a severe congestion occurs, the QNE Ingress may receive an end-to-end NOTIFY message (see Section 4.6.1.6) with a "PDR_Congestion_Report" container that carries a set <S> parameter and a bandwidth value in the <PDR Bandwidth> parameter. Furthermore, the same end-to-end NOTIFY message carries an INFO_SPEC object with the following values:

     Error Severity Class: Informational
     Error Code value: Congestion situation

The end-to-end session associated with this NOTIFY message maintains the BOUND_SESSION_ID of the bound aggregated session, see Section 4.3.1. The RMD-QOSM at the QNE Ingress MUST start an RMD modification procedure (see Section 4.6.1.4) that uses the RMD explicit release procedure described above in this section. In particular, the RMD explicit release procedure releases the bandwidth value included in the <PDR Bandwidth> parameter, within the "PDR_Congestion_Report" container, from the reserved bandwidth associated with the aggregated intra-domain QoS-NSLP operational state.

4.6.1.6. Severe congestion handling

This section describes the operation of the RMD-QOSM when a severe congestion occurs within the Diffserv domain.
When a failure occurs in a communication path, e.g., a router or a link failure, the routing algorithms will adapt to the failure by changing the routing decisions to reflect changes in the topology and traffic volume. As a result, the re-routed traffic will follow a new path, which may result in overloaded nodes as they need to support more traffic. This may cause severe congestion in the communication path. In this situation the available resources are not enough to meet the required QoS for all the flows along the new path. Therefore, one or more flows SHOULD be terminated, or forwarded in a lower priority queue.
Interior nodes notify edge nodes by data marking or by marking the refresh messages.

4.6.1.6.1 Severe congestion handling by the RMD-QOSM refresh procedure

This procedure applies to all RMD scenarios that use an RMD refresh procedure. The QoS-NSLP and RMD are able to cope with congested situations using the refresh procedure, see Section 4.6.1.3. If the refresh is not successful in a QNE Interior node, edge nodes are notified by "S" marking the refresh messages and by including the percentage of overload in the <Overload %> field in the "PHR_Refresh_Update" container, carried by the intra-domain RESERVE message.

The intra-domain RESPONSE message that is sent by the QNE Egress towards the QNE Ingress will contain a PDR container with a Parameter/Container ID = PDR_10, i.e., "PDR_Congestion_Report". The values of the <S> and <Overload %> fields of this container should be set equal to the values of the <S> and <Overload %> fields, respectively, carried by the "PHR_Refresh_Update" container. Part of the flows, corresponding to the <Overload %>, are terminated, or forwarded in a lower priority queue. Note that an example of how this value can be calculated is given in Appendix A.1.1, where it is denoted as the signaled_overload_rate parameter.

The flows can be terminated by the RMD release procedure described in Section 4.6.1.5. Note that the above described functionality applies to the RMD reservation-based and to the NSIS measurement-based admission control schemes. Furthermore, note that the above functionality also applies to the scenario where the QNE Edge nodes maintain either per flow QoS-NSLP reservation states or aggregated QoS-NSLP reservation states.

In general, relying on the soft state refresh mechanism solves the congestion within the time frame of the refresh period. If this mechanism is not fast enough, additional functions should be used, which are described in Section 4.6.1.6.2.
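How the QNE Egress could choose the PDR container for the refresh RESPONSE can be sketched as follows. This is a non-normative Python illustration: the classes `PhrRefreshUpdate` and `PdrContainer` and the function `build_response_pdr` are hypothetical names, and the containers are reduced to the fields discussed in this section.

```python
from dataclasses import dataclass


@dataclass
class PhrRefreshUpdate:
    """Hypothetical view of a received "PHR_Refresh_Update" container."""
    s: bool                # <S> severe congestion marking
    overload_percent: int  # <Overload %> field


@dataclass
class PdrContainer:
    """Hypothetical view of the PDR container placed in the RESPONSE."""
    container_id: str
    s: bool = False
    overload_percent: int = 0


def build_response_pdr(phr: PhrRefreshUpdate) -> PdrContainer:
    """Egress side: pick the PDR container for the intra-domain RESPONSE."""
    if phr.s:
        # Congested refresh: report PDR_10 ("PDR_Congestion_Report") and
        # copy <S> and <Overload %> from the PHR_Refresh_Update container.
        return PdrContainer("PDR_10", s=phr.s,
                            overload_percent=phr.overload_percent)
    # Otherwise report PDR_8 ("PDR_Refresh_Report"), see Section 4.6.1.3.3.
    return PdrContainer("PDR_8")
```

The sketch leaves out the <M> handling of the ordinary refresh report and the calculation of the overload percentage itself, for which Appendix A.1.1 gives an example.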
4.6.1.6.2 Severe congestion handling by proportional data packet marking

This severe congestion handling method requires the following functionalities.

4.6.1.6.2.1 Operation in the Interior nodes

The detection and marking/remarking functionality described in this section applies to NSIS aware nodes, but also to NSIS unaware nodes. This means, however, that the NSIS unaware nodes must be configured such that they can detect the congestion/severe congestion situations and remark packets in the same way as the NSIS aware nodes do.
The Interior node detecting severe congestion remarks data packets passing the node. For this remarking, two additional DSCPs can be allocated for each traffic class. One DSCP MAY be used to indicate that a packet passed through a severely congested node. This type of DSCP is denoted in this document as "affected DSCP". The use of this DSCP type eliminates the possibility that, due to e.g. ECMP (Equal Cost Multiple Paths) enabled routing, the egress node either does not detect packets that passed a severely congested node or erroneously detects packets that actually did not pass the severely congested node. Note that this type of DSCP MUST only be used if all the nodes within the RMD domain are configured to use it. Otherwise, this type of DSCP MUST NOT be applied.

The other DSCP MUST be used to indicate the degree of congestion by marking the bytes proportionally to the degree of congestion. This type of DSCP is denoted in this document as "encoded DSCP". Note that in this document the terms marked packets or marked bytes refer to the "encoded DSCP". The terms unmarked packets or unmarked bytes refer to packets, or the bytes belonging to packets, whose DSCP is either the "affected DSCP" or the original DSCP. Furthermore, in the algorithm described below it is considered that the router may drop received packets.

The counting/measuring of marked or unmarked bytes described in this section is accomplished within measurement periods. All nodes within an RMD domain use the same, fixed measurement interval, say T seconds, which MUST be pre-configured.

It is RECOMMENDED that the total number of additional (local and experimental) DSCPs needed for severe congestion handling within an RMD domain be as low as possible and not exceed 8.
One possibility to reduce the number of used DSCPs is to use only the "encoded DSCP" and not to use "affected DSCP" marking. Another possible solution is, for example, to allocate one DSCP for severe congestion indication for each of the AF classes, independently of their dropping precedence.

An example of a remarking procedure can be found in Appendix A.1.1.

4.6.1.6.2.2 Operation in the Egress nodes

When the QNE edges maintain a per flow intra-domain QoS-NSLP operational state, see Sections 4.3.2 and 4.3.3, the following procedure is followed. The QNE Egress node applies a predefined policy to solve the severe congestion situation, by selecting a number of inter-domain (end-to-end) flows that SHOULD be terminated, or forwarded in a lower priority queue.
When the RMD domain does not use the "affected DSCP" marking, the egress MUST generate an ingress/egress pair aggregated state for each ingress and for each supported PHB. This is because the edges must be able to detect in which ingress/egress pair a severe congestion occurs; otherwise the QNE Egress will not have any information on which flows or groups of flows were affected by the severe congestion.

When the RMD domain supports the "affected DSCP" marking, the egress is able to detect all flows that are affected by the severe congestion situation. Therefore, when the RMD domain supports the "affected DSCP" marking, the Egress need not generate and maintain the ingress/egress pair aggregated reservation states. Note that these aggregated reservation states may not be associated with aggregated intra-domain QoS-NSLP operational states.

The ingress/egress pair aggregated reservation state can be derived by detecting which flows are using the same PHB and are sent by the same Ingress (via the per flow end-to-end QoS-NSLP states).

Some flows belonging to the same PHB traffic class might get a different priority than other flows belonging to the same PHB traffic class. This difference in priority can be notified to the egress and ingress nodes either by the RESERVE message that carries the QSpec associated with the end-to-end QoS model, e.g., the <Preemption Priority> and <Defending Priority> parameters, or by using a locally defined policy. The priority value is kept in the reservation states, see Section 4.3, and might be used during admission control and/or severe congestion handling procedures.

The terminated flows are selected from the flows that have the same PHB traffic class as the PHB of the marked (as "encoded DSCP") and "affected DSCP" (when applied in the complete RMD domain) packets and that (when the ingress/egress pair aggregated states are available) belong to the same ingress/egress pair aggregate.
For flows associated with the same PHB traffic class the priority of the flow plays a significant role. An example of calculating the number of flows associated with each priority class that have to be terminated is explained in Appendix A.1.2.

For the flows (sessions) that have to be terminated, the QNE Egress node generates and sends an end-to-end NOTIFY message to the QNE Ingress node (its upstream stateful QoS-NSLP peer) to indicate the severe congestion in the communication path.

The non-default values of the objects contained in the NOTIFY message MUST be set by the QNE Egress node as follows:

* the value of the <INFO_SPEC> object is set by the standard QoS-NSLP protocol functions;
* the INFO_SPEC object MUST include information that notifies that the end-to-end flow MUST be terminated. This information is as follows:

     Error Severity Class: Informational
     Error Code value: Congestion situation

When the QNE edges maintain a per aggregate intra-domain QoS-NSLP operational state, see Section 4.3.1, the QNE Edge has to calculate, for each aggregate intra-domain QoS-NSLP operational state, the bandwidth that has to be terminated in order to solve the severe congestion. An example of how this bandwidth is calculated can be found in Appendix A.1.3. Note that for the aggregated sessions that are affected, the QNE Egress node generates and sends one end-to-end NOTIFY message to the QNE Ingress node (its upstream stateful QoS-NSLP peer) to indicate the severe congestion in the communication path. Note that this end-to-end NOTIFY message is associated with one of the end-to-end sessions that is bound to the aggregated intra-domain QoS-NSLP operational state.

The non-default values of the objects contained in the NOTIFY message MUST be set by the QNE Egress node in the same way as the ones used by the end-to-end NOTIFY message described above for the situation where the QNE Egress maintains a per flow intra-domain operational state. In addition to this, the end-to-end NOTIFY MUST carry the RMD-QSpec, which contains a PDR container with a Parameter/Container ID = PDR_10, i.e., "PDR_Congestion_Report". The <S> parameter should be set. Furthermore, the value of the <PDR Bandwidth> parameter MUST contain the bandwidth, associated with the aggregated QoS-NSLP operational state, which has to be released.

Note that the QNE Egress SHOULD restore the original DSCP values of the remarked packets, otherwise multiple actions for the same event might occur. However, this value MAY be left in its remarked form if there is an SLA between domains stating that a downstream domain handles the remarking problem.
4.6.1.6.2.3 Operation in the Ingress nodes

Upon receiving the (end-to-end) NOTIFY message, the QNE Ingress node resolves the severe congestion by a predefined policy, e.g., by refusing new incoming flows (sessions), terminating the affected and notified flows (sessions), and blocking their packets or shifting them to an alternative RMD traffic class (PHB). This operation is depicted in Figure 14, where the QNE Ingress, for each flow (session) to be terminated, receives a NOTIFY message that carries the "Congestion situation" error code.
When the QNE Ingress node receives the end-to-end NOTIFY message, it associates this NOTIFY message with its bound intra-domain session (see Sections 4.3.2 and 4.3.3) via the BOUND_SESSION_ID information included in the end-to-end per-flow QoS-NSLP state. The QNE Ingress uses the operation described in Section 4.6.1.5.2 to terminate the intra-domain session.

QNE (Ingress)    QNE (Interior)     QNE (Interior)     QNE (Egress)
user  |                  |                 |                  |
data  |    user data     |                 |                  |
----->|----------------->|    user data    | user data        |
      |                  |---------------->S(# marked bytes)  |
      |                  |                 S----------------->|
      |                  |                 S(# unmarked bytes)|
      |                  |                 S----------------->|Term.
      |                  |     NOTIFY      |                  |flow?
      |<-----------------|-----------------|------------------|YES
      |RESERVE(RMD-QSpec:Tear=1,M=1,S=SET) |                  |
      | ---------------->|RESERVE(RMD-QSpec:Tear=1,M=1,S=SET) |
      |                  |                 |                  |
      |                  |---------------->|                  |
      |                  |    RESERVE(RMD-QSpec:Tear=1,M=1,S=SET)
      |                  |                 |----------------->|

Figure 14: RMD severe congestion handling

Note that the above functionality applies to the RMD reservation-based method, see Section 4.3.3, and to both measurement-based admission control methods (i.e., congestion notification based on probing and the NSIS measurement-based admission control), see Section 4.3.2.

In the case that the QNE edges support aggregated intra-domain QoS-NSLP operational states, the following actions take place. The QNE Ingress may receive an end-to-end NOTIFY message with a "PDR_Congestion_Report" container that carries a set <S> parameter and a bandwidth value in the <PDR Bandwidth> parameter. Furthermore, the same end-to-end NOTIFY message carries an INFO_SPEC object with the "Congestion situation" error code. When the QNE Ingress node receives this end-to-end NOTIFY message, it associates the NOTIFY message with the aggregated intra-domain QoS-NSLP operational state via the BOUND_SESSION_ID information included in the end-to-end per-flow QoS-NSLP operational state, see Section 4.3.1.
Using the bandwidth value included in the <PDR Bandwidth> parameter, the RMD-QOSM at the QNE Ingress node MUST reduce the bandwidth associated with and reserved by the RMD aggregated session. This is accomplished by triggering the RMD modification for aggregated reservations procedure described in Section 4.6.1.4.
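The Ingress-side bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the `AggregateState` class, the dictionary-based lookup tables, and the clamping at zero are hypothetical and are not defined by RMD-QOSM; the actual reduction is carried out by the modification procedure of Section 4.6.1.4.

```python
from dataclasses import dataclass, field


@dataclass
class AggregateState:
    """Hypothetical aggregated intra-domain reservation state at the QNE Ingress."""
    session_id: str
    reserved_bw: float  # bandwidth currently reserved by the aggregate
    bound_e2e_sessions: set = field(default_factory=set)


def handle_congestion_report(aggregates, e2e_bindings, e2e_session_id, pdr_bandwidth):
    """Reduce the aggregate that is bound to the notified end-to-end session.

    aggregates:   aggregate SESSION_ID -> AggregateState
    e2e_bindings: end-to-end SESSION_ID -> BOUND_SESSION_ID (aggregate SESSION_ID)
    Returns the new reserved bandwidth of the aggregate.
    """
    # The BOUND_SESSION_ID stored with the end-to-end state identifies the aggregate.
    aggregate = aggregates[e2e_bindings[e2e_session_id]]
    # Assumption: never go below zero if the report exceeds what is reserved.
    aggregate.reserved_bw = max(0.0, aggregate.reserved_bw - pdr_bandwidth)
    return aggregate.reserved_bw
```

In this sketch the value passed as `pdr_bandwidth` stands for the <PDR Bandwidth> parameter of the "PDR_Congestion_Report" container.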
In addition to the above, the QNE Ingress MUST select a number of inter-domain (end-to-end) flows (sessions) that must be terminated. The terminated end-to-end sessions are selected from the end-to-end sessions bound to the aggregated intra-domain QoS-NSLP operational state. Note that the end-to-end session associated with the received end-to-end NOTIFY message that notified the severe congestion must also be selected for termination. The end-to-end sessions to be terminated are selected such that the sum of their associated and reserved bandwidth values equals the bandwidth value carried by the <PDR Bandwidth> parameter within the "PDR_Congestion_Report" container.

For the flows (sessions) that have to be terminated, the QNE Ingress node generates and sends an end-to-end NOTIFY message upstream towards the sender (QNI). The values carried by this message are:

* the values of the <INFO_SPEC> object are set by the standard QoS-NSLP protocol functions.

* the INFO_SPEC object MUST include information that notifies that the end-to-end flow MUST be terminated. This information is as follows:

     Error Severity Class: Informational
     Error Code value: Congestion situation

4.6.1.7 Admission control using congestion notification based on probing

The congestion notification function based on probing can be used to implement a simple measurement-based admission control within a Diffserv domain. At interior nodes along the data path, congestion notification thresholds are set in the measurement-based admission control function for the traffic belonging to different PHBs. These interior nodes are not NSIS aware nodes.

4.6.1.7.1 Operation in Ingress nodes

When an end-to-end reservation request (RESERVE) arrives at the Ingress node (QNE), see Figure 15, it is processed based on the procedures defined by the end-to-end QoS model. The "N", "R" and "Q" flags are set in the same way as described in Section 4.6.1.1.1.
The DSCP field of the GIST datagram message that is used to transport this probe RESERVE message SHOULD be marked with the same DSCP value as the data path packets associated with the same session. In this way it is ensured that the end-to-end RESERVE (probe) packet passes through the node that is congested. This feature is very useful when ECMP-based routing is used, because only the flows that pass through the congested router are detected.

When the (end-to-end) RESPONSE message is received by the Ingress node, it will be processed based on the procedures defined by the end-to-end QoS model.

4.6.1.7.2 Operation in Interior nodes

These Interior nodes do not need to be NSIS aware nodes and they do not need to process the NSIS functionality of NSIS messages. Note that the "not NSIS aware" nodes must be configured such that they can detect the congestion/severe congestion situations and remark packets in the same way as the "NSIS aware" nodes do.

Using standard functionalities, congestion notification thresholds are set for the traffic belonging to different PHBs, see Section 4.3.2. The end-to-end RESERVE message, see Figure 15, is used as a probe packet.

The DSCP field of all data packets and of the GIST message carrying the RESERVE message will be re-marked when the corresponding "congestion notification" threshold is exceeded, see Section 4.3.2. Note that when the data rate is higher than the congestion notification threshold, the data packets are also re-marked. An example of the detailed operation of this procedure is given in Appendix A.2.1.

4.6.1.7.3 Operation in Egress nodes

As emphasised in Section 4.6.1.6.2.2, the egress node, by using the per-flow end-to-end QoS-NSLP states, can derive which flows are using the same PHB and are sent by the same ingress.

For each ingress, the egress SHOULD generate an ingress/egress pair aggregated (RMF) reservation state for each supported PHB.
Note that this aggregated reservation state does not require an aggregated intra-domain QoS-NSLP operational state as well. Appendix A.2.2 gives an example of how and when a (probe) RESERVE message that arrives at the egress is admitted or rejected.
If the request is rejected, then the Egress node SHOULD generate an (end-to-end) RESPONSE message to notify that the reservation is unsuccessful. In particular, it will generate an INFO_SPEC object of:

     Error Severity Class: Transient Failure
     Error Code value: Reservation failure

The QSpec that was carried by the end-to-end RESERVE belonging to the same session as this end-to-end RESPONSE is included in this message. The parameters included in the QSpec <QoS Reserved> object are copied from the original <QoS Desired> values. The "E" flag associated with the <QoS Reserved> object and the "E" flag associated with the <Bandwidth> parameter are also set. This RESPONSE message will be sent to the Ingress node and it will be processed based on the end-to-end QoS model.

Note that the QNE Egress SHOULD restore the original DSCP values of the remarked packets, otherwise multiple actions for the same event might occur. However, this value MAY be left in its remarked form if there is an SLA between domains under which a downstream domain handles the remarking problem.

Note that the break "B" flag carried by the end-to-end RESERVE message MUST NOT be set.

QNE (Ingress)        Interior          Interior        QNE (Egress)
                 (not NSIS aware)  (not NSIS aware)
 user  |                  |                 |                  |
 data  |    user data     |                 |                  |
------>|----------------->|    user data    |                  |
       |                  |---------------->|    user data     |
       |                  |                 |----------------->|
 user  |                  |                 |                  |
 data  |    user data     |                 |                  |
------>|----------------->|    user data    |    user data     |
       |                  |---------------->S(# marked bytes)  |
       |                  |                 S----------------->|
       |                  |                 S(# unmarked bytes)|
       |                  |                 S----------------->|
       |                  |                 S                  |
RESERVE|                  |                 S                  |
------>|                  |                 S                  |
       |----------------------------------->S                  |
       |                  |      RESERVE(re-marked DSCP in GIST)
       |                  |                 S----------------->|
       |                  |RESPONSE(unsuccessful INFO-SPEC)    |
       |<------------------------------------------------------|
RESPONSE(unsuccessful INFO-SPEC)            |                  |
<------|                  |                 |                  |

Figure 15: Using RMD congestion notification function for admission control based on probing
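The rejection case at the QNE Egress can be sketched as follows. The sketch assumes the simplest possible decision rule, namely that a probe RESERVE arriving with a re-marked DSCP is rejected; the actual admit/reject decision is the one illustrated in Appendix A.2.2 and may differ. The `InfoSpec` and `Response` classes and the function name are hypothetical helpers, not QoS-NSLP structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InfoSpec:
    severity_class: str
    error_code: str


@dataclass
class Response:
    info_spec: InfoSpec
    qos_reserved: dict          # copied from the <QoS Desired> values
    e_flag_qos_reserved: bool   # "E" flag of the <QoS Reserved> object
    e_flag_bandwidth: bool      # "E" flag of the <Bandwidth> parameter


def probe_response(qos_desired: dict, probe_dscp: int, original_dscp: int) -> Optional[Response]:
    """Return a failure RESPONSE for a re-marked probe, or None if it is admitted."""
    if probe_dscp == original_dscp:
        # Assumption: an unmarked probe means no congestion was seen on the path.
        return None
    return Response(
        info_spec=InfoSpec("Transient Failure", "Reservation failure"),
        qos_reserved=dict(qos_desired),
        e_flag_qos_reserved=True,
        e_flag_bandwidth=True,
    )
```

The returned object only models the fields named in this section; everything else in the RESPONSE is left to the end-to-end QoS model.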
4.6.2 Bi-directional operation

RMD-QOSM assumes that asymmetric routing may be applied in the RMD domain. Combined sender-receiver initiated reservation cannot be efficiently done in the RMD domain because upstream NTLP states are not stored in Interior routers.

Therefore, the bi-directional operation SHOULD be performed by two sender-initiated reservations (sender&sender). We assume that the QNE edge nodes are common for both upstream and downstream directions; therefore, the two reservations/sessions can be bound at the QNE edge nodes. Note that if this is not the case, then the bi-directional procedure could be managed and maintained by nodes located outside the RMD domain, by using other procedures than the ones defined in RMD-QOSM.

This bi-directional sender&sender procedure can then be applied between the QNE edge nodes (QNE Ingress and QNE Egress) of the RMD QoS signaling model. When a security association exists between the QNE Ingress and QNE Egress nodes (see Figure 16), and the QNE Ingress node has the required <Bandwidth> parameters for both directions, i.e., QNE Ingress towards QNE Egress and QNE Egress towards QNE Ingress, then the QNE Ingress MAY include both <Bandwidth> parameters (needed for both directions) into the RMD-QSpec within a RESERVE message. In this way the QNE Egress node is able to use the QoS parameters needed for the "Egress towards Ingress" direction (QoS-2). The QNE Egress is then able to create a RESERVE with the right QoS parameters included in the QSpec, i.e., RESERVE (QoS-2). Both directions of the flows are bound by inserting <BOUND_SESSION_ID> objects at the QNE Ingress and QNE Egress, which will be carried by bound end-to-end RESERVE messages.
            |------ RESERVE (QoS-1, QoS-2)----|
            |                                 V
            |           Interior/stateless QNEs
                          +---+     +---+
                 |------->|QNE|-----|QNE|------
                 |        +---+     +---+     |
                 |                            V
               +---+                        +---+
               |QNE|                        |QNE|
               +---+                        +---+
                 ^                            |
              |  |        +---+     +---+     V
              |  |--------|QNE|-----|QNE|-----|
              |           +---+     +---+
        Ingress/                          Egress/
        stateful QNE                      stateful QNE
              |
        <--------- RESERVE (QoS-2) -------|

Figure 16: The bi-directional reservation scenario in the RMD domain
Note that it is recommended that the QNE implementations of RMD-QOSM process the QoS-NSLP signaling messages with a higher priority than data packets. This can be accomplished as described in Section 3.3.4 in [QoS-NSLP].

A bidirectional reservation, within the RMD domain, is indicated by the PHR <B> and PDR <B> flags, which are set in all messages. In this case two BOUND_SESSION_ID objects SHOULD be used.

When the QNE edges maintain per-flow intra-domain QoS-NSLP operational states, then the end-to-end RESERVE message carries two BOUND_SESSION_IDs. One BOUND_SESSION_ID carries the SESSION_ID of the tunneled intra-domain (per-flow) session that is using a BINDING_CODE with value set to code (Tunneled and end-to-end sessions). Another BOUND_SESSION_ID carries the SESSION_ID of the bound bidirectional end-to-end session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Bi-directional sessions).

When the QNE edges maintain aggregated intra-domain QoS-NSLP operational states, then the end-to-end RESERVE message carries two BOUND_SESSION_IDs. One BOUND_SESSION_ID carries the SESSION_ID of the tunneled aggregated intra-domain session that is using a BINDING_CODE with value set to code (Aggregated sessions). Another BOUND_SESSION_ID carries the SESSION_ID of the bound bidirectional end-to-end session. The BINDING_CODE associated with this BOUND_SESSION_ID is set to code (Bi-directional sessions).

The intra-domain and end-to-end QoS-NSLP operational states are initiated/modified depending on the binding type, see Sections 4.3.1, 4.3.2, and 4.3.3.

If no security association exists between the QNE Ingress and QNE Egress nodes, the bi-directional reservation for the sender&sender scenario in the RMD domain SHOULD use the scenario specified in [QoS-NSLP] as "Bi-directional reservation for sender&sender scenario".
This is because in this scenario the RESERVE message sent from QNE Ingress to QNE Egress does not have to carry the QoS parameters needed for the "Egress towards Ingress" direction (QoS-2).

In the following sections it is considered that the QNE edge nodes are common for both upstream and downstream directions and, therefore, the two reservations/sessions can be bound at the QNE edge nodes. Furthermore, it is considered that a security association exists between the QNE Ingress and QNE Egress nodes, and the QNE Ingress node has the required <Bandwidth> parameters for both directions, i.e., QNE Ingress towards QNE Egress and QNE Egress towards QNE Ingress.
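The two BOUND_SESSION_ID objects carried by the end-to-end RESERVE message in the bi-directional case can be sketched as follows. The enum members only mirror the binding code names used in the text; no numeric code points are assumed, and the class and function names are hypothetical.

```python
from enum import Enum
from typing import List, NamedTuple


class BindingCode(Enum):
    # Labels taken from the text; numeric values are deliberately not modelled.
    TUNNELED_AND_E2E = "Tunneled and end-to-end sessions"
    AGGREGATED = "Aggregated sessions"
    BIDIRECTIONAL = "Bi-directional sessions"


class BoundSessionId(NamedTuple):
    session_id: str
    binding_code: BindingCode


def bound_session_ids(intra_domain_sid: str, reverse_e2e_sid: str,
                      aggregated: bool) -> List[BoundSessionId]:
    """Build the two BOUND_SESSION_ID objects for a bi-directional reservation."""
    # The first object binds the tunneled intra-domain session (per-flow or aggregated).
    intra_code = BindingCode.AGGREGATED if aggregated else BindingCode.TUNNELED_AND_E2E
    return [
        BoundSessionId(intra_domain_sid, intra_code),
        # The second object binds the end-to-end session of the opposite direction.
        BoundSessionId(reverse_e2e_sid, BindingCode.BIDIRECTIONAL),
    ]
```

The only difference between the per-flow and the aggregated case in this sketch is the binding code of the first object, which matches the two paragraphs on BOUND_SESSION_IDs.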
4.6.2.1 Successful and unsuccessful reservations

This section describes the operation of the RMD-QOSM where an RMD bi-directional reservation operation is either successfully or unsuccessfully accomplished.

The bi-directional successful reservation is similar to a combination of two unidirectional successful reservations that are accomplished in opposite directions, see Figure 17. The main differences between the bi-directional successful reservation procedure and such a combination are described below. Note also that the intra-domain and end-to-end QoS-NSLP operational states generated and maintained by the end-to-end RESERVE messages contain, compared to the unidirectional reservation scenario, a different BOUND_SESSION_ID data structure, see Sections 4.3.1, 4.3.2, and 4.3.3.

In this scenario the intra-domain RESERVE message sent by the QNE Ingress node towards the QNE Egress node is denoted in Figure 17 as RESERVE (RMD-QSpec): "forward". The main differences between the intra-domain RESERVE (RMD-QSpec): "forward" message used for the bi-directional successful reservation procedure and a RESERVE (RMD-QSpec) message used for the unidirectional successful reservation are as follows:

* the RII object MUST NOT be included in the message. This is because no RESPONSE message is required.

* the <B> bit of the PHR container indicates a bi-directional reservation and it MUST be set to "1".

* the PDR container is also included into the RESERVE(RMD-QSpec): "forward" message. The value of the Parameter/Container ID is "PDR_4", i.e., "PDR_Reservation_Request". Note that the response PDR container sent by a QNE Egress to a QNE Ingress node is not carried by an end-to-end RESPONSE message, but it is carried by an intra-domain RESERVE message that is sent by the QNE Egress node towards the QNE Ingress node (denoted in Figure 17 as RESERVE(RMD-QSpec): "reverse").
* the <B> PDR bit indicates a bi-directional reservation and is set to "1".

* the <PDR Bandwidth> field specifies the requested bandwidth that has to be used by the QNE Egress node to initiate another intra-domain RESERVE message in the reverse direction.
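The construction of the "forward" message from the list above can be sketched as follows. The dataclasses and the function name are hypothetical and model only the fields named in the list. Using "PHR_Resource_Request" as the PHR container of a reservation is an assumption taken from the mention of that control information in Section 4.6.2.3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhrContainer:
    container: str
    b_flag: bool              # <B> bit: bi-directional reservation


@dataclass
class PdrContainer:
    container: str
    b_flag: bool              # <B> PDR bit
    pdr_bandwidth: float      # bandwidth requested for the reverse direction


@dataclass
class IntraDomainReserve:
    bandwidth: float          # <Bandwidth> for the direction this message travels
    phr: PhrContainer
    pdr: Optional[PdrContainer]
    rii: Optional[int] = None  # no RII: no RESPONSE message is required


def build_forward_reserve(forward_bw: float, reverse_bw: float) -> IntraDomainReserve:
    """Build the RESERVE(RMD-QSpec): "forward" message of a bi-directional reservation."""
    return IntraDomainReserve(
        bandwidth=forward_bw,
        # Assumption: a reservation uses the "PHR_Resource_Request" container.
        phr=PhrContainer("PHR_Resource_Request", b_flag=True),
        pdr=PdrContainer("PDR_Reservation_Request", b_flag=True, pdr_bandwidth=reverse_bw),
        rii=None,
    )
```

The QNE Egress would read `pdr.pdr_bandwidth` from this message to initiate its own intra-domain RESERVE in the reverse direction.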
The RESERVE(RMD-QSpec): "reverse" message is initiated by the QNE Egress node at the moment that the RESERVE(RMD-QSpec): "forward" message is successfully processed by the QNE Egress node.

QNE (Ingress)    QNE (int.)      QNE (int.)      QNE (int.)     QNE (Egress)
NTLP stateful   NTLP st.less    NTLP st.less    NTLP st.less    NTLP stateful
    |               |               |               |               |
    |               |               |               |               |
    |RESERVE(RMD-QSpec)             |               |               |
    |"forward"      |               |               |               |
    |               |  RESERVE(RMD-QSpec):          |               |
    |-------------->|  "forward"    |               |               |
    |               |------------------------------>|               |
    |               |               |               |-------------->|
    |               |               |               |               |
    |               |               |               |RESERVE(RMD-QSpec)
    |  RESERVE(RMD-QSpec)           |   "reverse"   |<--------------|
    |  "reverse"    |               |<--------------|               |
    |<------------------------------|               |               |

Figure 17: Intra-domain signaling operation for successful bi-directional reservation

The main differences between the RESERVE(RMD-QSpec): "reverse" message used for the bi-directional successful reservation procedure and a RESERVE(RMD-QSpec) message used for the unidirectional successful reservation are as follows:

* the RII object is not included in the message. This is because no RESPONSE message is required;

* the value of the <Bandwidth> parameter is set equal to the value of the <PDR Bandwidth> field included in the RESERVE(RMD-QSpec): "forward" message that triggered the generation of this RESERVE(RMD-QSpec): "reverse" message;

* the <B> bit of the PHR container indicates a bi-directional reservation and is set to "1";

* the PDR container is included into the RESERVE(RMD-QSpec): "reverse" message. The value of the Parameter/Container ID is "PDR_7", i.e., "PDR_Reservation_Report";

* the <B> PDR bit indicates a bi-directional reservation and is set to "1".

Figure 18 and Figure 19 show the flow diagrams used in case of an unsuccessful bi-directional reservation. In Figure 18 it is considered that the QNE that is not able to support the requested <Bandwidth> is located in the direction QNE Ingress towards QNE Egress.
In Figure 19 it is considered that the QNE that is not able to support the requested <Bandwidth> is located in the direction QNE Egress towards QNE Ingress.
The main differences between the bi-directional unsuccessful procedure shown in Figure 18 and the bi-directional successful procedure are as follows:

* the QNE node that is not able to reserve resources for a certain request is located in the "forward" path, i.e., the path from QNE Ingress towards the QNE Egress.

* the QNE node that is not able to support the requested <Bandwidth> MUST mark the <M> bit, i.e., set it to "1", in the RESERVE(RMD-QSpec): "forward" message.

The operation for this type of unsuccessful bi-directional reservation is similar to the operation for unsuccessful uni-directional reservation shown in Figure 9. The main difference is that the QNE Egress generates an intra-domain RESPONSE(PDR) message that is sent towards the QNE Ingress node.

QNE (Ingress)    QNE (int.)      QNE (int.)      QNE (int.)     QNE (Egress)
NTLP stateful   NTLP st.less    NTLP st.less    NTLP st.less    NTLP stateful
    |               |               |               |               |
    |RESERVE(RMD-QSpec):            |               |               |
    |  "forward"    |  RESERVE(RMD-QSpec):          |               |
    |-------------->|  "forward"    |               M RESERVE(RMD-QSpec):
    |               |------------------------------>M "forward-M marked"
    |               |               |               M-------------->|
    |               | RESPONSE(PDR) |               M               |
    |               |          "forward - M marked" M               |
    |<--------------------------------------------------------------|
    |RESERVE(RMD-QSpec)             |               M               |
    |"forward - T tear"             |               M               |
    |-------------->|               |               M               |

Figure 18: Intra-domain signaling operation for unsuccessful bi-directional reservation (rejection on path QNE(Ingress) towards QNE(Egress))

The main differences between the bi-directional unsuccessful procedure shown in Figure 19 and the bi-directional successful procedure are as follows:

* the QNE node that is not able to reserve resources for a certain request is located in the "reverse" path, i.e., the path from QNE Egress towards the QNE Ingress.

* the QNE node that is not able to support the requested <Bandwidth> MUST mark the <M> bit, i.e., set it to "1", in the RESERVE(RMD-QSpec): "reverse" message.
* the QNE Ingress uses the information contained in the received PHR and PDR containers of the RESERVE(RMD-QSpec): "reverse" and generates a tear intra-domain RESERVE(RMD-QSpec): "forward - T tear" message. This message carries a "PHR_Release_Request" and a "PDR_Release_Request" control information. This message is sent to the QNE Egress node. The QNE Egress node uses the information contained in the "PHR_Release_Request" and the "PDR_Release_Request" control info containers to generate a RESERVE(RMD-QSpec): "reverse - T tear" message that is sent towards the QNE Ingress node.

QNE (Ingress)    QNE (int.)      QNE (int.)      QNE (int.)     QNE (Egress)
NTLP stateful   NTLP st.less    NTLP st.less    NTLP st.less    NTLP stateful
    |               |               |               |               |
    |RESERVE(RMD-QSpec)             |               |               |
    |"forward"      |  RESERVE(RMD-QSpec):          |               |
    |-------------->|  "forward"    |          RESERVE(RMD-QSpec):  |
    |               |------------------------------>|  "forward"    |
    |               |  RESERVE(RMD-QSpec):          |-------------->|
    |               |  "reverse"    |               |               |
    |               |               |            RESERVE(RMD-QSpec) |
    |           RESERVE(RMD-QSpec): M   "reverse"   |<--------------|
    |      "reverse - M marked"     M<--------------|               |
    |<------------------------------M               |               |
    |               |               M               |               |
    |RESERVE(RMD-QSpec):            M               |               |
    |"forward - T tear"             M               |               |
    |-------------->|  RESERVE(RMD-QSpec):          |               |
    |               |  "forward - T tear"           |               |
    |               |------------------------------>|               |
    |               |               M               |-------------->|
    |               |               M          RESERVE(RMD-QSpec):  |
    |               |               M          "reverse - T tear"   |
    |               |               M               |<--------------|

Figure 19: Intra-domain signaling normal operation for unsuccessful bi-directional reservation (rejection on path QNE(Egress) towards QNE(Ingress))

4.6.2.2 Refresh reservations

This section describes the operation of the RMD-QOSM where an RMD bi-directional refresh reservation operation is accomplished.
The refresh procedure in the case of the RMD reservation-based method follows a scheme similar to the successful reservation procedure described in Section 4.6.2.1 and depicted in Figure 17. The way in which the refresh process of the reserved resources is maintained is similar to the refresh process used for the intra-domain uni-directional reservations (see Section 4.6.1.3).
Note that the RMD traffic class refresh periods used by the bound bi-directional sessions MUST be equal in all QNE edge and QNE Interior nodes.

The main differences between the RESERVE(RMD-QSpec): "forward" message used for the bi-directional refresh procedure and a RESERVE(RMD-QSpec): "forward" message used for the bi-directional successful reservation procedure are as follows:

* the value of the Parameter/Container ID of the PHR container is "PHR_2", i.e., "PHR_Refresh_Update".

* the value of the Parameter/Container ID of the PDR container is "PDR_5", i.e., "PDR_Refresh_Request".

The main differences between the RESERVE(RMD-QSpec): "reverse" message used for the bi-directional refresh procedure and the RESERVE(RMD-QSpec): "reverse" message used for the bi-directional successful reservation procedure are as follows:

* the value of the Parameter/Container ID of the PHR container is "PHR_2", i.e., "PHR_Refresh_Update".

* the value of the Parameter/Container ID of the PDR container is "PDR_8", i.e., "PDR_Refresh_Report".

4.6.2.3 Modification of aggregated intra-domain QoS-NSLP operational reservation states

This section describes the operation of the RMD-QOSM where an RMD bi-directional modification of aggregated reservation states is accomplished.

In the case when the QNE edges maintain, for the RMD QoS model, QoS-NSLP aggregated reservation states and such an aggregated reservation has to be modified (see Section 4.3.1), procedures similar to those in Section 4.6.1.4 are applied. In particular:

* When the modification request requires an increase of the reserved resources, the QNE Ingress node MUST include the corresponding value into the <Bandwidth> parameter of the "RMD QoS Description" field, which is sent together with a "PHR_Resource_Request" control information. If a QNE edge or QNE Interior node is not able to reserve the number of requested resources, then the "PHR_Resource_Request" control information associated with the <Bandwidth> parameter MUST be marked.
  In this situation the RMD specific operation for unsuccessful reservation will be applied (see Section 4.6.2.1). Note that the value of the <PDR Bandwidth> parameter, which is sent within a "PDR_Reservation_Request" container, represents the increase of the reserved resources in the "reverse" direction.
* When the modification request requires a decrease of the reserved resources, the QNE Ingress node MUST include this value into the <Bandwidth> parameter of the "RMD QoS Description" field. Subsequently an RMD release procedure SHOULD be accomplished (see Section 4.6.2.4). Note that the value of the <PDR Bandwidth> parameter, which is sent within a "PDR_Release_Request" container, represents the decrease of the reserved resources in the "reverse" direction.

4.6.2.4 Release procedure

This section describes the operation of the RMD-QOSM where an RMD bi-directional reservation release operation is accomplished. The message sequence diagram used in this procedure is similar to the one used by the successful reservation procedures, described in Section 4.6.2.1 and depicted in Figure 17. However, the way in which the release of the reservation is accomplished is similar to the RMD release procedure used for the intra-domain uni-directional reservations (see Section 4.6.1.5 and Figure 18 and Figure 19).

The main differences between the RESERVE(RMD-QSpec): "forward" message used for the bi-directional release procedure and a RESERVE(RMD-QSpec): "forward" message used for the bi-directional successful reservation procedure are as follows:

* the value of the Parameter/Container ID of the PHR container is "PHR_3", i.e., "PHR_Release_Request";

* the value of the Parameter/Container ID of the PDR container is "PDR_6", i.e., "PDR_Release_Request";

The main differences between the RESERVE(RMD-QSpec): "reverse" message used for the bi-directional release procedure and the RESERVE(RMD-QSpec): "reverse" message used for the bi-directional successful reservation procedure are as follows:

* the value of the Parameter/Container ID of the PHR container is "PHR_3", i.e., "PHR_Release_Request";

* the PDR container is not included in the RESERVE(RMD-QSpec): "reverse" message.
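The container types named in Sections 4.6.2.1, 4.6.2.2, and 4.6.2.4 can be collected in a small lookup table, sketched below. The table layout and the function name are hypothetical. The PHR entry for the reservation procedure ("PHR_Resource_Request") is an assumption based on its mention in Section 4.6.2.3, since Section 4.6.2.1 does not name the PHR container type explicitly.

```python
from typing import Optional, Tuple

# procedure -> direction -> (PHR container, PDR container or None)
CONTAINERS = {
    "reservation": {
        # Assumption: reservations use the "PHR_Resource_Request" PHR container.
        "forward": ("PHR_Resource_Request", "PDR_Reservation_Request"),
        "reverse": ("PHR_Resource_Request", "PDR_Reservation_Report"),
    },
    "refresh": {
        "forward": ("PHR_Refresh_Update", "PDR_Refresh_Request"),
        "reverse": ("PHR_Refresh_Update", "PDR_Refresh_Report"),
    },
    "release": {
        "forward": ("PHR_Release_Request", "PDR_Release_Request"),
        # The "reverse" release message carries no PDR container.
        "reverse": ("PHR_Release_Request", None),
    },
}


def containers_for(procedure: str, direction: str) -> Tuple[str, Optional[str]]:
    """Return the (PHR, PDR) container names for a bi-directional procedure."""
    return CONTAINERS[procedure][direction]
```

For example, `containers_for("release", "reverse")` returns a PDR entry of `None`, reflecting that the PDR container is omitted from the "reverse" release message.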
4.6.2.5 Severe congestion handling

This section describes the severe congestion handling operation used in combination with bi-directional reservation procedures. This severe congestion handling operation is similar to the one described in Section 4.6.1.6.
4.6.2.5.1 Severe congestion handling by the RMD-QOSM bi-directional refresh procedure

This procedure is similar to the severe congestion handling procedure described in Section 4.6.1.6.1. The difference is related to how the refresh procedure is accomplished, see Section 4.6.2.2, and to how the flows are terminated, see Section 4.6.2.4.

4.6.2.5.2 Severe congestion handling by proportional data packet marking

This section describes the severe congestion handling by proportional data packet marking when this is combined with a bi-directional reservation procedure. Note that the detection and marking/remarking functionality described in this section and used by Interior nodes applies to NSIS aware nodes, but also to NSIS unaware nodes. This means, however, that the "not NSIS aware" Interior nodes must be configured such that they can detect the congestion/severe congestion situations and remark packets in the same way as the Interior "NSIS aware" nodes do.

QNE (Ingress)    QNE (int.)      QNE (int.)      QNE (int.)     QNE (Egress)
NTLP stateful   NTLP st.less    NTLP st.less    NTLP st.less    NTLP stateful
user|               |               |               |               |
data|     user      |               |               |               |
--->|     data      |   user data   |               |   user data   |
    |-------------->|               |               S               |
    |               |------------------------------>S(#marked bytes)|
    |               |               |               S-------------->|
    |               |               |               S(#unmarked bytes)
    |               |               |               S-------------->|Term
    |               |               |               S               |flow?
    |               | NOTIFY (PDR)  |               S               |YES
    |<--------------------------------------------------------------|
    |RESERVE(RMD-QSpec)             |               S               |
    |"forward - T tear"             |               S               |
    |-------------->|               |           RESERVE(RMD-QSpec): |
    |               |------------------------------>S"forward - T tear"
    |               |               |               S-------------->|
    |               |               |           RESERVE(RMD-QSpec): |
    |               |               |           "reverse - T tear"  |
    |  RESERVE(RMD-QSpec):          |               S<--------------|
    |  "reverse - T tear"           |<--------------S               |
    |<------------------------------|               S               |

Figure 20: Intra-domain RMD severe congestion handling for bi-directional reservation (congestion on path QNE(Ingress) towards QNE(Egress))
This procedure is similar to the severe congestion handling procedure described in Section 4.6.1.6.2. The main difference is related to the location of the severe congested node, i.e., on the "forward" or the "reverse" path. Another difference is associated with the way in which the QNE Edge node selects the flows that have to be terminated (QNE Egress in case of per-flow reservation and QNE Ingress in case of aggregated reservation). Note that when a severe congestion situation occurs, e.g., on a forward path, and flows are terminated to solve the severe congestion in the forward path, then the reserved bandwidth associated with the terminated bidirectional flows will also be released. Therefore, a careful selection of the flows that have to be terminated should take place. An example of such a selection is given in Appendix A.3.1.

Furthermore, a special case of this operation is associated with the severe congestion situation occurring simultaneously on the forward and reverse paths. An example of this operation is given in Appendix A.3.2.

 QNE (Ingress)    QNE (int.)      QNE (int.)      QNE (int.)     QNE (Egress)
 NTLP stateful   NTLP st.less    NTLP st.less    NTLP st.less    NTLP stateful
 user|               |               |               |               |
 data|     user      |               |               |               |
 --->|     data      |   user data   |               |   user data   |
     |-------------->|               |               |               |
     |               |------------------------------>|   user data   |user
     |               |               |               |-------------->|data
     |               |               |               |               |--->
     |               |               |     user      |               |<---
     |   user data   |               |     data      |<--------------|
     |(#marked bytes)|               S<--------------|               |
     |<------------------------------S               |               |
     | (#unmarked bytes)             S               |               |
 Term|<------------------------------S               |               |
Flow?|               |               S               |               |
 YES |RESERVE(RMD-QSpec):            S               |               |
     |"forward - T tear"             S               |               |
     |-------------->|  RESERVE(RMD-QSpec):          |               |
     |               |  "forward - T tear"           |               |
     |               |------------------------------>|               |
     |               |               S               |-------------->|
     |               |               S          RESERVE(RMD-QSpec):  |
     |               |               S          "reverse - T tear"   |
     |  RESERVE(RMD-QSpec)           S               |<--------------|
     |  "reverse - T tear"           S<--------------|               |
     |<------------------------------S               |               |

Figure 21: Intra-domain RMD severe congestion handling for bi-directional reservation (congestion on path QNE(Egress) towards QNE(Ingress))
Figure 20 shows the scenario where the severe congested node is located in the "forward" path. This scenario is very similar to the severe congestion handling scenario described in Section 4.6.1.6.2 and shown in Figure 14. The difference is related to the release procedure, which is accomplished in the same way as described in Section 4.6.2.4.

Figure 21 shows the scenario where the severe congested node is located in the "reverse" path. The main difference between this scenario and the scenario shown in Figure 20 is that no end-to-end NOTIFY(PDR) message has to be generated by the QNE Egress node. This is because the (#marked and #unmarked) user data is arriving at the QNE Ingress. The QNE Ingress node will be able to calculate the number of flows that have to be terminated or forwarded in a lower priority queue.

For the flows that have to be terminated, a release procedure, see Section 4.6.2.4, is initiated to release the reserved resources on the "forward" and "reverse" paths.

4.6.2.6 Admission control using congestion notification based on probing

This section describes the admission control scheme that uses the congestion notification function based on probing when bi-directional reservations are supported.

QNE (Ingress)     Interior       QNE (int.)       Interior      QNE (Egress)
NTLP stateful  not NSIS aware  not NSIS aware  not NSIS aware   NTLP stateful
user|               |               |               |               |
data|               |               |               |               |
--->|               |   user data   |               |   user data   |
    |---------------------------------------------->S(#marked bytes)|
    |               |               |               S-------------->|
    |               |               |               S(#unmarked bytes)
    |               |               |               S-------------->|
    |               |               |               S               |
    |               |  RESERVE(re-marked DSCP in GIST):             |
    |               |               |               S               |
    |---------------------------------------------->S               |
    |               |               |               S-------------->|
    |               |               |               S               |
    |               | RESPONSE(unsuccessful INFO-SPEC)              |
    |<--------------------------------------------------------------|
    |               |               |               S               |

Figure 22: Intra-domain RMD congestion notification based on probing for bi-directional admission control (congestion on path from QNE(Ingress) towards QNE(Egress))
This procedure is similar to the congestion notification for admission control procedure described in Section 4.6.1.7. The main difference is related to the location of the severe congested node, i.e., "forward" path (i.e., path from QNE Ingress towards QNE Egress) or "reverse" path (i.e., path from QNE Egress towards QNE Ingress).

Figure 22 shows the scenario where the severe congested node is located in the "forward" path. The functionality of providing admission control is the same as the one described in Section 4.6.1.7, Figure 15.

Figure 23 shows the scenario where the congested node is located in the "reverse" path. The probe RESERVE message sent in the "forward" direction will not be affected by the severe congested node, while the DSCP value in the IP header of any packet of the "reverse" direction flow and also of the GIST message that carries the probe RESERVE message sent in the "reverse" direction will be remarked by the congested node. The QNE Ingress is in this way notified that congestion occurred in the network and therefore it is able to refuse the new initiation of the reservation.

QNE (Ingress)     Interior       QNE (int.)       Interior      QNE (Egress)
NTLP stateful  not NSIS aware   NTLP st.less   not NSIS aware   NTLP stateful
user|               |               |               |               |
data|               |               |               |               |
--->|               |   user data   |               |               |
    |---------------------------------------------->|   user data   |user
    |               |               |               |-------------->|data
    |               |               |               |               |--->
    |               |               |               |               |user
    |               |               |               |               |data
    |               |               |               |               |<---
    |               S               |   user data   |               |
    |               S   user data   |<------------------------------|
    |   user data   S<--------------|               |               |
    |<--------------S               |               |               |
    |   user data   S               |               |               |
    |(#marked bytes)S               |               |               |
    |<--------------S               |               |               |
    |               S RESERVE(unmarked DSCP in GIST):               |
    |               S               |               |               |
    |---------------S---------------------------------------------->|
    |               S RESERVE(re-marked DSCP in GIST)               |
    |               S<----------------------------------------------|
    |<--------------S               |               |               |

Figure 23: Intra-domain RMD congestion notification for bi-directional admission control (congestion on path QNE(Egress) towards QNE(Ingress))
Note that the "not NSIS aware" Interior nodes must be configured such that they can detect the congestion/severe congestion situations and remark packets in the same way as the Interior "NSIS aware" nodes do.

4.7 Handling of additional errors

During the QSpec processing, additional errors may occur. The way in which these additional errors are handled and notified is specified in [QSP-T] and [QoS-NSLP].

5. Security Considerations

A router implementing a QoS signaling protocol can, like a router without QoS signaling, do a lot of harm to a system. If taken over by an adversary, a router can delay, drop, inject, duplicate or modify packets. New protocols, however, introduce additional threats, which are discussed below.

The RMD-QOSM aims to be a very lightweight signaling protocol with regard to the number of signaling message round trips and the amount of state established at the involved signaling nodes, with and without reduced state on QNEs. This implies the use of the Datagram Mode, which does not allow channel security to be used. As such, RMD signaling is targeted towards intra-domain signaling only.

    QNE             QNE             QNE            QNE
  Ingress         Interior        Interior        Egress
NTLP stateful  NTLP stateless  NTLP stateless  NTLP stateful
     |               |               |              |
     | RESERVE (1)   |               |              |
     +--------------------------------------------->|
     | RESERVE' (2)  |               |              |
     +-------------->|               |              |
     |               | RESERVE'      |              |
     |               +-------------->|              |
     |               |               | RESERVE'     |
     |               |               +------------->|
     |               |               |              |
     |               |               | RESPONSE (1) |
     |<---------------------------------------------+
     |               |               |              |

Figure 24: RMD message exchange

In the context of RMD-QOSM signaling, a classification between on-path adversaries and off-path adversaries needs to be made. Furthermore, it might be necessary to differentiate between off-path nodes that never participate in the RMD signaling exchange and nodes
that are only off-path with regard to a specific signaling session, whereby routing asymmetry might even mean that the downstream and the upstream signaling direction matters for this classification.

Note that RMD always uses the message exchange shown in Figure 24, even if there is no end-to-end signaling session. If the RMD-QOSM is triggered by an E2E signaling exchange, then the RESERVE message is created by a node outside the RMD domain and will subsequently travel further on (e.g., to the data receiver). Such an exchange is shown in Figure 3. As such, an evaluation of RMD's security must always be seen as a combination of the two signaling sessions, (1) and (2) of Figure 24.

The following security requirements are set as goals for the intra-domain communication:

* Nodes that are never supposed to participate in the NSIS signaling exchange SHOULD NOT interfere with QNE Interior nodes. Off-path nodes (off-path with regard to the path taken by a particular signaling message exchange) SHOULD NOT be able to interfere with other on-path signaling nodes.

* The actions allowed by a QNE Interior node SHOULD be minimal (i.e., only those specified by the RMD-QOSM). For example, only the QNE Ingress and the QNE Egress nodes are allowed to initiate certain signaling messages. QNE Interior nodes are, for example, allowed to modify certain signaling message payloads.

Note that the term 'interfere' refers to all sorts of security threats, such as denial of service, spoofing, replay, signaling message injection, etc.

If we assume that the RESERVE/RESPONSE is sent with hop-by-hop channel security provided by GIST and protected between the QNE Ingress and the QNE Egress node, then the payloads of these messages MUST be authenticated, integrity protected, replay protected and encrypted.
Encryption is necessary to prevent an adversary located along the path of the RESERVE message from learning information about the session that can later be used to inject a valid RESERVE'.

The following messages need to relate to each other, to make sure that one message does not occur without the other:

a) the RESERVE and the RESERVE' relate to each other at the QNE Egress,

b) the RESPONSE and the RESERVE relate to each other at the QNE Ingress, and

c) the RESERVE' and the RESPONSE' (carried in the RESPONSE) relate to each other.
The RESERVE and the RESERVE' messages are tied together using the BOUND_SESSION_ID(s) maintained by the intra-domain and end-to-end QoS-NSLP operational states maintained at the QNE edges, see Sections 4.3.1, 4.3.2 and 4.3.3. Hence, there cannot be a RESERVE' without a corresponding RESERVE. The SESSION_ID can fulfill this purpose quite well if the aim is to provide protection against off-path adversaries that do not see the SESSION_ID carried in the RESERVE and the RESERVE' messages. If, however, the path changes (due to re-routing or due to mobility), then an adversary could inject RESERVE' messages (with a previously seen SESSION_ID) and could potentially cause harm.

An off-path adversary can, of course, create RESERVE' messages that cause intermediate nodes to create some state (and cause other actions), but the message would finally reach the QNE Egress node. The QNE Egress node would then be able to determine that something is wrong and generate an error message.

The severe congestion handling can be triggered by intermediate nodes (unlike other messages). In many cases, however, intermediate nodes experiencing congestion use refresh messages and modify the <S> and <Overload %> parameters of the message. These messages are still initiated by the QNE Ingress node and carry the SESSION_ID. The QNE Egress node will use the SESSION_ID and subsequently the BOUND_SESSION_ID, maintained by the intra-domain QoS-NSLP operational state, to refer to a flow that might be terminated. The aspect of intermediate nodes initiating messages for severe congestion handling is for further study.
    QNE             QNE             QNE            QNE
  Ingress         Interior        Interior        Egress
NTLP stateful  NTLP stateless  NTLP stateless  NTLP stateful
     |               |               |              |
     | REFRESH       |               |              |
     | RESERVE'      |               |              |
     +-------------->| REFRESH       |              |
     | (+RII)        | RESERVE'      |              |
     |               +-------------->| REFRESH      |
     |               | (+RII)        | RESERVE'     |
     |               |               +------------->|
     |               |               | (+RII)       |
     |               |               |              |
     |               |               |    REFRESH   |
     |               |               |    RESPONSE' |
     |<---------------------------------------------+
     |               |               | (+RII)       |
     |               |               |              |

Figure 25: RMD REFRESH message exchange
During the refresh procedure a RESERVE' creates a RESPONSE', see Figure 25. The RII is carried in the RESERVE' message, and the RESPONSE' message that is generated by the QNE Egress node contains the same RII as the RESERVE'. The RII can be used by the QNE Ingress to match the RESERVE' with the RESPONSE'. The QNE Egress is able to determine whether the RESERVE' (as a refresh) was created by the QNE Ingress node, since the intra-domain session that sent the RESERVE' is bound to an end-to-end session via the BOUND_SESSION_ID value included in the intra-domain QoS-NSLP operational state maintained at the QNE Egress.

A further aspect is the marking of data traffic. Data packets can be modified by an intermediary without any relationship to a signaling session (and a SESSION_ID). A problem appears if an off-path adversary injects spoofed data packets. To do so, the adversary needs to spoof data packets that match the flow identifier of an existing end-to-end reservation that should be terminated. The question therefore arises how an off-path adversary could create a data packet that matches an existing flow identifier (if a 5-tuple is used). This might not be simple for an adversary, unless we assume the previously mentioned mobility/re-routing case where the path through the network changes and the set of nodes that are along a path changes over time.

6. IANA Considerations

RMD-QOSM requires a new IANA registry for RMD QoS Model Identifiers. The identifier is a 32-bit value carried in a QSpec object [QSP-T].

RMD-QOSM defines 2 new objects for the QSpec Template: the PHR container and the PDR container, see 4.1.2 and 4.1.3. For these new containers, new IDs in the QSpec Template Object Type registry should be assigned.

Note to the editor: in this draft, temporary parameter ID values are given to the PHR containers and PDR containers, see 4.1.2 and 4.1.3.
The given PHR container values are in the range from PHR_1 to PHR_3 and the given PDR container values are in the range from PDR_4 to PDR_10. After IANA assigns new parameter ID values, all these temporarily assigned values have to be reassigned.

7. Acknowledgments

The authors express their acknowledgement to the people who have worked on the RMD concept: Z. Turanyi, R. Szabo, G. Pongracz, A. Marquetant, O. Pop, V. Rexhepi, G. Heijenk, D. Partain, M. Jacobsson, S. Oosthoek, P. Wallentin, P. Goering, A. Stienstra, M. de Kogel, M. Zoumaro-Djayoon, M. Swanink, R. Klaver, G. Stokkink, J. W. van Houwelingen, D. Dimitrova.
8. Authors' Addresses

Attila Bader
Ericsson Research
Ericsson Hungary Ltd.
Laborc 1, Budapest, Hungary, H-1037
EMail: Attila.Bader@ericsson.com

Lars Westberg
Ericsson Research
Torshamnsgatan 23
SE-164 80 Stockholm, Sweden
EMail: Lars.Westberg@ericsson.com

Georgios Karagiannis
University of Twente
P.O. BOX 217
7500 AE Enschede, The Netherlands
EMail: g.karagiannis@ewi.utwente.nl

Cornelia Kappler
Siemens AG
Siemensdamm 62
Berlin 13627, Germany
EMail: cornelia.kappler@siemens.com

Hannes Tschofenig
Siemens AG
Otto-Hahn-Ring 6
Munich 81739, Germany
EMail: Hannes.Tschofenig@siemens.com

Tom Phelan
Sonus Networks
250 Apollo Dr.
Chelmsford, MA USA 01824
EMail: tphelan@sonusnet.com

Attila Takacs
Ericsson Research
Ericsson Hungary Ltd.
Laborc 1, Budapest, Hungary, H-1037
EMail: Attila.Takacs@ericsson.com

Andras Csaszar
Ericsson Research
Ericsson Hungary Ltd.
Laborc 1, Budapest, Hungary, H-1037
EMail: Andras.Csaszar@ericsson.com
9. Normative References

[RFC2119] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

[QoS-NSLP] Manner, J., Karagiannis, G., McDonald, A., Van de Bosch, S., "NSLP for Quality-of-Service signaling", draft-ietf-nsis-qos-nslp (work in progress).

[QSP-T] Ash, J., Bader, A., Kappler C., "QoS-NSLP QSpec Template", draft-ietf-nsis-QSpec (work in progress).

10. Informative References

[CsTa05] Csaszar, A., Takacs, A., Szabo, R., Henk, T., "Resilient Reduced-State Resource Reservation", Journal of Communication and Networks, Vol. 7, Nr. 4, December 2005.

[JaSh97] Jamin, S., Shenker, S., Danzig, P., "Comparison of Measurement-based Admission Control Algorithms for Controlled-Load Service", Proceedings IEEE Infocom '97, Kobe, Japan, April 1997.

[GrTs03] Grossglauser, M., Tse, D.N.C, "A Time-Scale Decomposition Approach to Measurement-Based Admission Control", IEEE/ACM Transactions on Networking, Vol. 11, No. 4, August 2003.

[RFC2961] Berger, L., Gan, D., Swallow, G., Pan, P., Tommasi, F. and S. Molendini, "RSVP Refresh Overhead Reduction Extensions", RFC 2961, April 2001.

[RFC3175] Baker, F., Iturralde, C., Le Faucheur, F., Davie, B., "Aggregation of RSVP for IPv4 and IPv6 Reservations", IETF RFC 3175, 2001.

[RFC4125] Le Faucheur & Lai, "Maximum Allocation Bandwidth Constraints Model for Diffserv-aware MPLS Traffic Engineering", RFC 4125, June 2005.
[RFC4127] Le Faucheur et al, "Russian Dolls Bandwidth Constraints Model for Diffserv-aware MPLS Traffic Engineering", RFC 4127, June 2005.

[GIST] Schulzrinne, H., Hancock, R., "GIST: General Internet Messaging Protocol for Signaling", draft-ietf-nsis-ntlp (work in progress).

[RFC1633] Braden R., Clark D., Shenker S., "Integrated Services in the Internet Architecture: an Overview", RFC 1633.

[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.

[RFC2638] Nichols K., Jacobson V., Zhang L., "A Two-bit Differentiated Services Architecture for the Internet", RFC 2638, July 1999.

[RMD1] Westberg, L., et al., "Resource Management in Diffserv (RMD): A Functionality and Performance Behavior Overview", IFIP PFHSN'02.

[RMD2] G. Karagiannis, et al., "RMD - a lightweight application of NSIS", Networks 2004, Vienna, Austria.

[RMD3] Marquetant A., Pop O., Szabo R., Dinnyes G., Turanyi Z., "Novel Enhancements to Load Control - A Soft-State, Lightweight Admission Control Protocol", Proc. of the 2nd Int. Workshop on Quality of Future Internet Services, Coimbra, Portugal, Sept 24-26, 2001, pp. 82-96.

[RMD4] A. Csaszar et al., "Severe congestion handling with resource management in diffserv on demand", Networking 2002.

Appendix A.1.1 Example of a remarking operation during severe congestion in the Interior nodes

Per supported PHB, the interior node can support the operation states depicted in Figure A.1 when the per-flow congestion notification based on probing signaling scheme is used in combination with this severe congestion type. Figure A.2 depicts the same functionality when the per-flow congestion notification based on probing scheme is not used in combination with the severe congestion scheme.
         ---------------------------------------------
        |                  event B                    |
        |                                             V
    ----------           -------------           ----------
   | Normal   |  event A | Congestion  | event B | Severe   |
   | state    |--------->| notification|-------->|congestion|
   |          |          | state       |         | state    |
    ----------           -------------           ----------
     ^      ^                  |                      |
     |      |     event C      |                      |
     |       ------------------                       |
     |                  event D                       |
      ------------------------------------------------

Figure A.1: States of operation, severe congestion combined with congestion notification based on probing

    ----------                -------------
   | Normal   |    event B    | Severe      |
   | state    |-------------->| congestion  |
   |          |               | state       |
    ----------                -------------
        ^                           |
        |         event E           |
         ---------------------------

Figure A.2: States of operation, severe congestion without congestion notification based on probing

The terms used in Figure A.1 and Figure A.2 are:

Normal state: represents the normal operation conditions of the node, i.e., no congestion.

Severe congestion state: represents the state in which the interior node is severely congested with respect to a certain PHB.

Congestion notification state: state where the load is relatively high, close to the level at which congestion can occur.

event A: this event occurs when the incoming PHB rate is higher than the "congestion notification detection" threshold. This threshold is used by the congestion notification based on probing scheme, see Sections 4.6.1.7 and 4.6.2.6.

event B: this event occurs when the incoming PHB rate is higher than the "severe congestion detection" threshold.

event C: this event occurs when the incoming PHB rate is lower than the "congestion notification detection" threshold.

event D: this event occurs when the incoming PHB rate is lower than the "severe congestion restoration" threshold.

event E: this event occurs when the incoming PHB rate is lower than the "severe congestion restoration" threshold.
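The states and events above can be read as a small per-PHB state machine. The following Python sketch is only an illustration of that reading: the names State and next_state are hypothetical, the transitions are taken from Figure A.1 as drawn (event D is assumed to return to the normal state), and the rate is assumed to be compared against thresholds expressed in the same unit.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    NOTIFICATION = "congestion notification"
    SEVERE = "severe congestion"


def next_state(state, phb_rate, notif_detect, severe_detect, severe_restore):
    """Return the next operation state of one PHB for a measured incoming rate.

    Hypothetical helper: thresholds and rate must use the same unit.
    """
    if state is State.NORMAL:
        if phb_rate > severe_detect:       # event B
            return State.SEVERE
        if phb_rate > notif_detect:        # event A
            return State.NOTIFICATION
    elif state is State.NOTIFICATION:
        if phb_rate > severe_detect:       # event B
            return State.SEVERE
        if phb_rate < notif_detect:        # event C
            return State.NORMAL
    elif state is State.SEVERE:
        if phb_rate < severe_restore:      # event D (event E in Figure A.2)
            return State.NORMAL
    return state                           # no event: stay in the same state
```

The two-state machine of Figure A.2 can be approximated with the same function by passing an infinite "congestion notification detection" threshold, so that event A never fires.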
Note that the "severe congestion detection", "severe congestion restoration" and admission thresholds should be higher than the "congestion notification detection" threshold, i.e.:

   "severe congestion detection" > "congestion notification detection"
   and
   "severe congestion restoration" > "congestion notification detection"

Furthermore, the "severe congestion detection" threshold should be higher than or equal to the admission threshold that is used by the reservation based and NSIS measurement based signaling schemes:

   "severe congestion detection" >= admission threshold

Moreover, the "severe congestion restoration" threshold should be lower than or equal to the "severe congestion detection" threshold that is used by the reservation based and NSIS measurement based signaling schemes, i.e.:

   "severe congestion restoration" <= "severe congestion detection"

During severe congestion the interior node calculates, per traffic class (PHB), the incoming rate that is above the "severe congestion restoration" threshold, denoted as signaled_overload_rate, in the following way:

* A severely congested interior node should take into account that packets might be dropped. Therefore, before queuing and eventually dropping packets, the interior node should count the total number of unmarked and remarked bytes received by the severely congested node; denote this number as total_received_bytes. Note that there are situations when more than one interior node on the same path becomes severely congested. Therefore, any interior node located behind a severely congested node may receive marked bytes.

* Before queuing and eventually dropping the packets, at the end of each measurement interval of T seconds, calculate the current estimated overload rate, say measured_overload_rate, by using the following equation:

   measured_overload_rate =
      (total_received_bytes / T) - severe_congestion_restoration
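As an illustration of the threshold relations and of the measured_overload_rate equation, the following Python sketch checks a threshold configuration and evaluates the equation for one interval. The function names are hypothetical, and the sketch assumes that all thresholds are expressed in bytes per second so that they can be compared with total_received_bytes / T.

```python
def check_thresholds(notif_detect, admission, severe_restore, severe_detect):
    """Return the list of violated ordering constraints (empty if consistent)."""
    problems = []
    if not severe_detect > notif_detect:
        problems.append("severe congestion detection <= notification detection")
    if not severe_restore > notif_detect:
        problems.append("severe congestion restoration <= notification detection")
    if not admission > notif_detect:
        problems.append("admission threshold <= notification detection")
    if not severe_detect >= admission:
        problems.append("severe congestion detection < admission threshold")
    if not severe_restore <= severe_detect:
        problems.append("severe congestion restoration > severe congestion detection")
    return problems


def measured_overload_rate(total_received_bytes, interval_t, severe_restore):
    """Rate above the restoration threshold during one interval of T seconds."""
    return total_received_bytes / interval_t - severe_restore
```

For example, 1500 bytes received during a 10 second interval, with a restoration threshold of 100 bytes per second, give a measured overload rate of 50 bytes per second.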
Note that since marking is done in interior nodes, the decisions are made at egress nodes, and the termination of flows is performed by ingress nodes, there is a significant delay until the overload information is learned by the ingress nodes (see Section 6 of [CsTa05]). The delay consists of the trip time of data packets from the severely congested interior node to the egress, the measurement interval, i.e., T, and the trip time of the notification signaling messages from egress to ingress. Moreover, until the overload decreases at the severely congested interior node, an additional trip time from the ingress node to the severely congested interior node must expire. This is because immediately before receiving the congestion notification, the ingress may have sent out packets in the flows that were selected for termination. That is, a terminated flow may continue to contribute to congestion for the time it takes its packets to travel from the ingress to the interior node.

Without considering the above, interior nodes would continue marking the packets until the measured utilization falls below the severe congestion restoration threshold. In this way, in the end more flows would be terminated than necessary, i.e., an over-reaction takes place. [CsTa05] provides a solution to this problem, where the interior nodes use a sliding window memory to keep track of the overload signaled in a couple of previous measurement intervals. At the end of a measurement interval T, before encoding and signaling the overload rate as "encoded DSCP" packets, the actual overload is decreased by the sum of the already signaled overload stored in the sliding window memory, since that overload is already being handled in the severe congestion handling control loop. The sliding window memory consists of an integer number of cells, i.e., n = maximum number of cells. Guidelines for configuring the sliding window parameters are given in [CsTa05].
At the end of each measurement interval, the newest calculated overload is pushed into the memory, and the oldest cell is dropped.

If Mi is the overload_rate stored in the ith memory cell (i = [1..n]), then at the end of every measurement interval, the overload rate that is signaled to the egress node, i.e., signaled_overload_rate, is calculated as follows:

   Sum_Mi = 0
   For i = 1 to n
   {
      Sum_Mi = Sum_Mi + Mi
   }

   signaled_overload_rate = measured_overload_rate - Sum_Mi,

where Sum_Mi is calculated as above.

Next, the sliding memory is updated as follows:

   for i = 1..(n-1): Mi <- Mi+1
   Mn <- signaled_overload_rate

The bytes that have to be remarked to satisfy the signaled overload rate, signaled_remarked_bytes, are calculated as follows:

   signaled_remarked_bytes = signaled_overload_rate*T/N

The signaled_remarked_bytes value also represents the number of outgoing bytes (after the dropping stage) that must be remarked, during each measurement interval T, by a node that operates in severe congestion mode.

Note that in order to process an overload situation higher than 100% of the maintained severe congestion threshold, all the nodes within the domain must be configured with, and maintain, a scaling parameter, e.g., N used in the above equation. In combination with the marked bytes, e.g., signaled_remarked_bytes, this parameter allows such a high overload situation to be calculated and represented.

Note that when incoming remarked bytes are dropped, the operation of the severe congestion algorithm may be affected, e.g., the algorithm may in certain situations become slower. An implementation of the algorithm may try to ensure, as far as possible, that the incoming marked bytes are not dropped. This could for example be accomplished by using different dropping rate thresholds for marked and unmarked bytes.
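The sliding window update described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the memory is assumed to start with all cells at zero, and negative results are clamped to zero, which the text above does not specify.

```python
from collections import deque


class SlidingWindowSignaler:
    """Hypothetical helper tracking overload signaled in the last n intervals."""

    def __init__(self, n_cells, interval_t, scale_n):
        # M1..Mn, oldest cell first; assumed to start at zero.
        self.memory = deque([0.0] * n_cells, maxlen=n_cells)
        self.interval_t = interval_t   # measurement interval T, in seconds
        self.scale_n = scale_n         # domain-wide scaling parameter N

    def end_of_interval(self, measured_overload_rate):
        """Return (signaled_overload_rate, signaled_remarked_bytes)."""
        sum_mi = sum(self.memory)
        # Assumption of this sketch: never signal a negative overload.
        signaled = max(0.0, measured_overload_rate - sum_mi)
        # Appending to a full deque drops the oldest cell (Mi <- Mi+1).
        self.memory.append(signaled)
        remarked_bytes = signaled * self.interval_t / self.scale_n
        return signaled, remarked_bytes
```

With n = 2, T = 1 second and N = 2, a constant measured overload of 50 bytes per second is signaled once (25 bytes remarked), suppressed for the next two intervals while it is still held in the window, and signaled again afterwards if it persists.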
Note that when the "affected DSCP" marking is applied by a severely congested node, all the outgoing packets that are not marked (i.e., by using the "encoded DSCP") have to be remarked using the "affected DSCP" code.

Furthermore, note that when the congestion notification based on probing is used in combination with severe congestion, then in addition to the possible "encoded DSCP" and "affected DSCP", another DSCP for the remarking of the same PHB might be used, see Section 4.6.1.7. This additional DSCP is denoted in this document as "notified DSCP". When an interior node operates in the severe congestion state, see Figure A.2, and receives "notified DSCP" packets, these packets are considered to be unmarked packets (but not "affected DSCP" packets).

Appendix A.1.2 Example of a detailed severe congestion operation in the Egress nodes

The states of operation in Egress nodes are similar to the ones described in A.1.1. The definition of the events, see below, is however different from the definition of the events given in Figure A.1 and Figure A.2:

* event A: the egress node measures the rate of the incoming "notified DSCP" marked packets and compares it with a predefined congestion notification detection threshold at the egress. When the measured rate of "notified DSCP" bytes is higher than this threshold, event_A is activated, see Section 4.6.1.7 and A.2.2. This is applied when the whole RMD domain uses "notified DSCP" for this purpose.

If the "notified DSCP" marking is not used in the whole RMD domain, the "encoded DSCP" marking is used to signal the congestion notification state. In this case the egress should measure the rate of the incoming "encoded DSCP" marked packets and compare it with a predefined congestion notification detection threshold and with a severe congestion detection threshold in the egress.
Note that the detection thresholds used in the egress for congestion notification and severe congestion may be different from the ones used in interior nodes. When the measured rate of "encoded DSCP" bytes is higher than the congestion notification threshold but lower than the severe congestion threshold, event_A is activated.

* event B: this event occurs when the egress receives packets marked as either "encoded DSCP" or "affected DSCP" (when "affected DSCP" is applied in the whole RMD domain). However, when the "encoded DSCP" marking is also used for congestion notification detection purposes, see the description of event_A, then event_B is only activated if either "affected DSCP" packets are received or the rate of the incoming "encoded DSCP" marked packets is higher than the preconfigured severe congestion detection threshold in the egress.

* event C: this event occurs when the rate of incoming "notified DSCP" packets decreases below the congestion notification detection threshold. This is applied when the whole RMD domain uses "notified DSCP" for this purpose. When the "encoded DSCP" marking is also used for congestion notification detection, see the description of event_A, then event_C is activated when the rate of incoming "encoded DSCP" packets decreases below the congestion notification threshold.

* event D: this event occurs when the egress does not receive packets marked as either "encoded DSCP" or "affected DSCP" (when "affected DSCP" is applied in the whole RMD domain). When the "encoded DSCP" marking is also used for congestion notification detection, see the descriptions of event_A, event_B and event_C, then event_D is only activated if either "affected DSCP" packets are no longer received or the rate of the incoming "encoded DSCP" marked packets is lower than the preconfigured severe congestion restoration threshold in the egress.
* event E: this event occurs when the egress does not receive packets marked as either "encoded DSCP" or "affected DSCP" (when "affected DSCP" is applied in the whole RMD domain).

An example of the algorithm for calculating the number of flows associated with each priority class that have to be terminated is explained by the pseudocode below.
First, when the egress operates in the severe congestion state, the total amount of remarked bandwidth associated with the PHB traffic class, say total_congested_bandwidth, is calculated. Note that when the node maintains information about each ingress/egress pair aggregate, the total_congested_bandwidth must be calculated per ingress/egress pair reservation aggregate. This bandwidth represents the severely congested bandwidth that should be terminated. The total_congested_bandwidth can be calculated as follows:

   total_congested_bandwidth = N*input_remarked_bytes/T

where input_remarked_bytes represents the number of marked bytes that arrive at the egress during one measurement interval T, and N is defined as in Section 4.6.1.6.2.1 and A.1.1.

The term denoted as terminated_bandwidth is a temporary variable representing the total bandwidth that has to be terminated, belonging to the same PHB traffic class. The terminate_flow_bandwidth(priority_class) is the total bandwidth associated with flows of priority class equal to priority_class. The parameter priority_class is an integer fulfilling 0 < priority_class <= Maximum_priority.

The calculate_terminate_flows(priority_class) function determines the flows, for a given priority class and per PHB, that have to be terminated. This function also calculates the term sum_bandwidth_terminate(priority_class), which is the sum of the bandwidth associated with the flows that will be terminated. The constraint for finding the total number of flows that have to be terminated is that sum_bandwidth_terminate(priority_class) should be smaller than or approximately equal to the variable terminate_bandwidth(priority_class).
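To make the terms defined above concrete, the following Python sketch shows one possible reading of the selection loop. It is an illustration only: the function names and the flows_by_priority mapping are hypothetical, the greedy choice of flows within a class is an assumption (the text only requires the selected sum not to exceed the remaining bandwidth), and the loop simply stops when no priority classes remain.

```python
def total_congested_bandwidth(scale_n, input_remarked_bytes, interval_t):
    """N * input_remarked_bytes / T, as defined in the text."""
    return scale_n * input_remarked_bytes / interval_t


def select_flows_to_terminate(flows_by_priority, congested_bandwidth):
    """Pick flows class by class until the congested bandwidth is covered.

    flows_by_priority maps a priority class (int) to a list of
    (flow_id, bandwidth) pairs; classes are visited in increasing order.
    """
    terminated_bandwidth = 0.0
    selected = []
    for priority_class in sorted(flows_by_priority):
        if terminated_bandwidth >= congested_bandwidth:
            break
        # terminate_bandwidth(priority_class): what is still to be released.
        budget = congested_bandwidth - terminated_bandwidth
        # sum_bandwidth_terminate(priority_class), built greedily (assumption).
        class_sum = 0.0
        for flow_id, bandwidth in flows_by_priority[priority_class]:
            if class_sum + bandwidth <= budget:
                selected.append(flow_id)
                class_sum += bandwidth
        terminated_bandwidth += class_sum
    return selected, terminated_bandwidth
```

With flows {0: [("a", 30), ("b", 50)], 1: [("c", 40), ("d", 20)]} and a congested bandwidth of 100, this sketch selects "a" and "b" from class 0 and then "d" from class 1, skipping "c" because it would exceed the remaining 20 units.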
   terminated_bandwidth = 0;
   priority_class = 0;
   while terminated_bandwidth < total_congested_bandwidth
   {
      terminate_bandwidth(priority_class) =
         total_congested_bandwidth - terminated_bandwidth
      calculate_terminate_flows(priority_class);
      terminated_bandwidth =
         sum_bandwidth_terminate(priority_class) + terminated_bandwidth;
      priority_class = priority_class + 1;
   }

If the egress node maintains ingress/egress pair reservation aggregates, then the above algorithm is performed for each ingress/egress pair reservation aggregate.

Appendix A.1.3 Example of a detailed severe congestion operation in the Egress nodes, when aggregated intra-domain QoS-NSLP operational states are used

The states of operation in QNE Egress nodes that maintain aggregated intra-domain QoS-NSLP operational states are similar to the operation described in Section A.1.2. The main difference is that the end-to-end flows (sessions) that have to be terminated are not selected by the QNE Egress node but by the QNE Ingress node, see Section 4.6.1.6.2.3. The total_congested_bandwidth and the sum_bandwidth_terminate(priority_class) parameters are calculated as described in Section A.1.2. If the QNE Egress maintains more than one priority class per PHB, then the bandwidth value associated with each priority has to be included in the <PDR Bandwidth> parameter within a "PDR_Congestion_Report" container, see Section 4.6.1.6.2.2.

Appendix A.2.1 Example of a detailed remarking admission control (congestion notification) operation in Interior nodes

In particular, the predefined congestion notification threshold is set according to, and usually less than, an engineered bandwidth limitation, i.e., admission threshold, based on e.g. an agreed Service Level Agreement or a capacity limitation of specific links.
The difference between the congestion notification threshold and the engineered bandwidth limitation, i.e., admission threshold, provides an interval where the signaling information on resource limitation is already sent by a node but the actual resource limitation is not reached. This is due to the fact that data packets associated with an admitted session have not yet arrived, which allows the admission control process available at the egress to interpret the signaling information and reject new calls before congestion is reached.

Note that in the situation when the data rate is higher than the preconfigured congestion notification rate, data packets are also re-marked, see section 4.6.1.6.2.1. To distinguish between congestion notification and severe congestion, two methods may be used (see Appendix 1.1.1):

* Using different DSCP values (re-marked DSCP values). The remarked DSCP that is used for this purpose is denoted as "notified DSCP" in this document. When this method is used and the interior node is in the "congestion notification" state, see A.1.1, the node should remark the unmarked bytes using the "notified DSCP". Note that this method can only be applied if all nodes in the RMD domain use the "notified DSCP" marking.
* Using the "encoded DSCP" marking for congestion notification and severe congestion. This method is applied when the "notified DSCP" marking is not applied in the RMD domain. When this method is used and the interior node is in the "congestion notification" state, see A.1.1, the node should remark the unmarked bytes using the "encoded DSCP".

Note that if a node starts dropping packets belonging to a PHB that supports both the "severe congestion" and "congestion notification" states, see section 4.6.1.6.2.1, then it is considered that the packet rate associated with this PHB is higher than the severe congestion detection threshold and that the operation state of this node has moved to the severe congestion state, see Appendix A.1.1.

Appendix A.2.2 Example of a detailed admission control (congestion notification) operation in Egress nodes

The admission control congestion notification procedure can be applied only if the egress maintains the ingress/egress pair aggregate. When the operation state of the ingress/egress pair aggregate is "congestion notification", see Appendix A.1.2, the implementation of the algorithm depends on how the congestion notification situation is notified to the egress. As mentioned in Section A.2.1, two methods are used:

* Using the "notified DSCP". During a measurement interval T, the egress counts the number of "notified DSCP" marked bytes that belong to the same PHB and are associated with the same ingress/egress pair aggregate, say input_notified_bytes. We denote the rate as incoming_notified_rate.

* Using the "encoded DSCP". In this case, during a measurement interval T, the egress measures the input_notified_bytes by counting the "encoded DSCP" bytes instead of the "notified DSCP" bytes.
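Both counting methods feed the same admission decision at the egress. As a rough illustration of that decision, the following Python sketch combines the rate formula and the marked-probe rule stated in this appendix. The function names and the use of None to mean "no ingress/egress pair aggregate state is available" are assumptions of the sketch, not part of the RMD-QOSM.

```python
def incoming_congestion_rate(scale_n, input_notified_bytes, interval_t):
    """N * input_notified_bytes / T for one ingress/egress pair aggregate."""
    return scale_n * input_notified_bytes / interval_t


def admit_probe(probe_is_marked, aggregate_rate=None, notification_threshold=None):
    """Decide whether an end-to-end RESERVE (probe) is admitted at the egress.

    probe_is_marked: True if the packet carrying the probe arrived with the
    "notified DSCP" or "encoded DSCP" marking.
    aggregate_rate: incoming congestion rate of the aggregate, or None when
    no aggregated state exists (hypothetical convention of this sketch).
    """
    if aggregate_rate is None:
        # Without aggregate state, only the marking of the probe decides.
        return not probe_is_marked
    path_congested = aggregate_rate > notification_threshold
    # Reject only probes that actually passed through the congested node.
    return not (path_congested and probe_is_marked)
```

Checking the marking of the probe itself, and not only the aggregate rate, is what keeps flows admitted when they share an ingress/egress pair with congested traffic but are routed around the congested node.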
The incoming_congestion_rate can then be calculated as follows:

   incoming_congestion_rate = N*input_notified_bytes/T

If the incoming_congestion_rate is higher than a preconfigured congestion notification threshold, then the communication path between ingress and egress is considered to be congested. In this situation, if the end-to-end RESERVE (probe) arrives at the egress, then this request SHOULD be rejected. Note that this happens only when the probe packet is either "notified DSCP" or "encoded DSCP" marked. In this way it is ensured that the end-to-end RESERVE (probe) packet passed through the node that is congested. This feature is very useful when ECMP based routing is used, since it detects only the flows that are passing through the congested router.

If such an ingress/egress pair aggregated state is not available when the (probe) RESERVE message arrives at the egress, then this request is accepted if the DSCP of the packet carrying the RESERVE message is unmarked. Otherwise (if the packet is either "notified DSCP" or "encoded DSCP" marked), it is rejected.

Appendix A.3.1 Example of selecting bi-directional flows for termination during severe congestion

When severe congestion occurs, e.g., on the forward path, and the algorithm terminates flows to solve the severe congestion on the forward path, the reserved bandwidth associated with the terminated bidirectional flows is also released on the reverse path. Therefore, a careful selection of the flows that have to be terminated should take place. A possible method of selecting the flows belonging to the same priority type passing through the severe congestion point on a unidirectional path can be the following.

When the QNE edges maintain per-flow intra-domain QoS-NSLP operational states, then:

* the egress node should select, if possible, unidirectional flows first, instead of bidirectional flows

* the egress node should select, if possible, bidirectional flows that reserved a relatively small amount of resources on the path reversed to the path of congestion.

When the QNE edges maintain aggregated intra-domain QoS-NSLP operational states, then:

* the ingress node should select, if possible, unidirectional flows first, instead of bidirectional flows

* the ingress node should select, if possible, bidirectional flows that reserved a relatively small amount of resources on the path reversed to the path of congestion.

Appendix A.3.2 Example of a severe congestion solution for bi-directional flows congested simultaneously on forward and reverse path

This scenario describes a solution using the combination of the severe congestion solutions described in Section 4.6.2.5.2. It is considered that the severe congestion occurs simultaneously in the forward and reverse directions, which may affect the same bi-directional flows.

When the QNE Edges maintain per-flow intra-domain QoS-NSLP operational states, then the steps can be the following, see Figure A.3. Consider that the egress node selects a number of bi-directional flows to be terminated. In this case the egress will send, for each bi-directional flow, a NOTIFY message to the ingress. If the Ingress receives these NOTIFY messages and its operational state (associated with the reverse path) is in the severe congestion state (see Figures A.1 and A.2), then the ingress operates in the following way:

QNE (Ingress)    NE (int.)        NE (int.)   NE (int.)     QNE (Egress)
NTLP stateful                                              NTLP stateful
data|     user       |                |           |               |
--->|     data       | #unmarked bytes|           |               |
    |--------------->S #marked bytes  |           |               |
    |                S--------------------------->|               |
    |                |                |           |-------------->|data
    |                |                |           |               |--->
    |                |                |           |            Term.?
    |             NOTIFY              |           |               |Yes
    |<------------------------------------------------------------|
    |                |                |           |               |data
    |                |                |   user    |               |<---
    |   user data    |                |   data    |<--------------|
    | (#marked bytes)|                S<----------|               |
    |<--------------------------------S           |               |
    | (#unmarked bytes)               S           |               |
Term|<--------------------------------S           |               |
Flow?                |                S           |               |
YES |RESERVE(RMD-QSpec):              S           |               |
    |"forward - T tear"               S           |               |
    |--------------->|    RESERVE(RMD-QSpec):     |               |
    |                |     "forward - T tear"     |               |
    |                |--------------------------->|               |
    |                |                S           |-------------->|
    |                |                S         RESERVE(RMD-QSpec):
    |                |                S        "reverse - T tear" |
    |       RESERVE(RMD-QSpec)        S           |<--------------|
    |       "reverse - T tear"        S<----------|               |
    |<--------------------------------S           |               |

Figure A.3: Intra-domain RMD severe congestion handling for bi-directional reservation (congestion on both forward and reverse direction)

* For each NOTIFY message, the Ingress should identify the bidirectional flows that have to be terminated.

* The ingress then calculates the total bandwidth that should be released in the reverse direction (thus not in the forward direction) if the bidirectional flows are terminated (preempted), say "notify_reverse_bandwidth". This bandwidth can be calculated as the sum of the bandwidth values associated with all the end-to-end sessions that received a (severe congestion) NOTIFY message.

* Furthermore, using the received marked packets (from the reverse path), the ingress will calculate, using the algorithm used by an egress and described in Appendix A.1.2, the total bandwidth that has to be terminated in order to solve the congestion in the reverse path direction, say "marked_reverse_bandwidth".

* The ingress then calculates the bandwidth of the additional flows that have to be terminated, say "additional_reverse_bandwidth", in order to solve the severe congestion in the reverse direction, by taking into account:

   ** the bandwidth in the reverse direction of the bidirectional flows that were appointed by the egress (the ones that received a NOTIFY message) to be preempted, i.e., "notify_reverse_bandwidth"

   ** the total amount of bandwidth in the reverse direction that has been calculated by using the received marked packets, i.e., "marked_reverse_bandwidth".

  This additional bandwidth can be calculated using the following algorithm:

     IF ("marked_reverse_bandwidth" > "notify_reverse_bandwidth") THEN
        "additional_reverse_bandwidth" =
           "marked_reverse_bandwidth" - "notify_reverse_bandwidth";
     ELSE
        "additional_reverse_bandwidth" = 0

* The ingress terminates the flows that experienced severe congestion in the "forward" path and received a (severe congestion) NOTIFY message.

* If possible, the ingress SHOULD terminate unidirectional flows that are using the same egress-ingress reverse direction communication path, in order to release a total bandwidth of up to "additional_reverse_bandwidth", see Appendix A.3.1.

* If the number of required uni-directional flows (to satisfy the above issue) is not available, then a number of bi-directional flows that are using the same egress-ingress reverse direction communication path MAY be selected for preemption, in order to release a total bandwidth of up to "additional_reverse_bandwidth". Note that, following the guidelines given in Appendix A.3.1, the bidirectional flows that reserved a relatively small amount of resources on the path reversed to the path of congestion should be selected for termination first.

When the QNE Edges maintain aggregated intra-domain QoS-NSLP operational states, then the steps can be the following.

* The egress calculates the bandwidth to be terminated using the same method as described in Appendix A.1.3. The egress includes this bandwidth value in a <PDR Bandwidth> parameter within a "PDR_Congestion_Report" container that is carried by the end-to-end NOTIFY message.

* The Ingress receives the NOTIFY message and reads the <PDR Bandwidth> value included in the "PDR_Congestion_Report" container. Note that this value is denoted as "notify_reverse_bandwidth" in the situation that the QNE edges maintain per-flow intra-domain QoS-NSLP operational states, but is calculated differently. The variables "marked_reverse_bandwidth" and "additional_reverse_bandwidth" are calculated using the same steps as explained for the situation that the QNE edges maintain per-flow intra-domain QoS-NSLP states.

* Regarding the termination of flows that are using the same egress-ingress reverse direction communication path, the Ingress can follow the same procedures as in the situation that the QNE edges maintain per-flow intra-domain QoS-NSLP operational states. Furthermore, the Ingress terminates flows in the forward path using the method described in Section 4.6.1.6.2.3.

The RMD aggregated (reduced state) reservations maintained by the interior nodes can be reduced in the "forward" and "reverse" directions by using the procedure described in Section 4.6.2.3, by including in the <Bandwidth> parameter within the "RMD-QOSM QOS Description" field carried by the "forward" intra-domain RESERVE the value equal to "notify_reverse_bandwidth", and by including the "additional_reverse_bandwidth" value in the <PDR Bandwidth> parameter within the "PDR_Release_Request" container that is carried by the same intra-domain RESERVE message.
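
The per-flow ingress steps of Appendix A.3.2 can be sketched in Python as follows. This is a minimal, non-normative sketch. The Flow record, its field names, and the function names are hypothetical and are not defined by RMD-QOSM. The value "marked_reverse_bandwidth" is assumed to have been derived already from the marked bytes, as described in Appendix A.1.2. The sketch also assumes one possible reading of "up to": a flow is skipped if terminating it would release more than "additional_reverse_bandwidth". An implementation could equally choose to continue until the target is reached.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    """Hypothetical per-flow state kept by the ingress (illustration only)."""
    flow_id: str
    bidirectional: bool
    forward_bw: float       # bandwidth reserved ingress -> egress
    reverse_bw: float       # bandwidth reserved egress -> ingress
    notified: bool = False  # True if a severe congestion NOTIFY was received


def notify_reverse_bandwidth(flows: List[Flow]) -> float:
    # Sum of the reverse-direction bandwidth of all flows that received
    # a (severe congestion) NOTIFY message from the egress.
    return sum(f.reverse_bw for f in flows if f.notified)


def additional_reverse_bandwidth(marked_bw: float, notify_bw: float) -> float:
    # IF marked > notify THEN marked - notify ELSE 0
    return max(0.0, marked_bw - notify_bw)


def select_additional_flows(flows: List[Flow], target_bw: float) -> List[Flow]:
    # Candidates use the reverse path and were not already selected by
    # the egress through a NOTIFY message.
    candidates = [f for f in flows if not f.notified and f.reverse_bw > 0]
    # Unidirectional flows first; among bidirectional flows, prefer those
    # with a small reservation on the forward path (the path reversed to
    # the path of congestion), following Appendix A.3.1.
    candidates.sort(key=lambda f: (f.bidirectional, f.forward_bw))
    selected: List[Flow] = []
    released = 0.0
    for f in candidates:
        # Assumption: never release more than the target ("up to").
        if released + f.reverse_bw <= target_bw:
            selected.append(f)
            released += f.reverse_bw
    return selected


# Example with made-up values (units are arbitrary).
flows = [
    Flow("f1", True, 10.0, 4.0, notified=True),
    Flow("f2", True, 5.0, 6.0, notified=True),
    Flow("u1", False, 0.0, 3.0),
    Flow("b1", True, 8.0, 5.0),
    Flow("b2", True, 2.0, 5.0),
]
notify_bw = notify_reverse_bandwidth(flows)              # 10.0
extra_bw = additional_reverse_bandwidth(18.0, notify_bw)  # 8.0
extra = select_additional_flows(flows, extra_bw)
print([f.flow_id for f in extra])  # ['u1', 'b2']
```

In this example the notified flows already release 10.0 units in the reverse direction, so only the remaining 8.0 units have to be found among the other flows: the unidirectional flow is taken first, followed by the bidirectional flow with the smallest forward reservation.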

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.