Network Shaofu. Peng
Internet-Draft ZTE Corporation
Intended status: Standards Track Peng. Liu
Expires: 14 September 2023 China Mobile
Dong. Yang
Beijing Jiaotong University
13 March 2023
Deadline Based Deterministic Forwarding
draft-peng-detnet-deadline-based-forwarding-05
Abstract
This document describes a deterministic forwarding mechanism based on
deadline. The mechanism enhances strict priority scheduling
algorithm with dynamically adjusting the priority of the queue
according to its deadline attribute.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 14 September 2023.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
Peng, et al. Expires 14 September 2023 [Page 1]
Internet-Draft Deadline Routing March 2023
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4
2. Deadline Queue . . . . . . . . . . . . . . . . . . . . . . . 4
3. Get Deadline Information of Packets . . . . . . . . . . . . . 8
3.1. Get Planned Residence Time . . . . . . . . . . . . . . . 8
3.2. Get Existing Accumulated Planned Residence Time . . . . . 9
3.3. Get Existing Accumulated Actual Residence Time . . . . . 10
3.4. Get Existing Accumulated Residence Time Deviation . . . . 10
4. Put Packets into the Deadline Queues . . . . . . . . . . . . 11
4.1. Alternate Queue Allocation Rules . . . . . . . . . . . . 14
5. Buffer Size Design . . . . . . . . . . . . . . . . . . . . . 14
6. Schedulability Condition for Deadline Mechanism . . . . . . . 15
6.1. Conditions for Leack Bucket Constraint Function . . . . . 19
6.2. Schedulability Condition Analysis for On-time Mode . . . 20
7. Admission Control on the Ingress . . . . . . . . . . . . . . 23
8. Overprovision Analysis . . . . . . . . . . . . . . . . . . . 26
9. Latency Compensation Considerations . . . . . . . . . . . . . 27
10. Compatibility Considerations . . . . . . . . . . . . . . . . 29
11. Deployment Considerations . . . . . . . . . . . . . . . . . . 31
12. Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . 31
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32
14. Security Considerations . . . . . . . . . . . . . . . . . . . 32
15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 32
16. References . . . . . . . . . . . . . . . . . . . . . . . . . 32
16.1. Normative References . . . . . . . . . . . . . . . . . . 32
16.2. Informative References . . . . . . . . . . . . . . . . . 33
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 34
1. Introduction
[RFC8655] describes the architecture of deterministic network and
defines the QoS goals of deterministic forwarding: Minimum and
maximum end-to-end latency from source to destination, timely
delivery, and bounded jitter (packet delay variation); packet loss
ratio under various assumptions as to the operational states of the
nodes and links; an upper bound on out-of-order packet delivery. In
order to achieve these goals, deterministic networks use resource
Peng, et al. Expires 14 September 2023 [Page 2]
Internet-Draft Deadline Routing March 2023
reservation, explicit routing, service protection and other means.
Resource reservation refers to the occupation of resources by service
traffic, exclusive or shared in a certain proportion, such as
dedicated physical link, link bandwidth, queue resources, etc;
Explicit routing means that the transmission path of traffic flow in
the network needs to be selected in advance to ensure the stability
of the route and does not change with the real-time change of network
topology, and based on this, the upper bound of end-to-end delay and
delay jitter can be accurately calculated; Service protection refers
to sending multiple service flows along multiple disjoint paths at
the same time to reduce the packet loss rate. In general, a
deterministic path is a strictly explicit path calculated by a
centralized controller, and resources are reserved on the nodes along
the path to meet the SLA requirements of deterministic services.
[I-D.stein-srtsn] describes that the controller calculates the local
deadline time of each node for the traffic to be transmitted in
advance, which is a absolute system time, forms a stack of these
local deadline time, and then carries them in the forwarded data
packets. Each node forwards the packets according to its own local
deadline. [I-D.stein-srtsn] suggests that FIFO queue can not be used
to realize this function, because the packets stored in the queue are
always first in first out and a special data structure is recommoned.
The packets in this data structure will be automatically sorted with
the order from emergency to non emergency according to the deadline
of the packets. However, it may be difficult to implement this
structure in hardware, and especially for a large network it may be
challenge to synchronize time.
Considering that the link propagation delay is generally a fixed
value, and we focus on the residence time of the packets inside the
node, an alternate approach is to make the deadline eliminate the
interference of link propagation delay and avoids relying on time
synchronization between nodes.
This document desrbies an alternate packets scheduling scheme based
on EDF (earliest-deadline-firs) combined with latency compensation
and used for wide area network. It suggests to only use a single
deadline time to control the packets scheduling of all nodes along
the path. The single deadline time is an offset time, which is based
on the arrival time of the packet and represents the maximum time
allowed for the packet to stay inside the node. However, if each
node has obvious differences in the capability of packets forwarding
and scheduling, more offset-time may be needed.
Peng, et al. Expires 14 September 2023 [Page 3]
Internet-Draft Deadline Routing March 2023
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
2. Deadline Queue
For nodes in the network, some queues with count-down time (also
termed as CT) are introduced and maintained for each outgoing port.
These queues are called deadline queues and participate in priority
based scheduling. Deadline queues have the following
characteristics:
* The CT of each deadline queue will decrease with the passage of
time. When it decreases to 0, the scheduling priority of the
queue will be set to the highest, and the scheduling opportunity
will be obtained immediately (note that there may be interference
delay caused by a packet being sent by a low priority queue). It
will prohibit receiving new packets, in which the buffered packets
will be sent to the outgoing port immediately, and the maximum
duration allowed to send packets is the preset authorization time
(also termed as AT), e.g, 10us, 20us, etc. In principle, all
packets buffered in the queue shall be sent within this
authorization time. If the queue is sent to empty and the
authorization time is still free, other queues with lower priority
can be scheduled during this authorization time.
* The scheduling engine can initiate a cycle timer to decrement the
CT of all deadline queues, that is, whenever the timer expires,
the CT values of all queues will be subtracted from the timer
interval (also termed as TI). Note that TI must be less than or
equal to the AT, with AT = N * TI, where the natural number N >=
1.
* For a deadline queue whose CT has been reduced to 0, after a new
round of authorization time, the CT will return to the maximum
initial value (also termed as MAX_CT), allow receiving new
packets, and continue to enter the next round of operation that
decreases with the passage of time.
Peng, et al. Expires 14 September 2023 [Page 4]
Internet-Draft Deadline Routing March 2023
* For a deadline queue whose CT is not reduced to 0, it can receive
packets. In detailed, when a node receives a packet to be
forwarded from a specific outgoing port, it first obtains the
expected deadline of the packet, and then put the packet to the
deadline queue with the relevant CT value of the outgoing port for
transmission.
* For a deadline queue whose CT is not reduced to 0, its scheduling
priority cannot be set to the highest value. The smaller the CT,
the higher the priority. A transmission mode can be further
configured for a deadline queue to control its transmission
behavior. There are two modes:
- The first mode is to allow participation in priority based
scheduling, also termed as in-time mode;
- The second mode, not allowed, also termed as on-time mode.
That is, a queue with on-time mode is allowed to participate in
priority based scheduling only when its CT becomes 0.
In-time mode is work-conserving and applicable to low latency
services, and on-time mode is not work-conserving and applicable
to low latency variation services. Only one mode can be
configured for each deadline queue. One implementation can
support one set of queues in a single mode, or two sets of queues
support two modes.
* At the beginning, all deadline queues have different CT values,
i.e., staggered from each other, so that the CT of only one
deadline queue will decrease to 0 at any time. It should be noted
that CT is just the countdown of the head of the queue, and other
packets in the queue (especially those at the end of the queue)
have to wait longer to be scheduled. It can be seen that the
scheduling countdown of any specific packet in the queue is in the
range [CT, CT+AT).
The above AT, TI and MAX_CT value shall be choosed according to the
actual capacity of the node. In fact, each node in the network can
independently use different AT for different outgoing ports. The
general principle is that if an outgoing port has a large bandwidth
(such as 100G bps), the AT can be small (such as 1us), because the
link with large bandwidth can send the required bits amount even
within a small time duration; If an outgoing port has a small
bandwidth (e.g. 1G bps), the AT should be larger (e.g. 10us), because
the link with small bandwidth needs to send the required bits amount
within a larger duration. In the FE (Fast Ethernet, 100M bps)
scenario, that however may not be the typical scenario this document
focuses on, the transmission time of a single packet may take several
Peng, et al. Expires 14 September 2023 [Page 5]
Internet-Draft Deadline Routing March 2023
microseconds, then attention must be paid to ensure that the AT is
larger than the transmission time of a single packet, especially the
AT should include the interferrence delay caused by a single low
priority packet with maximum size.
The choose of TI should consider the latency granularity of various
service flows, so that CT updated per TI can match the delay
requirements of different services. For example, if the delay
difference of different traffic flows is several microseconds, TI can
be choosed as 1 us. If the delay difference of different traffic
flows is several 10 microseconds, TI can be choosed as 10 us.
A specific example of the deadline queue is depicted in Figure 1.
Peng, et al. Expires 14 September 2023 [Page 6]
Internet-Draft Deadline Routing March 2023
+------------------------------+ +------------------------------+
| Deadline Queue Group: | | Deadline Queue Group: |
| queue-1(CT=60us) | | queue-1(CT=59us) |
| in-time sub-queue ######| | in-time sub-queue ######|
| on-time sub-queue ######| | on-time sub-queue ######|
| queue-2(CT=50us) | | queue-2(CT=49us) |
| in-time sub-queue ######| | in-time sub-queue ######|
| on-time sub-queue ######| | on-time sub-queue ######|
| queue-3(CT=40us) | | queue-3(CT=39us) |
| in-time sub-queue ######| | in-time sub-queue ######|
| on-time sub-queue ######| | on-time sub-queue ######|
| queue-4(CT=30us) | | queue-4(CT=29us) |
| in-time sub-queue ######| | in-time sub-queue ######|
| on-time sub-queue ######| | on-time sub-queue ######|
| queue-5(CT=20us) | | queue-5(CT=19us) |
| in-time sub-queue ######| | in-time sub-queue ######|
| on-time sub-queue ######| | on-time sub-queue ######|
| queue-6(CT=10us) | | queue-6(CT=9us) |
| in-time sub-queue ######| | in-time sub-queue ######|
| on-time sub-queue ######| | on-time sub-queue ######|
| queue-7(CT=0us) | | queue-7(CT=0us) |
| in-time sub-queue ######| | in-time sub-queue ######|
| on-time sub-queue ######| | on-time sub-queue ######|
+------------------------------+ +------------------------------+
+------------------------------+ +------------------------------+
| Non-deadline Queue Group: | | Non-deadline Queue Group: |
| queue-8 ############ | | queue-8 ############ |
| queue-9 ############ | | queue-9 ############ |
| queue-10 ############ | | queue-10 ############ |
| ... ... | | ... ... |
+------------------------------+ +------------------------------+
-o----------------------------------o-------------------------------->
T0 T0+1us time
Figure 1: Example of Deadline Queue for outgoing Port
In this example, the AT for deadline queue group is configured to
10us. Queue-1 ~ queue-7 are deadline queues, and other queues are
traditional non-deadline queues. Each deadline queue has its CT
attribute. The MAX_CT is 60us. At the initial time (T0), the CT of
all deadline queues are staggered from each other. For example, the
CT of queue-1 is 60us, the CT of queue-2 is 50uS, the CT of queue-3
is 40us, and so on. At this time, only the CT of queue-7 is 0, which
has the highest scheduling priority.
Peng, et al. Expires 14 September 2023 [Page 7]
Internet-Draft Deadline Routing March 2023
Suppose the scheduling engine initiates a cycle timer with a time
interval of 1us, i.e., AT = 10 * TI in this case. After each timer
timeout, the TI will be subtracted from the CT of all deadline
queues. As shown in the figure, at T0 + 1us, the timer timeout, the
CT of queue-1 becomes 59us, the CT of queue-2 becomes 49us, the CT of
queue-3 becomes 39us, etc. At this time, the CT of queue-7,
depending on the implementation, can be maintained as 0 or equal to
-1.
At T0 + 10us, the CT of queue-1 becomes 50us, the CT of queue-2
becomes 40us, the CT of queue-3 becomes 30us, etc. At this time, the
CT of queue-7, returns to MAX_CT and is no longer set to the highest
scheduling priority; The CT of queue-6 becomes 0, which has the
highest scheduling priority.
When the CT of a deadline queue becomes 0, it has a time limit of
10us (i.e., authorization time) to send packets in the queue. During
this period, it will be prohibited to receive new packets (in fact,
there can be no new packets with a deadline of 0). After the 10us
time elapses, the CT of another deadline queue will change to 0.
If the deadline queue with the highest priority is still free after
sending packets within the authorized time, the scheduling engine
will visit other queues with the second highest priority during the
rest of the authorized time.
Note that if both in-time and on-time policies are supported, the
queue-1...7 can be taken regard as logical queues and each logical
queue may include two sub-queues, one used to buffer in-time packets
and the other used to buffer on-time packets.
3. Get Deadline Information of Packets
3.1. Get Planned Residence Time
The planned residence time of the packet is an offset time, which is
based on the arrival time of the packet and represents the maximum
time allowed for the packet to stay inside the node. There are many
ways to obtain the planned residence time of the packet.
Peng, et al. Expires 14 September 2023 [Page 8]
Internet-Draft Deadline Routing March 2023
* Carried in the packet. The ingress PE node, when encapsulating
the deterministic service flow, can explicitly insert the planned
residence time into the packet according to SLA. The transit
node, after receiving the packet, can directly obtain the planned
residence time from the packet. Generally, only a single planned
residence time needs to be carried in the packet, which is
applicable to all nodes along the path; Or insert a stack composed
of multiple deadlines, one for each node.
[I-D.peng-6man-deadline-option] defined a method to carry the
planned residence time in the IPv6 packets.
* Included in the FIB entry. Each node in the network can maintain
the deterministic FIB entry. After the packet hits the
deterministic FIB entry, the planned residence time is obtained
from the forwarding information contained in the FIB entry.
* Included in the policy entry. Configure local policies on each
node in the network, and then set the corresponding planned
residence time according to the matched specific characteristics
of the packet, such as 5-tuple. An implementation should support
the policy to forcibly override the planned residence time
obtained by other methods.
For a deterministic path based on deadline scheduling, the path has
deterministic end-to-end delay requirements. The delay includes two
parts, one is the accumulated residence delay and the other is the
accumulated link propagation delay. The end-to-end delay is
subtracted from the accumulated link propagation delay to obtain the
accumulated residence delay. A simple method is that the accumulated
residence delay is shared equally by each node along the path to
obtain the planning residence time of each node. Note that the link
propagation delay in reality may be not always fixed, e.g, due to the
affection of temperature, we assume that the tool for detecting the
link propagation delay can sense the changes beyond the preset
threshold and trigger the recalculation of the deterministic path.
3.2. Get Existing Accumulated Planned Residence Time
The existing accumulated planned residence time of the packet refers
to the sum of the planned residence time of all upstream nodes before
the packet is transmitted to the current node. This information
needs to be carried in the packet. Every time the packet passes
through a node, the node accumulates its corresponding planned
residence time to the existing accumulated planned residence time
field in the packet. [I-D.peng-6man-deadline-option] defined a
method to carry existing accumulated planned residence time in the
IPv6 packets.
Peng, et al. Expires 14 September 2023 [Page 9]
Internet-Draft Deadline Routing March 2023
The setting of "existing accumulated planned residence time" in the
packet needs to be friendly to the chip for reading and writing. For
example, it should be designed as a fixed position in the packet.
The chip may support flexible configuration for that position.
3.3. Get Existing Accumulated Actual Residence Time
The existing accumulated actual residence time of the packet, refers
to the sum of the actual residence time of all upstream nodes before
the packet is transmitted to the current node. This information
needs to be carried in the packet. Every time the packet passes
through a node, the node accumulates its corresponding actual
residence time to the existing accumulated actual residence time
field in the packet. [I-D.peng-6man-deadline-option] defined a
method to carry existing accumulated actual residence time in the
IPv6 packets.
The setting of "existing accumulated actual residence time" in the
packet needs to be friendly to the chip for reading and writing. For
example, it should be designed as a fixed position in the packet.
The chip may support flexible configuration for that position.
The current node can carry the receiving and sending time of the
packet in the auxiliary data structure (note that is not packet
itself) of the packet, then the actual residence time of the packet
in the node can be calculated according to these two value.
Although other methods can also be, for example, carrying the
absolute system time of receiving and sending in the packet to let
the downstream node compute the actual residence time indirectly,
that has a low encapsulation efficiency.
3.4. Get Existing Accumulated Residence Time Deviation
The existing accumulated residence time deviation equals existing
accumulated planned residence time minus existing accumulated actual
residence time. This value can be zero, positive, or negative.
If the existing accumulated planned residence time and the existing
accumulated actual residence time are carried in the packet, it is
not necessary to carry the existing accumulated residence time
deviation. Otherwise, it is necessary. The advantage of the former
is that it can be applied to more scenarios, while the later has less
packaging overhead.
Peng, et al. Expires 14 September 2023 [Page 10]
Internet-Draft Deadline Routing March 2023
4. Put Packets into the Deadline Queues
[RFC9320] presents a latency model for DetNet nodes. There are six
type of delays that a packet can experience from hop to hop. The
processing delay (type-4), the regulator delay (type-5) , the
queueing subsystem delay (type-6), and the output delay (type-1)
together contribute to the residence time in the node.
In this document, the residence delay in the node is simplified into
two parts: the first part is to lookup the forwarding table when the
packet is received from the incoming port (or generated by the
control plane) and deliver the packet to the line card where the
outgoing port is located; the second part is to store the packet in
the queue of the outgoing port for transmission. These two parts
contribute to the actual residence time of the packet in the node.
The former can be called forwarding delay and the latter can be
called queueing delay. The forwarding delay is related to the chip
implementation and is generally constant; The queueing delay is
unstable.
When a node receives a packet from the upstream node, it can first
get the existing accumulated residence time deviation, and then add
it to the planned residence time of the packet at this node to obtain
the adjustment residence value, and then on the basis of the
adjustment residence value, deducting the forwarding delay of the
packet in the node, the allowable queueing delay value (termd as Q)
is obtained, and then the packet will be put to the deadline queue
with corresponding CT, meeting the condition: CT <= Q < CT+AT.
As mentioned above, if the queue already contains many packets, the
newly inserted packet may need to wait more time to be scheduled.
Therefore, an alternate implemantation may support to select deadline
queue for the packet based on Q-k*AT, where 0 <= k <= 1, i.e.,
meeting the condition: CT <= Q-k*AT < CT+AT. In fact, according to
the following introduction, the deadline scheduling mechanism
supports latency compensation, so the end-to-end latency caused by
these two conditions is the same.
Assume that the current node in a deterministic path is i, all
upstream nodes are from 1 to i-1, and the downstream node is i+1.
Let the planned residence time be D, the actual residence time be R,
the adjustment residence value be M, the forwarding delay intra-node
be F, and the existing accumulated residence time deviation be E,
then the allowable queueing delay (Q) of the packet on the current
node i is calculated as follows:
* E(i-1) = sum(D(1), ..., D(i-1)) - sum(R(1), ..., R(i-1))
Peng, et al. Expires 14 September 2023 [Page 11]
Internet-Draft Deadline Routing March 2023
* M(i) = D(i) + E(i-1)
* Q(i) = M(i) - F(i)
In the case of on-time mode, if each hop strictly controls the
scheduling of the packet according to its planned residence time, the
actual residence time of the packet will be very close to the planned
residence time, thus the absolute value of the existing accumulated
residence time deviation will be very small.
In the case of in-time mode, each node sends packets as soon as
possible. Packets mainly experience a fixed forwarding delay, but
little queueing delay. Then the existing accumulated residence time
deviation (E) may be a very large positive value, resulting in a
large allowable queueing delay (Q). If this value exceeds the MAX_CT
of the deadline queue maintained by the node, Q should be modified to
the MAX_CT, or such a packet can be inserted into an escape deadline
queue with fixed larger CT than MAX_CT, or it can also be inserted
into a higher-level queue in a hierarchical deadline queue.
Under some abnormal circumstances, if some upstream nodes have a very
large actual residence time (R), the existing accumulated residence
time deviation (E) may be a negative number, resulting in the
allowable queueing delay (Q) may be less than or equal to 0. In this
case, the allowable queueing delay (Q) should be modified to a value,
such as AT, so that the packet will enter the next queue that will be
in the sending state.
If the queue choosed by the condition is full, the packet must be
inserted into the next queue with higher CT value.
Figure 2 depicts an example of packets inserted to the deadline
queues.
Peng, et al. Expires 14 September 2023 [Page 12]
Internet-Draft Deadline Routing March 2023
P2 P1 +------------------------------+
+--------+ +--------+ | Deadline Queue Group: |
| D=20us | | D=30us | | queue-1(CT=55us) ###### |
| E=15us | | E=-8us | +--+ | queue-2(CT=45us) ###### |
+--------+ +--------+ |\/| | queue-3(CT=35us) ###### |
------incoming port-1------> |/\| | queue-4(CT=25us) ###### |
|\/| | queue-5(CT=15us) ###### |
P4 P3 |/\| | queue-6(CT=5us) ###### |
+--------+ +--------+ |\/| | queue-7(CT=0us) ###### |
| | | D=30us | |/\| +------------------------------+
+--------+ | E=-30us| |\/|
+--------+ |/\|
------incoming port-2------> |\/| +------------------------------+
|/\| | Non-deadline Queue Group: |
P6 P5 |\/| | queue-8 ############ |
+--------+ +--------+ |/\| | queue-9 ############ |
| | | D=40us | |\/| | queue-10 ############ |
+--------+ | E=40us | |/\| | ... ... |
+--------+ +--+ +------------------------------+
------incoming port-3------> ---------outgoing port---------->
-o----------------------------------o-------------------------------->
receiving-time base +F time
Figure 2: Time Sensitive Packets Buffered to Deadline Queue
As shown in Figure 2, the node successively receives six packets from
three incoming ports, among which packet 1, 2, 3 and 5 have
corresponding deadline information, while packet 4 and 6 are best-
effort packets. These packets need to be forwarded to the same
outgoing port according to the forwarding table entries. It is
assumed that they arrive at the line card where the outgoing port is
located at almost the same time after the forwarding delay in the
node (F = 5us). At this time, the queue status of the outgoing port
is shown in the figure. Then:
* The allowable queueing delay (Q) of packet 1 is 30 - 8 - 5 = 17us,
and it will be put into the deadline queue-5 (its CT is 15us),
meeting the condition that Q is in the range [15, 25).
* The allowable queueing delay (Q) of packet 2 is 20 + 15 - 5 =
30us, and it will be put into the deadline queue-4 (its CT is
25us), meeting the condition that Q is in the range [25, 35).
* The allowable queueing delay (Q) of packet 3 is 30 - 30 - 5 =
-5us, and it will be modified to AT (10 us) and put into the
deadline queue-6 (its CT is 5us), meeting the condition that Q is
in the range [5, 15).
Peng, et al. Expires 14 September 2023 [Page 13]
Internet-Draft Deadline Routing March 2023
* The allowable queueing delay (Q) of packet 5 in the node is 40 +
40 - 5 = 75us, and it will be modified to the MAX_CT (60 us) and
put into the deadline queue-1 (its CT is 55us), meeting the
condition that Q is in the range [55, 65).
* Packets 4 and 6 will be put into the non-deadline queue in the
traditional way.
4.1. Alternate Queue Allocation Rules
One option may further divide a deadline queue into multiple sub-
queues, each for a typical planned residence time (D) such as 10us,
20us, ..., 80us. That is, packets with different planned residence
time are inserted into different sub-queues and protected. In this
way, for two packets with the same Q but different D, we can decide
to firstly schedule the packet with the smallest D. This is
beneficial because when packets with smaller D facing larger
interference delay, it is difficult to have space for latency
compensation on downstream nodes, while packets with larger D have
larger space for latency compensation. However, this option may
cause late arrival in-time traffic with high priority (small D) to
preempt the sending of the early arrival on-time traffic with low
priority (large D).
Another option may further divide a deadline queue into multiple sub-
queues, each for a typical residence time deviation (E) such as -60,
-50us, -40us, ..., 0, PV(positive value). That is, packets with
different E are inserted into different sub-queues and protected. In
this way, for two packets with the same Q but different E, we can
decide to firstly schedule the packet with smallest E.
An implementation can also maintain sub-queues with different sub-
priorities. Packets are mapped to different sub-priorities according
to their planned residence time (D) and accumulated residence time
deviation (E), and placed in corresponding sub-queues . The basic
principle is to schedule packets with smaller E first. If E is
equal, then schedule packets with smaller D.
5. Buffer Size Design
The service rate of the deadline scheduler, termed as C, can reach
the link rate, but generally only needs to be configured as part of
the link bandwidth, such as 50%.
The buffer size of each deadline queue is AT * C - M, where M is the
maximum size of the packet with low priority. If we divide the time
by AT (such as 10 us) and observe the deadline queue with the lowest
priority, such as d_100 (i.e., CT=100 us), then in the first AT, the
Peng, et al. Expires 14 September 2023 [Page 14]
Internet-Draft Deadline Routing March 2023
traffic flow with priority d_100 (traffic arrival follows the
constraint of A_100(t)) will enter that queue. In the second AT, the
traffic flow with priority of d_90 (traffic arrival follows the
constraint of A_90(t)) will enter the same queue (i.e., CT=90 us).
In the third AT, the traffic flow with priority of d_80 (traffic
arrival follows the constraint of A_80(t)) will enter the same queue
(i.e., CT=80 us), and so on, until the queue becomes sent. It can be
seen that the maximum buffer size required for the queue is
sum(A_i(AT)) for all priority i. Since the stability condition of
the deadline scheduler must meet sum(A_i(t)) < C*t, so the buffer
size of each deadline queue can be set to C*AT, and the length of an
interference packet should be deducted.
In particular, when a packet that arrives early is penalized and
placed in a queue with a larger CT, it will not cause the queue to
overflow, because the queue is just where it is located. That is,
assuming that the packet does not arrive early but later on time, it
will not be penalized, and will still enter the same queue where the
CT becomes smaller later.
Similarly, when a late arrival packet is rewarded and placed in a
queue with a smaller CT, it will not cause the queue overflow,
because the queue is just where it is located. That is, assuming
that the packet does not arrive late but arrives on time before, it
will not be rewarded, and will still enter the same queue with a
smaller CT which has not been reduced before.
6. Schedulability Condition for Deadline Mechanism
In this section, we first discuss the schedulability condition of in-
time scheduling mode. Then analyze the condition of the on-time
mode.
Consider that a node is traversed by multiple paths, and each path
may carry one or more service flows. For each service flow, it
follows the corresponding constraint function, such as the leaky
bucket arrival curve A(t) = b + r*t. Generally there are two
mechanisms to enforce that traffic of a service before entering the
deadline scheduler conforms to the given traffic constraint function
. The first such mechanism is a traffic policer at the entrance of
the network which rejects traffic exceeding the constraint, and the
second mechanism is a traffic re-shaper on the intermediate node
which temporarily buffers packets to make it conform to the
constraint.
Suppose that all nodes along each path have reserved corresponding
bandwidth resources according to all service flows carried by that
path, then we can classify all service flows forwarded by a node
Peng, et al. Expires 14 September 2023 [Page 15]
Internet-Draft Deadline Routing March 2023
based on different planned residence times. For example, for the
class of planned residence time d_i, the corresponding accumulated
constraint function is A_i(t). Let d_i < d_(i+1), then the
schedulability condition for deadline in-time mode is:
A_1(t-d_1) + sum{A_i(t+AT-d_i) for all i>=2} <= C*t
where AT is the authorization time of each deadline queue, C is
service rate of the deadline scheduler.
The proof is similar with that in [RPQ], except that the step length
is fine-grained by TI when the queue rotates. Figure 3 below gives a
rough explanation.
^
| Planned Residence Time
| | |
|CT_t+AT+2*TI -> ===== | |
| CT_t+AT+TI -> =====| |
CT_t + - - - - - - - - - - - - >+====+ -\
+AT | | | |
| | | |
| | | > Phycical queue-x
d_p + - - - - - - - - - - - - >| | |
| | | |
CT_t + - - - - - - - - - - - - >+====+ -/
| | |===== <- CT_t-TI
| | | ===== <- CT_t-2*TI
| | |
|
| TI | ... ... | TI | TI | TI | TI | TI | TI |
---+----+-----------+----+----+----+----+----+----+--------->
0 ^ ^ ^ ^ time
| | | |
t-tau' t-ofst t t+tau
(busy period begin) (arrival) (departure)
Figure 3: Deadline In-time Scheduling
Peng, et al. Expires 14 September 2023 [Page 16]
Internet-Draft Deadline Routing March 2023
Suppose that the observed packet, with planned residence time d_p,
arrives at the scheduler at time t and leaves the scheduler at time
t+tau. It will be inserted to physical queue-x with count-down time
CT_t at the current timer interval TI with starting time t-ofst and
end time t-ofst+TI. According to the above packet queueing rules, we
have CT_t <= d_p < CT_t+AT. Also suppose that t-tau' is the
beginning of the busy period closest to t. Then, we can get the
amount of packets within time interval [t-tau', t+tau] that must be
scheduled before the observed packet. In detailed:
* For all service i with planned residence time d_i meeting CT_t <=
d_i < CT_t+AT, the workload is sum{A_i[t-tau', t]}.
Explanation: since the packets with priority d_i in the range
[CT_t, CT_t+AT) at time t will be sent before the observed
packet, the packets with the same priority d_i before time t
will become more urgent at time t, and must also be sent before
the observed packet.
* For all service i with planned residence time d_i meeting d_i >=
CT_t+AT, the workload is sum{A_i[t-tau', t-ofst+TI-(d_i-CT_t-
AT+TI)]}.
Explanation: although the packets with priority d_i larger than
CT_t+AT at time t will be sent after the observed packet, but
the packets with the same priority d_i before time t, at time
t-ofst+TI-(d_i-CT_t-AT+TI), will become more urgent at time t,
and must be sent before the observed packet.
* For all service i with planned residence time d_i meeting d_i <
CT_t, the workload is sum{A_i[t-tau', t+(CT_t-d_i)]}.
Explanation: the packets with priority d_i less than CT_t at
time t will certainly be sent before the observed packet, at a
future time t+(CT_t-d_i) the packets with the same priority d_i
will still be urgent than the observed packet (even the
observed packet also become urgent), and must be sent before
the observed packet.
* Then deduct the traffic that has been sent during the busy period,
i.e., C*(tau+tau').
Let tau as d_p, and remember that CT_t <= d_p, the above workload is
less than
sum{A_i(tau'+CT_t+AT-d_i) for all d_i >= CT_t} +
sum{A_i(tau'+CT_t-d_i) for all d_i < CT_t} - C*(tau'+d_p)
Peng, et al. Expires 14 September 2023 [Page 17]
Internet-Draft Deadline Routing March 2023
It is further less than
sum{A_i(tau'+d_p+AT-d_i) for all d_i >= d_2} + A_1(tau'+d_p-d_1) -
C*(tau'+d_p)
Then, denote x as tau'+d_p, we have
sum{A_i(x+AT-d_i) for all d_i >= d_2} + A_1(x-d_1) - C*(x)
Let the above workload less than zero, then we get the condition.
It should be noted that for a class i, its planned residence time d_i
is actually contributed by its own flows and all the classes with
lower planned residence time. Thus, based on the above
schedulability conditions, and knowing the traffic constraint
functions of all classes, we can then give the guidance to allocate
appropriate (i.e., not arbitrary) planned residence time for each
class. In brief, we can choose the traffic arrival constraint
function according to the preset planned residence time, or we can
choose the planned residence time according to the preset traffic
arrival constraint function.
Deadline in-time mode is a variant of EDF scheduler with work-
conserving characteristic and has several aggregated queues. It
needs to rely on the known traffic arrival curve to obtain the
sufficient and necessary schedulability conditions. In traffic
aggregation case, the arrival traffic faced by the intermidiate nodes
may be irregular due to work-conserving scheduling.
* One option may choose to implement re-shaping per flow or
interleaved regulator (IR) introduced by [UBS] placed before the
deadline scheduler on the intermediate node. It is not
recommended to maintain re-shapers per flow in a large-scale
network. As described in [IR-Theory], re-shaper per flow or
interleaved regulator does not increate worst-case latency bounds.
Peng, et al. Expires 14 September 2023 [Page 18]
Internet-Draft Deadline Routing March 2023
* Another option is that the deadline queue based on latency
compensation has essentially played the role of re-shaper queue.
Assume that an in-time burst A_i released late in the network
quickly reaches the intermediate node, and catches up with the
traffic released early in the network, resulting in the actual
arrival curve of A_i exceeding the constraints. However, the
excess traffic can be identified by the scheduler, which has a
larger residence time deviation and is placed in a queue with a
larger CT value, without causing interference to class i.
Although the excess traffic may be ranked before class j (j>i)
after latency compensation, considering that the essence of
latency compensation is to align to the on-time mode, that is,
even if the excess traffic is sent in the on-time mode, this part
of traffic naturally needs to be sent before class j traffic.
This document recommends not to use an explicit re-shaper before the
deadline scheduler on the intermediate nodes.
The test of schedulability conditions needs to be based on the whole
network view. When we need to add new traffic to the network, we
need to consider which nodes the related path will pass through, and
then check in turn whether these nodes still meet the schedulability
conditions after adding new traffic.
6.1. Conditions for Leack Bucket Constraint Function
Assume that we want to support n types of planned residence delay
levels (d_1, d_2,..., d_n) in the network, and the traffic arrival
constraint function of each delay level d_i is the leaky bucket
arrival curve A_i(t) = b_i + r_i * t. The above condition can be
expressed as:
b_1 <= C*d_1 - M
b_1 + b_2 + r_1*(d_2-d_1) <= C*d_2 - M
b_1 + b_2 + b_3 + r_1*(d_3-d_1) + r_2*(d_3-d_2) <= C*d_3 - M
... ...
sum(b_1+...+b_n) + r_1*(d_n-d_1) + r_2*(d_n-d_2) + ... +
r_n_1*(d_n-d_n_1) <= C*d_n - M
where, C is the service rate of the deadline scheduler, M is the
maximum size of the interference packet.
Peng, et al. Expires 14 September 2023 [Page 19]
Internet-Draft Deadline Routing March 2023
Note that the preset value of b_i does not depend on r_i, but r_i
generally refers to b_i (and burst interval) for setting. For
example, the preset value of r_i may be small, while the value of b_i
may be large. Such parameter design is more suitable for
transmitting traffic with large service burst interval, large service
burst size, but small bandwidth requirements.
An extreme example is that the preset r_i of each level d_i is close
to 0 (this is because the burst interval of the served service is too
large, e.g, one hour or one day), but the preset b_i is close to the
maximum value (e.g, b_1 = C*d_1 - M, note that this also requires
that the depth of the leaky bucket used to regulate the traffic is
large enough), then when the concurrent flow of all delay levels is
scheduled, the time 0~d_1 is all used to send the burst b_1, the time
d_1~d_2 is all used to send the burst b_2, the time d_2~d_3 is all
used to send the burst b_3, and so on. In other words , in this
example, the traffic of different levels is sent in turn in the unit
of AT.
However, a more common example is that the preset r_i of each level
d_i will divide C roughly equally, and the preset b_i is the maximum
packet size (such as 2000 bytes).
The parameters b_i and r_i of each level d_i constitute the delay
resources of that level of the link. A path can reserve required
burst and bandwidth from delay resources of the specific level, and
the reservation is successful only if the two resources are
successfully reserved at the same time. As long as neither b_i nor
r_i is free, the delay resource of level d_i is exhausted.
The delay resource reservation status of each level is independent.
For example, in the case that the parameter b_1 is determined, if the
required burst of the d_1 service is large, then although only a few
d_1 services can be supported, but if r_1 is very small, the network
can still support more services of other levels at the same time.
6.2. Schedulability Condition Analysis for On-time Mode
Compared with the in-time mode, on-time mode is non-work-conserving,
which can be considered as the combination of damper and EDF
scheduler. The intuitive understanding of the on-time mode is that
if the headends of multiple paths (they converge at the same
intermediate node) apply admission control strictly according to the
service bandwidth when releasing traffic to the network, and
successfully reserve the corresponding bandwidth resources for these
paths in the control plane, then the on-time forwarding behavior on a
path controls the interval between early and late release of traffic
on that path, but not lead to an increase in the bandwidth occupied
Peng, et al. Expires 14 September 2023 [Page 20]
Internet-Draft Deadline Routing March 2023
by that path. Therefore, the on-time mode does not cause the arrival
curve to exceed the expected traffic constraint function . So that
the above schedulability condition can also be applied to the on-time
mode.
Suppose that the selected parameters of the constraint function of
each level makes the scheduler need to work at full speed (i.e.,
service rate C), then for the in-time mode, there will be a packet of
a specific level to be sent just before its deadline. While for the
on-time mode, it may make the sending time of the packet really
exceed its deadline, but the extra delay will not exceed one AT. The
following Figure 4 shows the difference between on-time scheduling
and in-time scheduling.
arrival traffic:
A_1: #1 #2 #3 #4 #5 ...
A_2: $1 $2 $3 $4 $5 ...
...
A_n: &1 &2 &3 &4 &5 ...
|
v
In-time Scheduling:
#1$1...&1 #2$2...&2 #3$3...&3 #4$4...&4 #5$5...&5 ...
On-time Scheduling:
#1 #2 #3 #4 #5
$1 $2 $3 $4
... ...
&1
------+-----------+-----------+-----------+--... ...--+-----------+---
\__ d_1 __/
\________ d_2 ________/
\______________ d_3 ______________/
... ...
\_________________________ d_n ___________________________/
Figure 4: Difference between On-time and In-time Scheduling
Peng, et al. Expires 14 September 2023 [Page 21]
Internet-Draft Deadline Routing March 2023
As shown in the figure, each burst of A_1 is termed as #num, each
burst of A_2 as $num, and each burst of A_n as &num. A single burst
may contain multiple packets. For example, burst #1 may contain
several packets, and the actual space between #1 and # 2 may be
small. Although the figure shows the example that the burst interval
of multiple levels of service is the same and the phase is aligned,
the actual situation is far from that. However, this example depicts
the typical scheduling behavior of on-time mode.
In the in-time mode, all concurrent traffic of multiple levels will
be scheduled as soon as possible according to priority. For example,
in the duration d_1, in addition to the burst #1 that must be sent,
the burst $1~&1 may also be sent, but the latter is not necessarily
scheduled to be sent before the burst #2 as shown in the figure.
Note that in-time mode cannot guarantee jitter.
While in the on-time mode, each burst is scheduled at its deadline.
Because of the scheduling delay, the transmission of the burst will
exceed its deadline. The last packet of the burst will face more
delay than the first packet. For example, when burst #2 enters the
deadline queue with related CT, it will be behind burst $1, and the
last packet of burst #2 needs to wait for burst $1 and burst #2 to
send first. According to the buffer size design, the extra delay in
the worst case is the time to wait for all packets in the queue to be
sent first, i.e., an AT.
When a node generates the extra delay that exceeds its deadline, it
will be compensated in the downstream node, that is, the allowable
residence delay of the downstream node will be reduced from the extra
delay. On the downstream node, the extra delay that exceeds its
deadline may be generated again. Note that this extra delay is not
cumulative with the number of hops, it is the deviation between the
actual delay and the planned delay of the entire path.
The above worst-case extra delay can also be understood in this way,
termed the current CT range of the on-time queue is [CT_h, CT_e],
where CT_h is the CT value of the head of the queue, CT_e is the CT
value of the end of the queue, when the observed packet, with Q =
CT_h, is placed at the end of the queue, then it gets the extra delay
in the worst case.
On the contrary, if the observed packet, with Q = CT_e, is placed at
the head of the queue, it will be sent one AT before its deadline.
In short, the deviation between the actual delay and the planned
delay in the on-time mode may be in range [-AT, AT].
Peng, et al. Expires 14 September 2023 [Page 22]
Internet-Draft Deadline Routing March 2023
An implementation may choose loose on time scheduling, so that the
deviation is always less than zero. For the queueing rule CT <=
Q-k*AT < CT+AT, one can config k>=1 on the intermediate nodes for
loose on-time scheduling, while config k<1 on the egress for strict
on-time scheduling. A node needs to identify its role as an
intermediate node or egress node for a specific packet and provide an
appropriate k value. Note that even if all intermediate nodes apply
loose on-time scheduling, as long as the egress node applies strict
on-time scheduling, the jitter target can still be achieved.
7. Admission Control on the Ingress
On the ingress PE node, traffic regulation must be performed on the
incoming port, so that the service traffic does not exceed its
T-SPEC. This kind of regulation is usually the shaping using leaky
bucket combined with the incoming queue that receives service
traffic. A service may generate discrete multiple bursts within its
periodic service burst interval, and the leaky bucket depth should be
larger than the maximum burst size.
The leaky bucket shaping will essentially make all the bursts within
the service burst interval evenly distributed within the service
burst interval, which may be inconsistent with the original arrival
curve of the service flow. Therefore, some bursts within the service
burst interval may face more shaping latency. For example, the head
of the service burst interval contains two discrete bursts with the
same size, but the bandwidth reserved by the service is very small
(i.e., total burst size/burst interval). Assuming that the bucket
depth is the size of a single burst, the shaping latency faced by the
second burst is half of the service burst interval.
Although the shaped curve and the original arrival curve can be as
consistent as possible by increasing the bucket depth, to minimize
the shaping delay of each burst, but this means that the service will
occupy more burst resources, and reduce the service scale that the
network can support according to the schedulability conditions.
Unless, customers are willing to spend more money to buy a larger
burst.
Generally, a path may carry multiple service flows with different
delay levels. For a certain delay level d_i, the path will reserve
some resources from the delay resource pool of the link. The delay
resource pool here, as leack bucket constraint function shown in
Section 6.1, is a set of preset parameters that meet the
schedulability conditions. For example, the level d_1 has a burst
upper limit of b_1 and a bandwidth upper limit of r_1. A path j may
allocate partial resources (b_i_j, and r_i_j) from the resource quota
(b_i, and r_i) of the link's delay level d_i . A service flow k that
Peng, et al. Expires 14 September 2023 [Page 23]
Internet-Draft Deadline Routing March 2023
carried in path j, may use resources (b_i_j_k, and r_i_j_k) according
to its T_SPEC. It can be seen that the values of b_i_j and r_i_j
determine the scale of the number of paths that can be supported,
while the values of b_i_j_k and r_i_j_k determine the scale of the
number of services that can be supported. The following expression
exists.
b_i_j >= sum(b_i_j_k), for all service k over the path j.
r_i_j >= sum(r_i_j_k), for all service k over the path j.
b_i >= sum(b_i_j), for all path j through the specific link.
r_i >= sum(r_i_j), for all path j through the specific link.
The values of b_i_j_k and r_i_j_k will be written in the SLA between
the customer and the network provider, and the network entry node
will set the corresponding bucket depth according to b_i_j_k to
forcibly delay the excess bursts; The entry node also sets the
corresponding bucket rate according to r_i_j_k, so that the new added
token in any measurement interval t does not exceed r_i_j_k * t.
On the entry node, for the burst that faces the shaping delay due to
violation of b_i_j_k, its shaping delay cannot be included in the
delay compensation formula of deadline, otherwise, the delay
compensation will make that burst catch up with the previous burst,
resulting in damage to the shaping result and violation of the
arrival constraint function.
Then, the regulated traffic reaches the deadline scheduler on the
outgoing port. Since the traffic of each level meets the leaky
bucket arrival constraint function and the parameters of the shaping
curve do not exceed the limits of the parameters provided by the
schedulability conditions, the traffic can be successfully scheduled
.
Peng, et al. Expires 14 September 2023 [Page 24]
Internet-Draft Deadline Routing March 2023
Note that the flow sent from the deadine scheduler of the headend to
the next hop still follows the arrival constraint function of the
path after the priority sorting on the next hop through latency
compensation. Then on the next hop, when concurrent flows received
from multiple paths are aggregated to the same outgoing port for
transmission, within any d_1 interval, the aggregated d_1 traffic
will not exceed the burst resources of level d_1 reserved by these
paths on the outgoing port, and each aggregated d_i traffic will not
exceed the bandwidth resources of level d_1 reserved by these paths
on the outgoing port; Similarly, within any d_2 interval, the
aggregated d_2 traffic will not exceed the burst resources of level
d_2 reserved by these paths on the outgoing port, and each aggregated
d_i traffic will not exceed the bandwidth resources of level d_1
reserved by these paths on the outoging port, and so on.
Figure 5 depicts an example of deadline based traffic regulated and
scheduled on the ingress PE node. In the figure, the shaping delay
caused by the previous burst is termed as S#, and forwarding delay
termed as F.
1st burst
|
received v
+-+ +-+ +----+ +-+ +--+ +------+
|1| |2| | 3 | |4| |5 | | 6 | <= burst sequence
+-+ +-+ +----+ +-+ +--+ +------+
| | | | | |
~+0 ~+S1 ~+0 ~+S3~+S4 ~+0
~+F ~+F ~+F ~+F ~+F ~+F
| | | | | |
UNI v v v v v v
ingr-PE -+--------+--------+--------+--------+--------+--------+---->
NNI | Auth | Auth | Auth | Auth | Auth | Auth | time
| time | time | time | time | time | time |
1,2 in 3 in 4 in 5 in 6 in
Queue-A Queue-B Queue-C Queue-D Queue-E
(CT<=Q) (CT<=Q) (CT<=Q) (CT<=Q) (CT<=Q)
| | | | |
~+Q ~+Q ~+Q ~+Q ~+Q
| | | | |
sending v v v v v
+-+ +-+ +----+ +-+ +--+ +------+
|1| |2| | 3 | |4| |5 | | 6 |
+-+ +-+ +----+ +-+ +--+ +------+
Peng, et al. Expires 14 September 2023 [Page 25]
Internet-Draft Deadline Routing March 2023
Figure 5: Deadline Based Packets Orchestrating
There are 6 bursts received from the customer. The burst-2, 4, 5 has
regulation delay S1, S3, S4 that caused by previous burst
respectively. While burst-1, 3, 6 has zero regulation delay because
the number of tokens is sufficient. The regulation makes 6 bursts
roughly distributed within the service burst interval.
Suppose that each burst passes through the same intra-node forwarding
delay F, and when they arrive at the deadline scheduler in turn, they
will have the same allowable queueing delay (Q), regardless of
whether they have experienced shaping delay before. When the packets
of burst-1, 2 arrive at the scheduler, according to CT <= Q-k*AT <
CT+AT, they will be placed in Queue-A with matched CT and waiting to
be sent. Similarly, when the packets of burst-3/4/5/6 arrive at the
scheduler, they will be placed in Queue-B/C/D/E respectively and
waiting to be sent.
8. Overprovision Analysis
For each level d_i, the latency resource quota of the specific link
is (b_i, r_i). A path j may allocate partial resources (b_i_j,
r_i_j) from the resource quota (b_i, r_i). In order to support more
d_i services in the network, it is necessary to set larger b_i and
r_i. However, as mentioned earlier, the values of b_i and r_i are
set according to schedulability conditions and cannot be modified at
will. Therefore, the meaningful analysis is the service scale that
the network can support under the premise that b_i and r_i are
determined values.
For bandwidth resource reservation case, the upper limit of the total
bandwidth that can be reserved for all aggregated services of level
d_i is r_i, which is the same as the behavior of traditional
bandwidth resource reservation. There is no special requirement for
the measurement interval of calculating bandwidth value.
For the burst resource reservation case, the upper limit of the total
burst that can be reserved for all aggregated services of level d_i
is b_i. If the burst of each service of level d_i is b_k, then the
number service can be supported is b_i/b_k, which is the worst case
considering the concurrent arrival of these service flows . However,
the burst resource reservation is independent of bandwidth resource,
i.e., it does not take the calculation result of b_k/d_i (or like
b_k/AT, see the example in Section 6.1) to get an overprovision
bandwidth and then to affect the reservable bandwidth resources.
Peng, et al. Expires 14 September 2023 [Page 26]
Internet-Draft Deadline Routing March 2023
9. Latency Compensation Considerations
As mentioned above, the calculation of allowable queueing delay (Q)
reflects the meaning of latency compensation. It can be used for in-
time services to distinguish between emergency flows and non
emergency flows (compared with traditional strict priority
scheduling), and also for on-time services to strictly control delay
jitter. In both cases, packets have a tolerable maximum queueing
delay in the queueing subsystem. The traditional EDF scheduling can
also benefit from this latency compensation.
Suppose that two packets, P1, P2, are generated instantaneously from
a specific in-time service flow at the same source, and the two
packets have the same planned residence time. P1 may face less
interference delay than P2 in their journey. When they arrive at an
intermediate node in turn, P2 will have less allowable queueing delay
(Q) than P1 to try to stay close to P1 again. It should be noted
that to compary who is ealier is based on queue's CT and packet's Q,
according to the above queueing rule (CT <= Q < CT+AT), and the CT of
the queue is not changed in real-time, but gradually with the
decreasing step TI. It is possible to get an unexpected comparision
result.
As shown in Figure 6, P1 and P2 are two packets belonging to the same
burst. The arrival time when they are received on the scheduler is
shown in the figure. Suppose that the AT of the deadline queue is
10us, the decreasing step TI is 1us, and the transmission time of
each packet is 0.01us. Also suppose that the Q values of two
adjacent packets P1 and P2 are 40us and 39us respectively, and they
are both received in the window from T0 to T0+1us. P1 will enter
queue-B with CT range [40, 50), while P2 will enter queue-A with CT
range [30, 40). This means that P2 will be scheduled before P1,
resulting in disorder.
Peng, et al. Expires 14 September 2023 [Page 27]
Internet-Draft Deadline Routing March 2023
packets arrived later
packets arrived earlier |
| |
V V
--------+-----------------------------------------------+---------
... ... | .P1.................P2....................... | ... ...
--------+-----------------------------------------------+---------
P1.Q=40us
P2.Q=39us
| | |
--------o---------------------o---------------------o----------->
T0 T0+1us T0+2us time
queue-A.CT[30,40) queue-A.CT[29,39)
queue-B.CT[40,50) queue-B.CT[39,49)
queue-C.CT[50,60) queue-C.CT[49,59)
Figure 6: Packets queueing based on Latency Compensation
DetNet architecture [RFC8655] provides Packet Ordering Function
(POF), that can be used to solve the above disorder problem caused by
the latency compensation.
Alternatively, if the POF is not enabled, we can also maintain states
for service flows to record the last queueing information to address
this issue.
For example, one ore more OGOs (order guarantee object) are
maintained per flow (or per incoming port, planned residence time,
incoming port plus planned residence time, etc), on each outgoing
port. An OGO records the queueing information which is the queue
that all the packets mapped to this OGO was inserted recently. For
simplicity, a count-down time (CT), which is copied from the recent
inserted deadline queue, can be recorded in OGO. Note that the CT of
OGO needs to decrease synchronously with that of other deadline
queues, with the same decreasing step TI. If the CT of OGO decreases
to 0, it will remain at 0.
When a packet arrives at the deadline scheduler at the outgoing port
, it is first mapped to its OGO, and get the CT of OGO, termed as
OGO.CT. Then, according to the above queueing rule (CT <= Q-k*AT <
CT+AT), get the CT of a preliminarily selected queue, termed as
preliminary CT.
Let Q' is MAX{OGO.CT, preliminary CT}, and put the packet in the
target queue according to CT <= Q' < CT+AT
Update the value of OGO.CT to the CT of target queue.
Peng, et al. Expires 14 September 2023 [Page 28]
Internet-Draft Deadline Routing March 2023
The intuitive meaning of the above operation is that for two packets
belonging to the same OGO (e.g, per flow), the packet received later
should not be queued before the packet received first. This also
implies that the main purpose of latency compensation is to
compensate the interference delay between pakckets of different
flows, rather than compensating the interference delay between
packets of the same flow.
10. Compatibility Considerations
For a particular path, if only some nodes in the path upgrade support
the deadline mechanism defined in this document, the end-to-end
deterministic delay/jitter target will only be partially achieved.
Those legacy devices may adopt the existing priority based scheduling
mechanism, and ignore the possible deadline information in the
packet, thus the intra node delay produced by them cannot be
perceived by the adjacent upgraded node. The more upgraded nodes
included in the path, the closer to the delay/jitter target.
Although, the legacy devices may not support the dataplane mechanism
described in this document, but they can be freely programmed (such
as P4 language) to measure and insert the deadline information into
packets, in this case the achievement of delay/jitter target will be
more perfect.
Only a few key nodes are upgraded to support deadline mechanism,
which is low-cost, but can meet a service with relatively loose time
requirements. Figure 7 shows an example of upgrading only several
network border nodes. In the figure, only R1, R2, R3 and R4 are
upgraded to support deadline mechanism. A deterministic path across
domain 1, 2, and 3 is established, which contains nodes R1, R2, R3,
and R4, as well as explicit nodes in each domain. Domain 1, 2 and 3
use the traditional strict priority based forwarding mechanism. The
encoding of the packet sent by R1 includes the planned residence time
and the accumulated residence time deviation. Especially, DS filed
in IP header ([RFC2474]) are also set to appropriate values. The
basic principle of setting is that the less the planned residence
time, the higher the priority. In order to avoid the interference of
non deterministic flow to deterministic flow, the priority of
deterministic flow should be set as high as possible.
Peng, et al. Expires 14 September 2023 [Page 29]
Internet-Draft Deadline Routing March 2023
The delay analysis based on strict priority in each domain can be
found in [SP-LATENCY], which gives the formula to evaluate the worst-
case delay of each hop during the resource reservation procedure.
The worst-case delay per hop depends on the number of hops and the
burst size of interference flows that may be faced on each hop.
[EF-FIFO] also shows that, for FIFO packet scheduling be used to
support the EF (expedited forwarding) per-hop behavior (PHB), if the
network utilization level alpha < l/(H-l), the worst-case delay bound
is inversely proportional to 1-alpha*(H-1), where H is the number of
hops in the longest path of the network.
Although the deadline in-time scheduling, the SP scheduling and EF
FIFO scheduling are all work-conserving, the deadline in-time
scheduling can further distinguish between emergency and non
emergency according to deadline information other than traffic class
. Therefore, when analyzing the latency of the in-time mode, the
latency is not evaluated just according to the order in which the
packets arrive at the scheduler, but also according to the deadline
of the packets. An intuitive phenomenon is that if a packet
unfortunately faces more interference delays at the upstream nodes,
it will become more urgent at the downstream node, and will not
always be unfortunate. This operation of dynamically modifying the
key fields, e.g, the existing acumulated residence time deviation
(E), of the packet can avoid always overestimating worst-case latency
on all hops. According to schedulability condition, the worst-case
latancy per hop for in-time mode is d_i, and for on-time mode is d_i
+ AT (note that AT does not accumulate with the number of hops).
When the boundary node (e.g, R2) receives the deterministic traffic,
it will decide whether to speed up or hold according to the existing
accumulated residence time deviation information carried in the
packet. The in-time traffic is always sent as soon as possible,
especially when the residence time deviation of the packet is
negative. The on-time traffic always controls the sending time so
that the average planned residence time is followed, especially when
the residence time deviation of the packet is positive. For a
specific deterministic flow, if it experiences too much latency in
the SP domain (due to unreasonable setting of DS field and the
inability to distinguish between deterministic and non deterministic
flows), even if the boundary node accelerates the transmission, it
may not be able to achieve the target of low E2E latency. If the
traffic experiences less latency within the SP domain, on-time mode
will work to achieve the end-to-end jitter target.
Peng, et al. Expires 14 September 2023 [Page 30]
Internet-Draft Deadline Routing March 2023
_____ ___ _____ ___ _____ ___
/ \/ \___ / \/ \___ / \/ \___
/ \ / \ / \
+--+ +--+ +--+ +--+
|R1| Strict Priority |R2| Strict Priority |R3| Strict Priority |R4|
+--+ domian 1 +--+ domian 2 +--+ domian 3 +--+
\____ __/ \____ __/ \____ __/
\_______/ \_______/ \_______/
Figure 7: Example of partial upgrade
11. Deployment Considerations
According to the above schedulability conditions, the planned
residence time (e.g, d_i) that can be provided in the network is
related to the entire deployed service flows. Each delay level d_i
has independent delay resources, and the smaller d_i, the more
valuable it is. The operator needs to match the corresponding d_i
for each service. It is not recommended to allocate a large d_i for
the on-time service, which will consume a large amount of buffer and
is not beneficial to target jitter. However, the in-time service may
also not want to be assigned to a larger d_i, that may be determined
by its SLA that requires a smaller target end-to-end latency. Thus
it is necessary to find a balance to meet more service requirements
as much as possible.
We believe that the planned residence time specified for the service
flows is completely determined by the network, not by the user.
Therefore, it is impossible to carry a d_i beyond the network
capacity in the packets. However, there may be some services that
expect a large residence time, in this case it is recommended to hold
them at the border nodes, but not on the intermediate node. For this
purpose, the MAX_CT configured on the border nodes can be much larger
than the MAX_CT configured on the intermediate nodes, or the
hierarchical deadline queues can be configured on the border nodes.
As mentioned above, loose on-time scheduling or in-time scheduling
can be applied in the network, and the jitter is removed at the edge
node according to strict on-time scheduling.
12. Benefits
The mechanism described in this document has the following benefits:
Peng, et al. Expires 14 September 2023 [Page 31]
Internet-Draft Deadline Routing March 2023
* Time synchronization is not required between network nodes. Each
node can flexibly set the authorization time length of the
deadline queue according to its own outgoing port bandwidth.
* Statistical multiplexing based. It is an enhancement of SP
scheduling algorithm, friendly to the upgrade of packet switching
network. All nodes in the network can independently use different
timeout intervals to rotate the priority of the deadline queues.
* Traditional regulation/shaping on the ingress. It does not depend
on specific time intervals.
* No additional reshaping is required on the intermediate node.
* A single set of deadline queues supports multiple levels of
residence time.
* Latency based scheduling, with clear delay performance. For in-
time mode, the end-to-end delay is H*(F~D), jitter is H*Q; For on-
time mode, the end-to-end delay is H*D, jitter is +-AT.
13. IANA Considerations
There is no IANA requestion for this document.
14. Security Considerations
TBD
15. Acknowledgements
TBD
16. References
16.1. Normative References
[I-D.peng-6man-deadline-option]
Peng, S., Tan, B., and P. Liu, "Deadline Option", Work in
Progress, Internet-Draft, draft-peng-6man-deadline-option-
01, 11 July 2022, <https://datatracker.ietf.org/doc/html/
draft-peng-6man-deadline-option-01>.
[I-D.stein-srtsn]
Stein, Y. J., "Segment Routed Time Sensitive Networking",
Work in Progress, Internet-Draft, draft-stein-srtsn-01, 29
August 2021, <https://datatracker.ietf.org/doc/html/draft-
stein-srtsn-01>.
Peng, et al. Expires 14 September 2023 [Page 32]
Internet-Draft Deadline Routing March 2023
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black,
"Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", RFC 2474,
DOI 10.17487/RFC2474, December 1998,
<https://www.rfc-editor.org/info/rfc2474>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8655] Finn, N., Thubert, P., Varga, B., and J. Farkas,
"Deterministic Networking Architecture", RFC 8655,
DOI 10.17487/RFC8655, October 2019,
<https://www.rfc-editor.org/info/rfc8655>.
[RFC9320] Finn, N., Le Boudec, J.-Y., Mohammadpour, E., Zhang, J.,
and B. Varga, "Deterministic Networking (DetNet) Bounded
Latency", RFC 9320, DOI 10.17487/RFC9320, November 2022,
<https://www.rfc-editor.org/info/rfc9320>.
16.2. Informative References
[EF-FIFO] "Fundamental Trade-Offs in Aggregate Packet Scheduling",
2001, <https://ieeexplore.ieee.org/document/992892>.
[IR-Theory]
"A Theory of Traffic Regulators for Deterministic Networks
with Application to Interleaved Regulators", 2018,
<https://ieeexplore.ieee.org/document/8519761>.
[RPQ] "Exact Admission Control for Networks with a Bounded Delay
Service", 1996,
<https://ieeexplore.ieee.org/document/556345/>.
[SP-LATENCY]
"Guaranteed Latency with SP", 2020,
<https://ieeexplore.ieee.org/document/9249224>.
[UBS] "Urgency-Based Scheduler for Time-Sensitive Switched
Ethernet Networks", 2016,
<https://ieeexplore.ieee.org/abstract/document/7557870>.
Peng, et al. Expires 14 September 2023 [Page 33]
Internet-Draft Deadline Routing March 2023
Authors' Addresses
Shaofu Peng
ZTE Corporation
China
Email: peng.shaofu@zte.com.cn
Peng Liu
China Mobile
China
Email: liupengyjy@chinamobile.com
Dong Yang
Beijing Jiaotong University
China
Email: dyang@bjtu.edu.cn
Peng, et al. Expires 14 September 2023 [Page 34]