Network Shaofu. Peng
Internet-Draft ZTE Corporation
Intended status: Standards Track Zongpeng. Du
Expires: 9 January 2024 China Mobile
Kashinath. Basu
Oxford Brookes University
Zuopin. Cheng
New H3C Technologies
Dong. Yang
Beijing Jiaotong University
8 July 2023
Deadline Based Deterministic Forwarding
draft-peng-detnet-deadline-based-forwarding-06
Abstract
This document describes a deterministic forwarding mechanism to IP/
MPLS network, as well as corresponding resource reservation,
admission control, policing, etc, to provide guaranteed latency.
Especially, latency compensation with core stateless is discussed to
replace reshaping and also achieve low jitter.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 9 January 2024.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
Peng, et al. Expires 9 January 2024 [Page 1]
Internet-Draft Deadline Routing July 2023
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4
2. EDF Scheduling Overview . . . . . . . . . . . . . . . . . . . 4
2.1. Planned Residence Time of the Service Flow . . . . . . . 5
2.2. Delay Levels Provided by the Network . . . . . . . . . . 6
3. Sorted Queue . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1. Buffer Size Design . . . . . . . . . . . . . . . . . . . 7
3.2. Schedulability Condition for PIFO . . . . . . . . . . . . 7
3.2.1. Conditions for Leaky Bucket Constraint Function . . . 8
4. Rotation Priority Queues . . . . . . . . . . . . . . . . . . 9
4.1. Buffer Size Design . . . . . . . . . . . . . . . . . . . 11
4.2. Schedulability Condition for Deadline Queues . . . . . . 12
4.2.1. Conditions for Leaky Bucket Constraint Function . . . 14
5. Reshaping . . . . . . . . . . . . . . . . . . . . . . . . . . 15
6. Latency Compensation . . . . . . . . . . . . . . . . . . . . 16
6.1. Get Existing Accumulated Planned Residence Time . . . . . 16
6.2. Get Existing Accumulated Actual Residence Time . . . . . 16
6.3. Get Existing Accumulated Residence Time Deviation . . . . 17
6.4. Get Allowable Queueing Delay . . . . . . . . . . . . . . 17
6.5. Scheduled by Allowable Queueing Delay . . . . . . . . . . 18
6.6. On-time Mode Based on Allowable Queueing Delay . . . . . 19
7. Option-1: Reshaping plus Sorted Queue . . . . . . . . . . . . 19
8. Option-2: Reshaping plus RPQ . . . . . . . . . . . . . . . . 19
9. Option-3: Latency Compensation plus Sorted Queue . . . . . . 19
9.1. Packet Disorder Considerations . . . . . . . . . . . . . 20
10. Option-4: Latency Compensation plus RPQ . . . . . . . . . . . 21
10.1. Packet Disorder Considerations . . . . . . . . . . . . . 23
11. Resource Reseravtion . . . . . . . . . . . . . . . . . . . . 25
11.1. Delay Resource Definition . . . . . . . . . . . . . . . 26
11.2. Traffic Engineering Path Calculation . . . . . . . . . . 27
12. Admission Control on the Ingress . . . . . . . . . . . . . . 28
13. Overprovision Analysis . . . . . . . . . . . . . . . . . . . 31
14. Compatibility Considerations . . . . . . . . . . . . . . . . 31
15. Deployment Considerations . . . . . . . . . . . . . . . . . . 33
16. Evaluations . . . . . . . . . . . . . . . . . . . . . . . . . 34
17. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36
18. Security Considerations . . . . . . . . . . . . . . . . . . . 36
Peng, et al. Expires 9 January 2024 [Page 2]
Internet-Draft Deadline Routing July 2023
19. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 36
20. References . . . . . . . . . . . . . . . . . . . . . . . . . 36
20.1. Normative References . . . . . . . . . . . . . . . . . . 36
20.2. Informative References . . . . . . . . . . . . . . . . . 37
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 38
1. Introduction
[RFC8655] describes the architecture of deterministic network and
defines the QoS goals of deterministic forwarding: Minimum and
maximum end-to-end latency from source to destination, timely
delivery, and bounded jitter (packet delay variation); packet loss
ratio under various assumptions as to the operational states of the
nodes and links; an upper bound on out-of-order packet delivery. In
order to achieve these goals, deterministic networks use resource
reservation, explicit routing, service protection and other means.
* Resource reservation refers to the occupation of resources by
service traffic, exclusive or shared in a certain proportion, such
as dedicated physical link, link bandwidth, queue resources, etc.
* Explicit routing means that the transmission path of traffic flow
in the network needs to be selected in advance to ensure the
stability of the route and does not change with the real-time
change of network topology, and based on this, the upper bound of
end-to-end delay and delay jitter can be accurately calculated.
* Service protection refers to sending multiple service flows along
multiple disjoint paths at the same time to reduce the packet loss
rate.
In general, a deterministic path is a strictly explicit path
calculated by a centralized controller, and resources are reserved on
the nodes along the path to meet the SLA requirements of
deterministic services.
In the presence of admission control, policing, reshaping, a large
number of packet scheduling techniques can provide bounded latency.
However, many packet schedulers may result in an inefficient use of
network resources, or provide an overestimated latency. The
underlying scheduling mechanisms in IP/MPLS networks generally use SP
(Strict Priority) and WFQ (Weighted Fair Queuing), and manage a small
number of priority based queues. They are rate based schedulers.
For SP, the highest priority queue can consume the total port
bandwidth, while for WFQ scheduler, each queue may be configured with
a pre-set rate limit. Both of them can provide the worst-case
latency, but evaluation is generally overestimated. We assume that
Peng, et al. Expires 9 January 2024 [Page 3]
Internet-Draft Deadline Routing July 2023
when providing deterministic services in such a network, the observed
flow always has the highest (or relatively high) priority. In the
case where the network core supports reshaping per flow (or optimized
reshaping as provided by [IR-Theory]), the worst-case latency of a
flow is approximately equal to the accumulated burst of its traffic
class divided by the rate limit of that traffic class (note that a
rate based scheduler may refer to [Net-Calculus] to obtain its rate-
latency service curve and get a more accurate evaluation). When the
network core does not implement reshaping, multiple flows sharing the
same priority may form burst cascade, making it more difficult or
even impossible to evaluate the worst-case latency of a single flow.
[EF-FIFO] discusses the SP scheduling behavior in this core-stateless
situation, which requires the overall network utilization level to be
limited to a small portion of its link capacity in order to provide
an appropriate bounded latency.
According to [EDF-algorithm], an EDF (earliest-deadline-first)
scheduler, which always selects the packet with the shortest deadline
for transmission, is an optimal scheduler for a bounded delay service
in the sense that it can support the delay bounds for any set of
connections that can be supported by some other scheduling method.
EDF is a latency based scheduler, which always selects the packet
with the shortest deadline for transmission. EDF further
distinguishes traffic in terms of time urgency, rather than rough
traffic classes.
This document introduces EDF scheduling mechanism to IP/MPLS network,
as well as corresponding resource reservation, admission control,
policing, etc, to provide guaranteed latency. Especially, an
enhanced option based on latency compensation is discussed to replace
reshaping and also achieve low jitter.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
2. EDF Scheduling Overview
The EDF scheduler assigns a deadline for each incoming packet, which
is equal to the time the packet arrives at the node plus the latency
limit, i.e., planned residence time (D), see Section 2.1. The EDF
scheduling algorithm always selects the packet with the earliest
deadline for transmission.
Peng, et al. Expires 9 January 2024 [Page 4]
Internet-Draft Deadline Routing July 2023
The precondition for EDF to work properly is that the traffic of any
service flow must always satisfy the given traffic constraint
function when it reaches a certain EDF scheduler. Therefore, it
should generally implement traffic regulation at the network entrance
to ensure that the admitted traffic complies with the constraints;
And, implement reshaping on each intermediate node to temporarily
cache packets to ensure that packets entering the EDF scheduler queue
comply with the constraints. However, reshaping per flow is a
challenge in large-scaling networks.
Another challenge of EDF scheduling is that queued packets must be
sorted and stored according to their deadline, and whenever a new
packet arrives at the scheduler, it needs to perform search and
insert operations on the corresponding data structure, e.g, a List, a
PIFO (put-in first-out) queue, or other type of sorted queue, at line
rate. [RPQ] described rotating-priority-queues that approximate EDF
scheduling behavior, and do not require deadline based sorting of
queued packets, simplifying enqueueing operations.
According to the above two challenges, we will obtain four
combination solutions. Operators should choose appropriate solutions
based on the actual network situation.
* option-1: Reshaping plus sorted queue.
* option-2: Reshaping plus RPQ.
* option-3: Latency Compensation plus sorted queue.
* option-4: Latency Compensation plus RPQ.
2.1. Planned Residence Time of the Service Flow
The planned residence time (termed as D) of the packet is an offset
time, which is based on the arrival time of the packet and represents
the maximum time allowed for the packet to stay inside the node.
Peng, et al. Expires 9 January 2024 [Page 5]
Internet-Draft Deadline Routing July 2023
For a deterministic path based on deadline scheduling, the path has
deterministic end-to-end delay requirements. The delay includes two
parts, one is the accumulated residence delay and the other is the
accumulated link propagation delay. The end-to-end delay is
subtracted from the accumulated link propagation delay to obtain the
accumulated residence delay. A simple method is that the accumulated
residence delay is shared equally by each node along the path to
obtain the planning residence time of each node. Note that the link
propagation delay in reality may be not always fixed, e.g, due to the
affection of temperature, we assume that the tool for detecting the
link propagation delay can sense the changes beyond the preset
threshold and trigger the recalculation of the deterministic path.
There are many ways to indicate the planned residence time of the
packet.
* Carried in the packet. The ingress PE node, when encapsulating
the deterministic service flow, can explicitly insert the planned
residence time into the packet according to SLA. The transit
node, after receiving the packet, can directly obtain the planned
residence time from the packet. Generally, only a single planned
residence time needs to be carried in the packet, which is
applicable to all nodes along the path; Or insert a stack composed
of multiple deadlines, one for each node.
[I-D.peng-6man-deadline-option] defined a method to carry the
planned residence time in the IPv6 packets.
* Included in the FIB entry. Each node in the network can maintain
the deterministic FIB entry. After the packet hits the
deterministic FIB entry, the planned residence time is obtained
from the forwarding information contained in the FIB entry.
* Included in the policy entry. Configure local policies on each
node in the network, and then set the corresponding planned
residence time according to the matched specific characteristics
of the packet, such as 5-tuple. An implementation should support
the policy to forcibly override the planned residence time
obtained by other methods.
2.2. Delay Levels Provided by the Network
The network may provide multiple delay levels on the outgoing port,
each with its own delay resource pool. For example, some typical
delay levels may be 10us, 20us, 30us, etc.
In theory, any additional delay level can be added dynamically, as
long as the buffer on the forwarding side allows.
Peng, et al. Expires 9 January 2024 [Page 6]
Internet-Draft Deadline Routing July 2023
The quantification of delay resource pool for each delay level is
actually based on the schedulability conditions of EDF. This
document introduces two types of resources per delay level:
* Burst: represents the amount of bits bound that a delay level
provided.
* Bandwidth: represents the amount of bandwidth bound that a delay
level provided.
For more information on the construction of resource pools, please
refer to Section 3.2 and Section 4.2.
3. Sorted Queue
[PIFO] defined the push-in first-out queue (PIFO), which is a
priority queue that maintains the scheduling order or time. A PIFO
allows elements to be pushed into an arbitrary position based on an
element's rank (the scheduling order or time), but always dequeues
elements from the head.
3.1. Buffer Size Design
If flows are rate-controlled (i.e., reshaping is done inside the
network), the maximum depth of PIFO should be the total amount of
burst resource of all delay levels. Otherwise, more buffer is
necessary to store the accumulated bursts. Please refer to
Section 15 for more considerations.
3.2. Schedulability Condition for PIFO
[RPQ] has given the schedulability condition for classic EDF that
based on any type of sorted queue.
Suppose for any delay level d_i, the corresponding accumulated
constraint function is A_i(t). Let d_i < d_(i+1), then the
schedulability condition is:
sum{A_i(t-d_i) for all i} <= C*t
where C is service rate of the EDF scheduler.
It should be noted that for a class i, its planned residence time d_i
is actually contributed by its own flows and all the classes with
lower planned residence time. Thus, based on the above
schedulability conditions, and knowing the traffic constraint
functions of all classes, we can then give the guidance to allocate
appropriate (i.e., not arbitrary) planned residence time for each
Peng, et al. Expires 9 January 2024 [Page 7]
Internet-Draft Deadline Routing July 2023
class. In brief, we can choose the traffic arrival constraint
function according to the preset planned residence time, or we can
choose the planned residence time according to the preset traffic
arrival constraint function.
The test of schedulability conditions needs to be based on the whole
network view. When we need to add new traffic to the network, we
need to consider which nodes the related path will pass through, and
then check in turn whether these nodes still meet the schedulability
conditions after adding new traffic.
3.2.1. Conditions for Leaky Bucket Constraint Function
Assume that we want to support n types of planned residence delay
levels (d_1, d_2,..., d_n) in the network, and the traffic arrival
constraint function of each delay level d_i is the leaky bucket
arrival curve A_i(t) = b_i + r_i * t. The above condition can be
expressed as:
b_1 <= C*d_1 - M
b_1 + b_2 + r_1*(d_2-d_1) <= C*d_2 - M
b_1 + b_2 + b_3 + r_1*(d_3-d_1) + r_2*(d_3-d_2) <= C*d_3 - M
... ...
sum(b_1+...+b_n) + r_1*(d_n-d_1) + r_2*(d_n-d_2) + ... +
r_n_1*(d_n-d_n_1) <= C*d_n - M
where, C is the service rate of the deadline scheduler, M is the
maximum size of the interference packet.
Note that the preset value of b_i does not depend on r_i, but r_i
generally refers to b_i (and burst interval) for setting. For
example, the preset value of r_i may be small, while the value of b_i
may be large. Such parameter design is more suitable for
transmitting traffic with large service burst interval, large service
burst size, but small bandwidth requirements.
Peng, et al. Expires 9 January 2024 [Page 8]
Internet-Draft Deadline Routing July 2023
An extreme example is that the preset r_i of each level d_i is close
to 0 (this is because the burst interval of the served service is too
large, e.g, one hour or one day), but the preset b_i is close to the
maximum value (e.g, b_1 = C*d_1 - M, note that this also requires
that the depth of the leaky bucket used to regulate the traffic is
large enough), then when the concurrent flow of all delay levels is
scheduled, the time 0~d_1 is all used to send the burst b_1, the time
d_1~d_2 is all used to send the burst b_2, the time d_2~d_3 is all
used to send the burst b_3, and so on.
However, a more common example is that the preset r_i of each level
d_i will divide C roughly equally, and the preset b_i is the maximum
packet size (such as 2000 bytes).
The parameters b_i and r_i of each level d_i constitute the delay
resources of that level of the link. A path can reserve required
burst and bandwidth from delay resources of the specific level, and
the reservation is successful only if the two resources are
successfully reserved at the same time. As long as neither b_i nor
r_i is free, the delay resource of level d_i is exhausted.
The delay resource reservation status of each level is independent.
For example, in the case that the parameter b_1 is determined, if the
required burst of the d_1 service is large, then although only a few
d_1 services can be supported, but if r_1 is very small, the network
can still support more services of other levels at the same time.
4. Rotation Priority Queues
[RPQ] described rotating priority queues, and the priority
granularity of the queue is the same as that of the flows. If the
deadline of the flow is used as priority, it requires a lot of
priority and corresponding queues, with scalability issues.
Therefore, this section provides a deadline queues with count-down
time range whose rotation interval is more refined.
For nodes in the network, some queues with count-down time (also
termed as CT) are introduced and maintained for each outgoing port.
These queues are called deadline queues and participate in priority
based scheduling. Deadline queues have the following
characteristics:
* Each deadline queue has CT (Count-down Time) that is decreased by
TI (rotation interval), and AT (Authorization Time) that is for
sending duration. AT is also the CT difference between two
adjacent queues. Note that TI must be less than or equal to the
AT, with AT = N * TI, where the natural number N >= 1.
Peng, et al. Expires 9 January 2024 [Page 9]
Internet-Draft Deadline Routing July 2023
* When the CT of the queue decreases to 0, the queue has the highest
priority, and prohibit receiving new packets. This prohibition is
necessary, to avoid highest priority packets arriving just at the
end of AT duration from exceeding their deadline. For a deadline
queue whose CT is not reduced to 0, it can receive packets.
* The smaller the CT, the higher the priority. The scheduler always
selects packets from non empty queues with the highest priority
for transmission. If the highest priority queue is empty, it will
schedule the next non empty queue with the next highest priority.
* At the beginning, all deadline queues have different CT values,
i.e., staggered from each other, so that the CT of only one
deadline queue will decrease to 0 at any time. It should be noted
that CT is just the countdown of the head of the queue, and the
countdown of the end of the queue is near CT+AT. So the CT
attribute of a queue is actually a range [CT, CT+AT).
* For a deadline queue whose CT has been reduced to 0, after a new
round of AT, the CT will return to the maximum initial value (also
termed as MAX_CT), allow receiving new packets.
The above AT, TI and MAX_CT value shall be choosed according to the
actual capacity of the link. In fact, each link can independently
use different AT. The general principle is that the larger
bandwidth, the smaller AT. The AT must be designed large enough to
include interference delay caused by a single low priority packet
with maximum size.
The choose of TI should consider the latency granularity of various
service flows, so that CT updated per TI can match the delay
requirements of different services. For example, if the delay
difference of different traffic flows is several microseconds, TI can
be choosed as 1 us. If the delay difference of different traffic
flows is several 10 microseconds, TI can be choosed as 10 us.
A specific example of the deadline queue is depicted in Figure 1.
Peng, et al. Expires 9 January 2024 [Page 10]
Internet-Draft Deadline Routing July 2023
+------------------------------+ +------------------------------+
| Deadline Queue Group: | | Deadline Queue Group: |
| queue-1(CT=60us) ######| | queue-1(CT=59us) ######|
| queue-2(CT=50us) ######| | queue-2(CT=49us) ######|
| queue-3(CT=40us) ######| | queue-3(CT=39us) ######|
| queue-4(CT=30us) ######| | queue-4(CT=29us) ######|
| queue-5(CT=20us) ######| | queue-5(CT=19us) ######|
| queue-6(CT=10us) ######| | queue-6(CT=9us) ######|
| queue-7(CT=0us) ######| | queue-7(CT=-1us) ######|
+------------------------------+ +------------------------------+
+------------------------------+ +------------------------------+
| Non-deadline Queue Group: | | Non-deadline Queue Group: |
| queue-8 ############ | | queue-8 ############ |
| queue-9 ############ | | queue-9 ############ |
| queue-10 ############ | | queue-10 ############ |
| ... ... | | ... ... |
+------------------------------+ +------------------------------+
-o----------------------------------o-------------------------------->
T0 T0+1us time
Figure 1: Example of Deadline Queues
In this example, the AT for deadline queue group is configured to
10us. Queue-1 ~ queue-7 are deadline queues, and other queues are
traditional non-deadline queues. Each deadline queue has its CT
attribute. The MAX_CT is 60us. At the initial time (T0), the CT of
all deadline queues are staggered from each other. For example, the
CT of queue-1 is 60us, the CT of queue-2 is 50uS, and so on.
Suppose the scheduling engine initiates a rotation timer with a time
interval of 1us, i.e., AT = 10 * TI in this case. As shown in the
figure, at T0 + 1us, the CT of queue-1 becomes 59us, the CT of
queue-2 becomes 49us, etc. At this time, the CT of queue-7 can be
maintained as 0 or equal to -1 depending on the implementation.
At T0 + 10us, the CT of queue-7 will return to MAX_CT.
4.1. Buffer Size Design
The service rate of the deadline scheduler, termed as C, can reach
the link rate, but generally only needs to be configured as part of
the link bandwidth, such as 50%.
The buffer size of each deadline queue is AT * C - M, where M is the
maximum size of the packet with low priority. If we divide the time
by AT (such as 10 us) and observe the deadline queue with the lowest
Peng, et al. Expires 9 January 2024 [Page 11]
Internet-Draft Deadline Routing July 2023
priority, such as d_100 (i.e., CT=100 us), then in the first AT, the
traffic flow with priority d_100 (traffic arrival follows the
constraint of A_100(t)) will enter that queue. In the second AT, the
traffic flow with priority of d_90 (traffic arrival follows the
constraint of A_90(t)) will enter the same queue (i.e., CT=90 us),
and so on. It can be seen that the maximum buffer size required for
the queue is sum(A_i(AT)) for all delay level i. Since the stability
condition of the deadline scheduler must meet sum(A_i(t)) < C*t, so
the buffer size of each deadline queue can be set to C*AT.
When deadline queues and latency compensation are used in
combination, a packet that arrives early is penalized and placed in a
queue with a larger CT, it will not cause the queue to overflow,
because the queue is just where it is located. That is, assuming
that the packet does not arrive early but later on time, it will not
be penalized, and will still enter the same queue where the CT
becomes smaller later.
Similarly, when a late arrival packet is rewarded and placed in a
queue with a smaller CT, it will not cause the queue overflow,
because the queue is just where it is located. That is, assuming
that the packet does not arrive late but arrives on time before, it
will not be rewarded, and will still enter the same queue with a
smaller CT which has not been reduced before.
If flows are rate-controlled (i.e., reshaping is done inside the
network), the MAX_CT may be designed as the delay level with largest
delay bound. Otherwise, MAX_CT should be larger than the largest
delay bound to store the accumulated bursts. Please refer to
Section 15 for more considerations.
4.2. Schedulability Condition for Deadline Queues
In this section, we discuss the schedulability condition based on
deadline queues.
Suppose for any delay level d_i, the corresponding accumulated
constraint function is A_i(t). Let d_i < d_(i+1), then the
schedulability condition is:
A_1(t-d_1) + sum{A_i(t+AT-d_i) for all i>=2} <= C*t
where AT is the authorization time of each deadline queue, C is
service rate of the deadline scheduler.
The proof is similar with that in [RPQ], except that the rotation
step is fine-grained by TI. Figure 2 below gives a rough
explanation.
Peng, et al. Expires 9 January 2024 [Page 12]
Internet-Draft Deadline Routing July 2023
^
| Planned Residence Time
| | |
|CT_t+AT+2*TI -> ===== | |
| CT_t+AT+TI -> =====| |
CT_t + - - - - - - - - - - - - >+====+ -\
+AT | | | |
| | | |
| | | > Phycical queue-x
d_p + - - - - - - - - - - - - >| | |
| | | |
CT_t + - - - - - - - - - - - - >+====+ -/
| | |===== <- CT_t-TI
| | | ===== <- CT_t-2*TI
| | |
|
| TI | ... ... | TI | TI | TI | TI | TI | TI |
---+----+-----------+----+----+----+----+----+----+--------->
0 ^ ^ ^ ^ time
| | | |
t-tau' t-ofst t t+tau
(busy period begin) (arrival) (departure)
Figure 2: Deadline Queues Scheduling
Suppose that the observed packet, with planned residence time d_p,
arrives at the scheduler at time t and leaves the scheduler at time
t+tau. It will be inserted to physical queue-x with count-down time
CT_t at the current timer interval TI with starting time t-ofst and
end time t-ofst+TI. According to the above packet queueing rules, we
have CT_t <= d_p < CT_t+AT. Also suppose that t-tau' is the
beginning of the busy period closest to t. Then, we can get the
amount of packets within time interval [t-tau', t+tau] that must be
scheduled before the observed packet. In detailed:
* For all service i with planned residence time d_i meeting CT_t <=
d_i < CT_t+AT, the workload is sum{A_i[t-tau', t]}.
Explanation: since the packets with priority d_i in the range
[CT_t, CT_t+AT) at time t will be sent before the observed
packet, the packets with the same priority d_i before time t
will become more urgent at time t, and must also be sent before
the observed packet.
* For all service i with planned residence time d_i meeting d_i >=
CT_t+AT, the workload is sum{A_i[t-tau', t-ofst+TI-(d_i-CT_t-
AT+TI)]}.
Peng, et al. Expires 9 January 2024 [Page 13]
Internet-Draft Deadline Routing July 2023
Explanation: although the packets with priority d_i larger than
CT_t+AT at time t will be sent after the observed packet, but
the packets with the same priority d_i before time t, at time
t-ofst+TI-(d_i-CT_t-AT+TI), will become more urgent at time t,
and must be sent before the observed packet.
* For all service i with planned residence time d_i meeting d_i <
CT_t, the workload is sum{A_i[t-tau', t+(CT_t-d_i)]}.
Explanation: the packets with priority d_i less than CT_t at
time t will certainly be sent before the observed packet, at a
future time t+(CT_t-d_i) the packets with the same priority d_i
will still be urgent than the observed packet (even the
observed packet also become urgent), and must be sent before
the observed packet.
* Then deduct the traffic that has been sent during the busy period,
i.e., C*(tau+tau').
Let tau as d_p, and remember that CT_t <= d_p, the above workload is
less than
sum{A_i(tau'+CT_t+AT-d_i) for all d_i >= CT_t} +
sum{A_i(tau'+CT_t-d_i) for all d_i < CT_t} - C*(tau'+d_p)
It is further less than
sum{A_i(tau'+d_p+AT-d_i) for all d_i >= d_2} + A_1(tau'+d_p-d_1) -
C*(tau'+d_p)
Then, denote x as tau'+d_p, we have
sum{A_i(x+AT-d_i) for all d_i >= d_2} + A_1(x-d_1) - C*(x)
Let the above workload less than zero, then we get the condition.
Note that the key difference between the above condition and one
based on sorted queue is the AT factor.
Other common considerations are the same as Section 3.2
4.2.1. Conditions for Leaky Bucket Constraint Function
Assume that we want to support n types of planned residence delay
levels (d_1, d_2,..., d_n) in the network, and the traffic arrival
constraint function of each delay level d_i is the leaky bucket
arrival curve A_i(t) = b_i + r_i * t. The above condition can be
expressed as:
Peng, et al. Expires 9 January 2024 [Page 14]
Internet-Draft Deadline Routing July 2023
b_1 <= C*d_1 - M
b_1 + b_2 + r_1*(d_2-d_1) + r_2*AT <= C*d_2 - M
b_1 + b_2 + b_3 + r_1*(d_3-d_1) + r_2*(d_3-d_2) + (r_2+r_3)*AT <=
C*d_3 - M
... ...
sum(b_1+...+b_n) + r_1*(d_n-d_1) + r_2*(d_n-d_2) + ... +
r_n_1*(d_n-d_n_1) + (r_2+...+r_n)*AT <= C*d_n - M
where, C is the service rate of the deadline scheduler, M is the
maximum size of the interference packet.
5. Reshaping
Reshaping per flow inside the network, as described in [RFC2212], is
done at all heterogeneous source branch points and at all source
merge points, to restore (possibly distorted) traffic's shape to
conform to the TSpec. Reshaping entails delaying packets until they
are within conformance of the TSpec.
A network element MUST provide the necessary buffers to ensure that
conforming traffic is not lost at the reshaper. Note that while the
large buffer makes it appear that reshapers add considerable delay,
this is not the case. Given a valid TSpec that accurately describes
the traffic, reshaping will cause little extra actual delay at the
reshaping point (and will not affect the delay bound at all).
Maintaining a dedicated shaping queue per flow can avoid burstiness
cascading between different flows with the same traffic class, but
this approach goes against the design goal of packet multiplexing
networks. [IR-Theory] describes a more concise approach by
maintaining a small number of interleaved regulators (per traffic
class and incoming port), but still maintaining the state of each
flow. With this regulator, packets of multiple flows are processed
in one FIFO queue and only the packet at the head of the queue is
examined against the regulation constraints of its flow. However, as
the number of flows increases, the IR operation may become burdensome
as much as the per-flow reshaping.
For any observed EDF scheduler in the network, when the traffic
arriving from all incoming ports is always reshaped, then these flows
comply with their arrival constraint functions, which is crucial for
the schedulability conditions of EDF scheduling. Based on this, it
can quantify the delay resource pool which is open and reserved for
service flows.
Peng, et al. Expires 9 January 2024 [Page 15]
Internet-Draft Deadline Routing July 2023
6. Latency Compensation
[RFC9320] presents a latency model for DetNet nodes. There are six
type of delays that a packet can experience from hop to hop. The
processing delay (type-4), the regulator delay (type-5) , the
queueing subsystem delay (type-6), and the output delay (type-1)
together contribute to the residence time in the node.
In this document, the residence delay in the node is simplified into
two parts: the first part is to lookup the forwarding table when the
packet is received from the incoming port (or generated by the
control plane) and deliver the packet to the line card where the
outgoing port is located; the second part is to store the packet in
the queue of the outgoing port for transmission. These two parts
contribute to the actual residence time of the packet in the node.
The former can be called forwarding delay (termed as F) and the
latter can be called queueing delay (termed as Q). The forwarding
delay is related to the chip implementation and is generally
constant; The queueing delay is unstable.
6.1. Get Existing Accumulated Planned Residence Time
The existing accumulated planned residence time of the packet refers
to the sum of the planned residence time of all upstream nodes before
the packet is transmitted to the current node. This information
needs to be carried in the packet. Every time the packet passes
through a node, the node accumulates its corresponding planned
residence time to the existing accumulated planned residence time
field in the packet. [I-D.peng-6man-deadline-option] defined a
method to carry existing accumulated planned residence time in the
IPv6 packets.
The setting of "existing accumulated planned residence time" in the
packet needs to be friendly to the chip for reading and writing. For
example, it should be designed as a fixed position in the packet.
The chip may support flexible configuration for that position.
6.2. Get Existing Accumulated Actual Residence Time
The existing accumulated actual residence time of the packet, refers
to the sum of the actual residence time of all upstream nodes before
the packet is transmitted to the current node. This information
needs to be carried in the packet. Every time the packet passes
through a node, the node accumulates its corresponding actual
residence time to the existing accumulated actual residence time
field in the packet. [I-D.peng-6man-deadline-option] defined a
method to carry existing accumulated actual residence time in the
IPv6 packets.
Peng, et al. Expires 9 January 2024 [Page 16]
Internet-Draft Deadline Routing July 2023
The setting of "existing accumulated actual residence time" in the
packet needs to be friendly to the chip for reading and writing. For
example, it should be designed as a fixed position in the packet.
The chip may support flexible configuration for that position.
The current node can carry the receiving and sending time of the
packet in the auxiliary data structure (note that is not packet
itself) of the packet, then the actual residence time of the packet
in the node can be calculated according to these two value.
Although other methods can also be, for example, carrying the
absolute system time of receiving and sending in the packet to let
the downstream node compute the actual residence time indirectly,
that has a low encapsulation efficiency.
6.3. Get Existing Accumulated Residence Time Deviation
The existing accumulated residence time deviation (termed as E)
equals existing accumulated planned residence time minus existing
accumulated actual residence time. This value can be zero, positive,
or negative.
If the existing accumulated planned residence time and the existing
accumulated actual residence time are carried in the packet, it is
not necessary to carry the existing accumulated residence time
deviation. Otherwise, it is necessary. The advantage of the former
is that it can be applied to more scenarios, while the later has less
packaging overhead.
Due to work-conserving behavior of EDF, E may be a very large
positive value.
6.4. Get Allowable Queueing Delay
When a node receives a packet from the upstream node, it can first
get the existing accumulated residence time deviation (E), and then
add it to the planned residence time (D) of the packet at this node
to obtain the adjustment residence value, and then deduct the
forwarding delay (F) of the packet in the node, to obtain the
allowable queueing delay (Q) for that packet.
Q = D + E - F
In detailed, assume that the current node in a deterministic path is
i, all upstream nodes are from 1 to i-1. Let the planned residence
time be D, the actual residence time be R, the forwarding delay
intra-node be F, then the allowable queueing delay (Q) of the packet
on the current node i is calculated as follows:
Peng, et al. Expires 9 January 2024 [Page 17]
Internet-Draft Deadline Routing July 2023
E(i-1) = sum(D(1), ..., D(i-1)) - sum(R(1), ..., R(i-1))
Q(i) = D(i) + E(i-1) - F(i)
6.5. Scheduled by Allowable Queueing Delay
The packet will be sheduled based on its Q, that is, the packet is
scheduled based on latency compensation contributed by E, instead of
only D.
The core stateless latency compensation can achieve the effect of
reshaping per flow. Q can be used to identify ineligibility arrvials
of one delay level and prevent it from interferring with the
scheduling of eligibility arrvials of another delay level.
Firstly, at network entry, all packets (after regulation) of the same
flow will be released to the EDF scheduler one after another at
different time (termed as begin time), but with the same allowable
queueing delay (Q), with initial E = 0. Then, the ideal departure
time of each packet should be its begin time plus Q. If all packets
has the ideal departure time (i.e., the updated E is still 0), then
the arrived traffic faced by the next hop also obey its arrival
constraint function. If all packets of all delay levels released by
all sources have the ideal departure time, all concurrent flows
received by a transit node will comply with their arrival
constraints. However, due to work-conserving behavior of EDF
scheduler, packets may have an advanced departure time (i.e., the
updated E is larger than 0), instead of the ideal one. Therefore,
the arrived traffic faced by the downstream node may violate its
arrival constraint function. In this case, the downstream node may
punish ineligibility arrving packets based on E, i.e. obtain
appropriate Q to restore eligibility arrvials.
Although, lantency compensation has the effect of reshaping, but it
is not equivalent to reshaping. Considering a accumulated bursts
that violates the traffic constraint function and arrives at a node,
if reshaping is used, it will substantially introduce shaping delay
for the ineligibility bursts, which will then enter the queueing
subsystem. While if latency compensation is used, this ineligibility
bursts will only be penalized with a larger Q, but may be immediately
sent if higher priority queues are empty.
Note that the application premise of latency compensation is that a
flow must be based on a fixed explicit path. If multiple packets
from the same flow arrive at the intermediate node along multiple
paths with different lengths, even if these packets are all
eligibility packets, accumulated bursts may still form and cannot
even be punished.
Peng, et al. Expires 9 January 2024 [Page 18]
Internet-Draft Deadline Routing July 2023
6.6. On-time Mode Based on Allowable Queueing Delay
A node needs to identify its role as an transit node or egress node
for a specific packet, to take in-time or on-time behavior.
On the transit nodes, the allowable queueding delay (Q) is only used
as a basis for sorting in the queue, and scheduling is always a work-
serving behavior, i.e., in-time mode.
On the egress node, Q can be further used as a damper factor to hold
packets explicitly, to achieve low jitter.
The Packet Replication, Elimination, and Ordering Functions (PREOF)
require two paths with consistent end-to-end delay. Deadline on-time
mode can be used to schedule packets to arrive at the destination at
a fixed time.
7. Option-1: Reshaping plus Sorted Queue
A receivd packet is put to the PIFO queue according to the time
arrived at scheduler plus Q, where Q = D - F. That is, E is always 0
and not updated.
The planned residence time (D) should be carried in the packet.
8. Option-2: Reshaping plus RPQ
A receivd packet is put to the appropriate deadline queue according
to CT <= Q < CT+AT, where Q = D - F. That is, E is always 0 and not
updated. If the queue choosed by the condition is full, the packet
should be put to the next queue with higher CT value.
The planned residence time (D) should be carried in the packet.
9. Option-3: Latency Compensation plus Sorted Queue
A receivd packet is put to the PIFO queue according to the time
arrived at scheduler plus Q.
The planned residence time (D) and accumulated residence time
deviation (E) should be carried in the packet.
Peng, et al. Expires 9 January 2024 [Page 19]
Internet-Draft Deadline Routing July 2023
9.1. Packet Disorder Considerations
Suppose that two packets, P1, P2, are generated instantaneously from
a specific flow at the source, and the two packets have the same
planned residence time. P1 may face less interference delay than P2
in their journey. When they arrive at an intermediate node in turn,
P2 will have less allowable queueing delay (Q) than P1 to try to stay
close to P1 again. It should be noted that to compary who is ealier
is based on the time arriving at the scheduler plus packet's Q. The
time difference between the arrival of two packets at the scheduler
may not be consistent with the difference between their Q. It is
possible to get an unexpected comparision result.
As shown in Figure 3, P1 and P2 are two back-to-back packets
belonging to the same burst. The arrival time when they are received
on the scheduler is shown in the figure. Suppose that the Q values
of two adjacent packets P1 and P2 are 40us and 39us, and arrive at
the scheduler at time T1 and T2 respectively. P1 will be sorted
based on T1 + 40us, while P2 will be sorted based on T2 + 39us.
Ideally, T2 should be T1 + 1us. However, this may be not the case.
For example, it is possible that T2 = T1 + 0.9us, Q1 = 40, Q2 = 39.1,
but just because the calculation accuracy of Q1 and Q2 is
microseconds, so they are, e.g, with half-adjust, approximately 40 us
and 39 us, respectively. This means that P2 will be sorted before P1
in the PIFO, resulting in disorder.
packets arrived later
packets arrived earlier |
| |
V V
--------+-----------------------------------------------+---------
... ... | .P1.................P2....................... | ... ...
--------+-----------------------------------------------+---------
P1.Q=40us
P2.Q=39us
| |
--------o---------------------o--------------------------------->
T1 T2 (=T1+0.9us)
| ___________________|
| |
v v
PIFO ##############################################################
top
Figure 3: Packets queueing based on Latency Compensation
Peng, et al. Expires 9 January 2024 [Page 20]
Internet-Draft Deadline Routing July 2023
DetNet architecture [RFC8655] provides Packet Ordering Function
(POF), that can be used to solve the above disorder problem caused by
the latency compensation.
Alternatively, if the POF is not enabled, we can also maintain states
for service flows to record the last queueing information to address
this issue.
For example, one ore more OGOs (order guarantee object) are
maintained per delay level and incoming port, on each outgoing port.
An OGO records the rank (i.e., arrival time at the scheduler plus Q)
of the last inserted packet mapped to this OGO.
When a packet arrives at the scheduler, it is first mapped to its
OGO, and get the rank of OGO, and put behind that rank.
Note that in practical situations, two back-to-back packets of the
same flow are generally evenly distributed within the burst interval
after policing, which means that the distance between these two
packets is generally much greater than the calculation accuracy
mentioned above, meaning that the disordered phenomenon will not
really occur.
10. Option-4: Latency Compensation plus RPQ
A receivd packet is put to the appropriate deadline queue according
to CT <= Q < CT+AT. If the queue choosed by the condition is full,
the packet should be put to the next queue with higher CT value.
The planned residence time (D) and accumulated residence time
deviation (E) should be carried in the packet.
Figure 4 depicts an example of packets inserted to the deadline
queues.
Peng, et al. Expires 9 January 2024 [Page 21]
Internet-Draft Deadline Routing July 2023
P2 P1 +------------------------------+
+--------+ +--------+ | Deadline Queue Group: |
| D=20us | | D=30us | | queue-1(CT=55us) ###### |
| E=15us | | E=-8us | +--+ | queue-2(CT=45us) ###### |
+--------+ +--------+ |\/| | queue-3(CT=35us) ###### |
------incoming port-1------> |/\| | queue-4(CT=25us) ###### |
|\/| | queue-5(CT=15us) ###### |
P4 P3 |/\| | queue-6(CT=5us) ###### |
+--------+ +--------+ |\/| | queue-7(CT=0us) ###### |
| | | D=30us | |/\| +------------------------------+
+--------+ | E=-30us| |\/|
+--------+ |/\|
------incoming port-2------> |\/| +------------------------------+
|/\| | Non-deadline Queue Group: |
P6 P5 |\/| | queue-8 ############ |
+--------+ +--------+ |/\| | queue-9 ############ |
| | | D=40us | |\/| | queue-10 ############ |
+--------+ | E=40us | |/\| | ... ... |
+--------+ +--+ +------------------------------+
------incoming port-3------> ---------outgoing port---------->
-o----------------------------------o-------------------------------->
receiving-time base +F time
Figure 4: Time Sensitive Packets Buffered to Deadline Queue
As shown in Figure 4, the node successively receives six packets from
three incoming ports, among which packet 1, 2, 3 and 5 have
corresponding deadline information, while packet 4 and 6 are best-
effort packets. These packets need to be forwarded to the same
outgoing port according to the forwarding table entries. It is
assumed that they arrive at the line card where the outgoing port is
located at almost the same time after the forwarding delay in the
node (F = 5us). At this time, the queue status of the outgoing port
is shown in the figure. Then:
* The allowable queueing delay (Q) of packet 1 is 30 - 8 - 5 = 17us,
and it will be put into the deadline queue-5 (its CT is 15us),
meeting the condition that Q is in the range [15, 25).
* The allowable queueing delay (Q) of packet 2 is 20 + 15 - 5 =
30us, and it will be put into the deadline queue-4 (its CT is
25us), meeting the condition that Q is in the range [25, 35).
* The allowable queueing delay (Q) of packet 3 is 30 - 30 - 5 =
-5us, and it will be modified to AT (10 us) and put into the
deadline queue-6 (its CT is 5us), meeting the condition that Q is
in the range [5, 15).
Peng, et al. Expires 9 January 2024 [Page 22]
Internet-Draft Deadline Routing July 2023
* The allowable queueing delay (Q) of packet 5 in the node is 40 +
40 - 5 = 75us, and it will be modified to the MAX_CT (60 us) and
put into the deadline queue-1 (its CT is 55us), meeting the
condition that Q is in the range [55, 65).
* Packets 4 and 6 will be put into the non-deadline queue in the
traditional way.
According to Section 4.2, An eligibility packet (i.e., E = 0) from a
specific delay level, even at the end of the inserted queue, can
ensure that it does not exceed its deadline, which is the key role of
the AT factor in the condition formula. Now, assuming that a packet
is penalized to a lower priority queue based on its Q, this penalty
will not result in more than expected delay, apart from potential
delay E.
For example, when a packet is inserted queue based on
CT_x <= Q < CT_x + AT
even if it is at the end of the queue, then according to D = Q - E,
i.e., after time E (the penalty time), we have
CT_x - E <= Q - E < CT_x - E + AT
That is
CT_y <= D < CT_y + AT
So, in essence, it is still equivalent to an eligibility packet
entering the corresponding queue based on its delay level, and apply
the schedulability condition.
10.1. Packet Disorder Considerations
Suppose that two packets, P1, P2, are generated instantaneously from
a specific flow at the source, and the two packets have the same
planned residence time. P1 may face less interference delay than P2
in their journey. When they arrive at an intermediate node in turn,
P2 will have less allowable queueing delay (Q) than P1 to try to stay
close to P1 again. It should be noted that to compary who is ealier
is based on queue's CT and packet's Q, according to the above
queueing rule (CT <= Q < CT+AT), and the CT of the queue is not
changed in real-time, but gradually with the decreasing step TI. It
is possible to get an unexpected comparision result.
Peng, et al. Expires 9 January 2024 [Page 23]
Internet-Draft Deadline Routing July 2023
As shown in Figure 5, P1 and P2 are two packets belonging to the same
burst. The arrival time when they are received on the scheduler is
shown in the figure. Suppose that the AT of the deadline queue is
10us, the decreasing step TI is 1us, and the transmission time of
each packet is 0.01us. Also suppose that the Q values of two
adjacent packets P1 and P2 are 40us and 39us respectively, and they
are both received in the window from T0 to T0+1us. P1 will enter
queue-B with CT range [40, 50), while P2 will enter queue-A with CT
range [30, 40) just before the rotation event occurred. This means
that P2 will be scheduled before P1, resulting in disorder.
packets arrived later
packets arrived earlier |
| |
V V
--------+-----------------------------------------------+---------
... ... | .P1.................P2....................... | ... ...
--------+-----------------------------------------------+---------
P1.Q=40us
P2.Q=39us
| | |
--------o---------------------o---------------------o----------->
T0 T0+1us T0+2us time
queue-A.CT[30,40) queue-A.CT[29,39)
queue-B.CT[40,50) queue-B.CT[39,49)
queue-C.CT[50,60) queue-C.CT[49,59)
Figure 5: Packets queueing based on Latency Compensation
DetNet architecture [RFC8655] provides Packet Ordering Function
(POF), that can be used to solve the above disorder problem caused by
the latency compensation.
Alternatively, if the POF is not enabled, we can also maintain states
for service flows to record the last queueing information to address
this issue.
For example, one ore more OGOs (order guarantee object) are
maintained per delay level and incoming port, on each outgoing port.
An OGO records the queueing information which is the queue that all
the packets mapped to this OGO was inserted recently. For
simplicity, a count-down time (CT), which is copied from the recent
inserted deadline queue, can be recorded in OGO. Note that the CT of
OGO needs to decrease synchronously with that of other deadline
queues, with the same decreasing step TI. If the CT of OGO decreases
to 0, it will remain at 0.
Peng, et al. Expires 9 January 2024 [Page 24]
Internet-Draft Deadline Routing July 2023
When a packet arrives at the deadline scheduler at the outgoing port
, it is first mapped to its OGO, and get the CT of OGO, termed as
OGO.CT. Then, according to the above queueing rule (CT <= Q <
CT+AT), get the CT of a preliminarily selected queue, termed as
preliminary CT.
* Let Q' is MAX{OGO.CT, preliminary CT}, and put the packet in the
target queue according to CT <= Q' < CT+AT
* Update the value of OGO.CT to the CT of target queue.
Note that in practical situations, two back-to-back packets of the
same flow are generally evenly distributed within the burst interval
after policing, which means that the distance between these two
packets is generally much greater than the calculation accuracy
mentioned above, meaning that the disordered phenomenon will not
really occur..
11. Resource Reseravtion
Generally, a path may carry multiple service flows with different
delay levels. For a certain delay level d_i, the path will reserve
some resources from the delay resource pool of the link. The delay
resource pool here, as leaky bucket constraint function shown in
Section 3.2.1 or Section 4.2.1, is a set of preset parameters that
meet the schedulability conditions. For example, the level d_1 has a
burst upper limit of b_1 and a bandwidth upper limit of r_1. A path
j may allocate partial resources (b_i_j, and r_i_j) from the resource
quota (b_i, and r_i) of the link's delay level d_i. A service flow k
that carried in path j, may use resources (b_i_j_k, and r_i_j_k)
according to its T_SPEC. It can be seen that the values of b_i_j and
r_i_j determine the scale of the number of paths that can be
supported, while the values of b_i_j_k and r_i_j_k determine the
scale of the number of services that can be supported. The following
expression exists.
* b_i_j >= sum(b_i_j_k), for all service k over the path j.
* r_i_j >= sum(r_i_j_k), for all service k over the path j.
* b_i >= sum(b_i_j), for all path j through the specific link.
* r_i >= sum(r_i_j), for all path j through the specific link.
Peng, et al. Expires 9 January 2024 [Page 25]
Internet-Draft Deadline Routing July 2023
11.1. Delay Resource Definition
The delay resources of a link can be represented as the corresponding
burst and bandwidth resources for each delay level. Basically, what
delay levels (e.g, 10us, 20us, 30us, etc) are supported by a link
should be included in the link capability.
Figure 6 shows the delay resource model of the link. The resource
information of each delay level includes the following attributes:
* Delay Bound: Refers to the delay bound intra node corresponding to
this delay level. It is a pre-configuration value.
* Maximum Reservable Bursts: Refers to the maximum amount of bit
quota corresponding to this delay level. It is a pre-
configuration value based on the schedulability condition.
* Unreserved Bursts: Refers to the amount of bits reservable (i.e.,
free amount) corresponding to this delay level.
* Maximum Reservable Bandwidth: Refers to the maximum amount
bandwidth corresponding to this delay level. It is a pre-
configuration value based on the schedulability condition.
* Unreserved Bandwidth: Refers to the amount of bandwidth reservable
(i.e., free amount) corresponding to this delay level.
Peng, et al. Expires 9 January 2024 [Page 26]
Internet-Draft Deadline Routing July 2023
d_n +----------------------------------------+
| Maximum Reservable Bursts: MRBu_n |
| Unreserved Bursts: UBu_n |
| Maximum Reservable Bandwidth: MRB_n |
| Unreserved Bandwidth: UB_n |
+----------------------------------------+
... ... ...
... ... ...
d_2 +----------------------------------------+
| Maximum Reservable Bursts: MRBu_2 |
| Unreserved Bursts: UBu_2 |
| Maximum Reservable Bandwidth: MRB_2 |
| Unreserved Bandwidth: UB_2 |
+----------------------------------------+
d_1 +----------------------------------------+
| Maximum Reservable Bursts: MRBu_1 |
| Unreserved Bursts: UBu_1 |
| Maximum Reservable Bandwidth: MRB_1 |
| Unreserved Bandwidth: UB_1 |
+----------------------------------------+
----------------------------------------------------------->
Delay Resource of the Link
Figure 6
The IGP/BGP extensions to advertise the link's capability and
timeslot resource is defined in
[I-D.peng-lsr-deterministic-traffic-engineering].
11.2. Traffic Engineering Path Calculation
A candidate path may be selected according to the end-to-end delay
requirement of the flow. Subtract the accumulated link propagation
delay from the end-to-end delay requirement, and then divide it by
the number of hops to obtain the average planned residence time (D)
for each node. Select the propriate delay level d_i (d_i <= D)
closest to the average planned residence time (D), and then reserve
resources from delay level d_i on each hop.
Note that it is D, not d_i, carried in the forwarding packets.
Peng, et al. Expires 9 January 2024 [Page 27]
Internet-Draft Deadline Routing July 2023
12. Admission Control on the Ingress
On the ingress PE node, traffic regulation must be performed on the
incoming port, so that the service traffic does not exceed its
T-SPEC. This kind of regulation is usually the shaping using leaky
bucket combined with the incoming queue that receives service
traffic. A service may generate discrete multiple bursts within its
periodic service burst interval.
According to [RFC9016], the values of Burst Interval,
MaxPacketsPerInterval, MaxPayloadSize of the service flow will be
written in the SLA between the customer and the network provider, and
the network entry node will set the corresponding bucket depth
according to MaxPayloadSize to forcibly delay the excess bursts. The
entry node also sets the corresponding bucket rate according to
arrival rate that can be calculated.
The leaky bucket shaping will essentially make all the bursts within
the service burst interval evenly distributed within the service
burst interval, which may be inconsistent with the original arrival
curve of the service flow. Therefore, some bursts within the service
burst interval may face more shaping delay. For example, on the head
of the service burst interval, it contains two discrete bursts with
the same size, but the bandwidth reserved by the service is very
small (i.e., total burst size/burst interval). Assuming that the
bucket depth is the size of a single burst, the shaping delay faced
by the second burst is approximately half of the service burst
interval.
Although the shaped curve and the original arrival curve can be as
consistent as possible by increasing the bucket depth, to minimize
the shaping delay of each burst, but this means that the service will
occupy more burst resources, and reduce the service scale that the
network can support according to the schedulability conditions.
Unless, customers are willing to spend more money to buy a larger
burst.
On the entry node, for the burst that faces the shaping delay, its
shaping delay cannot be included in the latency compensation formula,
otherwise, it will make that burst catch up with the previous burst,
resulting in damage to the shaping result and violation of the
arrival constraint function.
Peng, et al. Expires 9 January 2024 [Page 28]
Internet-Draft Deadline Routing July 2023
Then, the regulated traffic arrives at the deadline scheduler on the
outgoing port. Since the traffic of each delay level meets the leaky
bucket arrival constraint function and the parameters of the shaping
curve do not exceed the limits of the parameters provided by the
schedulability conditions, the traffic can be successfully scheduled
.
Note that the flow sent from the deadine scheduler of the headend to
the next hop still follows the arrival constraint function of the
path after reshaping or latency compensation on the next hop. Then
on the next hop, when concurrent flows received from multiple paths
are aggregated to the same outgoing port for transmission, within any
d_1 duration, the aggregated d_1 traffic will not exceed the burst
resources of delay level d_1 reserved by these paths on the outgoing
port, and each aggregated d_i traffic will not exceed the bandwidth
resources of delay level d_i reserved by these paths on the outgoing
port; Similarly, within any d_2 duration, the aggregated d_2 traffic
will not exceed the burst resources of level d_2 reserved by these
paths on the outgoing port, and each aggregated d_i traffic will not
exceed the bandwidth resources of level d_i reserved by these paths
on the outoging port, and so on.
Figure 7 depicts an example of deadline based traffic regulated and
scheduled on the ingress PE node in the case of option-4. In the
figure, the shaping delay caused by the previous burst is termed as
S#, and forwarding delay termed as F.
Peng, et al. Expires 9 January 2024 [Page 29]
Internet-Draft Deadline Routing July 2023
1st burst
|
received v
+-+ +-+ +----+ +-+ +--+ +------+
|1| |2| | 3 | |4| |5 | | 6 | <= burst sequence
+-+ +-+ +----+ +-+ +--+ +------+
| | | | | |
~+0 ~+S1 ~+0 ~+S3~+S4 ~+0
~+F ~+F ~+F ~+F ~+F ~+F
| | | | | |
UNI v v v v v v
ingr-PE -+--------+--------+--------+--------+--------+--------+---->
NNI | Auth | Auth | Auth | Auth | Auth | Auth | time
| time | time | time | time | time | time |
1,2 in 3 in 4 in 5 in 6 in
Queue-A Queue-B Queue-C Queue-D Queue-E
(CT<=Q) (CT<=Q) (CT<=Q) (CT<=Q) (CT<=Q)
| | | | |
~+Q ~+Q ~+Q ~+Q ~+Q
| | | | |
sending v v v v v
+-+ +-+ +----+ +-+ +--+ +------+
|1| |2| | 3 | |4| |5 | | 6 |
+-+ +-+ +----+ +-+ +--+ +------+
Figure 7: Deadline Based Packets Orchestrating
There are 6 bursts received from the client. The burst-2, 4, 5 has
regulation delay S1, S3, S4 that caused by previous burst
respectively. While burst-1, 3, 6 has zero regulation delay because
the number of tokens is sufficient. The regulation makes 6 bursts
roughly distributed within the service burst interval.
Suppose that each burst passes through the same intra-node forwarding
delay F, and when they arrive at the deadline scheduler in turn. In
the case of latency compensation plus RPQ, they will have the same
allowable queueing delay (Q), regardless of whether they have
experienced shaping delay before. When the packets of burst-1, 2
arrive at the scheduler, according to CT <= Q < CT+AT, they will be
placed in Queue-A with matched CT and waiting to be sent. Similarly,
when the packets of burst-3/4/5/6 arrive at the scheduler, they will
be placed in Queue-B/C/D/E respectively and waiting to be sent.
Peng, et al. Expires 9 January 2024 [Page 30]
Internet-Draft Deadline Routing July 2023
13. Overprovision Analysis
For each delay level d_i, the delay resource of the specific link is
(b_i, r_i). A path j may allocate partial resources (b_i_j, r_i_j)
from the resource pool (b_i, r_i). In order to support more d_i
services in the network, it is necessary to set larger b_i and r_i.
However, as mentioned earlier, the values of b_i and r_i are set
according to schedulability conditions and cannot be modified at
will. Therefore, the meaningful analysis is the service scale that
the network can support under the premise of determined b_i and r_i.
For bandwidth resource reservation case, the upper limit of the total
bandwidth that can be reserved for all aggregated services of delay
level d_i is r_i, which is the same as the behavior of traditional
bandwidth resource reservation. There is no special requirement for
the measurement interval of calculating bandwidth value.
For the burst resource reservation case, the upper limit of the total
burst that can be reserved for all aggregated services of delay level
d_i is b_i. If the burst of each service of level d_i is b_k, then
the number service can be supported is b_i/b_k, which is the worst
case considering the concurrent arrival of these service flows.
However, the burst resource reservation is independent of bandwidth
resource, i.e., it does not take the calculation result of b_k/d_i to
get an overprovision bandwidth and then to affect the reservable
bandwidth resources.
14. Compatibility Considerations
Deadline is suitable for end-to-end and interconnection between
different networks. A large-scale network may span multiple
networks, and one of the goals of DetNet is to connect each network
domain to provide end-to-end deterministic delay service. The
adoption techniques and capabilities of each network are different,
and the corresponding topology models are either piecewise or nested.
For a particular path, if only some nodes in the path upgrade support
the deadline mechanism defined in this document, the end-to-end
deterministic delay/jitter target will only be partially achieved.
Those legacy devices may adopt the existing priority based scheduling
mechanism, and ignore the possible deadline information carried in
the packet, thus the intra node delay produced by them cannot be
perceived by the adjacent upgraded node. The more upgraded nodes
included in the path, the closer to the delay/jitter target.
Although, the legacy devices may not support the data plane mechanism
described in this document, but they can be freely programmed (such
as P4 language) to measure and insert the deadline information into
packets, in this case the delay/jitter target may be achieved.
Peng, et al. Expires 9 January 2024 [Page 31]
Internet-Draft Deadline Routing July 2023
Only a few key nodes are upgraded to support deadline mechanism,
which is low-cost, but can meet a service with relatively loose time
sensitive. Figure 8 shows an example of upgrading only several
network border nodes. In the figure, only R1, R2, R3 and R4 are
upgraded to support deadline mechanism. A deterministic path across
domain 1, 2, and 3 is established, which contains nodes R1, R2, R3,
and R4, as well as explicit nodes in each domain. Domain 1, 2 and 3
use the traditional strict priority based forwarding mechanism. The
encoding of the packet sent by R1 includes the planned residence time
and the accumulated residence time deviation. Especially, DS filed
in IP header ([RFC2474]) are also set to appropriate values. The
basic principle of setting is that the less the planned residence
time, the higher the priority. In order to avoid the interference of
non deterministic flow to deterministic flow, the priority of
deterministic flow should be set as high as possible.
The delay analysis based on strict priority in each domain can be
found in [SP-LATENCY], which gives the formula to evaluate the worst-
case delay of each hop during the resource reservation procedure.
The worst-case delay per hop depends on the number of hops and the
burst size of interference flows that may be faced on each hop.
[EF-FIFO] also shows that, for FIFO packet scheduling be used to
support the EF (expedited forwarding) per-hop behavior (PHB), if the
network utilization level alpha < l/(H-l), the worst-case delay bound
is inversely proportional to 1-alpha*(H-1), where H is the number of
hops in the longest path of the network.
Although the EDF scheduling, the SP scheduling and EF FIFO scheduling
are all work-conserving, the EDF scheduling can further distinguish
between emergency and non emergency according to deadline information
other than traffic class. Therefore, when analyzing the latency of
EDF scheduling, the latency is not evaluated just according to the
order in which the packets arrive at the scheduler, but also
according to the deadline of the packets. An intuitive phenomenon is
that if a packet unfortunately faces more interference delays at the
upstream nodes, it will become more urgent at the downstream node,
and will not always be unfortunate. This operation of dynamically
modifying the key fields, e.g, the existing acumulated residence time
deviation (E), of the packet can avoid always overestimating worst-
case latency on all hops. According to schedulability condition, the
worst-case latancy per hop is d_i.
When the border node (e.g, R2) receives the deterministic traffic, it
will obtain its rank according to the existing accumulated residence
time deviation information carried in the packet, and always sent as
soon as possible. For a specific deterministic flow, if it
experiences too much latency in the SP domain (due to unreasonable
setting of DS field and the inability to distinguish between
Peng, et al. Expires 9 January 2024 [Page 32]
Internet-Draft Deadline Routing July 2023
deterministic and non deterministic flows), even if the boundary node
accelerates the transmission, it may not be able to achieve the
target of low E2E latency. If the traffic experiences less latency
within the SP domain, on-time mode may work on the egress node to
achieve the end-to-end jitter target.
_____ ___ _____ ___ _____ ___
/ \/ \___ / \/ \___ / \/ \___
/ \ / \ / \
+--+ +--+ +--+ +--+
|R1| Strict Priority |R2| Strict Priority |R3| Strict Priority |R4|
+--+ domian 1 +--+ domian 2 +--+ domian 3 +--+
\____ __/ \____ __/ \____ __/
\_______/ \_______/ \_______/
Figure 8: Example of partial upgrade
15. Deployment Considerations
According to the above schedulability conditions, the planned
residence time (e.g, d_i) that can be provided in the network is
related to the entire deployed service flows. Each delay level d_i
has independent delay resources, and the smaller d_i, the more
valuable it is. The operator needs to match the corresponding d_i
for each service.
When option-3 is choosed, PIFO needs to be designed with a large
depth to store accumulated bursts. Similarly, when option-4 is
choosed, more deadline queues are needed to store accumulated bursts.
The accumulated bursts on a intermediate node consists of multiple
rounds of burst interval flows, for example, the flow generated by
the source within the first round of burst interval (always
experiencing the worst case delay along the path) is caught up by the
flow generated within the second round of burst interval (always
experiencing the best case delay along the path). For delay level
d_i, the worst case delay is d_i, the best case delay is l/R, where l
is the smallest packet size of the flow, R is the port rate. For
simplicity to get the estimate size of accumulated bursts, here we
just take the best case delay as 0. Drawing on the method provided
in [SP-LATENCY], the accumulated bursts of d_i is:
ACC_BUR_i = ((d_i * h) / burst_interval) * b_i
Peng, et al. Expires 9 January 2024 [Page 33]
Internet-Draft Deadline Routing July 2023
For example, d_i is 10 us, burst_interval is 250 us, this means that
within the 25th hop, there will only be one b_10 burst in the queue.
If it exceeds 25 hops and is within 50 hops, there may be two b_10
burst simutaneously in the queue.
The accumulated bursts of other delay levels can be similarly
estimated. Operators need to evaluate the required buffer size based
on network hops and the supported delay levels.
16. Evaluations
This section gives the evaluation results of the Deadline mechanism
based on the requirements that is defined in
[I-D.ietf-detnet-scaling-requirements].
Peng, et al. Expires 9 January 2024 [Page 34]
Internet-Draft Deadline Routing July 2023
+======================+============+===============================+
| requiremens | Evaluation | Notes |
+======================+============+===============================+
| 3.1 Tolerate Time | Partial | No time synchronization needed|
| Asynchrony | | , but need frequency sync. |
+----------------------+------------+-------------------------------+
| 3.2 Support Large | | The eligibility arrival of |
| Single-hop | Yes | flows is independent with the |
| Propagation | | link propagation delay. |
| Latency | | |
+----------------------+------------+-------------------------------+
| 3.3 Accommodate the | | The higher service rate, the |
| Higher Link | Partial | more buffer needed for each |
| Speed | | delay level. |
+----------------------+------------+-------------------------------+
| 3.4(1) Be Scalable | | Multiple delay levels, each |
| to a Large | Partial | with limited delay resources, |
| Number of Flows | | can support lots of flows. |
+----------------------+------------+-------------------------------+
| 3.4(2) Tolerate High | | The unused bandwidth of the |
| Utilization | Yes | high delay level can be used |
| | | by the low levels or BE flows.|
+----------------------+------------+-------------------------------+
| 3.5 Prevent Flow | | Flows are permitted based on |
| Fluctuation from | Yes | the resources reservation of |
| Disrupting | | delay levels, and isolated |
| Service | | from each other. |
+----------------------+------------+-------------------------------+
| 3.6 Tolerate Failures| | Independent of queueing |
| of Links or Nodes| N/A | mechanism. |
| and Topology | | |
| Changes | | |
+----------------------+------------+-------------------------------+
| 3.7 Be scalable to a | | More buffer may be needed when|
| Large Number of | Partial | the latency compensation |
| Hops with Complex| | option is used. |
| Topology | | |
+----------------------+------------+-------------------------------+
| 3.8 Support Multi- | | Independent of queueing |
| Mechanisms in | N/A | mechanism. |
| Single Domain and| | |
| Multi-Domains | | |
+----------------------+------------+-------------------------------+
Figure 9
Peng, et al. Expires 9 January 2024 [Page 35]
Internet-Draft Deadline Routing July 2023
17. IANA Considerations
There is no IANA requestion for this document.
18. Security Considerations
TBD
19. Acknowledgements
TBD
20. References
20.1. Normative References
[I-D.ietf-detnet-scaling-requirements]
Liu, P., Li, Y., Eckert, T. T., Xiong, Q., Ryoo, J.,
zhushiyin, and X. Geng, "Requirements for Scaling
Deterministic Networks", Work in Progress, Internet-Draft,
draft-ietf-detnet-scaling-requirements-02, 24 May 2023,
<https://datatracker.ietf.org/doc/html/draft-ietf-detnet-
scaling-requirements-02>.
[I-D.peng-6man-deadline-option]
Peng, S., Tan, B., and P. Liu, "Deadline Option", Work in
Progress, Internet-Draft, draft-peng-6man-deadline-option-
01, 11 July 2022, <https://datatracker.ietf.org/doc/html/
draft-peng-6man-deadline-option-01>.
[I-D.peng-lsr-deterministic-traffic-engineering]
Peng, S., "IGP Extensions for Deterministic Traffic
Engineering", Work in Progress, Internet-Draft, draft-
peng-lsr-deterministic-traffic-engineering-00, 22 May
2023, <https://datatracker.ietf.org/doc/html/draft-peng-
lsr-deterministic-traffic-engineering-00>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC2212] Shenker, S., Partridge, C., and R. Guerin, "Specification
of Guaranteed Quality of Service", RFC 2212,
DOI 10.17487/RFC2212, September 1997,
<https://www.rfc-editor.org/info/rfc2212>.
Peng, et al. Expires 9 January 2024 [Page 36]
Internet-Draft Deadline Routing July 2023
[RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black,
"Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", RFC 2474,
DOI 10.17487/RFC2474, December 1998,
<https://www.rfc-editor.org/info/rfc2474>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8655] Finn, N., Thubert, P., Varga, B., and J. Farkas,
"Deterministic Networking Architecture", RFC 8655,
DOI 10.17487/RFC8655, October 2019,
<https://www.rfc-editor.org/info/rfc8655>.
[RFC9016] Varga, B., Farkas, J., Cummings, R., Jiang, Y., and D.
Fedyk, "Flow and Service Information Model for
Deterministic Networking (DetNet)", RFC 9016,
DOI 10.17487/RFC9016, March 2021,
<https://www.rfc-editor.org/info/rfc9016>.
[RFC9320] Finn, N., Le Boudec, J.-Y., Mohammadpour, E., Zhang, J.,
and B. Varga, "Deterministic Networking (DetNet) Bounded
Latency", RFC 9320, DOI 10.17487/RFC9320, November 2022,
<https://www.rfc-editor.org/info/rfc9320>.
20.2. Informative References
[EDF-algorithm]
"A framework for achieving inter-application isolation in
multiprogrammed, hard real-time environments", 1996,
<https://ieeexplore.ieee.org/document/896011>.
[EF-FIFO] "Fundamental Trade-Offs in Aggregate Packet Scheduling",
2001, <https://ieeexplore.ieee.org/document/992892>.
[IR-Theory]
"A Theory of Traffic Regulators for Deterministic Networks
with Application to Interleaved Regulators", 2018,
<https://ieeexplore.ieee.org/document/8519761>.
[Net-Calculus]
"Network Calculus: A Theory of Deterministic Queuing
Systems for the Internet", 2001,
<https://leboudec.github.io/netcal/latex/netCalBook.pdf>.
[PIFO] "Programmable Packet Scheduling at Line Rate", 2016,
<https://dl.acm.org/doi/pdf/10.1145/2934872.2934899>.
Peng, et al. Expires 9 January 2024 [Page 37]
Internet-Draft Deadline Routing July 2023
[RPQ] "Exact Admission Control for Networks with a Bounded Delay
Service", 1996,
<https://ieeexplore.ieee.org/document/556345/>.
[SP-LATENCY]
"Guaranteed Latency with SP", 2020,
<https://ieeexplore.ieee.org/document/9249224>.
[UBS] "Urgency-Based Scheduler for Time-Sensitive Switched
Ethernet Networks", 2016,
<https://ieeexplore.ieee.org/abstract/document/7557870>.
Authors' Addresses
Shaofu Peng
ZTE Corporation
China
Email: peng.shaofu@zte.com.cn
Zongpeng Du
China Mobile
China
Email: duzongpeng@foxmail.com
Kashinath Basu
Oxford Brookes University
United Kingdom
Email: kbasu@brookes.ac.uk
Zuopin Cheng
New H3C Technologies
China
Email: czp@h3c.com
Dong Yang
Beijing Jiaotong University
China
Email: dyang@bjtu.edu.cn
Peng, et al. Expires 9 January 2024 [Page 38]