Network                                                     Shaofu. Peng
Internet-Draft                                                  Bin. Tan
Intended status: Standards Track                         ZTE Corporation
Expires: 25 April 2023                                         Peng. Liu
                                                            China Mobile
                                                         22 October 2022


                Deadline Based Deterministic Forwarding
             draft-peng-detnet-deadline-based-forwarding-03

Abstract

   This document describes a deterministic forwarding mechanism based on
   deadline.  The mechanism enhances strict priority scheduling
   algorithm with dynamically adjusting the priority of the queue
   according to its deadline attribute.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 25 April 2023.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.



Peng, et al.              Expires 25 April 2023                 [Page 1]


Internet-Draft              Deadline Routing                October 2022


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   2.  Deadline Queue  . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Get Deadline Information of Packets . . . . . . . . . . . . .   7
     3.1.  Get Planned Residence Time  . . . . . . . . . . . . . . .   7
     3.2.  Get Existing Accumulated Planned Residence Time . . . . .   8
     3.3.  Get Existing Accumulated Actual Residence Time  . . . . .   9
     3.4.  Get Existing Accumulated Residence Time Deviation . . . .   9
   4.  Put Packets into the Deadline Queues  . . . . . . . . . . . .  10
     4.1.  Alternate Queue Allocation Ruels  . . . . . . . . . . . .  13
   5.  Buffer Size Design  . . . . . . . . . . . . . . . . . . . . .  13
   6.  Traffic Regulation and Orchestrating  . . . . . . . . . . . .  13
     6.1.  Traffic Re-shaping on Intermediate Node . . . . . . . . .  15
   7.  Delay Compensation Considerations . . . . . . . . . . . . . .  17
   8.  Compatibility Considerations  . . . . . . . . . . . . . . . .  19
   9.  Deployment Considerations . . . . . . . . . . . . . . . . . .  21
   10. Benefits  . . . . . . . . . . . . . . . . . . . . . . . . . .  21
   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  22
   12. Security Considerations . . . . . . . . . . . . . . . . . . .  22
   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  22
   14. References  . . . . . . . . . . . . . . . . . . . . . . . . .  22
     14.1.  Normative References . . . . . . . . . . . . . . . . . .  22
     14.2.  Informative References . . . . . . . . . . . . . . . . .  23
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  23

1.  Introduction

   [RFC8655] describes the architecture of deterministic network and
   defines the QoS goals of deterministic forwarding: Minimum and
   maximum end-to-end latency from source to destination, timely
   delivery, and bounded jitter (packet delay variation); packet loss
   ratio under various assumptions as to the operational states of the
   nodes and links; an upper bound on out-of-order packet delivery.  In
   order to achieve these goals, deterministic networks use resource
   reservation, explicit routing, service protection and other means.
   Resource reservation refers to the occupation of resources by service
   traffic, exclusive or shared in a certain proportion, such as
   dedicated physical link, link bandwidth, queue resources, etc;
   Explicit routing means that the transmission path of traffic flow in
   the network needs to be selected in advance to ensure the stability
   of the route and does not change with the real-time change of network
   topology, and based on this, the upper bound of end-to-end delay and
   delay jitter can be accurately calculated; Service protection refers
   to sending multiple service flows along multiple disjoint paths at
   the same time to reduce the packet loss rate.  In general, a
   deterministic path is a strictly explicit path calculated by a



Peng, et al.              Expires 25 April 2023                 [Page 2]


Internet-Draft              Deadline Routing                October 2022


   centralized controller, and resources are reserved on the nodes along
   the path to meet the SLA requirements of deterministic services.

   [I-D.stein-srtsn] describes that the controller calculates the local
   deadline time of each node for the traffic to be transmitted in
   advance, which is a absolute system time, forms a stack of these
   local deadline time, and then carries them in the forwarded data
   packets.  Each node forwards the packets according to its own local
   deadline.  [I-D.stein-srtsn] suggests that FIFO queue can not be used
   to realize this function, because the packets stored in the queue are
   always first in first out and a special data structure is recommoned.
   The packets in this data structure will be automatically sorted with
   the order from emergency to non emergency according to the deadline
   of the packets.  However, it may be difficult to implement this
   structure in hardware, and especially for a large network it may be
   challenge to synchronize time.

   Considering that the link propagation delay is generally a fixed
   value, and we focus on the residence time of the packets inside the
   node, an alternate approach is to make the deadline eliminate the
   interference of link propagation delay and avoids relying on time
   synchronization between nodes.

   This document desrbies an alternate packets scheduling scheme based
   on EDF (earliest-deadline-firs) and used for wide area network.  It
   suggests to only use a single deadline time to control the packets
   scheduling of all nodes along the path.  The single deadline time is
   an offset time, which is based on the time when the packet enters the
   node and represents the maximum time allowed for the packet to stay
   inside the node.  However, if each node has obvious differences in
   the capability of packets forwarding and scheduling, more offset-time
   may be needed.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Deadline Queue

   For nodes in the network, some queues with count-down time (also
   termed as CT) are introduced and maintained for each outgoing port.
   These queues are called deadline queues and participate in priority
   based scheduling.  Deadline queues have the following
   characteristics:



Peng, et al.              Expires 25 April 2023                 [Page 3]


Internet-Draft              Deadline Routing                October 2022


   *  The CT of each deadline queue will decrease with the passage of
      time.  When it decreases to 0, the scheduling priority of the
      queue will be set to the highest, and the scheduling opportunity
      will be obtained immediately (note that there may be interference
      delay caused by a packet being sent by a low priority queue).  It
      will prohibit receiving new packets, in which the buffered packets
      will be sent to the outgoing port immediately, and the maximum
      duration allowed to send packets is the preset authorization time
      (also termed as AT), e.g, 10us, 20us, etc.  In principle, all
      packets buffered in the queue shall be sent within this
      authorization time.  If the queue is sent to empty and the
      authorization time is still free, other queues with lower priority
      can be scheduled during this authorization time.

   *  The scheduling engine can initiate a cycle timer to decrement the
      CT of all deadline queues, that is, whenever the timer expires,
      the CT values of all queues will be subtracted from the timer
      interval (also termed as TI).  Note that TI must be less than or
      equal to the AT, with AT = N * TI, where the natural number N >=
      1.

   *  For a deadline queue whose CT has been reduced to 0, after a new
      round of authorization time, the CT will return to the maximum
      initial value (also termed as MAX_CT), allow receiving new
      packets, and continue to enter the next round of operation that
      decreases with the passage of time.

   *  For a deadline queue whose CT is not reduced to 0, it can receive
      packets.  In detailed, when a node receives a packet to be
      forwarded from a specific outgoing port, it first obtains the
      expected deadline of the packet, and then put the packet to the
      deadline queue with the relevant CT value of the outgoing port for
      transmission.

   *  For a deadline queue whose CT is not reduced to 0, its scheduling
      priority cannot be set to the highest value.  The smaller the CT,
      the higher the priority.  A transmission mode can be further
      configured for a deadline queue to control its transmission
      behavior.  There are two modes:

      -  The first mode is to allow participation in priority based
         scheduling, also termed as in-time mode;

      -  The second mode, not allowed, also termed as on-time mode.
         That is, a queue with on-time mode is allowed to participate in
         priority based scheduling only when its CT becomes 0.





Peng, et al.              Expires 25 April 2023                 [Page 4]


Internet-Draft              Deadline Routing                October 2022


      In-time mode is work-conserving and applicable to low latency
      services, and on-time mode is not work-conserving and applicable
      to low latency variation services.  Only one mode can be
      configured for each deadline queue.  One implementation can
      support one set of queues in a single mode, or two sets of queues
      support two modes.

   *  At the beginning, all deadline queues have different CT values,
      i.e., staggered from each other, so that the CT of only one
      deadline queue will decrease to 0 at any time.  It should be noted
      that CT is just the countdown of the head of the queue, and other
      packets in the queue (especially those at the end of the queue)
      have to wait longer to be scheduled.  It can be seen that the
      scheduling countdown of any specific packet in the queue is in the
      range [CT, CT+AT).

   The above AT, TI and MAX_CT value shall be choosed according to the
   actual capacity of the node.  In fact, each node in the network can
   independently use different AT for different outgoing ports.  The
   general principle is that if an outgoing port has a large bandwidth
   (such as 100G bps), the AT can be small (such as 1us), because the
   link with large bandwidth can send the required bits amount even
   within a small duration; If an outgoing port has a small bandwidth
   (e.g. 1G bps), the AT should be larger (e.g. 10us), because the link
   with small bandwidth needs to send the required bits amount within a
   larger duration.  In the FE (Fast Ethernet, 100M bps) scenario, that
   however may not be the typical scenario this document focuses on, the
   transmission time of a single packet may take several microseconds,
   then attention must be paid to ensure that the AT is larger than the
   transmission time of a single packet, especially the AT should
   include the interferrence delay caused by a single low priority
   packet with maximum size.

   The choose of TI should consider the latency granularity of various
   service flows, so that CT updated per TI can match the delay
   requirements of different services.  For example, if the delay
   difference of different traffic flows is several microseconds, TI can
   be choosed as 1 us.  If the delay difference of different traffic
   flows is several 10 microseconds, TI can be choosed as 10 us.

   A specific example of the deadline queue is depicted in Figure 1.










Peng, et al.              Expires 25 April 2023                 [Page 5]


Internet-Draft              Deadline Routing                October 2022


  +------------------------------+   +------------------------------+
  | Deadline Queue Group:        |   | Deadline Queue Group:        |
  |    queue-1(CT=60us)          |   |    queue-1(CT=59us)          |
  |      in-time sub-queue ######|   |      in-time sub-queue ######|
  |      on-time sub-queue ######|   |      on-time sub-queue ######|
  |    queue-2(CT=50us)          |   |    queue-2(CT=49us)          |
  |      in-time sub-queue ######|   |      in-time sub-queue ######|
  |      on-time sub-queue ######|   |      on-time sub-queue ######|
  |    queue-3(CT=40us)          |   |    queue-3(CT=39us)          |
  |      in-time sub-queue ######|   |      in-time sub-queue ######|
  |      on-time sub-queue ######|   |      on-time sub-queue ######|
  |    queue-4(CT=30us)          |   |    queue-4(CT=29us)          |
  |      in-time sub-queue ######|   |      in-time sub-queue ######|
  |      on-time sub-queue ######|   |      on-time sub-queue ######|
  |    queue-5(CT=20us)          |   |    queue-5(CT=19us)          |
  |      in-time sub-queue ######|   |      in-time sub-queue ######|
  |      on-time sub-queue ######|   |      on-time sub-queue ######|
  |    queue-6(CT=10us)          |   |    queue-6(CT=9us)           |
  |      in-time sub-queue ######|   |      in-time sub-queue ######|
  |      on-time sub-queue ######|   |      on-time sub-queue ######|
  |    queue-7(CT=0us)           |   |    queue-7(CT=0us)           |
  |      in-time sub-queue ######|   |      in-time sub-queue ######|
  |      on-time sub-queue ######|   |      on-time sub-queue ######|
  +------------------------------+   +------------------------------+

  +------------------------------+   +------------------------------+
  | Non-deadline Queue Group:    |   | Non-deadline Queue Group:    |
  |    queue-8  ############     |   |    queue-8  ############     |
  |    queue-9  ############     |   |    queue-9  ############     |
  |    queue-10 ############     |   |    queue-10 ############     |
  |    ... ...                   |   |    ... ...                   |
  +------------------------------+   +------------------------------+

  -o----------------------------------o-------------------------------->
   T0                                 T0+1us                       time

          Figure 1: Example of Deadline Queue for outgoing Port

   In this example, the AT for deadline queue group is configured to
   10us.  Queue-1 ~ queue-7 are deadline queues, and other queues are
   traditional non-deadline queues.  Each deadline queue has its CT
   attribute.  The MAX_CT is 60us.  At the initial time (T0), the CT of
   all deadline queues are staggered from each other.  For example, the
   CT of queue-1 is 60us, the CT of queue-2 is 50uS, the CT of queue-3
   is 40us, and so on.  At this time, only the CT of queue-7 is 0, which
   has the highest scheduling priority.





Peng, et al.              Expires 25 April 2023                 [Page 6]


Internet-Draft              Deadline Routing                October 2022


   Suppose the scheduling engine initiates a cycle timer with a time
   interval of 1us, i.e., AT = 10 * TI in this case.  After each timer
   timeout, the timer interval will be subtracted from the CT of all
   deadline queues.  As shown in the figure, at T0 + 1us, the timer
   timeout, the CT of queue-1 becomes 59us, the CT of queue-2 becomes
   49us, the CT of queue-3 becomes 39us, etc.  At this time, the CT of
   queue-7, depending on the implementation, can be maintained as 0 or
   equal to -1.

   At T0 + 10us, the CT of queue-1 becomes 50us, the CT of queue-2
   becomes 40us, the CT of queue-3 becomes 30us, etc.  At this time, the
   CT of queue-7, returns to MAX_CT and is no longer set to the highest
   scheduling priority; The CT of queue-6 becomes 0, which has the
   highest scheduling priority.

   When the CT of a deadline queue becomes 0, it has a time limit of
   10us (i.e., authorization time) to send packets in the queue.  During
   this period, it will be prohibited to receive new packets (in fact,
   there can be no new packets with a deadline of 0).  After the 10us
   time elapses, the CT of another deadline queue will change to 0.

   If the deadline queue with the highest priority is still free after
   sending packets within the authorized time, the scheduling engine
   will visit other queues with the second highest priority during the
   rest of the authorized time.

   Note that if both in-time and on-time policies are supported, the
   queue-1...7 can be taken regard as logical queues and each logical
   queue may include two sub-queues, one used to buffer in-time packets
   and the other used to buffer on-time packets.

3.  Get Deadline Information of Packets

3.1.  Get Planned Residence Time

   The planned residence time of the packet is an offset time, which is
   based on the time when the packet arrivals the node and represents
   the maximum time allowed for the packet to stay inside the node.
   There are many ways to obtain the planned residence time of the
   packet.

   *  Carried in the packet.  The ingress PE node, when encapsulating
      the deterministic service flow, can explicitly insert the planned
      residence time into the packet according to SLA.  The intermediate
      node, after receiving the packet, can directly obtain the planned
      residence time from the packet.  Generally, only a single planned
      residence time needs to be carried in the packet, which is
      applicable to all nodes along the path; Or insert a stack composed



Peng, et al.              Expires 25 April 2023                 [Page 7]


Internet-Draft              Deadline Routing                October 2022


      of multiple deadlines, one for each node.
      [I-D.peng-6man-deadline-option] defined a method to carry the
      planned residence time in the IPv6 packets .

   *  Included in the FIB entry.  Each node in the network can maintain
      the deterministic FIB entry.  After the packet hits the
      deterministic FIB entry, the planned residence time is obtained
      from the forwarding information contained in the FIB entry.

   *  Included in the policy entry.  Configure local policies on each
      node in the network, and then set the corresponding planned
      residence time according to the matched specific characteristics
      of the packet, such as 5-tuple.  An implementation should support
      the policy to forcibly override the planned residence time
      obtained by other methods.

   For a deterministic delay path based on deadline scheduling, the path
   it passes through has deterministic end-to-end delay requirements.
   It includes two parts, one is the accumulated node delay and the
   other is the accumulated link propagation delay.  The end-to-end
   delay is subtracted from the accumulated link propagation delay to
   obtain the accumulated node delay.  A simple method is that the
   accumulated node delay is shared equally by each intermediate node
   along the path to obtain the planning deadline of each node.

3.2.  Get Existing Accumulated Planned Residence Time

   The existing accumulated planned residence time of the packet refers
   to the sum of the planned residence time of all upstream nodes before
   the packet is transmitted to the current node.  This information
   needs to be carried in the packet.  Every time the packet passes
   through a node, the node accumulates its corresponding planned
   residence time to the existing accumulated planned residence time
   field in the packet.  [I-D.peng-6man-deadline-option] defined a
   method to carry existing accumulated planned residence time in the
   IPv6 packets.

   The setting of "existing accumulated planned residence time" in the
   packet needs to be friendly to the chip for reading and writing.  For
   example, it should be designed as a fixed position in the packet.
   The chip may support flexible configuration for that position.










Peng, et al.              Expires 25 April 2023                 [Page 8]


Internet-Draft              Deadline Routing                October 2022


3.3.  Get Existing Accumulated Actual Residence Time

   The existing accumulated actual residence time of the packet, refers
   to the sum of the actual residence time of all upstream nodes before
   the packet is transmitted to the current node.  This information
   needs to be carried in the packet.  Every time the packet passes
   through a node, the node accumulates its corresponding actual
   residence time to the existing accumulated actual residence time
   field in the packet.  [I-D.peng-6man-deadline-option] defined a
   method to carry existing accumulated actual residence time in the
   IPv6 packets.

   The setting of "existing accumulated actual residence time" in the
   packet needs to be friendly to the chip for reading and writing.  For
   example, it should be designed as a fixed position in the packet.
   The chip may support flexible configuration for that position.

   Although other methods can also be, for example, carrying the
   absolute system time of receiving and sending in the packet to
   compute the actual residence time indirectly, that has a low
   encapsulation efficiency and require strict time synchronization
   between nodes.

   A possible method to get the actual residence time in the node is
   that, the receiving and sending time of the packet can be recorded in
   the auxiliary data structure (note that is not packet itself) of the
   packet, and the actual residence time of the packet in the node can
   be calculated according to these two times.

3.4.  Get Existing Accumulated Residence Time Deviation

   The existing accumulated residence time deviation equals existing
   accumulated planned residence time minus existing accumulated actual
   residence time.  This value can be positive or negative.

   If the existing accumulated planned residence time and the existing
   accumulated actual residence time are carried in the packet, it is
   not necessary to carry the existing accumulated residence time
   deviation.  Otherwise, it is necessary.  The advantage of the former
   is that it can be applied to more scenarios, while the later has less
   packaging overhead.










Peng, et al.              Expires 25 April 2023                 [Page 9]


Internet-Draft              Deadline Routing                October 2022


4.  Put Packets into the Deadline Queues

   [I-D.ietf-detnet-bounded-latency] presents a latency model for DetNet
   nodes.  There are six type of delays that a packet can experience
   from hop to hop.  The processing delay (type-4), the regulator delay
   (type-5), the queuing subsystem delay (type-6), and the output delay
   (type-1) together contribute to the residence time in the node.

   In this document, the residence delay in the node is simplified into
   two parts: the first part is to lookup the forwarding table when the
   packet is received from the incoming port (or generated by the
   control plane) and deliver the packet to the line card where the
   outgoing port is located; the second part is to store the packet in
   the queue of the outgoing port for transmission.  These two parts
   contribute to the actual residence time of the packet in the node.
   The former can be called forwarding delay and the latter can be
   called queuing delay.  The forwarding delay is related to the chip
   implementation and is generally constant; The queuing delay is
   unstable.

   When a node receives a packet from an upstream node, it can first get
   the existing accumulated residence time deviation, and then add it to
   the planned residence time of the packet at this node to obtain the
   deadline adjustment value, and then on the basis of the deadline
   adjustment value, deducting the forwarding delay of the packet in the
   node, the allowable queuing delay value (termd as Q) is obtained, and
   then the packet will be put to the deadline queue with corresponding
   CT, meeting the condition: CT <= Q < CT+AT.  As mentioned above, if
   the queue already contains many packets, the newly inserted packet
   may need to wait more time to be scheduled.  Therefore, an alternate
   implemantation may support to select deadline queue for the packet
   based on Q-k*AT, where 0 <= k <= 1, i.e., meeting the condition: CT
   <= Q-k*AT < CT+AT.  In fact, according to the following introduction,
   the deadline scheduling mechanism supports latency credit
   compensation, so the end-to-end latency caused by these two
   conditions is the same.

   Assume that the local node in a deterministic path is i, all upstream
   nodes are from 1 to i-1, and downstream nodes are i + 1, the planned
   residence time is D, the actual residence time is R, the deadline
   adjustment value is M, the forwarding delay inside the node is F, the
   existing accumulated residence time deviation is E, and the allowable
   queuing delay is Q, then the allowable queuing delay (Q) of the
   packet on this node i is calculated as follows:

      E(i-1) = D(1) + D(2) + ... + D(i-1) - R(1) - R(2) - ... - R(i-1)

      M(i) = D(i) + E(i-1)



Peng, et al.              Expires 25 April 2023                [Page 10]


Internet-Draft              Deadline Routing                October 2022


      Q(i) = M(i) - F(i)

   Under normal circumstances, if each hop strictly controls the
   scheduling of the packet according to its planned residence time, the
   actual residence time of the packet will be very close to the planned
   residence time, thus the absolute value of the existing accumulated
   residence time deviation will be very small.

   Consider some extreme cases.  For example, many upstream nodes adopt
   the in-time mode to send packets quickly.  Packets almostly need not
   queue in these nodes, but only depend on the forwarding delay.  Then
   the existing accumulated residence time deviation (E) may be a very
   large positive value, resulting in a large allowable queuing delay
   (Q).  If this value exceeds the MAX_CT of the deadline queue
   maintained by the node, the allowable queuing delay (Q) should be
   modified to the MAX_CT, or such a packet can be inserted into an
   escape deadline queue with fixed larger CT than MAX_CT, or it can
   also be inserted into a higher-level queue in a hierarchical deadline
   queue.

   For another example, if some upstream nodes are abnormal and have a
   very large actual residence time (R), the existing accumulated
   residence time deviation (E) may be a negative number, resulting in
   the allowable queuing delay (Q) may be less than or equal to 0, the
   allowable queuing delay (Q) should be modified to a minimum CT other
   than 0.

   If the queue choosed by the condition is full, the packet must be
   inserted into the next queue with higher CT value.

   Figure 2 depicts an example of packets inserted to the deadline
   queues.



















Peng, et al.              Expires 25 April 2023                [Page 11]


Internet-Draft              Deadline Routing                October 2022


      P2             P1              +------------------------------+
  +--------+      +--------+         | Deadline Queue Group:        |
  | D=20us |      | D=30us |         |    queue-1(CT=55us) ######   |
  | E=15us |      | E=-8us |   +--+  |    queue-2(CT=45us) ######   |
  +--------+      +--------+   |\/|  |    queue-3(CT=35us) ######   |
  ------incoming port-1------> |/\|  |    queue-4(CT=25us) ######   |
                               |\/|  |    queue-5(CT=15us) ######   |
      P4             P3        |/\|  |    queue-6(CT=5us)  ######   |
  +--------+      +--------+   |\/|  |    queue-7(CT=0us)  ######   |
  |        |      | D=30us |   |/\|  +------------------------------+
  +--------+      | E=-30us|   |\/|
                  +--------+   |/\|
  ------incoming port-2------> |\/|  +------------------------------+
                               |/\|  | Non-deadline Queue Group:    |
      P6              P5       |\/|  |    queue-8  ############     |
  +--------+      +--------+   |/\|  |    queue-9  ############     |
  |        |      | D=40us |   |\/|  |    queue-10 ############     |
  +--------+      | E=40us |   |/\|  |    ... ...                   |
                  +--------+   +--+  +------------------------------+
  ------incoming port-3------>        ---------outgoing port---------->

  -o----------------------------------o-------------------------------->
   receiving-time base                +F                            time

       Figure 2: Time Sensitive Packets Buffered to Deadline Queue

   As shown in Figure 2, the node successively receives six packets from
   three incoming ports, among which packet 1, 2, 3 and 5 have
   corresponding deadline information, while packet 4 and 6 are
   traditional packets.  These packets need to be forwarded to the same
   outgoing port according to the forwarding table entries.  It is
   assumed that they arrive at the line card where the outgoing port is
   located at almost the same time after the forwarding delay in the
   node (F = 5us).  At this time, the queue status of the outgoing port
   is shown in the figure.  Then:

   *  The allowable queuing delay (Q) of packet 1 in the node is 30 - 8
      - 5 = 17us, and it will be put into the deadline queue-5 (its CT
      is 15us), meeting the condition that Q is in the range [15, 25).

   *  The allowable queuing delay (Q) of packet 2 in the node is 20 + 15
      - 5 = 30us, and it will be put into the deadline queue-4 (its CT
      is 25us), meeting the condition that Q is in the range [25, 35).

   *  The allowable queuing delay (Q) of packet 3 in the node is 30 - 30
      - 5 = -5us, and it will be modified to the minimum positive value
      5 us then put into the deadline queue-6 (its CT is 5us), meeting
      the condition that Q is in the range [5, 15).



Peng, et al.              Expires 25 April 2023                [Page 12]


Internet-Draft              Deadline Routing                October 2022


   *  The allowable queuing delay (Q) of packet 5 in the node is 40 + 40
      - 5 = 75us, and it will be modified to the MAX_CT (60 us) then put
      into the deadline queue-1 (its CT is 55us), meeting the condition
      that Q is in the range [55, 65).

   *  Packets 4 and 6 will be put into the non-deadline queue in the
      traditional way.

4.1.  Alternate Queue Allocation Ruels

   Each deadline queue can be further divided into multiple sub-queues,
   and each planned residence time corresponds to one.  That is, packets
   with different planned residence time are inserted into different
   sub-queues and protected.  In this way, for two packets with the same
   allowable queuing delay (Q) but different planned residence time, we
   can decide to schedule the packet with the smaller planned residence
   time.  This is beneficial because when packets with smaller planned
   residence time facing larger interference delay, it is difficult to
   have space for delay compensation on downstream nodes, while packets
   with larger planned residence time have larger space for delay
   compensation.

5.  Buffer Size Design

   We assume that the service rate of the deadline scheduler can be the
   link rate (termed as C), so the buffer size of each deadline queue is
   AT * C - M, where M is the maximum size of the packet with low
   priority.

6.  Traffic Regulation and Orchestrating

   On the ingress PE node, traffic regulation should be performed on the
   incoming port, so that the service traffic does not exceed its
   constraint funciton, such as leacky bucket A_i(t) = b_i + r_i*t for
   service i.  Suppose there are multiple service flows released to the
   network through the ingress node, then it must meet sum{r_i} <= C,
   for all service i and any time t, where C is service rate of the
   deadline scheduler.  In addition, stability condition sum{A_i(t)}/t <
   C must also be met.

   Then, on the ingress PE node, traffic is orchestrated into different
   deadline queues of the outgoing port.  Multiple continuous packets of
   the specific service flow are stored in the deadline queue with
   corresponding remaining time according to the planned residence time
   of the service flow.  Note that these packets are not stored in the
   same queue over time.  The amount of bits that can be stored in one
   queue for service i is equal to R_i * AT, however, at least one whole
   packet shall be loaded.  For example, if the allowable queuing delay



Peng, et al.              Expires 25 April 2023                [Page 13]


Internet-Draft              Deadline Routing                October 2022


   (Q) is 20us, then within the current authorization time, the first
   sequence of the packets will be put into the current deadline queue
   with [CT, CT+AT) range that can cover 20us, until the reserved
   bandwidth limit is reached; Then, within the next authorization time,
   the next sequence of packets will be put into the queue with [CT,
   CT+AT) range that can cover 20us, until the reserved bandwidth limit
   is reached; and so on, until the total service bits are loaded.

   Figure 3 depicts an example of deadline based traffic orchestrated on
   the ingress PE node.  It is assumed that the packets loaded in each
   authorization time do not exceed the reserved bandwidth of the
   service.


          1st packet
              |
              v
             +-+ +-+      +----+ +-+ +--+   +------+
             |1| |2|      | 3  | |4| |5 |   |  6   | <= packet sequence
             +-+ +-+      +----+ +-+ +--+   +------+
             |   |        |      |   |      |
             ~+F ~+F      ~+F    ~+F ~+F    ~+F
             |   |        |      |   |      |
    UNI      v   v        v      v   v      v
   ingr-PE -+--------+--------+--------+--------+--------+--------+---->
    NNI     |  Auth  |  Auth  |  Auth  |  Auth  |  Auth  |  Auth  | time
            |  time  |  time  |  time  |  time  |  time  |  time  |
            v        v        v        v
          1,2 in    3 in    4,5 in    6 in
         Buffer-A Buffer-B Buffer-C Buffer-D
          (CT<=Q)  (CT<=Q)  (CT<=Q)  (CT<=Q)
            |        |        |        |
            ~+Q      ~+Q      ~+Q      ~+Q
            |        |        |        |
    sending v        v        v        v
            +-+ +-+  +----+   +-+ +--+ +------+
            |1| |2|  | 3  |   |4| |5 | |  6   |
            +-+ +-+  +----+   +-+ +--+ +------+


               Figure 3: Deadline Based Packets Orchestrating










Peng, et al.              Expires 25 April 2023                [Page 14]


Internet-Draft              Deadline Routing                October 2022


6.1.  Traffic Re-shaping on Intermediate Node

   Multiple ingress PEs may release multiple flows with the same
   deadline.  These flows may arrive at an intermediate node at the same
   time and put into the same deadline queue.  Especially, packets with
   different deadlines sent by a single ingress PE node at different
   instant of time (e.g, for on-time mode the packet sent early has a
   larger planned residence time than the packet sent late, or for in-
   time mode the packet sent early face more interference delay than the
   packet sent late.), may be put into the same deadline queue by an
   intermediate node.  This means that a larger bandwidth is required on
   the intermediate node than the ingress PE node to send more bits
   within the same time duration, which also means more buffer size.

   Deadline in-time mode is a variant of EDF scheduler with work-
   conserving characteristic and has several aggregated queues.  It
   needs to rely on the known traffic arrival curve to obtain sufficient
   and necessary schedulability conditions.  However, in traffic
   aggregation case, the arrival traffic faced by nodes may be irregular
   due to work-conserving scheduling.  One solution is to re-shape the
   traffic on each intermediate node, such as per flow re-shaping or
   interleaved regulator introduced by [UBS].  It is not recommended to
   maintain per flow shapers in large-scale network.  After placing re-
   shapers in front of the deadline scheduler, the combination of the
   re-shaper and the deadline scheduler is no longer work-conserving.
   As described in [IR-Theory], per flow shaper or interleaved regulator
   does not increate worst-case latency bounds.  Whether this conclusion
   is still valid for deadline needs further study.  Suppose that after
   re-shaping, for an in-time service i, the arrival curve is A_i(t),
   and the planned residence time is d_i which meeting d_i < d_(i+1) of
   service i+1, then the schedulability condition for deadline in-time
   mode is:.

      A_1(t-d_1) + sum{A_i(t+AT-d_i) for all i>=2} <= C*t

   where AT is authorization time of each deadline queue, C is service
   rate of the deadline scheduler.

   The proof is similar with that in [RPQ], except that the step length
   is fine-grained when the queue rotates.  Figure 4 below gives a rough
   explanation.










Peng, et al.              Expires 25 April 2023                [Page 15]


Internet-Draft              Deadline Routing                October 2022


        ^
        | Planned Residence Time
        |                          |    |
        |CT_t+AT+2*TI -> =====     |    |
        |      CT_t+AT+TI ->  =====|    |
   CT_t + - - - - - - - - - - - - >+====+ -\
   +AT  |                          |    |  |
        |                          |    |  |
        |                          |    |   > Phycical queue-x
    d_p + - - - - - - - - - - - - >|    |  |
        |                          |    |  |
   CT_t + - - - - - - - - - - - - >+====+ -/
        |                          |    |===== <- CT_t-TI
        |                          |    |    ===== <- CT_t-2*TI
        |                          |    |
        |
        | TI |  ... ...  | TI | TI | TI | TI | TI | TI |
     ---+----+-----------+----+----+----+----+----+----+--------->
        0      ^                   ^   ^            ^         time
               |                   |   |            |
              t-tau'            t-ofst t           t+tau
        (busy period begin)          (arrival)   (departure)

                   Figure 4: Deadline In-time Scheduling

   Suppose that the observed packet, with planned residence time d_p,
   arrives at the scheduler at time t and leaves the scheduler at time
   t+tau.  It will be inserted to physical queue-x with count-down time
   CT_t at the current timer interval TI with starting time t-ofst and
   end time t-ofst+TI.  According to the above packet queuing rules, we
   have CT_t <= d_p < CT_t+AT.  Also suppose that t-tau' is the
   beginning of the busy period closest to t.  Then, we can get the
   amount of packets within time interval [t-tau', t+tau] that must be
   scheduled before the observed packet.  In detailed:

      for all service i with planned residence time d_i meeting CT_t <=
      d_i < CT_t+AT, the workload is sum{A_i[t-tau', t]}.

      for all service i with planned residence time d_i meeting d_i >=
      CT_t+AT, the workload is sum{A_i[t-tau', t-ofst+TI-(d_i-CT_t-
      AT+TI)]}.

      for all service i with planned residence time d_i meeting d_i <
      CT_t, the workload is sum{A_i[t-tau', t+(CT_t-d_i)]}.

      then deduct the traffic that has been sent during the busy period,
      i.e., C*(tau+tau').




Peng, et al.              Expires 25 April 2023                [Page 16]


Internet-Draft              Deadline Routing                October 2022


   Let tau as d_p, and remember that CT_t <= d_p, the above workload is
   less than

      sum{A_i(tau'+CT_t+AT-d_i) for all d_i >= CT_t} +
      sum{A_i(tau'+CT_t-d_i) for all d_i < CT_t} - C*(tau'+d_p)

   It is further less than

      sum{A_i(tau'+d_p+AT-d_i) for all d_i >= d_2} + A_1(tau'+d_p-d_1) -
      C*(tau'+d_p)

   Then, denote x as tau'+d_p, we have

      sum{A_i(x+AT-d_i) for all d_i >= d_2} + A_1(x-d_1) - C*(x)

   Let the above workload less than zero, then we get the condition.

   It should be noted that for a service i with planned residence time
   d_i, its planned residence time is actually contributed by its own
   flow and all the flows with lower planned residence time.  Thus,
   based on the above schedulability conditions, and knowing the traffic
   constraint functions of all services, we can then give the guidance
   to allocate appropriate (i.e., not arbitrary) planned residence time
   for each service.  In brief, we can choose the traffic arrival
   constraint function according to the preset planned residence time,
   or we can choose the planned residence time according to the preset
   traffic arrival constraint function.

   Compared with the in-time mode, on-time mode is non-work-conserving,
   which can be considered as the combination of EDF and damper.  The
   above schedulability conditions can also be applied to the on-time
   mode, that is, if the aggregate traffic for the specific planned
   residence time reaches the maximum limit allowed by the conditions,
   it will achieve passive on-time behavior.  If the aggregate traffic
   is lower than the limit, the deadline on-time scheduler will also
   achieve active on-time behavior by applying the built-in damper.
   Since the built-in dumper has shaped the on-time service flow, there
   is no need to re-shape the on-time service flow by additional shapers
   in front of deadline scheduler.

7.  Delay Compensation Considerations

   As mentioned above, the calculation of allowable queuing delay (Q)
   reflects the meaning of delay compensation.  It can be used for in-
   time services to distinguish between emergency flows and non
   emergency flows (compared with traditional strict priority
   scheduling), and also for on-time services to strictly control delay
   jitter.  In both cases, packets have a tolerable maximum queuing



Peng, et al.              Expires 25 April 2023                [Page 17]


Internet-Draft              Deadline Routing                October 2022


   delay in the queuing subsystem.  The traditional EDF scheduling can
   also benefit from this delay compensation.

   Suppose that two packets, P1, P2, are generated instantaneously from
   a specific in-time service flow at the same source, and the two
   packets have the same planned residence time.  Packet-1 may face less
   interference delay than P2 in their process of forwarding.  When they
   arrive at an intermediate node successively, P2 will have less
   allowable queuing delay (Q) than P1 to try to stay close to P1 again.
   It should be noted that to compary who is ealier is based on queue's
   CT and packet's Q, according to the above queuing rule (CT <= Q <
   CT+AT), and the CT of the queue is not changed in real-time, but
   gradually with the decreasing step TI.  It is possible to get an
   unexpected comparision result.

   As shown in Figure 5, P1 and P2 are two packets belonging to the same
   burst.  The arrival time when they are received on the current
   scheduler is shown in the figure.  Suppose that the AT of the
   deadline queue is 10us, the decreasing step TI is 1us, and the
   transmission time of each packet is 0.01us.  Also suppose that the Q
   values of two adjacent packets P1 and P2 are 40us and 39us
   respectively, and they are both received in the window from T0 to
   T0+1us.  P1 will enter queue-B with CT range [40, 50), while P2 will
   enter queue-A with CT range [30, 40).  This means that P2 will be
   scheduled before P1, resulting in disorder.


                                                 packets arrived later
   packets arrived earlier                               |
                |                                        |
                V                                        V
   --------+-----------------------------------------------+---------
   ... ... | ...............P1...P2....................... | ... ...
   --------+-----------------------------------------------+---------
      allowable queuing delay (Q):
                            P1.Q=40us
                                 P2.Q=39us
             |                     |                     |
     --------o---------------------o---------------------o----------->
             T0                    T0+1us                T0+2us    time
             queue-A.CT[30,40)     queue-A.CT[29,39)
             queue-B.CT[40,50)     queue-B.CT[39,49)
             queue-C.CT[50,60)     queue-C.CT[49,59)


           Figure 5: Packets queuing based on Delay Compensation





Peng, et al.              Expires 25 April 2023                [Page 18]


Internet-Draft              Deadline Routing                October 2022


   DetNet architecture [RFC8655] provides Packet Ordering Function, that
   can be used to solve the above disorder problem caused by the delay
   compensation.  In addition, we can also maintain states for service
   flows to record the last queuing information to address this issue.

8.  Compatibility Considerations

   For a particular path, if only some nodes in the path upgrade support
   the deadline mechanism defined in this document, the end-to-end
   deterministic delay/jitter target will only be partially achieved.
   Those legacy devices may adopt the existing priority based scheduling
   mechanism, and ignore the possible deadline information in the
   packet, thus the delay intra node produced by them cannot be
   perceived by the adjacent upgraded node.  The more upgraded nodes
   included in the path, the closer to the delay/jitter target.
   Although, the legacy devices may not support the dataplane mechanism
   described in this document, but they can be freely programmed (such
   as P4 language) to measure and insert the deadline information into
   packets, in this case the achievement of delay/jitter target will be
   more perfect.

   Only a few key nodes are upgraded to support deadline mechanism,
   which is low-cost, but can meet a service with relatively loose time
   requirements.  Figure 6 shows an example of upgrading only several
   network border nodes.  In the figure, only R1, R2, R3 and R4 are
   upgraded to support deadline mechanism.  A deterministic path across
   domain 1, 2, and 3 is established, which contains nodes R1, R2, R3,
   and R4, as well as explicit nodes in each domain.  Domain 1, 2 and 3
   use the traditional strict priority based forwarding mechanism.  The
   encoding of the packet sent by R1 includes the planned residence time
   and the accumulated residence time deviation.  Especially, IP DSCP or
   Traffic Class are also set to appropriate values.  The basic
   principle of setting is that the less the planned residence time, the
   higher the priority.  However, in order to avoid the interference of
   non deterministic flow to deterministic flow, the priority of
   deterministic flow should be set as high as possible.

   The delay analysis based on strict priority in each domain can be
   found in [SP-LATENCY], which gives the formula to evaluate the worst-
   case delay of each hop during the resource reservation procedure.
   The worst-case delay depends on the number of hops and the burst size
   of interference flows that may be faced on each hop.  [EF-FIFO] also
   shows that, for FIFO packet scheduling be used to support the EF
   (expedited forwarding) per-hop behavior (PHB), if the network
   utilization level alpha < l/(H-l), the worst-case delay bound is
   inversely proportional to 1-alpha*(H-1), where H is the number of
   hops in the longest path of the network.  Although the deadline in-
   time scheduling, the SP scheduling and EF FIFO scheduling are all



Peng, et al.              Expires 25 April 2023                [Page 19]


Internet-Draft              Deadline Routing                October 2022


   work-conserving, the deadline in-time scheduling can further
   distinguish between emergency and non emergency according to deadline
   information other than traffic class.  Therefore, when analyzing the
   latency of the in-time mode, the latency is not evaluated just
   according to the order in which the packets arrive at the scheduler,
   but according to the deadline of the packets.  An intuitive
   phenomenon is that if a packet unfortunately faces more interference
   delays at the upstream nodes, it will become more urgent at the
   downstream node, and will not always be unfortunate.  This operation
   of dynamically modifying the key fields of the packet can avoid
   always overestimating worst-case latency on all hops.  According to
   schedulability condition, in-time mode can be pessimistic that its
   worst-case latancy bounds is the same as on-time mode.

   When the boundary node (e.g, R2) receives the deterministic traffic,
   it will decide whether to speed up or hold according to the
   accumulated residence time deviation information carried in the
   packet.  The in-time traffic is always sent as soon as possible,
   especially when the residence time deviation of the packet is
   negative.  The on-time traffic always controls the sending time so
   that the average planned residence time is followed, especially when
   the residence time deviation of the packet is positive.  For a
   specific deterministic flow, if it experiences too much latency in
   the SP domain (due to unreasonable setting of traffic class and the
   inability to distinguish between deterministic and non deterministic
   flows), even if the boundary node accelerates the transmission, it
   may not be able to achieve the target of low E2E latency.  If the
   traffic experiences less latency within the SP domain, on-time mode
   will work to achieve the end-to-end jitter target.



         _____  ___           _____  ___           _____  ___
        /     \/   \___      /     \/   \___      /     \/   \___
       /               \    /               \    /               \
   +--+                 +--+                 +--+                 +--+
   |R1| Strict Priority |R2| Strict Priority |R3| Strict Priority |R4|
   +--+     domian 1    +--+     domian 2    +--+     domian 3    +--+
       \____         __/    \____         __/    \____         __/
            \_______/            \_______/            \_______/



                    Figure 6: Example of partial upgrade







Peng, et al.              Expires 25 April 2023                [Page 20]


Internet-Draft              Deadline Routing                October 2022


9.  Deployment Considerations

   According to the above schedulability conditions, the planned
   residence time that can be provided in the network is related to the
   entire deployed service flows.  Each planned residence time can be
   considered as a delay resource, and the smaller the residence time,
   the more valuable it is.  The operator needs to match the
   corresponding planned residence time for each service.  It is not
   recommended to allocate a large planned residence time for the on-
   time service, which will consume a large amount of buffer and is not
   beneficial to target jitter.  However, the in-time service may also
   not want to be assigned to a larger planned residence time, that may
   be determined by its SLA that requires a smaller target end-to-end
   latency.  Thus it is necessary to find a balance to meet more service
   requirements as much as possible.

   We believe that the planned residence time specified for the service
   flows is completely determined by the network, not by the user.
   Therefore, it is impossible to carry the planned residence time
   beyond the network capacity in the service packets.  However, there
   may be some services that expect a large residence time, in this case
   it is recommended to hold them at the border nodes, but not on the
   intermediate node.  For this purpose, the MAX_CT configured on the
   border nodes can be much larger than the MAX_CT configured on the
   intermediate nodes, or the hierarchical deadline queues can be
   configured on the border nodes.

10.  Benefits

   The mechanism described in this document has the following benefits:

   *  Time synchronization is not required between network nodes.  Each
      node can flexibly set the authorization time length of the
      deadline queue according to its own outgoing port bandwidth.

   *  Packet multiplexing based, it is an enhancement of PQ scheduling
      algorithm, friendly to the upgrade of packet switching network.
      All nodes in the network can independently use cycle timers with
      different timeout intervals to rotate the deadline queues.

   *  The packet can control its expected residence time in the node.  A
      single set of deadline queues supports multiple levels of
      residence time.

   *  For in-time mode, the end-to-end delay is H*(F~D), jitter is H*Q;
      For on-time mode, the end-to-end delay is H*D, jitter is a just
      single authorization time.




Peng, et al.              Expires 25 April 2023                [Page 21]


Internet-Draft              Deadline Routing                October 2022


11.  IANA Considerations

   There is no IANA requestion for this document.

12.  Security Considerations

   TBD

13.  Acknowledgements

   TBD

14.  References

14.1.  Normative References

   [I-D.ietf-detnet-bounded-latency]
              Finn, N., Boudec, J. L., Mohammadpour, E., Zhang, J., and
              B. Varga, "DetNet Bounded Latency", Work in Progress,
              Internet-Draft, draft-ietf-detnet-bounded-latency-10, 8
              April 2022, <https://www.ietf.org/archive/id/draft-ietf-
              detnet-bounded-latency-10.txt>.

   [I-D.peng-6man-deadline-option]
              Peng, S., Tan, B., and P. Liu, "Deadline Option", Work in
              Progress, Internet-Draft, draft-peng-6man-deadline-option-
              01, 11 July 2022, <https://www.ietf.org/archive/id/draft-
              peng-6man-deadline-option-01.txt>.

   [I-D.stein-srtsn]
              Stein, Y. (., "Segment Routed Time Sensitive Networking",
              Work in Progress, Internet-Draft, draft-stein-srtsn-01, 29
              August 2021, <https://www.ietf.org/archive/id/draft-stein-
              srtsn-01.txt>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8655]  Finn, N., Thubert, P., Varga, B., and J. Farkas,
              "Deterministic Networking Architecture", RFC 8655,
              DOI 10.17487/RFC8655, October 2019,
              <https://www.rfc-editor.org/info/rfc8655>.



Peng, et al.              Expires 25 April 2023                [Page 22]


Internet-Draft              Deadline Routing                October 2022


14.2.  Informative References

   [EF-FIFO]  "Fundamental Trade-Offs in Aggregate Packet Scheduling",
              2001, <https://ieeexplore.ieee.org/document/992892>.

   [IR-Theory]
              "A Theory of Traffic Regulators for Deterministic Networks
              with Application to Interleaved Regulators", 2018,
              <https://ieeexplore.ieee.org/document/8519761>.

   [RPQ]      "Exact Admission Control for Networks with a Bounded Delay
              Service", 1996,
              <https://ieeexplore.ieee.org/document/556345/>.

   [SP-LATENCY]
              "Guaranteed Latency with SP", 2020,
              <https://www.ieee802.org/1/files/public/docs2020/dd-
              grigorjew-strict-priority-latency-0320-v02.pdf>.

   [UBS]      "Urgency-Based Scheduler for Time-Sensitive Switched
              Ethernet Networks", 2016,
              <https://ieeexplore.ieee.org/abstract/document/7557870>.

Authors' Addresses

   Shaofu Peng
   ZTE Corporation
   China
   Email: peng.shaofu@zte.com.cn


   Bin Tan
   ZTE Corporation
   China
   Email: tan.bin@zte.com.cn


   Peng Liu
   China Mobile
   China
   Email: liupengyjy@chinamobile.com










Peng, et al.              Expires 25 April 2023                [Page 23]