[Search] [txt|xml|pdf|bibtex] [Tracker] [Email] [Nits]

Versions: 00                                                            
DETNET                                                         T. Eckert
Internet-Draft                                Futurewei Technologies USA
Intended status: Informational                                 S. Bryant
Expires: January 13, 2022                             Stewart Bryant Ltd
                                                           July 12, 2021

    Problems with existing DetNet bounded latency queuing mechanisms


   The purpose of this memo is to explain the challenges and limitations
   of existing (standardized) bounded latency queuing mechanisms for
   desirable (large scale) MPLS and/or IP based networks to allow them
   to support DetNet services.  These challenges relate to low-cost,
   high-speed hardware implementations, desirable network design
   approaches, system complexity, reliability, scalability, cost of
   signaling, performance and jitter experience for the DetNet
   applications.  Many of these problems are rooted in the use of per-
   hop, per-flow (DetNet) forwarding and queuing state, but highly
   accurate network wide time synchronization can be another challenge
   for some networks.

   This memo does not intend to propose a specific queuing solution, but
   in the same way in which it describes the challenges of mechanisms,
   it reviews how those problem are addressed by currently proposed new
   queuing mechanisms.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 13, 2022.

Eckert & Bryant         Expires January 13, 2022                [Page 1]

Internet-Draft          bounded-latency-problems               July 2021

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Summary . . . . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Problem: High speed forwarding, high scale fan-in/fan-out   3
     1.2.  Solution goal: Lightweight, per-hop, per-flow stateless
           transit hop forwarding  . . . . . . . . . . . . . . . . .   4
     1.3.  Requirement: Support for existing stateless / steering
           solutions . . . . . . . . . . . . . . . . . . . . . . . .   4
     1.4.  Requirement: PCE to ingress/egress LSR only flow
           signaling . . . . . . . . . . . . . . . . . . . . . . . .   4
     1.5.  Requirement: Support for DiffServ QoS model on transit
           hops. . . . . . . . . . . . . . . . . . . . . . . . . . .   4
     1.6.  Requirement: Low jitter bounded latency solutions.  . . .   4
     1.7.  Requirement: Dynamic, application signalled DetNet flows    5
   2.  Evolution of IP/MPLS network technologies and designs . . . .   5
     2.1.  Guaranteed Service with RSVP  . . . . . . . . . . . . . .   5
     2.2.  Hardware forwarding and DiffServ  . . . . . . . . . . . .   6
     2.3.  MPLS and RSVP-TE  . . . . . . . . . . . . . . . . . . . .   6
     2.4.  Path Computation Engines (PCE)  . . . . . . . . . . . . .   7
     2.5.  Segment Routing (SR)  . . . . . . . . . . . . . . . . . .   8
     2.6.  BIER  . . . . . . . . . . . . . . . . . . . . . . . . . .   8
     2.7.  Summary . . . . . . . . . . . . . . . . . . . . . . . . .   8
   3.  Additional current considerations . . . . . . . . . . . . . .   9
     3.1.  Impact of application based state in networks . . . . . .   9
     3.2.  Experience from IP multicast  . . . . . . . . . . . . . .   9
     3.3.  Service Provider and Private MPLS Networks  . . . . . . .  10
     3.4.  Mission-specific vs. shared infrastructures . . . . . . .  11
     3.5.  PTP and challenges with clock synchronization . . . . . .  12
     3.6.  Jitter - in-time versus on-time . . . . . . . . . . . . .  13
   4.  Challenges for high-speed packet forwarding hardware  . . . .  15
   5.  A reference network design  . . . . . . . . . . . . . . . . .  16
   6.  Standardized Bounded Latency algorithms . . . . . . . . . . .  19
     6.1.  Guaranteed Service (GS) . . . . . . . . . . . . . . . . .  19

Eckert & Bryant         Expires January 13, 2022                [Page 2]

Internet-Draft          bounded-latency-problems               July 2021

     6.2.  TSN Asynchronous Traffic Shaping (TSN-ATS)  . . . . . . .  19
     6.3.  Cyclic Queuing and Forwarding (CQF) . . . . . . . . . . .  20
   7.  Candidate solution directions . . . . . . . . . . . . . . . .  22
     7.1.  Packet tagging based CQF  . . . . . . . . . . . . . . . .  22
     7.2.  Packet tagging based CQF with SR  . . . . . . . . . . . .  23
     7.3.  Per-hop latency indications for Segment Routing . . . . .  23
     7.4.  Latency Based Forwarding  . . . . . . . . . . . . . . . .  24
   8.  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . .  25
   9.  Security Considerations . . . . . . . . . . . . . . . . . . .  25
   10. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  25
   11. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  25
   12. Informative References  . . . . . . . . . . . . . . . . . . .  26
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  28

1.  Summary

   The architectural evolution of IP/MPLS networks (Section 2) in
   service provider and other "larger-than-building" (Section 3.3),
   shared-infrastructure service networks (Section 3.4) has led to a
   range of requirements against per-hop forwarding mechanisms which are
   currently not supported by the current DetNet MPLS forwarding plane
   [RFC8964] and per-hop, per-flow queueing model[RFC8655], Section 3.2,
   especially with respect to the QoS support of per-hop bounded
   latency.  The authors of this memo think that solutions for these
   requirements are relatively easily added to the existing DetNet
   architecture by adding support for already existing and/or proposed,
   but not standardized per-hop forwarding and queuing options.

   The following sub-sections summarize the problem, solution goals and
   requirements as perceived by the authors.  The reasoning for these is
   explained in the following sections.

   Note that requirements are somewhat overlapping in so far as solving
   one of them also solves others, but each addresses the problems from
   a different perspective, and are therefore easier understood for
   different stakeholders.  For example: Operators that do want to see
   support of DetNet for example for Segment Routing (SR) would not
   think that this is "naturally" the same as DetNet supporting the
   DiffServ architecture, even though solutions would have a hard time
   to support only one of the two.

1.1.  Problem: High speed forwarding, high scale fan-in/fan-out

   Forwarders with bounded latency need to support interface speeds of
   100 Gbps up to Tbps, likely over a period of 10 years from initial
   deployment of possible DetNet solutions.  Hundreds of interfaces may
   need to be supported in a single forwarder (fan-in/fan-out).

Eckert & Bryant         Expires January 13, 2022                [Page 3]

Internet-Draft          bounded-latency-problems               July 2021

   Supporting bounded latency at these speeds and fan-in/fan-out raises
   cost and feasibility challenges beyond those that had led to past
   IETF IntServ (GS) standards ([RFC2210], [RFC2212]) or more recent TSN
   bounded latency solutions.

   Note that these high speed and scale requirements even cause
   challenges when DetNet bounded latency traffic is intended to be used
   for only a small percentage of the interfaces traffic.

1.2.  Solution goal: Lightweight, per-hop, per-flow stateless transit
      hop forwarding

   Both high-speed hardware and network architecture design (for reasons
   of simplicity and minimization of shared risk functions) do favor
   architectures that support a lightweight transit hop forwarding plane
   design that requires no forwarding plane or control plane operations
   whose scale support depends on the number of services/service-
   instances (e.g.: DetNet flows) offered, but at best only on the size
   of the network (e.g.: no per-flow, per-hop state).

1.3.  Requirement: Support for existing stateless / steering solutions

   There should be DetNet bounded latency options that work in
   conjunction with per-transit-hop stateless traffic forwarding such as
   through Shortest Path First (SPF) routing with IP/MPLS), engineered
   steering (e.g.: SR) and stateless replication, such as Bit Indexed
   Explicit Replication with/without Tree Engineering (BIER, BIER-TE).

1.4.  Requirement: PCE to ingress/egress LSR only flow signaling

   There should be DetNet bounded latency options that for the purpose
   of traffic engineering (including assurance of bounded latency across
   the network) only require per-flow Path Computation Engine (PCE)
   signaling to network ingress/egress router, but not to transit hop

1.5.  Requirement: Support for DiffServ QoS model on transit hops.

   There should be DetNet bounded latency options that support the
   DiffServ QoS model instead of only the IntServ model.

1.6.  Requirement: Low jitter bounded latency solutions.

   There should be DetNet bounded latency options that together with the
   other requirements also provide a better than worst-case jitter for
   DetNet traffic.

Eckert & Bryant         Expires January 13, 2022                [Page 4]

Internet-Draft          bounded-latency-problems               July 2021

1.7.  Requirement: Dynamic, application signalled DetNet flows

   The DetNet architecture should support signaling and forwarding that
   would make support for automatically application instantiated DetNet
   flows scalable and lightweight to operate.

2.  Evolution of IP/MPLS network technologies and designs

   To help readers understand especially the per-hop stateless
   requirement from above, the following sections summarizes the
   historical evolution of technologies and operational principles that
   the authors think are relevant to understand the requirements
   outlined above and asks to see supported in DetNet.

2.1.  Guaranteed Service with RSVP

   The original (first and only) IETF standardized packet forwarding
   layer standardized queuing option for bounded latency in the IETF is
   "Guaranteed Service", [RFC2212] (GS), see the DetNet bounded latency
   document, [DNBL] section 6.5.  At the time the RFC was published
   (1997), the standardized signaling was proposed to be RSVP [RFC2205],
   and the use of RSVP with GS was standardized in [RFC2210].

   The function to support GS bounded latency in the forwarding plane is
   the per-flow reshaping on every forwarder hop along the path where GS
   packets of one flow may get delayed in the egress interface queue due
   to packets from other GS flows.  In typical networks, this is every
   hop along the path.

   Early (1990/2000) forwarders for which RSVP was implemented where
   using so-called "software" forwarding.  This meant that the
   forwarding plane was implemented through a general purpose CPU
   without additional hardware support for QoS functions such as shaping
   or queuing.  While these forwarders did support traffic flow shaping,
   GS was never implemented on them and their RSVP implementations did
   also not support (but ignored) the RSVP TSPEC/RSPEC signaling
   parameters used for bounded latency.  Instead, RSVP implementations
   only supported the parameters for bandwidth reservation, which was
   henceforth called Call Admission Control (CAC).

   In one instance, a software forwarder implementation with RSVP
   supported the Controlled Load (CL) service [RFC2211], which does not
   provide for bounded but instead for controlled latency.  This service
   is achieved by creating a per-flow queue and applying weighted fair
   queuing (WFQ) with weights according to the reserved bandwidth of the
   flows (see [RFC2211], section 11).  This functionality did not
   proliferate into later generations of routers because the execution
   cost of WFQ was too high for a multitude of flows and the scheduling

Eckert & Bryant         Expires January 13, 2022                [Page 5]

Internet-Draft          bounded-latency-problems               July 2021

   accuracy was too inaccurate in interrupt driven CPU software
   forwarding with higher speed interfaces (100Mbps...1Gbps).

2.2.  Hardware forwarding and DiffServ

   With the rise of forwarding planes with "acceleration" through ASIC
   based Forwarding Plane Elements (FPE) instead of general purpose CPUs
   and/or dedicated QoS hardware, the ability of forwarders to support
   shaping evolved to only be supported, if at all, on DiffServ (DS)
   boundary nodes, but not on DS interior nodes.  This included both
   shaping as well as complex queuing such as WFQ.

   The DS architecture, [RFC2475], was specifically targeted to enable
   the evolving, now common Service Provider network services
   architecture, in which "high-touch" service functions are only
   performed on so-called Provider Edge (PE) routers, which as required
   are DS boundary nodes, whereas the hop-by-hop forwarding through so-
   called Provider (P) (core) routers is meant to utilize only a reduced
   set of forwarding functions, specifically excluding per-hop, per-flow
   QoS forwarding plane functions such as shaping or policing.  DiffServ
   therefore allowed to build higher speed, lower cost forwarding plane
   P routers.  It also enabled to build equally higher speed, lower
   costs PE routers by supporting boundary node functions only on (lower
   speed) customer facing interfaces/line cards, but not on core facing

2.3.  MPLS and RSVP-TE

   With the advent of MPLS [RFC3031], RSVP was extended to support MPLS
   through the RSVP-TE [RFC3209] extensions.  RSVP-TE manages p2p (later
   on also p2mp) MPLS Label Switched Paths (LSP), which when signaled
   through RSVP-TE are also called RSVP-TE tunnels.  These can be seen
   as the equivalent of IP flows that RSVP manages for IP.  RSVP-TE
   tunnels can support a variety of traffic engineering functions, but
   none of the implementations known to the authors ever implemented GS
   or CL services, specifically because hardware forwarding for service
   provider networks was not designed to support these QoS functions for
   P Label Switched Routers (LSR).

   Because CL/GS where not targeted with RSVP-TE, the signaling
   extensions for Interior Gateway Protocols (IGP) required in the
   classical RSVP-TE reservation model (such as [RFC8570] for IS-IS)
   have no parameters to signal per-hop GS queuing latency or buffer
   capacity utilization.  In result, the existing IGP signaling for
   RSVP-TE only supports RSVP-TE to perform bandwidth but not non-
   queuing path latency resource calculations and therefore no latency
   based traffic engineering.

Eckert & Bryant         Expires January 13, 2022                [Page 6]

Internet-Draft          bounded-latency-problems               July 2021

2.4.  Path Computation Engines (PCE)

   Even though RSVP-TE implementations support only DiffServ (but not
   GS/CL) with respect to per-hop QoS functions, its traffic-steering
   (path selection) and signaling model introduced per-flow (per-tunnel)
   control plane and forwarding plane overhead onto every P-hop.
   Through the 200x's, this RSVP-TE overhead was seen as undesirable
   complexity and overhead by many service providers using it.  There
   was also a much larger number of service providers that desired some
   of the benefits provided by RSVP-TE, but who were not willing to
   commit to the complexity, costs and operational risk introduced into
   the network by complex per-flow signaling of RSVP-TE.  The on-path,
   per-hop signaling of RSVP-TE for example introduced so much overhead,
   that reconvergence of RSVP-TE paths after a failure or recovery took
   as much as 20 minutes in networks with 10,000 or more RSVP-TE

   The design of RSVP-TE's (decentralized) on path signaling model
   specifically showed problematic under high resource utilization.  In
   the original, decentralized RSVP-TE deployment model, ingress PE LSR
   would perform so-called Constrained Shored Path Forwarding (CSPF)
   calculations to determine the shortest path with enough free
   resources for a new flow.  Afterwards the ingress PE would signal the
   path via RSVP-TE.  The IGP would signal to all ingress PE how many
   (bandwidth) resources where left on every link.  Under high load,
   when multiple ingress PE where performing this process in parallel
   this would cause high load, churn and reservation collisions.

   These problems of de-centralized RSVP-TE plus IGP signaling lead to
   the introduction of a so-called Path Computation Element (PCE) based
   architecture, in which the (competing and uncoordinated) traffic
   engineering computations on every de-centralized RSVP-TE ingress LSR
   where replaced by a centralized PCE function (or at least a
   coordinated PE function), which would send the calculated results
   back as a path object to the headend LSR, in result limiting the
   functions of RSVP-TE to the signaling of a steered traffic path
   through the network to establish the hop-by-hop LSP.  The use of a
   PCE can likewise eliminate all the reservation state dependent
   signaling from the RSVP-TE IGP extensions, because all the
   reservation calculations solely need to happen only on the PCE.
   Nevertheless, the PCE does not eliminate the per-hop signaling
   overhead of RSVP-TE to establish LSPs and hence it did not eliminate
   for example the majority of the platform and convergence cost of
   RSVP-TE in the network, especially for the control plane of P nodes
   adn could hence not resolve the concerns of service providers who had
   chosen not to adopt RSVP-TE.

Eckert & Bryant         Expires January 13, 2022                [Page 7]

Internet-Draft          bounded-latency-problems               July 2021

2.5.  Segment Routing (SR)

   The introduction of centralized PCE had obsoleted most of the reasons
   for RSVP: headends did not need to do path calculation, and P router
   did not need to manage the available and allocated bandwdth for TE
   tunnels.  In most service-provider use-cases this left RSVP-TE only
   serving as a very complex solution to do traffic steering, and the
   PCE was doing the rest.  This ultimately lead to the design of the
   Segment Routing [RFC8402] architecture, and its mapping to the MPLS
   forwarding plane, SR-MPLS [RFC8660].  Later, a mapping to IPv6 was
   defined with SRv6 [RFC8986].  SR relies on strict or loose hop-by-hop
   hop source routing information, contained in each packet header,
   therefore eliminating the need to set up per-path flow state via
   RSVP-TE, and allowed in conjunction with DiffServ for hop-by-hop QoS
   a complete per-hop, per-flow stateless forwarding solution, arguably
   therefore lightweight, easy to implement at high performance and
   scalable to large number of flows.

2.6.  BIER

   In the same way as SR eliminated the need for hop-by-hop traffic
   steering forwarding state from RSVP-TE in P-routers for unicast
   traffic, Bit Indexed Explicit Replication [RFC8279] (BIER) solves
   this problem for shortest path multicast replication state across
   P-routers, by replacing it with a BIER packet header [RFC8296] and
   therefore eliminating any per-application/flow, per-hop forwarding
   state for multicast in P-routers.  BIER also removed the associated
   overhead of prior ingress replication solutions Service Providers
   where looking into to avoid the per-hop state.

   Finally, BIER-TE [I-D.ietf-bier-te-arch] adds traffic steering with
   replication to the BIER architecture and calls this Tree Engineering.
   Likewise, this is without the need for per-hop/per-flow steering or
   replication state.

2.7.  Summary

   Service Provider networks have evolved especially in the past 25
   years into an architecture, where high-speed, low-cost and high-
   reliability are based on designs that eliminate or reduce as much as
   possible any form of unnecessary control-plane and even more so per-
   flow, per-application plane complexity from P-routers/transit-nodes.

   This has led to the development of the DiffServ QoS architecture that
   eliminated IntServ/per-flow QoS from P-routers, and later on to the
   evolution from MPLS/RSVP-TE to SR and BIER that eliminated per-flow/
   tunnel forwarding/steering and replication state from the same

Eckert & Bryant         Expires January 13, 2022                [Page 8]

Internet-Draft          bounded-latency-problems               July 2021

   Finally, early experience with Traffic Engineering churn under high
   load and todays requirements for often NP-complete optimization lead
   to an architectural preference for off-path/centralized model for TE
   calculations via PCE to also free P-routers from signaling complexity
   and perform dynamic/service-dependent signaling only to PE-routers.

3.  Additional current considerations

   The following subsections look at further into the background for why
   per-hop, per-flow state can be problematic and discuss problems
   beyond this core issue.

3.1.  Impact of application based state in networks

   RSVP-TE was (and is) solely used for services where the operator of a
   domain explicitly provisions RSVP-TE tunnels across its domain (for
   example using a PCE) and can therefore fairly easily know the worst-
   case scaling impact.  For example the number of tunnels does is not a
   chance value arising through dynamic subscriber action, and the
   number of tunnels in the network is primarily impacted by topological
   changes and the (over time relatively rare) of occurrences of
   additional services and/or service instances being provisioned.  For
   RSVP-TE there was never (to the knowledge of the authors) an end-to-
   end application layer interface such as there was for RSVP over IP,
   for example as supported by earlier versions of Microsoft Windows QoS
   enabled IP sockets.

   When per-flow operations including per-hop signaling or even worse
   per-hop forwarding plane or QoS state is not a result of well-
   controlled provisioning or well plannable/predictable failure
   behavior but instead driven by applications not under the control of
   network operators, the per-hop state requirements can become much
   more an operational and cost problem, because of its

3.2.  Experience from IP multicast

   The widest experience with dynamic, application based signaling in
   Service Provider networks likely exist for IP multicast, where
   creation of per-hop forwarding/replication state is triggered by
   applications not under the control of network operations but by
   customer managed applications/application-instances.  Managing the
   amount of state and the control plane load on P-routers was and is
   one of the mayor concerns when operationalizing IP Multicast services
   in SPs.

   Service Provider L2-VPN and L3-VPN services can offer IP Multicast
   via architectures such as [RFC6513] that attempt to solve/reduce the

Eckert & Bryant         Expires January 13, 2022                [Page 9]

Internet-Draft          bounded-latency-problems               July 2021

   problem of customer application driven, per-multicast application in
   a variety of ways, but they all come with their own problems:

   o  In ingress-replication, the ingress-PE sends a separate unicast
      copy to every egress-PE.  This creates significant excess traffic
      on links close to the ingress-PE and potentially higher-cost
      ingress-PE attachment speeds.

   o  In L3VPN aggregates-trees, the traffic for multiple trees is sent
      across a common tree reaching the superset of all egress-PE of all
      included trees.  This reduces the number of trees from one per-
      customer application to a lower number of aggregates this, but it
      creates potentially significant excess traffic towards egress-PE
      that do not need all the aggregated traffic and may even result in
      a requirement for access core access link speeds for those egress

   Finally, the per P-router stateless BIER solution solved these
   issues.  It does not require any per P-router, per tree state
   creation, and achieves a 256x better traffic efficiency than ingress
   replication (with 256 long BIER bit strings).

3.3.  Service Provider and Private MPLS Networks

   With DetNet services being targeted primarily for so-called private
   networks such as (but not limited to) those for industrial, theme
   parks, power supply systems, road, river, airport and train
   transportation networks, it is important to understand how concerns
   for SP networks will apply to such private networks:

   While the aforementioned evolution of MPLS networks focused on large-
   scale service provider networks, the very same architectural
   evolution is or will also happen in any private MPLS networks in the
   same way as the DiffServ architecture equally became the only widely
   adopted QoS architecture in any larger scale (campus or beyond)
   private networks.

   While some of the scaling, cost, performance and reliability issues
   mentioned above for service providers may not equally apply to
   smaller scale private networks, past experience has shown that that
   it is unlikely for a critical mass for different solutions to develop
   across a large variety of vertical private type of networks.  For
   this reason, in the past any larger scale enterprise networks have
   preferred to adopt solutions that had proven themselves through SP
   deployments and that where based on cross-vendor IETF based
   architecture principles and widely, interoperable vendor

Eckert & Bryant         Expires January 13, 2022               [Page 10]

Internet-Draft          bounded-latency-problems               July 2021

   Another reason for private network operators looking for service
   provider calls designs is that it also is simplifies potential
   service provider based management of the network and/or outsourcing
   of the network to a service provider.  This was seen often when large
   enterprises that had to support multi-tenants evolved from ad-hoc
   network virtualization solutions (such as VRF-lite) over to BGP/MPLS-
   VPN designs and later outsourced those very networks.

   In that same line of future proofing, networking technologies first
   developed for enterprises would also be picked up and reused in
   Service Provider networks as long as they would fit.  IP Multicast
   for example had (since about 1996) ca. 10 years of deployment for
   business critical enterprise use cases (such as financial market data
   distribution), before it was adopted widely for IPTV in service

3.4.  Mission-specific vs. shared infrastructures

   Whereas the previous section points to the practice and benefits to
   share technologies between private and SP network, this section
   highlights one core additional requirement of SP networks not found
   in most private networks from which pre-DetNet deterministic service
   requirements will likely originate.

   In architectural terms, the desire and need to minimize or avoid per-
   application/flow forwarding/control-plane state and per-hop control
   plane interactions (be it through on-path signaling or direct PCE to
   P-router signaling) is not primarily a matter of SP/private networks
   or not even of size, but foremost a matter of whether or not the
   network itself is seen as the (a) communications fabric of a large
   distributed application or (b) as an independently running shared
   infrastructure across a potentially wide variety of application/
   services with diverging requirements.

   (a) is the dominant view of the network specifically from many
   (single) mission specific networks such as many industrial networks
   and even non-public High Performance Compute (HPC) center
   architectures.  In either of these case, it is a single architectural
   entity that can control both network infrastructure and application
   to build a mission optimized compound.

   For example, switches in HPC Data Centers had traditionally very
   shallow interface packet buffering for cost reasons, resulting in
   inferior performance under peak load with predominant older TCP
   congestion control stacks.  Instead of using better, more expensive
   switches, it was easier to improve application device TCP stacks,
   leading for example to BBR TCP.  While this is very much in line with
   the desired Internet architecure that is putting a significant

Eckert & Bryant         Expires January 13, 2022               [Page 11]

Internet-Draft          bounded-latency-problems               July 2021

   responsibility onto transport layer protocols in hosts (not limited
   to TCP) to behave "fair" or "ideal", the reality even in many private
   missions centric networks such as manufacturing plant is different.
   Dealing with misbehaving user devics or applications is one of the
   main challenge.  In the example, that is the case when a DC is
   offering public cloud services, where TCP stacks can not be
   controlled, and hence deeper buffers and/or better AQM are a core

   In general: In networks following the (b) shared infrastructure
   design principle, any resource that needs to be shared across
   different services or even service instances becomes a potential
   three party reliability and costing issue between the provider
   running the network and the two (or more) parties whose services
   utilize the common resource.  Minimizing the total amount of complex,
   failure-prone and hard to quantify in a cost-effective manner shared
   resources is thus at the base of any shared infrastructure network

   This again points to the model, where all network control can happen
   on the edge, and due to the absence of per-hop, per-flow state there
   simply is no shared flow state table that needs to be managed across
   multiple different services/subscribers.

3.5.  PTP and challenges with clock synchronization

   Some bounded latency solution require accurate clock synchronization
   across network nodes performing the bounded latency algorithm.  The
   most commonly used (family of) protocol(s) for this is the Precision
   Time Protocol (PTP), standardized in IEEE1588 and various market
   specific profiles thereof.

   PTP can achieve long-term Maximum Time Interval Errors (MTIE) of as
   little as 10th of nsec.  MTIE is the maximum time difference between
   the clocks of two PTP nodes measured over long period of time.

   Implementing PTP in devices comes at a range of design requirements.
   At high degree of accuracy, PTP requires accordingly accurate local
   oscillators that includes hardware such as regulated heating to
   operate under constant temperature.  It includes accurate
   distribution of clock across all components of the system, which can
   be especially challenging in modular, large-scale devices, and
   accurate insertion and retrieval of timestamp field into packet

   While PTP is becoming more and more widely available, consistent
   support of high accuracy across all target type of switches and
   routers in wide area networks cannot be taken for granted to be a

Eckert & Bryant         Expires January 13, 2022               [Page 12]

Internet-Draft          bounded-latency-problems               July 2021

   feasible new requirement raised for DetNet when it did not exist in
   before.  Today, PTP is often found in mobile network fronthauls, but
   not their backhauls or any other broadband aggregation, distribution
   or core networks.  This is because there is, as of today, no strong
   business case requirement for PTP at high precision in those
   networks, whereas technologies such as eCPRI raise such requirements
   against mobile fronthauls.  Instead, those other networks most often
   resort to at best msec accuracy NTP protocol deployments which is
   typically sufficient for control-plane and operational event tracing
   as its main, accuracy defining use-case.

   The larger the network and more multi-vendor varied the deployed
   equipment is, the higher will also be the operational cost of
   maintaining and controlling the accuracy of a PTP service.  This
   primarily has been cited in the past as a reason to not deploy PTP
   even if hardware was supporting it.  This operational challenge will
   especially apply when PTP support may be required for only a small
   percentage of traffic in a high speed wide area network.  The revenue
   from the service needs to cover the operational cost incurred by its
   exclusive components (hardware, software and operations).

3.6.  Jitter - in-time versus on-time

   This section discusses how low-jitter bounded latency applications
   can be highly beneficial for DetNet applications.

   Depending on the bounded latency algorithm, the jitter experienced by
   packets varies based on the amount of competing traffic.  In
   algorithms and their resulting end-to-end service which this memo
   calls "in-time", such as GS and [TSN-ATS], the experienced latency in
   the absence of any competing traffic is zero, and in the presence of
   the maximum amount of permissible competing traffic, latency is the
   maximum, guaranteed bounded latency.  In result, the jitter provided
   by these algorithms is the highest possible.

   In algorithms and their resulting end-to-end service which this memo
   calls "on-time", the experienced latency is completely or most
   significantly independent of the amount of competing traffic and the
   jitter therefore null or minimal.  In these algorithms, the network
   buffers packets when they are earlier than guaranteed, whereas in-
   time algorithms deliver packets (almost) as fast as possible.

   This memo argues that on-time queuing algorithms provide an
   additional value-add over in-time algorithms, especially for use in
   metropolitan or wide-area networks.  Whatever algorithm is used, the
   receiving application only has a guarantee for the maximum bounded
   latency, and the real (shorter) latency of any received packet is no
   indication for the latency of the next packet.  Instead, the receiver

Eckert & Bryant         Expires January 13, 2022               [Page 13]

Internet-Draft          bounded-latency-problems               July 2021

   application has to be prepared for each and any future packet to
   arrive with the worst possible, e.g.: the bounded latency.

   The majority of applications require some higher layer function
   synchronously to the sender application: Rendering of audio/video and
   other media information needs to happen at the same frequency or
   event intervals at which the media was encoded.  When these
   applications receive packets earlier than the time at which they can
   be processed (which is equal or close to the bounded latency), these
   applications buffer media in a so-called playout buffer and release
   them only at that target time.  Likewise, remote control loops
   including industrial Programmable Logic Controller (PLC) loops or
   remote controlling of robots or cars is typically based on
   synchronous operations.  In these applications, early packets are
   also delayed to then be processed "synchronously" later.

   In all cases, where applications need to buffer (or otherwise
   remember) received data when it is too early, in-time queueing
   latency raises the challenge to application developers to be able to
   predict the networks worst possible jitter, and this can be
   particularly challenging for embedded, if not constrained receiver
   devices with minimum memory to buffer/remember.  When these devices
   are designed against one particular type of network with well-known
   low jitter, then they will not necessarily operate correctly in
   networks with larger jitter.  And in metropolitan and WAN networks,
   jitter with in-time services can be highly variable based on its
   design and the relative location of the communicating nodes in the
   topology (see Section 5 for an example network design).

   One example of such issues was encountered when digital TV receivers
   (Set Top Boxes, STB) designed for (mostly synchronous) digital cable
   transmission where evolved to become IPTV STB, but the playout buffer
   of < 50 msec was not sufficient to compensate for a > 50 msec jitter
   experienced in IP metropolitan networks.

   Note that this section does not claim that all applications will
   benefit from on-time service, nor that no application would benefit
   more from in-time service than from on-time service.  Nevertheless,
   the authors are not aware of instances of [RFC8578] application for
   whom in-time service would be more beneficial than on-time service.
   Of course, this comparison is only about the benefit to the
   application and other factors such as the cost/scale of the service
   for the network itself have also to be taken into account.

Eckert & Bryant         Expires January 13, 2022               [Page 14]

Internet-Draft          bounded-latency-problems               July 2021

4.  Challenges for high-speed packet forwarding hardware

   The problems of cost and operational feasibility in shared-
   infrastructure networks specifically applies to scaling of hardware
   resources such as per-application-flow forwarding or QoS state in
   high-speed network routers: Even if the business case makes it clear
   that only e.g. 1 Gbps worth of traffic may require this advanced
   state (such as multicast replication or per-flow shaping for bounded
   latency), it will be more expensive to build this functionality into
   a 100 Gbps transit switch/router than into a 1 Gbps switch/router.
   This too is based on experience from migrating services of low-speed
   mission specific networks, such as IP multicast onto high speed,
   shared-infrastructure service provider networks.

   The reason for this higher cost at higher speed is that the 1 Gbps
   worth of "advanced" traffic still has to be built into 100 times
   faster hardware and each of the "advanced" packets forwarded would
   needs to be replicated/shaped 100 times faster.

   This packet processing issue may look like it applies equally to both
   per-hop, per-flow stateful based forwarding as well as solely in-
   packet based mechanisms, in practice, per-flow state may requires a
   lot more high-speed memory access because of the need to access an
   entry from a state table.  In most cases, this table space can only
   be made to work at line rate packet processing when it is on-chip,
   hence it is not only most expensive, it is also crucial to scale
   right.  And as the 1 vs. 100 Gbps example above showed, it is very
   hard to come by an appropriate scale smaller than "would work for
   100% of traffic" - because network operator providing shared
   infrastructure networks really do not want to be responsible for
   predicting how individual services may grow in adoption by making a
   specific hardware selection that constrains any such grows.

   Last, but not least, on-chip high-speed state tables become even more
   expensive when they do not only have to be read only, but also when
   they have to be written at line rate and even worse, when they have
   to operate for line-rate speed read/write/read control loops:

   The main issue with scaling state in hardware routers is that designs
   will be hesitant to work against unclear growth predictions.  Even if
   at some point in time only 1 Gbps of DetNet traffic was expected to
   be required on a 100 Gbps platform, hardware designers will be much
   more likely want to scale against the worst (best) case service
   growth expectation so that customers will not feel that they would
   buy into a product that becomes obsolete under success.

   Whereas steering state, such as MPLS label entries can easily scale
   to hundreds of thousands, the same is not clear about shapers or

Eckert & Bryant         Expires January 13, 2022               [Page 15]

Internet-Draft          bounded-latency-problems               July 2021

   interleaved regulators.  They are more challenging because they
   require fast (on-chip) read-write memory for the state variables,
   especially when forwarding is parallelized across multiple execution
   unit.  This does incur additional complexity to split up the state
   and its packets across multiple execution units and/or to provide
   consistent cross-execution units shared read/writeable memory.

   Even only writeable (but not cross-execution units then also
   readable) memory has traditionally been a sparse resource the faster
   the forwarding engines are.  This can be seen from (often very
   limited) scale of packet monitoring state such as for IPfix.

   But the main issue of per-hop, per-flow forwarding state that could
   be quite dynamic because it might be triggered by applications is the
   control plane to forwarding-plane-state interactions.  Updating
   hardware forwarding engine state tables is often one of the key
   performance limits of routers.  Adding significant additional state
   with likely ongoing changes is easily seen as a big contributor to
   churn in the control plane and likely reason for stability and
   reduced peak performance under key events such as reconvergence of
   all or large parts of IGP or BGP routing tables.

5.  A reference network design

   The following picture shows an example, worst-case network topology
   of interest (in the opinion of the authors) for bounded latency
   considerations.  This section does not claim that greenfield rollouts
   may or want to use all aspects of this topology.  What his memo does
   claim is that many existing brownfield networks, especially large
   metropolitan areas show all or many of these aspects, and that it
   would be prudent for bounded latency network technologies to support
   networks like these so as to not create new constraints against
   network designers by only supporting physical network topologies
   optimized for a particular type of service (bounded latency).

Eckert & Bryant         Expires January 13, 2022               [Page 16]

Internet-Draft          bounded-latency-problems               July 2021

            Subscribers, Towers, IoT devices
                       ...Access....    National-Core
                     PE100 ... PE199    Exchanges/
                     | |                Peerings
                     | \----\           /   \
                     |       \         /     \
                ---  P11 --- P12 --- P13 --- P14 --
               /                                   \
   Edge  -----P21                                   P15
   DC  PE     |                                     |
        ------P21                                   P17
               \                                   /
                ---  P20 --- .........   --- P18 --
                     /                         \
        Edge   --- P30                      P40
        DC    PE     \                        /
                ----- P31 -- ....  P38 --- P39
                                     \     /
                                      \   /
                           Subscribers, Towers, IoT devices

                   Figure 1: Reference Network Topology

   An example metropolitan scale network as shown in Figure 1 may
   consist of one or more rings of forwarders.  A ring provides the
   minimum cost n+1 redundancy between the ring nodes, especially when,
   as is common in metropolitan networks, new fibre cannot cost-
   effectively be put into new optimum trenches, but existing fibre and/
   or trenches have to be used.  This is specifically true when the area
   includes not dense populated suburban areas (higher cost per
   subscriber and mile for rollouts).

   Multiple, so-called subtended rings typically occur when existing
   networks are expanded into new areas: A new ring is simply connected
   at two most economic points into the existing infrastructure.
   Likewise, such a topology may become more complicated over time by
   addition of capacity, which resulting from TE planning calculations
   may not follow any of the pre-existing ring paths.

   Edge Data-Center (DC), connections to Exchanges/Peerings or national
   cores of the provider itself, as well as all subscribers including
   Mobile Network Towers, and IoT devices connect to these ring directly
   via PE edge-forwarders and (more often) via additional CE type
   devices.  P nodes may also double as PE nodes.

Eckert & Bryant         Expires January 13, 2022               [Page 17]

Internet-Draft          bounded-latency-problems               July 2021

   In densely populated regions, P, or PE nodes may have a high number
   of attached devices, shown in the picture with the example of 100 PE
   forwarder connecting to a single P forwarder (or rather two P for
   redundancy and therefore support of PREOF).

   In summary, the following aspects of these networks are relevant for
   bounded latency:

   o  Link speeds today are at least 100 Gbps and will be Tbps in the
      near future.  Even if only a small percentage of that traffic has
      to support bounded latency, the queuing mechanism need to support
      these high-speed interfaces.

   o  Fan-in/out at PE or P nodes may be (worst case) in the order of
      hundred(s) of incoming interfaces.  Bounded latency mechanisms
      whose number of queues depend on the number (#I) of interfaces in
      a more than linear fashion, such as (#I^2) in the case of
      [TSN-ATS], may introduce significant challenges for cost-effective

   o  Through the advent of decentralized edge Data Center and peerings
      between different operators and content providers, traffic flows
      of interest will not solely be between one central site from/to
      subscribers hub&spoke.  Instead arbitrary, traffic engineered
      paths across the topology between any two edges need to be
      supportable in scale with the bounded latency queuing mechanism.

   o  The total number of edge (#E) nodes (PE or CE) for a bounded
      latency service can easily be in the thousands.  Aggregation of
      bounded latency flows on the order of (#E^2), which is the best
      option in per-hop, per-flow solutions such as [TSN-ATS], is likely
      insufficient to significantly reduce the number of flows that need
      to be managed across P nodes in such bounded latency queuing

   o  The total number of P nodes may be in the hundreds and bounded
      latency flows in the tenths of thousands.  It should also be
      expected that such flows are not necessarily long-term static but
      may need to be provisionable in the time-scale order of for
      example telephone calls (such as flows supporting
      remote control of devices or operations).  Bounded latency
      solutions that require per-flow, per-node state maintenance on the
      P nodes themselves may therefore be undesirable from a network
      operational/complexity/reliability perspective, but also from a
      hardware engineering cost perspective, especially with respect to
      the control plane cost of dynamically setting up per-flow bounded
      latency for flow whenever there is a new flow or all of them

Eckert & Bryant         Expires January 13, 2022               [Page 18]

Internet-Draft          bounded-latency-problems               July 2021

      whenever there are topology or load changes that make rerouting

   Beyond queuing concerns, path selection too specifically for
   deterministic services is a challenge in these networks:

   o  Path lengths may be significantly longer than e.g. 3 hops.  In
      large metropolitan networks, they can reach 20 or more hops.
      Speed of light end-to-end in these networks will be in the order
      of low number of msec.  End-to-end queuing latency can be in the
      same range, if not higher.

   o  To avoid undesirable re-routing under failure when PREOF and
      engineered disjoint paths are used, traffic steering needs to
      support efficiently supportable hop-by-hop traffic steering.  In
      networks designed for source-routing (e..: SR routing),
      efficiently encoded strict-hop-by-hop steering for as much as
      those (e.g.: 20) hops may be desirable to support.

6.  Standardized Bounded Latency algorithms

   [DNBL] gives an overview of the math for the most well-known existing
   deterministic bounded latency algorithms/solutions.  This section
   reviews the relevant currently standardized algorithms from the
   perspective of the above listed problems for high-speed, high-scale,
   shared services infrastructures and to provide additional background
   about them.

6.1.  Guaranteed Service (GS)

   GS is described in section 6.5 of [DNBL].  Section 2.1 describes its
   historical evolution and challenges.  We skip further detailing of
   its issues here to concentrate on IEEE Time Synchronuous Networking -
   Asynchronous Traffic Shaping [TSN-ATS], which in general is seen as
   superior to GS for high speed hardware implementation.  All the
   concerns described in the TSN-ATS section apply equally or even more
   to GS.

6.2.  TSN Asynchronous Traffic Shaping (TSN-ATS)

   Section 6.4 of [DNBL] describes the bounded latency used for TSN
   Asynchronous Traffic Shaping [TSN-ATS].  Like GS, this bounded
   latency solution also relies on per-flow shaper state, except that it
   uses optimized shapers called "Interleaved Regulator" as explained in
   section 4.2.1 of [DNBL].

   The concept and simplification in interleaved regulators over
   traditional shapers and the concept of interleaved regulators is a

Eckert & Bryant         Expires January 13, 2022               [Page 19]

Internet-Draft          bounded-latency-problems               July 2021

   resulting from mathematical work done in the last 10 years starting
   with [UBS].

   In a system with e.g.  N=10,000 flows each with a shaper, the
   forwarder needs to have 10,000 shapers each of which would need to
   calculate the earliest feasible send-time of the first queued packet
   of the flow and all these send-times would need to be compared by a
   scheduler picking the absolute first packet to send.  Of course it is
   unlikely that the router would have to queue at least one packet for
   all queues at any point in time, but the complexity to implement the
   scheduler scales with N.

   With interleaved regulators, there is still the per-flow state
   required to hold each flows traffic parameters and its next-packet
   earliest departure time, but instead of requiring a scheduler to
   compare N entries, packets are queued into one out of (#IIF,#PRIO)
   FIFO queues, one queue for all the packets arriving from the same
   Incoming InterFace (IIF) and targeted the same worst-case queuing
   latency/PRIOrity (PRIO) on this hop.  The shaper now only needs to
   calculate the earliest departure time of the head of each of these M=
   #IIF * #PRIO queues and the complexity of a scheduler to select the
   first packet across those interleave regulators is therefore reduced
   by a factor of O(N/M).

   Unfortunately, while industrial ethernet switches today often have no
   more than 24 IIF, aggregation routers in metropolitan networks may
   have thousands of IIF, so the benefit of interleaved regulators over
   per-flow shaper will likely be much higher in classical TSN
   environments than it would be for example likely DetNet target
   routers in metropolitan networks.

   In addition, the aforementioned core problems for shapers
   (Section 4), namely control plane, read/write/read cycle access and
   scale equally apply to interleaved regulators, so the main
   optimization benefits of interleaved regulators is for the original
   targets of [UBS] / [TSN-ATS]: low-speed (1..10Gbps switches) with
   limited number of interfaces - but to a much lower degree for likely
   important type of DetNet deployments.

6.3.  Cyclic Queuing and Forwarding (CQF)

   TSN Cyclic Queuing and Forwarding as described in [DNBL], section
   6.6, is a per-flow, per-transit-hop stateless forwarding mechanism,
   which solves the concerns with per-hop, per-flow state issues
   described earlier in this memo.  It also provides an on-time service
   in which the per-hop and end-to-end jitter is very small, namely in
   the order of a cycle time.

Eckert & Bryant         Expires January 13, 2022               [Page 20]

Internet-Draft          bounded-latency-problems               July 2021

   [CQF] operates by forwarders sending packets in periodic cycles.
   These cycles are derived from clock synchronization: The start of
   each cycle (and by implication the end of the prior cycle) are simply
   periodically increasing clock timestamps that have to be synchronized
   across adjacent forwarders, usually via PTP.  This method to operate
   cycles allows [CQF] to operate without additional [CQF] data packet
   headers, but it is also the reason for the two issues of [CQF], and
   both relate to the so-called dead time (DT).

   For the receiving node to correctly associate a [CQF] packet to the
   same cycle as the sending node, the last bit of the last packet in
   the cycle on the sending node needs to be received by the receiving
   node before the cycle ends.

   [DNBL] explains that DT is the sum of latencies 1,2,3,4 as of [DNBL]
   Figure 1, but that is missing the MTIE between the forwarders: If a
   cycle is for example 10 usec, and the PTP MTIE is 1 usec, then only 9
   usec of the cycle could be used (without even yet considering the
   other factors contributing to MTIE).  If MTIE is not taken into
   account, a packet might arrive in time from the perspective of the
   sending forwarder, but not in the perspective of the 1 usec earlier
   receiving node.

   In practice, MTIE should be equal or lower than 1% of the cycle time.
   When forwarders and links increase in speed, cycle times could become
   proportionally smaller to reduce per-hop cycle time latency.  When
   this is done, MTIE needs to equally become smaller, raising the costs
   of the solution.  Therefore, [CQF] has a challenge with higher speed

   The second and even more important problem is that DT includes the
   link latency (2 in [DNBL], Figure 1).  With a speed of light in fibre
   of 200,000 Km, link latency is 10 usec for 2 Km.  This makes [CQF]
   very problematic and limited in metropolitan and wide-area networks.
   If the longest link of a network was 10 Km, this would cause a DT on
   that link of 50 usec and with a cycle time of 100 usec, only 50%
   bandwidth could be used for cycle-time (bounded latency) traffic
   (excluding all other DT factors).

   When links are subject to thermal expansion also known as sag on
   hanging wires, such as broadband copper wires (Cable Networks), their
   length can also change by as much as 20% between noon and night
   temperatures, which without changes in the design has to be taken
   into account as part of DT.

   In conclusion, [CQF] solves many of the problems discussed in this
   memo, but it's reliance on timestamp synchronized cycles may pose
   undesirable challenges with the required accuracy of PTP in high

Eckert & Bryant         Expires January 13, 2022               [Page 21]

Internet-Draft          bounded-latency-problems               July 2021

   speed network and especially limits [CQF] ability to support wider-
   scale networks due to DT.

7.  Candidate solution directions

   As this memo outlines, per-hop, per-flow stateless forwarding is the
   one core requirement for to support Gbps speed metropolitan or wide-
   area networks.

   This section gives an overview and evaluation from the perspective of
   the authors of this memo of currently known non-standardized
   proposals for per-hop-stateless forwarding with the explicit goal
   and/or possibility of bounded latency forwarding and in relationship.
   to the concerns and desires described in the previous sections.

7.1.  Packet tagging based CQF

   To overcome the challenges outlined in Section 6.3,
   [I-D.qiang-DetNet-large-scale-DetNet] and
   [I-D.dang-queuing-with-multiple-cyclic-buffers] (tagged-CQF) propose
   a modified [CQF] mechanism in which timestamp based cycle indication
   of [CQF] is replaced by indicating the senders cycle in an
   appropriate packet header field, so that the receiver can accordingly
   map the received packet to the right local cycle.

   This approach completely eliminates the link-latency as a factor
   impacting the effectiveness of the mechanism, because in this
   approach, the link latency does not impact the DT.  Instead the link
   latency is used to calculate which cycle from the sender needs to be
   mapped to which cycle on the receiver, and this is programmd during
   setup of links into the receiving routers cycle mapping table.

   Depending on the number of cycles configured, it is also possible to
   compensate for variability in the link-latency and higher MTIE
   (picture TBD).  If one more cycle is used for example, this would
   allow for MTIE to be the order of one cycle time as opposed to a
   likely target of 1% of cycle time as in [CQF], reducing the required
   PTP clock accuracy by a factor of 100.  This possible reduction in
   required accuracy of operations by appropriate configuration does not
   only cover PTP but also extends into any forwarding operation within
   the nodes, e.g.: it could also reduce the cost of implementation of
   forwarding hardware at higher speeds accordingly.

   In MPLS networks, packet tagged CQF with a small number of cycle tags
   (such as 3 or 4) could easily be realized and standardized by relying
   on E-LSP where 3 or 4 EXP code points would be used to indicate the
   cycle value.  Given how such deterministic bounded latency traffic is
   not subject to congestion control, it also does not require

Eckert & Bryant         Expires January 13, 2022               [Page 22]

Internet-Draft          bounded-latency-problems               July 2021

   additional ECN EXP code points, so those would be available for e.g.:
   best-effort traffic that should use the same E-LSP.

7.2.  Packet tagging based CQF with SR

   [I-D.chen-DetNet-sr-based-bounded-latency] applies the taged-CQF
   mechanisms to Segment Routing (SR) by proposing SR style header
   elements to indicate the per-segment/hop cycle.  This eliminates the
   need to set up on every hop a cycle mapping table.

   It is unclear to the authors of this memo how big a saving this is
   given how the PCE would need to update all the ingress router per-
   flow configurations where header imposition happens when links
   change, whereas the mapping table approach would require only
   localized changes on the affected routers.

7.3.  Per-hop latency indications for Segment Routing

   [I-D.stein-srtsn] describes a mechanism in which a source-routed
   header in the spirit of a Segment Routing (SR) header can be used to
   enable a per-transit-hop per-flow stateless latency control.  For
   every hop, a maximum latency is specified.  The draft outlines a
   control plane which similarly to packet tagging based CQF or
   [TSN-ATS] would put the work of admitting flows, determining their
   paths and admitting their resources along those paths to some form of

   The basic principle of forwarding in this proposal is to put received
   packets into a priority heap and schedule them in order of their
   urgency (shortest latency) for this hop.

   The draft explicitly does not prescribes specific algorithms on the
   forwarders to take the indicated latency for the hop into account in
   a way that the controller can calculate the resource availability,
   such as specific queuing or scheduling algorithms.

   It is not entirely clear to the authors of this memo, if the sole
   indication of such deadline latencies is sufficient to completely
   eliminate per-transit-hop, per-flow state and still achieve
   deterministic latency because of the [UBS] work.  Consider that the
   packets latency for a hop could be used to derive a priority queue on
   the hop relative to other packets with higher or lower latency for
   this hop,

   As was shown in the research work leading up to [TSN-ATS], the
   priority queuing on each hop alone is not sufficient to achieve a
   simple, solely per-hop calculated latency bound under high load
   because of the problem of multi-hop burst aggregation and the

Eckert & Bryant         Expires January 13, 2022               [Page 23]

Internet-Draft          bounded-latency-problems               July 2021

   resulting hard to calculate incurred upper latency bound.  To
   overcome that calculation issue, shapers or as in [TSN-ATS] their
   optimization, interleaved regulators, are used in [TSN-ATS] and GS.
   Shapers/interleaved requires to maintain across packets from the same
   flow per-flow state.

   Nevertheless, appropriate mathematical models for SDN controllers may
   be possible to develop deterministic per-hop forwarding models
   relying not only on the per-hop indicated latency but also on
   additional constraints such as limited number of hops or sufficiently
   low degrees of maximum admitted amount of traffic.  Or else this may
   be used for to be developed latency models that are not 100%
   deterministic, but close enough in probability such that the amount
   of late packets would be in the same order as otherwise unavoidable
   problems such as BER based packet loss.

   To that end, the author of [I-D.stein-srtsn] has conducted
   simulations of the proposed mechanism, contrasting it with other
   mechanisms.  These results, which will be published elsewhere, show
   hat this mechanism excels in cases with high load and a small number
   of flows with tight budgets.  However, some small percentage of
   packets will miss their end-to-end latency bounds, and must be
   treated as lost packets.

   Depending on the algorithms chosen, solutions may or may not rely on
   strong, weak, or no clock synchronization across nodes.

7.4.  Latency Based Forwarding

   "High-Precision Latency Forwarding over Packet-Programmable
   Networks", NOMS 2020 conference [LBF] describes a framework for per-
   transit-hop, per-flow stateless forwarding based on three packet
   parameters: The minimum and maximum desired end-to-end latency, set
   by the sender and not changed by the network, and the experienced
   latency updated by every hop.  Routers supporting this LBF mechanism
   do also extend their routing (e.g.: IGP) to be able to calculate the
   non-queueing latency towards the destination.  Based on the in-packet
   parameters and the future latency prediction are used to prioritize
   packets in queuing including giving them higher priority when they
   are late due to prior hop incurred latency, or delaying them when
   they are too early.

   LBF was started as a more fundamental research into how application
   experience could be improved when they are allowed to indicate such
   differential min/max latency Service Level Objectives (SLO).
   Benefits include the ability to compensate for prior hop incurred
   queuing latency, but also to automatically prioritize packets on a

Eckert & Bryant         Expires January 13, 2022               [Page 24]

Internet-Draft          bounded-latency-problems               July 2021

   single hop based on their future path length, all without the need
   for any explicit admission control.

   The LBF algorithm is completely without need for clock
   synchronization across nodes.  Instead, it assumes mechanisms to know
   or learn link latency and the remaining latencies (as defined in the
   DetNet architecture) can be calculated locally (e.g.: latency through
   a forwarder).

   The authors have not yet tried to define a mathematical model that
   would allow to derive completely deterministic behavior for this
   original LBF algorithm in conjunction with a PCE/SDN controller.  Due
   to the absence of per-flow (shaper/interleaved-regulator), the
   authors believe that deterministic solutions would as outlined above
   for SRTSN (Section 7.3) likely only be possible under additional
   assumed constraints.

8.  Conclusions

   Bounded Latency for DetNet have been designed by trying to adopt
   solutions developed either several decades ago (GS) or recently for
   limited scope and scale L2 networks [TSN-ATS].

   To allow DetNet solutions to explore opportunities in larger speed &
   scale shared network infrastructures, both private and service
   provider networks, it is highly desirable for DetNet WG (and/or other
   IETF WGs claiming responsibility in conjunction with DetNet as the
   driver) to explore the opportunities to standardize additional, and
   in the opinion of the authors better per-hop forwarding models in
   support of (near) deterministic bounded latency by mean of
   standardizing per-flow stateless/"DiffServ" style per-hop forwarding
   behavior (PHB) with appropriate network packet header parameters.

9.  Security Considerations

   This document has no security considerations (yet?).

10.  IANA Considerations

   This document has no IANA considerations.

11.  Acknowledgements

   Thanks for Yaakov Stein for reviewing and proposing text for
   Section 7.3.

Eckert & Bryant         Expires January 13, 2022               [Page 25]

Internet-Draft          bounded-latency-problems               July 2021

12.  Informative References

   [CQF]      IEEE Time-Sensitive Networking (TSN) Task Group., "IEEE
              Std 802.1Qch-2017: IEEE Standard for Local and
              Metropolitan Area Networks -- Bridges and Bridged Networks
              -- Amendment 29: Cyclic Queuing and Forwarding", 2017.

   [DNBL]     Finn, N., Boudec, J. L., Mohammadpour, E., Zhang, J.,
              Varga, B., and J. Farkas, "DetNet Bounded Latency", draft-
              ietf-detnet-bounded-latency-06 (work in progress), May

              Chen, M., Geng, X., and Z. Li, "Segment Routing (SR) Based
              Bounded Latency", draft-chen-DetNet-sr-based-bounded-
              latency-01 (work in progress), May 2019.

              Liu, B. and J. Dang, "A Queuing Mechanism with Multiple
              Cyclic Buffers", draft-dang-queuing-with-multiple-cyclic-
              buffers-00 (work in progress), February 2021.

              Eckert, T., Cauchie, G., and M. Menth, "Tree Engineering
              for Bit Index Explicit Replication (BIER-TE)", draft-ietf-
              bier-te-arch-10 (work in progress), July 2021.

              Qiang, L., Geng, X., Liu, B., Eckert, T., Geng, L., and G.
              Li, "Large-Scale Deterministic IP Network", draft-qiang-
              DetNet-large-scale-DetNet-05 (work in progress), September

              Stein, Y. (., "Segment Routed Time Sensitive Networking",
              draft-stein-srtsn-00 (work in progress), February 2021.

   [LBF]      Clemm, A. and T. Eckert, "High-Precision Latency
              Forwarding over Packet-Programmable Networks", IEEE 2020
              IEEE/IFIP Network Operations and Management Symposium
              (NOMS 2020), doi 10.1109/NOMS47738.2020.9110431, April

   [RFC2205]  Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, DOI 10.17487/RFC2205,
              September 1997, <https://www.rfc-editor.org/info/rfc2205>.

Eckert & Bryant         Expires January 13, 2022               [Page 26]

Internet-Draft          bounded-latency-problems               July 2021

   [RFC2210]  Wroclawski, J., "The Use of RSVP with IETF Integrated
              Services", RFC 2210, DOI 10.17487/RFC2210, September 1997,

   [RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
              Network Element Service", RFC 2211, DOI 10.17487/RFC2211,
              September 1997, <https://www.rfc-editor.org/info/rfc2211>.

   [RFC2212]  Shenker, S., Partridge, C., and R. Guerin, "Specification
              of Guaranteed Quality of Service", RFC 2212,
              DOI 10.17487/RFC2212, September 1997,

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, DOI 10.17487/RFC2475, December 1998,

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
              Label Switching Architecture", RFC 3031,
              DOI 10.17487/RFC3031, January 2001,

   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
              and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
              Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001,

   [RFC6513]  Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/
              BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February
              2012, <https://www.rfc-editor.org/info/rfc6513>.

   [RFC8279]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Przygienda, T., and S. Aldrin, "Multicast Using Bit Index
              Explicit Replication (BIER)", RFC 8279,
              DOI 10.17487/RFC8279, November 2017,

   [RFC8296]  Wijnands, IJ., Ed., Rosen, E., Ed., Dolganow, A.,
              Tantsura, J., Aldrin, S., and I. Meilik, "Encapsulation
              for Bit Index Explicit Replication (BIER) in MPLS and Non-
              MPLS Networks", RFC 8296, DOI 10.17487/RFC8296, January
              2018, <https://www.rfc-editor.org/info/rfc8296>.

   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
              July 2018, <https://www.rfc-editor.org/info/rfc8402>.

Eckert & Bryant         Expires January 13, 2022               [Page 27]

Internet-Draft          bounded-latency-problems               July 2021

   [RFC8570]  Ginsberg, L., Ed., Previdi, S., Ed., Giacalone, S., Ward,
              D., Drake, J., and Q. Wu, "IS-IS Traffic Engineering (TE)
              Metric Extensions", RFC 8570, DOI 10.17487/RFC8570, March
              2019, <https://www.rfc-editor.org/info/rfc8570>.

   [RFC8578]  Grossman, E., Ed., "Deterministic Networking Use Cases",
              RFC 8578, DOI 10.17487/RFC8578, May 2019,

   [RFC8655]  Finn, N., Thubert, P., Varga, B., and J. Farkas,
              "Deterministic Networking Architecture", RFC 8655,
              DOI 10.17487/RFC8655, October 2019,

   [RFC8660]  Bashandy, A., Ed., Filsfils, C., Ed., Previdi, S.,
              Decraene, B., Litkowski, S., and R. Shakir, "Segment
              Routing with the MPLS Data Plane", RFC 8660,
              DOI 10.17487/RFC8660, December 2019,

   [RFC8964]  Varga, B., Ed., Farkas, J., Berger, L., Malis, A., Bryant,
              S., and J. Korhonen, "Deterministic Networking (DetNet)
              Data Plane: MPLS", RFC 8964, DOI 10.17487/RFC8964, January
              2021, <https://www.rfc-editor.org/info/rfc8964>.

   [RFC8986]  Filsfils, C., Ed., Camarillo, P., Ed., Leddy, J., Voyer,
              D., Matsushima, S., and Z. Li, "Segment Routing over IPv6
              (SRv6) Network Programming", RFC 8986,
              DOI 10.17487/RFC8986, February 2021,

   [TSN-ATS]  Specht, J., "P802.1Qcr - Bridges and Bridged Networks
              Amendment: Asynchronous Traffic Shaping", IEEE , July
              2020, <https://1.ieee802.org/tsn/802-1qcr/>.

   [UBS]      Specht, J. and S. Samii, "Urgency-Based Scheduler for
              Time-Sensitive Switched Ethernet Networks", IEEE 28th
              Euromicro Conference on Real-Time Systems (ECRTS), 2016.

Authors' Addresses

   Toerless Eckert
   Futurewei Technologies USA
   2220 Central Expressway
   Santa Clara  CA 95050

   Email: tte@cs.fau.de

Eckert & Bryant         Expires January 13, 2022               [Page 28]

Internet-Draft          bounded-latency-problems               July 2021

   Stewart Bryant
   Stewart Bryant Ltd

   Email: sb@stewartbryant.com

Eckert & Bryant         Expires January 13, 2022               [Page 29]