TSVWG Working Group
INTERNET-DRAFT                                              L. Westberg
                                                          Z. R. Turanyi
                                                             D. Partain
                                                               A. Bader
                                                               Ericsson

                                                   Georgios Karagiannis
                                        University of Twente / Ericsson

                                                       December 1, 2005




                   Load Control of Real-Time Traffic
                     draft-westberg-loadcntr-04.txt




Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 1, 2006.







Westberg, et. al.                                               [Page 1]


Internet Draft                                              Load Control


Abstract

   There is increased interest in simple and scalable resource
   provisioning solutions for Diffserv networks. This is an updated
   version of an earlier draft (the previous version was submitted in
   April 2000) describing a concept called Load Control.

   Load Control addresses the following issues:

     1. Admission control for real time data flows in stateless domains

     2. Dropping of flows in case of exceptional events, such as
        severe congestion after re-routing

   Admission control in a stateless domain can be a measurement-based
   access control, whereby a probe packet is sent along the forwarding
   path in a network to determine whether a flow can be admitted based
   upon the current congestion state of the network. If a
   measurement-based method is not sufficient, a lightweight
   reservation of a certain amount of network resources can be
   performed.

   Load Control uses two-bit markers in packet headers to carry load
   information from core routers to edge devices. The scheme provides
   the capability of controlling the traffic load in the network
   without requiring signaling or any per-flow processing in the
   core routers. The complexity of Load Control is kept to a minimum
   to make implementation simple.


   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].




Table of Contents

1 Background and Motivation .......................................    4
2 Overview ........................................................    5
3 Operation of Load Control .......................................    5
3.1 Simple Marking ................................................    6
3.2 Unit-based Reservations .......................................    7
3.3 Multiple Unit reservation .....................................    8
3.4 Codepoints for Flow Types .....................................    9
4 Objects for Standardization .....................................    9
4.1 Packet Types ..................................................    9
4.2 Coding of Packet Types ........................................   10
4.3 Behavior Description ..........................................   10
4.3.1 Behavior of the Core Routers ................................   10
4.3.2 Behavior of the Edge Devices ................................   11
5 Interworking with RSVP/Intserv ..................................   11
6 Security Considerations .........................................   12
7 Identification of Edge Nodes ....................................   12
8 IANA Considerations .............................................   12
9 Informative References ..........................................   13
 Appendix A. Admission Precision of Simple Marking ................   14
 Appendix B. Effect of Delays on Admission ........................   14
 Appendix C. A Simple Algorithm for Core Routers ..................   15
 Appendix D. Simulation Results ...................................   16
 D.1 Simple Marking ...............................................   17
 D.1.1 Constant Bit-Rate Sources ..................................   17
 D.1.2 On/Off Sources .............................................   17
 D.1.3 The Router Algorithm .......................................   18
 D.2 Unit-Based Reservations ......................................   18
 Appendix E: Marking using ECN bits ...............................   19


1.  Background and Motivation

   The amount of traffic carried on the Internet is now greater than the
   traffic on the world's telephony network. Still, Internet-based
   communication services generate less income than plain old telephony
   services. Enabling value-added services over the Internet is
   therefore crucial for service providers. One significant class of
   such value-added services requires real-time packet transportation.
   It can be expected that these real-time services will be popular as
   they replicate or are natural extensions of existing communication
   services like telephony.  Exact and reliable resource management
   (e.g., admission control) is essential for achieving high utilization
   in networks with real-time transportation capabilities. The problem
   is difficult mainly due to scalability issues.

   With the introduction of differentiated services (DS) [RFC2475], it
   is now possible to provide large scale, real-time services. The basic
   idea of DiffServ is that, rather than classifying packets at each
   router, packets are only classified at the edge devices.  The result
   - the required packet treatment - is stored and carried in the packet
   headers, and core routers can carry out appropriate scheduling.

   The current definition of DiffServ, however, does not contain any
   simple, scalable solution to the problem of resource provisioning and
   control. A number of approaches to solving the problem already exist
   [Berson97, Guerin97, Stoica99, Bernet99].  The scheme presented in
   this document does not require any state aggregation and aims at
   extreme simplicity and low cost of implementation along with good
   scaling properties. Load control operates edge-to-edge in a DS
   domain, or between two RSVP or NSIS capable routers, where only the
   edge devices keep flow state and do per-flow processing.  The main
   purpose of Load Control is to provide a simple and scalable solution
   to the resource provisioning problem.

   Note that the last version of this draft was submitted in April 2000.
   The original Load Control concept has since been developed further
   into a signaling concept named Resource Management in Diffserv
   (RMD).  RMD was adopted by the NSIS working group, where the
   protocol details were worked out for using NSIS as the external
   protocol [RMD]. Recently, new drafts have been submitted that aim to
   standardize a new Diffserv PHB and define an architecture providing
   controlled load services in Diffserv domains [CL-Diffs, CL-PHB].
   These proposals are very similar to the two-bit marking scheme of
   Load Control. The major differences are that, in those proposals,
   admission control is based on the marking of data packets, i.e.,
   without sending probe packets, and that they propose marking of the
   ECN field.  This document aims to develop a common framework that
   could be used with both RSVP and NSIS as external protocols.


2.  Overview

   Load control is achieved by two actions: admission control of
   incoming requests and the dropping of admitted flows in case of
   exceptional events such as link failures.  Load Control uses
   two-bit markers in the probe packet headers to gather information
   about the load level along various paths through the network.  In
   addition, the core routers are able to mark passing packets to signal
   the exhaustion of resources to the edge devices.

   For admission control, the resource state of core routers is gathered
   by sending a specially marked packet, denoted a "probe" packet, from
   the ingress to the egress edge device.  The probe result is then used
   by the ingress to decide flow acceptance or rejection and to set up
   traffic conditioning/policy.  If rigid admission control is required,
   soft-state based reservations are also supported. In this case the
   probe packet does both the probing and allocation of resources along
   the path. The latter method is comparable to signaling based schemes
   but does not require processing of signaling messages in the core
   routers.

   Under normal circumstances, admission control is enough to control
   the load in the network. Nevertheless, when exceptional events (such
   as link failures) cause too much traffic to be re-routed over a link,
   the resulting severe congestion may degrade the quality of all of the
   flows on the link. In that case, the best solution might be to keep
   existing flows and suffer the loss of quality. However, for some
   services, it may be desirable to drop some of the previously admitted
   flows to protect the quality of the remaining flows. Thus, when
   severe congestion occurs, the core routers mark the headers of all
   (not only probe) packets to notify the edge devices of the congestion
   condition.

   In the following sections, we assume a DS (DiffServ) domain where
   connection requests arrive at the edges of the domain via RSVP, via
   NSIS QoS NSLP, at the request of a Bandwidth Broker, or by other
   means.
   The requests may be generated directly at the edge by a gateway,
   which provides connection to other types of networks, or in hosts
   that are connected directly to the domain.


3.  Operation of Load Control

   The Load Control scheme has two modes of operation:

      a) 'Simple marking':  This refers to a measurement-based admission
      scheme where routers measure the traffic volume and base the
      marking on these results.


      b) 'Unit-based reservations':  A "unit" represents a share of
      bandwidth in the network that could be reserved by the edge
      devices.  This mode makes it possible to perform resource
      reservations, independently of the amount of traffic that is
      actually transmitted.

   Both modes can perform admission control of incoming requests and
   indicate exceptional events.

   In the appendices, we present some analysis of Load Control
   properties, but a more detailed investigation can be found in
   [Tur99].


3.1.  Simple Marking

   The idea of simple marking is that core routers measure the traffic,
   and, if they encounter near exhaustion of resources, they mark
   passing probe packets and thereby notify the edge devices of the lack
   of resources.

   The scheme has the following steps of operation:

      1) Resource Probing: Before establishing the flow, the initiating
      edge device sends a probe packet into the network.  The probe
      packet passes through the same routers as the actual traffic will
      pass through (or at least does so with a high degree of
      probability) and is exposed to the marking function in each
      router. The marking function performs an OR operation on the
      router's own status and the status of the incoming probe packet
      (a packet, once marked, must not be changed).  When the packet
      reaches the egress edge device, its header will reflect the
      aggregated resource status along that path.

      2) Send resource status to ingress: When the egress edge device
      receives the probe packet, it copies the marker from the header to
      the header/payload of a reverse packet and sends it back to the
      initiating party (the ingress edge device). The probe packet may
      be discarded, converted to an ordinary data packet, or
      encapsulated (as mentioned above) and sent to the ingress edge
      device. The packet containing the probing result can also serve as
      a probe packet for the reverse path. This allows the initiating
      party to check for bi-directional resources.

      3) Acceptance/Rejection: The report packet is returned to the
      initiating ingress edge device, which uses the result of the probe
      to admit or block the request by setting up appropriate packet
      filtering, measuring, and marking rules.


      4) Reaction to exceptional events: If a core router detects severe
      congestion on an interface, it starts marking the data packets on
      that interface. If the egress edge device receives a marked packet
      which is not a probe packet, this can be interpreted as a sign of
      severe congestion along the path.  The fact that the incoming
      marked packet was not sent as a probe packet can be determined
      from the packet content, by multi-field classification or by
      checking the admittance state at the egress edge device.  If
      severe congestion occurs, a signaling message can be sent to the
      ingress edge device, which can then take the appropriate action,
      e.g. preempting active flows between the corresponding ingress and
      egress edge.

      The marking process of interior routers should be a standard
      method. The simplest solution is to mark all packets. If,
      instead, the number of marked packets is proportional to the
      overload or to the number of lost packets, the egress edge device
      receives information about the extent of the overload and can
      terminate as many flows as necessary to restore normal operation.
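
   As an illustration only, steps 1) to 3) above can be sketched as
   follows. The names CoreRouter, probe_path and admit are hypothetical
   and not part of this document; the sketch assumes that each router
   already knows, from its own measurements, whether it is near
   exhaustion.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CoreRouter:
    """Hypothetical core router; near_exhaustion is its measured status."""
    name: str
    near_exhaustion: bool = False

    def mark(self, probe_marked: bool) -> bool:
        # OR of the router's own status and the incoming probe status,
        # so a probe that is already marked stays marked.
        return probe_marked or self.near_exhaustion


def probe_path(path: Sequence[CoreRouter]) -> bool:
    """Steps 1 and 2: marker seen by the egress after the probe crosses the path."""
    marked = False
    for router in path:
        marked = router.mark(marked)
    return marked


def admit(path: Sequence[CoreRouter]) -> bool:
    """Step 3: the ingress admits the flow only if the probe stayed unmarked."""
    return not probe_path(path)
```

   A single router near exhaustion anywhere on the path is enough to
   cause rejection, which is the intended effect of the OR operation.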

   To make the scheme more robust against packet loss, the initiating
   edge device MAY maintain a timer associated with each probe packet.
   If a probe packet is lost, the device simply re-transmits on time-
   out.  How often and how many times the probe packet should be
   retransmitted before failure is declared is an implementation issue,
   but these parameters SHOULD be configurable (e.g., via an SNMP MIB).
   Furthermore, whether probes are retransmitted at all SHOULD be
   configurable.
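
   A minimal sketch of such a retransmission policy is given below. The
   function probe_with_retries and its parameters are hypothetical;
   send_probe stands for whatever mechanism an implementation uses to
   send one probe and wait for the report until the timer expires.

```python
from typing import Callable, Optional


def probe_with_retries(send_probe: Callable[[], Optional[bool]],
                       max_retries: int = 3,
                       retransmit: bool = True) -> Optional[bool]:
    """Return True (marked) or False (unmarked), or None if probing failed.

    send_probe is a hypothetical callable: it sends one probe, waits up
    to the time-out, and returns None when no report arrived in time.
    """
    # Both the retry count and whether to retransmit at all are configurable.
    attempts = 1 + (max_retries if retransmit else 0)
    for _ in range(attempts):
        result = send_probe()
        if result is not None:
            return result
    return None  # failure is declared after the last time-out
```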


3.2.  Unit-based Reservations

   While measurement-based admission control has important advantages
   over non-measurement based algorithms, it has disadvantages as well.
   Unit-based reservations allow the sources to keep their reservations
   irrespective of the volume of the traffic they transmit.  Although
   the admission scheme is very similar to the simple marking case, the
   presence of actual reservations is a fundamental difference.

   Each flow can occupy any number of units of resources, and even
   fractions of units by allowing a number of flows to share a common
   resource unit.  The unit is not necessarily a simple bandwidth value:
   it may be defined in terms of any resource unit (e.g., effective
   bandwidth) to support statistical multiplexing at packet level
   (e.g., use of silence periods). The definition of the unit may vary
   from network to network and is outside the scope of this document.
   The basic idea of unit-based reservation is to allow the edge
   devices to periodically mark some of the data packets to refresh
   the resource reservation.
   Each refresh packet reserves one unit of resources for one refresh
   period. Reservations are timed out after a refresh period and have to
   be refreshed in a soft state manner.  The length of the refresh
   period must be the same throughout the DS domain and SHOULD be
   configurable.



   Core routers estimate the number of reservations by counting the
   number of refresh packets during a refresh interval. If the router
   runs out of units, it goes into blocking state, starts to mark probe
   packets indicating congestion and thereby rejects new flows.  The
   probe packets that pass the router unmarked and the refresh packets
   reserve one unit of resources for the following refresh period.
   (Editor note:  It is clear that we need to have the capability of
   reserving more than one unit, but it is not yet clear how that will
   be encoded in the packet header.  See below.) Thus, after the probe
   packet has passed along the path unmarked, the ingress edge device is
   required to send the first reservation refresh packet during the next
   refresh period.

   If a flow occupies more than one unit, more than one probe packet may
   be sent to allocate the required number of resources (an alternative
   using only one packet should be defined).  Similarly, more than one
   refresh packet must be sent for such a flow. By proper definition of
   the unit, a wide range of flows can be described and handled using
   this simple mechanism.

   If a probe packet was forwarded unmarked by a core router, but was
   marked later downstream, that core router will not be notified and
   will incorrectly maintain the reservation. However, as the flow is
   rejected, no refresh packets will arrive, and the reservation will
   time out at the end of the refresh period and will be released.

   Severe congestion is handled in the same way as in 'Simple marking'
   (see Section 3.1).

   If a refresh packet is lost, the downstream routers will
   underestimate the number of reserved units. Refresh and probe packets
   should therefore be protected from losses in the manner described
   above.

   Core routers estimate the number of allocated units by counting the
   number of refresh packets during a refresh period.  The accuracy of
   the estimate can be increased by generating refresh packets evenly
   spread in time over the refresh period. This minimizes errors
   resulting from time alignment differences between routers and edge
   devices.
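
   The counting described in this section can be sketched as follows.
   This is an illustration under stated assumptions, not the algorithm
   of Appendix C: the class UnitRouter and its method names are
   hypothetical, and the sketch assumes that the first refresh packet
   of a newly admitted flow arrives in the refresh period after its
   probe, as required above.

```python
class UnitRouter:
    """Hypothetical core router keeping only aggregate counters."""

    def __init__(self, capacity_units: int) -> None:
        self.capacity_units = capacity_units
        self.reserved = 0    # estimate carried over from the last period
        self.refreshes = 0   # refresh packets seen in the current period
        self.new_probes = 0  # probes passed unmarked in the current period

    def on_refresh(self) -> None:
        # Each refresh packet reserves one unit for one refresh period.
        self.refreshes += 1

    def on_probe(self, marked: bool) -> bool:
        """Return the probe's marker after it has passed this router."""
        if marked:
            return True  # rejected upstream, nothing is reserved here
        if self.reserved + self.new_probes >= self.capacity_units:
            return True  # blocking state: mark the probe
        self.new_probes += 1  # an unmarked probe reserves one unit
        return False

    def end_of_period(self) -> None:
        # Reservations that were not refreshed time out here.
        self.reserved = self.refreshes + self.new_probes
        self.refreshes = 0
        self.new_probes = 0
```

   A probe that passes this router unmarked but is rejected downstream
   occupies a unit only until the end of the following period, because
   no refresh packet arrives for it.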


3.3.  Multiple Unit reservation

   In some cases it might be feasible to add functionality for
   reserving several units in a single reservation request.  Semantics
   similar to those of the two-bit reservation scheme could be used to
   provide such functionality, but this would of course require the
   addition of an integer value denoting the number of units.

   The coding of such a proposal is still under discussion and needs to
   be studied further.


3.4.  Codepoints for Flow Types

   In both variants of Load Control, routers making marking decisions
   have very little information about the resource or QoS requirement of
   the flow in question. The DS field of the probe packet can be used to
   indicate the DiffServ class the flow will arrive on and thus the QoS
   requirements.  The marking function of core routers can take the
   required PHB into account when deciding on the marking.

   Information on the resource requirements for incoming flows can also
   be expressed using the DS field by dividing real-time traffic into
   classes based on resource requirements and using different codepoints
   for different classes. If the DSCPs denote not only the PHB that the
   flow is to receive, but implicitly also the bandwidth requirements
   for the flow, core routers will be able to mark packets more
   intelligently, resulting in less resource waste and greater
   flexibility.

   In the unit-based case, the major benefit is that the size of the
   unit can be different in different classes, making it possible to
   allocate resources with finer granularity.
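
   The granularity benefit can be illustrated with a small calculation.
   The class names and unit sizes below are purely hypothetical values
   chosen for the example; this document does not define them.

```python
import math

# Hypothetical classes and unit sizes in kbit/s (illustrative values only).
UNIT_SIZE_KBPS = {"small-unit-class": 8, "large-unit-class": 64}


def units_needed(flow_kbps: int, traffic_class: str) -> int:
    """Number of whole units a flow must reserve in the given class."""
    return math.ceil(flow_kbps / UNIT_SIZE_KBPS[traffic_class])


def over_allocation_kbps(flow_kbps: int, traffic_class: str) -> int:
    """Bandwidth reserved beyond what the flow actually requires."""
    reserved = units_needed(flow_kbps, traffic_class) * UNIT_SIZE_KBPS[traffic_class]
    return reserved - flow_kbps
```

   Under these assumed values, a 12 kbit/s flow reserves two units and
   over-allocates 4 kbit/s in the class with 8 kbit/s units, but
   reserves one unit and over-allocates 52 kbit/s in the class with
   64 kbit/s units.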


4.  Objects for Standardization

   A forthcoming standard might only include the encoding of the Load
   Control information into the IP header and some design
   recommendations.


4.1.  Packet Types

   We need four types of packets in the algorithm:

      - Ordinary Packet (OP)
      - Probe Packet (PP)
      - Marked Packet (MP)
      - Refresh Packet (RP)

   During transport through the network, a probe packet can be changed
   to a marked packet. This indicates that at least one router does not
   accept the reservation associated with the probe packet.

   ------       Rejection       ------
   | PP |---------------------->| MP |
   ------                       ------

   An ordinary packet can also be changed to a marked packet, meaning
   that some exceptional event caused severe congestion on one link of
   the path the packet took.


   ------  Severe Congestion   ------
   | OP |---------------------->| MP |
   ------                       ------

   In the simple marking scheme, only three packet types are used.
   Refresh packets are treated as ordinary packets, except
   that these packets cannot be changed to marked (MP) packets.


4.2.  Coding of Packet Types

   We have two alternative solutions for storing Load Control related
   information in the packet headers: using new DS codepoints or using
   the two currently unused bits (intended for ECN) in the DS byte.  The
   latter case is only considered in Appendix E.

   In the first alternative (where PHBs are intended to be used together
   with Load Control), two or three new codepoints would have to be
   defined for probe, marked and (optionally) refresh packets. For
   example, in the case of the EF PHB, in addition to the codepoint used
   for the EF packets, EF-probe, EF-marked and EF-refresh packets can
   also be sent. The new codepoints can be drawn from the LU/EXP space.


4.3.  Behavior Description

   The behavior of the edge devices depends greatly on the application
   or signaling protocol that uses the load control scheme. Below we
   only describe the few aspects of the edge device behavior that are
   necessary for inter-working with the core routers.


4.3.1.  Behavior of the Core Routers

   All core routers continuously maintain a state of accepting or
   rejecting more flows.  If the state is accepting, the router passes
   all packets unchanged. If the state is rejecting, then the router
   changes the marking of incoming packets from probe (PP) to
   marked (MP).

   If the router is capable of detecting severe congestion, and this
   occurs, then the router forwards both ordinary (OP) and probe (PP)
   packets as marked (MP).  The router MUST NOT change the marking of
   refresh packets (RP).

   Addition for Unit-based Reservations:

      The router uses the refresh and probe markers in packets to
      maintain its estimation of reserved resources. A refresh packet
      signals previously admitted resource usage, while a probe packet
      signals a new request. When passed unmarked, both types of packets
      reserve one unit for one refresh period.
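
   The re-marking rules of this section can be summarized in a short
   sketch. The enumeration PacketType and the function forward are
   hypothetical names used for illustration; how a router decides that
   it is accepting, rejecting, or severely congested is left open here.

```python
from enum import Enum


class PacketType(Enum):
    OP = "ordinary"
    PP = "probe"
    MP = "marked"
    RP = "refresh"


def forward(ptype: PacketType, accepting: bool,
            severe_congestion: bool = False) -> PacketType:
    """Return the packet type with which a core router forwards a packet."""
    if ptype is PacketType.RP:
        return PacketType.RP  # refresh packets MUST NOT be re-marked
    if severe_congestion and ptype in (PacketType.OP, PacketType.PP):
        return PacketType.MP  # severe congestion: mark ordinary and probe packets
    if not accepting and ptype is PacketType.PP:
        return PacketType.MP  # rejecting state: only probes are marked
    return ptype  # otherwise the packet passes unchanged
```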



4.3.2.  Behavior of the Edge Devices

   When a new reservation is needed, the ingress edge device should send
   the appropriate number of packets marked as probe.
   If the egress edge device receives a probe packet that is marked,
   this means that the network has insufficient capacity along the path
   between the two edge devices. The egress edge device should take care
   of blocking the flow by notifying the ingress device.  If the egress
   device receives marked (MP) packets that were not originally sent as
   probe packets, it shall use a policy function to calculate the
   number of admitted flows that have to be terminated. This
   information shall be sent towards the ingress device so that it can
   terminate these admitted flows. Whether a marked packet was
   originally sent as a probe packet can be determined from the packet
   content, by multi-field classification of the IP header, or by
   checking the admittance state at the egress edge device.
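
   One possible policy function is sketched below. It is only an
   example and rests on two assumptions that this document does not
   mandate: that core routers mark data packets in proportion to the
   overload, and that all flows between the ingress and egress pair
   send at roughly the same rate. The function name
   flows_to_terminate is hypothetical.

```python
def flows_to_terminate(active_flows: int, marked_packets: int,
                       total_packets: int) -> int:
    """Estimate how many admitted flows to terminate.

    Assumes the fraction of marked data packets equals the fraction of
    traffic that exceeds capacity, and that flows have equal rates.
    """
    if total_packets <= 0 or marked_packets <= 0:
        return 0
    marked = min(marked_packets, total_packets)
    # Integer ceiling of active_flows * marked / total_packets.
    return -(-active_flows * marked // total_packets)
```

   For instance, with 100 active flows and 150 marked packets out of
   1000 observed, this policy would terminate 15 flows.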

    Addition for Unit-based Reservations:

      For the unit-based reservation scheme, the ingress edge device
      should generate the required number of refresh packets per refresh
      period and per flow. If there are not enough data packets to mark
      as refresh packets, the ingress device must generate dummy packets
      and mark those as refresh packets.  The generated refresh packets
      should be as uniformly distributed through the refresh interval as
      possible to minimize the effect of refresh interval timing
      differences between routers.
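
   A uniform distribution of refresh packets can be obtained, for
   example, as sketched below. The function refresh_offsets is a
   hypothetical helper; it only computes send times and leaves open
   whether a data packet or a dummy packet carries the refresh marker.

```python
from typing import List


def refresh_offsets(units: int, refresh_period: float) -> List[float]:
    """Offsets (in seconds from the period start) for the refresh packets.

    One refresh packet is sent per reserved unit, evenly spaced over
    the refresh period.
    """
    return [i * refresh_period / units for i in range(units)]
```

   For a flow holding four units and an assumed refresh period of 20
   seconds, the refresh packets would be sent at offsets 0, 5, 10 and
   15 seconds.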


5.  Interworking with RSVP/NSIS Intserv

   Load control can also be used in DiffServ regions (backbones) that
   connect RSVP/Intserv regions. This inter-operation is described in
   detail in [Bernet99]. For load control, border routers of the
   DiffServ region must be RSVP-aware in order to detect the arrival of
   new connections.

   RSVP PATH messages can be used as probe packets to gather congestion
   information along the path between the two border routers. When a new
   RSVP path state is installed at the egress border router, the
   collective admission state of the path (collected in the packet of
   the PATH message) is also stored. If a RESV message for the installed
   state arrives within a time period during which the congestion state
   can be considered valid, then the egress border router can perform
   the admission control for the DiffServ network as well. If the first
   RESV message arrives too late, then the egress border router MUST
   solicit a new (dummy) probe packet from the ingress router to
   determine the current congestion state.
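
   The decision taken by the egress border router when a RESV message
   arrives can be sketched as follows. The names PathState and on_resv,
   and the validity parameter, are assumptions made for illustration;
   this document does not specify how long the stored congestion state
   remains valid.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    marked: bool      # collective admission state carried by the PATH message
    stored_at: float  # time in seconds at which the state was installed


def on_resv(state: PathState, now: float, validity: float) -> str:
    """Return the action of the egress border router for an arriving RESV."""
    if now - state.stored_at > validity:
        # Stored state is too old: solicit a new (dummy) probe packet.
        return "reprobe"
    return "reject" if state.marked else "admit"
```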


   When the egress receives a marked packet that is neither a PATH
   message nor a dummy probe packet, this signals severe congestion
   along the path. The identity of the ingress router can easily be
   determined from the path state, but in this case the egress router
   can itself decide to drop certain reservations. The ingress router
   can be notified via ResvTear messages while the receiver end systems
   get ResvErr messages.

   RSVP routers can also be placed inside the domain. In this case,
   probing is performed between RSVP routers instead of edge devices.
   Thus, by adding a simple and cheap extension to non-RSVP capable
   routers, correct admission control becomes possible on non-RSVP
   capable parts of an end-to-end path.

   Unit-based reservations can also be used to provide resources in a DS
   domain that is used to provide VPN tunnels between customer sites.
   Using a load control scheme, it is fast and easy to modify the size
   of these tunnels. Thus, tunnel size selection can be a very dynamic
   process. Note that tunnels are not necessarily real-time tunnels.
   Packets of any DSCP can travel on them after receiving the
   appropriate PHB. Even best-effort tunnels can be reserved this way.
   Provisioning can be done on a per-DSCP basis or in aggregates as the
   service provider wishes.

   The inter-working with NSIS is discussed in detail in the RMD draft
   [RMD]. Since QoS-NSLP can be used for sender-initiated reservations,
   its Reserve messages can be used as probe packets.


6.  Security Considerations

   We propose using two-bit markers in packet headers (DS field) to
   reserve resources within a DiffServ domain. This poses similar
   security problems to the use of the DS field to differentiate packets
   in general [RFC2475].

   If the interior of the DS domain fully contains a tunnel, then by
   copying the outer marking into the inner header at de-encapsulation,
   load control can be exercised over the links of the tunnel as well.
   The procedure is similar to the one described in [RFC2481]. As IPSec
   [RFC2402, 2406] does not allow the copying of the DS field from the
   outer to the inner header at de-encapsulation, load control cannot be
   exercised over regions where IPSec tunnels are used.


7.  Identification of Edge Nodes

   In the absence of RSVP, an alternative method for identification of
   edge nodes will be required.  This section needs to be written.


8.  IANA Considerations


   To be included in later versions.


9.  Informative References

[RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
   (ESP)", RFC 2406, November 1998.

[Bernet99] Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L.,
   Speer, M., Braden, R., "Interoperation of RSVP/Intserv and Diffserv
   Networks", Work in Progress, March 1999

[Stoica99] Stoica, I., et al "Per Hop Behaviors Based on Dynamic Packet
   States", Work in Progress, February 1999

[Berson97] Berson, S. and Vincent, R., "Aggregation of Internet
   Integrated Services State", Work in Progress, December 1997.

[Guerin97] Guerin, R., Blake, S. and Herzog, S.,"Aggregating RSVP based
   QoS Requests", Work in Progress, November 1997.

[Gross99] Grossglauser, M., Tse, D. N. C., "A Time-Scale Decomposition
   Approach to Measurement-Based Admission Control", Infocom '99

[Tur99] Z. R. Turanyi, L. Westberg "Load Control: Lightweight
   Provisioning of Internet Resources" submitted to Networking 2000,
   Paris, May 2000, http://www.ericsson.co.hu/ethzrt/

[IAB-QoS] G. Huston (Internet Architecture Board), "Next Steps for the
   IP QoS Architecture", Work in Progress, March 2000.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", BCP 14, RFC2119, March 1997.

[RFC3168] Ramakrishnan, K., Floyd, S., Black, D., "The Addition of
   Explicit Congestion Notification (ECN) to IP", RFC3168, Sept. 2001.

[RMD] Bader, A., et al., "RMD-QOSM: An NSIS QoS Signaling
   Policy Model for Networks", Work in Progress, Oct. 2005

[QoS-NSLP] Manner, J., et al., "NSLP for Quality-of-Service
   Signaling," Work in progress, Oct. 2005

[CL-Diffs]  Briscoe, B. et al. "A Framework for Admission Control over
   DiffServ using Pre-Congestion Notification",
   draft-briscoe-tsvwg-cl-architecture-01.txt, Work in progress,
   Oct. 2005.

[CL-PHB]  Briscoe, B. et al. "The Controlled Load per hop behaviour"
   draft-briscoe-tsvwg-cl-phb-00.txt, Work in progress, July 2005

[Floyd05] Floyd, S., "Specifying Alternate Semantics for the Explicit
   Congestion Notification (ECN) Field", Internet-draft
   draft-floyd-ecn-alternates-02.txt, work in progress, August 2005.


Appendix A. Admission Precision of MBAC Simple Marking

   Simple marking is basically a measurement-based admission control
   scheme, where flows do not say anything about their traffic
   characteristics. In addition, flow departure is not signaled
   explicitly.

   When the network carries several types of flows with different
   bandwidth requirements, the core routers do not know the bandwidth
   requirement of an incoming flow. They simply declare whether they
   will accept more flows or not, irrespective of the bandwidth demand
   of the new flow. Thus the marking algorithm in the routers should
   conservatively always expect the largest type of flow that the
   network carries and start rejecting flows when there is not enough
   bandwidth left for one such flow.  On the positive side, this
   results in fair rejection among different flow types, but on the
   negative side, some bandwidth is wasted.  However, if the links of
   the domain can carry at least several hundred flows even of the
   most bandwidth-demanding type, then this waste is not significant.
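
   As a rough illustration of the last point, the idle capacity caused
   by conservative marking is at most the rate of one flow of the
   largest type. The sketch below computes that bound as a fraction of
   the link capacity. The function name and the 64 kbit/s flow rate are
   hypothetical and chosen only for the example.

```python
def max_wasted_fraction(link_capacity_bps: float, largest_flow_bps: float) -> float:
    """Upper bound on the share of capacity left idle by conservative marking.

    The router stops admitting once less than one largest-type flow fits,
    so at most `largest_flow_bps` of the link stays unused.
    """
    if largest_flow_bps <= 0 or link_capacity_bps < largest_flow_bps:
        raise ValueError("link must fit at least one flow of the largest type")
    return largest_flow_bps / link_capacity_bps


# Hypothetical link that fits 300 flows of the largest (64 kbit/s) type.
bound = max_wasted_fraction(300 * 64_000, 64_000)
print(f"at most {bound:.2%} of the link is wasted")
```

   With several hundred flows of the largest type per link, the bound
   stays well below one percent.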


Appendix B. Effect of Delays on Admission

   When a probe packet is passed unmarked without correcting the
   estimate of the free resources, we in fact admit a flow without
   immediately reserving resources for it.  The reservation will be
   implicitly done later by the arriving traffic or refresh packets of
   the flow. During the time between admission and the arrival of the
   traffic of the flow, new requests can be admitted without taking the
   previously admitted flow into account.  To illustrate the effects of
   this delay, consider a simple, classical Markovian example. Flows
   are identical with an average flow-holding time of 180 seconds, and
   flow arrivals and departures follow Poisson processes. Let the link
   be able to carry N calls and let the delay be T. The link starts
   refusing flows when the measured traffic exceeds N-H calls. In
   other words, a margin of size H is set aside to cater for the
   errors caused by the delay.

   If the link is properly dimensioned, then the usual blocking ratio
   should not exceed 1%. However, in a mass call situation (such as on
   New Year's Eve) it can be considerably higher. In this example, 50%
   blocking was chosen to demonstrate the extreme load case. Thus, the
   offered traffic is roughly twice the link capacity.

   QoS violation occurs if during time T the difference between the
   number of arriving and departing flows is larger than H. Under the
   above assumptions, the chance of QoS violation can be calculated.
   Naturally, the larger H is, the smaller the chance that QoS will be
   violated. The required value of H can be determined for a low value
   of QoS violation probability (e.g., 10e-5).
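
   One possible way to carry out such a calculation is sketched below.
   It is not the authors' computation and is not expected to reproduce
   the values tabulated in this appendix. It assumes that every flow
   arriving during T is admitted, that arrivals and departures during T
   are independent Poisson variables, and that the link is full, so
   departures occur at rate N divided by the holding time. All function
   and parameter names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), evaluated in log space."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def violation_probability(h: int, arrivals_mean: float,
                          departures_mean: float) -> float:
    """P(A - D > h) for independent Poisson arrivals A and departures D."""
    d_max = int(departures_mean + 12 * math.sqrt(departures_mean) + 20)
    total = 0.0
    for d in range(d_max + 1):
        # P(A <= h + d), then the tail P(A > h + d)
        cdf = sum(poisson_pmf(a, arrivals_mean) for a in range(h + d + 1))
        total += poisson_pmf(d, departures_mean) * max(0.0, 1.0 - cdf)
    return total


def required_margin(n_flows: int, delay_s: float, offered_erlangs: float,
                    holding_s: float = 180.0, target: float = 1e-5) -> int:
    """Smallest H whose violation probability does not exceed `target`."""
    arrivals_mean = offered_erlangs / holding_s * delay_s
    departures_mean = n_flows / holding_s * delay_s
    h = 0
    while violation_probability(h, arrivals_mean, departures_mean) > target:
        h += 1
    return h


# 50-flow link under roughly 50% blocking (offered load of about 2N erlangs).
for delay in (0.001, 0.1, 1.0):
    print(delay, required_margin(50, delay, offered_erlangs=100.0))
```

   Under these assumptions the margin grows with the delay, which is the
   qualitative behavior described above.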

   The following table presents the required value of H (in flows) as a
   function of link size N (rows), delay T (columns) and load (causing
   1% or 50% blocking).

       N |    1ms    |   10ms    |   100ms   |   500ms   |     1s    |
         | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% | 1%  | 50% |
   ------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
      50 |  2  |  2  |  2  |  3  |   3 |   4 |   4 |   5 |   5 |   7 |
     100 |  2  |  2  |  3  |  3  |   4 |   4 |   4 |   7 |   6 |   9 |
     500 |  2  |  3  |  3  |  4  |   4 |   7 |   9 |  13 |  12 |  18 |
    1000 |  3  |  3  |  4  |  4  |   5 |   9 |  12 |  18 |  16 |  25 |
    5000 |  3  |  4  |  5  |  7  |  12 |  18 |  24 |  44 |  33 |  69 |
   10000 |  4  |  4  |  7  |  9  |  16 |  25 |  33 |  69 |  47 | 113 |

   Relative to the link size, the required safety margin is highest for
   small links, since less statistical multiplexing is possible there.


Appendix C. A Simple Algorithm for Core Routers

   In this appendix, we present an algorithm for core routers that use
   unit-based reservations. The algorithm is simple, so it can be easily
   implemented in hardware by simple counters. Its inputs are the
   refresh interval and the number of flows allowed on the link. The
   latter is denoted by <threshold>. (We assume flows with similar
   characteristics (e.g., voice) and that one flow sends one refresh
   packet per refresh interval.) If the network uses several DSCPs for
   real-time traffic, then a separate copy of the algorithm may be run
   for each DSCP, resulting in per-DSCP admission.

   The algorithm counts the number of refresh and admitted probe packets
   in each refresh interval (<count>). The result of the counting is an
   upper limit on the number of units reserved on the link, as some
   reservations may have ended by the end of the refresh interval. The
   value of this counter is used in the next interval to decide on
   admission (<last>). When a new reservation is admitted, this value is
   increased to take the new reservation into account. If this value is
   well above the admission limit, the router starts sending severe
   congestion notifications by marking regular packets as well.

      On initialization:
         last = 0
         count = 0

      On arrival of a refresh packet:
         count++

      On arrival of a probe packet:
         if last < threshold then
            last++
            count++
         else
            Mark Packet
         endif

      On arrival of a regular packet:
         if last > threshold*1.1 then
            Mark Packet
         endif

      At the end of the refresh interval:
         last = count
         count = 0
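
   The counters above can also be written as a small runnable sketch.
   The class and method names are hypothetical. Following the prose
   description, regular packets are marked only when <last> is more
   than 10% above <threshold>.

```python
class CoreRouterCounter:
    """Unit-based admission counters for one DSCP on one link."""

    def __init__(self, threshold: int, severe_factor: float = 1.1):
        self.threshold = threshold          # flows allowed on the link
        self.severe_factor = severe_factor  # 1.1 as in the pseudocode
        self.last = 0    # reservation estimate used for admission
        self.count = 0   # refresh and admitted probe packets this interval

    def on_refresh(self) -> None:
        """A refresh packet renews one unit of reservation."""
        self.count += 1

    def on_probe(self) -> bool:
        """Return True if the probe packet must be marked (flow rejected)."""
        if self.last < self.threshold:
            self.last += 1
            self.count += 1
            return False
        return True

    def on_regular(self) -> bool:
        """Return True if a regular packet must be marked (severe congestion)."""
        return self.last > self.threshold * self.severe_factor

    def end_of_interval(self) -> None:
        """Carry the count over as the estimate for the next interval."""
        self.last = self.count
        self.count = 0


router = CoreRouterCounter(threshold=2)
print([router.on_probe() for _ in range(3)])  # → [False, False, True]
```

   The third probe is marked because two units are already accounted
   for within the same refresh interval.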


Appendix D. Simulation Results

   The purpose of the simulations described in this appendix is to give
   some insight into the performance of load control. The simulation
   cases are by no means representative, and the scheme may work
   differently in other situations. In appendix D.1, the simple marking
   case is demonstrated with a purely measurement-based admission
   algorithm by using a single link with both constant bit-rate and
   on/off sources. In appendix D.2, the unit-based reservation method is
   shown, using the algorithm in appendix C.

   Severe congestion signaling is not used in any of the examples; only
   admission control is used.

   We simulated a very simple network of one link. This can be viewed as
   the single bottleneck in the domain. The link had a 2 Mbit/s
   throughput, 50% of which was designated to carry real-time traffic.
   The round trip propagation delay was set to 100 ms. Real-time flows
   arrived according to a Poisson process, and the holding time was
   exponentially distributed with a 90 second mean. The arrival rate of
   flows was set to produce approximately 50% blocking. Only real-time
   traffic was simulated, so scheduling was simple FIFO.


D.1 Simple Marking based on Measurement Based Admission Control

D.1.1 Constant Bit-Rate Sources

   In the first case, flows emitted 40 byte long packets every 20 ms,
   producing a constant 16 kbit/s load. The 1 Mbit/s capacity assigned
   to this traffic can thus carry 62.5 flows. From the table in appendix
   B, we can see that 4 calls should be reserved in addition to the
   62.5. After an initial transient of 5 minutes, we simulated 2.5
   hours.

   During the 2.5 hour simulation time, utilization was measured over
   5-minute intervals. Utilization was also measured in 20ms slots and
   the percentage of slots in which it was above 1.064 Mbit/s (66.5
   calls) was counted.
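
   The figures used above follow from simple arithmetic, reproduced
   here as a short sketch. The constant names are hypothetical; the
   values are those given in the text.

```python
PACKET_BYTES = 40
PACKET_INTERVAL_MS = 20
REAL_TIME_CAPACITY_BPS = 1_000_000   # 50% of the 2 Mbit/s link
SAFETY_MARGIN_FLOWS = 4              # margin H taken from the table

# 40 bytes every 20 ms -> bits per second of one constant bit-rate flow
flow_rate_bps = PACKET_BYTES * 8 * 1000 // PACKET_INTERVAL_MS

# Number of such flows that fit into the real-time share of the link
flows_carried = REAL_TIME_CAPACITY_BPS / flow_rate_bps

# Rate above which a 20 ms slot counts as a violation
violation_level_bps = (flows_carried + SAFETY_MARGIN_FLOWS) * flow_rate_bps

print(flow_rate_bps, flows_carried, violation_level_bps)
# → 16000 62.5 1064000.0
```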

      min/avg/max of the utilization was: 881 / 899 / 914 kbit/s
      min/avg/max of the violation ratio was: 98.96% / 99.78% / 100%


D.1.2 On/Off Sources

   In the second simulation case, on/off sources were used. During an
   "off" period, no packets were generated, while in the "on" state the
   behavior was the same as in the previous case: 40 byte long packets
   20 ms apart. The lengths of the on and off periods were both drawn
   from a Pareto distribution with a shape parameter of 1.1 and a mean
   of 5 seconds. The average bit rate of the sources is thus 8 kbit/s.
   The flow arrival rate was doubled to produce 50% blocking, since the
   link is capable of carrying nearly twice the number of flows. The
   same set of measurements was carried out as in the previous case.

      min/avg/max of the utilization was: 808 / 819 / 837 kbit/s
      min/avg/max of the violation ratio was: 98.98% / 99.40% / 99.70%

   It can be seen that although the measurement-based approach was not
   able to prevent the over-use of the real-time resources in this high
   load case, it is a viable alternative. In no case did the 20 ms
   measurements exceed 1.15 Mbit/s, so the over-use merely means
   temporarily taking resources provisioned for the lower priority
   traffic.


D.1.3 The Router Algorithm

   The MBAC algorithm used by the router is presented here only for the
   completeness of the simulation description. The marking strategy was
   the same for both types of traffic. The router counts the number of
   bytes transmitted in every 20 ms interval and calculates the average
   bit rate in these 20 ms slots. Then it smooths these values in time
   through an exponentially weighted moving average (EWMA) filter. The
   window size of the EWMA was set to 9 seconds, i.e., when a unit step
   function is run through it, the output is 0.63 after 9 seconds.
   The algorithm also calculates the histogram of the difference between
   the original slot values and the filtered values. The histogram is
   collected in 1000 bins covering the range from -1 to +1 Mbit/s. The
   99% quantile of the histogram is calculated every 100 seconds. The
   router marks all passing packets if the sum of the output of the EWMA
   filter and the calculated quantile is greater than 1 Mbit/s. The
   router makes no correction to its measurements when a new flow is
   admitted.

   Thus, the target violation probability was set to 1%, which was in
   fact fulfilled in the long run.

   On arrival of a new packet, only counters are incremented. Every 20
   ms a new value for the EWMA must be calculated, a marking decision
   must be made for the next 20 ms and the value of one bin in the
   histogram must be increased. Every 100 seconds, the 99% quantile
   value must be looked up in the histogram and the histogram must be
   reset.

   The interested reader can read more about the design rationale of the
   above algorithm in [Gross99].
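
   The per-slot processing described in this section can be sketched as
   follows. This is an illustration under stated assumptions, not the
   simulator code: the EWMA coefficient is derived from the 9 second
   window as 1 - exp(-slot/window), the difference is taken against the
   already updated EWMA value, and the quantile is reported as the upper
   edge of its histogram bin. All names are hypothetical.

```python
import math

SLOT_S = 0.020                    # measurement slot length
EWMA_WINDOW_S = 9.0               # time constant of the filter
QUANTILE_PERIOD_S = 100.0         # how often the quantile is recomputed
BINS, LO, HI = 1000, -1e6, 1e6    # histogram of (slot rate - EWMA), bit/s
CAPACITY_BPS = 1e6                # real-time share of the link


class MbacMarker:
    def __init__(self):
        # Assumption: a unit step reaches 1 - 1/e (about 0.63) after 9 s.
        self.alpha = 1.0 - math.exp(-SLOT_S / EWMA_WINDOW_S)
        self.ewma = 0.0
        self.quantile = 0.0
        self.hist = [0] * BINS
        self.slots = 0
        self.slots_per_period = round(QUANTILE_PERIOD_S / SLOT_S)

    def end_of_slot(self, bytes_sent: int) -> bool:
        """Process one slot; return True if the next slot's packets are marked."""
        rate = bytes_sent * 8 / SLOT_S
        self.ewma += self.alpha * (rate - self.ewma)
        # Record the deviation of the raw slot rate from the smoothed rate.
        idx = int((rate - self.ewma - LO) / (HI - LO) * BINS)
        self.hist[min(max(idx, 0), BINS - 1)] += 1
        self.slots += 1
        if self.slots == self.slots_per_period:
            self.quantile = self._quantile(0.99)
            self.hist = [0] * BINS   # start a fresh histogram
            self.slots = 0
        return self.ewma + self.quantile > CAPACITY_BPS

    def _quantile(self, q: float) -> float:
        target, seen = q * sum(self.hist), 0
        for i, n in enumerate(self.hist):
            seen += n
            if seen >= target:
                return LO + (i + 1) * (HI - LO) / BINS   # upper bin edge
        return HI
```

   Only a byte counter is touched per packet; everything else runs once
   per 20 ms slot or once per 100 seconds, as noted above.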


D.2 Unit-Based Reservations

   In this section we demonstrate the unit-based reservation scheme. The
   routers use the simple algorithm in appendix C, except that it never
   marks regular packets. The simulation setup is otherwise the same as
   in the previous section. The traffic inside the flows does not affect
   the admission algorithm, so during simulation, sources send only
   probe and refresh packets. The definition of the unit is a peak bit
   rate of 16 kbit/s. The flow number threshold was set to 62 flows,
   resulting in close to the same target utilization of 1 Mbit/s as in
   appendix D.1. The length of the refresh period was varied between
   100 ms and 10 seconds. The actual number of flows on the link never
   exceeded 62 (no violation), so only the utilization values are shown
   in kbit/s.

                      | interval | min | avg | max |
                      +----------+-----+-----+-----+
                      |    --    | 968 | 972 | 976 |
                      | 100 ms   | 952 | 954 | 959 |
                      |  1 sec.  | 941 | 946 | 949 |
                      |  2 sec.  | 927 | 933 | 936 |
                      |  4 sec.  | 908 | 913 | 920 |
                      |  7 sec.  | 861 | 870 | 879 |
                      | 10 sec.  | 827 | 837 | 852 |

   The first line shows the utilization value for the case when the
   source limits itself to 62 flows, i.e., blocking is not done by the
   network, but by the source. This emulates the case when the refresh
   period is infinitely short or when a stateful approach is used, as
   in RSVP. The utilization is not 100% due to the burstiness of the
   arrivals.

   It can be seen that as the refresh packets become less frequent,
   more resources are wasted, as the resources allocated to departing
   flows remain allocated until the end of the next refresh period. The
   result is not only lower average utilization, but lower maximal
   utilization as well. When the refresh period is 10 seconds long, the
   highest utilization experienced was 952 kbit/s, which is 3 units
   below the limit.

   This motivates the use of as short a refresh period as possible.
   However, too short a refresh period will increase the effects of
   clock differences between edge and core devices (which was not taken
   into account during simulation). It also decreases the chance of
   finding a packet to mark as refresh if the flow is currently
   transmitting below its reserved rate.


Appendix E. Marking Using ECN Bits

   If the ECN bits were to be used for load control marking, the values
   are encoded in the two unused bits as described below, and the DS
   field contains the PHB.

               DS byte    Load Control
               01234567   codepoint (in ECN)
               -----------------------------
               xxxxxx00   Ordinary
               xxxxxx01   Probe
               xxxxxx10   Marked
               xxxxxx11   Refresh
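
   The encoding in the table can be illustrated with a short sketch.
   Bits 6 and 7 in the table are the two least significant bits of the
   byte. The names below are hypothetical, and the EF DSCP used in the
   example is only an assumption made for illustration.

```python
from enum import IntEnum


class LoadControlCodepoint(IntEnum):
    ORDINARY = 0b00
    PROBE = 0b01
    MARKED = 0b10
    REFRESH = 0b11


ECN_MASK = 0b0000_0011   # bits 6-7 of the DS byte in the table above


def get_codepoint(ds_byte: int) -> LoadControlCodepoint:
    """Extract the load control codepoint from the DS byte."""
    return LoadControlCodepoint(ds_byte & ECN_MASK)


def set_codepoint(ds_byte: int, cp: LoadControlCodepoint) -> int:
    """Return the DS byte with its two low-order bits replaced by `cp`."""
    return (ds_byte & ~ECN_MASK & 0xFF) | cp


# Example: DSCP 46 (EF, assumed here) occupies the upper six bits.
ds = 46 << 2
probe = set_codepoint(ds, LoadControlCodepoint.PROBE)
print(f"{probe:08b}", get_codepoint(probe).name)  # → 10111001 PROBE
```

   The upper six bits, which carry the PHB, are left untouched by both
   helpers.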


   The interpretation of the two ECN bits remains unspecified for other
   PHBs that do not support Load Control. This is done so as not to
   interfere with possible ECN deployment [RFC2481], [RFC3168].
   Furthermore, the ECN bits SHOULD be used for load control only
   if the requirements on using alternate semantics on ECN specified in
   [Floyd05] are satisfied. In particular, it SHOULD be ensured that the
   load control (edge-to-edge) ECN semantics do not conflict with a flow
   (connection) that is using other ECN semantics end-to-end.


Authors' Addresses

   Lars Westberg
   Ericsson Research
   Kistagangen 26
   SE-164 80 Stockholm
   Sweden
   EMail: Lars.Westberg@ericsson.com

   Zoltan R. Turanyi
   Ericsson Research
   Ericsson Hungary Ltd.
   Laborc 1, Budapest, H-1037
   Hungary
   EMail: Zoltan.Turanyi@ericsson.com

   David Partain
   Ericsson Radio Systems AB
   P.O. Box 1248
   SE-581 12  Linkoping
   Sweden
   EMail: David.Partain@ericsson.com

   Attila Bader
   Ericsson Research
   Ericsson Hungary Ltd.
   Laborc 1, Budapest, H-1037
   Hungary
   EMail: Attila.Bader@ericsson.com

   Georgios Karagiannis
   University of Twente
   P.O. Box 217
   7500 AE Enschede, The Netherlands
   EMail: g.karagiannis@ewi.utwente.nl


Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided
   on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
   THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR
   ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
   PARTICULAR PURPOSE.

   Disclaimer of validity:

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology
   described in this document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.  Information on the procedures with respect to rights
   in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention
   any copyrights, patents or patent applications, or other
   proprietary rights that may cover technology that may be required
   to implement this standard.  Please address the information to the
   IETF at ietf-ipr@ietf.org.