Traffic Engineering Working Group                       Kireeti Kompella
Internet Draft                                          Juniper Networks
Expiration Date:  January 2002


              Bandwidth Accounting for Traffic Engineering

                   draft-kompella-tewg-bw-acct-00.txt


1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 except that the right to
   produce derivative works is not granted.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


2. Abstract

   This document examines bandwidth representations for Traffic
   Engineering.  It compares the current representation, two schemes
   that are under consideration, and finally proposes yet another
   approach.  The primary application is Traffic Engineering for
   Differentiated Services.


2.1. ID Summary

   SUMMARY

   This document examines bandwidth representations for Traffic
   Engineering.  It compares the current representation, two schemes



Kompella, Kireeti                                               [Page 1]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


   that are under consideration, and finally proposes yet another
   approach.  The primary application is Traffic Engineering for
   Differentiated Services.


   RELATED DOCUMENTS

   draft-boyle-tewg-ds-nop-00.txt
   draft-ietf-mpls-diff-te-reqts-00.txt
   draft-ietf-tewg-diff-te-reqts-01.txt


   WHERE DOES IT FIT IN THE PICTURE OF THE SUB-IP WORK

   Belongs in TE WG.


   WHY IS IT TARGETED AT THIS WG

   This document describes Traffic Engineering parameters and their use.


   JUSTIFICATION

   There is ongoing work in the TE WG to specify new bandwidth
   parameters for DiffServ TE.  This document proposes a possible
   alternative approach to this problem.


3. Introduction

   This document examines bandwidth representations for Traffic
   Engineering.  Section 4 describes the current representation in
   [ISIS-TE] and [OSPF-TE].  Section 5 describes modification of the
   current representation suggested in [NOP].  Section 6 describes the
   DiffServ TE work [DS-TE].  Finally, Section 7 proposes a new
   bandwidth scheme.

   In what follows, it is assumed that there is a "Traffic Engineering
   Database" (TED) consisting of the topology of a network, i.e., the
   nodes and links in the network, and their attributes.  The goal is to
   compute a path for a traffic flow (variously called aggregated flow,
   traffic trunk and traffic engineering tunnel) of given bandwidth and
   priority from a source node to a destination node; the focus is
   limited to aspects of this computation related to bandwidth.  The
   questions being examined are what bandwidth attributes a link in the
   TED should have, and how these are used in computation.




Kompella, Kireeti                                               [Page 2]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


3.1. Terms

   A link has a "natural" bandwidth, the rate of data flow across the
   link.  For example, an OC-12 link has a bandwidth of 622Mbps.

   A flow has a bandwidth requirement B and a priority p.  The semantics
   of the bandwidth requirement is outside the scope of this document;
   the relevant issue is whether the flow fits on a link (admission
   control).

   A link may be allowed to be "oversubscribed".  That is, the total
   bandwidth of flows across the link may exceed the link's bandwidth.
   It is assumed that the bandwidth of a single flow may not exceed the
   link's bandwidth.

   Priorities range from 0 (highest) to 7 (lowest).  A flow F of higher
   priority may preempt a flow G of lower priority on a link if both
   flows cannot be accommodated on the link.  Preemption in this context
   means that the flow G is terminated; to reestablish G, its path will
   have to be recomputed, and the flow set up over a different set of
   links; in so doing, G may preempt other flows.  This potential
   cascading preemption may lead to (temporary) network instability, and
   thus is not always desirable; this issue is discussed below.


3.2. Operation

   Admission control is the test whether a flow "fits" in a link.

   If a flow is set up over a given link, the unreserved bandwidth of
   the link will change.  Of interest here is how it changes, and
   whether the computing agent has enough information to predict this
   change.  Similarly, when a flow is terminated (for example, by
   preemption), the unreserved bandwidth on the links it was using
   changes; again, it is of interest to be able to determine these
   changes.

   For each scheme below, we attempt to describe the admission control
   criteria and the bandwidth update algorithm.












Kompella, Kireeti                                               [Page 3]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


4. The Current Approach

   The current scheme for bandwidth representation associates with a
   link a link bandwidth, a maximum reservable bandwidth and unreserved
   bandwidth at eight priority levels.  The link bandwidth is set to the
   speed of the link; so for example, an OC-12 link would have a link
   bandwidth of 622Mbps; this quantity is static.  The maximum
   reservable bandwidth is the allowable total bandwidth of flows
   passing through this link; if oversubscription is allowed, this total
   may be larger than the link bandwidth.  This quantity is also
   relatively static (it is configured, and changes only if the
   configuration changes).  So, for example, a OC-12 link that is
   allowed to be oversubscribed by 150% will have a maximum reservable
   bandwidth of 1555Mbps.

   Unreserved bandwidth on the other hand is dynamic, i.e., as flows are
   set up and torn down, the unreserved bandwidth of a link may change.
   The initial value of unreserved bandwidth at each priority is set to
   the maximum reservable bandwidth; thus, unreserved bandwidth may be
   larger than the link bandwidth.

   Denote a link's link bandwidth by lbw, its maximum reservable
   bandwidth by mrb and its unreserved bandwidth at priority p by
   unres[p].


4.1. Admission Control

   Say one wants to compute a path for a flow F with bandwidth B and
   setup priority p.  F can be set up over a link L only if L's link
   bandwidth is at least B and L's unreserved bandwidth at priority p is
   at least B, i.e.,
                         B <= lbw & B <= unres[p]

   If F is set up over L, then L's unreserved bandwidth changes as
   follows:
                     unres[q] := unres[q] - B, q >= p

   If unres[q] < 0 as a result, this is an indication that one or more
   flows need to be preempted to make room for F.

   If flow F is terminated for any reason, including preemption, then
   for each link L over which F was established, L's unreserved
   bandwidth changes as follows:
                     unres[q] := unres[q] + B, q >= p

   As an aside, the above rules imply the following invariant:
                   unres[p] >= unres[q], p <= q     (*)



Kompella, Kireeti                                               [Page 4]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


   Note that this invariant is a consequence of the fact that the
   initial values of unreserved bandwidth are identical, and of the
   procedures that update bandwidth values.  This is not a requirement
   of any specification or protocol.


4.2. Analysis

   This scheme has the advantage of simplicity; also, it is deployed and
   currently working in many networks.  The drawbacks are that each
   priority gets the same initial value and the same oversubscription
   factor; thus, this scheme is insufficient to address the requirements
   of Differentiated Services.

   Note: if preemption is not allowed, there is no value whatsoever in
   having 8 bandwidth values; priorities become irrelevant.  In this
   case, a single value of unreserved bandwidth suffices, and admission
   control reduces to:
                           B <= lbw & B <= unres


5. The "No-op" Scheme

   The "No-op" scheme [NOP] is an attempt to satisfy the requirements of
   DiffServ Traffic Engineering with no changes to topology information
   distributed by routing protocols, and minimal changes in admission
   control procedures.

   The No-op scheme is simple: instead of fixing the initial value of
   unreserved bandwidth at each priority level as the maximum reservable
   bandwidth, allow this to be configured per priority (per link).


5.1. Admission Control

   Admission control in the No-op scheme is the same as for the current
   scheme, i.e., for a flow at priority p with bandwidth B:
                         B <= lbw & B <= unres[p]

   Bandwidth update procedures are not specified.


5.2. Analysis

   Only the node that owns a given link can say what the new values of
   unreserved bandwidth are as flows are established/torn down.  This is
   a serious drawback: all other nodes in the network have to wait for
   the new information to be flooded before they can make meaningful new



Kompella, Kireeti                                               [Page 5]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


   computations.  This time can range from a few seconds to several
   minutes, as it is strongly suggested that bandwidth updates be rate-
   limited.

   As an example, consider an OC-12 link L owned by node A, and assume
   no oversubscription.  Say that node Z is considering link L as a
   candidate for a flow F with priority p and bandwidth B.

   Case 1: Suppose that the initial values of unres[p] is set to 622Mbps
   for all p.  Suppose further that a 222Mbps flow at priority 0 and a
   200Mbps flow at priority 7 have already been established.  The values
   of unres[0] and unres[7] seen by Z will then be 400Mbps and 200Mbps
   respectively.

   Case 2: Suppose that the initial value of unres[0] is set to 400Mbps,
   and of unres[7] is set to 200Mbps.  The values of unres[0] and
   unres[7] seen by Z will again be 400Mbps and 200Mbps respectively.

   If Z wants to set up a 100Mbps flow over this link, it certainly
   fits.  However, in case 1, the values of unres[0] and unres[7] after
   this flow is established are 300Mbps and 100Mbps respectively.  In
   case 2, the new values are 300Mbps and 200Mbps respectively.  If Z
   chooses link L for the flow, and sets it up, Z cannot predict a
   priori which of these values will be the new values, and must wait
   for A's update to know for certain.  Such knowledge is required
   before Z can compute paths for new flows.

   Another drawback is that Z cannot predict whether setting up a given
   flow will cause preemption or not.  In the above example, say that Z
   wants to set up a 300Mbps flow.  In case 1, this will require that
   the priority 7 flow be preempted; in case 2, this flow fits without
   preemption.  If Z doesn't want to cause preemption, Z cannot tell
   whether it should use link L or not.


6. Per-Class-Type Bandwidth Pools

   The scheme in [DS-TE] defines the notion of "Class Type".  A class
   type is a set of DiffServ classes with similar DiffServ
   characteristics.  Each class type has unreserved bandwidth at 8
   priorities, analogous to the current scheme; in fact, the current
   scheme is defined to be bandwidths for "class type 0".

   For each pool, the maximum reservable bandwidth is configured (but
   not advertised); the initial values of unreserved bandwidth for a
   pool are set to the pool's maximum reservable bandwidth.  This means
   that the invariant (*) is valid within a pool.




Kompella, Kireeti                                               [Page 6]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


6.1. Admission Control

   A flow is admissible on a link iff it meets two constraints: the flow
   fits within the class-type bound; and the flow meets the aggregate
   bandwidth for the link.

   The former is the same as the admission control for the current
   scheme, except that the appropriate class type should be used.  The
   latter is requires that the aggregate unreserved bandwidth be known.

   However, the actual approach suggested by [DS-TE] is to fold in the
   two constraints into the unreserved bandwidth values for each class
   type.  This is straightforward to compute for the node owning a link;
   however, as we will see below, this is impossible to predict when
   viewing the link from afar.

   The scheme also allows for compression of advertised values of
   unreserved bandwidths.  This is useful, since each class type has 8
   bandwidths, so the number of bandwidths advertised can be quite
   large.


6.2. Analysis

   The first observation is that the notion of class type is a direct
   consequence of the scaling issues with this approach.  Ideally, one
   would talk in terms of classes; however, if having 6 classes means
   having 6 bandwidth pools of 8 priorities each, this is clearly not
   tenable.  However, going from classes to class types is a definite
   compromise.

   The second is that all members of a class type share the same
   characteristics, i.e., they have the same initial unreserved
   bandwidth.  If a class type consists of AF1 and AF2, it is clearly
   preferable to be able to have independent maximum reservable
   bandwidths for each, rather than a single common value.

   The third is that the notion of oversubscription has not been
   addressed at all.  At the very least, each class type should have
   independent oversubscription factors; at best, each class should.
   For example, one may want to oversubscribe best effort traffic by
   150%, but voice traffic by only 25%.  One may go further and say that
   gold voice should be oversubscribed by only 10% whereas regular voice
   by 25%.

   Fourthly, as mentioned above, the scheme does not allow a computing
   node to predict the new values of bandwidths if a flow is actually
   set up; instead, the node has to wait for this information to be



Kompella, Kireeti                                               [Page 7]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


   flooded.

   Fifthly, it is not clear that cross-class preemption is desirable; a
   clearer understanding of the consequences, and an evaluation by those
   who wish to deploy DiffServ TE should be sought.

   Finally, there are proposals to advertise backup bandwidths for the
   purpose of setting up backup flows.  If there are 4 class types with
   8 bandwidths, and each needs to be backed up, there is a definite
   concern about the scalability of this approach.


7. Per-Class Bandwidth Accounting

   The scheme proposed in this document offers per-class bandwidth
   accounting.  This means per class maximum bandwidth, as well as per-
   class oversubscription.

   The mapping of priority to class is configured by the network
   administrator.  This mapping is irrelevant to protocols and to the
   TED; neither path computation nor admission control care about Class
   Types or classes.


7.1. Advertisements

   Bandwidth advertisements work as follows: each link has a "link
   bandwidth", and N priority levels (N = 8 is the current norm).  Each
   priority has a normalized (see next section) reserved bandwidth, a
   normalized maximum bandwidth, and a subscription factor.  Each
   priority corresponds to exactly one Diff Serv class; more than one
   priority can map to the same class.  For example:

       priority    grade
              0    premium voice
              1    premium assured
              2    standard voice
              3    standard assured
              4    gold data
              5    silver data
              6    bronze data
              7    best effort

   <Link bandwidth> is the natural link bandwidth.  <normalized reserved
   bandwidth> is the reserved bandwidth at each priority level.
   <normalized maximum bandwidth> is the maximum bandwidth allowed at
   each priority.  <subscription> is the subscription factor at each
   priority level.  This advertisement deprecates current bandwidth



Kompella, Kireeti                                               [Page 8]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


   advertisements, i.e., maximum link bandwidth, maximum reservable
   bandwidth and unreserved bandwidth.

   Normalized maximum bandwidth for each priority level is set by
   configuration and must be at most link bandwidth; the default is the
   link bandwidth.  The subscription factor is given in percentage and
   is also set by configuration, and must be >= 100; the default is
   100%.  A subscription factor > 100 for priority p means that priority
   p can be over-subscribed by (subscr[p]-100) %.  Normalized reserved
   bandwidth is initially 0, and is the only field that changes
   dynamically, i.e., as flows come and go.  Normalized reserved
   bandwidth must be <= normalized maximum bandwidth.

   So, a 100Mbps link could have 10Mbps/110% maximum bandwidth and
   subscription factor for priority 0, 15Mbps/125% for priority 1,
   25Mbps/150% for priority 2, 50Mbps/150% for priorities 3-6 and
   100Mbps/400% for priority 7.


7.2. Normalization

   The notion of normalized bandwidth is a simple way of dealing with
   bandwidths at different oversubscription levels.  Suppose a flow
   requires 10Mbps at a priority level that has 50% oversubscription,
   and another flow requires 40Mbps at a priority level that has 150%
   oversubscription.  Adding the bandwidths (10+40 = 50) clearly doesn't
   make sense.  On the other hand, for some model of oversubscription,
   normalizing the bandwidths before adding (or otherwise combining)
   them has some merit.  Clearly, this sacrifices accuracy for
   simplicity; a more accurate mechanism for adding bandwidths (even at
   the same oversubscription levels) would involve adding probability
   functions.

   Definition: for a bandwidth B at priority p, the "normalized"
   bandwidth B' is given by
                         B' = ((B*100)/subscr[p]).

   For a link, denote by reserved[p] the normalized reserved bandwidth
   at priority p; denote by max[p] the normalized maximum bandwidth at
   priority p; denote by subscr[p] the subscription factor for priority
   p.

   Definition: the unreserved bandwidth at priority p is given by:
       unreserved[p] = (link bandwidth - (sum reserved[q], q <= p))

   Note that unreserved[] is computed at each node -- it is not flooded.





Kompella, Kireeti                                               [Page 9]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


7.3. Admission Control

   A flow of bandwidth B and priority p is admissible on a link iff all
   of the following are true:
   a) B  <= max[p] for the link    # flow must fit in priority class
   b) B' <= unreserved[p]          # normalized flow must fit in
                                   # unreserved bandwidth at prio p
   c) B' <= max[p] - res[p]        # normalized flow must fit within prio p

   So, for example, suppose priority 1 has max = 50Mbps with 300%
   subscription.  A flow with bandwidth 145Mbps cannot be admitted even
   though 145 < 50x3.  However, 3 flows with bandwidth 50Mbps can be
   admitted.

   If the flow is accepted, then
       reserved[p] := reserved[p] + B'
       for q >= p
           unreserved[q] := unreserved[q] - B'
           if unreserved[q] < 0, preempt flows at priority q until
               unreserved[q] >= 0

   If a flow of bandwidth B and priority p goes away, then
       reserved[p] := reserved[p] + B'
       unreserved[q] := unreserved[q] - B' for q >= p


7.4. Analysis

   The requirements for DiffServ TE, reduced to simplest terms, can be
   stated as follows: overlay on a single physical topology multiple
   logical topologies (each could represent a DiffServ class); show how
   the topology information can be flooded, and how computation
   (admission control and bandwidth updates) can be done; and show the
   interaction between the various logical topologies.  This document
   does that.

   It is unfortunate that [DS-TE-REQTS] and [TE-FW] speak about Class
   Types rather than classes (thus pre-supposing the solution), and
   ignore oversubscription.  However, the following statement (with
   "class" replacing "class type") sums up the requirements for DiffServ
   TE:

      The fundamental requirement for DS-TE is to be able to enforce
      different bandwidth constraints for different Class Types rather
      than a single one.

   The above scheme may seem fairly complex; however, the procedures for
   admission control and bandwidth updates are very clear and



Kompella, Kireeti                                              [Page 10]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


   straightforward, and readily implementable; the mechanism is
   scalable; and oversubscription is dealt with.  Furthermore, there is
   sufficient information in the TED so that each computing node has an
   accurate picture of the state of the TED after its operations, so
   that a node that needs to do multiple computations can do so without
   waiting for new topology information (with a reasonable chance of
   success).  In the opinion of the author (as an implementor), this
   property is highly desirable, if not an absolute requirement.


8. Security Considerations

   Not discussed.  Not envisioned to be an issue.


9. Acknowledgments

   Many thanks to Quaizar Vohra and Yakov Rekhter for their valuable
   comments and bug fixes.  Any remaining bugs are solely my doing.


10. References

   [DS-TE] "Requirements for support of Diff-Serv-aware MPLS Traffic
   Engineering", Le Faucheur, Francois, Chiu, Angela, Townsend, William,
   Skalecki, Darek, draft-ietf-mpls-diff-te-reqts-00.txt (work in
   progress).

   [DS-TE-REQTS] "Requirements for support of Diff-Serv-aware MPLS
   Traffic Engineering", Le Faucheur, Francois, Nadeau, Thomas, Tatham,
   Martin, Telkamp, Thomas, Cooper, David, Boyle, Jim, Martini, Luca,
   Fang, Luyuan, Lai, Waisum, Ash, Jerry, Hicks, Pete, Chiu, Angela,
   Townsend, William, Skalecki, Darek, draft-ietf-tewg-diff-te-
   reqts-01.txt (work in progress).

   [ISIS-TE] "IS-IS extensions for Traffic Engineering", Li, Tony, Smit,
   Henk, draft-ietf-isis-traffic-03.txt (work in progress).

   [NOP] "Accomplishing Diffserv TE needs with Current Specifications",
   Boyle, Jim, draft-boyle-tewg-ds-nop-00.txt (work in progress).

   [OSPF-TE] "Traffic Engineering Extensions to OSPF", Katz, Dave,
   Yeung, Derek, Kompella, Kireeti, draft-katz-yeung-ospf-traffic-05.txt
   (work in progress).

   [TE-FW] "A Framework for Internet Traffic Engineering", Awduche,
   Daniel, Chiu, Angela, Elwalid, Anwar, Widjaja, Indra, Xiao, XiPeng,
   draft-ietf-tewg-framework-05.txt (work in progress).



Kompella, Kireeti                                              [Page 11]


Internet Draft     draft-kompella-tewg-bw-acct-00.txt          July 2001


11. Author's Address

   Kireeti Kompella
   Juniper Networks, Inc.
   1194 N. Mathilda Ave
   Sunnyvale, CA 94089
   kireeti@juniper.net












































Kompella, Kireeti                                              [Page 12]