Internet Draft Anna Charny, ed.
Cisco Systems
Fred Baker
Cisco Systems
Jon Bennett
Riverdelta Networks
Kent Benson
Tellabs
Jean-Yves Le Boudec
EPFL
Angela Chiu
AT&T Labs
William Courtney
TRW
Bruce Davie
Cisco Systems
Shahram Davari
PMC-Sierra
Victor Firoiu
Nortel Networks
Charles Kalmanek
AT&T Research
K.K. Ramakrishnan
TeraOptic Networks
Dimitrios Stiliadis
Lucent Technologies
Expires August, 2001
draft-ietf-diffserv-ef-supplemental-00.txt February 2001
Supplemental Information for the New Definition of the EF PHB
Status of this Memo
This document is an Internet Draft and is in full conformance with
all provisions of Section 10 of RFC2026. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
Anna Charny, ed. INTERNET-DRAFT [Page 1]
Information for the EF PHB Expires: July 1 2001
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a
"working draft" or "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.ietf.org (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).
This document is a product of the Diffserv working group of the
Internet Engineering Task Force. This document was written during the
process of clarification of RFC2598 [10] that led to the publication
of [6]. It is published as part of the historical record of the IETF's
Differentiated Services working group. Please address comments to the
group's mailing list at diffserv@ietf.org, with a copy to the
authors.
Copyright (C) The Internet Society (1999). All Rights
Reserved.
Abstract
This document is intended to supplement [6]. Its primary motivation is
providing additional explanation to the revised EF definition and its
properties. The document also provides additional implementation examples
and gives some guidance for computation of the numerical parameters of the new
definition for several well known schedulers and router architectures.
Contents
1. Introduction .............................................3
2. Definition of EF PHB .....................................3
2.1 The formal definition ....................................4
2.2 The case of an ideal output-buffered device with an
EF FIFO at the output ....................................6
2.3 The General case .........................................7
2.3.1 The colorblind definition ..............................7
2.3.2 Packet reordering with the colorblind definition .......8
2.4 The packet-identity-aware definition .....................8
3 Per Packet delay ...........................................9
3.1 Single hop delay bound ...................................9
3.2 Multi-hop worst case delay ...............................9
Anna Charny, ed. INTERNET-DRAFT [Page 2]
Information for the EF PHB Expires: July 1 2001
4 Packet loss ...............................................10
5 Implementation considerations .............................11
5.1 The output buffered model with aggregate queuing
at the output............................................12
5.1.1 Strict Non-preemptive Priority Queue ..................12
5.1.2 WF2Q ..................................................12
5.1.3 Deficit Round Robin (DRR)..............................12
5.1.4 Start-Time Fair Queuing (SFQ) and Self-Clocked
Fair Queuing ..........................................12
5.2 A Router with Variable Internal Delay and Aggregate
Scheduling at the output...............................12
6 Security Considerations ...................................13
7 References.................................................13
8 Appendix. Difficulties with the RFC 2598 Definition .......14
8.1 Perfectly-Clocked Forwarding.............................15
8.2 Router Internal Delay ...................................15
8.3 Maximum Configurable Rate and Provisioning Efficiency....16
8.4 The Non-trivial Nature of the Difficulties ..............17
9 Authors'addresses .........................................18
10 Full Copyright ...........................................20
Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [3].
1 Introduction
The Expedited Forwarding (EF) Per-Hop Behavior (PHB) was designed to
be used to build a low-loss, low-latency, low-jitter, assured
bandwidth service. The potential benefits of this service, and
therefore the EF PHB, are enormous. Because of the great value of
this PHB, it is critical that the forwarding behavior required of
and delivered by an EF-compliant node be specific, quantifiable, and
unambiguous.
Unfortunately, the definition of EF PHB in the original RFC2598 [10] was
not sufficiently precise (see Appendix and [4]). A more precise definition
is given in [6]. This document is intended to aid in the understanding of
the properties of the new definition and provide supplemental information
not included in the text of [6] for sake of brevity.
The document is outlined as follows. In section 2, we briefly restate the
definition for EF PHB of [6]. We then provide some additional discussion
of this definition and describe some of its properties. We discuss the
issues associated with per-packet delay and loss in sections 3 and 4. In
section 5 we discuss the impact of known scheduling architectures on the
critical parameters of the new definition. We also discuss the impact of
deviation of real devices from the ideal output-buffered model on the
magnitude of the critical parameters in the definition.
2 Definition of EF PHB
Anna Charny, ed. INTERNET-DRAFT [Page 3]
Information for the EF PHB Expires: July 1 2001
2.1 The formal definition
An intuitive explanation of the new EF definition is described in
[6]. Here we restate the formal definition from [6] verbatim.
A node that supports EF on an interface I at some configured rate R
MUST satisfy the following equations:
d_j <= f_j + E_a (eq_1)
where f_j is defined iteratively by
f_0 = 0, d_0 = 0
f_j = max(a_j, min(d_j-1, f_j-1)) + l_j/R, for all j > 0 (eq_2)
In this definition:
- d_j is the time that the last bit of the j-th EF packet to
depart actually leaves the node from the interface I.
- f_j is the target departure time for the j-th EF packet to
depart from I, the "ideal" time at or before which the last bit of
that packet should leave the node.
- a_j is the time that the last bit of the j-th EF packet destined
to the output I to arrive actually arrives at the node.
- l_j is the size (bits) of the j-th EF packet to depart from I.
l_j is measured on the IP datagram (IP header plus payload) and
does not include any lower layer (e.g. MAC layer) overhead.
- R is the EF configured rate at output I (in bits/second).
- E_a is the error term for the treatment of the EF aggregate.
Note that E_a represents the worst case deviation between actual
departure time of an EF packet and ideal departure time of the
same packet, i.e. E_a provides an upper bound on (d_j - f_j) for
all j.
- d_0 and f_0 do not refer to a real packet departure but are used
purely for the purposes of the recursion. The time origin should
be chosen such that no EF packets are in the system at time 0.
An EF-compliant node MUST be able to be characterized by the range of
possible R values that it can support on each of its interfaces while
conforming to these equations, and the value of E_a that can be met
on each interface. R may be line rate or less. E_a MAY be specified
as a worst-case value for all possible R values or MAY be expressed
as a function of R.
Anna Charny, ed. INTERNET-DRAFT [Page 4]
Information for the EF PHB Expires: July 1 2001
Note also that, since a node may have multiple inputs and complex
internal scheduling, the jth packet to arrive may not be the jth
packet to depart. It is in this sense that eq_1 and eq_2 are
colorblind with regard to packet identity.
In addition, a node that supports EF on an interface I at some
configured rate R MUST satisfy the following equations:
D_j <= F_j + E_p (eq_3)
where F_j is defined iteratively by
F_0 = 0, D_0 = 0
F_j = max(A_j, min(D_j-1, F_j-1)) + L_j/R, for all j > 0 (eq_4)
In this definition:
- D_j is actual the departure time of the individual EF packet
that arrived at time A_j, i.e., given a packet which was the j-th
EF packet destined for I to arrive at the node via any input, D_j
is the time at which the last bit of that individual packet
actually leaves the node from the interface I.
- F_j is the target departure time for the individual EF packet
which arrived at time A_j.
- A_j is the time that the last bit of the j-th EF packet destined
to the output I to arrive actually arrives at the node.
- L_j is the size (bits) of the j-th EF packet to arrive at the
node that is destined to output I. L_j is measured on the IP
datagram (IP header plus payload) and does not include any lower
layer (e.g. MAC layer) overhead.
- R is the EF configured rate at output I (in bits/second).
- E_p is the error term for the treatment of individual EF
packets. Note that E_p represents the worst case deviation between
actual departure time of an EF packet and ideal departure time of
the same packet, i.e. E_p provides an upper bound on (D_j - F_j)
for all j.
- D_0 and F_0 do not refer to a real packet departure but are used
purely for the purposes of the recursion. The time origin should
be chosen such that no EF packets are in the system at time 0.
It is the fact that D_j and F_j refer to departure times for the jth
packet to arrive that makes eq_3 and eq_4 aware of packet identity.
This is the critical distinction between the last two equations and
the first two.
Anna Charny, ed. INTERNET-DRAFT [Page 5]
Information for the EF PHB Expires: July 1 2001
An EF-compliant node SHOULD be able to be characterized by the range
of possible R values that it can support on each of its interfaces
while conforming to these equations, and the value of E_p that can be
met on each interface. E_p MAY be specified as a worst-case value for
all possible R values or MAY be expressed as a function of R. An E_p
value of "undefined" MAY be specified.
2.2 The case of an ideal output-buffered device with an EF FIFO at the
output
For an ideal output-buffered device with a FIFO for EF packets at the
output and no internal delay, the i-th packet to arrive to the device is
also the i-th packet to depart from the device. Therefore, in this ideal
model the colorblind and packet-identity-aware characteristics are
identical, and E_a = E_p. In this section we therefore omit the subscript
and refer to the latency term simply as E.
It could be shown that for such an ideal device the definition of
section 2 is stronger than the well-known rate-latency curve [2] in the
sense that if a scheduler satisfies the EF definition it also satisfies
the rate-latency curve. As a result, all the properties known for the
rate-latency curve also apply to the modified EF definition. However, we
argue below that the definition of section 2.1 is more suitable to reflect
the intent of EF PHB than the rate-latency curve.
It is shown in [2] that the rate-latency curve is equivalent to
the following definition:
Definition of Rate Latency Curve (RLC):
D(j) <= F'(j) + E (eq_5)
where
F'(0)=0,
F'(j)=max(a(j), F'(j-1))+ L(j)/R for all j>0 (eq_6)
It can be easily verified that the EF definition of section 2.1 is
stronger than RLC by noticing that for all j, F'(j) >= F(j).
It is easy to see that F'(j) in the definition RLC corresponds to
the time the j-th departure should have occurred should the EF
aggregate be constantly served exactly at its configured rate R.
Following the common convention, we refer to F'(j) as the "fluid
finish time" of the j-th packet to depart.
The intuitive meaning of the rate-latency curve of RLC is that any
packet is served at most time E later than this packet would finish
service in the fluid model.
For RLC (and hence for the stronger EF definition) it
holds that in any interval (0,t) the EF aggregate gets close to the
desired service rate R (as long as there is enough traffic to
Anna Charny, ed. INTERNET-DRAFT [Page 6]
Information for the EF PHB Expires: July 1 2001
sustain this rate). The discrepancy between the ideal and the actual
service in this interval depends on the latency term E, which in
turn depends on the scheduling implementation. The smaller E, the
smaller the difference between the configured rate and the actual
rate achieved by the
scheduler.
While RLC guarantees the desired rate to the EF aggregate in all
intervals (0,t) to within a specified error, it may nevertheless
result in large gaps in service. For example, suppose that (a large
number) N of identical EF packets of length L arrived from different
interfaces to the EF queue in the absence of any non-EF traffic.
Then any work-conserving scheduler will serve all N packets at link
speed. When the last packet is sent at time NL/C, where C is the
capacity of output link, F(N) will be equal to NL/R. That is, the
scheduler is running ahead of ideal, since NL/C < NL/R for R < C.
Suppose now that at time NL/C a large number of non-EF packets
arrive, followed by a single EF packet. Then the scheduler can
legitimately delay starting to send the EF packet until time
F(N+1)=(N+1)L/R + E - L/C. This means that the EF aggregate will
have no service at all in the interval (NL/C, (N+1)L/R + E - L/C).
This interval can be quite large if R is substantially smaller
than C. In essence, the EF aggregate can be "punished" by a gap in
service for receiving faster service than its configured rate at the
beginning.
The new EF definition alleviates this problem by introducing the term
min(D(j-1), F(j-1)) in the recursion. Essentially, this means
that the fluid finishing time is "reset" if that packet is sent before
its "ideal" departure time. As a consequence of that, for the case
where the EF aggregate is served in the FIFO order, suppose a packet
arrives at time t to a server satisfying the EF definition. The packet
will be transmitted no later than time t + Q(t)/R + E, where Q(t) is
the EF queue size at time t (including the packet under discussion).
This statement is proved in [4].
2.3 The General case
In a more general case, where either the output scheduler does not
serve the EF packets in a FIFO order, or the variable internal delay in
the device reorders packets while delivering them to the output (or
both), the i-th packet destined to a given output interface to arrive
to the device may no longer be the i-th packet to depart from that
interface. In that case the packet-identity-aware and the colorblind
definitions are no longer identical.
2.3.1 The colorblind definition
The colorblind definition can be viewed as a truly aggregate
characteristic of the service provided to EF packets. For an analogy
consider a dark reservoir to which all arriving packets are placed. A
scheduler is allowed to pick a packet from the reservoir in a random
order, without any knowledge of the order of packet arrivals. The
colorblind part of the definition measures the accuracy of the output
Anna Charny, ed. INTERNET-DRAFT [Page 7]
Information for the EF PHB Expires: July 1 2001
rate provided to the EF aggregate as a whole. The smaller E_a, the
more accurate is the assurance that the reservoir is drained at least
at the configured rate.
2.3.2 Packet reordering with the colorblind definition
Note that in this reservoir analogy packets of EF aggregate may be
arbitrarily reordered. However, the definition of EF PHB given in [6]
explicitly requires that no packet reordering occur within a
microflow. This requirement restricts the scheduling implementations,
or, in the reservoir analogy, the order of pulling packets out of the
reservoir to make sure that packets within a microflow are not
reordered, but it still allows reordering at the aggregate
level.
Note that reordering within the aggregate, as long as there is no
flow-level reordering, does not necessarily reflect a "bad"
service. Consider for example a scheduler that arbitrates among 10
different EF "flows" with diverse rates. A scheduler that is aware of
the rate requirements may choose to send a packet of the faster flow
before a packet of the slower flow to maintain lower jitter at the
flow level. In particular, an ideal "flow"-aware WFQ scheduler will
cause reordering within the aggregate, while maintaining packet
ordering and small jitter at the flow level.
It is intuitively clear that for such a scheduler, as well as for a
simpler FIFO scheduler, the "accuracy" of the service rate is crucial
for minimizing "flow"-level jitter. The packet-identity-aware
definition quantifies this accuracy of the service rate. A small
value of E_a is a necessary (although not sufficient) condition for
maintaining small per-flow jitter.
However, the small value of E_a does not give any assurances about the
absolute value of per-packet delay. In fact, if the input rate exceeds
the configured rate, the colorblind definition may result in
arbitrarily large delay of a subset of packets. This is the primary
motivation for the packet-identity-aware definition.
2.4 The packet-identity-aware definition
The primary goal of the packet-aware characterization of the EF
implementation is that, unlike the colorblind characterization, it
provides a way to find a per-packet delay bound as a function of input
traffic parameters.
While the colorblind definition characterizes the accuracy of the
service rate of the entire EF aggregate, the packet-identity-aware
part of the definition characterizes the deviation of the device from
an ideal server that serves the EF aggregate in FIFO order at least at
the configured rate.
The value of E_p in the packet-identity-aware definition is therefore
affected by two factors: the accuracy of the aggregate rate service
and the degree of packet reordering within the EF aggregate (under the
Anna Charny, ed. INTERNET-DRAFT [Page 8]
Information for the EF PHB Expires: July 1 2001
constraint that packets within the same microflow are not
reordered). Therefore, a sub-aggregate aware device that provides an
ideal service rate to the aggregate, and also provides an ideal rate
service for each of the sub-aggregates, may nevertheless have a very
large value of E_p (in this case E_p must be at least equal to the
ratio of the maximum packet size divided by the smallest rate of any
sub aggregate). As a result, a large value of E_p does not
necessarily mean that the service provided to EF aggregate is bad -
rather it may be an indication that the service is good, but non-FIFO.
On the other hand, a large value of E_p may also mean that the
aggregate service is very inaccurate (bursty), and hence in this case
the large value of E_p reflects a poor quality of implementation.
As a result, a large number of E_p does not necessarily provide any
guidance on the quality of the EF implementation. However, a small
value of E_p does indicate a high quality FIFO implementation.
Since E_p and E_a relate to different aspects of the EF
implementation, they should be considered together to determine the
quality of the implementation.
3. Per Packet delay
The primary motivation for the packet-identity-aware definition is
that it allows to quantify the per-packet delay bound. This section
discusses the issues with computing per-packet delay
3.1 Single hop delay bound
If the total traffic arriving to an output port I from all inputs is
constrained by a leaky bucket with parameters (R, B), where R is the
configured rate at I, and B is the bucket depth (burst), then the
delay of any packet departing from I is bounded by D_p, given by
D_p = B/R + E_p (eq_7)
Because the delay bound depends on the configured rate R and the input
burstiness B, it is desirable for both of these parameters to be
visible to a user of the device. A PDB desiring a particular delay
bound may need to limit the range of configured rates and allowed
burstiness that it can support in order to deliver such
bound. Equation (eq_7) provides a means for determining an acceptable
operating region for the device with a given E_p. It may also be
useful to limit the total offered load to a given output to some rate
R_1 < R (e.g. to obtain end-to-end delay bounds [5]). It is important
to realize that, while R_1 may also be a configurable parameter of the
device, the delay bound in (eq_7) does not depend on it. It may be
possible to get better bounds explicitly using the bound on the input
rate, but the bound (eq_7) does not take advantage of this information.
3.2 Multi-hop worst case delay
Although the PHB defines inherently local behavior, in this section we
briefly discuss the issue of per-packet delay as the packet traverses
Anna Charny, ed. INTERNET-DRAFT [Page 9]
Information for the EF PHB Expires: July 1 2001
several hops implementing EF PHB. Given a delay bound (eq_7) at a
single hop, it is tempting to conclude that per-packet bound across h
hops is simply h times the bound (eq_7). However, this is not
necessarily the case, unless B represents the worst case input
burstiness across all nodes in the network.
However, obtaining such a worst case value of B is not trivial. If EF
PHB is implemented using aggregate class-based scheduling where all EF
packets share a single FIFO, the so-called effect of jitter
accumulation may result in an increase in burstiness from hop to
hop. In particular, it can be shown that unless severe restrictions on
EF utilization are imposed, even if all EF flows are ideally shaped at
the ingress, then for any value of delay D it is possible to construct
a network where EF utilization on any link is bounded not to exceed a
given factor, no flow traverses more than a specified number of hops,
but there exists a packet that experiences a delay more than D [5].
This result implies that the ability to limit the worst case
burstiness and the resulting end-to-end delay across several hops may
require not only limiting EF utilization on all links, but also
constraining the global network topology. Such topology constraints
would need to be specified in the definition of any PDB built on top
of EF PHB, if such PDB requires a strict worst case delay bound.
4. Packet loss
Any device with finite buffering may need to drop packets if the input
burstiness becomes sufficiently high. To meet the low loss objective
of EF, a node may be characterized by the operating region in which
loss of EF due to congestion will not occur. This may be specified
as a token bucket of rate r <= R and burst size B that can be
offered from all inputs to a given output interface without loss.
However, as discussed in the previous section, the phenomenon of
jitter accumulation makes it generally difficult to guarantee that the
input burstiness never exceeds the specified operating region. A
no-loss guarantee across multiple hops may require specification of
constraints on network topology which are outside the scope of
inherently local definition of a PHB. Thus, it must be possible to
establish whether a device conforms to the EF definition even when
some packets are lost.
This can be done by performing an "off-line" test of conformance to
equations (eq_1)- (eq_4). After observing a sequence of packets
entering and leaving the node, the packets which did not leave are
assumed lost and are notionally removed from the input stream. The
remaining packets now constitute the arrival stream and the packets
which left the node constitute the departure stream. Conformance to
the equations can thus be verified by considering only those packets
that successfully passed through the node..
Note that in the event that loss does occur, the specification of
which packets are lost is beyond the scope of the definition of EF
PHB. However, those packets that were not lost must conform to the
equations definition of EF PHB in section 2.1.
Anna Charny, ed. INTERNET-DRAFT [Page 10]
Information for the EF PHB Expires: July 1 2001
5. Implementation considerations
A packet passing through a router will experience delay for a number
of reasons. Two familiar components of this delay are the time the
packet sits in a buffer at an outgoing link waiting for the
scheduler to select it and the time it takes to actually transmit
the packet on the outgoing line.
There may be other components of a packet's delay through a router,
however. A router might have to do some amount of header processing
before the packet can be given to the correct output scheduler, for
example. In another case a router may have a FIFO buffer (called a
transmission queue in [7]) where the packet sits after being
selected by the output scheduler but before it is transmitted. In
cases such as these, the extra delay a packet may experience can be
accounted for by absorbing it into the latency terms E_a and E_p.
Implementing EF on a router with a multi-stage switch fabric
requires special attention. A packet may experience additional
delays due to the fact that it must compete with other traffic for
forwarding resources at multiple contention points in the switch
core. The delay an EF packet may experience before it even reaches the
output-link scheduler should be included in the latency
term. Input-buffered and input/output-buffered routers based on
crossbar design may also require modification of their latency
terms. The factors such as the speedup factor and the choice of
crossbar arbitration algorithms may affect the latency terms
substantially.
Delay in the switch core comes from two sources, both of which must be
considered. The first part of this delay is the fixed delay a packet
experiences regardless of the other traffic. This component of the
delay includes the time it takes for things such as packet segmentation
and reassembly in cell based cores, enqueueing and dequeueing at each
stage, and transmission between stages. The second part of the switch
core delay is variable and depends on the type and amount of other
traffic traversing the core. This delay comes about if the stages in
the core mix traffic flowing between different input/output port pairs.
Thus, EF packets must compete against other traffic for forwarding
resources in the core. Some of this competing traffic may even be EF
traffic from other aggregates. This introduces extra delay, that can
also be absorbed by the latency term in the definition.
To capture these considerations, in this section we will consider two
simplified implementation examples. The first is an ideal output
buffered node where packets entering the device from an input
interface are immediately delivered to the output scheduler. In this
model the properties of the output scheduler fully define the values
of the parameters E_a and E_p. We will consider the case where the
output scheduler implements aggregate class-based queueing, so that
all EF packets share a single queue. We will discuss the values of E_a
and E_p for a variety of class-based schedulers widely considered
Anna Charny, ed. INTERNET-DRAFT [Page 11]
Information for the EF PHB Expires: July 1 2001
acceptable for EF implementations.
The second example will consider a router modeled as a black box with
a known bound on the variable delay a packet can experience from the
time it arrives to an input to the time it is delivered to its
destination output. The output scheduler in isolation is assumed to be
an aggregate scheduler with a known value of E_a(S)=E_p(S)=E(S). This
model provides a reasonable abstraction to a large class of router
implementations.
5.1. The output buffered model with aggregate queuing at the output.
As has been mentioned earlier, in this model E_a = E_p, so we shall
omit the subscript and refer to both terms as latency E. The
remainder of this subsection discusses E for a number of scheduling
implementations.
5.1.1 Strict Non-preemptive Priority Queue
A Strict Priority scheduler in which all EF packets share a single
FIFO queue which is served at strict non-preemptive priority over other
queues satisfies the EF definition with the latency term E = MTU/C
where MTU is the maximum packet size and C is the speed of output link.
5.1.2 WF2Q
Another scheduler that satisfies the EF definition with a small
latency term is WF2Q described in [1]. A class-based WF2Q scheduler,
in which all EF traffic shares a single queue with the weight
corresponding to the configured rate of the EF aggregate satisfies the
EF definition with the latency term E = MTU/C+MTU/R.
5.1.3.Deficit Round Robin (DRR)
For DRR [12], both E can be shown to grow linearly with
N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and
the largest rate among the rate assignments of all queues in the
scheduler, and N is the number of queues in the scheduler
5.1.4. Start-Time Fair Queuing (SFQ) and Self-Clocked Fair Queuing
(SCFQ)
For SFQ [9] and SCFQ [8] E can be shown to grow
linearly with the number of queues in the scheduler.
5.2. A Router with Variable Internal Delay and Aggregate Scheduling at the
output.
In this section we consider a router which is modeled as follows. A
packet entering the router may experience a variable delay D_v with a
known upper bound D. That is, 0<=D_v<D. At the output all EF packets
share a single class queue. Class queues are scheduled by a scheduler
with a known value E(S) (where E(S) corresponds to the model where
this scheduler is implemented in an ideal output buffered device).
Anna Charny, ed. INTERNET-DRAFT [Page 12]
Information for the EF PHB Expires: July 1 2001
In this case, it can be easily shown that E_a = E(S) + D.
The computation of E_p is more complicated. For such device, E_p =
E(S)+ min((D+B/R), D*C_inp/R), where (B, R) are the input token
bucket parameters of the EF aggregate at the given input, and C_inp is
the total capacity of all input links from which EF traffic destined
to the given output can arrive.
Intuitively, this can be understood as follows. The term min((D+B/R),
D*C_inp/R) is the time to it takes to drain (at rate R) the maximum
number of EF packets that can arrive in the interval of length D to
the output from all inputs. Consider an EF packet arriving at the
input at time t. The ideal departure time d of this packet in the
packet-identity-aware definition would be determined by the number of
packets which arrived to the router before this packet. However, due
to variable delay, some packets that arrive later than this packet to
the input of the device may end up arriving before this packet to the
output. The number of bits in all such packets is bounded by min(DR+B,
D*C_inp). If the EF queue is served at the configured rate R, then
the chosen packet may be delayed additionally by min((D+B/R),
D*C_inp/R).
Recall from the discussion of section 3 that bounding input burstiness
B may not be easy in a general topology. In the absence of the
knowledge of a bound on B one can bound E_p as E_p = E(S) +
D*C_inp/R.
Note also that the bounds in this section are derived using only the
bound on the variable portion of the interval delay and the error
bound of the output scheduler. If more details about the architecture
of a device are available, it may be possible to compute better
bounds.
6. Security Considerations
This informational draft provides additional information to aid in
understanding of the EF PHB described in [6]. It adds no new
functions to it. As a result, it adds no security issues to those
described in that specification.
7. References
[1] J.C.R. Bennett and H. Zhang, ``WF2Q: Worst-case
Fair Weighted Fair Queuing'', INFOCOM'96, Mar, 1996
[2] J.-Y. Le Boudec, "Application of Network Calculus To
Guaranteed Service Networks", IEEE Transactions on
Information theory, (44) 3, May 1998
[3] S. Bradner, "Key Words for Use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[4] J. Bennett, K. Benson, A. Charny, W. Courney, J.Y. Le Boudec,
Anna Charny, ed. INTERNET-DRAFT [Page 13]
Information for the EF PHB Expires: July 1 2001
"Delay Jitter Bounds and Packet Scale Rate Guarantee for Expedited
Forwarding", Proc. Infocom 2001, April 2001.
[5] A. Charny, J.-Y. Le Boudec "Delay Bounds in a
Network with Aggregate Scheduling". Proc. of QoFIS'2000, September
25-26, 2000, Berlin, Germany.
[6] B. Davie, ed. et al, "An Expedited Forwarding PHB",
draft-ietf-diffserv-rfc2598bis-00.txt, work in progress, February
2001.
[7] T. Ferrari and P. F. Chimento, "A Measurement-
Based Analysis of Expedited Forwarding PHB
Mechanisms," Eighth International Workshop on Quality
of Service, Pittsburgh, PA, June 2000,
[8] S.J. Golestani. "A Self-clocked Fair Queuing
Scheme for Broad-band Applications". In Proceedings of
IEEE INFOCOM'94, pages 636-646, Toronto, CA, April
1994.
[9] P. Goyal, H.M. Vin, and H. Chen. "Start-time
Fair Queuing: A Scheduling Algorithm for Integrated
Services". In Proceedings of the ACM-SIGCOMM 96, pages
157-168, Palo Alto, CA, August 1996.
[10] V. Jacobson, K. Nichols, K. Poduri, "An Expedited
Forwarding PHB", RFC 2598, June 1999
[11] V. Jacobson, K. Nichols, K. Poduri,
"The 'Virtual Wire' Behavior Aggregate,"
(draft-ietf-diffserv-ba-vw-00.txt), March 2000.
[12] M. Shreedhar and G. Varghese. "Efficient Fair Queueing
Using Deficit Round Robin". In Proceedings of
SIGCOMM'95, pages 231-243, Boston, MA, September 1995.
8. Appendix. Difficulties with the RFC 2598 EF PHB Definition
The definition of the EF PHB as given in [10] states:
"The EF PHB is defined as a forwarding treatment for a particular
diffserv aggregate where the departure rate of the aggregate's
packets from any diffserv node must equal or exceed a configurable
rate. The EF traffic SHOULD receive this rate independent of the
intensity of any other traffic attempting to transit the node.
It [the EF PHB departure rate] SHOULD average at least the
configured rate when measured over any time interval equal
to or longer than the time it takes to send an output link
MTU sized packet at the configured rate."
Anna Charny, ed. INTERNET-DRAFT [Page 14]
Information for the EF PHB Expires: July 1 2001
A literal interpretation of the definition would consider the
behaviors given in the next two subsections as non-compliant. The
definition also unnecessarily constrains the maximum configurable
rate of an EF aggregate.
8.1 Perfectly-Clocked Forwarding
Consider the following stream forwarded from a router with EF-
configured rate R=C/2, where C is the output line rate. In the
illustration, E is an MTU-sized EF packet while x is a non-EF packet
or unused capacity, also of size MTU.
... E x E x E x E x E x E x...
|-----|
The interval between the vertical bars is 3*MTU/C, which is greater
than MTU/(C/2), and so is subject to the EF PHB definition. During
this interval, 3*MTU/2 bits of the EF aggregate should be forwarded,
but only MTU bits are forwarded. Therefore, while this forwarding
pattern should be considered compliant under any reasonable
interpretation of the EF PHB, it actually does not formally comply
with the definition of RFC 2598.
Note that this forwarding pattern can occur in any work-conserving
scheduler in an ideal output-buffered architecture where EF packets
arrive in a perfectly clocked manner according to the above pattern
and are forwarded according to exactly the same pattern in the
absence of any non-EF traffic.
Trivial as this example may be, it reveals the lack of mathematical
precision in the formal definition. The fact that no work-conserving
scheduler can formally comply with the definition is unfortunate,
and appears to warrant some changes to the definition that would
correct this problem.
The underlying reason for the problem described here is quite simple
- one can only expect that the EF aggregate is served at configured
rate in some interval where there is enough backlog of EF packets to
sustain that rate. In the example above the packets come in exactly
at the rate at which they are served, and so there is no persistent
backlog. Certainly, if the input rate is even smaller than the
configured rate of the EF aggregate, there will be no backlog as
well, and a similar formal difficulty will occur.
A seemingly simple solution to this difficulty might be to require
that the EF aggregate is served at its configured rate only when the
queue is backlogged. However, as we show in the remainder of this
section, this solution does not suffice.
8.2 Router Internal Delay
We now argue that the example considered in the previous section is
not as trivial as it may seem at first glance.
Anna Charny, ed. INTERNET-DRAFT [Page 15]
Information for the EF PHB Expires: July 1 2001
Consider a router with EF configured rate R = C/2 as in the previous
example, but with an internal delay of 3T (where T = MTU/C) between
the time that a packet arrives at the router and the time that it is
first eligible for forwarding at the output link. Such things as
header processing, route look-up, and delay in switching through a
multi-layer fabric could cause this delay. Now suppose that EF
traffic arrives regularly at a rate of (2/3)R = C/3. The router will
perform as shown below.
EF Packet Number 1 2 3 4 5 6 ...
Arrival (at router) 0 3T 6T 9T 12T 15T ...
Arrival (at scheduler) 3T 6T 9T 12T 15T 18T ...
Departure 4T 7T 10T 13T 16T 19T ...
Again, the output does not satisfy the RFC 2598 definition of EF
PHB. As in the previous example, the underlying reason for this
problem is that the scheduler cannot forward EF traffic faster than
it arrives. However, it can be easily seen that the existence of
internal delay causes one packet to be inside the router at all
times. An external observer will rightfully conclude that the number
of EF packets that arrived to the router is always at least one
greater than the number of EF packets that left the router, and
therefore the EF aggregate is constantly backlogged. However, while
the EF aggregate is continuously backlogged, the observed output
rate is nevertheless strictly less that the configured rate.
This example indicates that the simple addition of the condition
that EF aggregate must receive its configured rate only when the EF
aggregate is backlogged does not suffice in this case.
Yet, the problem described here is of fundamental importance in
practice. Most routers have a certain amount of internal delay. A
vendor declaring EF compliance is not expected to simultaneously
declare the details of the internals of the router. Therefore, the
existence of internal delay may cause a perfectly reasonable EF
implementation to display seemingly non-conformant behavior, which
is clearly undesirable.
8.3 Maximum Configurable Rate and Provisioning Efficiency
It is well understood that with any non-preemptive scheduler, the
compliant configurable rate for an EF aggregate cannot exceed C/2
[11]. This is because an MTU-sized EF packet may arrive to an
empty queue at time t just as an MTU-sized non-EF packet begins
service. The maximum number of EF bits that could be forwarded
during the interval [t, t + 2*MTU/C] is MTU. But if configured rate
R > C/2, then this interval would be of length greater than MTU/R,
and more than MTU EF bits would have to be served during this
interval for the router to be compliant. Thus, R must be no greater
than C/2.
It can be shown that for schedulers other than PQ, such as various
Anna Charny, ed. INTERNET-DRAFT [Page 16]
Information for the EF PHB Expires: July 1 2001
implementations of WFQ, the maximum compliant configured rate may be
much smaller than 50%. For example, for SCFQ [8] the maximum
configured rate cannot exceed C/N, where N is the number of queues
in the scheduler. For WRR, mentioned as compliant in section 2.2 of
RFC 2598, this limitation is even more severe. This is because in
these schedulers a packet arriving to an empty EF queue may be
forced to wait until one packet from each other queue (in the case
of SCFQ) or until several packets from each other queue (in the case
of WRR) are served before it will finally be forwarded.
While it is frequently assumed that the configured rate of EF
traffic will be substantially smaller than the link bandwidth, the
requirement that this rate should never exceed 50% of the link
bandwidth appears unnecessarily limiting. For example, in a fully
connected mesh network, where any flow traverses a single link on
its way from source to its destination there seems no compelling
reason to limit the amount of EF traffic to 50% (or an even smaller
percentage for some schedulers) of the link bandwidth.
Another, perhaps even more striking example is the fact that even a
TDM circuit with dedicated slots cannot be configured to forward EF
packets at more than 50% of the link speed without violating RFC
2598 (unless the entire link is configured for EF). If the
configured rate of EF traffic is greater than 50% (but less than the
link speed), there will always exist an interval longer than MTU/R
in which less than the configured rate is achieved. For example,
suppose the configured rate of the EF aggregate is 2C/3. Then the
forwarding pattern of the TDM circuit might be
E E x E E x E E x ...
|---|
where only one packet is served in the marked interval of length 2T
= 2MTU/C. But at least 4/3 MTU would have to be served during this
interval by a router in compliance with the definition in RFC 2598.
The fact that even a TDM line cannot be booked over 50% by EF
traffic indicates that the restriction is artificial and
unnecessary.
8.4 The Non-trivial Nature of the Difficulties
One possibility to correct the problems discussed in the previous
sections might be to attempt to clarify the definition of the
intervals to which the definition applied or by averaging over
multiple intervals. However, an attempt to do so meets with
considerable analytical and implementation difficulties. For
example, attempting to align interval start times with some epochs
of the forwarded stream appears to require a certain degree of
global clock synchronization and is fraught with the risk of
misinterpretation and mistake in practice.
Another approach might be to allow averaging of the rates over some
larger time scale. However, it is unclear exactly what finite time
scale would suffice in all reasonable cases. Furthermore, this
Anna Charny, ed. INTERNET-DRAFT [Page 17]
Information for the EF PHB Expires: July 1 2001
approach would compromise the notion of very short-term time scale
guarantees that are the essence of EF PHB.
We also explored a combination of two simple fixes. The first is the
addition of the condition that the only intervals subject to the
definition are those that fall inside a period during which the EF
aggregate is continuously backlogged in the router (i.e., when an EF
packet is in the router). The second is the addition of an error
(latency) term that could serve as a figure-of-merit in the
advertising of EF services.
With the addition of these two changes the candidate definition
becomes as follows:
In any interval of time (t1, t2) in which EF traffic is
continuously backlogged, at least R(t2 - t1 - E) bits of EF
traffic must be served, where R is the configured rate for the
EF aggregate and E is an implementation-specific latency term.
The "continuously backlogged" condition eliminates the insufficient-
packets-to-forward difficulty while the addition of the latency term
of size MTU/C resolves the perfectly-clocked forwarding example
(section 1.2.1), and also removes the limitation on EF configured
rate.
However, neither fix (nor the two of them together) resolves the
example of section 1.2.2. To see this, recall that in the example of
section 1.2.2 the EF aggregate is continuously backlogged, but the
service rate of the EF aggregate is consistently smaller than the
configured rate, and therefore no finite latency term will suffice
to bring the example into conformance.
9. Authors' addresses
Anna Charny, ed.
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
acharny@cisco.edu
Fred Baker
Cisco Systems
170 West Tasman Dr.
San Jose, CA 95134
fred@cisco.com
Jon Bennett
RiverDelta Networks
3 Highwood Drive East
Tewksbury, MA 01876
jcrb@riverdelta.com
Kent Benson
Anna Charny, ed. INTERNET-DRAFT [Page 18]
Information for the EF PHB Expires: July 1 2001
Tellabs Research Center
3740 Edison Lake Parkway #101
Mishawaka, IN 46545
Kent.Benson@tellabs.com
Jean-Yves Le Boudec
ICA-EPFL, INN
Ecublens, CH-1015
Lausanne-EPFL, Switzerland
leboudec@epfl.c
Angela Chiu
AT&T Labs
100 Schulz Dr. Rm 4-204
Red Bank, NJ 07701
alchiu@att.com
Bill Courtney
TRW
Bldg. 201/3702
One Space Park
Redondo Beach, CA 90278
bill.courtney@trw.com
Shahram Davari
PMC-Sierra Inc
555 Legget drive
Suit 834, Tower B
Ottawa, ON K2K 2X3, Canada
shahram_davari@pmc-sierra.com
Bruce Davie
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
bsd@cisco.com
Victor Firoiu
Nortel Networks
600 Tech Park
Billerica, MA 01821
vfirou@nortelnetworks.com
Charles Kalmanek
AT&T Labs-Research
180 Park Avenue, Room A113,
Florham Park NJ
crk@research.att.com.
K.K. Ramakrishnan
AT&T Labs-Research
Rm. A155, 180 Park Ave,
Florham Park, NJ 07932
Anna Charny, ed. INTERNET-DRAFT [Page 19]
Information for the EF PHB Expires: July 1 2001
kkrama@research.att.com
Dimitrios Stiliadis
Lucent Technologies
1380 Rodick Road
Markham, Ontario, L3R-4G5, Canada
stiliadi@bell-labs.com
10. Full Copyright
Copyright (C) The Internet Society 1999. All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph
are included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose ofdeveloping
Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Anna Charny, ed. INTERNET-DRAFT [Page 20]