Transport Area Working Group B. Briscoe
Internet-Draft BT
Intended status: Standards Track June 30, 2007
Expires: January 1, 2008
Layered Encapsulation of Congestion Notification
draft-briscoe-tsvwg-ecn-tunnel-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 1, 2008.
Copyright Notice
Copyright (C) The IETF Trust (2007).
Abstract
This document redefines how the explicit congestion notification
(ECN) field of the outer IP header of a tunnel should be constructed.
It brings all IP in IP tunnels (v4 or v6) into line with the way
IPsec tunnels now construct the ECN field, ensuring that the outer
header reveals any congestion experienced so far on the path. It
specifies the default ECN tunneling behaviour for any Diffserv per-
hop behaviour (PHB), but also gives general principles to guide the
design of alternate congestion marking behaviours for specific PHBs
Briscoe Expires January 1, 2008 [Page 1]
Internet-Draft ECN Tunnelling June 2007
and for lower layer congestion notification schemes.
Table of Contents
1. Introduction
2. Requirements notation
3. Design Constraints
  3.1. Security Constraints
  3.2. Control Constraints
  3.3. Management Constraints
4. Design Principles
5. Default ECN Tunnelling Rules
6. Backward Compatibility
7. Changes from Earlier RFCs
8. IANA Considerations
9. Security Considerations
10. Conclusions
11. Acknowledgements
12. Comments Solicited
13. References
  13.1. Normative References
  13.2. Informative References
Appendix A. In-path Load Regulation
Author's Address
Intellectual Property and Copyright Statements
1. Introduction
This document redefines how the explicit congestion notification
(ECN) field [RFC3168] of the outer IP header of a tunnel should be
constructed. It brings all IP in IP tunnels (v4 or v6) into line
with the way IPsec tunnels [RFC4301] now construct the ECN field,
ensuring that the outer header reveals any congestion experienced so
far on the path. Although this memo focuses on IP in IP tunnelling
it also gives generalised advice for any encapsulation by lower layer
headers.
ECN allows a congested resource to notify the onset of congestion
without having to drop packets, by explicitly marking a proportion of
packets with the congestion experienced (CE) codepoint. Congestion
notification is unusual in that it propagates from the physical layer
upwards to the transport layer, because congestion is exhaustion of a
physical resource. The transport layer can directly detect loss of a
packet (or frame) by a lower layer. But if a lower layer marks a
packet (or frame) to notify incipient congestion, this marking has to
be explicitly copied up the layers at every header decapsulation.
So, at each decapsulation of an outer (lower layer) header a
congestion marking has to be arranged to propagate into the forwarded
(upper layer) header. It must continue upwards until it reaches the
destination transport, which should feed congestion notification back
to the source transport.
Note that often lower layer resources are arranged to be protected by
higher layer buffers, so instead of blocking occurring at the lower
layer, it occurs when the higher layer queue overflows. Thus, non-
blocking link and physical layer technologies do not have to
implement congestion notification, which can be introduced solely in
IP layer active queue management (AQM). However, if we want to use
congestion notification, we have to arrange for it to be explicitly
copied up the layers when IP is tunnelled in IP (and if a particular
link layer technology isn't protected from blocking by network layer
queues).
IPsec tunnel mode is a specific form of tunnelling that can hide the
inner headers. Because the ECN field has to be mutable, it cannot be
covered by IPsec encryption or authentication calculations.
Therefore concern has been raised in the past that the ECN field
could be used as a low bandwidth covert channel to communicate with
someone on the unprotected public Internet even if an end-host is
restricted to only communicate with the public Internet through an
IPsec gateway. However, the recently updated version of IPsec
[RFC4301] chose not to block this covert channel, deciding that the
threat could be managed given the channel bandwidth is so limited
(ECN is a 2-bit field).
An unfortunate sequence of standards actions leading up to this
latest change in IPsec has left us with nearly the worst of all
possible combinations of outcomes, despite the best endeavours of
everyone concerned. Even though information about congestion
experienced on the upstream path has various uses if it is revealed
in the outer header of a tunnel, when ECN was standardised [RFC3168]
it was decided that all IP in IP tunnels should hide upstream
congestion information simply to avoid the extra complexity of two
different mechanisms for IPsec and non-IPsec tunnels. However, now
that [RFC4301] IPsec tunnels deliberately no longer hide this
information, we are left in the perverse position where non-IPsec
tunnels still hide congestion information unnecessarily. This
document is designed to correct that anomaly.
Specifically, RFC3168 says that, if a tunnel supports ECN (termed a
'full-functionality' ECN tunnel), the tunnel ingress must not copy a
CE marking from the inner header into the outer header that it
creates. Instead the tunnel ingress has to set the ECN field of the
outer header to ECT(0) (i.e. codepoint 10). We term this 'resetting'
a CE codepoint. However, RFC4301 reverses this, stating that the
tunnel ingress must simply copy the ECN field from the inner to the
outer header. The main purpose of this document is to carry over
this new relaxed attitude to covert channels from IPsec to all IP in
IP tunnels, so all tunnel ingress nodes consistently copy the ECN
field.
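The contrast between the two ingress rules can be sketched as follows. This is a minimal illustration rather than normative text: the Python names (ECN, rfc3168_full_functionality_ingress, rfc4301_ingress) are hypothetical, and only the 2-bit codepoint values are taken from RFC3168.

```python
from enum import IntEnum


class ECN(IntEnum):
    """2-bit ECN codepoints as defined in RFC3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def rfc3168_full_functionality_ingress(inner: ECN) -> ECN:
    """Outer ECN field under RFC3168: a CE marking is 'reset' to ECT(0)."""
    return ECN.ECT_0 if inner == ECN.CE else inner


def rfc4301_ingress(inner: ECN) -> ECN:
    """Outer ECN field under RFC4301: the inner field is simply copied."""
    return inner
```

The two functions differ only for an inner header marked CE, which is exactly the case where upstream congestion information is either hidden or revealed across the tunnel.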
The rest of the document deals with the knock-on effects of this
apparently minor change. It is organised as follows:
o S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to
'switch in' different behaviours for marking the ECN field, just
as it switches in different per-hop behaviours (PHBs) for
scheduling. We therefore cannot discuss only the ECN protocol
that RFC3168 gives as a default; we also need to give guidance
for possible different marking schemes. Accordingly, in Section 3
we lay out the design constraints when tunnelling congestion
notification.
o Then in Section 4 we resolve the tensions between these
constraints to give general design principles on how a tunnel
should process congestion notification; principles that could
apply to any marking behaviour for any PHB, not just the default
in RFC3168. In particular, we examine the underlying principles
behind whether CE should be reset or copied into the outer header
at the ingress to a tunnel--or indeed at the ingress of any
layered encapsulation of headers with congestion notification
fields.
o Section 5 then confirms the precise rules for the default ECN
tunnelling behaviour based on the above design principles. These
rules apply to all PHBs, unless stated otherwise in the
specification of a PHB. There is no requirement for a PHB to
state anything about ECN behaviour if the default behaviour is
sufficient.
o Extending the new IPsec tunnel ingress behaviour to all IP in IP
tunnels causes one further knock-on effect that is dealt with in
Section 6 on Backward Compatibility. If one end of an IPsec
tunnel is compliant with [RFC4301], assuming IKEv2 key management
is used, the other end can be guaranteed to also be [RFC4301]
compliant. So there is no backward compatibility problem with
IKEv2 RFC4301 IPsec tunnels. But once we extend our scope to any
IP in IP tunnel, we have to cater for the possibility that a
tunnel ingress compliant with this specification is sending to an
egress that doesn't even understand ECN (e.g. a legacy [RFC2003]
tunnel egress). If a tunnel ingress copied incoming ECN-capable
headers into outer headers, then a legacy tunnel egress would
discard any congestion markings added to the outer header within
the tunnel. ECN-capable traffic sources would not see any
congestion feedback and instead continually ratchet up their share
of the bandwidth without realising that cross-flows from other ECN
sources were continually having to ratchet down.
The scope of this document is all IP in IP tunnelling, irrespective
of whether IPv4 or IPv6 is used for either of the inner and outer
headers. The document only concerns wire protocol processing at
tunnel endpoints and makes no changes or recommendations concerning
algorithms for congestion marking or congestion response. The
general design principles of Section 4 may also be useful when any
datagram/packet/frame with a congestion notification capability is
encapsulated by a connectionless outer header [BBnet] that might also
support a congestion notification capability in the future as
discussed in S.9.3 of [RFC3168] (e.g. IP encapsulated in L2TP
[RFC2661], GRE [RFC1701] or PPTP [RFC2637]). However, of course, the
IETF does not have standards authority over every link or tunnel
protocol, so this document focuses only on IP in IP.
[I-D.ietf-tsvwg-ecn-mpls] applies these principles to IP in MPLS and
to MPLS in MPLS.
2. Requirements notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Design Constraints
Tunnel processing of a congestion notification field has to meet
congestion control needs without creating new information security
vulnerabilities (if information security is required).
3.1. Security Constraints
Information security can be assured by using various end to end
security solutions (including IPsec in transport mode [RFC4301]), but
a commonly used scenario involves the need to communicate between two
physically protected domains across the public Internet. In this
case there are certain management advantages to using IPsec in tunnel
mode solely across the publicly accessible part of the path. The
path followed by a packet then crosses security 'domains'; the ones
protected by physical or other means before and after the tunnel and
the one protected by an IPsec tunnel across the otherwise unprotected
domain. We will use the scenario in Figure 1 where endpoints 'A' and
'B' communicate through a tunnel with ingress 'I' and egress 'E'
within physically protected edge domains across an unprotected
internetwork where there may be 'men in the middle', 'M'.
     physically     unprotected      physically
<-protected domain-><--domain--><-protected domain->
+------------------+            +------------------+
|                  |      M     |                  |
|    A-------->I=========>==========>E-------->B   |
|                  |            |                  |
+------------------+            +------------------+
               <----IPsec secured---->
                        tunnel
Figure 1: IPsec Tunnel Scenario
IPsec encryption is typically used to prevent 'M' seeing messages
from 'A' to 'B'. IPsec authentication is used to prevent 'M'
masquerading as the sender of messages from 'A' to 'B' or altering
their contents. 'I' can also use IPsec tunnel mode to allow 'A' to
communicate with 'B' while imposing encryption to prevent 'A' leaking
information to 'M'. Or 'E' can insist that 'I' uses tunnel mode
authentication to prevent 'M' communicating information to 'B'.
Mutable IP header fields such as the ECN field (as well as the TTL/
Hop Limit and DS fields) cannot be included in the cryptographic
calculations of IPsec. Therefore, if 'I' encrypts but copies these
mutable fields into the outer header that is exposed across the
tunnel it will have allowed a covert channel from 'A' to 'M'. And if
'E' copies these fields from the outer header to the inner, even if
it validates authentication from 'I', it will have allowed a covert
channel from 'M' to 'B'.
ECN at the IP layer is designed to carry information about congestion
from a congested resource to some downstream node that will feed the
information back somehow to the point upstream of the congestion that
can regulate the load on the congested resource. In terms of the
above scenario, ECN is effectively intended to create an information
channel from 'M' to 'B', for 'B' to forward to 'A'. Therefore the
goals of IPsec and ECN are mutually incompatible.
With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,
"controls are provided to manage the bandwidth of this [covert]
channel". Using the ECN processing rules of RFC4301, the channel
bandwidth is two bits per datagram from 'A' to 'M' and one bit per
datagram from 'M' to 'A' because 'E' limits the combinations it will
copy. In both cases the covert channel bandwidth is further reduced
by noise from any real congestion marking. RFC4301 therefore implies
that these covert channels are sufficiently limited to be considered
a manageable threat. However, with respect to the larger (6b) DS
field, the same section of RFC4301 says not copying is the default,
but a configuration option can allow copying "to allow a local
administrator to decide whether the covert channel provided by
copying these bits outweighs the benefits of copying". Of course, an
administrator considering copying of the DS field has to take into
account that it could be concatenated with the ECN field giving an 8b
per datagram channel.
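The 8b figure follows from the 6-bit DS field and the 2-bit ECN field sharing a single octet of the IP header. A minimal sketch of that arithmetic, assuming the usual layout with the DSCP in the upper six bits (the function name pack_ds_ecn is hypothetical):

```python
DSCP_BITS = 6  # width of the DS field
ECN_BITS = 2   # width of the ECN field


def pack_ds_ecn(dscp: int, ecn: int) -> int:
    """Combine a DSCP and an ECN codepoint into one 8-bit octet."""
    if not 0 <= dscp < (1 << DSCP_BITS):
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn < (1 << ECN_BITS):
        raise ValueError("ECN must fit in 2 bits")
    # DSCP occupies the upper six bits, ECN the lower two.
    return (dscp << ECN_BITS) | ecn


# Upper bound on the covert channel if both fields are copied, in bits
# per datagram, before any reduction by noise from real markings.
CHANNEL_BITS = DSCP_BITS + ECN_BITS
```

This is only an upper bound; as noted above, real congestion marking adds noise that reduces the usable bandwidth.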
3.2. Control Constraints
Congestion control requires that any congestion notification marked
into packets by a resource will be able to traverse a feedback loop
back to a node capable of controlling the load on that resource. To
avoid ambiguity later, we will call this node the Load Regulator
rather than the data source. This will allow us to deal with
exceptional cases where load is not regulated by the data source, but
usually the two will be synonymous. Note the term "a node _capable
of_ controlling the load" deliberately includes a source application
that doesn't actually control the load but ought to (e.g. an
application without congestion control that uses UDP).
A--->R--->I=========>M=========>E-------->B
Figure 2: Simple Tunnel Scenario
We now consider a tunnelling scenario similar to the IPsec one just
described, but without the different security domains, so that we can
focus on ensuring the control loop and management monitoring can work
(Figure 2). If we want resources in the tunnel to be able to
explicitly notify congestion and the feedback loop is from 'B' to
'A', it will certainly be necessary for 'E' to copy any CE marking
from the outer header to the inner header for onward transmission to
'B', otherwise congestion notification from resources like 'M' cannot
be fed back to the Load Regulator ('A'). But it doesn't seem
necessary for 'I' to copy CE markings from the inner to the outer
header. For instance, if resource 'R' is congested, it can send
congestion information to 'B' using the congestion field in the inner
header without 'I' copying the congestion field into the outer header
and 'E' copying it back to the inner header. 'E' can then write any
additional congestion marking introduced across the tunnel into the
congestion field of the inner header.
Indeed, this arrangement can be extended to multi-level congestion
marking (such as that proposed for PCN [PCN-arch]) as long as all the
marks have unambiguously ranked values. For instance, if a
hypothetical multi-level marking scheme for PCN had PCN-capable
codepoints ranked 1, 2 and 3, then, if 'I' reset the outer congestion
field to the lowest ranked value that is PCN-capable (1), 'E' would
simply write the highest ranked of the inner and outer congestion
markings into the forwarded header. For instance, if the inner
marking on arrival at 'I' was 3 and 'I' reset the outer to 1, but 'M'
subsequently set it to 2, then the header forwarded by 'E' would be
max(3,2) = 3.
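The worked example above can be expressed as a short sketch. The ranks 1 to 3 belong to the hypothetical multi-level scheme described in the text, and the function names are illustrative only.

```python
LOWEST_CAPABLE_RANK = 1  # lowest ranked PCN-capable value in the example


def ingress_reset_outer() -> int:
    """'I' resets the outer congestion field to the lowest ranked value."""
    return LOWEST_CAPABLE_RANK


def egress_forwarded_rank(inner_rank: int, outer_rank: int) -> int:
    """'E' forwards the highest ranked of the inner and outer markings."""
    return max(inner_rank, outer_rank)


# Walk through the example: inner arrives at 'I' marked 3, the outer is
# reset to 1, and 'M' then raises the outer marking to 2.
inner = 3
outer = ingress_reset_outer()
outer = max(outer, 2)  # marking applied by 'M' within the tunnel
forwarded = egress_forwarded_rank(inner, outer)
```

Taking the maximum only works because the marks are unambiguously ranked; a scheme without a total order would need a different combining rule.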
It might be useful for the tunnel egress to be able to tell whether
congestion occurred across a tunnel or upstream of it. If outer
header congestion marking was reset at the tunnel ingress ('I'), by
the end of a tunnel ('E') the outer headers would indicate congestion
experienced across the tunnel ('I' to 'E'), while the inner header
would indicate congestion upstream of 'I'. But the same information
could be gleaned even if the tunnel ingress copied the inner to the
outer headers. By the end of the tunnel ('E'), any packet with an
_extra_ mark in the outer header relative to the inner header would
indicate congestion across the tunnel ('I' to 'E'), while the inner
header would still indicate congestion upstream of 'I'.
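A minimal sketch of how 'E' might separate the two sources of congestion under either ingress behaviour follows. The function and parameter names are hypothetical, and CE is modelled as a simple boolean per header.

```python
from typing import Tuple


def congestion_sources(inner_ce: bool, outer_ce: bool,
                       ingress_copies: bool) -> Tuple[bool, bool]:
    """Return (upstream_of_I, across_tunnel) for one packet seen at 'E'."""
    # The inner header is untouched within the tunnel, so it always
    # reflects congestion upstream of 'I'.
    upstream = inner_ce
    if ingress_copies:
        # Only an *extra* mark relative to the inner header indicates
        # congestion between 'I' and 'E'.
        across = outer_ce and not inner_ce
    else:
        # With a reset outer header, any outer mark arose in the tunnel.
        across = outer_ce
    return upstream, across
```

Note that with copying, a packet already marked upstream cannot carry an extra mark, so tunnel congestion is inferred only from the packets that arrived at 'I' unmarked.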
All this shows that 'E' can preserve the control loop irrespective of
whether 'I' copies congestion notification into the outer header or
resets it.
3.3. Management Constraints
As well as control, there are also management constraints.
Specifically, a management system may monitor congestion markings in
passing packets, perhaps at the border between networks as part of a
service level agreement. For instance, monitors at the borders of
autonomous systems may need to measure how much congestion has
accumulated since the original source to determine between them how
much of the congestion is contributed by each domain.
Therefore it should be clear how far back in the path the congestion
markings have accumulated from. In this document we term this the
baseline of the congestion marking, i.e. the source of the layer that
last reset rather than copied the congestion notification field when
creating an outer header. Given some tunnels cross domain borders
(e.g. suppose 'M' in Figure 2 is monitoring a border), it is therefore
desirable for 'I' to copy congestion accumulated so far into the
outer headers exposed across the tunnel.
Appendix A discusses various scenarios where the Load Regulator lies
in-path, not at the source host as we would typically expect. It
concludes that the baseline for congestion notification should be
determined by where the Load Regulator function is, whether it is at
the source host or within the path. Therefore every tunnel ingress
should copy the ECN field into the outer header it creates unless it
is also a Load Regulator, in which case it should reset any CE
markings, which is an exception to the normal copying rule for a
tunnel ingress.
4. Design Principles
The constraints from the three perspectives of security, control and
management in Section 3 are somewhat in tension as to whether a
tunnel ingress should copy congestion markings into the outer header
it creates or reset them. From the control perspective either
copying or resetting works. From the management perspective copying
is preferable (with the exception of an in-path load regulator).
From the security perspective resetting is preferable but copying is
now considered acceptable given the bandwidth of a 2-bit covert
channel can be managed.
Therefore an outer encapsulating header capable of carrying
congestion markings SHOULD reflect accumulated congestion since the
last interface designed to regulate load (the Load Regulator). This
implies congestion notification SHOULD be copied into the outer
header of each new encapsulating header that supports it--except at
an in-path Load Regulator. An in-path Load Regulator knows its
function is to regulate load, so if it also acts as the ingress to a
tunnel, in every new outer header it creates it MUST reset any
congestion marking.
The Load Regulator is the node to which congestion feedback should be
returned by the next downstream node with a transport layer function
(typically but not always the data receiver). The Load Regulator is
not always (or even typically) the same thing as the node identified
by the source address of the outermost exposed header. In general
the addressing of the outermost encapsulation header says nothing
about the identifiers of either the upstream or the downstream
transport layer functions. As long as the transport functions know
each other's addresses, they don't have to be identified in the
network layer or in any link layer. It was only a convenience that a
TCP receiver assumed that the address of the source transport is the
same as the network layer source address of a packet it receives.
More generally, the return transport address could be identified
solely in the transport layer protocol. For instance, a signalling
protocol like RSVP [RFC2205] breaks up a path into transport layer
hops and informs each hop of the address of its transport layer
neighbour without any need to identify these hops in the network
layer. RSVP can be arranged so that these transport layer hops are
bigger than the underlying network layer hops. The host identity
protocol (HIP) architecture [RFC4423] also supports the same
principled separation (for mobility amongst other things), where the
transport layer receiver identifies the transport layer sender using
an identifier provided by the transport layer, which gets mapped to a
network layer address below the transport layer.
Note that this principle deliberately doesn't require a packet header
to reveal the origin address of the baseline that congestion
notification has accumulated from. It is not necessary for the
network and lower layers to know the address of the Load Regulator.
Only the destination transport needs to know that. With congestion
notification, the network and link layers only notify congestion
forwards, they aren't involved in feeding it backwards. If they are,
e.g. backward congestion notification (BCN) in Ethernet [802.1au],
that should be considered as a transport function added to the lower
layer, which must sort out its own addressing. Indeed, this is one
reason why ICMP source quench is now deprecated [RFC1254]; when
congestion occurs within a tunnel it is complex (particularly in the
case of IPsec tunnels) to return the ICMP messages beyond the tunnel
ingress back to the Load Regulator.
Similarly, if a management system is monitoring congestion and needs
to know the baseline of congestion notification, the management
system has to find this out from the transport; in general it cannot
tell solely by looking at the network or link layer headers.
We have said that a tunnel ingress that is not a Load Regulator
SHOULD (as opposed to MUST) copy incoming congestion notification
into an outer encapsulating header that supports it. In the case of
2-bit ECN, the IETF security area have deemed the benefit always
outweighs the risk. Therefore for 2-bit ECN we can and we will say
'MUST' (Section 5). But in this section where we are setting down
general design principles, we leave it as a 'SHOULD'. This allows
for future multi-bit congestion notification fields where the risk
from the covert channel created by copying congestion notification
might outweigh the congestion control benefit of copying.
5. Default ECN Tunnelling Rules
The following ECN tunnel processing rules are the default for a
packet with any DSCP. If required, different ECN processing rules
MAY be defined for the appropriate Diffserv PHB using the guidelines
in Section 4.
When a tunnel ingress creates an encapsulating IP header, the 2-bit
ECN field of the inner IP header MUST be copied into the outer IP
header, for all types of IP in IP tunnel (except if the tunnel
ingress is in compatibility mode--see Section 6). If the tunnel
ingress is also a Load Regulator, it MUST instead reset the outer
header to ECT(0).
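The encapsulation rule might be sketched as below. This is an illustration under stated assumptions, not a reference implementation: the names are hypothetical, the precedence of compatibility mode over the Load Regulator case is assumed, and a Load Regulator is assumed to reset only a CE marking while leaving other codepoints as copied.

```python
from enum import IntEnum


class ECN(IntEnum):
    """2-bit ECN codepoints as defined in RFC3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def encapsulate_ecn(inner: ECN, *, compatibility_mode: bool = False,
                    load_regulator: bool = False) -> ECN:
    """Return the ECN field for the outer header created at the ingress."""
    if compatibility_mode:
        # Section 6: the egress may be legacy, so send Not-ECT outers.
        # ASSUMPTION: this takes precedence over the Load Regulator case.
        return ECN.NOT_ECT
    if load_regulator:
        # ASSUMPTION: 'reset' applies to a CE marking only; other
        # codepoints are copied unchanged.
        return ECN.ECT_0 if inner == ECN.CE else inner
    # Default: copy the inner ECN field into the outer header.
    return inner
```

For example, encapsulate_ecn(ECN.CE) yields CE, whereas encapsulate_ecn(ECN.CE, load_regulator=True) yields ECT(0).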
To decapsulate the inner header at the tunnel egress, the outgoing
inner header MUST be calculated from the combination of the incoming
inner and outer headers setting the outgoing ECN field to the
codepoints displayed in the body of Table 1.
                     +-------------Incoming Outer Header--------------+
+--------------------+---------+------------+------------+------------+
| Incoming Inner     | Not-ECT | ECT(0)     | ECT(1)     | CE         |
| Header             |         |            |            |            |
+--------------------+---------+------------+------------+------------+
| Not-ECT            | Not-ECT | drop (!!!) | drop (!!!) | drop (!!!) |
| ECT(0)             | ECT(0)  | ECT(0)     | ECT(0)     | CE         |
| ECT(1)             | ECT(1)  | ECT(1)     | ECT(1)     | CE         |
| CE                 | CE      | CE (!!!)   | CE (!!!)   | CE         |
+--------------------+---------+------------+------------+------------+
                     +----------------Outgoing Header-----------------+
Table 1: IP in IP Decapsulation
The exclamation marks '(!!!)' in Table 1 indicate that this
combination of inner and outer headers should not be possible if only
legal transitions have taken place. So, the decapsulator should drop
or mark the ECN field as the table specifies, but it MAY also raise
an appropriate alarm. It MUST NOT raise an alarm so often that the
illegal combinations would amplify into a flood of alarm messages.
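Table 1 can be transcribed into a small function, sketched below. The names are hypothetical; a return value of None models a drop, and the boolean flags the '(!!!)' combinations so that a caller can decide whether to raise a (rate-limited) alarm.

```python
from enum import IntEnum
from typing import Optional, Tuple


class ECN(IntEnum):
    """2-bit ECN codepoints as defined in RFC3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def decapsulate_ecn(inner: ECN, outer: ECN) -> Tuple[Optional[ECN], bool]:
    """Return (outgoing ECN field or None for drop, illegal_combination)."""
    if inner == ECN.NOT_ECT:
        # An ECN-capable outer on a Not-ECT inner should not be possible.
        if outer == ECN.NOT_ECT:
            return ECN.NOT_ECT, False
        return None, True
    if inner == ECN.CE:
        # CE is always preserved; an ECT outer here is flagged as illegal.
        return ECN.CE, outer in (ECN.ECT_0, ECN.ECT_1)
    # Inner is ECT(0) or ECT(1): a CE outer propagates, else keep inner.
    if outer == ECN.CE:
        return ECN.CE, False
    return inner, False
```

Rate limiting of alarms is left to the caller, since the text only requires that alarms do not amplify into a flood.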
6. Backward Compatibility
A legacy tunnel egress may not know how to process an ECN field, so
it will most likely simply disregard all outer headers. Therefore,
unless a compliant tunnel ingress has established that the tunnel
egress understands ECN processing, it MUST only send packets with the
ECN field set to Not-ECT in the outer header. Otherwise, if ECN
capable outer headers were sent towards a legacy egress, it would
dangerously remove information about congestion experienced within
the tunnel.
A tunnel ingress may establish whether its tunnel egress will
understand ECN processing by configuration or by negotiation. Note
that a [RFC4301] tunnel ingress that has used IKEv2 key management
[RFC4306] can guarantee that the tunnel egress is also RFC4301-
compliant and therefore need not negotiate ECN capabilities.
To be compliant with this specification a tunnel ingress that has not
established the egress ECN capability (e.g. by configuration) MUST
implement a 'normal' mode and a 'compatibility' mode, and it MUST
initiate each negotiated tunnel in compatibility mode. On the other
hand, a compliant tunnel egress MUST merely implement the one
behaviour in Section 5, which we term 'full-functionality' mode.
Before switching to normal mode, a compliant tunnel ingress that has
not established the egress ECN capability (e.g. by configuration) MUST
negotiate with the tunnel egress to establish whether the egress is
in full functionality mode. If the egress is in full functionality
mode, the ingress puts itself into normal mode. In normal mode the
ingress follows the encapsulation rule in Section 5 (i.e. it copies
the inner ECN field into the outer header). If the egress is not in
full-functionality mode or doesn't understand the question, the
tunnel ingress MUST remain in compatibility mode.
A tunnel ingress in compatibility mode MUST set all outer headers to
Not-ECT.
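The ingress mode handling described above might look like the following sketch. The class and method names are hypothetical, the negotiation exchange itself is not modelled (this document does not define one), and starting in normal mode when the egress capability is known by configuration is an assumption drawn from the wording above.

```python
from enum import Enum
from typing import Optional

NOT_ECT = 0b00  # 2-bit ECN codepoint for a non-ECN-capable packet


class IngressMode(Enum):
    COMPATIBILITY = "compatibility"
    NORMAL = "normal"


class TunnelIngress:
    """Per-tunnel ECN mode state held at a compliant ingress."""

    def __init__(self, egress_configured_full_functionality: bool = False) -> None:
        # ASSUMPTION: an egress known by configuration to be in
        # full-functionality mode lets the ingress start in normal mode.
        self.mode = (IngressMode.NORMAL
                     if egress_configured_full_functionality
                     else IngressMode.COMPATIBILITY)

    def on_negotiation_reply(self, egress_full_functionality: Optional[bool]) -> None:
        """None models an egress that does not understand the question."""
        if egress_full_functionality is True:
            self.mode = IngressMode.NORMAL
        # Otherwise the ingress remains in its current mode.

    def outer_ecn(self, inner_ecn: int) -> int:
        """ECN field to place in the outer header for this tunnel."""
        if self.mode is IngressMode.NORMAL:
            return inner_ecn  # copy, per Section 5
        return NOT_ECT        # compatibility mode
```

One such object would be kept per tunnel at a node that is the ingress for several tunnels.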
The decapsulation rules for the egress of the tunnel in Section 5
have been defined in such a way that congestion control will still
work safely if any of the earlier versions of ECN processing are used
unilaterally at the encapsulating ingress of the tunnel. If a tunnel
ingress tries to negotiate to use limited functionality mode or full
functionality mode, a decapsulating tunnel egress compliant with this
specification MUST agree to the request, even though its behaviour
will be the same in both cases. For 'forward compatibility', a
compliant tunnel egress MUST raise a warning about any requests to
enter modes it doesn't recognise, but it can continue operating. If
no ECN-related mode is requested, no error or warning need be raised
as the egress behaviour is compatible with all the legacy ingress
behaviours that don't negotiate capabilities.
Note that if a compliant node is the ingress for multiple tunnels, a
mode setting will need to be stored for each tunnel ingress.
However, if a node is the egress for multiple tunnels, none of the
tunnels will need to store a mode setting, because a compliant egress
can only be in one mode.
7. Changes from Earlier RFCs
The rule that a tunnel ingress MUST copy any ECN field into the outer
header is a change to RFC3168 (unless it is a Load Regulator as well,
in which case there is no change).
The rules for calculating the outgoing ECN field on decapsulation at
a tunnel egress are in line with the full functionality mode of ECN
in RFC3168 and with RFC4301, except that neither identified the need
to raise an alarm if the inner header was CE but the outer header was
ECT.
The rules for how a tunnel establishes whether the egress has full
functionality ECN capabilities are an update to RFC3168. For all the
typical cases, RFC4301 is not updated by the ECN capability check in
this specification, because a typical RFC4301 tunnel ingress will
have already established that it is talking to an RFC4301 tunnel
egress (e.g. if it uses IKEv2). However, there may be some corner
cases (e.g. manual keying) where an RFC4301 tunnel ingress talks with
an egress with limited functionality ECN handling. For such corner
cases, the requirement to use compatibility mode in this
specification updates RFC4301.
The optional ECN Tunnel field in the IPsec security association
database (SAD) and the optional ECN Tunnel Security Association
Attribute defined in RFC3168 are no longer needed. The security
association (SA) has no policy on ECN usage, because all RFC4301
tunnels now support ECN without any policy choice.
RFC3168 defines a (required) limited functionality mode and an
(optional) full functionality mode for a tunnel, but RFC4301 doesn't
need modes. In this specification only the ingress might need two
modes, unlike the modes of RFC3168 that were properties of the pair
of tunnel endpoints after negotiation.
All these ECN processing rules update RFC2003 on IP in IP tunnelling.
8. IANA Considerations
This memo includes no request to IANA.
9. Security Considerations
Section 3.1 discusses the security constraints imposed on ECN tunnel
processing. The Design Principles of Section 4 trade off security
(covert channels) against congestion monitoring and control. In
fact, ensuring congestion markings are not lost is itself another
aspect of security, because if we allowed congestion notification to
be lost, any attempt to enforce a response to congestion would be
much harder.
We keep the behaviour defined in both RFC3168 and RFC4301 where, if
the inner and outer headers carry contradictory ECT values, the inner
header is preserved for onward forwarding. However, in writing this
document we noticed this behaviour would hide illegal suppression of
congestion notification from the detection mechanism designed for
this attack. One reason two ECT codepoints were defined was to
enable the source to detect if a CE marking had been applied then
subsequently removed. The source could detect this by weaving a
pseudo-random sequence of ECT(0) and ECT(1) values into a stream of
packets [RFC3540]. With the rules as they stand in RFC3168 and
RFC4301, within a tunnel a CE marking could be added and subsequently
removed by a non-compliant node without detection, because the
evidence of such misbehaviour is removed by the decapsulator.
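The loophole can be illustrated with a deliberately simplified model.
The real mechanism of [RFC3540] checks a one-bit nonce sum echoed by
the receiver; here we merely compare each received codepoint with the
one sent. All function names are hypothetical, and the non-compliant
node is assumed always to restore ECT(0) because it cannot know the
original value.

```python
import random

# ECN codepoints as defined in RFC3168
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11
ECT = (ECT_0, ECT_1)


def congested_node(ecn):
    """A compliant node marking every ECN-capable packet."""
    return CE if ecn in ECT else ecn


def cheating_node(ecn):
    """A non-compliant node that removes CE, guessing ECT(0)."""
    return ECT_0 if ecn == CE else ecn


def decapsulate_ecn(inner, outer):
    # Behaviour of RFC3168 and RFC4301 kept by this document: only a CE
    # outer header alters an ECT inner header, so a contradictory ECT
    # value in the outer header is discarded.
    return CE if outer == CE and inner in ECT else inner


rng = random.Random(1)
sent = [rng.choice(ECT) for _ in range(1000)]  # pseudo-random ECT sequence

# No tunnel: marking and removal both act on the only header.
plain = [cheating_node(congested_node(e)) for e in sent]

# Tunnel: the ingress copies inner to outer, marking and removal act on
# the outer header, then the egress decapsulates.
tunnelled = [decapsulate_ecn(e, cheating_node(congested_node(e)))
             for e in sent]


def mismatches(received):
    """Count packets whose codepoint differs from the one sent."""
    return sum(s != r for s, r in zip(sent, received))


print("detectable without tunnel:", mismatches(plain))
print("detectable through tunnel:", mismatches(tunnelled))
```

Without the tunnel, every packet sent as ECT(1) exposes the removal.
Through the tunnel, the inner sequence arrives intact, so the source
sees no evidence that CE was ever applied.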
We could have specified that an outer header value of ECT should
overwrite a contradictory ECT value in the inner header to close this
loophole. But we chose not to for two reasons: i) we wanted to avoid
any changes to IPsec tunnelling behaviour; ii) allowing ECT values in
the outer header to override the inner header would have increased
the bandwidth of the covert channel through the egress gateway from 1
to 1.5 bits per datagram, potentially threatening to upset the
consensus established in the security area that the
bandwidth of this covert channel can now be safely managed.
10. Conclusions
This document updates the tunnelling treatment of RFC3168 ECN for all
IP in IP tunnels to bring it into line with the new behaviour in the
IPsec architecture of RFC4301.
At the tunnel egress, header decapsulation for the default ECN
marking behaviour is broadly unchanged except that one exceptional
case has been catered for. At the ingress, for all forms of IP in IP
tunnel, encapsulation has been brought into line with the new IPsec
rules in RFC4301, which copy rather than reset CE markings when
creating outer headers. Previously, upstream congestion information
was not revealed in the outer header, which limited the scope of some
management monitoring techniques and prevented certain active queue
management algorithms from taking account of upstream congestion
markings. The change ensures all IP in IP tunnels reflect the more
relaxed attitude to revealing congestion information in the new IPsec
architecture, which now deems that the threat from 2-bit covert
channels can be managed without disabling ECN.
Also, this document defines more generic principles to guide the
design of alternate forms of tunnel processing of congestion
notification, if required for specific Diffserv PHBs (as the PCN
working group will require) or for other lower layer
encapsulating protocols that might support congestion notification in
the future (e.g. MPLS).
11. Acknowledgements
Thanks to David Black, Bruce Davie, Toby Moncaster and Gabriele
Corliano for their careful review comments.
12. Comments Solicited
Comments and questions are encouraged and very welcome. They can be
addressed to the IETF Transport Area working group mailing list
<tsvwg@ietf.org>, and/or to the author.
13. References
13.1. Normative References
[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black,
"Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", RFC 2474,
December 1998.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005.
13.2. Informative References
[802.1au] "IEEE Standard for Local and Metropolitan Area Networks--
Virtual Bridged Local Area Networks - Amendment 10:
Congestion Notification", 2006,
<http://www.ieee802.org/1/pages/802.1au.html>.
(Work in Progress; Access Controlled link within page)
[BBnet] Sexton, M. and A. Reid, "Broadband Networking: ATM,
SDH and SONET", Artech House telecommunications
library, ISBN: 0-89006-578-0, 1997.
[I-D.ietf-tsvwg-ecn-mpls]
Davie, B., "Explicit Congestion Marking in MPLS",
draft-ietf-tsvwg-ecn-mpls-00 (work in progress),
March 2007.
[I-D.rosen-pwe3-congestion]
Rosen, E., "Pseudowire Congestion Control Framework",
draft-rosen-pwe3-congestion-04 (work in progress),
October 2006.
[PCN-arch]
Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R.,
Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion
Notification Architecture",
draft-eardley-pcn-architecture-00 (work in progress),
June 2007.
[PCNcharter]
IETF, "Congestion and Pre-Congestion Notification (pcn)",
IETF working group charter, February 2007,
<http://www.ietf.org/html.charters/pcn-charter.html>.
[RFC1254] Mankin, A. and K. Ramakrishnan, "Gateway Congestion
Control Survey", RFC 1254, August 1991.
[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic
Routing Encapsulation (GRE)", RFC 1701, October 1994.
[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
Functional Specification", RFC 2205, September 1997.
[RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little,
W., and G. Zorn, "Point-to-Point Tunneling Protocol",
RFC 2637, July 1999.
[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
RFC 2661, August 1999.
[RFC3426] Floyd, S., "General Architectural and Policy
Considerations", RFC 3426, November 2002.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, June 2003.
[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
RFC 4306, December 2005.
[RFC4423] Moskowitz, R. and P. Nikander, "Host Identity Protocol
(HIP) Architecture", RFC 4423, May 2006.
[Shayman] "Using ECN to Signal Congestion Within an MPLS Domain",
2000, <http://www.ee.umd.edu/~shayman/papers.d/
draft-shayman-mpls-ecn-00.txt>.
(Expired)
Appendix A. In-path Load Regulation
In the traditional Internet architecture one tends to think of the
source host as the Load Regulator for a path. It is generally not
desirable or practical for a node part way along the path to regulate
the load. However, various reasonable proposals for in-path load
regulation have been made from time to time (e.g. fair queuing,
traffic engineering). Also the IETF has recently chartered a working
group to standardise admission control across a part of a path using
pre-congestion notification (PCN) [PCNcharter], which involves in-
path load regulation. This is of particular relevance here because
it involves congestion notification with an in-path Load Regulator
and it can involve tunnelling.
We will use the more complex scenario in Figure 3 to tease out all
the issues that arise when combining congestion notification and
tunnelling with various possible in-path load regulation schemes. In
this case 'I1' and 'E2' break up the path into three separate
congestion control loops. The feedback for these loops is shown
going right to left across the top of the figure. The 'V's are
arrowheads representing the direction of feedback, not letters. There
are also two tunnels within the middle control loop: 'I1' to 'E1' and
'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS
core networks. 'M' is a congestion monitoring point, perhaps between
two border routers where the same tunnel continues unbroken across
the border.
  ______     _______________________________________     _____
 /      \   /                                       \   /     \
V        \ V                               M         \ V       \
A--->R--->I1===========>E1----->I2=========>==========>E2------->B
Figure 3: Complex Tunnel Scenario
The question is, should the congestion markings in the outer exposed
headers of a tunnel represent congestion only since the tunnel
ingress or over the whole upstream path from the source of the inner
header (whatever that may mean)? Or put another way, should 'I1' and
'I2' copy or reset CE markings?
The answer is that the baseline of congestion marking should be the
nearest upstream interface designed to regulate traffic load--the
Load Regulator. In Figure 3, 'A', 'I1' and 'E2' are all Load
Regulators. We have shown the feedback loops returning to each of
these nodes so that they can regulate the load causing the congestion
notification. So the baseline for congestion markings exposed to M
should be 'I1' (the Load Regulator), not 'I2'. That is, 'I2' SHOULD
copy any CE marking into the outer header it creates, while 'I1' is
an exception because it is an in-path load regulator, so it should
reset the ECN field in the outer header it creates.
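To make this answer concrete, the sketch below traces one packet along
part of Figure 3 and reports the outer ECN field that 'M' would
observe. It is an illustration only: the function names are
hypothetical, congestion is assumed to occur inside the 'I1' to 'E1'
tunnel, and resetting is assumed to mean CE becomes ECT(0) as in
RFC3168 full functionality mode.

```python
# ECN codepoints as defined in RFC3168
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def encap(inner, reset):
    """Outer ECN field created by an ingress that copies or resets."""
    return ECT_0 if reset and inner == CE else inner


def decap(inner, outer):
    """Outgoing ECN field at an egress (CE outer marks an ECT inner)."""
    return CE if outer == CE and inner in (ECT_0, ECT_1) else inner


def seen_at_m(i2_resets):
    inner = ECT_0                       # packet reaches I1 unmarked
    outer1 = encap(inner, reset=True)   # I1 is an in-path Load Regulator
    outer1 = CE                         # congestion between I1 and E1
    inner = decap(inner, outer1)        # E1 folds the mark into the inner
    # Outer header of the I2-E2 tunnel, which is what M can observe.
    return encap(inner, reset=i2_resets)


print("I2 copies:", seen_at_m(i2_resets=False))
print("I2 resets:", seen_at_m(i2_resets=True))
```

When 'I2' copies, 'M' sees the CE marking applied since 'I1'. When
'I2' resets, that marking is hidden from 'M', so the baseline wrongly
becomes 'I2'.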
The following further examples illustrate how this answer might be
applied:
o Preemption marking is currently defined for PCN [PCN-arch] so that
the rate of unmarked packets at the end of a path of multiple
bottlenecks determines the maximum sustainable aggregate bit rate
over that path. To produce the correct marking by the end, each
congested node must only consider packets to be eligible for
marking if they have not already been marked by any previous
bottleneck along a path that may span multiple tunnels (including
MPLS encapsulations etc.). This scheme only results in the
correct marking rate if the markings accumulated so far along the
path are copied into the outer exposed header of each tunnel or
encapsulation. Consider that 'I1' and 'E2' in the complex
scenario of Figure 3 are edge gateways of a PCN region. Admission
control based on PCN measurements is a form of load regulation, so
'I1' regulates the load on the PCN region. Therefore 'I1' should
be the baseline of congestion marking for _both_ tunnels within
the scope of its feedback loop. Therefore 'I2' should follow the
normal rules and copy congestion marking into the outer tunnel
header, while 'I1' is an exception because it is also a load
regulator, so it should reset CE markings in the outer header.
o [Shayman] suggested feedback of ECN accumulated across an MPLS
domain could cause the ingress to trigger re-routing to mitigate
congestion. This case is more like the simple scenario of
Figure 2, with a feedback loop across the MPLS domain ('E' back to
'I'). The baseline for congestion exposed in outer headers in
this case will be the tunnel ingress, which should therefore reset
the ECN field in the outer headers it creates. But the reason it
should act as the baseline is because it is an in-path load
regulator (re-routing around congestion is a load regulation
function), not just because it is a tunnel ingress.
o The PWE3 working group of the IETF is considering the problem of
how and whether an aggregate private wire emulation should respond
to congestion [I-D.rosen-pwe3-congestion]. Although the study is
still at the requirements stage, some (controversial) solution
proposals include in-path load regulation at the ingress to the
tunnel that could lead to tunnel arrangements with similar
complexity to that of Figure 3.
These are not contrived scenarios--they could be a lot worse. For
instance, a host may create a tunnel for IPsec which is placed inside
a tunnel for Mobile IP over a remote part of its path. And around
all this we may have MPLS labels being pushed and popped as packets
pass across different core networks. Similarly, it is possible that
subnets could be built from link technology (e.g. Ethernet switches)
so that link headers being added and removed could involve congestion
notification in future link headers with all the same issues as with
IP in IP tunnels.
The reason we introduced the concept of a Load Regulator was to allow
for in-path load regulation. In the traditional Internet
architecture one tends to think of a host and a Load Regulator as
synonymous, but when considering tunnelling, even the definition of a
host is too fuzzy, whereas a Load Regulator is a clearly defined
function. Similarly, the concept of innermost header is too fuzzy to
be able to (wrongly) say that the source address of the innermost
header should be the baseline. Which is the innermost header when
multiple encapsulations may be in use? Where do we stop? If we say
the original source in the above IPsec-Mobile IP case is the host,
how do we know it isn't tunnelling an encrypted packet stream on
behalf of another host in a p2p network?
The reason there has been so much confusion over the question of
whether a tunnel ingress should copy or reset CE markings is that we
have become used to thinking that only hosts regulate load. The
end-to-end design principle advises that this is a good idea [RFC3426],
but it also advises that it is only a guiding principle intended to
make the designer think very carefully before breaking it. We do
have proposals where load regulation functions sit within a network
path for good, if sometimes controversial, reasons, e.g. PCN edge
admission control gateways [PCN-arch] or traffic engineering
functions at domain borders to re-route around congestion [Shayman].
Author's Address
Bob Briscoe
BT
B54/77, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Phone: +44 1473 645196
Email: bob.briscoe@bt.com
URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgments
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA). This document was produced
using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
RFC-2629 XML format.