PWE3 S. Bryant, Ed.
Internet-Draft C. Filsfils
Intended status: Standards Track Cisco Systems
Expires: November 7, 2011 U. Drafz
Deutsche Telekom
V. Kompella
J. Regan
Alcatel-Lucent
S. Amante
Level 3 Communications
May 6, 2011
Flow Aware Transport of Pseudowires over an MPLS Packet Switched Network
draft-ietf-pwe3-fat-pw-06
Abstract
Where the payload of a pseudowire comprises a number of distinct
flows, it can be desirable to carry those flows over the equal cost
multiple paths (ECMPs) that exist in the packet switched network.
Most forwarding engines are able to hash based on MPLS label stacks
and use this mechanism to balance MPLS flows over ECMPs.
This document describes a method of identifying the flows, or flow
groups, within pseudowires such that Label Switching Routers can
balance flows at a finer granularity than individual pseudowires.
The mechanism uses an additional label in the MPLS label stack.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119 [RFC2119].
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
Bryant, et al. Expires November 7, 2011 [Page 1]
Internet-Draft FAT-PW May 2011
material or to cite them other than as "work in progress."
This Internet-Draft will expire on November 7, 2011.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. ECMP in Label Switching Routers . . . . . . . . . . . . . 5
1.2. Flow Label . . . . . . . . . . . . . . . . . . . . . . . . 5
2. Native Service Processing Function . . . . . . . . . . . . . . 6
3. Pseudowire Forwarder . . . . . . . . . . . . . . . . . . . . . 6
3.1. Encapsulation . . . . . . . . . . . . . . . . . . . . . . 7
4. Signaling the Presence of the Flow Label . . . . . . . . . . . 8
4.1. Structure of Flow Label Sub-TLV . . . . . . . . . . . . . 9
5. Static Pseudowires . . . . . . . . . . . . . . . . . . . . . . 9
6. Multi-Segment Pseudowires . . . . . . . . . . . . . . . . . . 10
7. OAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
8. Applicability of PWs using Flow Labels . . . . . . . . . . . . 11
8.1. Equal Cost Multiple Paths . . . . . . . . . . . . . . . . 12
8.2. Link Aggregation Groups . . . . . . . . . . . . . . . . . 13
8.3. Multiple RSVP-TE Paths . . . . . . . . . . . . . . . . . . 13
8.4. The Single Large Flow Case . . . . . . . . . . . . . . . . 14
8.5. Applicability to MPLS-TP . . . . . . . . . . . . . . . . . 15
8.6. Asymmetric Operation . . . . . . . . . . . . . . . . . . . 15
9. Applicability to MPLS LSPs . . . . . . . . . . . . . . . . . . 15
10. Security Considerations . . . . . . . . . . . . . . . . . . . 16
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
12. Congestion Considerations . . . . . . . . . . . . . . . . . . 16
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
14.1. Normative References . . . . . . . . . . . . . . . . . . . 17
14.2. Informative References . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19
1. Introduction
A pseudowire (PW) [RFC3985] is normally transported over a single
network path, even if multiple Equal Cost Multiple Paths (ECMPs) exist
between the ingress and egress PW provider edge (PE) equipment
[RFC4385] [RFC4928]. This is required to preserve the
characteristics of the emulated service (e.g., to avoid misordering
SAToP PW packets [RFC4553] or subjecting the packets to unusable
inter-arrival times). The use of a single path to preserve order
remains the default mode of operation of a PW. The new capability
proposed in this document is an OPTIONAL mode which may be used when
the use of ECMP is known to be beneficial (and not harmful) to the
operation of the PW.
Some PWs are used to transport large volumes of IP traffic between
routers. One example of this is the use of an Ethernet PW to create
a virtual direct link between a pair of routers. Such PWs may carry
from hundreds of Mbps to Gbps of traffic. These PWs only require
packet ordering to be preserved within the context of each individual
transported IP flow. They do not require packet ordering to be
preserved between all packets of all IP flows within the pseudowire.
The ability to explicitly configure such a PW to leverage the
availability of multiple ECMPs allows for better capacity planning as
the statistical multiplexing of a larger number of smaller flows is
more efficient than with a smaller set of larger flows.
Typically, forwarding hardware can deduce that an IP payload is being
directly carried by an MPLS label stack, and it is capable of looking
at some fields in packets to construct hash buckets for conversations
or flows. However, when the MPLS payload is a PW, an intermediate
node has no information on the type of PW being carried in the packet.
This limits the forwarder at the intermediate node to only being able
to make an ECMP choice based on a hash of the label stack. In the
case of a PW emulating a high bandwidth trunk, the granularity
obtained by hashing the label stack is inadequate for satisfactory
load-balancing. The ingress node, however, is in the special
position of being able to look at the un-encapsulated packet and
spread flows amongst any available ECMPs, or even any Loop-Free
Alternates [RFC5286]. This document defines a method to introduce
granularity on the hashing of traffic running over PWs by introducing
an additional label, chosen by the ingress node, and placed at the
bottom of the label stack.
In addition to providing an indication of the flow structure for use
in ECMP forwarding decisions, the mechanism described in the document
may also be used to select flows for distribution over an IEEE 802.1AX
(formerly IEEE 802.3ad) link aggregation group used in an MPLS network.
NOTE: Although Ethernet is frequently referenced as a use case in
this document, the mechanisms it describes are general
mechanisms that may be applied to any PW type in which there are
identifiable flows, and in which there is no requirement to preserve
the order between those flows.
1.1. ECMP in Label Switching Routers
Label switching routers (LSRs) commonly generate a hash of the label
stack or some elements of the label stack as a method of
discriminating between flows, and use this to distribute those flows
over the available ECMPs that exist in the network. Since the label
at the bottom of stack is usually the label most closely associated
with the flow, this normally provides the greatest entropy, and hence
is usually included in the hash. This document describes a method of
adding an additional label stack entry (LSE) at the bottom of stack
in order to facilitate the load balancing of the flows within a PW
over the available ECMPs. A similar design for general MPLS use has
also been proposed [I-D.kompella-mpls-entropy-label] (see Section 9).
An alternative method of load balancing by creating a number of PWs
and distributing the flows amongst them was considered, but was
rejected because:
o It did not introduce as much entropy as can be introduced by
adding an additional LSE.
o It required additional PWs to be set up and maintained.
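To illustrate why an extra LSE at the bottom of stack helps the LSR hashing described in this section, the following Python sketch models an LSR that hashes up to a fixed number of labels from the top of the stack and reduces the result modulo the number of ECMP members. SHA-256, the hashing depth of three labels, the function name ecmp_select and the label values are all assumptions for illustration; real forwarding hardware uses its own, usually vendor-specific, hash functions and depths.

```python
import hashlib
from typing import List


def ecmp_select(label_stack: List[int], num_paths: int, depth: int = 3) -> int:
    """Pick an ECMP member by hashing up to `depth` labels from the top of stack.

    Hypothetical LSR model: index 0 is the top LSE; SHA-256 stands in for
    whatever hash function the forwarding hardware actually uses.
    """
    key = b"".join(label.to_bytes(4, "big") for label in label_stack[:depth])
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


TUNNEL_LABEL, PW_LABEL = 1000, 2000  # illustrative label values

# Without a flow label, every packet of the PW presents the same stack,
# so every packet hashes to the same ECMP member.
paths_without_fl = {ecmp_select([TUNNEL_LABEL, PW_LABEL], 4) for _ in range(100)}

# With a per-flow label at the bottom of stack, different flows can be
# spread over different ECMP members.
paths_with_fl = {
    ecmp_select([TUNNEL_LABEL, PW_LABEL, fl], 4) for fl in range(16, 116)
}
```

Note that in this model, if the hashing depth is smaller than the number of labels above the flow LSE, the flow label never reaches the hash and brings no benefit.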
1.2. Flow Label
An additional LSE [RFC3032] is interposed between the PW LSE and the
control word, or if the control word is not present, between the PW
LSE and the PW payload. This additional LSE is called the flow LSE
and the label carried by the flow LSE is called the flow label.
Indivisible flows within the PW MUST be mapped to the same flow label
by the ingress PE. The flow label stimulates the correct ECMP load
balancing behaviour in the packet switched network (PSN). On receipt
of the PW packet at the egress PE (which knows that a flow LSE is present)
the flow LSE is discarded without processing.
Note that the flow label MUST NOT be an MPLS reserved label (values
in the range 0..15) [RFC3032], but is otherwise unconstrained by the
protocol.
The flow LSE can never become the top LSE in normal operation, and
hence the TTL in the flow LSE is never used to determine whether the
packet should be discarded due to TTL expiry. Considerations that
nevertheless apply to the choice of TTL value are described in the
Security Considerations section (Section 10) of this document.
This document does not define a use for the TC bits (formerly known
as the EXP bits) in the flow label. Future documents may define a
use for these bits, therefore implementations conforming to this
specification MUST set the TC bits to zero at the ingress and MUST
ignore them at the egress.
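The constraints above on the flow LSE (a non-reserved label, TC bits set to zero, and a position at the bottom of stack) can be sketched as a Python encoder for the 4-octet LSE layout of [RFC3032]. The default TTL of 1 follows the suggestion in the Security Considerations section; the function names are hypothetical.

```python
from typing import Tuple

MAX_RESERVED_LABEL = 15  # labels 0..15 are reserved [RFC3032]
MAX_LABEL = (1 << 20) - 1  # 20-bit label field


def encode_flow_lse(flow_label: int, ttl: int = 1) -> bytes:
    """Encode a 4-octet flow LSE: Label (20) | TC (3) | S (1) | TTL (8)."""
    if not MAX_RESERVED_LABEL < flow_label <= MAX_LABEL:
        raise ValueError("flow label must be a non-reserved 20-bit label")
    if not 0 <= ttl <= 255:
        raise ValueError("TTL must fit in 8 bits")
    tc = 0  # TC bits MUST be set to zero at the ingress
    s = 1   # the flow LSE is always the bottom of stack
    word = (flow_label << 12) | (tc << 9) | (s << 8) | ttl
    return word.to_bytes(4, "big")


def decode_lse(data: bytes) -> Tuple[int, int, int, int]:
    """Return (label, tc, s, ttl); an egress PE simply ignores the TC value."""
    word = int.from_bytes(data[:4], "big")
    return word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF
```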
2. Native Service Processing Function
The Native Service Processing (NSP) function [RFC3985] is a component
of a PE that has knowledge of the structure of the emulated service
and is able to take action on the service outside the scope of the
PW. In this case it is required that the NSP in the ingress PE
identify flows, or groups of flows within the service, and indicate
the flow (group) identity of each packet as it is passed to the
pseudowire forwarder. As an example, where the PW type is Ethernet,
the NSP might parse the ingress Ethernet traffic and
consider all of the IP traffic. This traffic could then be
categorised into flows by considering all traffic with the same
source and destination address pair to be a single indivisible flow.
Since this is an NSP function, by definition, the method used to
identify a flow is outside the scope of the PW design. Similarly,
since the NSP is internal to the PE, the method of flow indication to
the PW forwarder is outside the scope of this document.
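Since flow identification is an NSP matter outside the scope of this document, the following is only a hypothetical Python sketch of the Ethernet example above: it extracts the IPv4 source and destination address pair from an untagged Ethernet frame to serve as the key of one indivisible flow. VLAN tags, IPv6 and any policy for non-IP frames are deliberately left out.

```python
from typing import Optional

ETHERTYPE_IPV4 = 0x0800
ETH_HEADER_LEN = 14
IPV4_MIN_HEADER_LEN = 20


def ipv4_flow_key(frame: bytes) -> Optional[bytes]:
    """Return the IPv4 (source, destination) address pair of an untagged frame.

    Non-IPv4 or truncated frames return None; this sketch leaves their
    classification to some other policy.
    """
    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        return None
    if int.from_bytes(frame[12:14], "big") != ETHERTYPE_IPV4:
        return None
    ip_header = frame[ETH_HEADER_LEN:]
    # Source address at offset 12, destination address at offset 16.
    return ip_header[12:16] + ip_header[16:20]
```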
3. Pseudowire Forwarder
The PW forwarder must be provided with a method of mapping flows to
load balanced paths.
The forwarder must generate a label for the flow or group of flows.
How the flow label values are determined is outside the scope of this
document; however, the flow label allocated to a flow MUST NOT be an
MPLS reserved label and SHOULD remain constant for the life of the
flow. It is RECOMMENDED that the method chosen to generate the load
balancing labels introduces a high degree of entropy in their values,
to maximise the entropy presented to the ECMP selection mechanism in
the LSRs in the PSN, and hence distribute the flows as evenly as
possible over the available PSN ECMP. The forwarder at the ingress
PE prepends the PW control word (if applicable), and then pushes the
flow label, followed by the PW label.
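One possible way, among many, to meet the requirements on flow label values is sketched below in Python: a deterministic hash of the flow key keeps the label constant for the life of the flow, excludes the reserved range 0..15, and spreads flows across the label space. The use of SHA-256 and the per-PE seed parameter are assumptions; this document does not specify how labels are generated.

```python
import hashlib

FIRST_UNRESERVED_LABEL = 16  # labels 0..15 are MPLS reserved labels [RFC3032]
LABEL_SPACE = 1 << 20        # the label field is 20 bits wide


def flow_label_for(flow_key: bytes, pe_seed: bytes = b"") -> int:
    """Map a flow key to a flow label in 16..2^20-1.

    The mapping is deterministic, so a flow keeps the same label for its
    lifetime; SHA-256 spreads keys over the label space, giving the LSRs
    a high degree of entropy to hash on. `pe_seed` is a hypothetical
    per-PE value, not something defined by this document.
    """
    digest = hashlib.sha256(pe_seed + flow_key).digest()
    span = LABEL_SPACE - FIRST_UNRESERVED_LABEL
    return FIRST_UNRESERVED_LABEL + int.from_bytes(digest[:4], "big") % span
```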
NOTE: Although this document does not attempt to specify any hash
algorithms, it is suggested that any such algorithm should be based
on the assumption that there will be a high degree of entropy in the
values assigned to the load balancing labels.
The forwarder at the egress PE uses the pseudowire label to identify
the pseudowire. From the context associated with the pseudowire
label, the egress PE can determine whether a flow LSE is present. If
a flow LSE is present, it MUST be checked to determine whether it
carries a reserved label. If it is a reserved label the packet is
processed according to the rules associated with that reserved label,
otherwise the LSE is discarded.
All other PW forwarding operations are unmodified by the inclusion of
the flow LSE.
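The egress behaviour described above can be sketched in Python as follows. The function name is hypothetical, the flag flow_lse_expected stands for the state held in the PW label context, and the reserved-label case only reports the label value, since the rules for each reserved label are defined elsewhere.

```python
from typing import Optional, Tuple

MAX_RESERVED_LABEL = 15


def egress_process_flow_lse(
    flow_lse_expected: bool, after_pw_lse: bytes
) -> Tuple[Optional[int], bytes]:
    """Handle the bytes that follow the PW LSE at the egress PE.

    `flow_lse_expected` comes from the PW label context. Returns
    (reserved_label, remainder): reserved_label is None for an ordinary
    flow label, which is simply discarded.
    """
    if not flow_lse_expected:
        return None, after_pw_lse
    label = int.from_bytes(after_pw_lse[:4], "big") >> 12
    remainder = after_pw_lse[4:]
    if label <= MAX_RESERVED_LABEL:
        # Hand off to the rules for that reserved label (not modelled here).
        return label, remainder
    return None, remainder
```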
3.1. Encapsulation
The PWE3 Protocol Stack Reference Model, modified to include the flow
LSE, is shown in Figure 1 below.
+-------------+ +-------------+
| Emulated | | Emulated |
| Ethernet | | Ethernet |
| (including | Emulated Service | (including |
| VLAN) |<==============================>| VLAN) |
| Services | | Services |
+-------------+ +-------------+
| Flow | | Flow |
+-------------+ Pseudowire +-------------+
|Demultiplexer|<==============================>|Demultiplexer|
+-------------+ +-------------+
| PSN | PSN Tunnel | PSN |
| MPLS |<==============================>| MPLS |
+-------------+ +-------------+
| Physical | | Physical |
+-----+-------+ +-----+-------+
Figure 1: PWE3 Protocol Stack Reference Model
The encapsulation of a PW with a flow LSE is shown in Figure 2 below.
+---------------------------+
| |
| Payload |
| | n octets
| |
+---------------------------+
| Optional Control Word | 4 octets
+---------------------------+
| Flow LSE | 4 octets
+---------------------------+
| PW LSE | 4 octets
+---------------------------+
| MPLS Tunnel LSE (s) | n*4 octets (four octets per LSE)
+---------------------------+
Figure 2: Encapsulation of a pseudowire with a pseudowire flow LSE
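The layout in Figure 2, together with the ingress order given in Section 3 (prepend the control word if applicable, push the flow label, then the PW label, then the tunnel label(s)), can be sketched in Python as below. The helper names are hypothetical, and the TTL of 255 for the tunnel and PW LSEs is an arbitrary assumption; only the flow LSE carries the S bit.

```python
from typing import List, Optional


def lse(label: int, ttl: int, bottom: bool, tc: int = 0) -> bytes:
    """Encode one 4-octet LSE: Label (20) | TC (3) | S (1) | TTL (8)."""
    return ((label << 12) | (tc << 9) | (int(bottom) << 8) | ttl).to_bytes(4, "big")


def encapsulate(
    payload: bytes,
    tunnel_labels: List[int],
    pw_label: int,
    flow_label: int,
    control_word: Optional[bytes] = None,
    ttl: int = 255,
) -> bytes:
    """Build tunnel LSE(s), PW LSE, flow LSE, optional CW, then payload (Figure 2)."""
    stack = [lse(label, ttl, bottom=False) for label in tunnel_labels]
    stack.append(lse(pw_label, ttl, bottom=False))
    # Flow LSE: always bottom of stack; TTL 1 as suggested in Section 10.
    stack.append(lse(flow_label, 1, bottom=True))
    cw = control_word if control_word is not None else b""
    return b"".join(stack) + cw + payload
```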
4. Signaling the Presence of the Flow Label
When using the signalling procedures in [RFC4447], a new Pseudowire
Interface Parameter Sub-TLV, the Flow Label Sub-TLV (FL Sub-TLV), is
used to synchronise the flow label states between the ingress and
egress PEs.
The absence of a FL Sub-TLV indicates that the PE is unable to process
flow labels. A PE that is using PW signalling and that does not send
a FL Sub-TLV MUST NOT include a flow label in the PW packet. A PE
that is using PW signalling and which does not receive a FL Sub-TLV
from its peer MUST NOT include a flow label in the PW packet. This
preserves backwards compatibility with existing PW specifications.
A PE that wishes to send a flow label in a PW packet MUST include in
its label mapping message a FL Sub-TLV with T = 1 (see Section 4.1).
A PE that is willing to receive a flow label MUST include in its
label mapping message a FL Sub-TLV with R = 1 (see Section 4.1).
A PE that receives a label mapping message containing a FL Sub-TLV with R = 0
MUST NOT include a flow label in the PW packet.
Thus a PE sending a FL Sub-TLV with T = 1 and receiving a FL Sub-TLV
with R = 1 MUST include a flow label in the PW packet. Under all
other combinations of FL Sub-TLV signalling a PE MUST NOT include a
flow label in the PW packet.
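The signalling rules above reduce to a single condition, sketched here in Python with hypothetical names: a PE includes a flow label only if it sent a FL Sub-TLV with T = 1 and received one from its peer with R = 1. An absent Sub-TLV is modelled as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlSubTlv:
    t: bool  # T bit: the sender wishes to send flow labels
    r: bool  # R bit: the sender is able to receive flow labels


def include_flow_label(
    sent: Optional[FlSubTlv], received: Optional[FlSubTlv]
) -> bool:
    """Decide whether this PE puts a flow LSE in the PW packets it sends.

    None models an FL Sub-TLV that was not sent (or not received).
    """
    if sent is None or received is None:
        return False
    return sent.t and received.r
```

For example, if PE A sends T = 1, R = 0 and PE B sends T = 0, R = 1, only A includes flow labels, which is the asymmetric operation discussed in Section 8.6.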
The signalling procedures in [RFC4447] state that "Processing of the
interface parameters should continue when unknown interface
parameters are encountered, and they MUST be silently ignored." The
signalling procedure described here is therefore backwards compatible
with existing implementations.
Note that what is signalled is the desire to include the flow LSE in
the label stack. The value of the flow label is a local matter for
the ingress PE, and the label value itself is not signalled.
4.1. Structure of Flow Label Sub-TLV
The structure of the Flow Label Sub-TLV is shown in Figure 3.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    FL=0x17    |    Length     |T|R|         Reserved          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: Flow Label Sub-TLV
Where:
o FL (value 0x17) is the flow label sub-TLV identifier assigned by
IANA (see Section 11).
o Length is the length of the TLV in octets and is 4.
o When T = 1, the PE is requesting the ability to send a PW packet that
includes a flow label. When T = 0, the PE is indicating that it
will not send a PW packet containing a flow label.
o When R = 1, the PE is able to receive a PW packet with a flow label
present. When R = 0, the PE is unable to receive a PW packet with
a flow label present.
o Reserved bits MUST be zero on transmit and MUST be ignored on
receive.
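The wire format in Figure 3 (1-octet type, 1-octet length, then the T and R bits followed by 14 reserved bits) can be sketched in Python as below. The function and constant names are hypothetical; the decoder reads only T and R so that reserved bits are ignored on receipt.

```python
from typing import Tuple

FL_SUB_TLV_TYPE = 0x17
FL_SUB_TLV_LENGTH = 4
T_BIT = 0x80  # first bit of the third octet
R_BIT = 0x40  # second bit of the third octet


def encode_fl_sub_tlv(t: bool, r: bool) -> bytes:
    flags = (T_BIT if t else 0) | (R_BIT if r else 0)
    # The remaining 14 bits are reserved and MUST be zero on transmit.
    return bytes([FL_SUB_TLV_TYPE, FL_SUB_TLV_LENGTH, flags, 0x00])


def decode_fl_sub_tlv(data: bytes) -> Tuple[bool, bool]:
    if len(data) < 4 or data[0] != FL_SUB_TLV_TYPE or data[1] != FL_SUB_TLV_LENGTH:
        raise ValueError("not a Flow Label Sub-TLV")
    # Reserved bits MUST be ignored on receipt, so only T and R are read.
    return bool(data[2] & T_BIT), bool(data[2] & R_BIT)
```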
5. Static Pseudowires
If PWE3 signalling [RFC4447] is not in use for a PW, then whether the
flow label is used MUST be identically provisioned in both PEs at the
PW endpoints. If there is no provisioning support for this option,
the default behaviour is not to include the flow label.
6. Multi-Segment Pseudowires
The flow label mechanism described in this document works on multi-
segment PWs without requiring modification to the Switching PEs
(S-PEs). This is because the flow LSE is transparent to the label
swap operation, and because interface parameter Sub-TLV signalling is
transitive.
7. OAM
The following OAM considerations apply to this method of load
balancing.
Where the OAM is only to be used to perform a basic test that the PWs
have been configured at the PEs, VCCV [RFC5085] messages may be sent
using any load balance PW path, i.e. using any value for the flow
label.
Where it is required to verify that a pseudowire is fully functional
for all flows, VCCV [RFC5085] connection verification messages MUST be
sent over each ECMP path to the pseudowire egress PE. This solution
may be difficult to achieve and scales poorly.
Under these circumstances, it may be sufficient to send VCCV messages
using any load balance pseudowire path because if a failure occurs
within the PSN the failure will normally be detected and repaired by
the PSN. That is, the PSN's Interior Gateway Protocol (IGP) link/
node failure detection mechanism (loss of light, bidirectional
forwarding detection [RFC5880] or IGP hello detection), and the IGP
convergence will naturally modify the ECMP set of network paths
between the ingress and egress PEs. Hence the PW is only impacted
during the normal IGP convergence time. Note that this period may be
reduced if a fast re-route or fast convergence technology is deployed
in the network [RFC4090], [RFC5286].
If the failure is related to the individual corruption of a Label
Forwarding Information Base (LFIB) entry in a router, then only
the network path using that specific entry is impacted. If the PW is
load balanced over multiple network paths, then this failure can only
be detected if, by chance, the transported OAM flow is mapped onto
the impacted network path, or if all paths are tested. Since testing
all paths may present problems as noted above, other mechanisms to
detect this type of error may need to be developed, such as an LSP
self test technology.
To troubleshoot the MPLS PSN, including multiple paths, the
techniques described in [RFC4378] and [RFC4379] can be used.
Where the PW OAM is carried out of band (VCCV Type 2) [RFC5085] it is
necessary to insert an "MPLS Router Alert Label" in the label stack.
The resultant label stack is as follows:
+-------------------------------+
| |
| VCCV Message | n octets
| |
+-------------------------------+
| Optional Control Word | 4 octets
+-------------------------------+
| Flow label | 4 octets
+-------------------------------+
| PW label | 4 octets
+-------------------------------+
| Router Alert label | 4 octets
+-------------------------------+
| MPLS Tunnel label(s) | n*4 octets (four octets per label)
+-------------------------------+
Figure 4: Use of Router Alert Label
Note that, depending on the number of labels hashed by the LSR, the
inclusion of the Router Alert label may cause the OAM packet to be
load balanced to a different path from that taken by the data packets
with identical Flow and PW labels.
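The effect described in the note above can be made concrete with a small Python sketch. It assumes a hypothetical LSR that feeds the top three labels of the stack to its hash and does not skip reserved labels; the label values other than the Router Alert label (value 1, [RFC3032]) are illustrative.

```python
from typing import List, Tuple

ROUTER_ALERT_LABEL = 1  # MPLS Router Alert label value [RFC3032]


def hash_key(label_stack: List[int], depth: int) -> Tuple[int, ...]:
    """Labels a (hypothetical) LSR feeds to its ECMP hash: the top `depth` labels."""
    return tuple(label_stack[:depth])


tunnel, pw, flow = 1000, 2000, 3000  # illustrative label values
data_stack = [tunnel, pw, flow]
oam_stack = [tunnel, ROUTER_ALERT_LABEL, pw, flow]

# An LSR hashing three labels sees different keys for data and OAM packets,
# and the OAM key no longer contains the flow label at all.
data_key = hash_key(data_stack, 3)
oam_key = hash_key(oam_stack, 3)
```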
8. Applicability of PWs using Flow Labels
A node within the PSN is not able to perform deep-packet-inspection
(DPI) of the PW as the PW technology is not self-describing: the
structure of the PW payload is only known to the ingress and egress
PE devices. The method proposed in this document provides a
statistical mitigation of the load-balancing problem in those cases
where a PE is able to discern flows embedded in the traffic received
on the attachment circuit.
The methods described in this document are transparent to the PSN and
as such do not require any new capability from the PSN.
The requirement to load-balance over multiple PSN paths occurs when
the ratio between the PW access speed and the PSN's core link
bandwidth is large (e.g. >= 10%). ATM and FR are unlikely to meet
this property. Ethernet may have this property, and for that reason
this document focuses on Ethernet. Applications for other high-
access-bandwidth PWs (e.g., Fibre Channel) may be defined in the
future.
This design applies to MPLS PWs where it is meaningful to
deconstruct the packets presented to the ingress PE into flows. The
mechanism described in this document promotes the distribution of
flows within the PW over different network paths. This in turn means
that whilst packets within a flow are delivered in order (subject to
normal IP delivery perturbations due to topology variation), order is
no longer maintained for all packets sent over the PW. It is not
proposed to associate a different sequence number with each flow. If
sequence number support is required the flow label mechanism MUST NOT
be used.
Where it is known that the traffic carried by the Ethernet PW is IP
the flows can be identified and mapped to an ECMP. Such methods
typically include hashing on the source and destination addresses,
the protocol ID and higher-layer flow-dependent fields such as TCP/
UDP ports, L2TPv3 Session IDs etc.
Where it is known that the traffic carried by the Ethernet PW is
non-IP, techniques used for link bundling between Ethernet switches
may be reused. In this case however the latency distribution would
be larger than is found in the link bundle case. The acceptability
of the increased latency is for further study. Of particular
importance, Ethernet control frames SHOULD always be mapped to the
same PSN path to ensure in-order delivery.
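The two preceding paragraphs can be combined into a hypothetical Python classifier: IEEE 802.1 reserved-group-address control frames (destination 01-80-C2-00-00-00 through 01-80-C2-00-00-0F, e.g., STP or LACP) always map to one fixed key so that they stay on a single PSN path; IPv4 traffic is keyed on addresses, protocol ID and, for TCP/UDP, ports; other non-IP traffic falls back to the MAC address pair, as link bundling often does. The fixed key value is an assumption, and VLAN tags, IPv6 and IP fragments are not handled.

```python
CONTROL_FLOW_KEY = b"ethernet-control"  # hypothetical fixed key for control frames
IEEE_RESERVED_PREFIX = bytes.fromhex("0180c20000")  # 01-80-C2-00-00-0x
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP, IPPROTO_UDP = 6, 17


def flow_key(frame: bytes) -> bytes:
    """Return a flow key for an untagged Ethernet frame (sketch; no VLAN/IPv6)."""
    dst_mac = frame[0:6]
    # IEEE 802.1 reserved group addresses (e.g. STP, LACP): keep on one path.
    if dst_mac[:5] == IEEE_RESERVED_PREFIX and dst_mac[5] <= 0x0F:
        return CONTROL_FLOW_KEY
    if int.from_bytes(frame[12:14], "big") == ETHERTYPE_IPV4 and len(frame) >= 34:
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4          # IPv4 header length in octets
        proto = ip[9]
        key = ip[12:20] + bytes([proto])  # source, destination, protocol ID
        if proto in (IPPROTO_TCP, IPPROTO_UDP) and len(ip) >= ihl + 4:
            key += ip[ihl:ihl + 4]        # source and destination ports
        return key
    # Other non-IP traffic: fall back to the MAC address pair.
    return frame[0:12]
```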
8.1. Equal Cost Multiple Paths
ECMP in packet switched networks is statistical in nature. The
mapping of flows to a particular path does not take into account the
bandwidth of the flow being mapped or the current bandwidth usage of
the members of the ECMP set. This simplification works well when the
distribution of flows is evenly spread over the ECMP set and there
are a large number of flows that have low bandwidth relative to the
paths. The random allocation of a flow to a path provides a good
approximation to an even spread of flows, provided that polarisation
effects are avoided. The method defined in this document has the
same statistical properties as an IP PSN.
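The polarisation effect mentioned above can be shown with a two-stage Python sketch under stated assumptions: router A splits flows over two paths, and the flows on its path 0 reach router B, which also has two paths. If B uses exactly the same hash as A, every such flow picks B's path 0 again; giving each router its own hash seed (a common technique, not one prescribed by this document) decorrelates the choices. Router names, seeds and SHA-256 are hypothetical.

```python
import hashlib


def pick_path(flow_label: int, seed: bytes, num_paths: int = 2) -> int:
    """Hash-based ECMP choice; `seed` models a per-router hash seed (assumption)."""
    digest = hashlib.sha256(seed + flow_label.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


flow_labels = range(16, 1016)
# Router A forwards the flows that hash to its path 0 towards router B.
towards_b = [fl for fl in flow_labels if pick_path(fl, b"router-A") == 0]

# If B uses exactly the same hash as A, all of these flows pick B's path 0
# as well (polarisation); with its own seed, B spreads them over both paths.
polarised = {pick_path(fl, b"router-A") for fl in towards_b}
decorrelated = {pick_path(fl, b"router-B") for fl in towards_b}
```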
ECMP is a load-sharing mechanism that is based on sharing the load
over a number of layer 3 paths through the PSN. Often, however,
multiple links exist between a pair of LSRs that are considered by
the IGP to be a single link. These are known as link bundles. The
mechanism described in this document can also be used to distribute
the flows within a PW over the members of the link bundle by using
the flow label value to identify candidate flows. How that mapping
takes place is outside the scope of this specification. Similar
considerations apply to link aggregation groups.
There is no mechanism currently defined to indicate the bandwidths in
use by specific flows using the fields of the MPLS shim header.
Furthermore, since the semantics of the MPLS shim header are fully
defined in [RFC3032] and [RFC5462], those fields cannot be assigned
semantics to carry this information. This document does not define
any semantics for use in the TTL or TC fields of the label entry that
carries the flow label, but requires that the flow label itself be
selected with a high degree of entropy, which suggests that the label
value should not be overloaded with additional meaning in any
subsequent specification.
A different type of load balancing is the desire to carry a PW over a
set of PSN links in which the bandwidth of members of the link set is
less than the bandwidth of the PW. Proposals to address this problem
have been made in the past [I-D.stein-pwe3-pwbonding]. Such a
mechanism can be considered complementary to this mechanism.
8.2. Link Aggregation Groups
A Link Aggregation Group (LAG) is used to bond together several
physical circuits between two adjacent nodes so they appear to
higher-layer protocols as a single, higher bandwidth "virtual" pipe.
These may co-exist in various parts of a given network. An advantage
of LAGs is that they reduce the number of routing and signalling
protocol adjacencies between devices, reducing control plane
processing overhead. As with ECMP, the key problem related to LAGs
is that due to inefficiencies in LAG load-distribution algorithms, a
particular component of a LAG may experience congestion. The
mechanism proposed here may be able to assist in producing a more
uniform flow distribution.
The same considerations requiring a flow to go over a single member
of an ECMP set apply to a member of a LAG.
8.3. Multiple RSVP-TE Paths
In some networks it is desirable for a Label Edge Router (LER) to be
able to load balance a PW across multiple RSVP-TE tunnels. The flow
label mechanism described in this document may be used to provide the
LER with the required flow information, and necessary entropy to
provide this type of load balancing. An example of such a case is
the use of the flow label mechanism in networks using a link bundle
with the all-ones component [RFC4201].
Methods by which the LER is configured to apply this type of ECMP are
outside the scope of this document.
8.4. The Single Large Flow Case
Clearly the operator should make sure that the service offered using
PW technology and the method described in this document does not
exceed the maximum planned link capacity, unless it can be guaranteed
that it conforms to the Internet traffic profile of a very large
number of small flows.
If the NSP cannot access sufficient information to distinguish flows,
perhaps because the protocol stack requires parsing further into the
packet than it is able to, then the functionality described in this
document provides no benefit. The most common case where a
single flow dominates the traffic on a PW is when it is used to
transport enterprise traffic. Enterprise traffic may well consist of
a single, large TCP flow, or encrypted flows that cannot be handled
by the methods described in this document.
An operator has four options under these circumstances:
1. The operator can choose to do nothing and the system will work as
it does without the flow label.
2. The operator can make the customer aware that the service
offering has a restriction on flow bandwidth and police flows to
that restriction. This would allow customers offering multiple
flows to use a larger fraction of their access bandwidth, whilst
preventing a single flow from consuming a fraction of internal
link bandwidth that the operator considered excessive.
3. The operator could configure the ingress PE to assign a constant
flow label to all high bandwidth flows so that only one path was
affected by these flows.
4. The operator could configure the ingress PE to assign a random
flow label to all high bandwidth flows so as to minimise the
disruption to the network at the cost of out-of-order traffic to
the user.
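Options 3 and 4 can be contrasted in a short Python sketch. It assumes some hypothetical classifier has already marked a flow as high bandwidth (is_high_bw), and that the operator has picked label 16 as the shared constant label for option 3; neither is defined by this document. Ordinary flows keep a stable hash-derived label.

```python
import hashlib
import random

FIRST_UNRESERVED_LABEL = 16
LABEL_SPACE = 1 << 20
HIGH_BW_LABEL = 16  # hypothetical label chosen by the operator for option 3


def choose_flow_label(
    flow_key: bytes, is_high_bw: bool, policy: str, rng: random.Random
) -> int:
    """Per-packet flow label choice; `is_high_bw` comes from a hypothetical classifier."""
    if is_high_bw and policy == "constant":
        # Option 3: every high-bandwidth flow shares one label, hence one path.
        return HIGH_BW_LABEL
    if is_high_bw and policy == "random":
        # Option 4: a fresh label per packet spreads the load but may reorder.
        return rng.randrange(FIRST_UNRESERVED_LABEL, LABEL_SPACE)
    # Ordinary flows: a stable hash-derived label, kept for the flow's lifetime.
    digest = hashlib.sha256(flow_key).digest()
    span = LABEL_SPACE - FIRST_UNRESERVED_LABEL
    return FIRST_UNRESERVED_LABEL + int.from_bytes(digest[:4], "big") % span
```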
The issues described above are mitigated by the following two
factors:
o Firstly, the customer of a high-bandwidth PW service has an
incentive to get the best transport service because an inefficient
use of the PSN leads to jitter and eventually to loss to the PW's
payload.
o Secondly, the customer is usually able to tailor their
applications to generate many flows in the PSN. A well-known
example is massive data transport between servers which use many
parallel TCP sessions. This same technique can be used by any
transport protocol: multiple UDP ports, multiple L2TPv3 Session
IDs, or multiple GRE keys may be used to decompose a large flow into
smaller components. This approach may be applied to IPsec
[RFC4301] where multiple Security Parameters Indexes (SPIs) may be
allocated to the same security association.
8.5. Applicability to MPLS-TP
The MPLS Transport Profile (MPLS-TP) [RFC5654] requirement 44 states
that "MPLS-TP MUST support mechanisms that ensure the integrity of
the transported customer's service traffic as required by its
associated SLA. Loss of integrity may be defined as packet
corruption, reordering, or loss during normal network conditions."
Flow-aware transport of a PW can reorder packets; therefore, it MUST
NOT be deployed in a network conforming to MPLS-TP unless the
integrity requirements specified in the SLA can be satisfied.
8.6. Asymmetric Operation
The protocol defined in this document supports the asymmetric
inclusion of the flow LSE. Asymmetric operation can be expected when
there is asymmetry in the bandwidth requirements making it
unprofitable for one PE to perform the flow classification, or when
that PE is otherwise unable to perform the classification but is able
to receive flow-labelled packets from its peer. Asymmetric operation
of the PW may also be required when one PE has a high transmission
bandwidth requirement, but has a need to receive the entire PW on a
single interface in order to perform a processing operation that
requires the context of the complete PW (for example policing of the
egress traffic).
9. Applicability to MPLS LSPs
A further application of this technique would be to create a basis
for hash diversity without having to peek below the label stack for
IP traffic carried over LDP LSPs. Work on the generalisation of this
to MPLS has been described in [I-D.kompella-mpls-entropy-label].
This can be regarded as a complementary, but distinct, approach,
since although similar considerations may apply to the identification
of flows and the allocation of flow label values, the flow labels are
imposed by different network components, and the associated
signalling mechanisms are different.
10. Security Considerations
The PW generic security considerations described in [RFC3985] and the
security considerations applicable to a specific PW type (for
example, [RFC4448] in the case of an Ethernet PW) apply. The security
considerations in [RFC5920] also apply.
It is useful to give consideration to the choice of TTL value in the
flow LSE [RFC3032]. The flow LSE is at the bottom of label stack,
therefore, even when penultimate hop popping is employed, it will
always be preceded by the PW label on arrival at the PE. If the
flow label is inadvertently examined as if it were a normal label,
the packet might be forwarded. This can be prevented by setting the
associated TTL to 1. Note that this may be a departure from
considerations that apply to the general MPLS case.
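As an illustration, the sketch below encodes a flow LSE using the label stack entry layout of [RFC3032] (20-bit label, 3-bit Traffic Class field [RFC5462], S bit, 8-bit TTL), with S set because the flow LSE is at the bottom of the stack and the TTL set to 1 as discussed above. The TC value of 0 and the helper names are assumptions for illustration; the rejection of values 0-15 reflects the reserved label range of [RFC3032].

```python
import struct

MIN_LABEL = 16             # label values 0-15 are reserved by RFC 3032
MAX_LABEL = (1 << 20) - 1  # labels are 20 bits wide


def encode_lse(label: int, tc: int = 0, bottom_of_stack: bool = False, ttl: int = 255) -> bytes:
    """Encode one 32-bit MPLS label stack entry (Label | TC | S | TTL)."""
    if not 0 <= label <= MAX_LABEL:
        raise ValueError("label out of range")
    if not 0 <= tc <= 7 or not 0 <= ttl <= 255:
        raise ValueError("tc or ttl out of range")
    word = (label << 12) | (tc << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def encode_flow_lse(flow_label: int) -> bytes:
    """Flow LSE: bottom of stack, TTL of 1 so it is never forwarded on by mistake."""
    if flow_label < MIN_LABEL:
        raise ValueError("flow label must not use a reserved label value")
    return encode_lse(flow_label, tc=0, bottom_of_stack=True, ttl=1)


def decode_lse(entry: bytes) -> tuple[int, int, bool, int]:
    """Return (label, tc, bottom_of_stack, ttl) from a 4-octet LSE."""
    (word,) = struct.unpack("!I", entry)
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


print(decode_lse(encode_flow_lse(0x12345)))  # (74565, 0, True, 1)
```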
11. IANA Considerations
IANA is requested to amend the PW Interface Parameters Sub-TLV type
Registry value 0x17 (Flow Label indicator) to refer to this RFC.
Parameter ID   Length   Description
0x17           4        Flow Label
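For illustration, the sketch below packs a Flow Label sub-TLV with the type (0x17) and length (4) registered above. It assumes, as for the PW interface parameter sub-TLVs of [RFC4447], that the length counts the type and length octets themselves, and that the remaining 16 bits carry a T (transmit) flag and an R (receive) flag in their two most significant bits followed by reserved bits. Both the flag positions and the helper names are assumptions to be checked against the sub-TLV definition earlier in this document.

```python
import struct

FLOW_LABEL_SUBTLV_TYPE = 0x17
FLOW_LABEL_SUBTLV_LEN = 4  # assumed to include the type and length octets
T_BIT = 0x8000             # assumed position of the "transmit flow labels" flag
R_BIT = 0x4000             # assumed position of the "receive flow labels" flag


def encode_flow_label_subtlv(transmit: bool, receive: bool) -> bytes:
    """Pack type, length and the assumed T/R flags; reserved bits are zero."""
    flags = (T_BIT if transmit else 0) | (R_BIT if receive else 0)
    return struct.pack("!BBH", FLOW_LABEL_SUBTLV_TYPE, FLOW_LABEL_SUBTLV_LEN, flags)


def decode_flow_label_subtlv(data: bytes) -> tuple[bool, bool]:
    """Return (transmit, receive) from a Flow Label sub-TLV."""
    sub_type, length, flags = struct.unpack("!BBH", data[:4])
    if sub_type != FLOW_LABEL_SUBTLV_TYPE or length != FLOW_LABEL_SUBTLV_LEN:
        raise ValueError("not a Flow Label sub-TLV")
    return bool(flags & T_BIT), bool(flags & R_BIT)
```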
12. Congestion Considerations
The congestion considerations applicable to PWs as described in
[RFC3985] and any additional congestion considerations developed at
the time of publication apply to this design.
The ability to explicitly configure a PW to use the available ECMP
paths is beneficial to capacity planning because, all other
parameters being constant, the statistical multiplexing of a larger
number of smaller flows is more efficient than that of a smaller
number of larger flows.
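The statistical multiplexing argument can be illustrated with a small calculation. The sketch below pins each flow to one of four links using a hypothetical per-flow hash (`zlib.crc32` of the flow index, an assumption for illustration) and reports the load on the busiest link relative to a perfectly even share. With two large flows the busiest link must carry at least twice its fair share, whereas the same traffic split into many small flows spreads out much more evenly.

```python
import zlib


def imbalance_ratio(flow_rates: list[float], num_links: int) -> float:
    """Busiest link load divided by the ideal, perfectly even per-link load."""
    loads = [0.0] * num_links
    for flow_id, rate in enumerate(flow_rates):
        # Hypothetical per-flow hash: every packet of a flow uses one link.
        link = zlib.crc32(str(flow_id).encode()) % num_links
        loads[link] += rate
    ideal = sum(flow_rates) / num_links
    return max(loads) / ideal


TOTAL = 40.0  # total PW traffic, arbitrary units
LINKS = 4

few_large = [TOTAL / 2] * 2         # the PW traffic as two large flows
many_small = [TOTAL / 1000] * 1000  # the same traffic as 1000 small flows

print(imbalance_ratio(few_large, LINKS))
print(imbalance_ratio(many_small, LINKS))
```

Whatever hash is used, two flows on four links cannot do better than a ratio of 2.0, because at least one link carries half of the total traffic.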
Note that if the classification into flows is only performed on IP
packets, the behaviour of those flows in the face of congestion will
be as already defined by the IETF for packets of that type, and no
additional congestion processing is required.
Where non-IP flows are classified, PW congestion avoidance must be
applied to each non-IP load-balance group.
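The distinction drawn above between IP and non-IP traffic can be sketched as a classifier at the ingress PE that returns both a flow label and whether PW congestion avoidance applies. Everything here is hypothetical and for illustration only: the `Packet` record, the use of `zlib.crc32`, the grouping of non-IP frames by source MAC address, and `NUM_NON_IP_GROUPS`. Flow label values avoid the reserved range 0-15 of [RFC3032].

```python
import zlib
from dataclasses import dataclass
from typing import Optional

MIN_FLOW_LABEL = 16             # avoid the reserved label values 0-15
MAX_FLOW_LABEL = (1 << 20) - 1  # labels are 20 bits wide
NUM_NON_IP_GROUPS = 8           # hypothetical number of non-IP load-balance groups


@dataclass(frozen=True)
class Packet:
    """Hypothetical parsed PW payload; five_tuple is None for non-IP frames."""
    five_tuple: Optional[tuple]
    src_mac: bytes = b""


def classify(pkt: Packet) -> tuple[int, bool]:
    """Return (flow label, whether PW congestion avoidance must be applied)."""
    if pkt.five_tuple is not None:
        # IP flow: its congestion behaviour is that already defined for IP.
        key = repr(pkt.five_tuple).encode()
        space = MAX_FLOW_LABEL - MIN_FLOW_LABEL + 1
        return MIN_FLOW_LABEL + zlib.crc32(key) % space, False
    # Non-IP frame: fold into a small number of load-balance groups, each of
    # which is subject to PW congestion avoidance.
    group = zlib.crc32(pkt.src_mac) % NUM_NON_IP_GROUPS
    return MIN_FLOW_LABEL + group, True
```

Keeping the number of non-IP groups small bounds the number of groups to which PW congestion avoidance must be applied, at the cost of coarser load balancing for that traffic.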
13. Acknowledgements
The authors wish to thank Eric Grey, Kireeti Kompella, Joerg
Kuechemann, Wilfried Maas, Luca Martini, Mark Townsley, and Lucy Yong
for valuable comments on this document.
14. References
14.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
Encoding", RFC 3032, January 2001.
[RFC4379] Kompella, K. and G. Swallow, "Detecting Multi-Protocol
Label Switched (MPLS) Data Plane Failures", RFC 4379,
February 2006.
[RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson,
"Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for
Use over an MPLS PSN", RFC 4385, February 2006.
[RFC4447] Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G.
Heron, "Pseudowire Setup and Maintenance Using the Label
Distribution Protocol (LDP)", RFC 4447, April 2006.
[RFC4448] Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
"Encapsulation Methods for Transport of Ethernet over MPLS
Networks", RFC 4448, April 2006.
[RFC4553] Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time
Division Multiplexing (TDM) over Packet (SAToP)",
RFC 4553, June 2006.
[RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal
Cost Multipath Treatment in MPLS Networks", BCP 128,
RFC 4928, June 2007.
[RFC5085] Nadeau, T. and C. Pignataro, "Pseudowire Virtual Circuit
Connectivity Verification (VCCV): A Control Channel for
Pseudowires", RFC 5085, December 2007.
14.2. Informative References
[I-D.kompella-mpls-entropy-label]
Kompella, K., Drake, J., Amante, S., Henderickx, W., and
L. Yong, "The Use of Entropy Labels in MPLS Forwarding",
draft-kompella-mpls-entropy-label-02 (work in progress),
March 2011.
[I-D.stein-pwe3-pwbonding]
Stein, Y., Mendelsohn, I., and R. Insler, "PW Bonding",
draft-stein-pwe3-pwbonding-01 (work in progress),
November 2008.
[RFC3985] Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-
Edge (PWE3) Architecture", RFC 3985, March 2005.
[RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute
Extensions to RSVP-TE for LSP Tunnels", RFC 4090,
May 2005.
[RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005.
[RFC4378] Allan, D. and T. Nadeau, "A Framework for Multi-Protocol
Label Switching (MPLS) Operations and Management (OAM)",
RFC 4378, February 2006.
[RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast
Reroute: Loop-Free Alternates", RFC 5286, September 2008.
[RFC5462] Andersson, L. and R. Asati, "Multiprotocol Label Switching
(MPLS) Label Stack Entry: "EXP" Field Renamed to "Traffic
Class" Field", RFC 5462, February 2009.
[RFC5654] Niven-Jenkins, B., Brungard, D., Betts, M., Sprecher, N.,
and S. Ueno, "Requirements of an MPLS Transport Profile",
RFC 5654, September 2009.
[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD)", RFC 5880, June 2010.
[RFC5920] Fang, L., "Security Framework for MPLS and GMPLS
Networks", RFC 5920, July 2010.
Authors' Addresses
Stewart Bryant (editor)
Cisco Systems
250 Longwater Ave
Reading RG2 6GB
United Kingdom
Phone: +44-208-824-8828
Email: stbryant@cisco.com
Clarence Filsfils
Cisco Systems
Brussels
Belgium
Email: cfilsfil@cisco.com
Ulrich Drafz
Deutsche Telekom
Muenster
Germany
Email: Ulrich.Drafz@t-com.net
Vach Kompella
Alcatel-Lucent
Email: vach.kompella@alcatel-lucent.com
Joe Regan
Alcatel-Lucent
Email: joe.regan@alcatel-lucent.com
Shane Amante
Level 3 Communications
Email: shane@castlepoint.net