Network Working Group F. Coras
Internet-Draft A. Cabellos-Aparicio
Intended status: Informational J. Domingo-Pascual
Expires: July 7, 2013 Technical University of
Catalonia
F. Maino
D. Farinacci
cisco Systems
January 3, 2013
LISP Replication Engineering
draft-coras-lisp-re-01
Abstract
This document describes a method to build and optimize inter-domain
LISP router distribution trees for locator-based unicast and
multicast replication of EID-based multicast packets.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 7, 2013.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
Coras, et al. Expires July 7, 2013 [Page 1]
Internet-Draft LISP-RE January 2013
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Definition of Terms . . . . . . . . . . . . . . . . . . . . . 4
3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4. Overlay Distribution Tree . . . . . . . . . . . . . . . . . . 6
4.1. LISP Replication Node Database . . . . . . . . . . . . . . 7
4.2. Building the Distribution Tree . . . . . . . . . . . . . . 7
5. Distribution Tree Optimization . . . . . . . . . . . . . . . . 8
5.1. Topology Discovery . . . . . . . . . . . . . . . . . . . . 9
5.2. Optimization Algorithm . . . . . . . . . . . . . . . . . . 9
6. Security Considerations . . . . . . . . . . . . . . . . . . . 11
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.1. Normative References . . . . . . . . . . . . . . . . . . . 11
9.2. Informative References . . . . . . . . . . . . . . . . . . 12
Appendix A. MADDBST heuristic . . . . . . . . . . . . . . . . . . 13
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13
Coras, et al. Expires July 7, 2013 [Page 2]
Internet-Draft LISP-RE January 2013
1. Introduction
The Locator/Identifier Separation Protocol (LISP) [I-D.ietf-lisp]
provides the mechanisms for the separation of Location and Identity
semantics presently overloaded by IP addresses. The split results in
the creation of two namespaces that unambigously identify edge-site
network objects, Endpoint IDs (EIDs), and core routing objects,
Routing LOCators (RLOCs). Apart from aiding the scalablity of the
core routing infrastructure, the decoupling also enables the
(re)implementation of new or existing inter-domain routing
mechanisms.
One such mechanism is inter-domain IP source-specific multicast (SSM)
[RFC4607]. In this sense, [I-D.ietf-lisp-multicast] defines the
procedures carried out for delivering multicast packets from a source
host in a LISP site to receivers residing in the same domain or in
other LISP or non-LISP sites when an underlying multicast
infrastructure exists. The signaling protocol it specifies for
conveying (S-EID,G) state and building the distribution tree
connecting the xTRs of the source and receiver domains is PIM
[RFC4601]. An alternative method that uses Map-Requests instead of
PIM for propagating (S-EID,G) state from multicast receiver site ETRs
to source site ITR is established in
[I-D.farinacci-lisp-mr-signaling].
Although desirable to use multicast routing in the core network when
available, a mismatch between the multicast capabilities of receiver
ETRs and source ITR might impede their multicast interconnection. In
such a case, unicast RLOC encapsulation will be necessary to deliver
multicast packets directly to the ETRs. This however leads to high
ITR head-end replication for large sets of receiving ETRs.
Therefore, to reduce the replication load of the ITR and scale the
service with the number of multicast receivers, the ITR may choose to
offload replication to a set of RTRs.
The current document describes how multicast RTRs can be used to
build an inter-domain distribution tree rooted at the ITR that can
perform unicast and/or multicast encapsulated replication of
multicast packets. This concept, of distributing the replication
load from ITR to other RTRs downstream on the core overlay
distribution tree, is known as Replication Engineering or LISP-RE.
Since unicast replication in such overlays can be suboptimal when
compared to the underlay network, methods to optimize packet delivery
over the distribution tree are also presented.
This specification does not describe how (S-EID,G) state is built in
source and receiver domains, nor does it describe how such state is
propagated from receiver ETRs to source ITR. This specification
Coras, et al. Expires July 7, 2013 [Page 3]
Internet-Draft LISP-RE January 2013
defines how (S-EID,G) map-cache state is built in the ITR, RTRs and
ETRs participating in the overlay distribution tree when they are not
connectable by multicast.
2. Definition of Terms
The terminology in this document is consistent with the definitions
in [I-D.ietf-lisp] and [I-D.ietf-lisp-multicast] however, it is
extended to account for LISP-RE concepts:
Delivery Group (DG): The outer destination address of a multicast
encapsulated multicast packet with an EID source.
Replication Target: A unicast locator or Delivery Group towards
which a multicast packet may be encapsulated and replicated.
Replication List: A locator-set associated to a multicast map-cache
entry (S-EID,G) as defined in [I-D.farinacci-lisp-te]. Locators
in the set may be either unicast RLOCs or Delivery Groups. When
used by a LISP router, a multicast packet matching the map-cache
entry must be replicated to all members of the set. The unicast
RLOCs are used to encapsulate multicast packets for unicast
delivery on the underlying network. Delivery Groups are used to
encapsulate multicast packets for multicast delivery on the
underlay.
Unicast Replication: Is the notion of replicating a multicast packet
with an EID source address at an ITR or RTR by encapsulating it
into a unicast packet. That is, the oif-list of a multicast map-
cache entry can not only have interfaces present for link-layer
replication and multicast encapsulation but also for unicast
encapsulation.
Overlay Distribution Tree: A degree-constrained spanning tree that
represents the path followed by unicast and/or multicast
encapsulated multicast packets from the root (ITR) to the leaves
(ETRs) through intermediary nodes (RTRs). The ITR and RTRs
unicast and/or multicast replicate packets to their tree children.
Such tree is built and maintained by the overlay distribution tree
controller either by using LISP signaling defined in
[I-D.ietf-lisp-multicast] and [I-D.farinacci-lisp-mr-signaling] or
by programming the mapping database directly by using ELPs to
describe network-wide fanout.
Coras, et al. Expires July 7, 2013 [Page 4]
Internet-Draft LISP-RE January 2013
Distribution Tree Controller (DTC): A central entity that manages
the overlay distribution tree, such entity can be either the ITR
or an external orchestration system.
LISP Replication Node: A router (either the ITR or an RTR)
participating and replicating packets downstream in the
distribution tree.
Multicast Ingress Tunnel Router (ITR): An ITR as specificed in
[I-D.ietf-lisp] that participates as the root in the distribution
tree.
Multicast Egress Tunnel Router (ETR): An ETR as specified in
[I-D.ietf-lisp] that participates as a leaf in the distribution
tree. ETR are the only members of the tree that do not unicast
replicate.
Multicast Re-encapsulating Tunnel Router (RTR): An RTR as specified
in [I-D.farinacci-lisp-te] that participates as an intermediary
node in the distribution tree.
Explicit Locator Path (ELP): an explicit and strictly ordered list
of replication targets a packet must travel to. An ELP may be
used to source route a LISP-encapsulated packet on an explicit
path of RTRs, however the path between two RTRs is determined by
the underlying routing protocol. ELP format is described in
[I-D.ietf-lisp-lcaf] and their use in [I-D.farinacci-lisp-te].
3. Overview
This specification describes a method to diminish the replication
load of the ITR by using RTRs to build an inter-domain distribution
tree. The tree is centrally managed either by the ITR itself or by
an external orchestration system. An advantage of this orchestration
system is that it offloads signaling from the ITR. The entity that
manages the tree is generally referred to as the distribution tree
controller (DTC).
In order to offload unicast replication of multicast packets the DTC
uses a ITR and a set of RTRs. RTRs willing to participate in the
distribution tree associated to the (S-EID,G) multicast channel must
join the distribution tree by sending a Map-Request/Join-Request
[I-D.farinacci-lisp-mr-signaling] to the DTC. Using this procedure
the DTC learns the RLOCs of the available RTRs. Additionally, the
DTC must learn the replication capacity of each RTR using out-of-band
signaling or by manual configuration.
Coras, et al. Expires July 7, 2013 [Page 5]
Internet-Draft LISP-RE January 2013
Given that the ITR and RTRs have a limited replication capacity the
distribution tree is a degree-constrained spanning-tree. This means
that the root is the ITR, the intermediary members are RTRs while
leaves are always ETRs. Multicast packets are addressed to (S-EID,G)
and are unicast and/or multicast encapsulated when being transported
downstream the tree.
In order to build and maintain the overlay distribution tree the DTC
must configure state in the replication nodes (ITR and RTRs). This
is done by means of the signaling specified in
[I-D.ietf-lisp-multicast] and [I-D.farinacci-lisp-mr-signaling].
Particularly, the DTC receives Map-Requests from RTRs (also from the
ITR if the DTC is an external orchestration system) addressed to
(S-EID,G). Upon inspection of the source RLOC of the Map-Request the
controller determines the originating ITR/RTR and generates an ad-hoc
Map-Reply containing the specific replication list for that
particular node according to the topology of the tree. For a LISP
replication node, the replication list specifies the set of RTRs/ETRs
to which it has to replicate packets, i.e., its overlay distribution
tree children. Alternatively, an external orchestration system may
directly program the mapping database with ELPs that describe the
topology of the overlay distribution tree. Ways of achieving this
will be discussed in future versions of the document.
The DTC determines the specific topology of the overlay distribution
tree using a centralized algorithm. The only requirements for this
algorithm are that it builds a tree that guarantees that ETRs receive
the encapsulated multicast packets, that the replication capacity of
the ITR and RTRs is not exceeded and that forwarding loops are
avoided.
In some cases the network administrator may want an optimized overlay
distribution tree, although this specification does not standardize
any particular algorithm it suggests one in Section 5.2. In order to
build an optimized tree this algorithm makes use of the distance
(e.g., latency) between the tree members and the amount of multicast
receivers connected to each ETR. Such metrics are not provided by
LISP and therefore must be obtained using out-of-band signaling.
4. Overlay Distribution Tree
This section describes how the DTC can build an overlay distribution
tree using the signaling and mechanisms defined in
[I-D.ietf-lisp-multicast] and [I-D.farinacci-lisp-mr-signaling].
Coras, et al. Expires July 7, 2013 [Page 6]
Internet-Draft LISP-RE January 2013
4.1. LISP Replication Node Database
The DTC maintains per (S-EID,G) multicast channel a LISP Replication
Node Database (LRND) that stores information about the distribution
tree state. This information includes among others the RLOCs of the
ITR, RTRs and ETRs that constitute the distribution tree and define
the overlay replication topology (i.e., the parent-child relations).
Said data may be obtained by the DTC from the standard signaling
messages exchanged with the RTRs and ETRs. Additionally, by means of
out-of-band signalling the DTC should obtain information about the
replication capacity of RTRs.
If the operator chooses to build an optimized tree, more information
must be available at the LRND, this is further discussed in
Section 5.2.
4.2. Building the Distribution Tree
This section describes the procedures followed by ETRs and RTRs when
attaching to the distribution tree. All procedures assume that the
DTC has a LRND consistent with the state of the overlay distribution
tree and is aware of the replication capacity of participating RTRs.
The decision of an RTR to join the overlay distribution tree depends
on out-of-band signalling (e.g., orchestration system, manual
configuration). But, its attachment to the distribution tree is done
by means of one of the following two procedures:
1. The RTR explicitly signals the ITR by sending a Join-Request for
(S-EID,G) and is replied to with a replication list.
2. If an orchestration system programs the mapping database with
ELPs describing the overlay distribution tree, an RTR Map-
Requests for (S-EID,G) and receives as reply an ELP that defines
its distribution tree fanout. Ways of encoding the tree topology
into ELPs will be discussed in future versions of this document.
For RTRs using option 1 the DTC, an ITR in this case, will perform
the same processing as for joining ETRs. The following sequence of
steps is used to attach an ETR to the overlay distribution tree:
1. The DTC receives a Map-Request/Join-Request for (S-EID,G) from an
ETR.
2. If multicast replication is requested, the DTC proceeds as
defined in [I-D.farinacci-lisp-mr-signaling] and no further steps
are taken.
Coras, et al. Expires July 7, 2013 [Page 7]
Internet-Draft LISP-RE January 2013
3. If unicast replication is requested, the DTC must choose a
position for the ETR in the distribution tree topology.
Specifically, it initiates a search within the LRND for a node
(either the ITR or a RTR) with enough spare replication capacity
that will replicate multicast traffic towards the ETR. This tree
member will become the parent of the ETR and once it is selected
the LRND is updated accordingly. The search algorithm depends on
operational requirements and this specification does not
standardize one, however Section 5.2 provides an example
algorithm. Note also that certain algorithms may require the
complete or partial re-shape the tree based on certain
performance metrics.
4. The DTC must create/update the (S-EID,G) associated replication
state for the selected parent using the mechanisms defined in
[I-D.ietf-lisp] and [I-D.farinacci-lisp-mr-signaling] (e.g.,
Solicit-Map-Request). This results in the parent sending a Map-
Request for (S-EID,G), in turn, the DTC Map-Replies with an ad-
hoc replication list of locator-sets according to topology stored
at the LRND. If the algorithm results in a complete or partial
re-shape of the tree then state at multiple tree members must be
created/updated. In order to avoid packet loss this must be done
synchronously. It will be discussed in future versions of this
document how to achieve this.
5. Once the distribution tree is configured to replicate multicast
traffic to the ETR the DTC Map-Replies (as specified in
[I-D.farinacci-lisp-mr-signaling]) with the destination EID-
prefix set to (parent-RLOC, ETR-RLOC).
When a LISP replication node signals its departure from the tree, the
information stored at the LRND is updated accordingly. For ETRs, the
state of the parent member must be updated as described in step 4.
For RTRs both the state of the parent and its children must be
updated however, such updates may result in packet loss. Moreover,
certain optimization algorithms may result in a re-shape of the tree
and as a consequence the state of multiple tree members must be
created/updated according to the new topology. How to manage these
updates with no packet loss will be discussed in future versions of
this document.
5. Distribution Tree Optimization
Operators wishing to improve the performance of the distribution tree
need to implement at least one topology discovery mechanism and
choose a set of optimization algorithms. Due to the centralized
group management, on-line switching between optimization algorithms
Coras, et al. Expires July 7, 2013 [Page 8]
Internet-Draft LISP-RE January 2013
may be possible in accordance to the required performance. However,
their use is dependent on the presence of overlay topological
information. The following logical modules need to be implemented in
order to support overlay optimizations with LISP-RE:
Topology Discovery Coordinator: It is in charge of organizing the
topology measurements and building a database that stores the
topological distances (i.e., a metric must be chosen) between
overlay members.
Distribution Tree Computation Unit: It is a component that with the
help of an algorithm or heuristic, given as input the topology of
the overlay and a constraint, or constraint set, can compute an
optimal, or close to optimal, degree-constrained minimum spanning
tree that may be used for multicast content distribution.
Whether to implement the above modules in the ITR or in other network
elements is the decision of the network administrator.
5.1. Topology Discovery
The present document does not specify any topology discovery
mechanisms. Both active and passive topology measurements could be
used. A choice between the two, of the policy and admission control
used or of the network element in charge of coordinating these
measurements could be made in the future based on practical
experience. Alternatively, precomputed network maps like the ones
offered by [IPLANE] and/or out-of-band signalling may be used.
5.2. Optimization Algorithm
The current document does not recommend an optimization algorithm.
However, it provides as an example a low computation cost heuristic,
which, in the scenarios simulated in [LCAST-TR], can produce
latencies between the ITR and the multicast receivers close to
unicast ones. Its choice is to be influenced by operational
requirements and the hardware constraints of the equipment in charge
of running it. Future experiments might result in a recommendation.
In what follows, we use the term "distance" when referring to a
relative length or amplitude of a metric, observed on a path
connecting two points, but when the exact nature of the metric is of
no interest.
Considering as goal the delivery of content for delay sensitive
applications, the function the algorithm minimizes is the maximum
distance (e.g. latency or number of AS hops) from a multicast
receiver to the ITR source. Notice that the reference is the
Coras, et al. Expires July 7, 2013 [Page 9]
Internet-Draft LISP-RE January 2013
multicast receiver host and not an ETR. Thus, what matters in
deciding a member's position in the distribution tree is not solely
its distance to the ITR but also the number of multicast receivers it
serves. Then, a router close to the source but serving few receivers
might find itself lower in the distribution tree than another with a
slightly higher distance to the source but with a larger receiver
set. The algorithm optimizes the quality of experience for multicast
receivers and not for tunnel routers.
The problem described above, that searches for a minimum average
distance, degree-bounded spanning tree (MADDBST), can be formally
stated as:
Definition: Given an undirected complete graph G=(V,E), a designated
vertex r belonging to V, for all vertices v in V, a degree bound
d(v) <= dmax, dmax a positive integer, a vertex weight function
c(v) with positive integer values, and an edge weight function
w(e) with positive values, for all edges e in E. Let W(r,v,T)
represent the cost of the path linking r and v in the spanning
tree T. Find the spanning tree T of G, routed at r, satisfying
that d(v) <= dmax and the distance to the source per multicast
receiver is minimized.
The heuristic used to solve this problem works by incrementally
growing a tree, starting at the root node r, until it becomes a
spanning tree. For each node v, not yet a tree member, it selects a
potential parent node u in the tree T, such that the distance per
receiver to r, is minimized. At each step, the node with the
smallest metric value is added to the tree and the parent selection
is redone. The pseudocode of the heuristic is provided in
Appendix A.
[SHI] and [BAN] have previously defined and solved similar
optimization problems. Shi et al. [SHI] also prove that a
particular instance of the problem, where all vertices have weight 1,
is NP-complete for degree constraints 2 <= dmax <= |V|-1.
The algorithm can optimize an unicast overlay however, it should not
be used to optimize multicast underlay delivery. As a result, if
multicast is used as underlay between part of the overlay members,
once one of the members of such Delivery Group is added to the
distribution tree, the others should be marked as attached also.
These nodes should receive multicast encapsulated multicast packets
from the chosen node over the underlying multicast distribution tree.
Finally, since the RTRs do not replicate packets for multicast
receiver hosts, prior to applying the MADDBST heuristic, a Minimum
Spanning Tree (MST) algorithm should be used to compute the RTR
Coras, et al. Expires July 7, 2013 [Page 10]
Internet-Draft LISP-RE January 2013
distribution tree. In this case, the MADDBST heuristic should start
attaching ETRs having as input the tree resulting from MST.
6. Security Considerations
Security concerns for LISP-RE the same as for
[I-D.ietf-lisp-multicast] and [I-D.farinacci-lisp-mr-signaling].
7. IANA Considerations
This memo includes no request to IANA.
8. Acknowledgements
TODO
9. References
9.1. Normative References
[I-D.farinacci-lisp-mr-signaling]
Farinacci, D. and M. Napierala, "LISP Control-Plane
Multicast Signaling", draft-farinacci-lisp-mr-signaling-00
(work in progress), July 2012.
[I-D.farinacci-lisp-te]
Farinacci, D., Lahiri, P., and M. Kowal, "LISP Traffic
Engineering Use-Cases", draft-farinacci-lisp-te-01 (work
in progress), July 2012.
[I-D.ietf-lisp]
Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
"Locator/ID Separation Protocol (LISP)",
draft-ietf-lisp-24 (work in progress), November 2012.
[I-D.ietf-lisp-lcaf]
Farinacci, D., Meyer, D., and J. Snijders, "LISP Canonical
Address Format (LCAF)", draft-ietf-lisp-lcaf-00 (work in
progress), August 2012.
[I-D.ietf-lisp-multicast]
Farinacci, D., Meyer, D., Zwiebel, J., and S. Venaas,
"LISP for Multicast Environments",
draft-ietf-lisp-multicast-14 (work in progress),
Coras, et al. Expires July 7, 2013 [Page 11]
Internet-Draft LISP-RE January 2013
February 2012.
[RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas,
"Protocol Independent Multicast - Sparse Mode (PIM-SM):
Protocol Specification (Revised)", RFC 4601, August 2006.
[RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for
IP", RFC 4607, August 2006.
9.2. Informative References
[BAN] Banerjee, S., Kommareddy, C., Kar, K., Bhattacharjee, B.,
and S. Khuller, "Construction of an efficient overlay
multicast infrastructure for real-time applications",
INFOCOM , 2002.
[IPLANE] Madhyastha, H., Katz-Bassett, E., Anderson, T.,
Krishnamurthy, A., and A. Venkataramani, "iPlane: An
Information Plane for Distributed Services", USENIX OSDI ,
2009.
[LCAST-TR]
Coras, F., Cabellos, A., Domingo, J., Maino, F., and D.
Farinacci, "Inter-Domain Multicast: LISP Edge Based
Trees", Technical
Report http://personals.ac.upc.edu/fcoras/lcast-tr.pdf,
2012.
[SHI] Shi, S., Turner, J., and M. Waldvogel, "Dimensioning
server access bandwidth and multicast routing in overlay
networks", NOSSDAV , 2001.
Coras, et al. Expires July 7, 2013 [Page 12]
Internet-Draft LISP-RE January 2013
Appendix A. MADDBST heuristic
INPUT: G = (V,E); r; dmax; w(u,v); c(v); u, v in V
OUTPUT: T
FOREACH v in V DO
delta(v) = w(r,v)/c(v);
p(v) = r;
END FOREACH
T takes (U = {r}, D={});
WHILE U != V DO
LET u in U-V be the vertex with the smallest delta(u);
U = U U {u}; L = L U {(p(u),u)};
FOREACH v in V-U DO
delta(v) = infinity;
FOREACH u in U DO
IF d(u) < dmax and
W{r,u,T} + w(u,v)/c(v) < delta(v) THEN
delta(v) = W{r,u,T} + w(u,v)/c(v);
p(v) = u;
END IF
END FOR
END FOR
END WHILE
Figure 1
Authors' Addresses
Florin Coras
Technical University of Catalonia
C/Jordi Girona, s/n
BARCELONA 08034
Spain
Email: fcoras@ac.upc.edu
Albert Cabellos-Aparicio
Technical University of Catalonia
C/Jordi Girona, s/n
BARCELONA 08034
Spain
Email: acabello@ac.upc.edu
Coras, et al. Expires July 7, 2013 [Page 13]
Internet-Draft LISP-RE January 2013
Jordi Domingo-Pascual
Technical University of Catalonia
C/Jordi Girona, s/n
BARCELONA 08034
Spain
Email: jordi.domingo@ac.upc.edu
Fabio Maino
cisco Systems
Tasman Drive
San Jose, CA 95134
USA
Email: fmaino@cisco.com
Dino Farinacci
cisco Systems
Tasman Drive
San Jose, CA 95134
USA
Email: farinacci@gmail.com
Coras, et al. Expires July 7, 2013 [Page 14]