Network Working Group Greg Bernstein
Internet Draft Grotto Networking
Intended status: Informational Young Lee
Huawei
March 12, 2012
Use Cases for High Bandwidth Query and Control of Core Networks
draft-bernstein-alto-large-bandwidth-cases-01.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 12, 2012.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Bernstein & Lee, et al. Expires September 12, 2012 [Page 1]
Internet-Draft Cross Stratum Optimization Use-cases March 2012
carefully, as they describe your rights and restrictions with
respect to this document.
Abstract
This draft describes two generic use-cases that illustrate
application layer traffic optimization applied to high bandwidth
core networks. The type of information and interactions needed to
perform various optimizations is described. In addition, extensions
to the existing ALTO protocol are suggested that provide this
functionality.
Table of Contents
1. Introduction...................................................3
1.1. Computing Clouds, Data Centers, and End Systems...........4
2. End System Aggregate Networking................................5
2.1. Aggregated Bandwidth Scaling..............................5
2.2. Cross Stratum Optimization Example........................6
2.3. Data Center and Network Faults and Recovery...............7
2.4. Cross Stratum Control Interfaces..........................8
3. Data Center to Data Center Networking..........................9
3.1. Cross Stratum Optimization Examples.......................9
3.2. Network and Data Center Faults and Reliability...........10
4. Potential ALTO Protocol Extensions............................11
4.1. High Bandwidth Network Information.......................12
4.1.1. Maximum Reservable Bandwidth........................13
4.1.2. Latency Information.................................14
4.1.3. Endpoint Access Bandwidth Capacity..................14
4.2. Network Information via Constraint and Cost Graph........14
4.3. Network Updates and Notifications........................17
4.3.1. Notification Interface..............................17
4.4. Application-Network Reservation Interface................18
4.4.1. IP Bypass/Traffic Engineering.......................18
4.4.2. High Bandwidth Reservation/Recovery Interface.......19
5. Conclusion....................................................19
6. Security Considerations.......................................20
7. IANA Considerations...........................................20
8. References....................................................20
8.1. Informative References...................................20
Author's Addresses...............................................22
Intellectual Property Statement..................................22
Disclaimer of Validity...........................................22
1. Introduction
Cloud computing, network applications, software as a service (SaaS),
platform as a service (PaaS), and infrastructure as a service
(IaaS) are just a few of the terms used to describe situations
where multiple computation entities interact with one another across
a network. When the communication resources consumed by these
interacting entities are significant compared with link or network
capacity, opportunities may exist for more efficient utilization
of available computation and network resources if the computation
and network stratums cooperate in some way. The Application-Layer
Traffic Optimization (ALTO) working group is tackling the similar
problem of "better-than-random peer selection" for distributed
applications based on peer-to-peer (P2P) or client-server
architectures [1]. In addition, such optimization is important in
content distribution networks (CDNs) as illustrated in [2].
In the network stratum, particularly at the lower layers such as
MPLS and optical, there are many restoration and recovery mechanisms
to deal with network faults. The emergence of network based
applications or cloud based disaster recovery/business recovery
brings a new dimension to fault management, but also opportunities
to more efficiently deliver higher levels of reliability. For
example, the reliability requirements for mission critical
applications are typically quantified by two key time parameters.
The first is the Recovery Time Objective (RTO) which is the time to
get the application back up and functioning and is similar to
network recovery time notions. The second is the Recovery Point
Objective (RPO) which quantifies in terms of time the amount of data
loss that can be tolerated when a disaster occurs. Different
applications and organizations can have greatly different demands,
from milliseconds to 12 hours. In addition, the amount of data that
may need to be transferred to meet these objectives can vary greatly
amongst different application types. With recovery point objectives
of, say, an hour or more, a dynamic optical network layer could be
very efficiently shared so as to reduce the overall cost to achieve
a given level of reliability. However, doing so requires cooperation
between the application and network stratums.
Generalized Multi-Protocol Label Switching (GMPLS) [3] can be, and is
being, applied to various core networking technologies such as
SONET/SDH and wavelength division multiplexing (WDM) [4]. GMPLS
provides dynamic network topology and resource information, and the
capability to dynamically allocate resources (provision label
switched paths). Furthermore, the path computation element (PCE) [5]
provides for traffic engineered path optimization.
However, neither GMPLS nor PCE provides interfaces that are
appropriate for an application layer entity to use, for the following
reasons:
. GMPLS routing exposes full network topology information, which
tends to be proprietary to a carrier or to require specialized
knowledge and techniques to make use of, e.g., the routing and
wavelength assignment (RWA) problem in WDM networks [4].
. Core networks typically consist of two or more layers, while
applications typically only know about the IP layer and
above. Hence applications would not be able to make direct use
of PCE capabilities.
. GMPLS signaling interfaces are defined either for peer GMPLS
nodes or for a user network interface (UNI) [6]. Neither of
these is appropriate for direct use by an application entity.
In this draft we discuss two general use-cases that can generate
core network flows that consume significant bandwidth and may vary
significantly over time. The "cross stratum optimization" problems
generated by these use cases are discussed. Finally, we look at
interfaces between the application and network "stratums" that can
enable these types of optimizations and how they can be created via
extensions to the current ALTO protocol [7].
1.1. Computing Clouds, Data Centers, and End Systems
While the definition of cloud computing or compute clouds is
somewhat nebulous (or "foggy" if you will) [8], the physical
instantiation of compute resources with network connectivity is very
real and bounded by physical and logical constraints. For the
purposes of this draft, we will call any network-connected compute
resource a data center if its network connectivity is significant
compared either to the bandwidth of an individual WDM wavelength or
to the capacity of the network links where it is located. Hence we
include in our definition very large data centers that feature
multiple fiber access and consume more than 10 MW of power, moderate
to large content distribution network (CDN) installations located in
or near major Internet exchange points, medium sized business
centers, etc.
We will refer to those computational entities that don't meet our
bandwidth criteria for a data center as an "end system".
2. End System Aggregate Networking
In this section we consider the fundamental use case of end systems
communicating with data centers as shown in Figure 1. In this figure
the "clients" are end systems with relatively small access bandwidth
compared to a WDM wavelength, e.g., under 100Mbps. We show these
clients roughly partitioned into three network related end user
regions ("A", "B", and "C"). Given a particular network application,
in a static network application situation, each client in a region
would be associated with a particular data center.
Region B
+---------+ +------+
| Data | |Client|
|Center 2 | | B1 |+------+
+------+ +----+----+ +--+---+|Client|
|Client| | / | B2 |
| A1 `. _.-+--------+-. +--+---+
Region A +------+ `-. ,-'' `--. / ...
+------+ ,`: `+. +------+
|Client| / \ |Client|
| A2 +------+ \---+ BM |
+------+ ( Network ) +------+
... .-' /
+------+ _.-' \ `.
|Client|.-' `=. ,-' `.
| AN | _.-'' `--. _.-\ +---`.----+
+------+ +----'----+ `----+------+'' \ | Data |
| Data | | \ | |Center 3 |
|Center 1 | +--+---+ +--+---+ \ +---------+
+---------+ |Client| |Client| \------+
| C1 | | C2 | |Client|
+------+ +------+ | CK |
Region C +------+
Figure 1. End system to data center communications.
2.1. Aggregated Bandwidth Scaling
One of the simplest examples where the aggregation of end system
bandwidth can quickly become significant to the "network" is for
video on demand (VoD) streaming services. Unlike a live streaming
service where IP or lower layer multicast techniques can be
generally applied, in VoD the transmissions are unique between the
data center and clients. For regular quality VoD we'll use an
estimate of 1.5Mbps per stream (assuming H.264 coding), for HD VoD
we'll use an estimate of 10Mbps per stream. To fill up a 10Gbps
capacity optical wavelength requires either 6,666 or 1,000 clients
for regular or high definition respectively. Note that special
multicasting techniques such as those discussed in [9] and peer
assistance techniques such as provided in some commercial systems
[10] can reduce the overall network bandwidth requirements.
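The client counts quoted above follow from simple division, as the
following minimal sketch shows. The per-stream rates and the 10Gbps
wavelength capacity are the estimates used in the text; the function
name is only illustrative.

```python
def streams_per_wavelength(wavelength_mbps: float, stream_mbps: float) -> int:
    """Number of whole unicast streams that fit in one wavelength."""
    return int(wavelength_mbps // stream_mbps)


WAVELENGTH_MBPS = 10_000  # one 10Gbps optical wavelength

# Regular quality VoD at 1.5Mbps per stream.
regular_clients = streams_per_wavelength(WAVELENGTH_MBPS, 1.5)
# HD VoD at 10Mbps per stream.
hd_clients = streams_per_wavelength(WAVELENGTH_MBPS, 10.0)

print(regular_clients, hd_clients)
```

The sketch ignores protocol overhead and the multicast or peer
assistance savings mentioned above, so it gives a lower bound on the
number of clients needed to fill a wavelength.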
With current high speed internet deployment such numbers of clients
are easily achieved; in addition demand for VoD services can vary
significantly over time, e.g., new video releases, inclement weather
(increases number of viewers), etc...
2.2. Cross Stratum Optimization Example
In an ideal world both data centers and networks would have
unlimited capacity. In actuality, both can have constraints
and marginal costs that vary with load or time of
day. For example, suppose that in Figure 1 Data Center 3 has
been primarily serving VoD to region "C" but that it has, at a
particular period in time, run out of computation capacity to serve
all the client requests coming from region "C". At this point we
have a fundamental cross stratum optimization (CSO) problem. We want
to see if we can accommodate additional client requests from region
"C" by using a different data center than the fully utilized data
center #3. To answer this question we need to know (a) the available
capacity on other data centers to meet a request, (b) the marginal
(incremental) cost of servicing the request on a particular data
center with spare capacity, (c) the ability of the network to
provide bandwidth between region "C" and a data center, and (d) the
incremental cost of bandwidth from region "C" to a data center.
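As a rough illustration of how items (a) through (d) might be
combined, the sketch below picks the cheapest data center that has
both enough spare compute capacity and enough reservable bandwidth.
The class, function, field names, and all numeric values are
hypothetical and are not defined by ALTO or by this draft; a real
optimizer would likely use a richer cost model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataCenterOffer:
    spare_streams: int        # (a) spare compute capacity, in streams
    cost_per_stream: float    # (b) marginal compute cost per stream
    reservable_mbps: float    # (c) bandwidth available from the region
    cost_per_mbps: float      # (d) incremental bandwidth cost


def pick_data_center(offers: Dict[str, DataCenterOffer],
                     streams: int,
                     mbps_per_stream: float) -> Optional[str]:
    """Return the cheapest feasible data center, or None if none fits."""
    needed_mbps = streams * mbps_per_stream
    best_name, best_cost = None, float("inf")
    for name, offer in offers.items():
        # Skip data centers that fail either stratum's constraint.
        if offer.spare_streams < streams:
            continue
        if offer.reservable_mbps < needed_mbps:
            continue
        cost = (streams * offer.cost_per_stream
                + needed_mbps * offer.cost_per_mbps)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


# Hypothetical view from region "C": Data Center 3 has no spare capacity.
offers = {
    "DC1": DataCenterOffer(500, 1.0, 4000.0, 0.02),
    "DC2": DataCenterOffer(2000, 0.8, 3000.0, 0.05),
    "DC3": DataCenterOffer(0, 0.5, 9000.0, 0.01),
}
choice = pick_data_center(offers, streams=200, mbps_per_stream=10.0)
print(choice)
```

With these made-up numbers, data center 1 is chosen even though its
compute cost is higher than that of data center 2, because its
bandwidth cost from region "C" is lower. This is the kind of trade-off
that neither stratum can evaluate alone.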
Region B
+---------+ +------+
| Data | |Client|
|Center 2 | | B1 |+------+
+------+ +----+----+ +--+---+|Client|
|Client| | / | B2 |
| A1 `. _.-+--------+-. +--+---+
Region A +------+ `-. ,-'' XXXXX XX `--. / ...
+------+ ,`: ``---..__ XXXX `+. +------+
|Client| / X | ```--XX \ |Client|
| A2 +------+..X`. \ XX--+---+ BM |
+------+ ( X `-/ \ ) +------+
... .-' .' | +----.X /
+------+ _.-' \ X/ \ | X `.
|Client|.-' `=.X \ XXXX ,-' `.
| AN | _.-'' `--. XXXXXXXXX _.-\ +---`.----+
+------+ +----'----+ `----+------+'' \ | Data |
| Data | | \ | |Center 3 |
|Center 1 | +--+---+ +--+---+ \ +---------+
+---------+ |Client| |Client| \------+
| C1 | | C2 | |Client|
+------+ +------+ | CK |
Region C +------+
Figure 2. Aggregated flows between end systems and data centers.
In Figure 2 we show a possible result of solving the previously
mentioned CSO problem. Here we show the additional client requests
from region "C" being serviced by data center #2 across the network.
Figure 2 also illustrates the possibility of setting up "express"
routes across the network at the MPLS level or below. Such
techniques, known as "optical grooming" or "optical bypass" [11], [12]
at the optical layer, can result in significant equipment and power
savings for the network by "bypassing" higher level routers and
switches.
2.3. Data Center and Network Faults and Recovery
Data center failures, whether partial or complete, can have a major
impact on revenues in the VoD example previously described. If there
is excess capacity in other data centers within the network
associated with the same application then clients could be
redirected to those other centers if the network has the capacity.
Moreover, MPLS and GMPLS controlled networks have the ability to
reroute traffic very quickly while preserving QoS. As with general
network recovery techniques [13], various combinations of
pre-planning and "on the fly" approaches can be used to trade off
recovery time against the excess network capacity needed for
recovery.
In the case of network failures there is the potential for clients
to be redirected to other data centers to avoid failed or
over-utilized links.
2.4. Cross Stratum Control Interfaces
Two types of load balancing techniques are currently utilized in
cloud computing. The first is load balancing within a data center
and is sometimes referred to as local load balancing. Here one is
concerned with distributing requests to appropriate machines (or
virtual machines) in a pool based on the current machine
utilization. The second type of load balancing is known as global
load balancing and is used to assign clients to a particular data
center out of a choice of more than one within the network and is
our concern here. A number of commercial vendors offer both local
and global load balancing products. Currently global load balancing
systems have very little knowledge of the underlying network. To
make better assignments of clients to data centers many of these
systems use geographic information based on IP addresses. Hence we
see that current systems are attempting to perform cross stratum
optimization albeit with very coarse network information. A more
elaborate interface for CSO in the client aggregation case would be:
1. A Network Query Interface - Where the global load balancer
can inquire as to the bandwidth availability between "client
regions" and data centers.
2. A Network Resource Reservation Interface - Where the global
load balancer can make explicit requests for bandwidth
between client regions and data centers.
3. A Fault Recovery Interface - For the global load balancer to
make requests for expedited bulk rerouting of client traffic
from one data center to another, or for the network layer to
make requests to the application to help deal with network
faults.
The network query interface can be considered a superset of the
functionality supported by the current ALTO protocol [7]. Potential
extensions are detailed in section 4.
3. Data Center to Data Center Networking
There are a number of motivations for data center to data center
communications: on demand capacity expansion ("cloud bursting"),
cooperative exchanges between business partners, offsite data
backup, "rent before building", etc. In Figure 3 we show an
example where a number of businesses, each with an "internal data
center", contract with a large external data center for additional
computational capacity (which may include storage). The data centers
may connect to each other via IP transit type services or, more
typically, via some type of Ethernet virtual private line or LAN
service.
+-------------------+
| |
| Large Data Center |
| |
+----------+--------+
|
_.+-----------.
,--'' `---.
,-' `-.
,' `.
,' `.
+--------+ ; Network :
|Business| __..+ |
| #1 DC +-' : ;
+--------+ `. ,'
`. ;:
`-. ,-' \
`---. _.--' +--`.----+
`+-----------'' |Business|
/ | #N DC |
| +--------+
+----+---+
|Business|
| #2 DC |
+--------+
Figure 3. Basic data center to data center networking.
3.1. Cross Stratum Optimization Examples
In the DC-to-DC example of Figure 3 we can have computational
constraints/limits at both local and remote data centers; fixed and
marginal computational costs at local and remote data centers; and
network bandwidth costs and constraints between data centers. Note
that computing costs could vary by the time of day along with the
cost of power and demand. Some cloud providers have quite
sophisticated compute pricing models including: reserved, on demand,
and spot (auction) variants.
In addition to possibly dynamically changing pricing, traffic
loads between data centers can be quite dynamic. Data
movement between data centers is another source of large network
usage variation. Such peaks can be due to scheduled daily or weekly
offsite data backup, bulk virtual machine (VM) migration to a new
data center, periodic VM migration, etc.
3.2. Network and Data Center Faults and Reliability
For networked applications that require high levels of
reliability/availability the network diagram of Figure 3 could be
enhanced with redundant business locations and external data centers
as shown in Figure 4. For example, cell phone subscriber databases
and financial transactions generally require what is called
geographic database replication, which results in extra communication
between sites supporting high availability. For example, if business
#1 in Figure 4 required a highly available database related service
then there would be additional communication flows from
data center "1a" to data center "1b". Furthermore, if business #1 has
outsourced some of its computation and storage needs to independent
data center X then for resilience it may want/need to replicate
(hot-hot redundancy) this information at independent data center Y.
+-------------+ +-------------+
|Independent | |Independent |
|Data Center X| |Data Center Y|
+-----+-------+ +------+------+
\ /
`. _.------------. .'
\--'' `-+-.
,-' `-. +--------+
,' `. .'Business|
,' `.-' |#N DC-a |
; Network : +--------+
+--------+ | |
|Business+--- ;
|#1 DC-a | `. +:
+--------+ `. ;/ \
`-. ,-' `.
.'`---. _.--' +--`.----+
+--------+ / `+-+---------\' |Business|
|Business| .' | \ |#N DC-b |
|#1 DC-b .' / \ +--------+
+--------+ | \
+----+---+ +--------+
|Business| |Business|
|#2 DC-a | |#2 DC-b |
+--------+ +--------+
Figure 4. Data center to data center networking with redundancy.
4. Potential ALTO Protocol Extensions
This section discusses the applicability of the ALTO protocol and
necessary extensions to support the high bandwidth consuming use
cases previously covered. Before doing so we discuss general
properties of the high bandwidth scenarios that may differ
significantly from other uses of the ALTO protocol.
The first has to do with scope and scale. The consumer of high
bandwidth ALTO extensions is typically some type of application
controller within a data center, as opposed to an individual end
user. The number of such entities with a need for the high bandwidth
related information is orders of magnitude smaller than, say, peer
to peer networking users, or applications closer to the end user.
Since a network provider may consider this information sensitive,
there may be a desire to limit its distribution to a "pre-
registered" set of entities.
Secondly, there is the notion of time scales. In cloud services we
already see variants such as "on demand" compute instances and
"reserved" compute instances. For network resource queries we may be
concerned with (a) current bandwidth availability, (b) bandwidth
availability at a future time, or (c) bandwidth for a bulk data
transfer of a given amount that must take place within a given time
window.
Time-dependent bandwidth information can be, and typically is,
considered in network planning and provisioning systems. For
example, a VoD provider knows ahead of time when the latest
"blockbuster" film will be available via its service and can make
estimates based on historical data on the bandwidth that it will
need to deal with the subsequent demand. The following discussions,
however, are restricted to "current time" for now.
Finally, another goal in the design of an interface between the
application and networking stratums is to minimize the need for
either stratum to know too much about the inner workings of the
other. Hence, as much as possible, it is desirable to insulate the
application stratum from technology specifics of the network. That
said, data centers providing IaaS may prefer to specify flows and
connectivity at a layer below IP such as Ethernet.
4.1. High Bandwidth Network Information
ALTO's network map and cost map concepts can be used to support the
aforementioned high bandwidth use cases. In this section we will
explore both how they could be used in high bandwidth "core"
networks and how they might be extended to better support large
bandwidth optimization.
The ALTO concept of a provider defined network location identifier
(PID) is a powerful network abstraction mechanism that is also
appropriate for optical/high bandwidth scenarios. For example, a
network provider could assign PIDs to WDM ROADMs or OTN switches
providing access to an optical core network. All subtending
data centers or hosts would have their IP addresses grouped with such
a PID. The collection of these would form an ALTO network map.
Furthermore, a corresponding ALTO cost map can be used by the
network to indicate preferred connectivity. Since not all these
entities necessarily connect directly to an edge WDM ROADM or OTN
switch, ALTO's Endpoint Property Service can be used to denote the
type of interface supported by an end system or data center and its
bandwidth capabilities.
4.1.1. Maximum Reservable Bandwidth
The amount of bandwidth available between two sites or
subnetworks can be of prime interest to large bandwidth consuming
applications. Unlike "unused" IP bandwidth, sub-IP bandwidth such as
that from SDH, OTN, and WDM cannot be probed from a network edge or
application. The only way to find out if such bandwidth could be
allocated to a particular application data flow is to query the
network.
One may want to query the network as to the reservable bandwidth in
a number of different cases:
(a) Bandwidth available between a single source destination pair
(two PIDs)
(b) Bandwidth between one particular source (PID) and several
other destinations (PIDs)
(c) Bandwidth between one set of sources (PIDs) and another set of
destinations (PIDs).
Case (a), bandwidth between two points, is well defined; however, in
cases (b) and (c) there is some ambiguity. For example, in (c) are
we considering multiple sources communicating with multiple
destinations at the same time? Do some of these pairs interfere with
each other? To fully understand such constraints some type of
constrained graph abstraction would be needed.
However, if we restrict the question in cases (b) and (c) to what is
the maximum reservable bandwidth between each source and destination
pair within the sets considered individually, then the question is
unambiguous, useful, and can fit within ALTO's existing cost map
structure (section 5.2 [7]). A new ALTO cost type of "reservable
bandwidth" can be defined for this purpose. This would be a
"numeric" cost type that represents the actual bandwidth in units
of, say, Mbps.
From the point of view of an optical network, an extended ALTO
request would arrive at our extended ALTO server asking for the
"reservable bandwidth" between multiple Source Network Locations,
say [Src_1, Src_2, ..., Src_m], and a list of multiple Destination
Network Locations, say [Dst_1, Dst_2, ..., Dst_n]. The network
computing entity would calculate the "reservable bandwidth" between
all of these individual source destination pairs. The extended ALTO
Server would then return the "reservable bandwidth" as an ALTO Path
Cost for each communicating pair (i.e., Src_1 -> Dst_1, ...,
Src_1 -> Dst_n, ..., Src_m -> Dst_1, ..., Src_m -> Dst_n).
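One way the network computing entity described in Section 4.1.1 might
derive the per-pair "reservable bandwidth" values is a widest path
(maximum bottleneck) search, run independently for each source
destination pair. The sketch below assumes this approach; the
topology, PID names, and capacities (in Mbps) are hypothetical, and
the nested result only mimics the shape of a cost map rather than the
exact ALTO encoding.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def build_graph(links: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an undirected adjacency list from (a, z, capacity) links."""
    graph: Graph = {}
    for a, z, cap in links:
        graph.setdefault(a, []).append((z, cap))
        graph.setdefault(z, []).append((a, cap))
    return graph


def widest_path(graph: Graph, src: str, dst: str) -> float:
    """Largest bottleneck capacity over any src-dst path (0.0 if none)."""
    best = {src: float("inf")}
    heap = [(-float("inf"), src)]  # max-heap via negated bandwidth
    while heap:
        neg_bw, node = heapq.heappop(heap)
        bw = -neg_bw
        if node == dst:
            return bw
        if bw < best.get(node, 0.0):
            continue  # stale heap entry
        for nbr, cap in graph.get(node, []):
            cand = min(bw, cap)
            if cand > best.get(nbr, 0.0):
                best[nbr] = cand
                heapq.heappush(heap, (-cand, nbr))
    return 0.0


def reservable_bandwidth_map(graph: Graph, sources, destinations):
    """Per-pair values, each pair considered individually."""
    return {s: {d: widest_path(graph, s, d) for d in destinations}
            for s in sources}


# Hypothetical topology: PIDs A, B (sources) and C, D (destinations).
graph = build_graph([
    ("A", "X", 10000.0), ("B", "X", 4000.0),
    ("X", "C", 8000.0), ("X", "D", 2000.0), ("A", "D", 3000.0),
])
bw_map = reservable_bandwidth_map(graph, ["A", "B"], ["C", "D"])
print(bw_map)
```

Because each pair is evaluated on its own, the values say nothing
about whether several pairs can be reserved at the same time, which is
the ambiguity noted for cases (b) and (c).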
4.1.2. Latency Information
Latency information, either fixed due to propagation delay times or
statistical measures due to queuing induced delays, can be similarly
represented via ALTO's cost map structure.
When choosing amongst flows between multiple data centers utilizing
significant amounts of bandwidth, alternative routes with differing
latency may need to be considered. In such a situation, a simple
latency cost map may need to be replaced by an abstract graph model
to allow for more effective optimization of resources.
4.1.3. Endpoint Access Bandwidth Capacity
There are a number of standard sized pipes used to access high
bandwidth networks and these can either be larger or smaller than
the bandwidth availability within various portions of the network.
Hence to make good use of network resources it is desirable to
advertise an endpoint's access bandwidth capacity. Typically this
would be a number in terms of Mbps or Gbps and would reflect the
true bandwidth available to the endpoint after upstream bottlenecks
or overhead are taken into account. This information could be
advertised via ALTO's endpoint property service.
4.2. Network Information via Constraint and Cost Graph
As discussed in the previous section, as the desired connectivity
between locations becomes more complex (rather than exclusively
point-to-point), the basic ALTO cost map structure can be
insufficient to reveal network bottlenecks and hence optimization
decision points.
Consider the network shown in Figure 5, where DC indicates a data
center, ER an end user region (as in the end user aggregation use
case), N a switching node of some sort, and L a link. The link
capacities and costs are also shown on the figure as well as a cost
map between [ER1, ER2] and [DC1, DC2, DC3]. Since the network has a
tree structure (very unusual but easier to draw in ASCII art), the
cost map is unique.
As an illustration, assume that the maximum available capacity
between any individual end region and a data center is 5 units
(i.e., L1=L2=L5=L6=5). However, link L3 (capacity 8 units)
represents a bottleneck to all the data centers (L3 is on all the
paths to DC1, DC2, or DC3 from all end regions, ER1 and ER2). In a
similar way, link L4 (capacity 6 units) represents a bottleneck to
data centers DC1 and DC2 from all end regions, ER1 and ER2. A simple
"cost map" like structure misses these bottlenecks.
 ,---.
( ER1 )--.                                     +----+
 `---'    `-. L1                        L5   ,-|DC1 |
             `-.,--.   L3   ,--.   L4   ,--.'  +----+
               ( N1 )------( N2 )------( N3 )
             ,-'`--'        `--'        `--'`. +----+
 ,---.    ,-' L2              |         L6   `-|DC2 |
( ER2 )--'                    | L7             +----+
 `---'                        |
                            +----+
                            |DC3 |
                            +----+

Link  Capacity  Cost          Cost Map
L1        5       1                  DC1  DC2  DC3
L2        5       2           ER1     5    5    8
L3        8       1           ER2     6    6    9
L4        6       2
L5        5       1
L6        5       1
L7       10       6
Figure 5. Example network illustrating bottlenecks
With the current ALTO cost map structure, the least cost path from
ER1 would be either to DC1 or DC2. However, with the proposed
capacitated cost map, the connection from ER1 to DC3 could be a
better choice than the rest depending on the relative cost of
network resources to data center resources.
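The bottleneck effect can be made concrete with a small sketch that
places several simultaneous demands on the tree of Figure 5 and
reports any link whose capacity is exceeded. The link capacities are
those listed in Figure 5 and the paths are read off the tree; the
function name and the demand values are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Tuple

# Link capacities from Figure 5.
CAPACITY = {"L1": 5, "L2": 5, "L3": 8, "L4": 6, "L5": 5, "L6": 5, "L7": 10}

# The unique path for each end region / data center pair in the tree.
PATHS = {
    ("ER1", "DC1"): ["L1", "L3", "L4", "L5"],
    ("ER1", "DC2"): ["L1", "L3", "L4", "L6"],
    ("ER1", "DC3"): ["L1", "L3", "L7"],
    ("ER2", "DC1"): ["L2", "L3", "L4", "L5"],
    ("ER2", "DC2"): ["L2", "L3", "L4", "L6"],
    ("ER2", "DC3"): ["L2", "L3", "L7"],
}


def overloaded_links(demands: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Return {link: total load} for every link loaded beyond capacity."""
    load = Counter()
    for pair, bandwidth in demands.items():
        for link in PATHS[pair]:
            load[link] += bandwidth
    return {l: load[l] for l in sorted(load) if load[l] > CAPACITY[l]}


# Each demand alone fits (5 units), but together they overload L3 and L4.
both_least_cost = overloaded_links({("ER1", "DC1"): 5, ("ER2", "DC2"): 5})
# Steering ER2 to DC3 with a smaller demand avoids the L4 bottleneck.
steered = overloaded_links({("ER1", "DC1"): 5, ("ER2", "DC3"): 3})
print(both_least_cost, steered)
```

A pairwise cost map reports 5 units for each of these pairs and so
cannot reveal the first conflict, which is why a capacitated graph
view is useful.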
A more general and relatively efficient alternative is to provide
the requestor with a capacitated and multiply weighted graph that
approximates and abstracts the capabilities of the network as seen
by the source and destination location sets.
The creation of an approximate graph model to represent the network
for cross layer optimization purposes is similar to the well-known
topology aggregation problem [14] and [15], but different in a
number of respects. First, the goal is not the approximation of the
network structure for general route computation use, but a view of
only a portion of the network relevant to the participating
locations that approximates the costs and constraints amongst these
locations. Second, the specific technologies underlying the costs
and constraints are of no interest to the application layer and
hence much technology specific layer information that one sees in
GMPLS link state routing databases would be absent in such a graph.
Like the current ALTO filtered cost map, a request for a
capacitated, weighted graph would take source and destination PIDs
as inputs. In JSON notation we could represent the resulting graph
as a JSON object containing link objects. A first cut encoding
could be something like:
object {
LinkEntry [LinkName]<0..*>;
} CostConstraintGraphData;
object {
PIDName: a-end; // Node name at one side of the link
PIDName: z-end; // Node name at the other side of the link
Weight: wt;
JSONNumber: latency;
Capacity: r-cap; // Reservable capacity
} LinkEntry;
Where a link name is formatted like a PIDName (but names a link),
and PID names are used for both provider defined location and
provider defined internal model node identification. A graph
representation of the network of Figure 5 might look like:
   {
     "meta" : {},
     "data" : {
       "graph": {
         "L1": {"a-end":"ER1", "z-end":"N1", "wt":1,"r-cap":5},
         "L2": {"a-end":"ER2", "z-end":"N1", "wt":2,"r-cap":5},
         "L3": {"a-end":"N1", "z-end":"N2", "wt":1,"r-cap":8},
         "L4": {"a-end":"N2", "z-end":"N3", "wt":2,"r-cap":6},
         "L5": {"a-end":"N3", "z-end":"DC1", "wt":1,"r-cap":5},
         "L6": {"a-end":"N3", "z-end":"DC2", "wt":1,"r-cap":5},
         "L7": {"a-end":"N2", "z-end":"DC3", "wt":6,"r-cap":10}
       }
     }
   }
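To illustrate how an application might consume such a graph, the
following is a minimal Python sketch, not part of any ALTO
specification. It parses the example response above, finds the
least-weight path between two PIDs using the "wt" values, and
reports the smallest "r-cap" along that path. Treating links as
bidirectional and the function names used here are assumptions made
only for this illustration.

```python
import heapq
import json

# Example response body, copied from the graph shown above.
RESPONSE = """
{
  "meta": {},
  "data": {
    "graph": {
      "L1": {"a-end": "ER1", "z-end": "N1", "wt": 1, "r-cap": 5},
      "L2": {"a-end": "ER2", "z-end": "N1", "wt": 2, "r-cap": 5},
      "L3": {"a-end": "N1", "z-end": "N2", "wt": 1, "r-cap": 8},
      "L4": {"a-end": "N2", "z-end": "N3", "wt": 2, "r-cap": 6},
      "L5": {"a-end": "N3", "z-end": "DC1", "wt": 1, "r-cap": 5},
      "L6": {"a-end": "N3", "z-end": "DC2", "wt": 1, "r-cap": 5},
      "L7": {"a-end": "N2", "z-end": "DC3", "wt": 6, "r-cap": 10}
    }
  }
}
"""


def build_adjacency(graph):
    """Map each node to (neighbor, link name) pairs.

    Assumption: every link can be used in both directions.
    """
    adj = {}
    for name, link in graph.items():
        a, z = link["a-end"], link["z-end"]
        adj.setdefault(a, []).append((z, name))
        adj.setdefault(z, []).append((a, name))
    return adj


def least_weight_path(graph, src, dst):
    """Dijkstra over "wt".

    Returns (total weight, bottleneck r-cap, link names), or None if
    dst cannot be reached from src.
    """
    adj = build_adjacency(graph)
    best = {src: 0}
    heap = [(0, src, [])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            cap = min(graph[l]["r-cap"] for l in path) if path else None
            return cost, cap, path
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, link in adj.get(node, []):
            new_cost = cost + graph[link]["wt"]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [link]))
    return None


graph = json.loads(RESPONSE)["data"]["graph"]
print(least_weight_path(graph, "ER1", "DC1"))
```

For the example graph, the path from ER1 to DC1 uses links L1, L3,
L4, and L5, with a total weight of 5 and a bottleneck reservable
capacity of 5.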
4.3. Network Updates and Notifications
Changing conditions in the network such as costs or capacity may
need to be relayed to the application layer in suitable form and in
a time frame relative to their importance to service QoS, service
delivery, or cross layer optimization.
Network fault conditions can affect service QoS in a number of ways.
The most obvious is a significant reduction in the capacity
available to current application flows. In such a case the
application would want to be notified as soon as possible so that it
can take remedial action. In other cases a network fault may only be
observable as an increase in latency (due to the increased length of
a recovered optical path). Such an increase may not immediately
result in a breach of a service level agreement (SLA), but could do
so cumulatively over time. Hence notification of such a change in
condition would need to be timely, and the network may qualify it by
indicating whether the change of state is relatively permanent or
what its duration may be.
Some applications, such as those involving bulk file transfer, may
have flexible time windows, with the exact time the service is
rendered dictated by network availability. In particular, the
network takes advantage of application flexibility in the exact
scheduling of the network resources to be used. Such occurrences may
be non-recurring, e.g., a one-off bulk file transfer, or recurring,
as would be common in cloud based system backup and restore
applications. In this case the notification from the network needs
to be relatively timely (but most likely on the order of seconds
rather than milliseconds), is specific to a particular network
service instance rather than to raw network cost or capacity, and
the entire notification process may require a non-repudiation
security assurance.
Changes in the network that only affect costs but not QoS can affect
the cross layer optimization of an existing application. The time
frame for such notifications would typically be in terms of
fractions of an hour to days.
4.3.1. Notification Interface
With the exception of the "notification of network service instance
availability", all notifications can be made via modifications or
updates to suitably extended network maps, cost maps, or graphs.
Since the high bandwidth use cases deal with a rather restricted
user group, a number of implementation mechanisms may be possible
that may not be viable in a more general ALTO deployment. For
example, with a capacitated graph representation we may selectively
update specific links of the graph for particular application
entities. Note that in order to do this the network layer would need
to keep track of the graph models in use by specific application
entities and update them as appropriate.
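The bookkeeping implied here can be sketched as follows. This is a
minimal illustration only: the class name, the method names, and the
idea of returning per-client deltas are assumptions of this sketch
and are not defined by ALTO or by this draft.

```python
import copy


class GraphViewRegistry:
    """Hypothetical network-side record of the graph model that was
    handed to each application entity."""

    def __init__(self):
        # client id -> {link name -> link attributes}
        self._views = {}

    def register(self, client_id, graph):
        """Remember the graph returned to this client."""
        self._views[client_id] = copy.deepcopy(graph)

    def apply_change(self, link_name, changes):
        """Apply attribute changes to one link in every stored view
        that contains it. Returns {client id: {link name: changed
        attributes}} for the clients that need to be notified."""
        deltas = {}
        for client_id, view in self._views.items():
            link = view.get(link_name)
            if link is None:
                continue  # this client's model does not include the link
            changed = {k: v for k, v in changes.items()
                       if link.get(k) != v}
            if changed:
                link.update(changed)
                deltas[client_id] = {link_name: changed}
        return deltas


registry = GraphViewRegistry()
registry.register("app-A", {
    "L3": {"a-end": "N1", "z-end": "N2", "wt": 1, "r-cap": 8},
    "L7": {"a-end": "N2", "z-end": "DC3", "wt": 6, "r-cap": 10},
})
registry.register("app-B", {
    "L7": {"a-end": "N2", "z-end": "DC3", "wt": 6, "r-cap": 10},
})

# A fault halves the reservable capacity of L3; only app-A sees L3.
deltas = registry.apply_change("L3", {"r-cap": 4})
print(deltas)
```

Only the application entities whose graph model contains the
affected link receive an update, and the update carries only the
attributes that changed.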
4.4. Application-Network Reservation Interface
The network query interfaces previously discussed allow the
application layer to find out about the options, costs, and
capabilities available from the network layer in a suitably
high-level but actionable format. However, it remains to specify an
interface for the application layer to communicate its usage intent
to the network layer and possibly make firm commitments for scarce
network resources. Before delving into this interface we first look
a bit at what happens behind the scenes in high bandwidth networks.
4.4.1. IP Bypass/Traffic Engineering
There are various ways to alter the path that IP flows take through
a network. Two IETF standard ways are via DiffServ [16] and MPLS-TE
[17]. Both mechanisms start with IP packet classification, but in
MPLS-TE a packet belonging to a flow matching an MPLS forwarding
equivalence class (FEC) will be "pulled" from normal IP packet
forwarding and placed in an MPLS tunnel, known as a label switched
path. It will then be forwarded via MPLS mechanisms, bypassing the
IP layer, until it "pops" out of its MPLS tunnel and rejoins the IP
forwarding world (hopefully much closer to its intended destination
and having made better use of network resources along the way).
In the SONET, SDH, G.709, and WDM world a similar process can take
place, but is known by the term "grooming" [11],[12]. In both cases
network resources including bandwidth, equipment, and power can be
significantly optimized by essentially setting up "express lanes" at
a lower layer in the network's protocol stack. Note that with
optical transport networks there can be many layers below "layer 2",
i.e., one can think of the "physical" layer as possibly consisting
of a number of different sub-layers.
If the application layer, by knowing its usage patterns or required
network usage, can let the network know its needs, then IP/optical
bypass can more readily be performed on a dynamic basis,
particularly if the network has a GMPLS infrastructure. The
application layer should not need to know the specifics of how the
IP bypass occurs, e.g., via MPLS, OpenFlow, or other standard or
proprietary techniques.
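The classification step behind MPLS-TE style bypass, as described
earlier in this section, can be sketched in a few lines of Python.
This is a deliberately simplified illustration: it assumes a FEC is
just a destination prefix (real FECs may match on other fields), and
the table contents and LSP names are hypothetical.

```python
import ipaddress

# Hypothetical FEC table: destination prefix -> label switched path.
FEC_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), "LSP-to-DC1"),
    (ipaddress.ip_network("198.51.100.0/25"), "LSP-to-DC2"),
]


def classify(dst_addr):
    """Return the LSP of the longest matching FEC prefix, or None if
    the packet stays in normal IP forwarding."""
    addr = ipaddress.ip_address(dst_addr)
    matches = [(net.prefixlen, lsp)
               for net, lsp in FEC_TABLE if addr in net]
    if not matches:
        return None
    return max(matches)[1]


print(classify("203.0.113.7"))     # matches the first FEC
print(classify("198.51.100.200"))  # outside the /25, stays in IP
```

A packet that matches a FEC is placed in the corresponding tunnel;
everything else continues to be forwarded hop by hop at the IP
layer.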
4.4.2. High Bandwidth Reservation/Recovery Interface
As previously stated, the application layer should not be exposed to
the details of the networking mechanisms that will provide the
bandwidth and QoS guarantees. Hence the application layer would
specify its demands in terms of IP flows, as is done when specifying
an MPLS FEC. It is for further study whether some IaaS applications
may want to deal with layer 2 (Ethernet) flows rather than IP. In
either case the basic principles would be the same. Note that a
bandwidth reservation interface such as this could also be used when
the application layer is seeking network help in dealing with
disaster recovery and business continuity.
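As a purely hypothetical illustration of what a flow-based demand
might look like, the Python sketch below builds and serializes a
reservation request. No such message is defined by ALTO or by this
draft; every field name here (flow, bandwidth, src_pid, dst_pid,
recovery) is an assumption, and the bandwidth is taken to be in the
same unspecified units as "r-cap" in the graph encoding.

```python
import ipaddress
import json
from dataclasses import asdict, dataclass


@dataclass
class FlowSpec:
    """IP flow description, similar in spirit to an MPLS FEC."""
    src_prefix: str
    dst_prefix: str


@dataclass
class ReservationRequest:
    """Hypothetical request; the application names endpoints and
    demand only, never paths or network layers."""
    flow: FlowSpec
    src_pid: str
    dst_pid: str
    bandwidth: float          # same units as "r-cap" (assumption)
    recovery: bool = False    # request made for disaster recovery


def to_json(req):
    """Validate the prefixes and serialize the request."""
    # ip_network raises ValueError on a malformed prefix.
    ipaddress.ip_network(req.flow.src_prefix)
    ipaddress.ip_network(req.flow.dst_prefix)
    if req.bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return json.dumps(asdict(req), sort_keys=True)


request = ReservationRequest(
    flow=FlowSpec("192.0.2.0/24", "203.0.113.0/24"),
    src_pid="ER1",
    dst_pid="DC1",
    bandwidth=4,
)
encoded = to_json(request)
print(encoded)
```

Note that the request reuses the PID names from the network map and
graph, which is one example of the coupling between the query and
reservation interfaces.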
A number of current protocols come close to the features desired of
such an interface, but none are completely appropriate. A short
summary follows:
(a) PCE: The PCE interface takes requests for connections with
various optimization conditions supported. PCEs, however, return the
computed paths to the requester, which is undesirable in our
reservation interface. Note that PCE is built directly on TCP.
(b) UNIs (GMPLS and OIF): UNIs provide RSVP-TE based signaling
interfaces for connection requests at a particular layer. Such
interfaces expect the requester to know something about the network
layers being utilized. Typically, if these are used, they are used
between access and core network equipment.
(c) Cloud IaaS interfaces for reserving instances: These are
typically RESTful or XML-RPC type interfaces. With these interfaces
compute, storage and other IaaS related resources are requested
(setup/teardown).
We note that such an interface is currently out of the scope of ALTO
and of any current IETF working group. One reason to
consider this within ALTO is the tight coupling between the network
information (PIDs, network map, cost map, capacitated graph) and
requests that would be made by the application layer. In the high
bandwidth case both query and reservation have similar
security/privacy requirements.
5. Conclusion
In this draft we have discussed two generic use cases that motivate
the usefulness of general interfaces for cross stratum optimization
in the network core. In our first use case network resource usage
became significant due to the aggregation of many individually
unique client demands. In the second use case, where data centers
were communicating with each other, bandwidth usage was already
significant enough to warrant the use of private line/LAN type
network services.
Both use cases result in optimization problems that trade off
computational versus network costs and constraints. Both featured
scenarios where advance reservation, on-demand, and recovery type
service interfaces could prove beneficial. In the later sections of
this document we showed how ALTO concepts [1] and the ALTO protocol
could be used and extended to support joint application-network
optimization for applications that consume large amounts of network
bandwidth.
6. Security Considerations
TBD
7. IANA Considerations
This informational document does not make any requests for IANA
action.
8. References
8.1. Informative References
[1] "draft-ietf-alto-reqs-09." [Online]. Available:
http://datatracker.ietf.org/doc/draft-ietf-alto-reqs/. [Accessed:
17-May-2011].
[2] J. Medved, N. Bitar, S. Previdi, B. Niven-Jenkins, and G. Watson,
"Use Cases for ALTO within CDNs." [Online]. Available:
http://tools.ietf.org/html/draft-jenkins-alto-cdn-use-cases-02.
[Accessed: 06-Mar-2012].
[3] E. Mannie, Ed., "Generalized Multi-Protocol Label Switching (GMPLS)
Architecture, RFC 3945." Oct-2004.
[4] Y. Lee, G. Bernstein, and W. Imajuku, Eds., "Framework for GMPLS
and PCE Control of Wavelength Switched Optical Networks (WSON), RFC
6163." Apr-2011.
[5] A. Farrel, J. P. Vasseur, and J. Ash, "A Path Computation Element
(PCE)-Based Architecture, RFC 4655." Aug-2006.
[6] G. Swallow, J. Drake, H. Ishimatsu, and Y. Rekhter, "Generalized
Multiprotocol Label Switching (GMPLS) User-Network Interface (UNI):
Resource ReserVation Protocol-Traffic Engineering(RSVP-TE) Support
for the Overlay Model, RFC 4208," Oct-2005.
[7] Y. R. Yang, R. Alimi, and R. Penno, "ALTO Protocol." [Online].
Available: http://tools.ietf.org/html/draft-ietf-alto-protocol-10.
[Accessed: 05-Mar-2012].
[8] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A.
Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M.
Zaharia, "A view of cloud computing," Commun. ACM, vol. 53, pp. 50-
58, Apr. 2010.
[9] K. A. Hua and S. Sheu, "Skyscraper broadcasting: a new broadcasting
scheme for metropolitan video-on-demand systems," in Proceedings of
the ACM SIGCOMM '97 conference on Applications, technologies,
architectures, and protocols for computer communication, Cannes,
France, 1997, pp. 89-100.
[10] "Adobe Flash Media Server 4.0 * Building peer-assisted networking
applications." [Online]. Available:
http://help.adobe.com/en_US/flashmediaserver/devguide/WSa4cb07693d12
3884520b86f312a354ba36d-8000.html. [Accessed: 13-May-2011].
[11] Rudra Dutta and George N. Rouskas, "Traffic grooming in WDM
networks: Past and future," IEEE Network, vol. 16, no. 6, pp. 46 -
56, 2002.
[12] Keyao Zhu and B. Mukherjee, "Traffic grooming in an optical WDM
mesh network," Selected Areas in Communications, IEEE Journal on,
vol. 20, no. 1, pp. 122-133, 2002.
[13] G. Bernstein, B. Rajagopalan, and D. Saha, Optical Network
Control: Architecture, Protocols, and Standards. Addison-Wesley
Professional, 2003.
[14] B. Awerbuch and Y. Shavitt, "Topology aggregation for directed
graphs," Networking, IEEE/ACM Transactions on, vol. 9, no. 1, pp.
82-90, 2001.
[15] S. Uludag, K.-S. Lui, K. Nahrstedt, and G. Brewster, "Analysis of
Topology Aggregation techniques for QoS routing," ACM Comput. Surv.,
vol. 39, Sep. 2007.
[16] K. Nichols, D. L. Black, S. Blake, and F. Baker, "Definition of
the Differentiated Services Field (DS Field) in the IPv4 and IPv6
Headers." RFC2474. Available: http://tools.ietf.org/html/rfc2474.
[17] D. O. Awduche and J. Agogbua, "Requirements for Traffic
Engineering Over MPLS." RFC2702. Available:
http://tools.ietf.org/html/rfc2702.
Authors' Addresses
Greg M. Bernstein
Grotto Networking
Fremont California, USA
Phone: (510) 573-2237
Email: gregb@grotto-networking.com
Young Lee
Huawei Technologies
5340 Legacy Drive, Building 3
Plano, TX 75024
USA
Phone: (469) 277-5838
Email: leeyoung@huawei.com
Intellectual Property Statement
The IETF Trust takes no position regarding the validity or scope of
any Intellectual Property Rights or other rights that might be
claimed to pertain to the implementation or use of the technology
described in any IETF Document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.
Copies of Intellectual Property disclosures made to the IETF
Secretariat and any assurances of licenses to be made available, or
the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line
IPR repository at http://www.ietf.org/ipr
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
any standard or specification contained in an IETF Document. Please
address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity
All IETF Documents and the information contained therein are
provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.