Network Working Group                                    Greg Bernstein
Internet Draft                                        Grotto Networking
Intended status: Informational                                Young Lee

                                                          July 16, 2012

      Use Cases for High Bandwidth Query and Control of Core Networks


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on January 16, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents

Bernstein & Lee, et al. Expires January 16, 2013               [Page 1]

Internet-Draft   Cross Stratum Optimization Use-cases         July 2012

   carefully, as they describe your rights and restrictions with
   respect to this document.


Abstract

   This draft describes two generic use-cases that illustrate
   application layer traffic optimization applied to high bandwidth
   core networks.  The type of information and interactions needed to
   perform various optimizations is described. In addition, extensions
   to the existing ALTO protocol widely applicable to any high
   bandwidth applications are suggested.  These include bandwidth
   constraint representations for a diverse range of control and data
   plane technologies as well as advanced filtering based on
   constraints.

Table of Contents

   1. Introduction...................................................3
      1.1. Computing Clouds, Data Centers, and End Systems...........4
   2. End System Aggregate Networking................................5
      2.1. Aggregated Bandwidth Scaling..............................5
      2.2. Cross Stratum Optimization Example........................6
      2.3. Data Center and Network Faults and Recovery...............7
   3. Data Center to Data Center Networking..........................8
      3.1. Cross Stratum Optimization Examples.......................9
      3.2. Network and Data Center Faults and Reliability............9
   4. Cross Stratum Control Interfaces..............................10
   5. Potential ALTO Protocol Extensions............................11
   6. Bandwidth Constraint Information..............................12
      6.1. Introduction.............................................12
         6.1.1. Example Network: Provider's View....................13
      6.2. Data and Control Plane Path Choices......................14
      6.3. ALTO Extensions..........................................15
         6.3.1. Mutually Constrained Paths..........................15
            Simple IP Network Example...............................16
            TDM Network Example.....................................16
            JSON Encoding...........................................18
         6.3.2. Cost-Capacity Graphs................................18
            Simple TDM Example with Graph Reduction.................19
            Ethernet MSTP Example with Multiple Graphs..............20
            JSON Encoding...........................................23
   7. Constraint Based Filtering....................................24
   8. Conclusion....................................................24
   9. Security Considerations.......................................24
   10. IANA Considerations..........................................25
   11. References...................................................25
      11.1. Informative References..................................25

   Author's Addresses...............................................27
   Intellectual Property Statement..................................27
   Disclaimer of Validity...........................................27

1. Introduction

   Cloud Computing, network applications, Software as a Service (SaaS),
   Platform as a Service (PaaS), and Infrastructure as a Service
   (IaaS) are just a few of the terms used to describe situations
   where multiple computation entities interact with one another across
   a network.  When the communication resources consumed by these
   interacting entities are significant compared with link or network
   capacity, opportunities may exist for more efficient utilization
   of available computation and network resources if both computation
   and network stratums cooperate in some way. The application layer
   traffic optimization (ALTO) working group is tackling the similar
   problem of "better-than-random peer selection" for distributed
   applications based on peer-to-peer (P2P) or client-server
   architectures [1]. In addition, such optimization is important in
   content distribution networks (CDNs) as illustrated in [2].

   In the network stratum, particularly at the lower layers such as
   MPLS and optical, there are many restoration and recovery mechanisms
   to deal with network faults. The emergence of network based
   applications or cloud based disaster recovery/business recovery
   brings a new dimension to fault management, but also opportunities
   to more efficiently deliver higher levels of reliability. For
   example, the reliability requirements for mission critical
   applications are typically quantified by two key time parameters.
   The first is the Recovery Time Objective (RTO) which is the time to
   get the application back up and functioning and is similar to
   network recovery time notions. The second is the Recovery Point
   Objective (RPO) which quantifies in terms of time the amount of data
   loss that can be tolerated when a disaster occurs. Different
   applications and organizations can have greatly different demands,
   from milliseconds to 12 hours. In addition, the amount of data that
   may need to be transferred to meet these objectives can vary greatly
   amongst different application types. With recovery point objectives
   of, say, an hour or more, a dynamic optical network layer could be
   very efficiently shared so as to reduce the overall cost to achieve
   a given level of reliability. However, doing so requires cooperation
   between the application and network stratums.

   Generalized Multi-Protocol Label Switching (GMPLS) [3] can be, and
   is being, applied to various core networking technologies such as
   SONET/SDH and wavelength division multiplexing (WDM) [4]. GMPLS
   provides dynamic network topology and resource information, and
   the capability to dynamically allocate resources (provision label
   switched paths). Furthermore, the path computation element (PCE) [5]
   provides for traffic engineered path optimization.

   However, neither GMPLS nor PCE provides interfaces that are
   appropriate for an application layer entity to use, for the
   following reasons:

     . GMPLS routing exposes full network topology information which
        tends to be proprietary to a carrier or require specialized
        knowledge and techniques to make use of, e.g., the routing and
        wavelength assignment (RWA) problem in WDM networks [4].

     . Core networks typically consist of two or more layers, while
        applications typically only know about the IP layer and
        above. Hence applications would not be able to make direct use
        of PCE capabilities.

     . GMPLS signaling interfaces are defined for either peer GMPLS
        nodes or via a user network interface (UNI) [6]. Neither of
        these is appropriate for direct use by an application entity.

   In this draft we discuss two general use-cases that can generate
   core network flows that have significant bandwidth and may vary
   significantly over time. The "cross stratum optimization" problems
   generated by these use cases are discussed. Finally, we look at
   interfaces between the application and network "stratums" that can
   enable these types of optimizations and how they can be created via
   extensions to the current ALTO protocol [7].

1.1. Computing Clouds, Data Centers, and End Systems

   While the definition of cloud computing or compute clouds is
   somewhat nebulous (or "foggy" if you will) [8], the physical
   instantiation of compute resources with network connectivity is very
   real and bounded by physical and logical constraints. For the
   purposes of this draft, we will call any network connected compute
   resource a data center if its network connectivity is significant
   compared either to the bandwidth of an individual WDM wavelength or
   with respect to the network links in which it is located. Hence we
   include in our definition very large data centers that feature
   multiple fiber access and consume more than 10MW of power, moderate
   to large content distribution network (CDN) installations located in
   or near major internet exchange points, medium sized business
   centers, etc.

   We will refer to those computational entities that don't meet our
   bandwidth criteria for a data center as "end systems".

2. End System Aggregate Networking

   In this section we consider the fundamental use case of end systems
   communicating with data centers as shown in Figure 1. In this figure
   the "clients" are end systems with relatively small access bandwidth
   compared to a WDM wavelength, e.g., under 100Mbps. We show these
   clients roughly partitioned into three network related end user
   regions ("A", "B", and "C"). Given a particular network application,
   in a static network application situation, each client in a region
   would be associated with a particular data center.

                                           Region B
                             +---------+  +------+
                             |  Data   |  |Client|
                             |Center 2 |  |  B1  |+------+
             +------+        +----+----+  +--+---+|Client|
             |Client|             |         /     |  B2  |
             |  A1  `.         _.-+--------+-.    +--+---+
   Region A  +------+ `-.  ,-''               `--.  /   ...
        +------+        ,`:                       `+.     +------+
        |Client|       /                             \    |Client|
        |  A2  +------+                               \---+  BM  |
        +------+     (             Network             )  +------+
         ...        .-'                               /
     +------+   _.-'   \                             `.
     |Client|.-'        `=.                       ,-'  `.
     |  AN  |       _.-''  `--.               _.-\   +---`.----+
     +------+ +----'----+      `----+------+''    \  |  Data   |
              |  Data   |           |       \      | |Center 3 |
              |Center 1 |        +--+---+ +--+---+ \ +---------+
              +---------+        |Client| |Client|  \------+
                                 |  C1  | |  C2  |  |Client|
                                 +------+ +------+  |  CK  |
                                       Region C     +------+

            Figure 1. End system to data center communications.

2.1. Aggregated Bandwidth Scaling

    One of the simplest examples where the aggregation of end system
   bandwidth can quickly become significant to the "network" is for
   video on demand (VoD) streaming services. Unlike a live streaming
   service where IP or lower layer multicast techniques can be
   generally applied, in VoD the transmissions are unique between the
   data center and clients. For regular quality VoD we'll use an
   estimate of 1.5Mbps per stream (assuming H.264 coding), for HD VoD
   we'll use an estimate of 10Mbps per stream. Filling a 10Gbps
   optical wavelength therefore requires roughly 6,666 or 1,000
   clients for regular or high definition, respectively.  Note that
   special multicasting techniques such as those discussed in [9] and
   peer assistance techniques such as provided in some commercial
   systems [10] can reduce the overall network bandwidth requirements.
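
   The client counts above follow from integer division of the
   wavelength capacity by the per-stream rate. The short Python sketch
   below reproduces that arithmetic. The stream rates and the 10Gbps
   capacity come from the text; the function names are illustrative
   only and are not part of any protocol.

```python
# Rates and capacity are taken from the text; names are illustrative.
WAVELENGTH_BPS = 10_000_000_000       # one 10Gbps optical wavelength
STREAM_RATES_BPS = {
    "regular": 1_500_000,             # 1.5Mbps per stream (H.264)
    "hd": 10_000_000,                 # 10Mbps per stream
}


def streams_per_wavelength(stream_bps, capacity_bps=WAVELENGTH_BPS):
    """Number of whole unicast streams that fit in the given capacity."""
    return capacity_bps // stream_bps


def wavelengths_needed(clients, stream_bps, capacity_bps=WAVELENGTH_BPS):
    """Wavelengths needed to carry `clients` simultaneous unicast streams."""
    total_bps = clients * stream_bps
    return -(-total_bps // capacity_bps)   # ceiling division on integers


counts = {name: streams_per_wavelength(rate)
          for name, rate in STREAM_RATES_BPS.items()}
print(counts)
```

   The sketch assumes pure unicast delivery; the multicast and peer
   assistance techniques mentioned above would lower these figures.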

    With current high speed internet deployment such numbers of clients
   are easily achieved; in addition demand for VoD services can vary
   significantly over time, e.g., new video releases, inclement weather
   (increases number of viewers), etc.

2.2. Cross Stratum Optimization Example

   In an ideal world both data centers and networks would have
   unlimited capacity; in practice, however, both can have constraints
   and possibly marginal costs that vary with load or time of day.
   For example, suppose that in Figure 1 Data Center 3 has been
   primarily serving VoD to region "C" but that it has, at a
   particular period in time, run out of computation capacity to serve
   all the client requests coming from region "C". At this point we
   have a fundamental cross stratum optimization (CSO) problem. We want
   to see if we can accommodate additional client requests from region
   "C" by using a different data center than the fully utilized data
   center #3. To answer this question we need to know (a) the available
   capacity on other data centers to meet a request, (b) the marginal
   (incremental) cost of servicing the request on a particular data
   center with spare capacity, (c) the ability of the network to
   provide bandwidth between region "C" and a data center, and (d) the
   incremental cost of bandwidth from region "C" to a data center.
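
   One way to picture how items (a) through (d) combine is the small
   Python sketch below. It is only an illustration under simplifying
   assumptions: each candidate data center must absorb the whole
   request, costs are linear, and all of the numbers, class names, and
   function names are hypothetical rather than taken from any
   deployment or protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    spare_compute: int       # (a) additional streams the data center can serve
    compute_cost: float      # (b) marginal cost per stream at the data center
    spare_bandwidth: float   # (c) Mbps available from the region to the center
    bandwidth_cost: float    # (d) incremental cost per Mbps on that path


def choose_data_center(candidates: List[Candidate], streams: int,
                       mbps_per_stream: float) -> Optional[Candidate]:
    """Return the cheapest candidate able to take the whole request, or None."""
    demand_mbps = streams * mbps_per_stream
    feasible = [c for c in candidates
                if c.spare_compute >= streams
                and c.spare_bandwidth >= demand_mbps]
    if not feasible:
        return None
    return min(feasible,
               key=lambda c: streams * c.compute_cost
               + demand_mbps * c.bandwidth_cost)


# Made-up values for a request of 300 HD streams from region "C".
candidates = [
    Candidate("DC1", 500, 1.0, 2000.0, 0.02),   # too little bandwidth
    Candidate("DC2", 800, 1.2, 5000.0, 0.01),
    Candidate("DC3", 0, 0.8, 9000.0, 0.005),    # no spare compute
]
best = choose_data_center(candidates, streams=300, mbps_per_stream=10.0)
print(best.name if best else "no feasible data center")
```

   Note that neither stratum could make this choice alone: the
   compute fields come from the application stratum and the bandwidth
   fields from the network stratum.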

                                             Region B
                               +---------+  +------+
                               |  Data   |  |Client|
                               |Center 2 |  |  B1  |+------+
               +------+        +----+----+  +--+---+|Client|
               |Client|             |         /     |  B2  |
               |  A1  `.         _.-+--------+-.    +--+---+
     Region A  +------+ `-.  ,-'' XXXXX     XX  `--.  /   ...
          +------+        ,`:       ``---..__ XXXX  `+.     +------+
          |Client|       /  X        |       ```--XX   \    |Client|
          |  A2  +------+..X`.       \              XX--+---+  BM  |
          +------+     (  X   `-/     \                  )  +------+
           ...        .-'     .'       |        +----.X /
       +------+   _.-'   \  X/         \        |    X `.
       |Client|.-'        `=.X          \      XXXX ,-'  `.
       |  AN  |       _.-''  `--.    XXXXXXXXX  _.-\   +---`.----+
       +------+ +----'----+      `----+------+''    \  |  Data   |
                |  Data   |           |       \      | |Center 3 |
                |Center 1 |        +--+---+ +--+---+ \ +---------+
                +---------+        |Client| |Client|  \------+
                                   |  C1  | |  C2  |  |Client|
                                   +------+ +------+  |  CK  |
                                         Region C     +------+

     Figure 2. Aggregated flows between end systems and data centers.

   In Figure 2 we show a possible result of solving the previously
   mentioned CSO problem. Here we show the additional client requests
   from region "C" being serviced by data center #2 across the network.
   Figure 2 also illustrates the possibility of setting up "express"
   routes across the network at the MPLS level or below. Such
   techniques, known as "optical grooming" or "optical bypass" [11],
   [12] at the optical layer, can result in significant equipment and
   power savings for the network by "bypassing" higher level routers
   and switches.

2.3. Data Center and Network Faults and Recovery

    Data center failures, whether partial or complete, can have a major
   impact on revenues in the VoD example previously described. If there
   is excess capacity in other data centers within the network
   associated with the same application then clients could be
   redirected to those other centers if the network has the capacity.
   Moreover, MPLS and GMPLS controlled networks have the ability to
   reroute traffic very quickly while preserving QoS. As with general
   network recovery techniques [13], various combinations of
   pre-planning and "on the fly" approaches can be used to trade off
   recovery time against the excess network capacity needed for
   recovery.

   In the case of network failures there is the potential for clients
   to be redirected to other data centers to avoid failed or
   overutilized links.

3. Data Center to Data Center Networking

    There are a number of motivations for data center to data center
   communications: on demand capacity expansion ("cloud bursting"),
   cooperative exchanges between business partners, offsite data
   backup, "rent before building", etc. In Figure 3 we show an
   example where a number of businesses, each with an "internal data
   center", contract with a large external data center for additional
   computational (which may include storage) capacity. The data centers
   may connect to each other via IP transit type services or more
   typically via some type of Ethernet virtual private line or LAN
   service.

                         +-------------------+
                         |                   |
                         | Large Data Center |
                         |                   |
                         +-------------------+
                             ,--''               `---.
                          ,-'                         `-.
                        ,'                               `.
                      ,'                                   `.
     +--------+      ;                Network                :
     |Business|  __..+                                       |
     | #1 DC  +-'    :                                       ;
     +--------+       `.                                   ,'
                        `.                               ;:
                          `-.                         ,-'  \
                             `---.               _.--'   +--`.----+
                                  `+-----------''        |Business|
                                   /                     | #N DC  |
                                  |                      +--------+
                             +----+---+
                             |Business|
                             | #2 DC  |
                             +--------+

          Figure 3. Basic data center to data center networking.

3.1. Cross Stratum Optimization Examples

    In the DC-to-DC example of Figure 3 we can have computational
   constraints/limits at both local and remote data centers; fixed and
   marginal computational costs at local and remote data centers; and
   network bandwidth costs and constraints between data centers. Note
   that computing costs could vary by the time of day along with the
   cost of power and demand. Some cloud providers have quite
   sophisticated compute pricing models including: reserved, on demand,
   and spot (auction) variants.

   In addition to possibly dynamically changing pricing, traffic
   loads between data centers can be quite dynamic. Moreover, data
   movement between data centers is another source of large network
   usage variation. Such peaks can be due to scheduled daily or weekly
   offsite data backup, bulk VM migration to a new data center,
   periodic virtual machine migration, etc.

3.2. Network and Data Center Faults and Reliability

   For networked applications that require high levels of
   reliability/availability the network diagram of Figure 3 could be
   enhanced with redundant business locations and external data centers
   as shown in Figure 4. For example, cell phone subscriber databases
   and financial transactions generally require what is called
   geographic database replication, which results in extra
   communication between sites supporting high availability. For
   example, if business #1 in Figure 4 required a highly available
   database related service then there would be additional
   communication flows from data center "1a" to data center "1b".
   Furthermore, if business #1 has outsourced some of its computation
   and storage needs to independent data center X then for resilience
   it may want/need to replicate (hot-hot redundancy) this information
   at independent data center Y.

              +-------------+              +-------------+
              |Independent  |              |Independent  |
              |Data Center X|              |Data Center Y|
              +-----+-------+              +------+------+
                     \                           /
                      `.     _.------------.   .'
                        \--''               `-+-.
                     ,-'                         `-.       +--------+
                   ,'                               `.    .'Business|
                 ,'                                   `.-' |#N DC-a |
                ;                Network                :  +--------+
    +--------+  |                                       |
    |Business+---                                       ;
    |#1 DC-a |   `.                                   +:
    +--------+     `.                               ;/  \
                     `-.                         ,-'     `.
                      .'`---.               _.--'       +--`.----+
        +--------+   /       `+-+---------\'            |Business|
        |Business| .'           |          \            |#N DC-b |
        |#1 DC-b .'             /           \           +--------+
        +--------+             |             \
                          +----+---+    +--------+
                          |Business|    |Business|
                          |#2 DC-a |    |#2 DC-b |
                          +--------+    +--------+

     Figure 4. Data center to data center networking with redundancy.

4. Cross Stratum Control Interfaces

    Two types of load balancing techniques are currently utilized in
   cloud computing. The first is load balancing within a data center
   and is sometimes referred to as local load balancing. Here one is
   concerned with distributing requests to appropriate machines (or
   virtual machines) in a pool based on the current machine
   utilization. The second type of load balancing is known as global
   load balancing and is used to assign clients to a particular data
   center out of a choice of more than one within the network and is
   our concern here.  A number of commercial vendors offer both local
   and global load balancing products.  Currently global load balancing
   systems have very little knowledge of the underlying network. To
   make better assignments of clients to data centers many of these
   systems use geographic information based on IP addresses. Hence we
   see that current systems are attempting to perform cross stratum
   optimization albeit with very coarse network information. A more
   complete interface for CSO in the client aggregation case that is
   also applicable in the "data center to data center" case would be:

       1. A Network Query Interface - Where the global load balancer
          can inquire as to the bandwidth availability between "client
          regions" and data centers.

       2. A Network Resource Reservation Interface - Where the global
          load balancer can make explicit requests for bandwidth
          between client regions and data centers.

       3. A Fault Recovery Interface - For the global load balancer to
          make requests for expedited bulk rerouting of client traffic
          from one data center to another. Or for the network layer to
          make requests to the application to help deal with network
          faults.

    The network query interface can be considered a superset of the
   functionality supported by the current ALTO protocol [7]. Potential
   extensions to ALTO for this purpose are given in the next section.
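
   To make the division of labor among the three interfaces concrete,
   the following Python sketch models them as one abstract class with
   a toy in-memory implementation. This is purely a hypothetical
   shape: the class and method names are assumptions made for
   illustration, they are not defined by ALTO or by this draft, and
   the network-to-application direction of the fault recovery
   interface is omitted.

```python
from abc import ABC, abstractmethod


class CrossStratumInterface(ABC):
    """Hypothetical shape of the three interfaces listed above."""

    @abstractmethod
    def query_bandwidth(self, region, data_center):
        """1. Network query: available Mbps from a region to a data center."""

    @abstractmethod
    def reserve_bandwidth(self, region, data_center, mbps):
        """2. Resource reservation: request bandwidth; True if granted."""

    @abstractmethod
    def reroute(self, region, from_dc, to_dc):
        """3. Fault recovery: move a region's reserved traffic to another DC."""


class InMemoryNetwork(CrossStratumInterface):
    """Toy stand-in tracking spare bandwidth per (region, data center)."""

    def __init__(self, spare):
        self.spare = dict(spare)
        self.reserved = {}            # (region, data center) -> Mbps held

    def query_bandwidth(self, region, data_center):
        return self.spare.get((region, data_center), 0.0)

    def reserve_bandwidth(self, region, data_center, mbps):
        key = (region, data_center)
        if self.spare.get(key, 0.0) < mbps:
            return False
        self.spare[key] = self.spare.get(key, 0.0) - mbps
        self.reserved[key] = self.reserved.get(key, 0.0) + mbps
        return True

    def reroute(self, region, from_dc, to_dc):
        old = (region, from_dc)
        held = self.reserved.get(old, 0.0)
        if not self.reserve_bandwidth(region, to_dc, held):
            return False
        # Release the bandwidth that was held toward the old data center.
        self.spare[old] = self.spare.get(old, 0.0) + held
        self.reserved.pop(old, None)
        return True


net = InMemoryNetwork({("C", "DC3"): 1000.0, ("C", "DC2"): 4000.0})
```

   A global load balancer would call query_bandwidth() before
   assigning clients, reserve_bandwidth() to commit, and reroute()
   when a data center or path fails.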

5. Potential ALTO Protocol Extensions

   This section discusses the applicability of the ALTO protocol and
   necessary extensions to support a network query interface suitable
   for high bandwidth consuming applications. Before doing so we
   discuss general properties of the high bandwidth scenarios that may
   differ significantly from other uses of the ALTO protocol.

   The first has to do with scope and scale. The consumer of high
   bandwidth ALTO extensions is typically some type of application
   controller within a data center, as opposed to an individual end
   user. The number of such entities with a need for the high bandwidth
   related information is orders of magnitude smaller than, say, peer
   to peer networking users, or applications closer to the end user.
   Since a network provider may consider this information sensitive,
   there may be a desire to limit its distribution to a "pre-
   registered" set of entities. Hence these extensions would be
   applicable to controlled or partially controlled environments.

   Secondly, there is the notion of time scales. In cloud services we
   already see variants such as "on demand" compute instances and
   "reserved" compute instances. For network resource queries we may be
   concerned with (a) current bandwidth availability, (b) bandwidth
   availability at a future time, or (c) bandwidth for a bulk data
   transfer of a given amount that must take place within a given time
   window.

   Time-dependent bandwidth information can be, and typically is,
   considered in network planning and provisioning systems. For
   example, a VoD provider knows ahead of time when the latest
   "blockbuster" film will be available via its service and can make
   estimates based on historical data on the bandwidth that it will
   need to deal with the subsequent demand. The following discussions,
   however, are restricted to "current time" for now.
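
   For query type (c) above, a bulk transfer of a given amount within
   a given time, the bandwidth that must be available is simply the
   data volume divided by the time allowed. The Python sketch below
   shows that conversion. Decimal units (1 GB = 8000 Mb) and the
   function names are assumptions made for illustration.

```python
def required_rate_mbps(volume_gigabytes, window_seconds):
    """Average rate in Mbps needed to move the volume within the window."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    megabits = volume_gigabytes * 8 * 1000    # decimal units assumed
    return megabits / window_seconds


def fits(volume_gigabytes, window_seconds, available_mbps):
    """True if the available bandwidth can complete the transfer in time."""
    return required_rate_mbps(volume_gigabytes, window_seconds) <= available_mbps


# Hypothetical backup: 4.5 TB that must complete within one hour.
rate = required_rate_mbps(4500, 3600)
print(rate)
```

   In this made-up case the transfer needs 10000 Mbps on average, that
   is, a full 10Gbps wavelength for the hour; doubling the window
   halves the required rate.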

   Finally another goal in the design of an interface between the
   application and networking stratums is to minimize the need for
   either stratum to know too much about the inner workings of the
   other. Hence as much as possible it is desired to insulate the
   application stratum from technology specifics of the network. That
   said, data centers providing IaaS may prefer to specify flows and
   connectivity at a layer below IP such as Ethernet.

   The key ALTO extensions useful for querying the network for high
   bandwidth consuming applications are:

   (a)  Bandwidth Constraint Information
   (b)  Constraint Based Filtering
   (c)  Multi-cost information [MultiCost]
   (d)  Endpoint Access Bandwidth Capacity (a new endpoint property)

   In the following sections we discuss (a) and (b).

6. Bandwidth Constraint Information

6.1. Introduction

   The amount of bandwidth available between two entities or two
   sets of entities can be of prime interest to applications that have
   stringent bandwidth requirements relative to a network's capacity.
   Such entities can be communicating across a WAN, a metro area, a
   LAN, or even within a compute cluster.

   One may want to query the network as to the available bandwidth in a
   number of different cases:

   (a)   Bandwidth available between a single source destination pair

   (b)   Bandwidth between one particular source and several other
         destinations

   (c)   Bandwidth between one set of sources and another set of
         destinations

   Case (a), bandwidth between two points, is well defined; however, in
   cases (b) and (c) there is some ambiguity.  In cases (b) and (c) one
   may want to query for the bandwidth available to a single "flow"
   at a time, or for multiple simultaneous "flows" between sources and
   destinations.

   If the bandwidth query is for potentially simultaneous flows then
   there is the possibility that the flows of interest would (or could)
   share network resources, e.g., link capacity. Such a situation leads
   to what is known as a multi-commodity flow problem [NetOpt]. General
   formulations of this problem [NetOpt] allow for arbitrary path
   selection and can permit splitting of user demands across multiple
   paths if inverse multiplexing like techniques are available.
   Alternative formulations of multi-commodity flow problems exist
   [RWA] when path choices between a source and destination are
   restricted to an explicit list of paths (or a single path). In both
   formulations link capacities form a key optimization constraint.
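
   The effect of shared link capacity can be seen with the restricted
   formulation, in which each flow is pinned to one explicit path. The
   Python sketch below sums the demand that simultaneous flows place
   on each link and reports any link whose capacity is exceeded. The
   link names, capacities, and demands are hypothetical, and this is
   a feasibility check only, not a solver for the general
   multi-commodity flow problem.

```python
from collections import defaultdict


def link_loads(flows):
    """Sum the demand placed on each link by flows pinned to explicit paths."""
    loads = defaultdict(float)
    for demand, path in flows:
        for link in path:
            loads[link] += demand
    return dict(loads)


def overloaded_links(flows, capacity):
    """Links whose total load exceeds capacity; empty means feasible."""
    return {link: load
            for link, load in link_loads(flows).items()
            if load > capacity[link]}


# Hypothetical capacities and two flows that both cross link "b".
capacity = {"a": 40.0, "b": 45.0, "c": 50.0}
flows = [
    (30.0, ["a", "b"]),
    (20.0, ["b", "c"]),
]
print(overloaded_links(flows, capacity))
```

   Each flow fits on its own path when considered alone, yet together
   they exceed the capacity of link "b". A per-pair bandwidth answer
   would not reveal this.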

   To perform better application layer traffic optimization, the
   presence and capacity of such "mutual bottleneck" links would need
   to be considered by "large bandwidth applications". This draft shows
   how a combination of abstract path link vectors and/or constrained
   cost graphs can be used to enable enhanced application layer traffic
   optimization. These techniques are illustrated with connectionless
   technologies such as IP and Ethernet, as well as MPLS and circuit
   switched technologies that can be controlled via GMPLS.

6.1.1. Example Network: Provider's View

   In Figure 1 we show an example network consisting of five nodes and
   six links. This is the network provider's view of the network and
   not necessarily information to be shared in detail with
   applications. We will use this same network to illustrate bandwidth
   constraint representations for different technologies. For
   illustrative purposes we only consider a single weight (cost) and
   bandwidth constraint per link. The units of bandwidth could be Mbps,
   Gbps, or wavelengths depending upon the technology. These costs and
   constraints are from the network provider's perspective and may or
   may not be the sole guidance in path selection, e.g., non-shortest
   paths may be chosen depending upon data and control plane
   technologies. However, when considering a path between a source and
   destination across this network we sum the weights for each link
   along the path to obtain the total cost for the path.

         +----+                  L0 Wt=10,BW=50         +----+
         | N0 |-----------------------------------------| N3 |
         +----+  `.                                     +----+
           |       `. L4 Wt=7                             |
           |         `-. BW=40                            |
           |            `.  +----+                        |
           |              `.| N4 |                        |
           | L1          .' +----+                        |
           | Wt=10      /                          L2     |
           | BW=45     /                           Wt=12  |
           |          /L5 Wt=10                    BW=30  |
           |        .'    BW=45                           |
           |       /                                      |
           |      /                                       |
         +----+ .'              L3 Wt=15 BW=42          +----+
         | N1 |.........................................| N2 |
         +----+                                         +----+
               Figure 1. Generic Constrained Network Example
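
   The path cost rule described above (summing link weights along a
   path) can be sketched in a few lines. The dictionary layout and the
   function names below are illustrative assumptions, not part of any
   ALTO encoding; the link data is read directly from Figure 1.

```python
# Figure 1 as link id -> (end node, end node, weight, bandwidth).
# The tuple layout is an assumption made for this sketch only.
LINKS = {
    "L0": ("N0", "N3", 10, 50),
    "L1": ("N0", "N1", 10, 45),
    "L2": ("N3", "N2", 12, 30),
    "L3": ("N1", "N2", 15, 42),
    "L4": ("N0", "N4", 7, 40),
    "L5": ("N4", "N1", 10, 45),
}


def path_cost(path):
    """Total cost of a path: the sum of the weights of its links."""
    return sum(LINKS[lid][2] for lid in path)


def path_bottleneck(path):
    """Most bandwidth one path can carry: its smallest link capacity."""
    return min(LINKS[lid][3] for lid in path)


print(path_cost(["L0", "L2"]), path_bottleneck(["L0", "L2"]))
```

   For the path {L0, L2} between N0 and N2 this gives a cost of 22 and
   a bottleneck of 30 units.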

6.2. Data and Control Plane Path Choices

   In this section we survey common data and control plane technologies
   with respect to the path choices that they may allow as well as the
   methods one can use to infer available paths. Methods for inferring
   paths influence how efficiently the network layer can convey cost and
   constraint information to the application layer. For example, even
   if the control plane limits us to a single fixed path between a
   source and destination, when we need many paths between many sources
   and destinations it can be very efficient to derive such information
   from a simple graph representation.

   Technologies that allow arbitrary placement of paths across a
   network include: circuit switched technologies (WDM, TDM), strictly
   connection oriented packet technologies (MPLS, ATM, and Frame
   Relay), and connection oriented modes of multi-purpose protocols
   such as InfiniBand's CO service. In these cases a network provider
   can furnish a graph representation of the network suitable for the
   application optimizer to choose routes.  In some cases, for example,
   in WDM networks due to optical impairments, the usable paths may be
   restricted in a way not readily discerned from a simple graph
   representation. In such a case a list of possible paths would need
   to be furnished.

   For IP, a connectionless technology, one typically thinks of a
   single path between each source and destination (not considering
   equal cost multipath).  Although no choice in path selection is
   available, in the case of single area OSPF the paths can be derived
   from a graph. BGP [BGP4], in contrast, uses techniques based on
   policies and path vectors (AS_PATH) as part of its route selection
   process, and these paths are not derived from graphs. Multi-Topology
   Routing enhancements to OSPF [MT-OSPF] can allow multiple path
   choices between a source and destination, and such paths could be
   derived from their corresponding graphs.
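
   As a minimal sketch of deriving single-area paths from a graph, the
   following runs Dijkstra's algorithm over the weights of Figure 1.
   It assumes symmetric link weights and ignores equal cost multipath;
   the function name and data layout are illustrative only and do not
   model a real OSPF implementation.

```python
import heapq

# Figure 1 as link id -> (end node, end node, weight).
LINKS = {
    "L0": ("N0", "N3", 10),
    "L1": ("N0", "N1", 10),
    "L2": ("N3", "N2", 12),
    "L3": ("N1", "N2", 15),
    "L4": ("N0", "N4", 7),
    "L5": ("N4", "N1", 10),
}


def shortest_path(src, dst):
    """Dijkstra over the undirected graph; returns (cost, [link ids])."""
    adj = {}
    for lid, (a, z, wt) in LINKS.items():
        adj.setdefault(a, []).append((z, wt, lid))
        adj.setdefault(z, []).append((a, wt, lid))
    heap = [(0, src, [])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, wt, lid in adj.get(node, []):
            if nxt not in done:
                heapq.heappush(heap, (cost + wt, nxt, path + [lid]))
    return None  # dst is not reachable from src


print(shortest_path("N0", "N2"))
print(shortest_path("N0", "N3"))
```

   With the weights of Figure 1, the N0 to N2 path uses links L0 and L2
   at cost 22, and the N0 to N3 path uses L0 at cost 10.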

   Ethernet switching offers the greatest variety of path selection
   capabilities depending upon the control plane employed. The basic
   Ethernet bridge specification in 802.1D [802.1D] utilizes a single
   tree structure as the communication backbone between all nodes.
   Hence, one has no choice in path between nodes and the paths can be
   easily derived from a graph of the spanning tree.  We will also see
   that such graphs are easy to reduce. IEEE 802.1Q [802.1Q] includes
   virtual LANs (VLANs) and allows for multiple spanning trees. The
   multiple spanning tree protocol (MSTP) allows for the assignment of
   VLANs to trees.  Hence we have more than one choice in paths but all
   flows within the same VLAN have to share the same tree. Note that
   trees can be given as graphs so this is a case where we may want
   multiple graphs.

   OpenFlow [OpenFlow] capable switches permit general forwarding
   behavior based on general packet header matching. These can include
   Ethernet destination and source addresses, IP destination and source
   addresses, as well as other protocol related fields.  Since both
   source and destination information can be utilized in forwarding
   OpenFlow can enable traffic engineering like a connection oriented
   packet switching technology. Hence arbitrary path selection based on
   a graph is possible.

6.3. ALTO Extensions

   In this section we present two different models for representing
   bandwidth constraints, give several examples of both approaches, and
   furnish an initial JSON encoding for both approaches. We end this
   section with a discussion of which approach a network provider may
   want to choose within a given context.

6.3.1. Mutually Constrained Paths

   As discussed in Section 6.2, the network's data or control plane may
   dictate the paths taken between a source and destination. Even if
   such paths could be derived from a graph, the network provider may
   choose to provide information about the paths to promote information
   hiding or to minimize the amount of information needed to be
   transferred via ALTO. For example, if the application is asking for
   cost/capacity information between a few sources and destinations,
   providing path information for these few paths may take much less
   space than a corresponding graph.

   In the following we give examples of paths with shared link
   bandwidth constraints for two different technologies; we then
   provide a tentative JSON encoding for use with the ALTO protocol.

Simple IP Network Example

   Consider Figure 1 as a single OSPF area with N0 representing a large
   data center and nodes N2 and N3 as potential clients. Table 1 gives
   the corresponding path link vectors with their cost (sum of weights)
   and link bandwidth constraints.

   Path  Src-Dest    Path Vector    Path Cost
   P1    N0-N2       {L0, L2}       22
   P2    N0-N3       {L0}           10
   Link        Bandwidth
   L0             50
   L2             30

   Table 1. Path Vectors for paths P1 and P2, and used link capacities.

   From an optimization perspective each (capacitated) link is a
   potential traffic constraint. From Table 1, since the paths from
   N0-N2 and N0-N3 share a common link, L0, the sum of their bandwidth
   flows must be less than the capacity of L0 (50 units). In addition,
   the capacity constraint on link L2 tells us that the bandwidth of
   the traffic from N0-N2 must be less than 30 units. This information,
   as well as the total costs of the two paths, is all that is needed
   for a constrained joint optimization to proceed. Detailed
   information on link costs (as seen by the network) is not necessary,
   nor is information on unused links.

TDM Network Example

   Now suppose the network of Figure 1 is a TDM network controlled by
   GMPLS. Once again N0 represents a large data center and nodes N2
   and N3 are potential clients. However, in this case the network
   provider offers an additional path, P3, for getting from N0 to N2.

   Path  Src-Dest    Path Vector    Path Cost
   P1    N0-N2       {L0, L2}       22
   P2    N0-N3       {L0}           10
   P3    N0-N2       {L1, L3}       25
   Link        Bandwidth
   L0             50
   L1             45
   L2             30
   L3             42

         Table 2. Path Vectors for P1-P3 and used link capacities.
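
   The path vectors and link capacities of Table 2 are enough to test
   whether a proposed per-path bandwidth allocation is feasible. The
   sketch below is one way to do so; the function names are
   hypothetical, and treating each capacity as an inclusive upper bound
   is an assumption of this example.

```python
# Table 2: path vectors and the capacities of the links they use.
PATHS = {"P1": ["L0", "L2"], "P2": ["L0"], "P3": ["L1", "L3"]}
CAPACITY = {"L0": 50, "L1": 45, "L2": 30, "L3": 42}


def link_loads(alloc):
    """Bandwidth carried by each link for a path -> bandwidth allocation."""
    loads = {lid: 0 for lid in CAPACITY}
    for pid, bw in alloc.items():
        for lid in PATHS[pid]:
            loads[lid] += bw
    return loads


def violated_links(alloc):
    """Links whose load exceeds their capacity (empty list if feasible)."""
    loads = link_loads(alloc)
    return sorted(lid for lid, load in loads.items() if load > CAPACITY[lid])


print(violated_links({"P1": 30, "P2": 20, "P3": 42}))  # fits every link
print(violated_links({"P1": 30, "P2": 25, "P3": 42}))  # P1 + P2 overload L0
```

   The second allocation fails only because P1 and P2 share link L0,
   which is the kind of mutual constraint that per-path costs alone do
   not reveal.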

   Once again no information in addition to that shown in Table 2 is
   required to perform a constrained optimization. However, path P3 is
   the only path using links L1 and L3. Link L3's capacity is 42 units
   and is less than link L1's capacity of 45 units. Satisfying link
   L3's capacity constraint (for the set of paths P1-P3) implies that
   link L1's capacity constraint is always satisfied and hence no
   information on link L1 needs to be sent from the network. In
   particular the network could send the information shown in Table 3
   where we have replaced links L1 and L3 with an "abstract link"
   (AL13) with capacity equal to that of link L3.

   Path  Src-Dest    Path Vector    Path Cost
   P1    N0-N2       {L0, L2}       22
   P2    N0-N3       {L0}           10
   P3    N0-N2       {AL13}         25
   Link        Bandwidth
   L0             50
   L2             30
   AL13           42

       Table 3. Path Vectors for P1-P3 and abstract link capacities.
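
   The reduction from Table 2 to Table 3 can be viewed as removing a
   link constraint that another link's constraint already implies. The
   following sketch shows one simple dominance test under the
   assumption that path bandwidths are non-negative; it is not
   presented as the reduction method a provider would actually use,
   and the function name is hypothetical.

```python
def redundant_links(paths, capacity):
    """Links whose capacity constraint is implied by another link.

    Link a is implied by link b when every path that uses a also uses b
    and b's capacity does not exceed a's capacity.
    """
    users = {lid: {p for p, links in paths.items() if lid in links}
             for lid in capacity}
    redundant = set()
    for a in capacity:
        for b in capacity:
            if a == b or b in redundant:
                continue
            if users[a] <= users[b] and capacity[b] <= capacity[a]:
                # Identical constraints: keep the one with the lower name.
                same = users[a] == users[b] and capacity[a] == capacity[b]
                if same and a < b:
                    continue
                redundant.add(a)
                break
    return sorted(redundant)


PATHS = {"P1": ["L0", "L2"], "P2": ["L0"], "P3": ["L1", "L3"]}
CAPACITY = {"L0": 50, "L1": 45, "L2": 30, "L3": 42}
print(redundant_links(PATHS, CAPACITY))
```

   For Table 2 only L1 is reported, since P3 is its sole user and L3
   is tighter. Table 3 goes one step further and renames the surviving
   constraint to the abstract link AL13, which also hides the original
   link identities.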

   Note that simplifications such as the previous one can frequently be
   performed and can result in significant information savings. Also,
   this constraint information reduction was performed without the
   network provider having knowledge of the application layer's traffic
   demands. Methods for performing these reductions may be specific to
   service providers and not subject to standardization.

JSON Encoding

   In some cases there may be more than one path given between a source
   and destination.  In this case the network needs to furnish with
   each path the following information: (source, destination), (path id
   if more than one between source and destination), costs, overall
   path constraint (if any), and list of mutual abstract links for this
   path. In addition we need to furnish capacities for all mutual
   abstract links mentioned.

   object {
      PIDName source;
      PIDName dest;
      JSONNumber wt;    //A numerical path cost
      JSONNumber delay; //A numerical path latency, optional
      JSONNumber bw; //A numerical bandwidth constraint, optional
      LIDName mutual-links<0..*>; //shared constrained links, optional
   } PathData;

   Note that "mutual-links" is a JSON array that contains the names of
   the shared links that this path depends upon (may be empty). Note
   that all costs are associated with path entities, while constraints
   may be associated with paths or links.

   object {
      JSONNumber bw; //A numerical bandwidth constraint, optional
   } SharedAbstractLink;

   Note that the shared abstract link only contains capacity
   information. This is much different from the case where a graph is

   object {
      PathData [pathname]<0..*>; // The individual path info
      SharedAbstractLink [linkname]<0..*>; //Shared link info

   } NetworkPathData;
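
   As an illustration only, the data of Table 3 could be serialized
   along the lines of the notation above. The sketch assumes that path
   names and link names share one flat JSON object, which is one
   possible reading of NetworkPathData, and it reuses the node names of
   Figure 1 as PID names; both choices are assumptions of this example.

```python
import json

# Table 3 expressed with the member names of PathData and
# SharedAbstractLink. The flat layout is an assumed reading.
network_path_data = {
    # PathData entries, keyed by path name
    "P1": {"source": "N0", "dest": "N2", "wt": 22,
           "mutual-links": ["L0", "L2"]},
    "P2": {"source": "N0", "dest": "N3", "wt": 10,
           "mutual-links": ["L0"]},
    "P3": {"source": "N0", "dest": "N2", "wt": 25,
           "mutual-links": ["AL13"]},
    # SharedAbstractLink entries, keyed by link name
    "L0": {"bw": 50},
    "L2": {"bw": 30},
    "AL13": {"bw": 42},
}


def missing_capacities(data):
    """Links named in some "mutual-links" array that lack a "bw" entry."""
    referenced = {lid for entry in data.values()
                  for lid in entry.get("mutual-links", [])}
    return sorted(lid for lid in referenced if "bw" not in data.get(lid, {}))


print(json.dumps(network_path_data, indent=2))
print(missing_capacities(network_path_data))
```

   The helper checks the requirement stated above that a capacity be
   furnished for every mutual abstract link that a path mentions.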

6.3.2. Cost-Capacity Graphs

   As discussed in Section 6.2, the network's data or control plane may
   allow arbitrary path selection and hence a cost-capacity graph
   representation would be needed for the optimization to fully take
   advantage of this network flexibility.

   In the case where path choice is limited, but the paths can be
   derived from a graph, it may be useful for the network to supply a
   graph to reduce the amount of information transferred via the ALTO
   protocol.  Suppose the application is interested in many
   source-destination pairs. In this case the amount of path information
   including abstract link constraints could significantly exceed the
   information size of a graph.

   In the following we give examples of cost-capacity graphs for a
   technology (TDM) that can offer arbitrary path choice, and for a
   technology (MSTP Ethernet) that offers limited path choice but where
   specifying graphs can result in significant efficiencies. We then
   provide a tentative JSON encoding of cost-capacity graphs for use
   with the ALTO protocol.

Simple TDM Example with Graph Reduction

   Consider again the case where Figure 1 represents a TDM network, but
   now the provider will permit the application to make path choices.
   Suppose that the application only involves nodes N0, N1, and N2, and
   not N3 or N4. By studying the structure of the graph of Figure 1 one
   can derive the reduced graph shown in Figure 2 that maintains all
   relevant cost and capacity information from the point of view of
   nodes N0, N1, and N2.  In particular we were able to remove nodes N3
   and N4, substitute abstract link AL0M2 for links L0 and L2, and
   substitute abstract link AL4M5 for links L4 and L5. Note that any
   such reductions, approximate or exact, are at the network provider's
   discretion.

        +----+
        | N0 |-------------------------------------------+
        +----+  `.                     AL0M2             |
          |       `.                    Wt=22,BW=30      |
          |         `-.                                  |
          |            `.                                |
          |             | AL4M5                          |
          | L1          .  Wt=17,BW=40                   |
          | Wt=10      /                                 |
          | BW=45     /                                  |
          |          /                                   |
          |        .'                                    |
          |       /                                      |
          |      /                                       |
        +----+ .'              L3 Wt=15 BW=42          +----+
        | N1 |.........................................| N2 |
        +----+                                         +----+
     Figure 2. Reduced graph of Figure 1 from the perspective of nodes
                              N0, N1, and N2.
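
   The abstract links of Figure 2 follow from a series reduction: when
   two links meet at a node that is not of interest and has no other
   links, their weights add and the smaller capacity governs. A minimal
   sketch, with a hypothetical helper name and (weight, bandwidth)
   pairs taken from Figure 1:

```python
def merge_series(first, second):
    """Collapse two links in series into one abstract link.

    Each link is a (weight, bandwidth) pair. Weights add along the path
    and the usable bandwidth is limited by the smaller capacity.
    """
    return (first[0] + second[0], min(first[1], second[1]))


L0, L2 = (10, 50), (12, 30)   # N0-N3 and N3-N2
L4, L5 = (7, 40), (10, 45)    # N0-N4 and N4-N1

AL0M2 = merge_series(L0, L2)
AL4M5 = merge_series(L4, L5)
print(AL0M2, AL4M5)
```

   This reproduces Wt=22, BW=30 for AL0M2 and Wt=17, BW=40 for AL4M5.
   The rule only applies because N3 and N4 each have exactly two links
   in Figure 1.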

   The resulting information to be conveyed concerning this reduced
   graph is shown in Table 4.

   Link        End Nodes      Bandwidth   Cost
   AL0M2       (N0, N2)       30          22
   L1          (N0, N1)       45          10
   L3          (N1, N2)       42          15
   AL4M5       (N0, N1)       40          17

             Table 4. Representation of the graph of Figure 2.

Ethernet MSTP Example with Multiple Graphs

   Consider the Ethernet network shown in Figure 3 running MSTP with
   three multiple spanning tree instances (MSTIs) defined.  Suppose the
   application is interested in connectivity between nodes N1, N3, N5,
   N6, and N7. In Figures 4-6 we show the spanning tree instances along
   with a high fidelity graph reduction that removes nodes that are not
   of interest and abstracts links as needed.

   Let's compare these reduced graph representations with a path
   representation. Since we have n=5 communicating nodes of interest,
   this leads to n*(n-1)/2 = 10 potential paths per MSTI for which the
   network would need to furnish cost and constraint information as in
   Section 6.3.1. In the case of graphs reduced from tree structures
   for the nodes of interest, it can be proved that the number of links
   in the graph is equal to (n-1), e.g., the reduced graph consists of
   5 nodes and 4 links.
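
   The two counts compared above can be worked through directly. The
   helper names below are hypothetical, and the link count assumes, as
   stated in the text, that the reduced graph contains only the n nodes
   of interest.

```python
def path_entries(n, instances=1):
    """Source-destination pairs among n nodes, per spanning tree instance."""
    return instances * n * (n - 1) // 2


def reduced_tree_links(n, instances=1):
    """Links in a reduced tree that spans exactly the n nodes of interest."""
    return instances * (n - 1)


print(path_entries(5), reduced_tree_links(5))        # one MSTI
print(path_entries(5, 3), reduced_tree_links(5, 3))  # three MSTIs
```

   For five nodes of interest this is 10 paths against 4 links per
   MSTI, or 30 against 12 over three MSTIs, and the gap widens
   quadratically as n grows.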

               +----+      L4
              /| N3 |..______         +----+
             | +----+        `````----| N4 |..__   L6
             /                     .-'+----+    ``--.__      +----+
            /                   .-'      |             ``--..| N7 |
           | L2              .-'         |                   +----+
           /              .-'            /                 .'  |
          /             .'              |                 /    /
         |           .-'                /               .'    |
         /        .-'  L9              |              .'      |
      +-+--+   .-'                     |         L11 /        /
      | N2 |.-'                     L5 /           .'        |
      +----+                          |           /          /L8
         \                            |         .'          |
          \ L1                        /       .'            |
           \                         |       /              /
            \                        /     .'              |
          +----+                    |    .'                /
          | N1 |.__    L3           |   /                +----+
          +----+   `--._            / .'             __..| N6 |
                        ``-.._    +----+     __..--''    +----+
                              ``-.| N5 |.--''  L7
                Figure 3. Ethernet Network supporting MSTP.

                 L4                                   AL4M6
                       +--+                                  +--+
          +--+   __..--|N4|`.                   +--+   __..--|N7|
          |N3|--'      +--+  \ L6               |N3|--'      +--+
          +--+                `.                +--+           |
           /                    `.               /             \
     L2   /                     +--+            /               |
        .'                      |N7|          .'AL1M2           \ L8
       /                        +--+         /                   |
     +--+     MSTI #1             /        +--+                  \
     |N2|                        /         |N1|                   |
     +--+                     L8|          +--+                   \
       \         (a)            /                    (b)         +--+
        |  L1                  /                               .'|N6|
        \                    +--+                      +--+  .'  +--+
         \                 .'|N6|                      |N5|.' L7
        +--+       +--+  .'  +--+                      +--+
        |N1|       |N5|.' L7
        +--+       +--+
    Figure 4. (a) Spanning tree instance #1, (b) Reduced graph from the
                 perspective of nodes N1, N3, N5, N6, N7.

         +--+   L4_..-|N4|                  +--+
         |N3|.--''    +--+                  |N3||
         +--+      .-'  |                   +--+\
                .-'     /                        |
            _.-'       |       +--+              \                +--+
         .-' L9        |       |N7|               |               |N7|
      .-'              /       +--+               \               +--+
    +--+              |          +         AL4M5   \                +
    |N2|           L5 /         |                   |              |
    +--+  MSTI #2    |       L8 /                   \           L8 /
                     |         /                     |            /
           (a)       /        /               (b)    \           /
                    |       +--+                      |        +--+
              L3    /     .'|N6|                      \      .'|N6|
       +--+       +--+  .'  +--+          +--+   L3  +--+  .'  +--+
       |N1|-------|N5|.' L7               |N1|-------|N5|.' L7
       +--+       +--+                    +--+       +--+
    Figure 5. (a) Spanning tree instance #2, (b) Reduced graph from the
                 perspective of nodes N1, N3, N5, N6, N7.

         +--+ L4 __.|N4|`.                  +--+ AL4M6
         |N3|---'   +--+  \L6               |N3|.__
         +--+              `.               +--+   ``--...__
          /                  `.                             ``--..
     L2  /                   +--+                               +--+
       .'   MSTI #3         /|N7|                              /|N7|
      /                   .' +--+                            .' +--+
    +--+             L11 /    |                         L11 /    |
    |N2|                /     /                            /     /
    +--+     (a)      .'   L8/                  (b)      .'   L8/
                     /      |                           /      |
                    /       /                          /       /
                  .'      +--+                       .'      +--+
                 /        |N6|                      /        |N6|
     +--+  L3   +--+      +--+          +--+  L3   +--+      +--+
     |N1|.......|N5|                    |N1|.......|N5|
     +--+       +--+                    +--+       +--+
    Figure 6. (a) Spanning tree instance #3, (b) Reduced graph from the
                 perspective of nodes N1, N3, N5, N6, N7.
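
   Because each reduced graph is a tree, the single path between any
   two nodes of interest can be recovered from the graph with a simple
   search. The sketch below uses the reduced graph of Figure 5(b); the
   link end points are this example's reading of that figure, and the
   function name is hypothetical.

```python
# Reduced graph of Figure 5(b) as link id -> (end node, end node).
TREE = {
    "AL4M5": ("N3", "N5"),
    "L3": ("N1", "N5"),
    "L7": ("N5", "N6"),
    "L8": ("N6", "N7"),
}


def tree_path(src, dst, links=TREE):
    """Link ids on the unique src-dst path of a tree (depth-first search)."""
    adj = {}
    for lid, (a, z) in links.items():
        adj.setdefault(a, []).append((z, lid))
        adj.setdefault(z, []).append((a, lid))
    stack = [(src, None, [])]
    while stack:
        node, parent, path = stack.pop()
        if node == dst:
            return path
        for nxt, lid in adj.get(node, []):
            if nxt != parent:  # a tree has no cycles, so this suffices
                stack.append((nxt, node, path + [lid]))
    return None  # dst is not in the same tree as src


print(tree_path("N1", "N7"))
print(tree_path("N3", "N6"))
```

   Four link entries are therefore enough to answer all ten
   source-destination queries for this spanning tree instance.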

   In many data center applications all communicating virtual machines
   (VMs) need to be placed within the same VLAN. MSTP allows the
   assignment of VLANs to MSTIs; hence a reduced graph representation
   can provide a very good mechanism for determining an optimum fit
   between communicating VM traffic patterns and MSTI VLAN assignment.

JSON Encoding

   Like the current ALTO filtered cost map, a request for a
   cost-capacity graph would take source and destination PIDs as
   inputs. In JSON notation we could represent the returned graph or
   graphs as a JSON object containing link objects. As we saw in the
   Ethernet case it may be useful to supply more than one graph. In
   addition, restrictions on routing may need to be conveyed, for
   example that only the shortest path between source and destination
   is a valid route (e.g., OSPF routing for IP), or that all routes
   come from the same graph (e.g., VLAN assignment to MSTI in MSTP
   Ethernet).

   Hence we are led to a tentative JSON encoding which includes named
   link objects, named graph objects, and a versioned container for
   holding graphs and any other general information such as the
   previously mentioned restrictions.

   object {
      NIDName aend;  // Node ids are similar to PIDs but
      NIDName zend;  // may not have end points
      JSONNumber wt;    //A numerical routing cost
      JSONNumber delay; //A numerical latency cost, optional
      JSONNumber bw; //A numerical bandwidth "cost", optional
      // Other costs, private or experimental, could be added,
      // for example costs related to reliability or economics.
      // Only one cost of each type would be permitted.
      // Note a multi-cost like mechanism could be used.
   } LinkData;

   // Collection of links each identified by link id (LID) name.
   object {
      LinkData [lidname]<0..*>; // Link id (LID) would be an identifier
     ...              // similar to a PID or NID and identifies the
                      // link
   } NetworkGraphData;

   // Finally, multiple graph encapsulation and versioning

   object {
      VersionTag     map-vtag;
      NetworkGraphData [graphname]<1..*>; //named graphs
      ... // other information such as graph choice restrictions
          // or routing restrictions.

   } InfoResourceNetwork;

   Where a graph name is formatted like a PIDName, but names a graph.
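
   For illustration, the reduced graph of Figure 2 could be carried in
   such a container roughly as follows. The graph name "tdm-reduced",
   the tag value, and the use of a plain string for the version tag are
   all assumptions of this sketch. The bandwidth of AL0M2 is taken from
   Figure 2 (30 units).

```python
import json

info_resource_network = {
    "map-vtag": "example-tag-1",   # placeholder value, format assumed
    "tdm-reduced": {               # hypothetical graph name
        # LinkData entries keyed by link id
        "AL0M2": {"aend": "N0", "zend": "N2", "wt": 22, "bw": 30},
        "L1": {"aend": "N0", "zend": "N1", "wt": 10, "bw": 45},
        "L3": {"aend": "N1", "zend": "N2", "wt": 15, "bw": 42},
        "AL4M5": {"aend": "N0", "zend": "N1", "wt": 17, "bw": 40},
    },
}


def graph_nodes(graph):
    """Node ids that appear as an end point of any link in a graph."""
    return sorted({link[end] for link in graph.values()
                   for end in ("aend", "zend")})


print(json.dumps(info_resource_network, indent=2))
print(graph_nodes(info_resource_network["tdm-reduced"]))
```

   Note that L1 and AL4M5 are parallel links between N0 and N1, which
   is why links are keyed by link id rather than by node pair.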

7. Constraint Based Filtering

   Young's stuff here.

8. Conclusion

   In this draft we have discussed two generic use cases that motivate
   the usefulness of general interfaces for cross stratum optimization
   in the network core. In our first use case network resource usage
   became significant due to the aggregation of many individually
   unique client demands. In the second use case, where data centers
   were communicating with each other, bandwidth usage was already
   significant enough to warrant the use of private line/LAN type
   network services.

   Both use cases result in optimization problems that trade off
   computational versus network costs and constraints. Both featured
   scenarios where advanced reservation, on demand, and recovery type
   service interfaces could prove beneficial. In the later sections of
   this document we showed how ALTO concepts [1] and the ALTO protocol
   could be used and extended to support joint application network
   optimization for large network bandwidth consuming applications.

9. Security Considerations


10. IANA Considerations

   This informational document does not make any requests for IANA
   action.

11. References

11.1. Informative References

[1] "draft-ietf-alto-reqs-09." [Online]. Available:
   http://datatracker.ietf.org/doc/draft-ietf-alto-reqs/. [Accessed:
[2]  J. Medved, N. Bitar, S. Previdi, B. Niven-Jenkins, and G. Watson,
   "Use Cases for ALTO within CDNs." [Online]. Available:
   [Accessed: 06-Mar-2012].
[3]  E. Mannie, Ed., "Generalized Multi-Protocol Label Switching (GMPLS)
   Architecture, RFC 3945." Oct-2004.
[4]  Y. Lee, G. Bernstein, and W. Imajuku, Eds., "Framework for GMPLS
   and PCE Control of Wavelength Switched Optical Networks (WSON), RFC
   6163." Apr-2011.
[5]  A. Farrel, J. P. Vasseur, and J. Ash, "A Path Computation Element
   (PCE)-Based Architecture, RFC 4655." Aug-2006.
[6]  G. Swallow, J. Drake, H. Ishimatsu, and Y. Rekhter, "Generalized
   Multiprotocol Label Switching (GMPLS) User-Network Interface (UNI):
   Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Support
   for the Overlay Model, RFC 4208," Oct-2005.
[7]  Y. R. Yang, R. Alimi, and R. Penno, "ALTO Protocol." [Online].
   Available: http://tools.ietf.org/html/draft-ietf-alto-protocol-10.
   [Accessed: 05-Mar-2012].
[8]  M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A.
   Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M.
   Zaharia, "A view of cloud computing," Commun. ACM, vol. 53, pp. 50-
   58, Apr. 2010.
[9]  K. A. Hua and S. Sheu, "Skyscraper broadcasting: a new broadcasting
   scheme for metropolitan video-on-demand systems," in Proceedings of
   the ACM SIGCOMM  '97 conference on Applications, technologies,
   architectures, and protocols for computer communication, Cannes,
   France, 1997, pp. 89-100.
[10] "Adobe Flash Media Server 4.0 * Building peer-assisted networking
   applications." [Online]. Available:
   3884520b86f312a354ba36d-8000.html. [Accessed: 13-May-2011].
[11]  Rudra Dutta and George N. Rouskas, "Traffic grooming in WDM
   networks: Past and future," IEEE Network, vol. 16, no. 6, pp. 46 -
   56, 2002.
[12]  Keyao Zhu and B. Mukherjee, "Traffic grooming in an optical WDM
   mesh network," Selected Areas in Communications, IEEE Journal on,
   vol. 20, no. 1, pp. 122-133, 2002.
[13]  G. Bernstein, B. Rajagopalan, and D. Saha, Optical Network
   Control: Architecture, Protocols, and Standards. Addison-Wesley
   Professional, 2003.
[14]  B. Awerbuch and Y. Shavitt, "Topology aggregation for directed
   graphs," Networking, IEEE/ACM Transactions on, vol. 9, no. 1, pp.
   82-90, 2001.
[15]  S. Uludag, K.-S. Lui, K. Nahrstedt, and G. Brewster, "Analysis of
   Topology Aggregation techniques for QoS routing," ACM Comput. Surv.,
   vol. 39, Sep. 2007.
[16]  K. Nichols, D. L. Black, S. Blake, and F. Baker, "Definition of
   the Differentiated Services Field (DS Field) in the IPv4 and IPv6
   Headers." RFC 2474. Available: http://tools.ietf.org/html/rfc2474.
[17]  D. O. Awduche and J. Agogbua, "Requirements for Traffic
   Engineering Over MPLS." RFC2702. Available:

Authors' Addresses

   Greg M. Bernstein
   Grotto Networking
   Fremont California, USA
   Phone: (510) 573-2237
   Email: gregb@grotto-networking.com

   Young Lee
   Huawei Technologies
   1700 Alma Drive, Suite 500
   Plano, TX 75075
   Phone: (972) 509-5599
   Email: ylee@huawei.com

Intellectual Property Statement

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers or
   users of this specification can be obtained from the IETF on-line
   IPR repository at http://www.ietf.org/ipr

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document. Please
   address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   All IETF Documents and the information contained therein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION

   Funding for the RFC Editor function is currently provided by the
   Internet Society.
