Internet Research Task Force                         P. Ashwood-Smith
Internet Draft                                                 Huawei
Intended status: Informational                             M. Soliman
Expires: February 03, 2014                        Carleton University
                                                               T. Wan
                                                        July 03, 2013

                           SDN State Reduction



   This document makes the argument that to support the centralized
   control of a substantial number of forwarding devices (as Software
   Defined Networking (SDN) proposes) that the scale, speed, cost and
   general quality of such a solution will be improved by reducing
   the state needed to be distributed into the network of devices by
   the controller(s). To this end we re-visit forms of Source Routing
   (SR), in particular Strict Link Source Routing (SLSR) and suggest
   that light weight SLSR could allow substantial reduction in
   controller burden while potentially reducing the costs/complexity
   on forwarding devices. We discuss some simulation results that
   demonstrate these advantages and how the advantages grow
   substantially as the network diameter grows. We also look at
   various implementation possibilities including existing IPV4, V6,
   MPLS, new/modified MPLS vs. something brand new that could
   possibly be implemented with new SDN technology like Protocol
   Oblivious Forwarding-POF.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in

   The list of current Internet-Drafts can be accessed at

Ashwood-Smith, et al.    Expires February 03 2014           [Page 1]

Internet-Draft            SDN state reduction               July 2013

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on December 3, 2012.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   ( in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Table of Contents

   1. Terminology                                                   3
   2. Introduction                                                  3
   3. Logical Example                                               5
   4. Expressing a Path                                             6
   5. Computing a Path                                              7
   6. Downloading Forwarding State                                  8
   7. Logically Forwarding SLSR                                     10
      7.1. Ingress Logical Unicast Forwarding                       10
      7.2. Tandem Logical Unicast Forwarding                        11
      7.3. Egress Logical Unicast Forwarding                        12
   8. Logical Multicast Forwarding SLSR Packets                     13
   9. Failure Recovery                                              14
   10. Comparison of Logical Model to Existing Source Routing       15
      10.1. MPLS as a SLSR                                          15
      10.2. IPV4/6 Options as SLSR                                  18
      10.3. Protocol Oblivious Forwarding as SLSR mechanism         19
   11. Security Considerations                                      20
   12. Conclusions and Future work                                  21
   13. IANA Considerations                                          21
   14. References                                                   21
      14.1. Informative References                                  21
   15. Authors' Addresses                                           23
   16. Contributors                                                 23
   17. Acknowledgements                                             23

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 2]

Internet-Draft            SDN state reduction               July 2013

1. Terminology

   ATM         Asynchronous Transfer Mode (a cell based network)
   BGP         Boarder Gateway Protocol
   CSPF        Constrained Shortest Path First
   DOS         Denial of Service (attack)
   ECMP        Equal Cost Multi Path
   flow        Logically related packets following the same path
   IS-IS       Intermediate System to Intermediate System
   LACP        Link Aggregation Control Protocol
   LAG         Link Aggregation
   Loose       A source route that enumerates only some of all hops
   MPLS        Multi Protocol Label Switching
   MPLS-TE     MPLS Traffic Engineering.
   NPU         Network Processor Unit (programmable forwarding)
   OpenFlow    Open data path programming protocol
   OSPF        Open Shortest Path First
   PCE         Path Computation Element (used with MPLS-TE)
   PNNI        Private Network to Network Interface (link state ATM)
   POF         Protocol Oblivious Forwarding - more generic OpenFlow)
   RSVP-TE     Resource Reservation Protocol - Traffic Engineering
   SDN         Software Defined Networking (as per [OPENFLOW])
   SDN-domain  Set of forwarding devices controlled as a unit.
   SR          Source Routing - enumerating hops to traverse
   SLSR        Strict Link Source Routing - enumerating links [SLSR]
   SPF         Shortest Path First - (Dijkstra etc.)
   SSRR        Strict Source Record Route - IPV4 header option 9
   Strict      Source route that enumerates every hop(unlike Loose)
   TE          Traffic Engineering
   VFI         Virtual Forwarding Instance (layer 2)
   VPLS        Virtual Private Lan Service
   VRF         Virtual Routing and Forwarding (layer 3)

2. Introduction

   The centralized control of a network is not a new idea. Indeed
   centralized control was widely deployed in voice networks and some
   early data networks but of course gave way to distributed control
   for IP.

   Centralized computation is however still widely used for traffic
   engineered networks, like MPLS-TE and GMPLS where a Path
   Computation Engine (PCE) makes use of a global view of a sub-
   network and its resource usage for the planning of new paths and
   their resources. The data path state distribution with these
   models is however not initiated centrally and relies on protocols
   like RSVP-TE to install the hop by hop state. In fact this form

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 3]

Internet-Draft            SDN state reduction               July 2013

   of distributed control with centralized traffic engineering
   computations is the norm today.

   Notwithstanding the massive deployment of this kind of hybrid
   distributed/central control, we have in the last several years
   seen a huge resurgence of interest in fully centralized control of
   at least a set of forwarding devices [ONF] [OPENFLOW] with
   Software Defined Networking (SDN). This SDN proposes a central
   controller (or controllers) using IP protocols such as TCP to talk
   to a set of arbitrarily interconnected (and cheap/dumb) forwarding
   devices (SDN-domain) and which is responsible for the
   configuration of the majority of forwarding state on those
   devices. This state may be produced either as a result of pro-
   active configuration, or based on re-active responses to packet
   flow indications from the forwarding devices themselves.

   Since this central controller has knowledge of the entire sub-
   network of devices, and potentially of the traffic demands
   into/out of the sub-network, it can perform a variety of path
   optimization computations similar to CSPF/MPLS-TE/PCE/GMPLS, or
   even more elaborate forms of optimization (trading flows against
   each other rather than individually optimizing them, exploiting
   quiet areas of the network to offload busy areas etc), the output
   of which is forwarding state for all meta flows in the entire sub
   network of devices and a sub network which more optimally meets
   the desired local constraints. One such deployment reports a
   substantial increase in network utilization from 30% to 70%-90%

   A central controller can also more effectively solve problems such
   as bin-packing and path blocking [SDNGOOG], which occur when flows
   are optimized individually with greedy type algorithms rather than
   considering other orderings of the flows. The finer grained
   ability to place traffic can also permit much more detailed
   placement of traffic after a failure, including traffic not
   directly affected by the failure but the replacement of which is
   critical to achieving fair/efficient use of the remaining
   bandwidth subsequent to the failure.

   Since the output of the controller is much closer to a TE (Traffic
   Engineered) type solution from a PCE (Path Control Element) than
   an SPF (Shortest Path First) solution the controller cannot simply
   install destination based forwarding entries. A controller either
   needs to install tunnels that follow the explicit routes it wishes
   and then map traffic to those tunnels at the edges, or it must
   install n-tuple < <source IP> <destination IP> <source port>
   <destination port> etc.> state and configure these n-tuple matches
   on every hop along the desired path. Packets which fail to match
   an n-tuple are either discarded or sent to the controller.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 4]

Internet-Draft            SDN state reduction               July 2013

   In the normal case of SDN (as given in [OPENFLOW]) the controller
   is required to send configuration information to all devices along
   the path from ingress of the SDN-domain of this controller to the
   egress of that SDN-domain, alternatively a tunnel setup protocol
   like RSVP-TE is required to be triggered to distribute the per hop
   state between the ingress and egress.

   This draft proposes that since the controller knows the exact end-
   to-end path (down to the level of the links it wishes the packets
   to traverse) and that the diameter of an SDN-domain is likely to
   be a reasonable number of hops, that the controller should instead
   simply insert into a header the exact links it wishes the packet
   to traverse and thereby not have to deal either with per hop n-
   tuple state installation (very expensive) or with MPLS tunnel
   installation via RSVP-TE(complex). Such a mechanism also
   eliminates any concerns about Equal Cost Multi Path (ECMP) and/or
   Link Aggregation (LAG) as the controller can place traffic on
   exact links.

   Operations, Administration and Management (OAM) is also greatly
   simplified since data packets will flow on invariant paths that
   are known by both ends of the flow and can be the same as any OAM
   packets that probe the flow. This OAM "fate sharing" property is
   widely valued by network operators and considerable effort has
   already been expended to permit similar fate sharing between OAM
   and data paths with other carrier scale networking protocols such
   as 802.1ag and MPLS-TP. Of course if a controller does not wish to
   enforce symmetry and congruence it need not.

3. Logical Example

   The following is an example of an idealized strict link based
   source routing (SLSR) forwarding. We talk about possible
   implementations including MPLS methods after looking at the
   logical ideal.

   Consider the simple 7 node network shown in Figure 1 below. Here
   the nodes are named {A, B, C, D, E, F, G} and where each node has
   locally numbered interfaces named {1, 2, 3, 4, 5, 6}.

   For example node A has interfaces named 1, 2, 3, 4 and where
   interfaces 4 and 2 both go to node B. Node B has local interfaces
   1, 2, 3, 4 and 5 but the two interfaces going back to node A are
   locally named 1 and 3. Clearly node interface names are likely
   (but not necessarily) different at both ends of a link.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 5]

Internet-Draft            SDN state reduction               July 2013

                     +----+           +----+
                     |    | 4 ----- 3 |    |
                     | A  | 2 ----- 1 | B  |
                     +----+           +----+
                    1      3         4      2 5
                   /        \       /        \ \
                  3          4     3          2 3
                 +----+        +----+          +----+
                 | C  | 2----1 | D  | 2------1 | E  |
                 +----+        +----+          +----+
                  1          6     5          4
                   \        /       \        /
                    3      2         1      2
                     +----+           +----+
                     | F  | 1-------3 | G  |
                     +----+           +----+

    Figure 1 - simple 7 node network with local link identifiers.

4. Expressing a Path

   A path through a network labeled as per Figure 1 can clearly be
   expressed as a sequence of link names (an SLSR).

   For example, between nodes C and E the following are all valid

     C.3 -> A.2 -> B.2
     C.2 -> D.2
     C.1 -> F.2 -> D.2
     C.3 -> A.4 -> B.5

   Now since the links lead unambiguously to a known node, the paths
   can be more compactly expressed without the node names as follows:


   As long as we know the origin of the path (in this case node C),
   the list of link names unambiguously identifies a path and an
   egress point. In addition it identifies unambiguously which link
   from among parallel links between neighbors should be traversed.
   Of course it is possible to give a name to the set of links that
   all attach to the same neighbor and thereby leave the exact link
   in that path deliberately ambiguous and thereby subject to a local
   forwarding decision as to exactly which link in the set to follow.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 6]

Internet-Draft            SDN state reduction               July 2013

   Each path of course also has exactly one perfectly symmetric
   reverse. Note that the symmetric reverse path is not simply the
   same list of link names in reverse order. A reverse path has to be
   specified from the opposite end of the path so in this example the
   origin has to be E.

   The forward and corresponding reverse paths are therefore.

          C->E        E->C

        {3,2,2}      {2,1,1}
        {2,2}        {1,1}
        {1,2,2}      {1,6,3}
        {3,4,5}      {3,3,1}

   Various very efficient encodings of these kinds of paths in source
   routed headers are possible. Even a simple encoding using 8 bits
   per hop can encode every path in a large 8 hop network with fewer
   bits than an IP in IP tunnel.

5. Computing a Path

   It should be obvious that the output of any graph based
   computation which has as its goal various optimization criteria
   for flows can express its results as a series of such paths where
   each path is expressed as a Strict Link based Source Route (SLSR).
   This includes multiple different metric Dijkstra computations
   (i.e. shortest path, multi topology shortest path), CSPF type and
   of course more elaborate linear-programming or other convex type

   The expression of the path as an SLSR imposes no constraints on
   the type of computation being performed except possibly in path
   length. However in any real network under the control of a single
   controller it is not likely that path length would be a real issue
   unless perhaps unreasonably large link names are encoded.

   Convex and linear-programming type solutions to traffic placement
   are of particular interest because to do a good job they must
   exploit a considerable number of paths through a network (many
   more than shortest). These algorithms take the matrix of
   ingress/egress flows in a network together with all the usable
   paths between all sources and destinations and will assign
   percentages of the ingress/egress flows to the available paths in
   ratios that can optimize a number of simultaneous constraints. For
   example they can optimize the network's total throughput, the
   average link utilization, the fairness of the bandwidth available
   to each flow and can even optimize different linear and non linear
   combinations of those goals. What is interesting about all of

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 7]

Internet-Draft            SDN state reduction               July 2013

   these kinds of optimizations however is that they need access to
   all of the reasonable paths across the network since it is by
   making trade-offs between busy and less busy parts of the network
   that they achieve their goals. Unfortunately the number of paths
   (shortest or otherwise) in a network grows exponentially with
   network size and with it the state distribution problem (or
   burden) the controller must deal with.

   It is important to make the distinction between a flow and a path.
   This draft concerns itself not with the immense numbers of micro
   flows but with the very large numbers of paths required to be
   supported onto which those micro flows are then aggregated. A set
   of micro flows can be treated as a single flow, and a single flow
   has a unique path through the network.

6. Downloading Forwarding State

   A controller likely takes as input the fields that identify the
   flow and its various statistical attributes. The controller then
   likely computes an end to end path for this flow either based on
   the single flow's attributes (in a re-active manner), or on more
   global knowledge of multiple flow attributes (in a pro-active
   manner). Flows may be meta (many micro flows) or individual micro
   flows depending on the implementation and its scale. The output of
   course is just a list of links that must be traversed for this
   flow together with matching rules to identify the flow at the

   The controller then delivers the flow matching rules and the
   Strict Link Based Source Route to the *single* node where the flow
   is to be encapsulated (i.e. where the flow first enters the SDN-

   The fact of only having to communicate with the *single* node at
   the head end of the path means that the controller experiences a
   reduction in its work load directly proportional to the number of
   hops in the path (as compared to traditional SDN which must
   program every hop along the path).

   Intuitively this translates to the following I/O burden reduction
   at the controller based on the number of links that must be
   traversed per average path.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 8]

Internet-Draft            SDN state reduction               July 2013

          | #Avg Path Len  |  % I/O Burden Reduction |
          | 1              |  0%                     |
          | 2              |  50%                    |
          | 3              |  66%                    |
          | 4              |  75%                    |
          | 5              |  80%                    |
          | ..             |  ..                     |
          | N              |  (100-100/n) %          |

   Since forwarding state download is typically a substantial part of
   a "normal" routers' re-convergence time, it seems reasonable that
   this will become a similar bottleneck for a central controller and
   quite possibly be further aggravated by the increased delays and
   larger amount of state that the central device must deal with.

   As a result this reduction in state and I/O burden should have a
   marked impact on convergence times assuming there are appropriate
   forwarding mechanisms that can implement the Strict Link Source
   Route (SLSR). Note also that the position of the controller
   relative to the ingress/egress nodes is now more important than
   its position relative to all nodes. Therefore studies as to
   controller optimum placement as defined by the Controller
   Placement Problem [PLACEMENT] would require different optimization

   An additional 50% reduction can also be obtained should the
   implementation of the forwarding be able to reverse the path on
   the fly. Such a reversal permits the implicit communication of the
   desired reverse path to the receiver thereby eliminating
   communication with the controller to obtain a reverse path. Of
   course if symmetry is not desired this further optimization is not

   For example, consider a network with 1000 nodes. It therefore has
   O(1,000,000) meta flows and assuming 10 possible paths for each
   flow has O(10,000,000) ingress forwarding entries that must be
   centrally configured (its burden). If each path on average takes 5
   hops then the burden on the controller grows 5 fold to
   O(50,000,000) entries but with SLSR the burden remains at
   O(10,000,000). If path reversal is supported and symmetric routing
   is desired then the burden with SLSR drops further to

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 9]

Internet-Draft            SDN state reduction               July 2013

   Simulations done by one of us in [SRSDN] provide additional weight
   to the above arguments. In particular we simulated for various
   network sizes and diameters the differences between hop by hop SDN
   and SLSR and saw up to 3 x performance improvements in convergence
   times with SLSR. There were also a number of other benefits such
   as a markedly reduced standard deviation in convergence times for
   the different nodes (81% decreases) and a significantly reduced
   sensitivity to the placement of the controller (80% reduction in
   standard deviation). The performance improvements can perhaps
   better be understood by an analogy comparing the work required to
   fill in the area of an object (traditional SDN) vs. simply drawing
   the circumference of that object (SLSR). Since the circumference
   varies as a function of the diameter but the area varies as a
   function of the diameter squared the relative burden reduction
   with just dealing with the circumference (the edge of the network)
   becomes apparent. In fact in this simulation study we varied the
   radius and then plotted the relative convergence times of SLSR and
   traditional hop by hop forwarded SDN and saw a ratio of
   convergence times as a function of radius that indicated a trend
   towards 1/R as expected. Simply stated, the bigger the centrally
   controlled network the better source routing performs compared to
   hop by hop.

7. Logically Forwarding SLSR

   There are three distinct phases to be performed to logically
   forward unicast SLSR. These are similar to any tunnel technology
   and consist of 1) Ingress Encapsulation, 2) Tandem Forwarding, and
   3) Egress Decapsulation and Forwarding. We address the generic
   concepts first before looking at possible existing or new
   encapsulations and their applicability.

   Multicast SLSR is also possible (but with limitations to keep the
   header sizes from growing too large) and is briefly discussed
   after unicast.

7.1. Ingress Logical Unicast Forwarding

   Here the flow information, likely IP header(s) + UDP/TCP header(s)
   is looked up and a sequence of link identifiers and a current hop
   must be placed on the packet, the packet must then be forwarded to
   the first of those links. This operation is identical to almost
   every tunnel protocol except that IP ECMP and/or LAG hash would
   potentially be unnecessary because the first link name would often
   resolve to a physical link not a LAG bundle. For example:

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 10]

Internet-Draft            SDN state reduction               July 2013

      |SrcIP     |  DstIP   | SrcPrt  | DstPrt |   SLSR |
      | | |  1000   |  98    |{3,2,5} |
      | | |  1001   |  99    |{3,4,5} |

   Of course nothing precludes the use of LAG and the link identifier
   therefore identifying an entire LAG bundle rather than an element
   of that LAG. In fact it is possible to simultaneously support both
   concepts so that some traffic can be forwarded to the entire LAG
   while other traffic could be placed on a particular LAG bundle
   member at the discretion of the central computation.

 7.2. Tandem Logical Unicast Forwarding

   At tandem devices the operation would start by incrementing the
   current hop in the packet header (shown with a ^ symbol) and then
   forward to the link identified in the new current hop. If we
   support reversal, we change the previous link name to the local
   link name for that link. For example, referring to Figure 1 (and
   disregarding non relevant headers/options) after matching the
   first flow tuple at the ingress node C the packet is encapsulated
   with the SLSR header {3,2,5} and then leaves node C on interface 3
   toward A. Then:

     Packet arrives at node A on local interface 1 where it looks
     like this:

        | 3 | 2 | 5 | | |..   <payload>   |

     Current hop is incremented while previous hop is changed to
     local interface name (3 changes to a 1).

        | 1 | 2 | 5 | | |..   <payload>   |

     Packet is forwarded to interface for current hop i.e. 2.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 11]

Internet-Draft            SDN state reduction               July 2013

     Packet arrives at node B on local interface 1.

        | 1 | 2 | 5 | | |..   <payload>   |

     Current hop is incremented while previous hop is changed to
     local interface name 1 (2 changes to a 1).

        | 1 | 1 | 5 | | |..   <payload>   |

     Packet is forwarded to interface for current hop i.e. 5.

     Packet arrives at node E on local interface 3.

        | 1 | 1 | 5 | | |..   <payload>   |

     Current hop is incremented while previous hop is changed to
     local interface name (5 changes to 3).

        | 1 | 1 | 3 | | |..   <payload>   |

     We are at the end of the path, so egress processing begins.

   One additional step not described above is a reverse path check.
   Prior to substituting the reverse link identifier into the SLSR
   header, the link identifier from the neighbor can be validated and
   the packet discarded if the neighbor link identifier in the packet
   is incorrect for the port the packet arrived on. This would reduce
   the chances of mis-delivery of the packet should a link identifier
   change or a link destination change while a packet is in flight.

7.3. Egress Logical Unicast Forwarding

   Here the operation consists of normal IP/Ethernet etc. forwarding
   based on the IP destination / MAC or other ACL rules. Basically
   the SLSR header is stripped and the packet is submitted to the
   Virtual Forwarding Instance, or Virtual Forwarding Function (VFI
   or VRF) for further processing.

   Optionally the link identifier from the neighbor can be validated
   against what is expected and the packet discarded in the case of a

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 12]

Internet-Draft            SDN state reduction               July 2013

   mismatch. This reduces the chance of mis-delivery as in the tandem

   In addition, if path reversal is supported, the reverse path is
   compared against the current reverse path for this reverse flow
   and if it has changed the local forwarding state for the reverse
   flow would be updated. This would allow the head end to always
   dictate the forward and reverse path to be used for all packets in
   the flow without involving the controller on the egress side (and
   of course not needing to communicate with any tandem device).

   Processing the reverse flow/path in this manner means that a flow
   is already present for the reverse direction without having to re-
   actively or pro-actively consult the controller. This results in a
   further 50% reduction in controller load. In the case of asymmetry
   this optimization is of course not possible.

8. Logical Multicast Forwarding SLSR Packets

   Multicasting packets usually involve one of two approaches.

   The first approach simply re-uses unicast and sends multiple
   copies to a pre-determined list of receivers. There is little to
   discuss with this approach as we can replicate SLSR based unicast
   packets just as easily as any other tunneling mechanism. Clearly
   such a serial unicast approach has nearly identical bandwidth
   overhead as other protocols like VPLS which also use this serial
   unicast mechanism.

   It is therefore interesting to look at more efficient methods that
   involve the second multicast mechanism, which uses replication
   points in the network. These replication points are chosen so that
   copies are more efficiently made thereby eliminating multiple
   copies of the packet traversing any given link. Various logical
   tree structures are usually involved e.g. STP, SPB, TRILL, PIM,
   MOSPF etc.

   These tree based mechanisms could in theory be implemented without
   requiring tandem state as an SLSR by introducing a branch point
   concept into the list of indexes. In this manner a complete tree
   as a pre-order traversal could be encoded along with the packet
   payload. It is not difficult to define a variety of different
   encodings that would accomplish this. The obvious objection to
   such a scheme is the sheer size of header required especially
   where a large network with many multicast receivers is concerned.
   It is therefore unlikely to be practical to encode any large tree
   of receivers and the SLSRs between them in any single header.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 13]

Internet-Draft            SDN state reduction               July 2013

   This leads to a hybrid approach which would encode a subset of the
   tree, say a single replication point and 5 or so recipients. This
   little tree or 'tree-let', would efficiently get a single packet
   to 5 (or some suitably small number) of recipients with an SLSR to
   the replication point and then SLSRs to each of the receivers.
   Such an encoding is much more reasonable than trying to encode all
   receivers and all replication points of a single tree in one
   packet. However, since this one packet would not reach all
   receivers, the head end would have to generate as many copies of
   the data packets as necessary to cover all recipients. As a result
   this approach would be a compromise between a full tree, and full
   head end replication. Variations in the size of the 'tree-let'
   header would allow for space v.s. bandwidth efficiency trade-offs
   while meeting the goal of remaining stateless in the core.

   In the literature there are also non-exact methods to multicast
   without state such as with Bloom filters [BLOOM]. In this approach
   the links to be traversed are logically mapped into a field which
   is carried in the packet (for example if the links are given
   unique 128 bit sparse addresses then a 128 bit union of all the
   links to be traversed on the tree is encoded in the header). These
   mechanisms guarantee that all receivers will get a copy of the
   packet (because they check each link for inclusion in the Bloom
   Filter at each hop) however they do so at the expense of sending
   false positive copies to unintended receivers which must then
   filter the unwanted packets egress. Depending on the size of the
   Bloom Filter and the link identifiers various statistical trade-
   offs in false positives vs. packet header size can be made.

   Other exact methods to encode and methods to compute SLSR
   multicast etc. are FFS.

9. Failure Recovery

   A variety of failure recovery techniques can be employed with
   SLSR. The most obvious is to just re-compute all affected paths on
   indication of a link failure. This won't be discussed further.

   More interesting are the so called fast restoration mechanisms.
   These can broadly be broken down into head end and tandem

   Head end mechanisms that provide 1+1 protection have been around
   for a long time with MPLS-TP, PBT and SONET/DWDM. Similar
   mechanisms can be used with any tunnel type and of course SLSR is
   no exception. Probes can be sent down one source route, reflected
   back along the reverse source route and in this manner the forward
   and reverse paths can be simultaneously probed for failure. In the
   event of a failure a diverse alternate source route can rapidly be

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 14]

Internet-Draft            SDN state reduction               July 2013

   added to the packet and the flow restored. The advantage of course
   with SLSR is that no state is required for either the primary or
   the backup path. As a result there is little added cost to having
   even greater redundancy than 1+1 with SLSR. The mechanisms to
   accomplish this are fairly obvious. Having the reverse path
   available at the egress means that fate sharing the forward and
   reverse probes is easy.

   In addition to 1+1 protection it is possible to do hop by hop fast
   reroute type detour protection. This can be done by substitution
   of a failed link identifier with a set of link identifiers that
   merge with the path downstream of the failure. An example is given
   further below for MPLS label stacks, however many other
   possibilities exist when a history of the packet's path is
   available to the detour mechanism. The history would permit the
   detour mechanism to spread the failing packets over different
   detours and thereby reduce the concentration of additional load
   imposed by the failure on the same set of links.

10. Comparison of Logical Model to Existing Source Routing

   There are a number of existing protocols that support forms of
   source routing (or can be used to do something close to source
   routing). IPV4 and V6 had strict and loose node-by-node source
   routing options (now deprecated) and we'll discuss them briefly.
   Likewise MPLS behavior can be used to do strict link source
   routing where a label stack represents a list of link names, this
   has recently been called segment routing in [SEGMENT].

 10.1. MPLS as a SLSR

   MPLS is of course not a source routed forwarding protocol, at
   least not by design. Rather, packets follow an arbitrary path by
   substitution of a previous hop label with a next hop label and
   each hop must be pre-configured with the <incoming port, label> to
   <outgoing port, label> relationship. This is clearly not source
   routing because tandem configuration is required per path and per
   hop. However MPLS has a stacking mechanism that can be exploited
   to create a consumable list of link names to be traversed as they
   are popped.

   The MPLS label stack can therefore be used to implement a flavor
   of SLSR. This is accomplished by pre-assigning a locally unique
   MPLS label to each outgoing link of a node. For example in figure
   1, node D's link 3 would be assigned MPLS label 3 (but more likely
   a label value which is 1:1 related to link 3, however we stick
   with label=link for simplicity of explanation).

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 15]

Internet-Draft            SDN state reduction               July 2013

   The tunnel encapsulation operation would therefore be to push a
   set of labels onto the frame where each label indicates which link
   to follow at that given exact strict hop. For example:

   |SrcIP     |  DstIP   | SrcPrt | DstPrt | MPLS SLSR             |
   | | | 1000   | 98     |push(3,push(2,push(5)))|
   | | | 1001   | 99     |push(3,push(4,push(5)))|

   The tunnel tandem operation would then be to pop the label on the
   incoming frame (after optionally validating its reverse link
   identifier) and forward to the interface specified by the just
   popped label value. Every tandem node would be pre-configured
   approximately as per below. Note that as with any source routing
   mechanism, this tandem pre-configuration is independent of the
   actual paths that traverse the node. A table like the one below,
   with a few hundred interfaces and hence a few hundred labels,
   could support the transit of an infinite number of TE (or SPF)
   paths. For clarity we use label N = interface/N but in reality it
   would be label N = F(interface/N) since a 1:1 mapping 'F' is
   almost certainly required.

   |Incoming Interface | Label |    Actions                     |
   | any               |   1   |  Pop, forward to interface/1   |
   | any               |   2   |  Pop, forward to interface/2   |
   |  :                |   :   |  :                             |
   | any               |   N   |  Pop, forward to interface/N   |

   If reverse validation is required the tables would be a bit
   different because they must match the label to the incoming
   interface and then pop it and then forward based on the next
   label. Reverse validation therefore requires two label lookups per
   forwarding operation.

   Finally the tunnel egress operation would be normal forwarding to
   a VFI or VRF.

   MPLS in this manner could be made to do SLSR of unicast frames but
   cannot be made to reverse the route because the route is consumed
   in transit. This method also uses many more bits than are really
   necessary. Each label consumes 32 bits which is rather more than
   required to express the number of links/adjacencies on a typical
   switch or router. For example, if the average packet size if 512
   bytes, a 5 hop MPLS source route imposes a 4% overhead (20/512) on

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 16]

Internet-Draft            SDN state reduction               July 2013

   some links with the largest overhead on the first few links. For
   larger packets this is likely not an issue, for smaller packets it
   is possibly a concern.

   A more realistic number actually required per hop is probably 8 or
   12 bits (256-4K links) and if more bits are required two hops can
   be consumed by any node with such a large nodal degree. The MPLS
   label also has an 8 bit TTL which is of course redundant in any
   source routing mechanism. This begs the question of if a smaller
   MPLS label would not be more suitable?

   There are other issues with the use of MPLS, in particular current
   hardware can usually not stack very many labels at a time (3 on
   some popular ASICs). This would limit the network diameter to 4
   hops. Of course NPUs or new ASICs could be extended to allow
   further ingress stacking.

   It does not seem possible to do SLSR multicast with MPLS except of
   course via head end replication.

   The hop(ingress stack size) limit, lack of reverse, consumable
   route and lack of efficient multicast still do not invalidate use
   of MPLS source routing for many networks and its use would have a
   noticeable positive impact on the scale/speed of a central
   controller in such environments.

   MPLS fast reroute mechanisms can also be implemented locally in a
   similar fashion thus further improving controller scale by
   alleviating the need for 50ms responses network wide from the
   controller and giving the controller more lee-way to recover after
   the fast reroutes have detoured traffic around the failed nodes
   and/or links.

   Consider possible local actions when the link A.2 between nodes A
   and B in Figure 1 fails. Since there is still a link A.4
   available, the node A can locally change the action associated
   with label 2 to instead send to interface 4 when interface 2
   fails. If an entire adjacency fails, such as would happen when
   both A.2 and A.4 fail, then a link detour can be locally performed
   by reprogramming the actions for labels 2 and 4 to now push labels
   3,3 and send to interface 3. This will cause a detour via D back
   to B. More elaborate kinds of detour are possible by processing
   two link names ahead instead of one, including nodal detours.
   These can be done locally without end to end path knowledge and
   hence scale independently to the number of paths. Eventually the
   controller will detect the failure and reconstruct the SLSRs at
   the head end and the use of the detour will stop without having to
   withdraw any state in the core.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 17]

Internet-Draft            SDN state reduction               July 2013

   If MPLS is of use in the context of SLSR then it would be worth
   considering a number of future extensions to MPLS. Some things to
   consider could be a smaller MPLS label option, say 16 bits with no
   TTL and the possibility of not popping but rotating the label to
   the bottom of the stack to preserve the path history for OAM and
   reversibility reasons. While these sorts of things are of course
   not possible with existing ASICs they are easy to do on existing
   NPU's and new work on Protocol Oblivious Forwarding [POF] allows
   near arbitrary bit pattern/action matches to be programmed by an
   SDN controller permitting a more optimal data path encoding of
   SLSR than can be obtained by simply reusing MPLS.

10.2. IPV4/6 Options as SLSR

   IP header option 9 [RFC791] defined (but now deprecated) the
   Strict Source and Record Route (SSRR) option for IPV4 packets.
   This option has(had) a 'length' field, a 'pointer' field and an
   array of 'route data' fields. The element in the array of 'route
   data' indexed by the 'pointer' field contains the IP address of
   the immediate next hop towards which the packet must be forwarded,
   the 'pointer' field is incremented, and the previous hop is filled
   in with the IP address of the current device prior to actually
   forwarding the packet. Up to 9 hops could be specified in this
   manner. IPV6 also had a similar option "RH0" which is also now
   deprecated [SRBAD].

   IPV4 and V6 Strict Source and Record Route methods could be used
   to implement Strict Link Source Routing. This would be
   accomplished by assigning a 32 bit number to the link and then
   using the 32 bit number in place of the IPV4 or V6 address in the
   route list.

   In both IPV4 and IPV6 the source routing options were found to be
   harmful to the Internet at large for a number of reasons. These
   reasons are described in [SRBAD] but briefly there were two broad
   classes of problem encountered. 1) Harm to intermediate links and
   2) harm to end hosts. For example:

   - Since it was possible to list a waypoint more than once in the
   route data, it was possible to loop traffic around multiple times
   (9 times in the case of IPV4 and 90 times in the case of IPV6).
   This looping allowed saturation of high speed links by hosts that
   had an order (or two) smaller bandwidth access to the Internet. A
   congestion style DOS was therefore possible from low speed access
   links against higher speed core links.

   - Various schemes such as bypassing of firewalls etc. are of
   course easy to do when a host can specify waypoints that detour
   around a firewall.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 18]

Internet-Draft            SDN state reduction               July 2013

   - Spoofing using the reverse route. Since the reverse source route
   is installed against the IP SA by a host that receives it, it is
   possible to use a bogus IP SA in combination with a reverse source
   route that detours the packets to the imposter host.

10.3. Protocol Oblivious Forwarding as SLSR mechanism

   The OpenFlow [OPENFLOW] protocol defines methods for an external
   controller to cause the manipulation of known packet headers and
   fields within those headers by a forwarding element. As such it is
   currently limited to matching on known fields like MPLS, IP,
   Ethernet etc. and taking actions on those fields. While flexible
   there are still many things at the data path level that OpenFlow
   cannot do including generic source routing such as SLSR.

   The Protocol Oblivious Forwarding [POF] protocol is a proposed
   extension to OpenFlow which permits arbitrary bit pattern
   matching/actions and is therefore much more flexible. The goal of
   POF is to allow a controller to define a new data path in addition
   to a new control plane and to then program the data path on the
   forwarding elements to its specifications. POF is therefore not
   limited to existing IP, MPLS, Ethernet fields.

   It would therefore be possible with POF to implement a highly
   flexible SDN tunnel data plane that closely resembles the
   idealized SLSR data path. Strictly by way of example POF could
   implement a flexible SLSR header along the following lines:

      | NextHop  |   Hop   |   Hop   |  Hop |  Hop |    Hop  |
      | Index:4  | Count:4 |  Size:4 |   0  |   1  |     N   |

   With only five bytes, this header could represent 3 hops with 256
   links per hop, 4 hops with 64 links per hop, or 6 hops with 16
   links per hop, etc. With additional bytes of course more/longer
   combinations are possible with very reasonable overhead. This is
   considerably more compact than the other described options and
   without sacrificing reversibility or giving up the OAM benefits of
   knowing the exact path the packet has taken.

   POF however could also implement other variations of SLSR based on
   MPLS. For example POF could implement a smaller MPLS label, say a
   16 bit label without a TTL. POF could theoretically also implement
   a rotating label list instead of a popping label stack.

   POF appears to be ideally suited for SLSR developments beyond what
   can currently be done with MPLS.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 19]

Internet-Draft            SDN state reduction               July 2013

11. Security Considerations

   Source Routing security concerns are also discussed in the
   previous section related to IPV4 and IPV6 now deprecated nodal
   source routing.

   This draft is proposing link based source routing and that it be
   used as a tunneling mechanism only. This means that only devices
   that are at the edge of an SDN sub-network would be allowed to
   insert strict link source routes. Note that an MPLS label can only
   be inserted by a Label Edge Router (LER) and processed by Label
   Switch Routers (LSR) and not by end hosts. Therefore SLSR should
   be no more or less secure than MPLS. In fact the absence of
   signaling protocols like RSVP-TE removes a point of attack. The
   fact that this mechanism is intended for use by a central
   controller further mitigates the possible attacks as encrypted
   communications are used to the edge devices which are the only
   device able to insert the strict link source routes.

   There is however the possibility that an attacker could attach to
   a core device and inject strict link source routed packets.
   Methods to prevent this however are not hard, in particular the
   adjacency would have to be reported to the controller and the
   controller would have to enable packet forwarding. Unless the
   controller recognized both ends of the link as being part of its
   controlled domain it should not enable the strict link source
   routing capability on that interface thus preventing the threat.

   Other interfaces, such as those facing a network of hosts or
   devices not in the domain of the controller would, as with current
   BCP's, drop any source routed frame in any format (new or old).

   As previously mentioned there are ways to spread the link names
   into a 32 bit space such that the exact mappings are only known by
   the controller and the tandem node in question. This would prevent
   any easy form of guessing being used to construct an SLSR. One
   such example of this kind of secure source routing is given in

   Source Routing also is unique in that the packets themselves give
   details about slices/cuts through the topology, therefore with
   sufficient interception of packets from diverse sources and
   destinations in the network, an attacker could build up a detailed
   view of the network topology, this would be a concern for a
   carrier SDN network in particular where details of topology are
   considered a valuable asset, although exploiting knowledge of the
   topology would be more challenging given the secure protocols that
   exist between a controller and the forwarding entities.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 20]

Internet-Draft            SDN state reduction               July 2013

   In the SDN context there appears to be little need for a loose
   source route. Loose source routing adds additional security
   concerns because it does not require knowledge of the entire path
   to construct an attack. If loose source routing is included the
   security concerns should be addressed.

12. Conclusions and Future work

   SDN where a central controller creates either pro-actively or re-
   actively the state for a sub-network of forwarding devices will
   have performance limitations that are related to network
   diameter/size, network recovery requirements and the amount of
   state they need to distribute. Strict Link Source Routing
   mechanisms can alleviate these problems allowing greater scale and
   faster recovery. MPLS can be used to implement this on a small
   scale with some of the benefits. IPV4 and IPV6 source routing
   options can be used to implement this on a larger scale with more
   of the benefits but at much larger packet overhead but are however
   perceived as risky and have been deprecated from IP. These risks
   however can be mitigated in this specific use. No existing
   mechanism however is optimum, and therefore there is room for a
   new mechanism that addresses these requirements and includes
   multicast methods and more efficient encoding of link names than
   is currently possible. One possible solution is to look at a
   smaller MPLS label for this purpose and to look at ways to retain
   the popped labels for the purposes of end to end path reversal and
   OAM. New work in SDN, in particular Protocol Oblivious Forwarding
   may make these kinds of things possible in a generic manner.

13. IANA Considerations

   This memo includes no request to IANA.

14. References

14.1. Informative References

  [BLOOM]           Active Bloom Filters for Multicast Addressing,
                    Z. Heszberger et. al. Budapest
                    University of Technology and Economics.



  [POF]             Protocol Oblivious Forwarding:

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 21]

Internet-Draft            SDN state reduction               July 2013

  [PLACEMENT]       The Controller Placement Problem, Nick McKeon,
                    Brandon Heller, Rob Sherwood. HotSDN'12, August
                    13, 2012, Helsinki, Finland. 2012 ACM 978-1-

  [RFC791]          Internet Protocol, Information Sciences
                    Institute, RFC 791, September 1981.

  [SANE]            SANE: A Protection Architecture for Enterprise
                    Networks, Martin Casado, Nick McKeown,
                    Stanford and ICSI 2005.

  [SDNGOOG]         SDN at Google - Opportunities for WAN
                    optimization, E. Crabbe, V. Valancius, 8/1/2012.
                    Presentation at IETF84 SDN BOF.

  [SEGMENT]         Segment Routing with IS-IS, S.Previdi et. Al.

  [SLSR]            Software Defined Networking and Centralized
                    Controller State Distribution Reduction,

  [SRBAD]           Deprecation of Source Routing Options in IPV4

  [SRSDN]           Source Routed Forwarding with SDN, M. Soliman

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 22]

Internet-Draft            SDN state reduction               July 2013

15. Authors' Addresses

   Peter Ashwood-Smith
   Huawei Canada Inc.
   303 Terry Fox Drive, Suite 400, Kanata, Ontario K2K 3J1

   Mourad Soliman
   Carleton University,
   1125 Colonel By Drive Ottawa, Ontario K1S 5B6 Canada

   Tao Wan
   Huawei Canada Inc.
   303 Terry Fox Drive, Suite 400, Kanata, Ontario K2K 3J1

16. Contributors

   We invite more contributors.

17. Acknowledgements

   We gratefully appreciate the feedback of Nigel Bragg, Sue Hares,
   Peter Willis, Biswajit Nandy and Linda Dunbar.

Ashwood-Smith, et al.   Expires February 18, 2012           [Page 23]