Internet Working Group                                        Y. Jiang
                                                                 W. Xu
Internet Draft                                                  Huawei
                                                                Z. Cao
Intended status: Standards Track

Expires: January 2016                                     July 6, 2015


               Fault Management in Service Function Chaining
                          draft-jxc-sfc-fm-02.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on January 6, 2013.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.




Jiang and et al        Expires January 6, 2016                [Page 1]


Internet-Draft            SFC Fault Management               July 2015


Abstract

   This document specifies some Fault Management tools for SFC. It also
   describes the procedures of using these tools to perform connectivity
   verification and SFP tracing functions upon SFC services.



Table of Contents

   1.   Introduction .............................................. 2
      1.1. Conventions used in this document ...................... 3
      1.2. Terminology ............................................ 4
      1.3. SFC OAM Requirements ................................... 4
   2.   Packet Format ............................................. 5
   3.   Theory of Operation ....................................... 8
      3.1. Continuity Check and Connectivity Verification of SFC .. 8
      3.1.1.  MEP sending an SFC CC-CV packet ..................... 8
      3.1.2.  MEP terminating an SFC CC-CV packet ................. 9
      3.2. SFC Route Tracing ...................................... 9
      3.2.1.  MEP sending an SFC Trace Request ................... 11
      3.2.2.  SFF processing an SFC Trace Route Request .......... 11
      3.2.3.  Service Function treating an SFC Trace Request ..... 11
      3.2.4.  MEP receiving an SFC Trace Reply ................... 11
   4.   Security Considerations .................................. 12
   5.   IANA Considerations ...................................... 12
   6.   References ............................................... 12
      6.1. Normative References .................................. 12
      6.2. Informative References ................................ 12
   7.   Acknowledgments .......................................... 13



1. Introduction

   This document discusses Operations, Administration and Maintenance
   (OAM), specifically, fault management of Service Function Chaining
   (SFC), and further provides a solution that can be used in SFC OAM.



   An SFC layer OAM framework was described in [I-D.aldrin-sfc-oam-
   framework], which introduced multiple layers of OAM model including
   the link layer, the network layer, the service layer. The link layer
   and the network layer can leverage existing OAM tool sets available
   to perform OAM function, and thus out of scope of this document.



Jiang and et al        Expires January 6, 2016                [Page 2]


Internet-Draft            SFC Fault Management               July 2015


   This document is focused on the SFC OAM mechanism in the service
   layer. Currently the service layer does not have OAM tools yet, and
   new OAM tools should be developed according to the characteristic of
   service layer. In the design of SFC OAM, one important principle is
   to reuse existing OAM tools as much as possible. As a result, methods
   in this document follow some existing OAM mechanisms, e.g. Ping, BFD,
   etc.

   A requisite of SFC OAM is that SFC OAM messages must follow the same
   data path as normal SFC packets would traverse. This is usually
   called "In-band" signaling. SFC OAM tools are used primarily to
   validate the SFC data plane, and may further be used to verify the
   SFC data plane against the SFC control plane.

   SFC OAM tools discussed in this document include Connectivity
   Function, Continuity Function, and Trace Function.

   SFC Connectivity Function is used to verify the connectivity existing
   between classifier or service function and another service function.
   Connectivity verification packets will travel along the tested
   service chain path to an anticipant endpoint.

   SFC Continuity Function is used to validate or verify the constant
   reachability for a specific SFP. It can detect failure between a
   source device and destination device in a periodic of time, which
   usually be used in fast fault detection.

   SFC Trace Function is used to locate the fault position. The trace
   request packets will travel along the SFP, it will trigger every
   transit device to generate response. The trace reply message will
   carry information for fault location.

   Details of these SFC functions are described in section 3.



1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].








Jiang and et al        Expires January 6, 2016                [Page 3]


Internet-Draft            SFC Fault Management               July 2015


1.2. Terminology

   Maintenance Entity Group (MEG): The set of one or more maintenance
   entities that maintain and monitor a section or a transport path in
   an OAM domain.

   MEP: MEG End Point, an OAM end point capable of initiating (source
   MEP) and terminating (sink MEP) OAM packets for fault management and
   performance monitoring.

   MIP: MEG Intermediate Point, an OAM intermediate point terminates and
   processes OAM packets that are sent to this particular MIP and may
   generate OAM packets in reaction to received OAM packets.

   Service Chaining Header: a header in front of packet, added by an SFF.
   SFF uses service chaining header information to forward service
   chaining packet.

   Service Chaining Packet: an original packet added with a service
   chaining header.

   Terminologies introduced in [I-D.ietf-sfc-architecture] will not be
   repeated here.



1.3. SFC OAM Requirements

   The following SFC OAM requirements MUST be supported:

   (R1) SFC OAM MUST allow for continuity check between SFFs.

   (R2) SFC OAM MUST allow for connectivity verification between SFFs.

   (R3) SFC OAM MUST support trace routing in a service function path.

   (R4) SFC OAM MUST support connectivity verification between SFs in an
   SFC chain.

   (R5) SFC OAM MUST support performance measurements in SFs and SFFs.

   (R6) SFC OAM MUST support monitoring of unidirectional and bi-
   directional SFC path.

   (R7) SFC OAM MUST support fate sharing of SFC OAM packets and SFC
   service packets on the same SFC path (congruent path).



Jiang and et al        Expires January 6, 2016                [Page 4]


Internet-Draft            SFC Fault Management               July 2015




   Since control plane is not a prerequisite for SFC, we cannot resort
   to control plane hello session. Furthermore, OAM packets need to be
   transported on the same data path as the SFC packets, so that any
   data plane failure can be identified.

   Therefore, there is a need to provide an OAM tool that would enable
   users to detect failures in the SFC data plane, and a mechanism to
   isolate and identify faults.

   This document discusses the fault management problem in SFC. The
   basic idea is to verify that packets in a particular Service Function
   Chain actually passing through the SFFs and SFs along the respective
   SFC path.

   It is proposed that this test be carried out by sending an OAM
   message (called an "SFC trace request message") across an SFC path.
   The SFC trace request message carries the SFC identifier whose SFC
   path is being verified.  This SFC OAM request message is forwarded on
   the SFC path just like any SFC data packet belonging to that Service
   Function Chain.

   The OAM message is processed by each SFF along the SFC, and the SFF
   will respond with an SFC trace Reply message, carrying information
   such as the previous SF identifier and its position in the SFC.



2. Packet Format



   An SFC OAM payload is depicted in Figure 1.














Jiang and et al        Expires January 6, 2016                [Page 5]


Internet-Draft            SFC Fault Management               July 2015


     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|R|R|R|R| Message Type  |         Reserved              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Originator Handle                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Remote Handle                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Sequence Number                            |
   |                 Sending Timestamp (seconds)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Sending Timestamp (microseconds)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Receiving Timestamp (seconds)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Receiving Timestamp (microseconds)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        TLVs                                   |
   .                                                               .
   .                                                               .
   .                                                               .
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                          Figure 1 SFC OAM message

   o Version: version of SFC OAM message. This field is 8 bits long,
      and current version is set to 0x01.

   o Message Type  indicate the type of SFC OAM message.

      The SFC OAM message has the following types:

          Value        Meaning

          -----        -------

              1        continuity check message

              2        trace request message

              3        trace reply message



   o Originator Handle: The Originator Handle is filled in by the
      packet original sender.



Jiang and et al        Expires January 6, 2016                [Page 6]


Internet-Draft            SFC Fault Management               July 2015


   o Remote Handle: The Remote Handle indicates the sink point. It
      usually used in verify part of SFP. It should be filled with
      0xffff when verify the whole SFP.

   o Sequence Number: The Sequence Number is assigned by the sender of
      the SFC request message and can be used to track the correct reply
      message.

   o The Sending Timestamp is the time-of-day (in seconds and
      microseconds, according to the sender's clock) when the SFC OAM
      request is sent.

   o The Receiving Timestamp in an SFC OAM reply message is the time-
      of-day (according to the receiver's clock) that the corresponding
      request was received.

   o TLVs (Type-Length-Value) have the following format:

     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |           Length              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Value                                  |
   .                                                               .
   .                                                               .
   .                                                               .
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                           Figure 2 SFC OAM TLVs

   Types of SFC OAM TLV will be defined in the next revision; Length is
   the length of the Value field in octets; and Value field is variable
   depending on its Type (it is zero padded to align to a 4-octet
   boundary).














Jiang and et al        Expires January 6, 2016                [Page 7]


Internet-Draft            SFC Fault Management               July 2015


3. Theory of Operation

   In order to describe SFC OAM in an abstract way, we reuse some
   nomenclatures in MPLS WG. SFC OAM operates in the context of
   Maintenance Entities (MEs) that define a relationship between two
   points of a service function path to which maintenance and monitoring
   operations apply. The two points that define a maintenance entity are
   called Maintenance Entity Group End Points (MEPs).

   An abstract reference model for an ME is illustrated in Figure 4
   below:

                         +-+    +-+    +-+    +-+
                         |A|----|B|----|C|----|D|
                         +-+    +-+    +-+    +-+
                      Figure 3 SFC OAM Reference Model

   In Figure 4, node A can be a classifier or an entry SFF, node D can
   be an exit SFF, and node B and C can be any SFF or SF (with the
   restriction that any two SFs cannot be directly connected in SFC
   forwarding layer) on the SFC path.

   In general, MEG End Points (MEPs) are the source and sink points of a
   MEG for SFC OAM.

3.1. Continuity Check and Connectivity Verification of SFC

   Proactive Continuity Check (CC) can be used to detect a loss of
   continuity defect between two MEPs in a MEG.

   Proactive Connectivity Verification (CV) can be used to detect an
   unexpected connectivity defect between two MEGs or unexpected
   connectivity within the MEG with an unexpected MEP.

   BFD can also be used as a tool of proactive CC & CV in SFC, where BFD
   Control packets must be sent along the same path as the monitored SFC
   path.


3.1.1. MEP sending an SFC CC-CV packet

   A source MEP can proactively sends CC-CV packets periodically to its
   sink peer MEP. An SFC CC-CV packet is an SFC CC-CV message
   encapsulated with an SFC Header. The SFC header is set as described
   in [I-D. draft-ietf-sfc-nsh-00] and its flag O MUST be set to 1.

   The SFC OAM message is set as follows:


Jiang and et al        Expires January 6, 2016                [Page 8]


Internet-Draft            SFC Fault Management               July 2015


   - The message type MUST be set to 1.

   - The Sender's Handle is set by the original sender, and MUST be set
      with the sender's identifier.

   - The Sequence Number is set with a random value.


3.1.2. MEP terminating an SFC CC-CV packet

   A sink MEP detects a loss of continuity defect when it fails to
   receive proactive CC-V OAM packets from the source MEP for a
   consecutive time.

   When CC-V packets are received by a sink MEP, it is parsed. If any
   mis-connectivity defect is detected, a warning should be raised and
   fault management system should be notified of the detected defects.


3.2. SFC Route Tracing

   According to the SFC architecture [I-D. draft-ietf-sfc-architecture-
   09], SFC can be categorized into two abstraction layers, that is,
   service function layer and SFC forwarding layer. In the service
   function layer, a service function chain actually is a service
   function graph, where a service function is connected to another
   service function one by one in sequence. In the SFC forwarding layer,
   service functions are further attached to SFF nodes thus form a more
   detailed forwarding graph. As defects can be located on either
   service functions or SFF nodes, it is critical to trace route both
   service functions and SFF nodes to detect and isolate any defects for
   SFC.

   In order to trace route of a service function chain, different layers
   of service function chain can be monitored:

   o Service-function-layer, that is, only SF identifiers can be set as
      the destination MEP in the trace route request and response
      messages. The trace routing operation collects all the SFs'
      identifiers along an SFC path. By comparing this SF list with the
      pre-configured service function graph, an operator could determine
      whether there is any fault in the SF connectivity and locate the
      defect on an SF when there are any of them.






Jiang and et al        Expires January 6, 2016                [Page 9]


Internet-Draft            SFC Fault Management               July 2015


   o SFC-forwarding-layer, that is, both SF identifiers and SFF can be
      set as the destination MEP in the trace route request and response
      messages.  The trace routing operation collects all the SFs'
      identifiers and SFF identifiers along an SFC path. By comparing
      this SF and SFF list with the pre-configured SFC forwarding graph,
      an operator could determine whether there is any fault in the
      forwarding layer and locate the defect on an SFF or an SF.

   Furthermore, two different mechanisms may be used to trace route a
   service function chain:

   o TTL mechanism

   Similar to the IP trace route, the detection node launches a number
   of trace request messages in sequence to detect the fault in a
   specific path, the TTL of request message is set successively to 1,
   2, ..., and so on.

   The trace route request will pass the SFs along the service function
   graph, and each SF will decrease the TTL value by 1.

   A trace route reply message will be generated and send back to the
   launcher when the resulted TTL is equal to zero.

   In this way, the launcher of trace routing can get the list of SFs
   that the trace route request message passes by parsing all the trace
   route reply messages, and isolate the fault location if there is any.



   o record route mechanism

   The detection node launches a single trace route request message, and
   this message is transported over the specific SFC path.

   When the trace route request message is received by an SF in the SFC
   path, the SF adds its SF identifier to the end of an SF list carried
   in the message. Moreover, a trace route reply message should be
   generated and sent back to the launcher, and the new record route SF
   list MUST be copied to the trace route reply message.

   In this way, the launcher of trace routing can get the list of SFs
   that the trace route request message passes by parsing all the trace
   route reply messages, and isolate the fault location if there is any.





Jiang and et al        Expires January 6, 2016               [Page 10]


Internet-Draft            SFC Fault Management               July 2015


3.2.1. MEP sending an SFC Trace Request

   In general, MEG End Points (MEPs) are the source and sink points of a
   MEG for SFC OAM. An MEP initiates a trace route request packet to
   detect and track any fault in a Service Function Chain.
   An SFC Trace route request packet is an SFC trace route request
   message encapsulated with an SFC Header. The SFC header is set as
   described in [I-D.draft-ietf-sfc-nsh-00] and flag O MUST be set to 1.
   The SFC OAM message is further set as follows:
   - The message type MUST be set to 2.
   - The Sender's Handle MUST be set to the sender's identifier.
   - The Receiver's Handle can be set to the exit SFF's identifier.


3.2.2. SFF processing an SFC Trace Route Request

   When an SFF receives a trace route request packet with O flag being
   set in SFC header, it firstly adds its identifier to the end of the
   record route list in the trace request. It then performs service
   forwarding function, and sends the new trace route request packet to
   the next SF or next SFF.

   Furthermore, the SFF sends a trace reply packet back to the source
   MEP with a copy of the new record route SF list.



3.2.3. Service Function treating an SFC Trace Request

   An SF can only be configured as an MIP in an MEG of SFC. When an SF
   (being an MIP) receives a trace request packet with OAM flag being
   set in SFC header from an SFF, it only sends it back to the SFF
   transparently.



3.2.4. MEP receiving an SFC Trace Reply

   An MEP should only process an SFC trace reply packet in response to
   an SFC trace request that it has sent. Thus, upon receipt of an SFC
   trace reply packet, an MEP should try to match the trace reply packet
   with a trace request that it has previously sent, by checking the
   corresponding path identifier and Sequence Number in the SFC OAM
   packets. If no match is found, then the MEP MUST drop the trace reply
   packet silently.




Jiang and et al        Expires January 6, 2016               [Page 11]


Internet-Draft            SFC Fault Management               July 2015


   Since each SFF in the SFC path will send a trace reply packet when
   the trace request packet passes it, a source MEP will receive a
   sequence of trace reply packets from SFFs (other than the MEP itself)
   along the SFC path. Thus, the source MEP can get the full service
   topology and SFC path if there is no defect in the SFC data plane,
   and could detect and locate the data plane defects if there are any
   of them.



4. Security Considerations

   It will be considered in a future revision.

5. IANA Considerations

   It will be considered in a future revision.



6. References

6.1. Normative References




6.2. Informative References

   [I-D.ietf-sfc-problem-statement] P. Quinn, and T. Nadeau; Service
             Function Chaining Problem Statement; August 2014; Work in
             Progress

   [I-D.ietf-sfc-architecture] J. Halpern, C. Pignataro; Service
             Function Chaining (SFC) Architecture; September 2014; Work
             in Progress

   [I-D.aldrin-sfc-oam-framework] S. Aldrin; C. Pignataro; N. Akiya
             Service Function Chaining Operations, Administration and
             Maintenance Framework; July 2014; Work in Progress









Jiang and et al        Expires January 6, 2016               [Page 12]


Internet-Draft            SFC Fault Management               July 2015


7. Acknowledgments

   TBD




   Authors' Addresses

   Yuanlong Jiang
   Huawei Technologies Co., Ltd.
   Bantian, Longgang district
   Shenzhen 518129, China
   Email: jiangyuanlong@huawei.com


   Weiping Xu
   Huawei Technologies Co., Ltd.
   Bantian, Longgang district
   Shenzhen 518129, China
   Email: xuweiping@huawei.com

   Zhen Cao
   Email: zhencao.ietf@gmail.com
























Jiang and et al        Expires January 6, 2016               [Page 13]