Framework of Fast Fault Detection for IP-based Networks
draft-wang-ffd-framework-00

Authors Haibo Wang , Fengwei Qin , Lily Zhao , Shuanglong Chen
Last updated 2022-10-24
Network Working Group                                            H. Wang
Internet-Draft                                                    Huawei
Intended status: Informational                                    F. Qin
Expires: 27 April 2023                                      China Mobile
                                                                 L. Zhao
                                                                 S. Chen
                                                                  Huawei
                                                         24 October 2022

        Framework of Fast Fault Detection for IP-based Networks
                      draft-wang-ffd-framework-00

Abstract

   IP-based distributed systems and application-layer software often
   use heartbeats to track network topology status.  However, heartbeat
   intervals are typically long, which prolongs fault detection.
   IP-based storage networks are a typical example of this scenario.
   When a fault occurs in an IP-based storage network, NVMe connections
   need to be switched over.  Currently, no effective method is
   available for quick detection; switchover is triggered only by
   keepalive timeout, which degrades performance.

   This document defines a basic framework in which the network assists
   host devices in quickly detecting application connection failures
   caused by network faults.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Wang, et al.              Expires 27 April 2023                 [Page 1]
Internet-Draft              Abbreviated-Title               October 2022

   This Internet-Draft will expire on 27 April 2023.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Reference Models
   4.  Functional Components
     4.1.  Server Endpoint (Storage Device)
     4.2.  Client Endpoint (Host)
     4.3.  Network Device
   5.  Procedures
     5.1.  Network Deployment
     5.2.  Hosts and Storage Devices
     5.3.  Status Information Sync and Notification
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   IP-based distributed systems are widely used, and the network is
   opaque to application-side systems.  When an IP network connected to
   a distributed system encounters a fault that affects IP connectivity,
   the application system cannot quickly detect the fault.  To detect
   the fault quickly, the application system must either shorten its
   keepalive interval or deploy a separate detection mechanism, both of
   which add overhead to the application system.

   [I-D.guo-ffd-requirement] describes the requirements for these
   applications.  The most typical application scenario is IP-based
   NVMe.

   IP-based NVMe is an implementation of NVMe over Fabrics that fits
   NVMe semantics well, and it is a development trend for high-speed
   storage networks.  IP-based NVMe for high-speed storage places high
   requirements on IP networks.  In an IP-based NVMe network, when a
   failure affects an IP connection, for example, an access link
   failure or a switch network failure that route convergence cannot
   repair, the NVMe connection cannot immediately detect the fault.  In
   current implementations, such a failure can be detected only by
   keepalive timeout, which generally takes more than 10 seconds.  To
   speed up detection, hosts and storage devices can use fast keepalive
   or Bidirectional Forwarding Detection (BFD).  However, either
   approach places additional load on hosts and storage devices, making
   it difficult to use in large-scale IP-based NVMe deployments.
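
   As a rough illustration of the trade-off described above, the
   following sketch computes how long a keepalive scheme can take to
   declare a peer unreachable, and how many keepalive messages an
   endpoint must handle per second.  The timer model (a peer is declared
   dead once no keepalive has arrived for a fixed number of intervals),
   the function names, and all numeric values are assumptions made for
   illustration; they are not taken from any NVMe specification.

```python
def detection_window_ms(interval_ms: int, missed_limit: int) -> tuple[int, int]:
    """Return (best case, worst case) fault detection time in milliseconds.

    Assumed model: the receiver declares the peer dead when no keepalive
    has arrived for interval_ms * missed_limit. A fault right after a
    keepalive arrives waits the full timeout (worst case); a fault just
    before the next keepalive is due waits one interval less (best case).
    """
    timeout_ms = interval_ms * missed_limit
    return (timeout_ms - interval_ms, timeout_ms)


def keepalives_per_second(connections: int, interval_ms: int) -> float:
    """Keepalive messages one endpoint handles per second across connections."""
    return connections * 1000 / interval_ms


if __name__ == "__main__":
    # Hypothetical values: 10,000 connections, 3 missed keepalives allowed.
    for interval in (5000, 100):
        best, worst = detection_window_ms(interval, 3)
        rate = keepalives_per_second(10_000, interval)
        print(f"interval={interval} ms: detect in {best}-{worst} ms, "
              f"{rate:.0f} keepalives/s")
```

   Under these assumptions, a 5-second interval detects a fault in 10 to
   15 seconds, while a 100 ms interval detects it in 200 to 300 ms at 50
   times the message rate per connection.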

2.  Terminology

   NoF: NVMe over Fabrics

   FC: Fibre Channel

   NVMe: Non-Volatile Memory Express

   SAN: Storage Area Network

3.  Reference Models

   This document describes the framework using IP-based NVMe as a
   typical application.

   An IP-based NVMe network mainly includes three types of roles: an
   initiator (referred to as a host), a switch, and a target (referred
   to as a storage device).  Initiators and targets are also referred to
   as client endpoints and server endpoints, respectively.  Hosts and
   storage devices use the IP-based NVMe protocol to transmit data over
   the network to provide high-performance storage services.

                          +--+      +--+      +--+      +--+
              Host        |H1|      |H2|      |H3|      |H4|
           (Initiator)    +/-+      +-,+      +.-+      +/-+
                           |         | '.   ,-`|         |
                           |         |   `',   |         |
                           |         | ,-`  '. |         |
                         +-\--+    +--`-+    +`'--+    +-\--+
                         | SW1|    | SW2|    | SW3|    | SW4|
                         +--,-+    +---,,    +,.--+    +-.--+
                             `.          `'.,`         .`
                               `.   _,-'`    ``'.,   .`
               IP              +--'`+            +`-`-+
             Network           | SW5|            | SW6|
                               +--,,+            +,.,-+
                               .`   `'.,     ,.-``   ',
                             .`         _,-'`          `.
                         +--`-+    +--'`+    `'---+    +-`'-+
                         | SW7|    | SW8|    | SW9|    |SW10|
                         +-.,-+    +-..-+    +-.,-+    +-_.-+
                           | '.   ,-` |        | `.,   .' |
                           |   `',    |        |    '.`   |
                           | ,-`  '.  |        | ,-`  `', |
             Storage      +-`+      `'\+      +-`+      +`'+
             (Target)     |S1|      |S2|      |S3|      |S4|
                          +--+      +--+      +--+      +--+
                          Figure 1 : NVMe over IP-based Network

   Figure 1 shows a dual-plane NVMe over IP-based network, which applies
   to a large-scale storage device access network.  Dual-homed storage
   devices provide NVMe services using two different IP addresses.

   When an access link (for example, the S1-SW7 link) or a network-side
   link (for example, the SW7-SW5 link) fails, H1 cannot access the IP
   address that S1 uses on its link to SW7.  H1 cannot quickly detect
   the failure.  Only after the keepalive timeout does H1 detect the
   failure and switch the NVMe connection to the IP address that S1
   uses on its link to SW8.
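
   The host-side switchover described above can be sketched as follows,
   assuming the host keeps an ordered list of the IP addresses of a
   dual-homed storage device.  The class name, method names, and
   addresses (taken from documentation ranges) are hypothetical, and how
   the host learns of the failure is outside the scope of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class StoragePaths:
    """IP addresses of one storage device, in order of preference."""

    addresses: List[str]
    failed: Set[str] = field(default_factory=set)

    def active(self) -> Optional[str]:
        """Return the first address not marked failed, or None if all failed."""
        for addr in self.addresses:
            if addr not in self.failed:
                return addr
        return None

    def mark_failed(self, addr: str) -> Optional[str]:
        """Mark one address unreachable and return the address to switch to."""
        self.failed.add(addr)
        return self.active()


if __name__ == "__main__":
    # Hypothetical addresses: S1 via SW7 and S1 via SW8.
    s1 = StoragePaths(["192.0.2.11", "198.51.100.11"])
    print("connected to", s1.active())
    print("after failure, switch to", s1.mark_failed("192.0.2.11"))
```

   The faster mark_failed() is invoked after a fault, the shorter the
   service interruption; with keepalive alone, that call is delayed by
   the full timeout.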

4.  Functional Components

   An IP-based NVMe SAN consists of storage devices, hosts, and
   switches.  The storage device provides services, and the host
   initiates an NVMe connection to the storage device.  That is, the
   host is the client endpoint, and the storage device is the server
   endpoint.

4.1.  Server Endpoint (Storage Device)

   As a service provider, the server endpoint does not need to detect
   the status of the client.  To make the server known to the network,
   the server needs to advertise its information to the access switch.

   To keep the server endpoint simple, it is suggested that the Link
   Layer Discovery Protocol (LLDP) be extended to support registration.

                     +-----------+           +--------+
                     | Server EP |           | Switch |
                     | (Storage) |           |        |
                     +----/------+           +----/---+
                          |                       |
                          |    Register Msg       |
                          |---------------------->|
                          |                       |
                          \                       \
                       Figure 2 : Server Endpoint
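
   This document does not define the encoding of the Register message.
   Purely as an illustration of what an LLDP-based registration could
   look like, the sketch below packs an endpoint's IP address and role
   into an LLDP organizationally specific TLV (type 127, with a 7-bit
   type and 9-bit length header followed by a 3-octet OUI and a 1-octet
   subtype).  The OUI, subtype, role codes, and payload layout are
   placeholders invented for this example.

```python
import ipaddress
import struct

LLDP_TLV_ORG_SPECIFIC = 127        # organizationally specific TLV type
PLACEHOLDER_OUI = b"\x00\x00\x00"  # placeholder only, not an assigned value
REGISTER_SUBTYPE = 1               # hypothetical subtype for "Register"
ROLE_HOST, ROLE_STORAGE = 1, 2     # hypothetical role codes


def build_register_tlv(ip: str, role: int) -> bytes:
    """Encode (role, IP address) as an organizationally specific TLV."""
    addr = ipaddress.ip_address(ip).packed  # 4 octets (IPv4) or 16 (IPv6)
    value = PLACEHOLDER_OUI + bytes([REGISTER_SUBTYPE, role]) + addr
    if len(value) > 511:                    # the length field is 9 bits
        raise ValueError("TLV value too long")
    header = (LLDP_TLV_ORG_SPECIFIC << 9) | len(value)
    return struct.pack("!H", header) + value


def parse_register_tlv(tlv: bytes) -> tuple:
    """Decode a TLV built by build_register_tlv into (ip, role)."""
    (header,) = struct.unpack("!H", tlv[:2])
    tlv_type, length = header >> 9, header & 0x1FF
    value = tlv[2:2 + length]
    if (tlv_type != LLDP_TLV_ORG_SPECIFIC
            or value[:3] != PLACEHOLDER_OUI
            or value[3] != REGISTER_SUBTYPE):
        raise ValueError("not a Register TLV")
    return str(ipaddress.ip_address(value[5:])), value[4]


if __name__ == "__main__":
    tlv = build_register_tlv("192.0.2.11", ROLE_STORAGE)
    print(tlv.hex(), parse_register_tlv(tlv))
```

   Carrying registration in a link-local protocol such as LLDP keeps the
   exchange between the endpoint and its directly attached switch, which
   matches the access-switch role shown in Figure 2.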

4.2.  Client Endpoint (Host)

   The client needs to quickly obtain the IP reachability status of the
   server endpoint.  To do so, the client sends a subscription request
   to the access switch.  In addition, so that the network knows the
   location of the client endpoint, the client endpoint needs to
   register its information with the access switch.  When the switch
   network detects a failure that affects a server endpoint to which
   the client endpoint has subscribed, the access switch notifies that
   client endpoint of the fault state.

   Also, to keep client endpoints simple, it is recommended that LLDP be
   extended to support subscriptions.  For notification messages sent by
   the switch to client endpoints, it is recommended that a Layer 2
   protocol extension be used to control the notification scope.

    +-----------+           +--------+
    | Client EP |           | Switch |
    |  (Host)   |           |        |
    +----/------+           +----/---+
         |                       |
         |    Register Msg       |
         |---------------------->|
         |                       |
         |    Subscribe Msg      |
         |---------------------->|
         |                       |
         |   Notification Msg    |
         |<----------------------|
         |                       |
         \                       \
       Figure 3 : Client Endpoint
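
   The exchange in Figure 3 can be modeled on the switch side as a small
   subscription table.  The sketch below is an in-memory model with no
   real network I/O; the class and method names are hypothetical, and
   the rule that a client must register before subscribing is an
   assumption drawn from the message order in the figure.

```python
from collections import defaultdict


class AccessSwitch:
    """In-memory model of one access switch (hypothetical, no real I/O)."""

    def __init__(self, name):
        self.name = name
        self.registered = {}                 # endpoint IP -> role
        self.subscribers = defaultdict(set)  # server IP -> client IPs
        self.notifications = []              # (client IP, failed server IP)

    def register(self, endpoint_ip, role):
        """Handle a Register Msg from a locally attached endpoint."""
        self.registered[endpoint_ip] = role

    def subscribe(self, client_ip, server_ip):
        """Handle a Subscribe Msg for the reachability of server_ip."""
        if client_ip not in self.registered:
            raise KeyError(f"{client_ip} has not registered")
        self.subscribers[server_ip].add(client_ip)

    def on_failure(self, failed_ip):
        """Queue a Notification Msg for each client subscribed to failed_ip."""
        clients = sorted(self.subscribers.get(failed_ip, ()))
        for client_ip in clients:
            self.notifications.append((client_ip, failed_ip))
        return clients


if __name__ == "__main__":
    sw1 = AccessSwitch("SW1")
    sw1.register("192.0.2.1", "host")           # H1, hypothetical address
    sw1.subscribe("192.0.2.1", "192.0.2.11")    # H1 watches S1's address
    print(sw1.on_failure("192.0.2.11"))
```

   Only subscribed clients are notified, so a failure that no local
   client has asked about generates no notification traffic on the
   access links.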

4.3.  Network Device

   Network devices, such as access switches, can quickly detect failures
   on local access links.  The client endpoint that needs to learn of a
   failure may not be connected to the switch that detects it.
   Therefore, the detecting switch needs to synchronize the failure
   information to other switches so that they can notify the subscribed
   endpoints.

   On a large-scale network, a reflector can be used to reduce the
   number of connections for information synchronization between
   switches.

   To ensure that synchronization messages are delivered reliably to
   other switches, a reliable transport protocol, such as TCP or QUIC,
   must be used.

    +--------+    +-----------+   +--------+
    | Switch |    | Reflector |   | Switch |
    +----/---+    +-----/-----+   +---/----+
         |              |             |
         |   Sync Msg   |             |
         |------------->|   Sync Msg  |
         |              |------------>|
         \              \             \
       Figure 4 : Network Device
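
   To illustrate why a reflector reduces the number of synchronization
   connections, the sketch below compares the session count of a full
   mesh with that of a single reflector and models the fan-out shown in
   Figure 4.  The reliable transport is abstracted as in-memory lists,
   and all names are hypothetical.

```python
def full_mesh_sessions(switches: int) -> int:
    """Sessions needed when every switch syncs directly with every other."""
    return switches * (switches - 1) // 2


def reflector_sessions(switches: int) -> int:
    """Sessions needed when every switch syncs only with one reflector."""
    return switches


class Reflector:
    """Forwards each Sync Msg to every attached switch except the sender."""

    def __init__(self):
        self.inboxes = {}  # switch name -> list of (sender, message)

    def attach(self, switch_name):
        self.inboxes[switch_name] = []

    def sync(self, sender, message):
        if sender not in self.inboxes:
            raise KeyError(f"{sender} is not attached")
        for name, inbox in self.inboxes.items():
            if name != sender:
                inbox.append((sender, message))


if __name__ == "__main__":
    print(full_mesh_sessions(10), "vs", reflector_sessions(10))
    rr = Reflector()
    for name in ("SW1", "SW7", "SW8"):
        rr.attach(name)
    rr.sync("SW7", "192.0.2.11 unreachable")  # hypothetical message text
    print(rr.inboxes)
```

   With 10 switches, a full mesh needs 45 sessions while a single
   reflector needs 10.  The trade-off is one extra forwarding hop per
   message, and a single reflector also becomes a point of failure that
   a deployment would need to address.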

5.  Procedures

   This section uses an IP-based NVMe interaction as an example to walk
   through the complete deployment process of this framework.

5.1.  Network Deployment

   IP-based NVMe uses standard IP technology, and network deployments
   typically use existing IP technologies.  For example, OSPF is usually
   deployed as the underlay protocol.

5.2.  Hosts and Storage Devices

   Hosts and storage devices are connected to the IP network.  As shown
   in Figure 1, they may access the network in single-homing or
   dual-homing mode.  The administrator assigns access IP addresses to
   the hosts and storage devices.  In most scenarios, routes to these
   addresses can be advertised through the underlay protocol.

   To make these access nodes known to IP network devices, hosts and
   storage devices need to register their own network information, such
   as IP addresses and roles, with the access switches after accessing
   the network.  In addition, the host needs to send a subscription
   request to the access switch to indicate which storage devices it is
   interested in.

5.3.  Status Information Sync and Notification

   Hosts and storage devices are connected to different switches.  For
   these switches to obtain the registration and subscription
   information of these hosts and storage devices, the information must
   be synchronized between the switches.

 +------+        +--------+   +-----------+   +--------+     +---------+
 | Host |        | Switch |   | Reflector |   | Switch |     | Storage |
 +--/---+        +----/---+   +-----/-----+   +---/----+     +----/----+
    |  Register Msg   |             |             |               |
    |---------------->|             |             |               |
    |  Subscribe Msg  |             |             |               |
    |---------------->|  Sync Msg   |             |               |
    |                 |------------>|   Sync Msg  |               |
    |                 |             |------------>|  Register Msg |
    |                 |             |   Sync Msg  |<--------------|
    |                 |   Sync Msg  |<------------|               |
    |                 |<------------|             |--/            |
    |                 |             |             |  |Fault       |
    |                 |             |             |  |Detection   |
    |                 |             |   Sync Msg  |<--            |
    |                 |   Sync Msg  |<------------|               |
    | Notification Msg|<------------|             |               |
    |<----------------|             |             |               |
    \                 \             \             \               \
            Figure 5 : Information Advertisement

   After detecting a local failure, the switch calculates the IP
   addresses affected by the failure.  If another access endpoint on
   the switch has subscribed to an affected IP address, the switch
   notifies that access endpoint of the fault.  In addition, the switch
   needs to synchronize the failed IP addresses to other switches on
   the network.  After receiving the failed IP address information, the
   other switches notify the access endpoints that have subscribed to
   it.

   When a link between network devices, or a network device itself,
   fails, routes on the network reconverge.  Services may remain
   unavailable even after route convergence, for example, when the
   SW7-SW5 link shown in Figure 1 fails.  In that case, H1 cannot access
   the IP address that S1 uses on its link to SW7.  After detecting the
   failure, the network device calculates the IP addresses affected by
   it and notifies the subscribed access endpoints of the failure
   information.  In Figure 1, SW1 calculates that the IP address that
   S1 uses on its link to SW7 is unreachable.  Therefore, SW1 notifies
   H1 of the failure so that H1 can quickly switch to another storage
   device.
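
   This document does not specify how a network device calculates the
   affected IP addresses.  As one possible illustration, the sketch
   below treats the network as an undirected graph, removes the failed
   link, and reports every registered address whose access switch is no
   longer reachable from the local switch.  A real device would consult
   its routing table instead; the topology is a simplified subset of
   Figure 1, and the function names and addresses are hypothetical.

```python
from collections import deque


def reachable(links, start):
    """Return the set of switches reachable from start over the given links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def affected_addresses(links, failed_link, local_switch, attachments):
    """Addresses whose access switch is unreachable once failed_link is down.

    attachments maps each registered IP address to its access switch.
    """
    remaining = [l for l in links if set(l) != set(failed_link)]
    up = reachable(remaining, local_switch)
    return sorted(ip for ip, sw in attachments.items() if sw not in up)


if __name__ == "__main__":
    # Simplified, assumed subset of Figure 1 as seen from SW1.
    links = [("SW1", "SW5"), ("SW5", "SW7"), ("SW5", "SW8")]
    s1_addresses = {"192.0.2.11": "SW7", "198.51.100.11": "SW8"}
    print(affected_addresses(links, ("SW7", "SW5"), "SW1", s1_addresses))
```

   In this example, only the address that S1 uses on its link to SW7 is
   reported, so a subscribed host would still see the address reached
   through SW8 as usable.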

6.  Security Considerations

   NA

7.  IANA Considerations

   NA

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

8.2.  Informative References

   [I-D.guo-ffd-requirement]
              Guo, L., Feng, Y., Zhao, J., Qin, F., Zhao, L., and H.
              Wang, "Requirement of Fast Fault Detection for IP-based
              Network", Work in Progress, Internet-Draft, draft-guo-ffd-
              requirement-00, 24 October 2022,
              <https://www.ietf.org/archive/id/draft-guo-ffd-
              requirement-00.txt>.

Authors' Addresses

   Haibo Wang
   Huawei
   No. 156 Beiqing Road
   Beijing
   100095
   P.R. China
   Email: rainsword.wang@huawei.com

   Fengwei Qin
   China Mobile
   Beijing
   China
   Email: qinfengwei@chinamobile.com

   Lily Zhao
   Huawei
   No. 3 Shangdi Information Road
   Beijing
   100085
   P.R. China
   Email: Lily.zhao@huawei.com

   Shuanglong Chen
   Huawei
   No. 156 Beiqing Road
   Beijing
   100095
   P.R. China
   Email: chenshuanglong@huawei.com
