Link State Over Ethernet
draft-ymbk-lsvr-lsoe-00

The information below is for an old version of the document; its latest
revision state is "Replaced".

Authors:       Randy Bush, Keyur Patel
Last updated:  2018-03-14
Replaced by:   draft-ietf-lsvr-lsoe
IESG state:    I-D Exists

Network Working Group                                            R. Bush
Internet-Draft                                              Arrcus & IIJ
Intended status: Standards Track                                K. Patel
Expires: September 14, 2018                                       Arrcus
                                                          March 13, 2018

                        Link State Over Ethernet
                        draft-ymbk-lsvr-lsoe-00

Abstract

   When used in a Massive Data Center (MDC), BGP-LS and BGP-SPF need
   link neighbor discovery, liveness, and addressability data.  Link State
   Over Ethernet protocols provide link discovery, exchange AFI/SAFIs,
   and discover addresses over raw Ethernet.  These data are pushed
   directly to BGP-LS/SPF, obviating the need for centralized controller
   architectures.  This protocol is more widely applicable, and has been
   designed to support a wide range of routing and similar protocols
   which need link discovery and characterisation.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
   be interpreted as described in RFC 2119 [RFC2119] only when they
   appear in all upper case.  They may also appear in lower or mixed
   case as English words, without normative meaning.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2018.

Bush & Patel           Expires September 14, 2018               [Page 1]
Internet-Draft          Link State Over Ethernet              March 2018

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
   4.  Top Level Overview
   5.  Ethernet to Ethernet Protocols
     5.1.  Inter-Link Ether Protocol Overview
     5.2.  PDUs and Frames
       5.2.1.  Frame TLV
       5.2.2.  Link KeepAlive / Hello
       5.2.3.  Capability Exchange
       5.2.4.  Timer Negotiation
     5.3.  The AFI/SAFI Exchanges
       5.3.1.  AFI/SAFI Capability Exchange
       5.3.2.  The AFI/SAFI PDU Skeleton
       5.3.3.  AFI/SAFI ACK
       5.3.4.  Add/Drop/Prim
       5.3.5.  IPv4 Announce / Withdraw
       5.3.6.  IPv6 Announce / Withdraw
       5.3.7.  MPLS IPv4 Announce / Withdraw
       5.3.8.  MPLS IPv6 Announce / Withdraw
   6.  Layer 2.5 and 3 Liveness
   7.  The North/South Protocol
     7.1.  Topology Request for Full State
     7.2.  PDU from Link Layer to Shim
     7.3.  Link/ASN sub-PDU
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgments
   11. Normative References
   Authors' Addresses

1.  Introduction

   The Massive Data Center (MDC) environment presents unusual problems
   of scale, e.g.  O(10,000) switches, while its homogeneity presents
   opportunities for simple approaches.  Approaches such as Jupiter
   Rising use a central controller to deal with scaling, while BGP-SPF
   [I-D.keyupate-idr-bgp-spf] provides massive scale out without
   centralization using a tried and tested scalable distributed control
   plane, offering a scalable routing solution in Clos and similar
   environments.  But it needs link state and addressing data from the
   network to build the routing topology.  LLDP has scaling issues, e.g.
   in extending a PDU beyond 1,500 bytes.

   Link State Over Ethernet (LSOE) provides brutally simple mechanisms
   for devices to

   o  Discover each other's MACs,

   o  Run MAC keep-alives for liveness assurance,

   o  Discover each other's ASNs,

   o  Negotiate mutually supported AFI/SAFIs,

   o  Discover and maintain link IP/MPLS addresses,

   o  Enable layer three link liveness such as BFD, and finally

   o  Push these data up to BGP-SPF which computes the topology and
      builds routing and forwarding tables.

   This protocol is more widely applicable than BGP-SPF, and has been
   designed to support a wide range of routing and similar protocols
   which need link discovery and characterisation.

2.  Terminology

   Even though it concentrates on the Ethernet layer, this document
   relies heavily on routing terminology.  The following are some
   possibly confusing terms:

   AFI/SAFI:  Address Family Identifier and Subsequent Address Family
              Identifier, i.e. classes of addresses such as IPv4, IPv6,
              ...
   ASN:       Autonomous System Number, a BGP identifier for an
              originator of routing, particularly BGP, announcements.
   BGP-SPF:   A hybrid protocol using BGP transport but a Dijkstra SPF
              decision process.  See [I-D.keyupate-idr-bgp-spf].
   Clos:      A hierarchic switch topology commonly used in data
              centers.
   Frame:     The payload of an Ethernet packet.
   MAC:       Medium Access Control, essentially an Ethernet address,
              six octets.
   MDC:       Massive Data Center, O(1,000) TORs or more.
   PDU:       Protocol Data Unit, essentially an application layer
              message.
   SPF:       Shortest Path First, an algorithm for finding the shortest
              paths between nodes in a graph.
   TOR:       Top Of Rack switch, aggregates the servers in a rack and
              connects to the Clos spine.
   ZTP:       Zero Touch Provisioning gives devices initial addresses,
              credentials, etc. on boot/restart.

3.  Background

   LSOE assumes a Clos-like topology, though the acyclic constraint is
   not necessary.

   While LSOE is designed for the MDC, there are no inherent reasons it
   could not run on a WAN; though it is not clear that this would be
   useful.  The authentication and authorisation needed to run safely on
   the WAN are not (yet) included in this protocol.

   LLDP is not suitable because one cannot extend a PDU beyond 1,500
   bytes without hitting an IPR barrier.  It is also complex.

   UDP is unsuitable as it would require prior knowledge of IP level
   addressing, one of the key purposes of this discovery protocol.

   LSOE assumes a new IEEE assigned EtherType (TBD).

4.  Top Level Overview

   o  MAC Link State is exchanged over Ethernet

   o  AFI/SAFI data are exchanged and IP-Level Liveness Checks done

   o  BGP-SPF uses the data to discover and build the topology database

   +-------------------+   +-------------------+   +-------------------+
   |      Device       |   |      Device       |   |      Device       |
   |                   |   |                   |   |                   |
   |+-----------------+|   |+-----------------+|   |+-----------------+|
   ||                 ||   ||                 ||   ||                 ||
   ||     BGP-SPF     <+---+>     BGP-SPF     <+---+>     BGP-SPF     ||
   ||                 ||   ||                 ||   ||                 ||
   |+--------^--------+|   |+--------^--------+|   |+--------^--------+|
   |         |         |   |         |         |   |         |         |
   |         |         |   |         |         |   |         |         |
   |+--------+--------+|   |+--------+--------+|   |+--------+--------+|
   ||    Liveness     ||   ||    Liveness     ||   ||    Liveness     ||
   ||    AFI/SAFIs    ||   ||    AFI/SAFIs    ||   ||    AFI/SAFIs    ||
   ||    Addresses    ||   ||    Addresses    ||   ||    Addresses    ||
   |+--------^--------+|   |+--------^--------+|   |+--------^--------+|
   |         |         |   |         |         |   |         |         |
   |         |         |   |         |         |   |         |         |
   |+--------v--------+|   |+--------v--------+|   |+--------v--------+|
   ||                 ||   ||                 ||   ||                 ||
   ||   Ether PDUs    <+---+>   Ether PDUs    <+---+>   Ether PDUs    ||
   ||                 ||   ||                 ||   ||                 ||
   |+-----------------+|   |+-----------------+|   |+-----------------+|
   +-------------------+   +-------------------+   +-------------------+

   There are two sets of protocols:

   o  Ethernet to Ethernet protocols are used to exchange layer 2 data,
      i.e. MACs, and layer 2.5 and 3 data, i.e. ASNs, AFI/SAFIs, and
      interface addresses.

   o  A Link Layer to BGP protocol pushes these data up the stack to
      BGP-SPF, converting to the BGP-LS BGP-like data format.

   o  And, of course, the BGP layer crosses all the devices, though it
      is not part of these LSOE protocols.

5.  Ethernet to Ethernet Protocols

   This section describes the basic Ethernet-framed protocols.

5.1.  Inter-Link Ether Protocol Overview

   |       Hello / KeepAlive (type=0)       |
   |--------------------------------------->|
   |                                        | MACs and Liveness
   |       Hello / KeepAlive (type=0)       | Mandatory
   |<---------------------------------------|
   |                                        |
   |                                        |
   |                                        |
   |         Timers (type=1, cap 1)         |
   |--------------------------------------->| Timers (type 1, cap 1)
   |                                        | Optional
   |         Timers (type=1, cap 1)         | Renegotiate at Any Time
   |<---------------------------------------|
   |                                        |
   |                                        |
   |                                        |
   |     Link AFI/SAFIs (type=1, cap 4)     |
   |--------------------------------------->| AFI/SAFI Support (cap 4)
   |<---------------------------------------| Mandatory
   |     Link AFI/SAFIs (type=1, cap 4)     | Renegotiate at Any Time
   |                                        |
   |                                        |
   |                                        |
   |   Interface MPLS Labels (type=13/14)   |
   |--------------------------------------->| Interface Labels
   |                                        | Optional
   |   Interface MPLS Labels (type=13/14)   | Renegotiate at Any Time
   |<---------------------------------------|
   |                                        |
   |                                        |
   |                                        |
   |   Interface IPv4 Addresses (type=11)   |
   |--------------------------------------->| Interface IPv4 Addresses
   |                                        | Optional
   |   Interface IPv4 Addresses (type=11)   | Renegotiate at Any Time
   |<---------------------------------------|
   |                                        |
   |                                        |
   |                                        |
   |   Interface IPv6 Addresses (type=12)   |
   |--------------------------------------->| Interface IPv6 Addresses
   |                                        | Optional
   |   Interface IPv6 Addresses (type=12)   | Renegotiate at Any Time
   |<---------------------------------------|

5.2.  PDUs and Frames

   This is all about inter-device Link State.

   A PDU is one or more Ethernet Frames.

   A Frame has a PDU Sequence Number and a Frame Number to allow
   assembly of out-of-order frames.

   Because BGP-SPF and Data Plane payloads are assumed to be IP over the
   same Ethernet, one worries about congestion.

5.2.1.  Frame TLV

   The basic Ethernet PDU is a typical TLV (Type Length Value) PDU,
   except it's really LTV for the sake of alignment :)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |
   +-+-+-+-+-+-+-+-+

   The fields of the basic Ethernet PDU are as follows:

   PDU Sequence No:  Semi-unique identifier of a TLV PDU (e.g. the low
      order 16 bits of UNIX time)

   Frame No:  0..255 Frame Sequence Number Within a multi-frame PDU

   Flags:  A bit field

         0 - Sender has been restarted
         1 - One of a multi-Frame sequence
         2 - last of a multi-Frame sequence
         3-7 - Reserved

   Checksum:  One's complement checksum over the Frame, to detect bit
      flips

   Length:  Total bytes in the PDU, including all frames and fields

   Type:  An integer

         0 - Hello / KeepAlive
         1 - Capability
         2-9 - Reserved
         10 - AFI/SAFI ACK
         11 - IPv4 Announce / Withdraw
         12 - IPv6 Announce / Withdraw
         13 - MPLS IPv4 Announce / Withdraw
         14 - MPLS IPv6 Announce / Withdraw
         15-255 Reserved
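
   As an illustration only, the fixed nine-octet frame header described
   above might be serialised as in the following C sketch.  The struct,
   the function names, and the use of network (big-endian) byte order
   are assumptions of this example; the draft does not state a byte
   order.  The Flags octet is carried opaquely because the bit numbering
   convention is not specified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names only; none of these identifiers come from the draft. */
struct lsoe_hdr {
    uint16_t pdu_seq;   /* PDU Sequence No */
    uint8_t  frame_no;  /* Frame No, 0..255 */
    uint8_t  flags;     /* Flags bit field, carried opaquely here */
    uint16_t checksum;  /* Checksum */
    uint16_t length;    /* Length */
    uint8_t  type;      /* Type */
};

#define LSOE_HDR_LEN 9u /* 2 + 1 + 1 + 2 + 2 + 1 octets */

/* Assumption: multi-octet fields are big-endian on the wire. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns the number of octets written, or 0 if buf is too small. */
size_t lsoe_hdr_pack(const struct lsoe_hdr *h, uint8_t *buf, size_t buflen)
{
    assert(h != NULL && buf != NULL);
    if (buflen < LSOE_HDR_LEN)
        return 0;
    put_be16(buf + 0, h->pdu_seq);
    buf[2] = h->frame_no;
    buf[3] = h->flags;
    put_be16(buf + 4, h->checksum);
    put_be16(buf + 6, h->length);
    buf[8] = h->type;
    return LSOE_HDR_LEN;
}

/* Returns 0 on success, -1 if fewer than nine octets are available. */
int lsoe_hdr_unpack(const uint8_t *buf, size_t buflen, struct lsoe_hdr *h)
{
    assert(buf != NULL && h != NULL);
    if (buflen < LSOE_HDR_LEN)
        return -1;
    h->pdu_seq  = get_be16(buf + 0);
    h->frame_no = buf[2];
    h->flags    = buf[3];
    h->checksum = get_be16(buf + 4);
    h->length   = get_be16(buf + 6);
    h->type     = buf[8];
    return 0;
}
```

   Packing octet by octet, rather than casting a struct onto the buffer,
   avoids any dependence on compiler padding; the header is nine octets
   long and so the payload that follows is not naturally aligned.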

5.2.1.1.  The Checksum

   There is a reason conservative folk use a checksum in UDP.  And when
   the operators stretch to jumbo frames ...

   One's complement is a bit silly, though trivial to implement and
   might be sufficient.

   Sum up either 16-bit shorts in a 32-bit int, or 32-bit ints in a
   64-bit long, then take the high-order section, shift it right,
   rotate, add it in, repeat until zero.  -- smb off the top of his head
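
   The 16-bit variant of the folding procedure just described could be
   sketched in C as follows.  This is one reading of the description,
   not normative text: the function name, the big-endian pairing of
   octets, and the zero padding of an odd trailing octet are assumptions
   of this example.  The function returns the folded sum; whether the
   value placed in the Checksum field should be inverted is not stated
   in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 16-bit words into a 32-bit accumulator, then fold the carries
   back in until the high-order half is zero. */
uint16_t ones_complement_sum(const uint8_t *b, size_t n)
{
    assert(b != NULL || n == 0);
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < n; i += 2)
        sum += ((uint32_t)b[i] << 8) | b[i + 1];
    if (i < n)                       /* odd length: pad the last octet */
        sum += (uint32_t)b[i] << 8;

    while (sum >> 16)                /* repeat until no carry remains */
        sum = (sum >> 16) + (sum & 0xFFFF);
    return (uint16_t)sum;
}
```

   A 32-bit accumulator cannot overflow for inputs up to 64k octets,
   which is the largest value the 16-bit Length field can express.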

   /* The F table from Skipjack, and it would work for the S-Box.
      There are other S-Box sources as well. -- Russ Housley */
   #include <stddef.h>
   #include <stdint.h>

   const uint8_t sbox[256] = {
   0xa3,0xd7,0x09,0x83,0xf8,0x48,0xf6,0xf4,0xb3,0x21,0x15,0x78,
   0x99,0xb1,0xaf,0xf9,0xe7,0x2d,0x4d,0x8a,0xce,0x4c,0xca,0x2e,
   0x52,0x95,0xd9,0x1e,0x4e,0x38,0x44,0x28,0x0a,0xdf,0x02,0xa0,
   0x17,0xf1,0x60,0x68,0x12,0xb7,0x7a,0xc3,0xe9,0xfa,0x3d,0x53,
   0x96,0x84,0x6b,0xba,0xf2,0x63,0x9a,0x19,0x7c,0xae,0xe5,0xf5,
   0xf7,0x16,0x6a,0xa2,0x39,0xb6,0x7b,0x0f,0xc1,0x93,0x81,0x1b,
   0xee,0xb4,0x1a,0xea,0xd0,0x91,0x2f,0xb8,0x55,0xb9,0xda,0x85,
   0x3f,0x41,0xbf,0xe0,0x5a,0x58,0x80,0x5f,0x66,0x0b,0xd8,0x90,
   0x35,0xd5,0xc0,0xa7,0x33,0x06,0x65,0x69,0x45,0x00,0x94,0x56,
   0x6d,0x98,0x9b,0x76,0x97,0xfc,0xb2,0xc2,0xb0,0xfe,0xdb,0x20,
   0xe1,0xeb,0xd6,0xe4,0xdd,0x47,0x4a,0x1d,0x42,0xed,0x9e,0x6e,
   0x49,0x3c,0xcd,0x43,0x27,0xd2,0x07,0xd4,0xde,0xc7,0x67,0x18,
   0x89,0xcb,0x30,0x1f,0x8d,0xc6,0x8f,0xaa,0xc8,0x74,0xdc,0xc9,
   0x5d,0x5c,0x31,0xa4,0x70,0x88,0x61,0x2c,0x9f,0x0d,0x2b,0x87,
   0x50,0x82,0x54,0x64,0x26,0x7d,0x03,0x40,0x34,0x4b,0x1c,0x73,
   0xd1,0xc4,0xfd,0x3b,0xcc,0xfb,0x7f,0xab,0xe6,0x3e,0x5b,0xa5,
   0xad,0x04,0x23,0x9c,0x14,0x51,0x22,0xf0,0x29,0x79,0x71,0x7e,
   0xff,0x8c,0x0e,0xe2,0x0c,0xef,0xbc,0x72,0x75,0x6f,0x37,0xa1,
   0xec,0xd3,0x8e,0x62,0x8b,0x86,0x10,0xe8,0x08,0x77,0x11,0xbe,
   0x92,0x4f,0x24,0xc5,0x32,0x36,0x9d,0xcf,0xf3,0xa6,0xbb,0xac,
   0x5e,0x6c,0xa9,0x13,0x57,0x25,0xb5,0xe3,0xbd,0xa8,0x3a,0x01,
   0x05,0x59,0x2a,0x46
   };

   /* example C code, constant time even, thanks Rob Austein */

   uint16_t sbox_checksum(const uint8_t *b, const size_t n)
   {
     /* Sum S-box outputs of even- and odd-indexed octets separately. */
     uint32_t sum[2] = {0, 0};
     for (size_t i = 0; i < n; i++)
       sum[i & 1] += sbox[b[i]];
     /* Combine the two sums, then fold the carries into 16 bits. */
     uint32_t result = (sum[0] << 8) + sum[1];
     result = (result >> 16) + (result & 0xFFFF);
     result = (result >> 16) + (result & 0xFFFF);
     return (uint16_t) result;
   }

5.2.2.  Link KeepAlive / Hello

   The Hello and KeepAlive PDUs are one and the same.

   Each device learns the other's MAC from its HELLO whining.  I.e., all
   devices on a wire/interface know each other's MACs and learn each
   other's ASNs.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |          Length = 17          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 0   |                     MyASN                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |               YourASN (or Zero)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |
   +-+-+-+-+-+-+-+-+

   Once two devices know each other's MACs, Ethernet keep-alives may be
   started to ensure layer two liveness.  The timing and acceptable drop
   of the keep-alives may be set with the Timer Negotiation capability
   exchange.
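
   As a sketch under stated assumptions, a single-frame Hello of
   Length = 17 could be assembled as follows.  The function and macro
   names are hypothetical, network (big-endian) byte order is assumed,
   and the Checksum field is left as zero for the caller to fill in once
   a checksum algorithm is settled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LSOE_HELLO_LEN  17u  /* 9-octet header + MyASN + YourASN */
#define LSOE_TYPE_HELLO 0u   /* Type 0: Hello / KeepAlive */

/* Assumption: multi-octet fields are big-endian on the wire. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v & 0xFF);
}

/* Writes a Hello into buf and returns the octets written (17), or 0
   if buf is too small.  Pass your_asn = 0 while the peer is unknown. */
size_t lsoe_build_hello(uint8_t *buf, size_t buflen, uint16_t pdu_seq,
                        uint32_t my_asn, uint32_t your_asn)
{
    assert(buf != NULL);
    if (buflen < LSOE_HELLO_LEN)
        return 0;
    put_be16(buf + 0, pdu_seq);         /* PDU Sequence No */
    buf[2] = 0;                         /* Frame No: single frame */
    buf[3] = 0;                         /* Flags: none set */
    put_be16(buf + 4, 0);               /* Checksum: filled in later */
    put_be16(buf + 6, LSOE_HELLO_LEN);  /* Length */
    buf[8] = LSOE_TYPE_HELLO;           /* Type */
    put_be32(buf + 9, my_asn);          /* MyASN */
    put_be32(buf + 13, your_asn);       /* YourASN (or Zero) */
    return LSOE_HELLO_LEN;
}
```

   For example, calling lsoe_build_hello(buf, sizeof buf, seq, 65001, 0)
   would produce the first Hello sent before the neighbor's ASN has been
   learned.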

5.2.3.  Capability Exchange

   Peers on the Ethernet exchange capabilities, such as timers, AFI/
   SAFIs supported, etc.  There is a simple capability exchange.

   By convention, the device with the lowest MAC sends first.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 1   |    RADflag    |           Capability          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The RADflag is an integer field which signals the capability
   negotiation.

      bit 0 - Request
      bit 1 - Accept
      bit 2 - Deny
      bits 3-7 - Reserved
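
   The "lowest MAC sends first" convention and the RADflag values could
   be expressed as in the sketch below.  Two assumptions are made here
   that the draft does not settle: "lowest" is taken to mean the
   numerically smallest six-octet value when compared octet by octet
   from the first transmitted octet, and "bit n" is taken to mean the
   mask 1 << n.  All identifiers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: bit n of the RADflag octet corresponds to mask 1 << n. */
enum lsoe_radflag {
    LSOE_RAD_REQUEST = 1u << 0,
    LSOE_RAD_ACCEPT  = 1u << 1,
    LSOE_RAD_DENY    = 1u << 2
};

/* Returns 1 if the local device has the lower MAC and should therefore
   send its capability request first, 0 otherwise. */
int lsoe_sends_first(const uint8_t local_mac[6], const uint8_t peer_mac[6])
{
    assert(local_mac != NULL && peer_mac != NULL);
    /* memcmp compares as unsigned octets, most significant first. */
    return memcmp(local_mac, peer_mac, 6) < 0;
}
```

   Because both ends evaluate the same comparison on the same pair of
   addresses, exactly one of them concludes that it should initiate.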

5.2.4.  Timer Negotiation

   Different operational scenarios may call for layer two and layer
   three timers which differ from the defaults.  So there is a
   capability negotiation to modify these timers.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |          Length = 16          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 1   |    RADflag    |         Capability = 1        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Frequency           |  AllowMissCt  |    A/S Wait   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The meanings of the timer fields are as follows:

   Frequency:    Interval between KeepAlives, in tenths of a second
                 (default 600)
   AllowMissCt:  Number of missed KeepAlives before the link is declared
                 down
   A/S Wait:     AFI/SAFI ACK timeout, in tenths of a second (default 10)

5.3.  The AFI/SAFI Exchanges

   The devices know each other's MACs, have means to ensure link state,
   and know each other's ASNs.  Now they can negotiate which AFI/SAFIs
   are supported, and announce their interface addresses (and labels).

5.3.1.  AFI/SAFI Capability Exchange

   First they negotiate what AFI/SAFIs are supported on the link.

   As before, the lowest MAC initiates the negotiation.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |          Length = 13          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 1   |    RADflag    |         Capability = 4        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   AFI/SAFIs   |
   +-+-+-+-+-+-+-+-+

   The AFI/SAFIs currently defined are as follows:

      10 - IPv4
      11 - IPv6
      12 - MPLS IPv4
      13 - MPLS IPv6
      ... - other tunnels (e.g.  GRE)

5.3.2.  The AFI/SAFI PDU Skeleton

   Now both sides can exchange their actual interface addresses for all
   the negotiated AFI/SAFIs.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 42   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |         AFI/SAFI Count        |  sub-PDUs...  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The AFI/SAFI Exchange is over an unreliable transport so there are
   Sequence Numbers and ACKs.

   The Sequence Number is a point-to-point link announcement counter,
   incremented for each exchange in each direction on the link.

   The Receiver will ACK it with a Type=10, see following PDU.

   If the Sender does not receive an ACK within one second, it
   retransmits.  Other delay timers may be negotiated using the Timer
   Negotiation capability.

   If a sender has multiple links on the same interface, separate
   counters must be kept for each.

5.3.3.  AFI/SAFI ACK

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 10   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |
   +-+-+-+-+-+-+-+-+

5.3.4.  Add/Drop/Prim

   Each AFI/SAFI interface address may actually be announced, or
   withdrawn.

   An interface may have multiple AFI/SAFIs.

   For each AFI/SAFI on an interface there might be multiple addresses.

   One address per AFI/SAFI SHOULD be marked as primary.

    0                   1                   2                   3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Add/Drop    |   Primary     |          Reserved             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.3.5.  IPv4 Announce / Withdraw

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 11   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |         AFI/SAFI Count        | Add/Drop/Prim |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        IPv4 Prefix/Len                        |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               | Add/Drop/Prim |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+
   |                IPv4 Prefix/Len                |    more ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.3.6.  IPv6 Announce / Withdraw

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 12   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |         AFI/SAFI Count        | Add/Drop/Prim |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                        IPv6 Prefix/Len                        |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                    more ...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.3.7.  MPLS IPv4 Announce / Withdraw

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 13   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |         AFI/SAFI Count        | Add/Drop/Prim |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Label                 | Exp |S|      TTL      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        IPv4 Prefix/Len                        |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                    more ...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
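
   The 32-bit label word in this layout (20-bit Label, 3-bit Exp, 1-bit
   S, 8-bit TTL) can be packed and unpacked as in the sketch below.  The
   struct and function names are illustrative.  The packed value is a
   host integer; writing it to the frame most significant octet first is
   an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mpls_entry {
    uint32_t label;  /* 20 bits */
    uint8_t  exp;    /* 3 bits */
    uint8_t  s;      /* 1 bit, bottom of stack */
    uint8_t  ttl;    /* 8 bits */
};

/* Combine the four fields into one 32-bit word, Label in the top bits. */
uint32_t mpls_entry_pack(const struct mpls_entry *e)
{
    assert(e != NULL && e->label <= 0xFFFFFu && e->exp <= 7u && e->s <= 1u);
    return (e->label << 12)
         | ((uint32_t)e->exp << 9)
         | ((uint32_t)e->s << 8)
         | (uint32_t)e->ttl;
}

/* Split a 32-bit word back into its four fields. */
struct mpls_entry mpls_entry_unpack(uint32_t w)
{
    struct mpls_entry e;
    e.label = w >> 12;
    e.exp   = (uint8_t)((w >> 9) & 0x7u);
    e.s     = (uint8_t)((w >> 8) & 0x1u);
    e.ttl   = (uint8_t)(w & 0xFFu);
    return e;
}
```

   Since the label word follows a nine-octet header and other unaligned
   fields, an implementation would normally copy it into the frame octet
   by octet rather than store it through a uint32_t pointer.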

5.3.8.  MPLS IPv6 Announce / Withdraw

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Checksum           |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 14   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |         AFI/SAFI Count        | Add/Drop/Prim |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Label                 | Exp |S|      TTL      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                        IPv6 Prefix/Len                        |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                    more ...                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
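
   In the diagrams, the IPv4 Prefix/Len field occupies 5 octets and the
   IPv6 Prefix/Len field occupies 17 octets.  A minimal sketch of
   decoding such a field follows.  It assumes, which this document does
   not state, that the address octets come first and the final octet
   is the prefix length.  decode_prefix_len is a hypothetical helper
   name.

```python
import ipaddress


def decode_prefix_len(data: bytes, version: int):
    """Decode a Prefix/Len field into an ipaddress interface object.

    Assumption: the layout is <address octets><one length octet>,
    i.e. 4+1 octets for IPv4 and 16+1 octets for IPv6.
    """
    if version == 4:
        addr_len, addr_cls = 4, ipaddress.IPv4Address
    elif version == 6:
        addr_len, addr_cls = 16, ipaddress.IPv6Address
    else:
        raise ValueError("version must be 4 or 6")
    if len(data) < addr_len + 1:
        raise ValueError("truncated Prefix/Len field")
    address = addr_cls(data[:addr_len])
    prefix_len = data[addr_len]
    if prefix_len > addr_len * 8:
        raise ValueError("prefix length exceeds address width")
    # ip_interface keeps the host bits, unlike ip_network.
    return ipaddress.ip_interface(f"{address}/{prefix_len}")
```

   For example, the 17 octets of 2001:db8::1 followed by the octet 64
   decode to the interface 2001:db8::1/64.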

6.  Layer 2.5 and 3 Liveness

   Once the link layer has exchanged AFI/SAFIs, IP and label liveness
   may be tested.

   It is assumed that one or more of the AFI/SAFI addresses will be used
   for ping, BFD, or whatever liveness check the operator configures.

7.  The North/South Protocol

   Thus far, we have a one-hop point-to-point link discovery protocol.

   We know what ASNs and AFI/SAFIs are on each Link Interface.

   At the Ethernet layer we did not want to do topology discovery and
   Dijkstra a la IS-IS.

   So the link ASNs, AFI/SAFIs, and state changes are pushed North to
   BGP-SPF which discovers the topology, runs Dijkstra, and builds the
   routing database.

   We assume there is a shim to convert and buffer the ether layer data
   to [RFC7752] BGP-like PDUs which can be digested by BGP-SPF.

   We assume a reliable intra-device transport, so no ACKs are needed.

   We assume a PDU can carry up to 64k octets.

   The protocol is [re]started by a request from the [RFC7752] topology
   shim layer.

   The Ether Layer then sends the full topology, its full link neighbor
   state, North.

   The Ether layer sends incremental updates as links and/or addressing
   change.

7.1.  Topology Request for Full State

   The [RFC7752] shim on a device requests a full state dump from the
   Ethernet layer on the device:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 0   |      Flag     |          Length = 4           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
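
   A minimal sketch of building this four-octet request follows.  The
   function name is hypothetical, network byte order is assumed, and
   the default Flag value of 0 is an assumption, since this document
   does not define the Flag for a request.

```python
import struct

TOPOLOGY_REQUEST_TYPE = 0    # Type = 0 in the diagram
TOPOLOGY_REQUEST_LENGTH = 4  # Length = 4 in the diagram


def build_topology_request(flag: int = 0) -> bytes:
    """Build the Topology Request: Type (8), Flag (8), Length (16).

    Assumption: the Flag value for a request is not specified here,
    so 0 is used as a placeholder default.
    """
    if not (0 <= flag < 256):
        raise ValueError("Flag must fit in 8 bits")
    return struct.pack("!BBH", TOPOLOGY_REQUEST_TYPE, flag,
                       TOPOLOGY_REQUEST_LENGTH)
```

   Under these assumptions, build_topology_request() returns the octets
   00 00 00 04.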

7.2.  PDU from Link Layer to Shim

   The Northbound PDU has a frame independent of the peer ASNs and
   links:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 1   |      Flag     |             Length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Link Count  |           Multiple Link/ASN sub-PDUs          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The header is followed by multiple sub-PDUs covering all the learned
   ASNs and, for each learned ASN, all of its AFI/SAFIs.

   The fields of the header PDU are as follows:

   Flag:  An integer:

         0 - This is the start of a Full State transfer
         1 - Continuation PDU
         2 - Last PDU of transfer
         3 - This is the start of an Update for a state change
         4-255 - Reserved

   Link Count:  Number of Link/ASN sub-PDUs to follow

   Multiple Link/ASN sub-PDUs:  see Section 7.3
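
   The header fields above could be parsed as in the following sketch.
   The names Flag, HEADER, and parse_northbound_header are hypothetical.
   Network byte order is assumed, and the Length value is returned
   without validation because this document does not say which octets
   it covers.

```python
import struct
from enum import IntEnum

NORTHBOUND_TYPE = 1  # Type = 1 in the diagram


class Flag(IntEnum):
    FULL_STATE_START = 0  # start of a Full State transfer
    CONTINUATION = 1      # Continuation PDU
    LAST = 2              # Last PDU of transfer
    UPDATE_START = 3      # start of an Update for a state change


# Type (8), Flag (8), Length (16), Sequence Number (32), Link Count (8)
HEADER = struct.Struct("!BBHIB")  # 9 octets, no padding


def parse_northbound_header(pdu: bytes):
    """Return (flag, length, sequence, link_count, rest_of_pdu)."""
    if len(pdu) < HEADER.size:
        raise ValueError("PDU shorter than the 9-octet header")
    pdu_type, flag, length, sequence, link_count = HEADER.unpack_from(pdu)
    if pdu_type != NORTHBOUND_TYPE:
        raise ValueError("not a Northbound PDU (Type != 1)")
    if flag > Flag.UPDATE_START:
        raise ValueError("reserved Flag value")
    # The remainder holds link_count Link/ASN sub-PDUs, not parsed here.
    return Flag(flag), length, sequence, link_count, pdu[HEADER.size:]
```

   A receiver might discard any previously learned state on
   FULL_STATE_START and apply changes incrementally on UPDATE_START,
   but that behavior is a design choice this sketch does not implement.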

7.3.  Link/ASN sub-PDU

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             My ASN                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Their ASN                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Count     | AFI/SAFI Type | Add/Drop/Prim |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                    Single AFI/SAFI of Type                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | AFI/SAFI Type | Add/Drop/Prim |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+
   |            Single AFI/SAFI of Type            |    more ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of the Link/ASN sub-PDU are as follows:

   Count:  Number of AFI/SAFIs in this sub-PDU

   AFI/SAFI Type: An integer

         11 - IPv4
         12 - IPv6
         13 - MPLSv4
         14 - MPLSv6
         ...

   Add/Drop/Prim:  Bit flags:

         0 - Announce(1) / Withdraw(0)
         1 - Primary
         2-7 - Reserved
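
   The AFI/SAFI Type values and Add/Drop/Prim bits above could be
   decoded as sketched below.  All names are hypothetical.  This
   document does not state the bit numbering, so the sketch assumes
   that bit 0 is the most significant bit of the octet, matching the
   bit ruler used in the packet diagrams.  An implementation using the
   opposite convention would use the masks 0x01 and 0x02 instead.

```python
from typing import NamedTuple

# AFI/SAFI Type values listed in this section.
AFI_SAFI_TYPES = {11: "IPv4", 12: "IPv6", 13: "MPLSv4", 14: "MPLSv6"}

# Assumption: bit 0 is the most significant bit of the octet.
ANNOUNCE_MASK = 0x80  # bit 0: Announce(1) / Withdraw(0)
PRIMARY_MASK = 0x40   # bit 1: Primary


class AddDropPrim(NamedTuple):
    announce: bool  # True = Announce, False = Withdraw
    primary: bool


def decode_add_drop_prim(octet: int) -> AddDropPrim:
    """Decode the Add/Drop/Prim octet; reserved bits 2-7 are ignored."""
    if not (0 <= octet <= 0xFF):
        raise ValueError("Add/Drop/Prim must be a single octet")
    return AddDropPrim(bool(octet & ANNOUNCE_MASK),
                       bool(octet & PRIMARY_MASK))
```

   Under the stated bit numbering, the octet 0xC0 decodes as an
   announcement of a primary address, and 0x00 as a withdrawal.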

8.  Security Considerations

   The protocol as is MUST NOT be used outside a datacenter environment
   due to lack of authentication and authorisation.  These will be
   worked on in a later effort, likely using credentials configured
   using ZTP.

   Many MDC operators have a strange belief that physical walls and
   firewalls provide sufficient security.  This is not credible.  These
   protocols need to be examined for exposure and attack surface.

   On the wire Ethernet is assumed to be secure, though it could be
   tapped and data modified by an in-house attacker.

   Malicious nodes/devices could mis-announce addressing, form malicious
   associations, etc.

9.  IANA Considerations

   This document has no IANA Considerations.

   This document does, however, need a new EtherType, which is assigned
   by the IEEE Registration Authority rather than by IANA.

10.  Acknowledgments

   The authors thank Cristel Pelsser for multiple reviews, Martijn
   Schmidt for his contribution, Rob Austein for reviews and checksum
   code, Russ Housley for checksum discussion and sBox, and Steve
   Bellovin for more checksum discussion.

11.  Normative References

   [I-D.keyupate-idr-bgp-spf]
              Patel, K., Lindem, A., Zandi, S., and G. Velde, "Shortest
              Path Routing Extensions for BGP Protocol", draft-keyupate-
              idr-bgp-spf-04 (work in progress), January 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016,
              <http://www.rfc-editor.org/info/rfc7752>.

Authors' Addresses

   Randy Bush
   Arrcus & IIJ
   5147 Crystal Springs
   Bainbridge Island, WA  98110
   United States of America

   Email: randy@psg.com

   Keyur Patel
   Arrcus
   2077 Gateway Place, Suite #250
   San Jose, CA  95119
   United States of America

   Email: keyur@arrcus.com