Network Working Group R. Bush
Internet-Draft Arrcus & IIJ
Intended status: Standards Track K. Patel
Expires: September 14, 2018 Arrcus
March 13, 2018
Link State Over Ethernet
draft-ymbk-lsvr-lsoe-00
Abstract
When used in a Massive Data Center (MDC), BGP-LS and BGP-SPF need
link neighbor discovery, liveness, and addressability data. The Link
State Over Ethernet (LSOE) protocols provide link discovery, exchange
AFI/SAFIs, and discover addresses over raw Ethernet. These data are
pushed directly to BGP-LS/SPF, obviating the need for centralized
controller architectures. The protocol is applicable beyond
BGP-LS/SPF, and has been designed to support a wide range of routing
and similar protocols which need link discovery and characterisation.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
be interpreted as described in RFC 2119 [RFC2119] only when they
appear in all upper case. They may also appear in lower or mixed
case as English words, without normative meaning.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 14, 2018.
Bush & Patel Expires September 14, 2018 [Page 1]
Internet-Draft Link State Over Ethernet March 2018
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Terminology
3. Background
4. Top Level Overview
5. Ethernet to Ethernet Protocols
5.1. Inter-Link Ether Protocol Overview
5.2. PDUs and Frames
5.2.1. Frame TLV
5.2.2. Link KeepAlive / Hello
5.2.3. Capability Exchange
5.2.4. Timer Negotiation
5.3. The AFI/SAFI Exchanges
5.3.1. AFI/SAFI Capability Exchange
5.3.2. The AFI/SAFI PDU Skeleton
5.3.3. AFI/SAFI ACK
5.3.4. Add/Drop/Prim
5.3.5. IPv4 Announce / Withdraw
5.3.6. IPv6 Announce / Withdraw
5.3.7. MPLS IPv4 Announce / Withdraw
5.3.8. MPLS IPv6 Announce / Withdraw
6. Layer 2.5 and 3 Liveness
7. The North/South Protocol
7.1. Topology Request for Full State
7.2. PDU from Link Layer to Shim
7.3. Link/ASN sub-PDU
8. Security Considerations
9. IANA Considerations
10. Acknowledgments
11. Normative References
Authors' Addresses
1. Introduction
The Massive Data Center (MDC) environment presents unusual problems
of scale, e.g. O(10,000) switches, while its homogeneity presents
opportunities for simple approaches. Approaches such as Jupiter
Rising use a central controller to deal with scaling, while BGP-SPF
[I-D.keyupate-idr-bgp-spf] provides massive scale out without
centralization using a tried and tested scalable distributed control
plane, offering a scalable routing solution in Clos and similar
environments. But it needs link state and addressing data from the
network to build the routing topology. LLDP has scaling issues, e.g.
in extending a PDU beyond 1,500 bytes.
Link State Over Ethernet (LSOE) provides brutally simple mechanisms
for devices to
o Discover each other's MACs,
o Run MAC keep-alives for liveness assurance,
o Discover each other's ASNs,
o Negotiate mutually supported AFI/SAFIs,
o Discover and maintain link IP/MPLS addresses,
o Enable layer three link liveness such as BFD, and finally
o Push these data up to BGP-SPF which computes the topology and
builds routing and forwarding tables.
This protocol is more widely applicable than BGP-SPF, and has been
designed to support a wide range of routing and similar protocols
which need link discovery and characterisation.
2. Terminology
Even though it concentrates on the Ethernet layer, this document
relies heavily on routing terminology. The following are some
possibly confusing terms:
AFI/SAFI: Address Family Identifier and Subsequent Address Family
Identifier, i.e. classes of addresses such as IPv4, IPv6,
...
ASN: Autonomous System Number, a BGP identifier for an
originator of routing, particularly BGP, announcements.
BGP-SPF: A hybrid protocol using BGP transport but a Dijkstra SPF
decision process. See [I-D.keyupate-idr-bgp-spf].
Clos: A hierarchic switch topology commonly used in data
centers.
Frame: The payload of an Ethernet packet.
MAC: Medium Access Control, essentially an Ethernet address,
six octets.
MDC: Massive Data Center, O(1,000) TORs or more.
PDU: Protocol Data Unit, essentially an application layer
message.
SPF: Shortest Path First, an algorithm for finding the shortest
paths between nodes in a graph.
TOR: Top Of Rack switch, aggregates the servers in a rack and
connects to the Clos spine.
ZTP: Zero Touch Provisioning gives devices initial addresses,
credentials, etc. on boot/restart.
3. Background
LSOE assumes a Clos-like topology, though the acyclic constraint is
not necessary.
While LSOE is designed for the MDC, there are no inherent reasons it
could not run on a WAN; though it is not clear that this would be
useful. The authentication and authorisation needed to run safely on
the WAN are not (yet) included in this protocol.
LLDP is not suitable because one cannot extend a PDU beyond 1,500
bytes without hitting an IPR barrier. It is also complex.
UDP is unsuitable as it would require prior knowledge of IP level
addressing, one of the key purposes of this discovery protocol.
LSOE assumes a new IEEE assigned EtherType (TBD).
4. Top Level Overview
o MAC Link State is exchanged over Ethernet
o AFI/SAFI data are exchanged and IP-Level Liveness Checks done
o BGP-SPF uses the data to discover and build the topology database
+-------------------+   +-------------------+   +-------------------+
|      Device       |   |      Device       |   |      Device       |
|                   |   |                   |   |                   |
|+-----------------+|   |+-----------------+|   |+-----------------+|
||                 ||   ||                 ||   ||                 ||
||     BGP-SPF     <+---+>     BGP-SPF     <+---+>     BGP-SPF     ||
||                 ||   ||                 ||   ||                 ||
|+--------^--------+|   |+--------^--------+|   |+--------^--------+|
|         |         |   |         |         |   |         |         |
|         |         |   |         |         |   |         |         |
|+--------+--------+|   |+--------+--------+|   |+--------+--------+|
||    Liveness     ||   ||    Liveness     ||   ||    Liveness     ||
||    AFI/SAFIs    ||   ||    AFI/SAFIs    ||   ||    AFI/SAFIs    ||
||    Addresses    ||   ||    Addresses    ||   ||    Addresses    ||
|+--------^--------+|   |+--------^--------+|   |+--------^--------+|
|         |         |   |         |         |   |         |         |
|         |         |   |         |         |   |         |         |
|+--------v--------+|   |+--------v--------+|   |+--------v--------+|
||                 ||   ||                 ||   ||                 ||
||   Ether PDUs    <+---+>   Ether PDUs    <+---+>   Ether PDUs    ||
||                 ||   ||                 ||   ||                 ||
|+-----------------+|   |+-----------------+|   |+-----------------+|
+-------------------+   +-------------------+   +-------------------+
There are two sets of protocols:
o Ethernet to Ethernet protocols are used to exchange layer 2 data,
i.e. MACs, and layer 2.5 and 3 data, i.e. ASNs, AFI/SAFIs, and
interface addresses.
o A Link Layer to BGP protocol pushes these data up the stack to
BGP-SPF, converting to the BGP-LS BGP-like data format.
o And, of course, the BGP layer crosses all the devices, though it
is not part of these LSOE protocols.
5. Ethernet to Ethernet Protocols
This section describes the basic Ethernet-framed protocols.
5.1. Inter-Link Ether Protocol Overview
|       Hello / KeepAlive (type=0)       |
|--------------------------------------->|
|                                        | MACs and Liveness
|       Hello / KeepAlive (type=0)       | Mandatory
|<---------------------------------------|
|                                        |
|                                        |
|                                        |
|         Timers (type=1, cap 1)         |
|--------------------------------------->| Timers (type 1, cap 1)
|                                        | Optional
|         Timers (type=1, cap 1)         | Renegotiate at Any Time
|<---------------------------------------|
|                                        |
|                                        |
|                                        |
|     Link AFI/SAFIs (type=1, cap 4)     |
|--------------------------------------->| AFI/SAFI Support (cap 4)
|<---------------------------------------| Mandatory
|     Link AFI/SAFIs (type=1, cap 4)     | Renegotiate at Any Time
|                                        |
|                                        |
|                                        |
|   Interface MPLS Labels (type=13/14)   |
|--------------------------------------->| Interface Labels
|                                        | Optional
|   Interface MPLS Labels (type=13/14)   | Renegotiate at Any Time
|<---------------------------------------|
|                                        |
|                                        |
|                                        |
|   Interface IPv4 Addresses (type=11)   |
|--------------------------------------->| Interface IPv4 Addresses
|                                        | Optional
|   Interface IPv4 Addresses (type=11)   | Renegotiate at Any Time
|<---------------------------------------|
|                                        |
|                                        |
|                                        |
|   Interface IPv6 Addresses (type=12)   |
|--------------------------------------->| Interface IPv6 Addresses
|                                        | Optional
|   Interface IPv6 Addresses (type=12)   | Renegotiate at Any Time
|<---------------------------------------|
5.2. PDUs and Frames
This is all about inter-device Link State.
A PDU is one or more Ethernet Frames.
A Frame has a PDU Sequence Number and a Frame Number to allow
assembly of out-of-order frames.
Because BGP-SPF and Data Plane payloads are assumed to be IP over the
same Ethernet, one worries about congestion.
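As an illustration of the reassembly described above, the following
is a minimal sketch in C, not a definitive implementation. The names
(struct reasm, reasm_add, reasm_copy) and the small buffer limits are
hypothetical. It assumes the caller has already extracted the PDU
Sequence No, the Frame No, and whether the frame is the last of its
PDU, and that the struct is zeroed before first use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits, chosen only to keep the sketch small. */
#define MAX_FRAMES 8
#define MAX_CHUNK  64

struct reasm {
    uint16_t pdu_seq;              /* PDU Sequence No being assembled */
    int      active;               /* non-zero once a frame is stored */
    int      last_no;              /* Frame No of last frame, or -1 */
    uint8_t  have[MAX_FRAMES];     /* 1 if that Frame No has arrived */
    size_t   len[MAX_FRAMES];      /* payload length per Frame No */
    uint8_t  data[MAX_FRAMES][MAX_CHUNK];
};

/* Store one frame. Returns 1 when the PDU is complete, 0 if more
 * frames are needed, -1 if the frame exceeds the sketch's limits.
 * The struct must be zeroed by the caller before first use. */
int reasm_add(struct reasm *r, uint16_t pdu_seq, uint8_t frame_no,
              int is_last, const uint8_t *payload, size_t n)
{
    assert(r != NULL && payload != NULL);
    if (frame_no >= MAX_FRAMES || n > MAX_CHUNK)
        return -1;
    if (!r->active || r->pdu_seq != pdu_seq) {
        /* A different PDU Sequence No starts a new assembly. */
        memset(r, 0, sizeof *r);
        r->pdu_seq = pdu_seq;
        r->active = 1;
        r->last_no = -1;
    }
    memcpy(r->data[frame_no], payload, n);
    r->len[frame_no] = n;
    r->have[frame_no] = 1;
    if (is_last)
        r->last_no = frame_no;
    if (r->last_no < 0)
        return 0;                  /* last frame not yet seen */
    for (int i = 0; i <= r->last_no; i++)
        if (!r->have[i])
            return 0;              /* a hole remains */
    return 1;
}

/* Concatenate the stored frames in Frame No order into out.
 * Returns the number of octets copied, or 0 if out is too small. */
size_t reasm_copy(const struct reasm *r, uint8_t *out, size_t cap)
{
    assert(r != NULL && out != NULL);
    size_t off = 0;
    for (int i = 0; i <= r->last_no; i++) {
        if (off + r->len[i] > cap)
            return 0;
        memcpy(out + off, r->data[i], r->len[i]);
        off += r->len[i];
    }
    return off;
}
```

Frames may arrive in any order; the PDU is handed up only once the
last frame has been seen and every lower Frame No is present.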
5.2.1. Frame TLV
The basic Ethernet PDU is a typical TLV (Type Length Value) PDU,
except it's really LTV for the sake of alignment :)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        PDU Sequence No        |   Frame No    |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |
+-+-+-+-+-+-+-+-+
The fields of the basic Ethernet PDU are as follows:
PDU Sequence No: Semi-unique identifier of a TLV PDU (e.g. the low
order 16 bits of UNIX time)
Frame No: 0..255 Frame Sequence Number Within a multi-frame PDU
Flags: A bit field
0 - Sender has been restarted
1 - One of a multi-Frame sequence
2 - last of a multi-Frame sequence
3-7 - Reserved
Checksum: One's complement over Frame, detect bit flips
Length: Total Bytes in PDU including all frames and fields
Type: An integer
0 - Hello / KeepAlive
1 - Capability
2-9 - Reserved
10 - AFI/SAFI ACK
11 - IPv4 Announce / Withdraw
12 - IPv6 Announce / Withdraw
13 - MPLS IPv4 Announce / Withdraw
14 - MPLS IPv6 Announce / Withdraw
15-255 Reserved
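The nine-octet header above might be serialised as in the following
sketch. This is illustrative only: the struct and function names are
hypothetical, and network byte order (big-endian) for the 16-bit
fields is an assumption, as this document does not state a byte
order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LSOE_HDR_LEN 9          /* 2 + 1 + 1 + 2 + 2 + 1 octets */

enum lsoe_type {                /* values from the Type list above */
    LSOE_HELLO = 0, LSOE_CAPABILITY = 1, LSOE_AFISAFI_ACK = 10,
    LSOE_IPV4 = 11, LSOE_IPV6 = 12,
    LSOE_MPLS_IPV4 = 13, LSOE_MPLS_IPV6 = 14
};

struct lsoe_hdr {
    uint16_t pdu_seq;           /* PDU Sequence No */
    uint8_t  frame_no;          /* Frame No */
    uint8_t  flags;             /* Flags */
    uint16_t checksum;          /* Checksum */
    uint16_t length;            /* Length */
    uint8_t  type;              /* Type */
};

/* Assumption: 16-bit fields travel most significant octet first. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Write the header into buf; returns octets written, 0 if no room. */
size_t lsoe_hdr_encode(const struct lsoe_hdr *h, uint8_t *buf, size_t cap)
{
    assert(h != NULL && buf != NULL);
    if (cap < LSOE_HDR_LEN)
        return 0;
    put16(buf, h->pdu_seq);
    buf[2] = h->frame_no;
    buf[3] = h->flags;
    put16(buf + 4, h->checksum);
    put16(buf + 6, h->length);
    buf[8] = h->type;
    return LSOE_HDR_LEN;
}

/* Read the header from buf; returns 0 on success, -1 if truncated. */
int lsoe_hdr_decode(const uint8_t *buf, size_t n, struct lsoe_hdr *h)
{
    assert(buf != NULL && h != NULL);
    if (n < LSOE_HDR_LEN)
        return -1;
    h->pdu_seq  = get16(buf);
    h->frame_no = buf[2];
    h->flags    = buf[3];
    h->checksum = get16(buf + 4);
    h->length   = get16(buf + 6);
    h->type     = buf[8];
    return 0;
}
```

Encoding octet by octet, rather than overlaying a packed struct,
avoids depending on compiler padding and host byte order.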
5.2.1.1. The Checksum
There is a reason conservative folk use a checksum in UDP. And when
the operators stretch to jumbo frames ...
One's complement is a bit silly, though trivial to implement and
might be sufficient.
Sum up either 16-bit shorts in a 32-bit int, or 32-bit ints in a
64-bit long, then take the high-order section, shift it right,
rotate, add it in, repeat until zero. -- smb off the top of his head
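For comparison with the S-Box variant, the plain sum just described
can be sketched as follows, in the style of the Internet checksum of
RFC 1071. The function name is hypothetical. Padding an odd trailing
octet with zero, and returning the folded sum rather than its
complement, are assumptions, as this document does not specify
either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum 16-bit big-endian words in a 32-bit accumulator, then fold the
 * high-order half back into the low-order half until it is zero.
 * The 32-bit accumulator is sufficient for inputs up to 64k octets. */
uint16_t ones_complement_sum(const uint8_t *b, size_t n)
{
    assert(b != NULL || n == 0);
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        sum += (uint32_t)((b[i] << 8) | b[i + 1]);
    if (i < n)
        sum += (uint32_t)(b[i] << 8);   /* odd octet, zero padded */
    while (sum >> 16)
        sum = (sum >> 16) + (sum & 0xFFFF);
    return (uint16_t)sum;
}
```

Because addition is commutative, this sum does not change when whole
16-bit words are reordered.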
#include <stddef.h>
#include <stdint.h>

/* The F table from Skipjack, and it would work for the S-Box.
There are other S-Box sources as well. -- Russ Housley */
const uint8_t sbox[256] = {
0xa3,0xd7,0x09,0x83,0xf8,0x48,0xf6,0xf4,0xb3,0x21,0x15,0x78,
0x99,0xb1,0xaf,0xf9,0xe7,0x2d,0x4d,0x8a,0xce,0x4c,0xca,0x2e,
0x52,0x95,0xd9,0x1e,0x4e,0x38,0x44,0x28,0x0a,0xdf,0x02,0xa0,
0x17,0xf1,0x60,0x68,0x12,0xb7,0x7a,0xc3,0xe9,0xfa,0x3d,0x53,
0x96,0x84,0x6b,0xba,0xf2,0x63,0x9a,0x19,0x7c,0xae,0xe5,0xf5,
0xf7,0x16,0x6a,0xa2,0x39,0xb6,0x7b,0x0f,0xc1,0x93,0x81,0x1b,
0xee,0xb4,0x1a,0xea,0xd0,0x91,0x2f,0xb8,0x55,0xb9,0xda,0x85,
0x3f,0x41,0xbf,0xe0,0x5a,0x58,0x80,0x5f,0x66,0x0b,0xd8,0x90,
0x35,0xd5,0xc0,0xa7,0x33,0x06,0x65,0x69,0x45,0x00,0x94,0x56,
0x6d,0x98,0x9b,0x76,0x97,0xfc,0xb2,0xc2,0xb0,0xfe,0xdb,0x20,
0xe1,0xeb,0xd6,0xe4,0xdd,0x47,0x4a,0x1d,0x42,0xed,0x9e,0x6e,
0x49,0x3c,0xcd,0x43,0x27,0xd2,0x07,0xd4,0xde,0xc7,0x67,0x18,
0x89,0xcb,0x30,0x1f,0x8d,0xc6,0x8f,0xaa,0xc8,0x74,0xdc,0xc9,
0x5d,0x5c,0x31,0xa4,0x70,0x88,0x61,0x2c,0x9f,0x0d,0x2b,0x87,
0x50,0x82,0x54,0x64,0x26,0x7d,0x03,0x40,0x34,0x4b,0x1c,0x73,
0xd1,0xc4,0xfd,0x3b,0xcc,0xfb,0x7f,0xab,0xe6,0x3e,0x5b,0xa5,
0xad,0x04,0x23,0x9c,0x14,0x51,0x22,0xf0,0x29,0x79,0x71,0x7e,
0xff,0x8c,0x0e,0xe2,0x0c,0xef,0xbc,0x72,0x75,0x6f,0x37,0xa1,
0xec,0xd3,0x8e,0x62,0x8b,0x86,0x10,0xe8,0x08,0x77,0x11,0xbe,
0x92,0x4f,0x24,0xc5,0x32,0x36,0x9d,0xcf,0xf3,0xa6,0xbb,0xac,
0x5e,0x6c,0xa9,0x13,0x57,0x25,0xb5,0xe3,0xbd,0xa8,0x3a,0x01,
0x05,0x59,0x2a,0x46
};
/* example C code, constant time even, thanks Rob Austein */
uint16_t sbox_checksum(const uint8_t *b, const size_t n)
{
/* sum[0] takes even-offset octets, sum[1] odd-offset octets */
uint32_t sum[2] = {0, 0};
for (size_t i = 0; i < n; i++)
sum[i & 1] += sbox[b[i]];
/* combine into 16-bit words, then fold the carries back in twice */
uint32_t result = (sum[0] << 8) + sum[1];
result = (result >> 16) + (result & 0xFFFF);
result = (result >> 16) + (result & 0xFFFF);
return (uint16_t) result;
}
5.2.2. Link KeepAlive / Hello
The Hello and KeepAlive PDUs are one and the same.
Each device learns the other's MAC from its HELLO whining. I.e., all
devices on a wire/interface know each other's MACs and learn each
other's ASNs.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        PDU Sequence No        |   Frame No    |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |          Length = 17          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 0    |                     MyASN                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |               YourASN (or Zero)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |
+-+-+-+-+-+-+-+-+
Once two devices know each other's MACs, Ethernet keep-alives may be
started to ensure layer two liveness. The timing and acceptable drop
of the keep-alives may be set with the Timer Negotiation capability
exchange.
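A Hello of Length 17 (nine header octets, then two four-octet ASNs)
might be built as in the sketch below. The function name is
hypothetical. Big-endian field encoding is an assumption, and the
Checksum field is left as zero for the caller to fill in, since the
exact checksum coverage is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LSOE_HELLO_LEN 17       /* 9-octet header + MyASN + YourASN */

/* Assumption: 32-bit ASNs travel most significant octet first. */
static void hello_put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Build a single-frame Hello / KeepAlive. your_asn is zero until the
 * peer's ASN has been learned. Returns octets written, 0 if no room. */
size_t lsoe_hello_encode(uint16_t pdu_seq, uint32_t my_asn,
                         uint32_t your_asn, uint8_t *buf, size_t cap)
{
    assert(buf != NULL);
    if (cap < LSOE_HELLO_LEN)
        return 0;
    memset(buf, 0, LSOE_HELLO_LEN);   /* Frame No, Flags, Checksum = 0 */
    buf[0] = (uint8_t)(pdu_seq >> 8);
    buf[1] = (uint8_t)(pdu_seq & 0xFF);
    buf[6] = 0;                       /* Length, high octet */
    buf[7] = LSOE_HELLO_LEN;          /* Length, low octet */
    buf[8] = 0;                       /* Type = 0, Hello / KeepAlive */
    hello_put32(buf + 9, my_asn);
    hello_put32(buf + 13, your_asn);
    return LSOE_HELLO_LEN;
}
```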
5.2.3. Capability Exchange
Peers on the Ethernet exchange capabilities, such as timers, AFI/
SAFIs supported, etc. There is a simple capability exchange.
By convention, the device with the lowest MAC sends first.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        PDU Sequence No        |   Frame No    |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 1    |    RADflag    |          Capability           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The RADflag is a bit field which signals the state of the capability
negotiation.
bit 0 - Request
bit 1 - Accept
bit 2 - Deny
bits 3-7 - Reserved
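The lowest-MAC convention and the RADflag bits could be handled along
the following lines. This is a sketch under stated assumptions: the
names are hypothetical, MACs are compared as six-octet unsigned
strings, and bit 0 is taken to be the least significant bit of the
RADflag octet, which this document does not specify.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: bit 0 is the least significant bit of the octet. */
enum {
    RAD_REQUEST = 1u << 0,
    RAD_ACCEPT  = 1u << 1,
    RAD_DENY    = 1u << 2
};

/* Returns 1 if this device has the lower MAC and so sends first. */
int lsoe_i_send_first(const uint8_t mine[6], const uint8_t theirs[6])
{
    assert(mine != NULL && theirs != NULL);
    /* memcmp compares octets as unsigned values, left to right. */
    return memcmp(mine, theirs, 6) < 0;
}

/* Choose the RADflag for a reply to a received capability PDU.
 * Returns 0 if the received PDU was not a Request. */
uint8_t lsoe_answer(uint8_t radflag, int supported)
{
    if (!(radflag & RAD_REQUEST))
        return 0;
    return supported ? RAD_ACCEPT : RAD_DENY;
}
```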
5.2.4. Timer Negotiation
Different operational scenarios may call for layer two and layer
three timers which differ from the defaults. So there is a
capability negotiation to modify these timers.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        PDU Sequence No        |   Frame No    |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |          Length = 16          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 1    |    RADflag    |        Capability = 1         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Frequency           |  AllowMissCt  |   A/S Wait    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The meanings of the timer fields are as follows:
Frequency: Interval between KeepAlives in tenths of a second
(default 600, i.e. 60 seconds)
AllowMissCt: Number of missed KeepAlives before the link is declared
down
A/S Wait: AFI/SAFI ACK timeout in tenths of a second (default 10,
i.e. one second)
5.3. The AFI/SAFI Exchanges
The devices know each other's MACs, have means to ensure link state,
and know each other's ASNs. Now they can negotiate which AFI/SAFIs
are supported, and announce their interface addresses (and labels).
5.3.1. AFI/SAFI Capability Exchange
First they negotiate what AFI/SAFIs are supported on the link.
As before, the lowest MAC initiates the negotiation.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        PDU Sequence No        |   Frame No    |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |          Length = 13          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 1    |    RADflag    |        Capability = 4         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   AFI/SAFIs   |
+-+-+-+-+-+-+-+-+
The AFI/SAFIs currently defined are as follows:
11 - IPv4
12 - IPv6
13 - MPLS IPv4
14 - MPLS IPv6
... - other tunnels (e.g. GRE)
5.3.2. The AFI/SAFI PDU Skeleton
Now both sides can exchange their actual interface addresses for all
the negotiated AFI/SAFIs.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PDU Sequence No | Frame No | Flags |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 42 | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AFI/SAFI Count | sub-PDUs... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The AFI/SAFI Exchange is over an unreliable transport so there are
Sequence Numbers and ACKs.
The Sequence Number is a point-to-point link announcement counter,
incremented for each exchange in each direction on the link.
The Receiver will ACK it with a Type=10, see following PDU.
If the Sender does not receive an ACK within one second, it
retransmits. Other delay timers may be negotiated using the Timer
Negotiation capability.
If a sender has multiple links on the same interface, separate
counters must be kept for each.
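The Sequence Number, ACK, and retransmission behaviour above might be
tracked per link as in this sketch. The names are hypothetical.
Allowing only one unacknowledged PDU at a time (stop-and-wait) and
driving the timer from a caller-supplied millisecond clock are
assumptions made to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lsoe_tx {
    uint32_t seq;            /* Sequence Number of the outstanding PDU */
    int      pending;        /* non-zero while awaiting an ACK */
    uint32_t sent_ms;        /* time of the last (re)transmission */
    uint32_t timeout_ms;     /* 1000 unless negotiated otherwise */
    unsigned transmissions;  /* times the current PDU was sent */
};

/* Record that a new announcement was sent at now_ms. */
void lsoe_tx_send(struct lsoe_tx *tx, uint32_t now_ms)
{
    assert(tx != NULL && !tx->pending);
    tx->seq++;               /* incremented for each exchange */
    tx->pending = 1;
    tx->sent_ms = now_ms;
    tx->transmissions = 1;
}

/* Call periodically. Returns 1 if the PDU should be retransmitted. */
int lsoe_tx_tick(struct lsoe_tx *tx, uint32_t now_ms)
{
    assert(tx != NULL);
    if (!tx->pending || now_ms - tx->sent_ms < tx->timeout_ms)
        return 0;
    tx->sent_ms = now_ms;
    tx->transmissions++;
    return 1;
}

/* Process a received ACK. Returns 1 if it matched the pending PDU. */
int lsoe_tx_ack(struct lsoe_tx *tx, uint32_t acked_seq)
{
    assert(tx != NULL);
    if (!tx->pending || acked_seq != tx->seq)
        return 0;
    tx->pending = 0;
    return 1;
}
```

One such struct would be kept per link, matching the requirement for
separate counters.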
5.3.3. AFI/SAFI ACK
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        PDU Sequence No        |   Frame No    |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 10   |                Sequence Number                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |
+-+-+-+-+-+-+-+-+
5.3.4. Add/Drop/Prim
Each AFI/SAFI interface address may actually be announced, or
withdrawn.
An interface may have multiple AFI/SAFIs.
For each AFI/SAFI on an interface there might be multiple addresses.
One address per AFI/SAFI SHOULD be marked as primary.
0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Add/Drop | Primary | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
5.3.5. IPv4 Announce / Withdraw
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PDU Sequence No | Frame No | Flags |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 11 | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AFI/SAFI Count | Add/Drop/Prim |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IPv4 Prefix/Len |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | Add/Drop/Prim | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
| IPv4 Prefix/Len | more ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
5.3.6. IPv6 Announce / Withdraw
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PDU Sequence No | Frame No | Flags |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 12 | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AFI/SAFI Count | Add/Drop/Prim |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ +
| |
+ +
| |
+ +
| IPv6 Prefix/Len |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | more ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
5.3.7. MPLS IPv4 Announce / Withdraw
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PDU Sequence No | Frame No | Flags |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 13 | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AFI/SAFI Count | Add/Drop/Prim |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Label | Exp |S| TTL |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IPv4 Prefix/Len |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | more ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
5.3.8. MPLS IPv6 Announce / Withdraw
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PDU Sequence No | Frame No | Flags |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 14 | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AFI/SAFI Count | Add/Drop/Prim |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Label | Exp |S| TTL |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ +
| |
+ +
| |
+ +
| IPv6 Prefix/Len |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | more ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
6. Layer 2.5 and 3 Liveness
Now IP/Label liveness may be tested.
Assume one or more AFI/SAFI addresses will be used to ping, BFD, or
whatever the operator configures.
7. The North/South Protocol
Thus far, we have a one-hop point-to-point link discovery protocol.
We know what ASNs and AFI/SAFIs are on each Link Interface.
At the Ethernet layer we did not want to do topology discovery and
Dijkstra a la IS-IS.
So the link ASNs, AFI/SAFIs, and state changes are pushed North to
BGP-SPF which discovers the topology, runs Dijkstra, and builds the
routing database.
We assume there is a shim to convert and buffer the ether layer data
to [RFC7752] BGP-like PDUs which can be digested by BGP-SPF.
We assume a reliable intra-device transport, so no ACKs are needed.
We assume PDUs of up to 64k octets.
The protocol is [re]started by a request from the 7752 topology Shim
Layer.
The Ether Layer then sends the full topology, its full link neighbor
state, North.
The Ether layer sends incremental updates as links and/or addressing
change.
7.1. Topology Request for Full State
The [RFC7752] shim on a device requests a full state dump from the
Ethernet layer on the device.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Type = 0    |     Flag      |          Length = 4           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
7.2. PDU from Link Layer to Shim
The Northbound PDU has a frame independent of the peer ASNs and links.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 1 | Flag | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Link Count | Multiple Link/ASN sub-PDUs |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
There are multiple sub-PDUs for all the learned ASNs and all the AFI/
SAFIs for each ASN learned.
The fields of the header PDU are as follows:
Flag: An integer:
0 - This is the start of a Full State transfer
1 - Continuation PDU
2 - Last PDU of transfer
3 - This is the start of an Update for a state change
4-255 - Reserved
Link Count: Number of Link/ASN sub-PDUs to follow
Multiple Link/ASN sub-PDUs: see following
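A receiver in the shim could track the Flag values with a small state
machine such as the sketch below. The names are hypothetical. It
assumes that every transfer, whether Full State or Update, ends with
a PDU flagged Last; this document does not say how a transfer that
fits in a single PDU is flagged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum nb_flag {                 /* Flag values from the list above */
    NB_FULL_START   = 0,
    NB_CONTINUATION = 1,
    NB_LAST         = 2,
    NB_UPDATE_START = 3
};

struct nb_rx {
    int      in_transfer;      /* non-zero between a start and Last */
    unsigned pdus;             /* PDUs seen in the current transfer */
};

/* Feed one Northbound PDU's Flag. Returns 1 when a transfer has just
 * completed, 0 when more PDUs are expected, -1 on an unexpected or
 * reserved Flag. */
int nb_rx_accept(struct nb_rx *rx, uint8_t flag)
{
    assert(rx != NULL);
    switch (flag) {
    case NB_FULL_START:
    case NB_UPDATE_START:
        rx->in_transfer = 1;   /* a start always begins afresh */
        rx->pdus = 1;
        return 0;
    case NB_CONTINUATION:
        if (!rx->in_transfer)
            return -1;
        rx->pdus++;
        return 0;
    case NB_LAST:
        if (!rx->in_transfer)
            return -1;
        rx->pdus++;
        rx->in_transfer = 0;
        return 1;
    default:
        return -1;             /* 4-255 are reserved */
    }
}
```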
7.3. Link/ASN sub-PDU
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| My ASN |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Their ASN |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Count | AFI/SAFI Type | Add/Drop/Prim | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
| Single AFI/SAFI of Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| AFI/SAFI Type | Add/Drop/Prim | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
| Single AFI/SAFI of Type | more ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The fields of the Link/ASN sub-PDU are as follows:
Count: Number of AFI/SAFIs in this sub-PDU
AFI/SAFI Type: An integer
11 - IPv4
12 - IPv6
13 - MPLSv4
14 - MPLSv6
...
Add/Drop/Prim (bits)
0 - Announce(1) / Withdraw(0)
1 - Primary
2-7 - Reserved
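The Add/Drop/Prim bits can be manipulated as in the following sketch,
which also counts primaries so that an implementation can check the
advice that one address per AFI/SAFI SHOULD be marked as primary. The
names are hypothetical, and bit 0 is assumed to be the least
significant bit of the octet, which this document does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 0 is the least significant bit of the octet. */
enum {
    ADP_ANNOUNCE = 1u << 0,    /* 1 = Announce, 0 = Withdraw */
    ADP_PRIMARY  = 1u << 1
};

/* Build an Add/Drop/Prim octet; reserved bits are left as zero. */
uint8_t adp_make(int announce, int primary)
{
    return (uint8_t)((announce ? ADP_ANNOUNCE : 0) |
                     (primary ? ADP_PRIMARY : 0));
}

/* Count entries that are both announced and marked primary. */
size_t adp_count_primary(const uint8_t *flags, size_t n)
{
    assert(flags != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((flags[i] & ADP_ANNOUNCE) && (flags[i] & ADP_PRIMARY))
            count++;
    return count;
}
```

A count other than one for a given AFI/SAFI would indicate that the
set of announced addresses departs from the SHOULD.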
8. Security Considerations
The protocol as is MUST NOT be used outside a datacenter environment
due to lack of authentication and authorisation. These will be
worked on in a later effort, likely using credentials configured
using ZTP.
Many MDC operators have a strange belief that physical walls and
firewalls provide sufficient security. This is not credible. These
protocols need to be examined for exposure and attack surface.
On the wire Ethernet is assumed to be secure, though it could be
tapped and data modified by an in-house attacker.
Malicious nodes/devices could mis-announce addressing, form malicious
associations, etc.
9. IANA Considerations
This document has no IANA Considerations.
It does, however, need a new EtherType assigned by the IEEE.
10. Acknowledgments
The authors thank Cristel Pelsser for multiple reviews, Martijn
Schmidt for his contribution, Rob Austein for reviews and checksum
code, Russ Housley for checksum discussion and sBox, and Steve
Bellovin for more checksum discussion.
11. Normative References
[I-D.keyupate-idr-bgp-spf]
Patel, K., Lindem, A., Zandi, S., and G. Velde, "Shortest
Path Routing Extensions for BGP Protocol", draft-keyupate-
idr-bgp-spf-04 (work in progress), January 2018.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC7752] Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
S. Ray, "North-Bound Distribution of Link-State and
Traffic Engineering (TE) Information Using BGP", RFC 7752,
DOI 10.17487/RFC7752, March 2016,
<http://www.rfc-editor.org/info/rfc7752>.
Authors' Addresses
Randy Bush
Arrcus & IIJ
5147 Crystal Springs
Bainbridge Island, WA 98110
United States of America
Email: randy@psg.com
Keyur Patel
Arrcus
2077 Gateway Place, Suite #250
San Jose, CA 95119
United States of America
Email: keyur@arrcus.com