IDR                                                             J. Heitz
Internet-Draft                                                    D. Rao
Intended status: Standards Track                                   Cisco
Expires: April 25, 2019                                 October 22, 2018

          Aggregating BGP routes in Massive Scale Data Centers
                draft-heitz-idr-msdc-bgp-aggregation-00

Abstract

   A design for a fabric of switches to connect up to one million
   servers in a data center is described.  At that scale, it is
   impractical for every switch to maintain knowledge about every other
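   switch and every other link in the fabric.  Route aggregation is an
   effective way to scale such a fabric.  However, aggregation causes
   problems when links or switches fail; for example, an aggregate can
   keep attracting traffic for destinations that are no longer reachable
   through the switch that announces it.  This design solves those
   problems.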
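
   The failure problem can be illustrated with a small model.  The
   following Python sketch is a minimal illustration only, not part of
   this design: the prefixes, the switch names (spine1, spine2, leaf1,
   leaf2), the full-mesh spine-leaf wiring and the rule that a switch
   keeps announcing an aggregate while any contributing route remains
   are all assumptions.  Four contiguous /24 subnets collapse into one
   /22 aggregate; after the spine2-leaf2 link fails, spine2 still
   announces the /22 but can no longer deliver traffic to leaf2.

```python
import ipaddress

# Hypothetical server subnets and the leaf switch that owns each one.
SUBNETS = {
    "10.0.0.0/24": "leaf1",
    "10.0.1.0/24": "leaf1",
    "10.0.2.0/24": "leaf2",
    "10.0.3.0/24": "leaf2",
}

# Collapse the four contiguous /24s into the fewest covering prefixes.
AGGREGATE = list(ipaddress.collapse_addresses(
    ipaddress.ip_network(p) for p in SUBNETS))


def reachable_subnets(spine, failed_links):
    """Subnets a spine can still reach, given a set of failed
    (spine, leaf) links.  Every spine is assumed to connect to every leaf."""
    return [ipaddress.ip_network(p) for p, leaf in SUBNETS.items()
            if (spine, leaf) not in failed_links]


def announces(aggregate, subnets):
    """Assumed origination rule: keep announcing the aggregate while at
    least one contributing more-specific route is still present."""
    return any(net.subnet_of(aggregate) for net in subnets)


def can_deliver(address, subnets):
    """True if the address falls inside a subnet the spine can still reach."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in subnets)


# Hypothetical failure: the link between spine2 and leaf2 goes down.
failed = {("spine2", "leaf2")}
for spine in ("spine1", "spine2"):
    subnets = reachable_subnets(spine, failed)
    print(spine, announces(AGGREGATE[0], subnets),
          can_deliver("10.0.2.7", subnets))
# spine1 True True
# spine2 True False  <- spine2 still attracts traffic it cannot deliver
```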

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Heitz & Rao              Expires April 25, 2019                 [Page 1]
Internet-Draft            MSDC BGP Aggregation              October 2018

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Solution Overview . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Problems with negative routes . . . . . . . . . . . . . . . .   4
   4.  Use of a negative route in BGP  . . . . . . . . . . . . . . .   4
   5.  Implementation Notes to Reduce CPU Time Consumption . . . . .   5
   6.  Smooth Startup and Avoidance of Too Many Negative Routes  . .   5
   7.  Avoidance of Transients . . . . . . . . . . . . . . . . . . .   6
   8.  Configuration . . . . . . . . . . . . . . . . . . . . . . . .   7
   9.  South Triggered Automatic Disaggregation (STAD) . . . . . . .   7
   10. Configuration for STAD  . . . . . . . . . . . . . . . . . . .   8
   11. Security Considerations . . . . . . . . . . . . . . . . . . .   9
   12. IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   9
   14. References  . . . . . . . . . . . . . . . . . . . . . . . . .   9
     14.1.  Normative References . . . . . . . . . . . . . . . . . .   9
     14.2.  Informative References . . . . . . . . . . . . . . . . .   9
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   9

1.  Introduction

   [RFC7938] defines a massive scale data center as one that contains
   over one hundred thousand servers.  It describes the advantages of
   using BGP as a routing protocol in a Clos switching fabric that
   connects these servers.  It laments the need to announce all routes
   individually because of the problems associated with route
   aggregation.  A fabric that scales to one million servers is
   considered sufficient for the foreseeable future and is the design
   goal of this document.  Of course, the design should also work for
   smaller fabrics.

   A switch fabric to connect one million servers will consist of
   between 35000 and 130000 switches and 1.5 million to 8 million links,
   depending on how redundantly the servers are connected to the fabric
   and the level of oversubscription in the fabric.  A switch that needs
   to store, send and process only hundreds of routes is clearly cheaper
   than one that needs to store, send and process information about
   millions of links.
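
   The difference in scale can be made concrete with rough arithmetic.
   The Python sketch below is illustrative only: it assumes one subnet
   per top-of-rack (ToR) switch and one aggregate per pod, and the
   servers-per-ToR and ToRs-per-pod values are hypothetical, not taken
   from this design.  It counts the routes held by a switch outside all
   pods, with and without per-pod aggregation; per-link state, as
   described above, would be larger still.

```python
import math

SERVERS = 1_000_000      # design goal stated in this document
SERVERS_PER_TOR = 40     # hypothetical value
TORS_PER_POD = 32        # hypothetical value


def route_counts(servers, servers_per_tor, tors_per_pod):
    """Return (routes without aggregation, routes with one aggregate per
    pod) for a switch outside every pod, assuming one subnet per ToR."""
    tors = math.ceil(servers / servers_per_tor)
    pods = math.ceil(tors / tors_per_pod)
    return tors, pods


flat, aggregated = route_counts(SERVERS, SERVERS_PER_TOR, TORS_PER_POD)
print(flat, aggregated)  # 25000 782
```

   With these assumed values, per-pod aggregation shrinks the table from
   25000 routes to 782, in line with the "hundreds of routes" figure
   above.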
