Internet Draft T. Anderson
Expiration: April 2002 Intel Labs
File: draft-anderson-forces-model-00.txt November 2001
Working Group: ForCES
ForCES Architectural Framework and FE Functional Model
draft-anderson-forces-framework-00.txt
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. Internet-Drafts are
working documents of the Internet Engineering Task Force (IETF),
its areas, and its working groups. Note that other groups may
also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as ``work in
progress.''
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in [RFC-2119].
1. Abstract
This document defines an architecture for ForCES network elements
and a functional model for ForCES forwarding elements. This model
is used to describe the capabilities of ForCES forwarding elements
within the context of the ForCES protocol. The architecture and
forwarding element model defined herein are intended to satisfy the
requirements specified in the ForCES requirements draft
[FORCES-REQ].
Anderson [Page 1]
2. Definitions
Most of these definitions are copied from the ForCES requirements
document [FORCES-REQ].
Addressable Entity (AE) - A physical device that is directly
addressable given some interconnect technology. For example, on
Ethernet, an AE is a device to which we can communicate using an
Ethernet MAC address; on IP networks, it is a device to which we can
communicate using an IP address; and on a switch fabric, it is a
device to which we can communicate using a switch fabric port
number.
Physical Forwarding Element (PFE) - An AE that includes hardware
used to provide per-packet processing and handling. This hardware
may consist of (but is not limited to) network processors, ASICs,
or general-purpose processors. For example, line cards in a
forwarding backplane are PFEs.
PFE Partition - A logical partition of a PFE consisting of some
subset of each of the resources (e.g., ports, memory, forwarding
table entries) available on the PFE. This concept is analogous to
that of the resources assigned to a virtual router [REQ-PART].
Physical Control Element (PCE) - An AE that includes hardware used
to provide control functionality. This hardware typically includes
a general-purpose processor.
PCE Partition - A logical partition of a PCE consisting of some
subset of each of the resources available on the PCE.
Forwarding Element (FE) - A logical entity that implements the
ForCES protocol. FEs use the underlying hardware to provide per-
packet processing and handling as directed by a CE via the ForCES
protocol. FEs may use the hardware from PFE partitions, whole PFEs,
or multiple PFEs.
Proxy FE - A name for a type of FE that cannot directly modify its
underlying hardware but instead manipulates that hardware using some
intermediate form of communication (e.g., a non-ForCES protocol or
DMA). A proxy FE will typically be used in the case where a PFE
cannot implement the ForCES protocol directly (e.g., due to the lack
of a general-purpose CPU).
Control Element (CE) - A logical entity that implements the ForCES
protocol and uses it to instruct one or more FEs as to how they
should process packets. CEs handle functionality such as the
execution of control and signaling protocols. CEs may use the
hardware of PCE partitions or whole PCEs. (The use of multiple PCEs
will usually be modeled as separate CEs.)
Pre-association Phase - The period of time during which a FE does
not know which CE is to control it and vice versa.
Post-association Phase - The period of time during which a FE does
know which CE is to control it and vice versa.
ForCES Protocol - While there may be multiple protocols used within
a device supporting ForCES, the term "ForCES protocol" refers only
to the ForCES post-association phase protocol (see below).
ForCES Post-Association Phase Protocol - The protocol used for post-
association phase communication between CEs and FEs. This protocol
does not apply to CE-to-CE communication, FE-to-FE communication, or
to communication between FE and CE managers. The ForCES protocol is
a master-slave protocol in which FEs are slaves and CEs are masters.
FE Model - A model that describes the logical processing functions
of a FE.
FE Manager - A logical entity that operates only in the pre-
association phase and is responsible for determining to which CE(s)
a FE should communicate. This determination process is called CE
discovery and may involve the FE manager learning the capabilities
of available CEs. A FE manager may use anything from a static
configuration to a pre-association phase protocol (see below) to
determine which CE to use. Being a logical entity, a FE manager
might be physically combined with any of the other logical entities
mentioned in this section.
CE Manager - A logical entity that operates only in the pre-
association phase and is responsible for determining to which FE(s)
a CE should communicate. This determination process is called FE
discovery and may involve the CE manager learning the capabilities
of available FEs. A CE manager may use anything from a static
configuration to a pre-association phase protocol (see below) to
determine which FE to use. Being a logical entity, a CE manager
might be physically combined with any of the other logical entities
mentioned in this section.
Pre-association Phase Protocol - A protocol between FE managers and
CE managers that helps them determine which CEs or FEs to use. A
pre-association phase protocol may include a CE and/or FE capability
discovery mechanism. It is important to note that this capability
discovery process is wholly separate from (and does not replace)
that used within the ForCES protocol. However, the two capability
discovery mechanisms may utilize the same FE model (see Section 5).
Pre-association phase protocols are not discussed further in this
document.
ForCES Network Element (NE) - An entity composed of one or more CEs
and one or more FEs. To entities outside a NE, the NE represents a
single point of management. Similarly, a NE usually hides its
internal organization from external entities. However, one
exception to this rule is that CEs and FEs may be directly managed
to transition them from the pre-association phase to the post-
association phase.
ForCES Protocol Element - A FE or CE.
High Touch Capability - This term applies to the capabilities found
in some forwarders to take action on the contents or headers of a
packet based on content other than what is found in the IP header.
Examples of these capabilities include NAT-PT, firewall, and L7
content recognition.
Bootstrap CE - The first CE that a FE connects to in a ForCES NE.
CE set - One or more equivalently capable CEs designed to operate
concurrently (for load sharing) or in a 1+N failover mode (for
redundancy).
3. Introduction
[TBD]
4. Architecture
This section defines a ForCES architectural framework. This
framework consists primarily of ForCES NEs but also includes several
ancillary components. ForCES NEs appear to external entities as
monolithic pieces of network equipment, e.g., routers, NATs,
firewalls, or load balancers. (See [FORCES-REQ], Section 5,
Requirement 4.) Internally, however, ForCES NEs are composed of
several logical components. By defining these logical components
and specifying the interactions between them, the ForCES
architecture allows the components to be physically separated. This
physical separation yields several benefits. For example, separate
components allow vendors to specialize in one component without
having to become experts in all components. The architecture also
provides scalability, in that additional forwarding or control
capacity can be added to existing network elements without the need
for forklift upgrades. The components of the ForCES architecture and
their relationships are pictured in the following diagram. For
convenience, the interactions between components are labeled by
reference points Gp, Gc, Gf, Gr, Gl, and Gi.
                      ---------------------------------------------
                      | ForCES Network Element                    |
                      |  -------------------                      |
                      |  |    CE Set 1     |                      |
                      |  |                 |                      |
 --------------  Gc   |  |-----------------|  Gr  ------------    |
 | CE Manager |-------+--| Head | CE 2..N  |------| CE Set 2 |    |
 --------------       |  |  CE  |          |      |          |    |
        |             |  -------------------      ------------    |
        |             |     |\                         |          |
        | Gl          |  Gp | \ Gp                     | Gp       |
        |             |     |  ---------------\        |          |
 --------------  Gf   |  --------------     --------------        |
 | FE Manager |-------+--|     FE     | Gi  |     FE     |        |
 -------------- \     |  |            |-----|            |        |
                 \ Gf |  --------------     --------------        |
                  \   |                           /               |
                   ---+--------------------------/                |
                      ---------------------------------------------
4.1. Control Elements
This architecture permits multiple CEs to be present in a network
element. These CEs may be used for any combination of redundancy,
load sharing, or distributed control. Redundancy is the case where
one or more CEs are prepared to take over should an active CE fail.
Load sharing is the case where two or more CEs are concurrently
active and where any request that can be serviced by one of the CEs
can also be serviced by any of the other CEs. In both redundancy
and load sharing, the CEs involved are equivalently capable. The
only difference between these two cases is the number of active
CEs. Distributed control is the case where two or
more CEs are concurrently active but where certain requests can only
be serviced by certain CEs.
To enable multiple CEs, control in a ForCES NE is handled by one or
more CE sets. Each CE set can specialize in handling a particular
subset of the control functions of a NE. For example, one CE set
may handle routing functions while another may handle firewall or
QoS functions. Each CE set is itself composed of one or more CEs.
All of the CEs in a CE set are equivalently capable, meaning that
each is capable of performing the same set of functions, albeit with
possibly different performance. The members of a CE set other than
its head CE (see the diagram above) may be used for load sharing or
redundancy purposes. Communication between members of a CE set or
between CE sets is discussed in Section 4.10. CEs are wholly
responsible for coordinating amongst themselves to provide
redundancy, load sharing, or distributed control, if desired.
CEs are concerned with controlling the layer-3 and above
capabilities of FEs. CEs are not concerned with controlling the
layer-2 and below communication aspects of the FE.
While the ForCES model allows for multiple CEs, the coordination of
those CEs is beyond the current scope of ForCES. Even when an
implementation uses multiple CEs or CE sets, it MUST maintain the
invariant that a single NE does not appear as multiple NEs, even in
the presence of link failures between FEs and/or CEs.
4.2. Forwarding Elements
FEs are responsible for per-packet processing and handling as
directed by their CEs. FEs have no initiative of their own. Instead,
FEs are slaves to their CEs and only do as they are told (Section
4.9). FEs may communicate with one or more CEs, either from the
same or different CE sets concurrently. However, FEs have no notion
of CE redundancy, load sharing, or distributed control. Instead,
FEs accept commands from any CE authorized to control them. This
architecture mandates that a coarse grain mapping of requests to CE
sets be possible but also allows finer grain mappings. For example,
at a minimum, a CE must be able to specify a single CE set to which
all requests generated by the FE should be sent. However, the
architecture also allows different CE sets to be mapped to different
types of requests if the FE is capable of differentiating between
request types.
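As an illustration of the coarse and fine grain mappings described
above, the following Python sketch shows one way a FE might choose
the CE set for each request it generates. It is a minimal sketch
under stated assumptions: the class `RequestRouter`, the request type
strings, and the CE set names are hypothetical and are not defined by
ForCES. The fallback to a single default CE set models the mandatory
coarse grain mapping; the per-type table models the optional finer
grain mapping.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RequestRouter:
    """Hypothetical FE-side table mapping FE-generated requests to CE sets."""
    default_ce_set: str                                     # coarse grain: required minimum
    by_type: Dict[str, str] = field(default_factory=dict)   # optional finer grain

    def set_mapping(self, request_type: str, ce_set: str) -> None:
        # Only useful if the FE can differentiate between request types.
        self.by_type[request_type] = ce_set

    def ce_set_for(self, request_type: str) -> str:
        # Fall back to the single CE set when no finer mapping exists.
        return self.by_type.get(request_type, self.default_ce_set)


router = RequestRouter(default_ce_set="ce-set-1")
router.set_mapping("firewall-event", "ce-set-2")
print(router.ce_set_for("packet-redirect"))  # ce-set-1
print(router.ce_set_for("firewall-event"))   # ce-set-2
```

A FE that cannot distinguish request types would simply never call
`set_mapping`, and every request would go to the default CE set.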
This architecture permits multiple FEs to be present in a NE. Each
of these FEs may potentially have a different set of capabilities.
FEs express these capabilities using the ForCES FE model described
in Section 5. FEs are responsible for establishing and maintaining
layer-2 connectivity with other FEs or with entities external to the
NE. Thus, FEs are also responsible for any signaling required at
layer-2.
4.3. CE Managers
CE managers are responsible for determining which FEs a CE should
control. It is legitimate for CE managers to be hard-coded with
knowledge of which FEs their CEs should communicate with. Likewise,
CE managers can communicate with any other entity or perform any
kind of computation to make that determination.
4.4. FE Managers
FE managers are responsible for determining to which CE any
particular FE should initially communicate. As with CE managers, no
restrictions are placed on how a FE manager decides to which CEs its
FEs should communicate. The FE manager can be hard-coded with this
information or communicate with any other entity to make that
determination.
4.5. Gl Reference Point
CE managers and FE managers may communicate with each other across
the Gl reference point in order to help them decide which CEs and
FEs should communicate with each other. Communication across the Gl
reference point is entirely optional in this architecture. No
requirements are placed on this reference point.
CE managers and FE managers may be operated by different entities.
The operator of the CE manager may not want to divulge, except to
specified FE managers, any characteristics of the CEs it manages.
Similarly, the operator of the FE manager may not want to divulge FE
characteristics, except to authorized entities. As such, CE
managers and FE managers may need to authenticate one another.
Subsequent communication between CE managers and FE managers may
require other security functions such as privacy, non-repudiation,
freshness, and integrity.
Once the necessary security functions have been performed, the CE
and FE managers MAY communicate to determine which CEs and FEs
should communicate with each other. In this process, the CE and FE
managers will likely learn of the existence of available FEs and CEs
respectively. This process is called discovery and will likely
entail one or both managers learning the capabilities of the
discovered ForCES protocol elements.
4.6. Gf Reference Point
The Gf reference point is used to inform forwarding elements of the
decisions made by FE managers. Only authorized entities may
instruct a FE with respect to which CE should control it.
Therefore, authentication is necessary between FE managers and
FEs. Privacy, integrity, and freshness are also required. Once the
appropriate security has been established, FE managers may instruct
FEs across this reference point to join a new NE or to disconnect
from an existing NE.
4.7. Gc Reference Point
The Gc reference point is used to inform control elements of the
decisions made by CE managers. Only authorized entities may
instruct a CE to control certain FEs. Privacy, integrity, and
freshness are also required across this reference point. Once
appropriate security has been established, the CE manager may
instruct CEs as to which FEs they should control and how they should
control them.
4.8. Gi Reference Point
Packets that enter the NE via one FE and leave the NE via a
different FE are transferred between FEs across the Gi reference
point. (See [FORCES-REQ], Section 5, Requirement 3.)
4.9. Gp Reference Point
Based on the information acquired through CEs' control processing,
CEs will frequently need to manipulate the packet-forwarding
behaviors of their FE(s). This manipulation of the forwarding plane
is performed across the Gp ("p" meaning protocol) reference point.
In this architecture, the ForCES protocol is exclusively used for
all communication across the Gp reference point.
4.10. Gr Reference Point
Varying degrees of synchronization are necessary to provide
redundancy, load sharing or distributed control. However, in all
cases, consistency protocols between CEs take place across the Gr
reference point and are out of the scope of this document.
Likewise, detecting the inability to synchronize due to a loss of
connectivity between CEs is out of the scope of this document.
It is not necessary to define any protocols across the Gr reference
point to enable simple control/forwarding separation (i.e., single
CE and multiple FEs). However, to make it possible to define Gr at
a later time, the concept of CE sets and the associated CE/FE
behavior should be included in the first versions of the ForCES
protocol. From the basic CE set building block concept, protocols
across the Gr reference point can be defined to provide the desired
effect.
5. FE Model
This section describes a model that can be used to express the
capabilities of a ForCES FE. (As we will see, this model can also
be used as the basis to control a FE's capabilities.) This model
satisfies the requirements set forth in the ForCES requirements
document [FORCES-REQ] with respect to FE modeling. Our model is
composed of a two-level hierarchy of detail. The higher level of the hierarchy
expresses which logical data path elements exist in the FE and
describes how these elements are interconnected. We call these
logical data path elements "stages." The lower level of the
hierarchy expresses the capabilities of each stage that the FE
provides. In general, the lower level expresses these capabilities
in terms of five categories: 1) what information the stage uses to
classify packets, 2) once classified, the actions the stage can
perform on the packet, 3) the statistics the stage collects in this
process, 4) the asynchronous events the stage may send to the CE as
part of this process, and 5) the parameters that the stage uses to
control its overall behavior.
5.1. Introduction
The ForCES architecture allows Forwarding Elements (FEs) of varying
functionality to participate in a ForCES network element. The
implication of this varying functionality is that CEs can make only
minimal assumptions about the functionality provided by their FEs.
Instead, CEs discover the capabilities of their FEs. [FORCES-REQ]
mandates that this capability information be expressed in the form
of a FE model. [FORCES-REQ] further requires that this FE model
describe which logical functions (i.e., stages) are present in the
FE and in which order these stages are performed. See [FORCES-REQ]
for types of logical functions that this model must support. For
each logical function, [FORCES-REQ] also requires that the FE model
be able to describe each stage's "capabilities."
A stage's capabilities clarify what the stage does but not how it
does it. (There is a small exception to this described later for
the case where the FE allows the CE to choose which algorithm the
stage should use.) For example, a forwarding function may perform a
lookup on destination IP address and mask to find a next hop IP
address and egress interface. However, the fact that the forwarding
function uses a Patricia Trie or a CAM to accomplish this lookup is
not relevant to the CE. Stage capabilities are best illustrated by
the following description of the logical packet-processing model of
a stage.
Stages logically process packets as follows.
First, the stage receives a packet and performs a classification
step on the packet. This classification step finds the highest
priority rule (i.e., filter) in the stage's rule set (i.e.,
classification or rule table) that matches the given packet. Next,
the stage performs one or more actions associated with the matching
rule. As part of this process, the stage may update certain
statistics (e.g., number of packets processed, number of packets
matching each filter rule) to reflect the types of packets it has
processed. As one of the actions (or occasionally asynchronously),
the stage may generate an event for further processing by the CE.
For example, a stage may detect that the router alert IP option is
present in a packet and would then generate a "packet redirection"
event to send the packet to the CE. Finally, some stages may have
tunable "knobs" that affect how they process packets. For example,
a FE may provide various algorithms for performing a metering
function (e.g., average rate, exponentially weighted moving average,
token bucket).
From this process, we see that the capabilities of stages can be
modeled by describing the five logical sets of data maintained by
each stage. The first two sets of data are the filtering rules and
associated actions that are applied to each packet as they pass
through the stage. The third set of data is the statistics
maintained by the stage. The fourth set is the current state of the
stage's tunable "knobs." Finally, the fifth set is the set of
events for which the CE has registered to receive notifications from
the stage. Manipulation of these five logical databases can be used
as a model for control of each stage.
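The per-stage process and the five logical databases described above
can be sketched as follows. This is a minimal Python illustration,
not an implementation defined by this document. It assumes that a
packet is modeled as a dictionary of packet fields and metadata
values, that a larger `priority` number means higher priority, and
that all class, field, and event names (`Stage`, `Rule`,
`packet-redirect`, `ip_option`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Assumption: a packet is modeled as a dict of packet fields and metadata tags.
Packet = Dict[str, object]


@dataclass
class Rule:
    priority: int                            # assumption: larger value = higher priority
    match: Dict[str, object]                 # field/tag name -> required value ({} matches all)
    actions: List[Callable[[Packet], None]]  # actions run on every matching packet


@dataclass
class Stage:
    rules: List[Rule] = field(default_factory=list)              # filters and their actions
    stats: Dict[str, int] = field(
        default_factory=lambda: {"packets": 0, "unmatched": 0})  # statistics
    params: Dict[str, object] = field(default_factory=dict)      # tunable "knobs"
    registered_events: Set[str] = field(default_factory=set)     # events the CE registered for
    emitted: List[Tuple[str, Packet]] = field(default_factory=list)

    def classify(self, pkt: Packet) -> Optional[Rule]:
        # Find the highest-priority rule whose fields all match the packet.
        hits = [r for r in self.rules
                if all(pkt.get(k) == v for k, v in r.match.items())]
        return max(hits, key=lambda r: r.priority, default=None)

    def process(self, pkt: Packet) -> None:
        self.stats["packets"] += 1
        rule = self.classify(pkt)
        if rule is None:
            self.stats["unmatched"] += 1
            return
        for action in rule.actions:
            action(pkt)

    def raise_event(self, name: str, pkt: Packet) -> None:
        # Events are only delivered if the CE has registered interest in them.
        if name in self.registered_events:
            self.emitted.append((name, pkt))


stage = Stage(registered_events={"packet-redirect"})
# Redirect packets carrying the router alert option to the CE.
stage.rules.append(Rule(10, {"ip_option": "router-alert"},
                        [lambda p: stage.raise_event("packet-redirect", p)]))
# Mark everything else for normal forwarding.
stage.rules.append(Rule(1, {}, [lambda p: p.update(disposition="forward")]))

stage.process({"ip_option": "router-alert"})
stage.process({"ip_option": None})
print(stage.stats)         # {'packets': 2, 'unmatched': 0}
print(len(stage.emitted))  # 1
```

Only the actions of the single highest-priority matching rule run,
which is why the router alert packet is redirected but not also
marked for forwarding.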
5.2. Model Approach
There are many ways that one could model the packet processing
capabilities of a FE. However, as we shall see, there is often a
tradeoff between the flexibility of a FE model and the ease with
which the CE can interpret that model to provide services. One
approach to this problem is to define a number of simple "device
types." Each of these device types would have well-known components
connected together in well-known ways. For example, we could define
an RFC 1812 router device type that does a longest prefix match on
destination IP address and mask and forwards packets to the
associated next hop IP address. However, since many services (e.g.,
QoS, firewall, intrusion detection) are being added to network
devices, the number of possible device types would be exponential in
the number of services. Writing a CE that understood exponentially
many device types would be a daunting task. Therefore, one would
likely want to restrict the number of device types to a small set
of "likely" devices. Coming up with this set would be difficult.
Furthermore, restricting device types would seem to disallow vendors
from creating interesting new devices. One could attempt to solve
this problem by allowing vendors to define their own proprietary
device types but this only leads to another explosion of device
types and introduces interoperability problems for CE vendors who do
not have access to the description of FE vendors' proprietary device
types.
The FE model proposed in this document tries to strike a balance
between flexibility of the model and ease of use by the CE. The
model tries to strike this balance by describing packet processing
in two levels of detail. The higher level of detail (Section 5.3)
uses the concept of logical functions to make it easier for CEs to
determine how to implement a service with a given model. The lower
level of detail (Section 5.4) allows great flexibility to express
the realization of a logical function chosen by a FE. The model
allows arbitrary topologies to be described. While arbitrary
topologies make it harder for the CE to understand the FE, it is
asserted that a static topology (or a small set of topologies) is
insufficient to describe the types of devices already in use.
5.3. Logical Functions and Topology
There are two largely orthogonal parts to the FE model proposed in
this draft. The first part provides a way to describe which logical
functions are present in a FE and how packets flow between these
logical functions. The concept of a logical function is akin to
that of an abstract base class in object-oriented terminology. By
saying that a FE supports a logical function, what we are really
saying is that the FE implements a specific concrete "derived class"
version of the logical function. The following inheritance diagram
illustrates this concept.
                           Stage
                          /  |  \
                         /   |   \
                        /    |    \
                       /     |     \
                      /      |      \                   Logical
             Forwarder     Meter   Shaper  <========    Function
               /   \                 | \                Level
              /     \                |  \
             /       \               |   \
 RFC1812Fwder       WebSwitch      Token  Leaky  <===== Capability
                                   Bucket Bucket        Level
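The abstract base class analogy above can be made concrete with a
short Python sketch. It is illustrative only: the class names mirror
the diagram, but the `next_hop` method and the `logical_function`
attribute are hypothetical, and the exact-match dictionary is a
deliberate simplification (an RFC 1812 forwarder performs a
longest-prefix match).

```python
from abc import ABC, abstractmethod
from typing import Dict


class Stage(ABC):
    """Root of the hierarchy: every logical data path element is a stage."""


class Forwarder(Stage):
    """Logical function level: says *what* the stage does, not how."""
    logical_function = "forwarder"

    @abstractmethod
    def next_hop(self, dst: str) -> str:
        """Return the next hop for a destination address."""


class RFC1812Fwder(Forwarder):
    """Capability level: a concrete realization that a FE actually implements."""

    def __init__(self, routes: Dict[str, str]):
        # Simplification: exact-match dict; a real RFC 1812 forwarder uses
        # longest-prefix match on the destination address.
        self.routes = routes

    def next_hop(self, dst: str) -> str:
        return self.routes[dst]


try:
    Forwarder()  # an abstract logical function cannot be instantiated
except TypeError:
    print("Forwarder is abstract")

fwd = RFC1812Fwder({"10.0.0.1": "192.0.2.1"})
print(fwd.logical_function, fwd.next_hop("10.0.0.1"))  # forwarder 192.0.2.1
```

A CE that only understands the logical function can treat `fwd` as a
`Forwarder`, which is the point of advertising the logical function
name alongside each stage.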
By describing the FE at this high level, the FE model is able to
give a broad overview of what processing a FE may perform on
packets. The goal of this part of the FE model is to provide a way
for the CE to know which stage(s) to modify to achieve a given
service. As such, this model allocates a namespace for the
specification of different logical functions. (We expect about 15
to 20 logical functions to be defined initially, e.g., ingress port,
egress port, forwarder, meter, marker, shaper, scheduler, queue,
encapsulator, decapsulator, encrypter, decrypter, NAT, mux, demux,
and editor.) Each FE allocates a FE-unique stage identifier (USI)
to each of its stages and passes the USI along with the
corresponding logical function name as part of the FE capability
description. This allows there to be multiple instances of the same
logical function in each FE's model. We will start with a simple
version of the model illustrating a capability exchange. In
subsequent sections, we will expand the model and refine the same
capability exchange. The following is the first version of the
capability exchange that indicates which logical functions are
present and how they are connected together.
- The number of stages supported.
- For each stage:
   - The USI.
   - The logical function name (from the namespace) that this stage
     implements.
   - The number of downstream stages to which this stage can send
     packets.
   - For each downstream stage:
      - The USI of the downstream stage.
      - A label for this exit point (i.e., target) from the stage.
This representation allows zero or more instances of each logical
function to be present in a FE model. Furthermore, this
representation encodes the topology of the provided stages. Since
it is not possible to represent all possible FEs' processing models
using a fixed topology, the model presented in this draft allows
functions to be connected with largely arbitrary topologies. The
only restrictions on topology relate to the source and sink natures
of ingress and egress port functions respectively. For example,
egress port functions must not have any downstream stages whereas no
other stage may refer to an ingress port function as one of its
downstream stages. Cycles in the topology are permitted.
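The topology rules above are simple enough to check mechanically. The
following Python sketch assumes a hypothetical in-memory encoding of
the first version of the capability exchange (a mapping from USI to
the logical function name and the list of downstream USIs with their
exit labels); it is not a wire format. The check for unknown USIs is
an extra sanity check added for illustration and is not stated in
this draft.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the first-version capability exchange:
# USI -> (logical function name, [(downstream USI, exit label), ...])
Topology = Dict[int, Tuple[str, List[Tuple[int, str]]]]


def validate_topology(topo: Topology) -> List[str]:
    """Return violations of the ingress/egress rules; cycles are allowed."""
    errors: List[str] = []
    for usi, (function, downstream) in topo.items():
        # Egress ports are sinks: they must not list any downstream stage.
        if function == "egress port" and downstream:
            errors.append(f"stage {usi}: egress port has downstream stages")
        for target, label in downstream:
            # Extra sanity check (not stated in this draft): the target must exist.
            if target not in topo:
                errors.append(f"stage {usi}: exit '{label}' names unknown USI {target}")
            # Ingress ports are sources: no stage may send packets to one.
            elif topo[target][0] == "ingress port":
                errors.append(f"stage {usi}: exit '{label}' targets ingress port {target}")
    return errors


fe: Topology = {
    1: ("ingress port", [(2, "out")]),
    2: ("forwarder", [(3, "hit"), (2, "recirculate")]),  # a self-loop is a legal cycle
    3: ("egress port", []),
}
print(validate_topology(fe))  # []

fe[3] = ("egress port", [(1, "loop")])  # breaks both rules at once
print(validate_topology(fe))
```

Note that no cycle detection is performed, since cycles in the
topology are permitted.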
5.4. Stage Capabilities
This section defines how the capabilities of all the stages in our
model can be expressed using a single methodology. We achieve this
uniformity by viewing all stages as acting according to the
classification/action paradigm. In this paradigm, when a packet
logically enters a stage, the stage first performs a classification
on the packet. This classification is performed according to a
logical database of classification entries maintained by the stage.
Next, the stage performs one or more actions associated with the
matching classification entry. Each classification entry contains
the set of actions that the stage should perform for all packets
that match the entry.
This paragraph provides several examples of how the stages
identified in Section 5.3 can be viewed as acting according to the
classification/action paradigm. This paradigm is most naturally
applied to the generic filtering stages. In those stages,
prioritized filters (e.g., ACLs) are installed in a stage's logical
database. These filters specify which fields in the packet should
be evaluated and which values should be present in those fields for
the filter to match. In each filter, a pass or drop action is
typically specified that determines the disposition of packets
matching the filter. This paradigm maps to classical layer 3
forwarding in the following way. The logical database of
classification/action entries corresponds to a forwarding table.
The entries in this forwarding table typically consist of a network
address, a network mask, a next hop IP address, and an egress
interface number. The network address and mask make up the
classification portion of this entry while the next hop IP address
and egress interface correspond to a parameterized "forwarding
decision" action. The typical longest-prefix match algorithms
utilized by forwarding stages are simply classification algorithms
optimized for a masked match against a packet's destination IP
address. Finally, the metering stage can also be
viewed in terms of classification and action. Meters take a flow
specification and some rate limiting parameters (and optionally a
rate limiting algorithm). This flow specification may be based on
DSCP, 5-tuple or some other arbitrary packet contents. In any case,
this flow specification essentially defines a classification entry.
The rate limiting parameters are parameters to the specified rate
limiting action (or to an assumed rate limiting algorithm when one
is not explicitly specified).
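The mapping of a forwarding table onto classification/action entries
can be sketched in a few lines of Python using the standard
`ipaddress` module. In this sketch each entry pairs a network and
mask (the classification) with a next hop and egress interface (the
parameters of the "forwarding decision" action), and prefix length
plays the role of rule priority. The linear scan is chosen for
clarity only; as noted earlier, how a stage performs the lookup
(e.g., a Patricia trie or a CAM) is not visible to the CE.

```python
import ipaddress
from typing import List, Optional, Tuple

# (classification: network/mask, action parameters: (next hop, egress interface))
Entry = Tuple[ipaddress.IPv4Network, Tuple[str, int]]


def lookup(table: List[Entry], dst: str) -> Optional[Tuple[str, int]]:
    """Classify on destination address: the most specific matching prefix wins."""
    addr = ipaddress.IPv4Address(dst)
    best = None
    for net, decision in table:
        # Longer prefixes act as higher-priority classification entries.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, decision)
    return None if best is None else best[1]


table: List[Entry] = [
    (ipaddress.IPv4Network("0.0.0.0/0"), ("192.0.2.1", 1)),
    (ipaddress.IPv4Network("10.0.0.0/8"), ("192.0.2.2", 2)),
    (ipaddress.IPv4Network("10.1.0.0/16"), ("192.0.2.3", 3)),
]
print(lookup(table, "10.1.2.3"))      # ('192.0.2.3', 3)
print(lookup(table, "10.9.9.9"))      # ('192.0.2.2', 2)
print(lookup(table, "198.51.100.7"))  # ('192.0.2.1', 1)
```

The addresses used here are documentation and private ranges chosen
for illustration.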
While most of the functionality of a stage can be described
according to the classification/action paradigm, some additional
functions remain. These additional functions relate to how the
stage as a whole operates (as opposed to how the stage handles
individual flows), the kinds of asynchronous notifications that the
stage can send to the CE, and the types of statistics the stage
maintains. While we will often have no control over the algorithm
the stage uses to perform its function, there may be certain knobs
and dials that we can adjust to control the algorithm. We call
these knobs and dials "parameters" to the stage because they
resemble parameters to algorithms. For example, one can view an
ingress port stage as running an ARP algorithm that responds to ARP
requests. In order for the ARP algorithm to know when to respond to
an ARP request, the ARP algorithm needs to know the IP addresses of
each port. Thus, IP addresses can be viewed as parameters to the
ingress port stage.
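The ARP example above can be sketched in Python to show the
distinction between a fixed algorithm and its tunable parameters.
This is a minimal sketch under stated assumptions: the class
`IngressPortStage`, the parameter name `ip_addresses`, and the
simplification of one IP address per port are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IngressPortStage:
    # Parameter database: per-port IP addresses set by the CE (hypothetical name).
    params: Dict[str, Dict[int, str]] = field(
        default_factory=lambda: {"ip_addresses": {}})

    def set_param(self, name: str, value: Dict[int, str]) -> None:
        # The CE adjusts a "knob"; it does not replace the algorithm itself.
        self.params[name] = value

    def should_answer_arp(self, port: int, target_ip: str) -> bool:
        # Fixed algorithm: answer only for the address configured on that port.
        return self.params["ip_addresses"].get(port) == target_ip


port_stage = IngressPortStage()
port_stage.set_param("ip_addresses", {1: "192.0.2.10", 2: "198.51.100.10"})
print(port_stage.should_answer_arp(1, "192.0.2.10"))  # True
print(port_stage.should_answer_arp(2, "192.0.2.10"))  # False
```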
Next, some stages can be viewed as the originators of asynchronous
notifications, i.e., events. These events correspond to occurrences
that the CE cannot anticipate. For example, the ingress and egress
port stages may be able to send the link up/down event when they
detect that their port link state has changed. Likewise, one or
more stages may support the packet redirection event for sending
well-known control packets to the CE. Since CEs may not want to
receive all the events that a FE may generate, the ForCES protocol
SHOULD support a registration/deregistration mechanism where the CE
can signal its interest in receiving the events that it has
discovered via this FE model. Finally, stages may maintain certain
statistics related to their packet processing.
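The registration/deregistration mechanism mentioned above might be
kept on the FE as a small table from event names to interested CEs.
The following Python sketch is hypothetical: the class
`EventRegistry`, the CE identifiers, and the event names `link-down`
and `packet-redirect` are illustrative stand-ins for the link up/down
and packet redirection events, and the ForCES protocol messages that
would carry registrations are not shown.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class EventRegistry:
    """Hypothetical FE-side record of which CEs registered for which events."""

    def __init__(self, supported: Set[str]):
        self.supported = set(supported)  # events advertised via the FE model
        self.subscribers: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, ce_id: str, event: str) -> None:
        # A CE can only register for events it discovered through the FE model.
        if event not in self.supported:
            raise ValueError(f"event {event!r} is not advertised by this FE")
        self.subscribers[event].add(ce_id)

    def deregister(self, ce_id: str, event: str) -> None:
        self.subscribers[event].discard(ce_id)

    def notify(self, event: str) -> List[str]:
        """Return the CEs that should receive this event (empty if none)."""
        return sorted(self.subscribers.get(event, set()))


reg = EventRegistry({"link-up", "link-down", "packet-redirect"})
reg.register("ce1", "link-down")
reg.register("ce2", "link-down")
reg.deregister("ce2", "link-down")
print(reg.notify("link-down"))        # ['ce1']
print(reg.notify("packet-redirect"))  # []
```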
In simplest terms, we describe the capabilities of each stage by
listing the names of the items in each of the five categories that
the stage supports. This approach is illustrated in the following
updated capability exchange.
- The number of stages supported.
- For each stage:
  - The USI.
  - The logical function name (from the namespace) that this stage
    implements.
  - The number of properties supported by the stage.
  - For each property:
    - The name of the property from the property namespace.
  - The number of actions supported by the stage.
  - For each action:
    - The name of the action from the action namespace.
  - The number of parameters supported by the stage.
  - For each parameter:
    - The name of the parameter from the parameter namespace.
  - The number of events supported by the stage.
  - For each event:
    - The name of the event from the event namespace.
  - The number of statistics supported by the stage.
  - For each statistic:
    - The name of the statistic from the statistic namespace.
  - The number of downstream stages to which this stage can send
    packets.
  - For each downstream stage:
    - The USI of the downstream stage.
    - A label for this exit point (i.e., target) from the stage.
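To make the nesting of the list above concrete, the following sketch serializes it as count-prefixed sequences. The wire layout is an assumption made only for illustration: counts and namespace identifiers are 16-bit unsigned integers, USIs are 32-bit, labels are length-prefixed ASCII, and the numeric identifiers in the usage example are made up. The model itself does not define an encoding.

```python
# Hypothetical encoding of the simple (name-list) capability exchange.
# Assumed widths: counts and namespace ids are 16-bit, USIs are 32-bit,
# labels are length-prefixed ASCII; all integers in network byte order.
import struct
from dataclasses import dataclass, field


@dataclass
class Stage:
    usi: int
    function: int                                    # logical function id
    properties: list = field(default_factory=list)   # property namespace ids
    actions: list = field(default_factory=list)
    parameters: list = field(default_factory=list)
    events: list = field(default_factory=list)
    statistics: list = field(default_factory=list)
    downstream: list = field(default_factory=list)   # (USI, label) pairs


def encode_names(names):
    # A count followed by that many namespace identifiers.
    return struct.pack("!H", len(names)) + b"".join(struct.pack("!H", n) for n in names)


def encode_exchange(stages):
    out = struct.pack("!H", len(stages))             # number of stages
    for s in stages:
        out += struct.pack("!IH", s.usi, s.function)
        for names in (s.properties, s.actions, s.parameters, s.events, s.statistics):
            out += encode_names(names)
        out += struct.pack("!H", len(s.downstream))  # number of exit points
        for usi, label in s.downstream:
            raw = label.encode("ascii")
            out += struct.pack("!IB", usi, len(raw)) + raw
    return out


ingress = Stage(usi=1, function=10, properties=[100], events=[300],
                downstream=[(2, "next")])
msg = encode_exchange([ingress])
print(len(msg))  # 33
```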
The following subsections describe in more detail how the
classification, action, parameter, event, and statistics
capabilities are expressed.
5.4.1. Classification Capabilities
The classification capabilities of a stage are expressed in our
model through a variable length sequence of "properties." Each
property in the sequence indicates that the stage is capable of
including that property in any of the classification entries for
that stage. Properties come in two varieties: packet properties and
metadata (tag) properties. Packet properties are those protocol
fields that occur explicitly in packets. For example, in the IP
protocol, the version, type of service bits, fragment offset, time-
to-live, protocol, source address, and destination address are
potentially useful packet properties for classification. Other
examples of useful packet properties include UDP source/destination
port, TCP source/destination port, and ICMP type and code fields.
Metadata (tag) properties are those values associated with a packet
that do not occur explicitly in the packet. For example, the
"ingress port" tag may be associated with a packet by the ingress
stage. This tag identifies the port on which the packet entered the
FE, and subsequent stages may find it useful to classify on. For
example, some stages may give preferential treatment to packets
arriving on a certain port because that port is associated with a
customer receiving premium service. Without the "ingress port" tag,
subsequent stages would have no way of knowing on which port a
packet entered the FE. As another example, if the forwarder stage
is processing a multicast packet, that stage may need to know what
port the packet came in on so that the forwarder does not send the
packet back along the original link. In order to exchange property
information, we must agree on how to represent the presence or
absence of a property. This model allocates a property namespace
for this purpose. This namespace is shared across all stages
because many stages will classify on the same properties (e.g.,
ingress/egress port number or destination IP address).
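The distinction between packet properties and metadata (tag) properties can be sketched as a classifier that looks a property up in a shared namespace and reads it either from the packet or from its tags. The property names and the dictionary-based packet representation are illustrative assumptions; multiple properties in one entry are combined with a logical AND, as the model specifies for classification entries.

```python
# Hypothetical sketch: matching packet properties (fields carried in the
# packet) and metadata properties (tags attached by earlier stages).
from dataclasses import dataclass, field

# Shared property namespace: name -> where the value lives.
PROPERTY_NAMESPACE = {
    "ip.dst": "packet",
    "ip.proto": "packet",
    "tcp.dport": "packet",
    "ingress-port": "metadata",
}


@dataclass
class Packet:
    fields: dict                                # explicit protocol fields
    tags: dict = field(default_factory=dict)    # metadata set by stages


def property_value(pkt, name):
    source = pkt.fields if PROPERTY_NAMESPACE[name] == "packet" else pkt.tags
    return source.get(name)


def matches(pkt, entry):
    # Every property in the entry must match (logical AND).
    return all(property_value(pkt, n) == v for n, v in entry.items())


pkt = Packet(fields={"ip.dst": "192.0.2.1", "ip.proto": 6, "tcp.dport": 80})
pkt.tags["ingress-port"] = 3                    # set by the ingress stage

premium = {"ingress-port": 3}                   # metadata property only
web = {"ip.proto": 6, "tcp.dport": 80}          # packet properties only
print(matches(pkt, premium), matches(pkt, web)) # True True
```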
5.4.2. Action Capabilities
Similarly, the action capabilities of a stage are represented by a
logical sequence of "actions." Each action in the sequence
indicates that the stage is capable of having that action associated
with one of the stage's classification entries. Actions come in
three varieties. The first type of action edits the packet currently
being processed (e.g., changes a field or inserts/removes a header).
The second type of action associates or dissociates a piece of
metadata (tag) with the packet being processed. The third type of
action selects a target (i.e., downstream stage) for the packet.
For example, the action provided by the forwarder stage typically
associates the "forwarding decision" tag with a packet. (The
forwarding decision tag is a parameterized tag that specifies the
interface(s) on which the packet should be sent and the IP
address(es) of the next-hop router(s).) The egress stage then
logically classifies on this forwarding decision tag to determine
the interface on which to send the packet. As another example, the Meter
stage may be configured to either drop packets exceeding a certain
rate limit or it may be configured to simply "tag" those packets
(e.g., with the "exceeding guaranteed rate" tag). A subsequent
stage may be configured to drop or pass packets tagged this way
depending on some other characteristic of the system. In contrast,
NAT stages would use the first type of action to edit the current
packet by rewriting the source or destination IP address. Some
stages may be configured to drop packets matching certain
classifiers. Drop may be seen as removing all the headers and
payload from the packet and removing all associated metadata
properties as well. As with properties, this model allocates a
namespace for identifying actions. This
namespace is shared across all stages because different stages may
share the same action (e.g., drop).
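The three varieties of action, plus drop, can be sketched as operations on a simple packet record. The helper names, tag names, and dictionary layout below are hypothetical and chosen only to mirror the examples in the text (a NAT rewrite, a meter tag, target selection, and a later stage that drops tagged packets).

```python
# Hypothetical sketch of the three action varieties and of drop.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Packet:
    fields: dict                                 # protocol header fields
    payload: bytes = b""
    tags: dict = field(default_factory=dict)     # metadata (tag) properties
    target: Optional[str] = None                 # selected exit point label


def edit_field(pkt, name, value):
    """Type 1: edit the packet, e.g. NAT rewriting an address."""
    pkt.fields[name] = value


def set_tag(pkt, name, value=True):
    """Type 2: associate a piece of metadata with the packet."""
    pkt.tags[name] = value


def clear_tag(pkt, name):
    """Type 2: dissociate a piece of metadata from the packet."""
    pkt.tags.pop(name, None)


def select_target(pkt, label):
    """Type 3: choose the downstream stage (exit point) for the packet."""
    pkt.target = label


def drop(pkt):
    """Drop, viewed as removing all headers, payload and metadata."""
    pkt.fields.clear()
    pkt.payload = b""
    pkt.tags.clear()
    pkt.target = None


pkt = Packet(fields={"ip.src": "10.0.0.5", "ip.dst": "192.0.2.1"}, payload=b"data")
edit_field(pkt, "ip.src", "198.51.100.7")        # NAT stage (type 1)
set_tag(pkt, "exceeding-guaranteed-rate")        # meter tags, not drops (type 2)
select_target(pkt, "A")                          # type 3
# A later stage drops packets carrying the meter's tag.
if pkt.tags.get("exceeding-guaranteed-rate"):
    drop(pkt)
```

Modelling drop as the removal of everything, rather than as a fourth action type, follows the view of drop given in the text.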
5.4.3. Parameter Capabilities
The parameters supported by a stage are expressed by a logical
sequence of "parameters." Each parameter in the sequence represents
one of the knobs or dials used by the stage. A namespace is
allocated for the identification of parameters. This namespace is
shared across all stages because stages may share the same
parameters.
5.4.4. Event Capabilities
The events supported by a stage are expressed by a logical sequence
of "events." Each event in the sequence represents one of the
events that the FE may be configured to send to the CE when the
event happens. A namespace is allocated for the identification of
events. This namespace is shared across all stages because stages
may share the same events (e.g., packet redirection or link
up/down).
5.4.5. Statistics Capabilities
The statistics collected by a stage are expressed by a logical
sequence of "statistics." Each statistic in the sequence represents
one of the statistics maintained by the stage. A namespace is
allocated for the identification of statistics. This namespace is
shared across all stages because stages may share the same
statistics (e.g., number of packets processed).
5.5. Read-only Stages
The FE model must be able to express that certain stages in a FE may
not be modifiable by a CE. However, the model cannot simply ignore
these stages, as it may be necessary to understand their
functionality to predict the behavior of the FE. For example,
consider the following subset of a FE model. While the FE may allow
the Demux to be configured to select different kinds of traffic to
be sent to the A, B, and X targets, the subsequent meters may not be
programmable. However, the behavior of these meters must be known
so that the CE can make decisions as to which traffic should be sent
to which target (depending on the QoS desired for the traffic).
                          +-----+    +-----+
                          |     |    |     |--------------->
       Demux          +-->|     |--->|     |    +-----+
      +-----+         |   |     |    |     |--->|     |
      |    A|---------+   +-----+    +-----+    +-----+
  --->|    B|-------+     Marker1    Meter1     Absolute
      |    X|-----+ |                           Dropper1
      +-----+     | |     +-----+    +-----+
                  | |     |     |    |     |--------------->
                  | +---->|     |--->|     |    +-----+
                  |       |     |    |     |--->|     |
                  |       +-----+    +-----+    +-----+
                  |       Marker2    Meter2     Absolute
                  |                             Dropper2
                  |       +-----+    +-----+
                  |       |     |    |     |--------------->
                  +------>|     |--->|     |    +-----+
                          |     |    |     |--->|     |
                          +-----+    +-----+    +-----+
                          Marker3    Meter3     Absolute
                                                Dropper3
Two additions to the model are necessary to support read-only
stages: first, a Boolean flag that indicates whether the stage is
read-only or not, and second, an agreed-upon way of expressing any
static classification/action entries. (There may be static
parameters as well, which will need a similar expression.) In each
classification/action entry, there are zero or more properties and
one or more actions. When multiple properties are present, the
result is a logical AND of each property (e.g., if destination IP
address==X AND IP protocol==TCP AND TCP destination port
number==80). When multiple actions are present, all those actions
are performed on matching packets. To represent each property or
action, a type/length/value (TLV) approach is used. The names
defined in the property and action namespaces serve as the type in
the TLV. The length field of the TLV is an appropriately sized
integer that gives the size of the "value" portion of the TLV. The
value portion of the TLV may itself have some structure and it is
therefore necessary to standardize a data structure that corresponds
to each type in the namespace. Combining all these concepts
together, the following model is used to express the static
classification/action entries:
- The number of static classification/action entries.
- For each entry:
  - The number of properties.
  - The number of actions.
  - For each property:
    - The name of the property.
    - The length of the property.
    - The value of the property (using the data structure
      corresponding to the given name).
  - For each action:
    - The name of the action.
    - The length of the action.
    - The value of the action (using the data structure
      corresponding to the given name).
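One static classification/action entry can be encoded with the TLV approach described above. The model leaves the field widths open, so this sketch assumes 16-bit types and lengths in network byte order, and the namespace identifiers are made-up values. It encodes the example entry "destination IP address == 192.0.2.1 AND IP protocol == TCP AND TCP destination port == 80" with a drop action, then decodes it again; the outer count of entries is omitted.

```python
# Hypothetical TLV encoding of one static classification/action entry.
# Assumptions: 16-bit type and length fields, network byte order, the
# two counts precede the TLVs, and the namespace ids are invented.
import struct

PROP_IP_DST = 0x0001      # hypothetical property namespace ids
PROP_IP_PROTO = 0x0002
PROP_TCP_DPORT = 0x0003
ACT_DROP = 0x0101         # hypothetical action namespace id


def tlv(type_id, value):
    return struct.pack("!HH", type_id, len(value)) + value


def encode_entry(properties, actions):
    # properties/actions: lists of (namespace id, encoded value bytes)
    out = struct.pack("!HH", len(properties), len(actions))
    out += b"".join(tlv(t, v) for t, v in properties)
    out += b"".join(tlv(t, v) for t, v in actions)
    return out


def decode_entry(buf):
    n_props, n_acts = struct.unpack_from("!HH", buf, 0)
    offset, items = 4, []
    for _ in range(n_props + n_acts):
        t, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        items.append((t, buf[offset:offset + length]))
        offset += length
    return items[:n_props], items[n_props:]


props = [
    (PROP_IP_DST, bytes([192, 0, 2, 1])),
    (PROP_IP_PROTO, struct.pack("!B", 6)),        # TCP
    (PROP_TCP_DPORT, struct.pack("!H", 80)),
]
acts = [(ACT_DROP, b"")]                          # drop carries no value
wire = encode_entry(props, acts)
print(len(wire))  # 27
```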
5.6. TLV Errata
The capability exchange shown in Section 5.4 represents an all-or-
nothing approach to the five categories of capabilities. For
example, either you support all types of classification (e.g., equal
to, not equal to, range matching, inverse range matching) for all
values of a property or you support no classification for that
field. However, in practice, things are often not as simple. For
example, some stages may be able to classify on only certain values
of a field, or a stage may be able to match the IP
protocol field for either TCP or UDP but nothing else. The FE model
must therefore be capable of expressing these sorts of restrictions
on the values associated with any of the five categories of
capabilities. To express these restrictions, it is no longer
sufficient to list the names of the supported items in each of the
five namespaces. Instead, along with each supported
item, the model must describe any restrictions associated with that
item. The model describes these restrictions in the following way.
As in Section 5.5, a TLV structure is used. However, each TLV
contains two values instead of one. The first value represents the
bottom of a range of allowable values for the item, while the second
value represents the top of that range. It is important to note the
difference between the ability to select one specific value in a
range between A and B and the ability to select a range of values,
C-D, within A-B (A <= C <= D <= B). The two values in the TLV
represent A and B but do not by themselves imply the ability to do
range matching. In fact, several different kinds of matching are
possible within the specified range of values. There is "equal to"
matching (e.g., does field X have the value C, where A <= C <= B?),
"not equal to" matching (e.g., is X not equal to C?), "less than"
matching, "not less than" matching, "inside range" matching (e.g.,
is X in C-D?), and "not inside range" matching (e.g., is X not in
C-D?). "Less than" and "not less than" matching are specialized forms
of range matching and can be expressed in that form given an
appropriate lower or upper bound. We therefore need four additional
flags associated with each specified range (i.e., A-B). These flags
indicate whether equal to, not equal to, inside range, or not inside
range types of matching are allowed. Using the property category as
an example, the capability expression model becomes the following:
- The number of properties supported by the stage.
- For each property:
  - The name of the property from the property namespace.
  - The length of the value portion associated with this property.
  - A flag indicating whether "equal to" classification is allowed.
  - A flag indicating whether "not equal to" classification is
    allowed.
  - A flag indicating whether "inside range" classification is
    allowed.
  - A flag indicating whether "not inside range" classification is
    allowed.
  - The bottom of a range of values, using the data structure
    associated with the given property.
  - The top of a range of values, using the data structure
    associated with the given property.
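The following sketch shows how a CE might check a requested classifier against one such property entry. The match-kind strings ("eq", "ne", "in", "not_in") and the integer-valued property are assumptions for illustration. A single-value match needs the matching flag and a value between the bottom and top; a range match needs the range flag and bottom <= C <= D <= top. "Less than 1024" is expressed as the range 0-1023, as the text suggests.

```python
# Hypothetical sketch: is a requested classifier permitted by one
# property capability entry (four flags plus bottom/top of range)?
from dataclasses import dataclass


@dataclass
class PropertyCapability:
    name: str
    equal: bool             # "equal to" classification allowed
    not_equal: bool         # "not equal to" classification allowed
    inside_range: bool      # "inside range" classification allowed
    not_inside_range: bool  # "not inside range" classification allowed
    bottom: int             # A: lowest usable value
    top: int                # B: highest usable value


def permitted(cap, name, match, low, high=None):
    """match is "eq"/"ne" (single value low) or "in"/"not_in" (low..high)."""
    if name != cap.name:
        return False
    flags = {
        "eq": cap.equal,
        "ne": cap.not_equal,
        "in": cap.inside_range,
        "not_in": cap.not_inside_range,
    }
    if not flags.get(match, False):
        return False
    if match in ("eq", "ne"):
        if high is not None:
            return False
        high = low
    elif high is None:
        return False
    # Requested value(s) must satisfy A <= C <= D <= B.
    return cap.bottom <= low <= high <= cap.top


port_cap = PropertyCapability("TCP destination port", True, False, True, False, 0, 65535)
print(permitted(port_cap, "TCP destination port", "eq", 80))       # True
print(permitted(port_cap, "TCP destination port", "in", 0, 1023))  # True (less than 1024)
print(permitted(port_cap, "TCP destination port", "ne", 80))       # False
```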
The model above describes capabilities within one contiguous range.
Capabilities over non-contiguous ranges, such as the case that
motivated this section (i.e., matching the IP protocol field for TCP
or UDP only), are expressed with multiple capability entries, each
having the same name from the chosen namespace. For example, the
motivating example is expressed with the following two entries:
- 2 property entries follow.
- Entry 1:
  - Name: IP protocol
  - Length: two octets.
  - Equal to: True
  - Not equal to: False
  - Inside range: False
  - Not inside range: False
  - Bottom: 6, TCP
  - Top: 6, TCP
- Entry 2:
  - Name: IP protocol
  - Length: two octets.
  - Equal to: True
  - Not equal to: False
  - Inside range: False
  - Not inside range: False
  - Bottom: 17, UDP
  - Top: 17, UDP
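The effect of repeating an entry name can be sketched as follows: an "equal to" request is allowed if any entry with that name has the flag set and covers the value. The tuple layout is an assumption made for illustration; the field values are taken directly from the two entries above.

```python
# Hypothetical sketch: a non-contiguous capability (IP protocol equal
# to TCP or UDP only) expressed as two entries with the same name.
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    equal: bool
    not_equal: bool
    inside_range: bool
    not_inside_range: bool
    bottom: int
    top: int


CAPABILITIES = [
    Entry("IP protocol", True, False, False, False, 6, 6),    # TCP
    Entry("IP protocol", True, False, False, False, 17, 17),  # UDP
]


def equal_match_allowed(entries, name, value):
    # Allowed if any same-named entry permits "equal to" for this value.
    return any(
        e.name == name and e.equal and e.bottom <= value <= e.top
        for e in entries
    )


for proto in (6, 17, 1):  # TCP, UDP, ICMP
    print(proto, equal_match_allowed(CAPABILITIES, "IP protocol", proto))
# 6 True
# 17 True
# 1 False
```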
Unlike properties, the other four categories have no need for the
flags indicating the four types of classification. However, the
other four categories still do need the bottom and top of range to
indicate the range of allowable values from which the CE can
select a single value.
5.7. Completed Capability Exchange
With the data model updated to express each stage's capabilities in
the five categories, along with read-only status and static entries,
the complete capability exchange consists of the following
information:
- The number of stages supported.
- For each stage:
  - The USI.
  - The logical function name (from the namespace) that this stage
    implements.
  - The number of properties supported by the stage.
  - For each property:
    - The name of the property from the property namespace.
    - The length of the value portion associated with this property.
    - A flag indicating whether "equal to" classification is
      allowed.
    - A flag indicating whether "not equal to" classification is
      allowed.
    - A flag indicating whether "inside range" classification is
      allowed.
    - A flag indicating whether "not inside range" classification is
      allowed.
    - The bottom of a range of values, using the data structure
      associated with the given property.
    - The top of a range of values, using the data structure
      associated with the given property.
  - The number of actions supported by the stage.
  - For each action:
    - The name of the action from the action namespace.
    - The length of the value portion associated with this action.
    - The bottom of a range of values, using the data structure
      associated with the given action.
    - The top of a range of values, using the data structure
      associated with the given action.
  - The number of parameters supported by the stage.
  - For each parameter:
    - The name of the parameter from the parameter namespace.
    - The length of the value portion associated with this
      parameter.
    - The bottom of a range of values, using the data structure
      associated with the given parameter.
    - The top of a range of values, using the data structure
      associated with the given parameter.
  - The number of events supported by the stage.
  - For each event:
    - The name of the event from the event namespace.
    - The length of the value portion associated with this event.
    - The bottom of a range of values, using the data structure
      associated with the given event.
    - The top of a range of values, using the data structure
      associated with the given event.
  - The number of statistics supported by the stage.
  - For each statistic:
    - The name of the statistic from the statistic namespace.
    - The length of the value portion associated with this
      statistic.
    - The bottom of a range of values, using the data structure
      associated with the given statistic.
    - The top of a range of values, using the data structure
      associated with the given statistic.
  - A flag indicating whether the stage is read-only.
  - The number of static classification/action entries.
  - For each static classification/action entry:
    - The number of properties.
    - The number of actions.
    - For each property:
      - The name of the property.
      - The length of the property.
      - The value of the property (using the data structure
        corresponding to the given name).
    - For each action:
      - The name of the action.
      - The length of the action.
      - The value of the action (using the data structure
        corresponding to the given name).
  - The number of static parameters.
  - For each static parameter:
    - The name of the parameter.
    - The length of the parameter.
    - The value of the parameter (using the data structure
      corresponding to the given name).
  - The number of downstream stages to which this stage can send
    packets.
  - For each downstream stage:
    - The USI of the downstream stage.
    - A label for this exit point (i.e., target) from the stage.
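A CE might use the completed exchange to check requests before sending them. The sketch models only a few fields: parameter ranges, the read-only flag, and labelled exit points. It performs two illustrative checks: refusing to configure a read-only stage, and finding exit points whose target USI was not advertised. The stage names, USIs, parameter names, and ranges are hypothetical, loosely following the Marker/Meter/Dropper figure in Section 5.5.

```python
# Hypothetical CE-side use of a completed capability exchange: reject
# configuration of read-only stages or out-of-range parameter values,
# and find exit points that name a USI absent from the exchange.
from dataclasses import dataclass, field


@dataclass
class RangeCap:
    name: str
    bottom: int
    top: int


@dataclass
class StageCaps:
    usi: int
    function: str
    parameters: list = field(default_factory=list)  # RangeCap entries
    read_only: bool = False
    downstream: dict = field(default_factory=dict)  # exit label -> USI


def can_set_parameter(stage, name, value):
    if stage.read_only:
        return False
    return any(p.name == name and p.bottom <= value <= p.top
               for p in stage.parameters)


def dangling_targets(stages):
    known = {s.usi for s in stages}
    return [(s.usi, label) for s in stages
            for label, usi in s.downstream.items() if usi not in known]


marker = StageCaps(usi=1, function="marker",
                   parameters=[RangeCap("dscp", 0, 63)],
                   downstream={"next": 2})
meter = StageCaps(usi=2, function="meter",
                  parameters=[RangeCap("rate-kbps", 64, 1_000_000)],
                  read_only=True, downstream={"conform": 4, "exceed": 3})
dropper = StageCaps(usi=3, function="absolute-dropper", read_only=True)

stages = [marker, meter, dropper]
print(can_set_parameter(marker, "dscp", 46))       # True
print(can_set_parameter(meter, "rate-kbps", 512))  # False: read-only
print(dangling_targets(stages))                    # [(2, 'conform')]
```

Here the meter's "conform" exit points at USI 4, which was deliberately left out of the list so that the topology check has something to report.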
6. Applicability to RFC1812
[To be done.]
7. Security Considerations
Significant security considerations need to be documented, but this
work was not completed in time for submission. The next revision
will begin to address these issues.
8. References
[FORCES-REQ] T. Anderson, et al., "Requirements for Separation of
IP Control and Forwarding", work in progress, September
2001, <draft-anderson-forces-req-02.txt>.
9. Authors' Addresses
Todd A. Anderson
Intel Labs
2111 NE 25th Avenue
Hillsboro, OR 97124 USA
Phone: +1 503 712 1760
Email: todd.a.anderson@intel.com
1. Abstract
2. Definitions
3. Introduction
4. Architecture
   4.1. Control Elements
   4.2. Forwarding Elements
   4.3. CE Managers
   4.4. FE Managers
   4.5. Gl Reference Point
   4.6. Gf Reference Point
   4.7. Gc Reference Point
   4.8. Gi Reference Point
   4.9. Gp Reference Point
   4.10. Gr Reference Point
5. FE Model
   5.1. Introduction
   5.2. Model Approach
   5.3. Logical Functions and Topology
   5.4. Stage Capabilities
      5.4.1. Classification Capabilities
      5.4.2. Action Capabilities
      5.4.3. Parameter Capabilities
      5.4.4. Event Capabilities
      5.4.5. Statistics Capabilities
   5.5. Read-only Stages
   5.6. TLV Errata
   5.7. Completed Capability Exchange
6. Applicability to RFC1812
7. Security Considerations
8. References
9. Authors' Addresses