Diameter Overload Control Solution Issues
draft-campbell-dime-overload-issues-00

The information below is for an old version of the document.
Document	Type	This is an older version of an Internet-Draft whose latest revision state is "Expired".
	Author	Ben Campbell
	Last updated	2013-06-06
	RFC stream	(None)
Stream	Stream state	(No stream defined)
	Consensus boilerplate	Unknown
	RFC Editor Note	(None)
IESG	IESG state	I-D Exists
	Telechat date	(None)
	Responsible AD	(None)
	Send notices to	(None)
Network Working Group                                        B. Campbell
Internet-Draft                                                   Tekelec
Intended status: Informational                                 June 2013
Expires: December 01, 2013

               Diameter Overload Control Solution Issues
                 draft-campbell-dime-overload-issues-00

Abstract

   The Diameter Maintenance and Extensions (DIME) working group has
   undertaken an "overload control" work item, with the goal of
   standardizing a mechanism to allow Diameter nodes to report overload
   information among themselves.  Requirements currently include, among
   others, the need to accurately report the scope of overload
   conditions, and the ability to report overload information between
   nodes that are not directly connected at the transport layer.  These
   requirements introduce complex issues.  This document describes those
   issues, in the hope that it will assist the working group's decision
   process.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 01, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (http://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Campbell               Expires December 01, 2013                [Page 1]
Internet-Draft Diameter Overload Control Solution Issues       June 2013

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
   2.  Documentation Conventions  . . . . . . . . . . . . . . . . . .  3
   3.  Overload Scopes  . . . . . . . . . . . . . . . . . . . . . . .  3
     3.1.  Types of Overload Scopes . . . . . . . . . . . . . . . . .  4
       3.1.1.  Diameter Node Scope  . . . . . . . . . . . . . . . . .  4
         3.1.1.1.  Peer-Node Scope-Type . . . . . . . . . . . . . . .  5
         3.1.1.2.  Destination-Host Scope-Type  . . . . . . . . . . .  5
         3.1.1.3.  Non-Adjacent Nodes . . . . . . . . . . . . . . . .  6
       3.1.2.  Realm Scope  . . . . . . . . . . . . . . . . . . . . .  6
       3.1.3.  Diameter Application Scope . . . . . . . . . . . . . .  6
       3.1.4.  Origin-Host Scope  . . . . . . . . . . . . . . . . . .  7
     3.2.  Scope Values . . . . . . . . . . . . . . . . . . . . . . .  7
     3.3.  Combining Scopes . . . . . . . . . . . . . . . . . . . . .  8
     3.4.  Scope Extensibility  . . . . . . . . . . . . . . . . . . .  8
     3.5.  Scope Recommendations  . . . . . . . . . . . . . . . . . .  8
   4.  Non-adjacent Overload Information  . . . . . . . . . . . . . .  8
     4.1.  Use-Cases for Non-adjacent Overload Control  . . . . . . .  9
       4.1.1.  Interconnect . . . . . . . . . . . . . . . . . . . . .  9
       4.1.2.  Non-Supporting Agents  . . . . . . . . . . . . . . . . 10
     4.2.  Issues with Non-Adjacent Overload Control  . . . . . . . . 10
       4.2.1.  Topology Issues  . . . . . . . . . . . . . . . . . . . 10
       4.2.2.  Support Negotiation  . . . . . . . . . . . . . . . . . 10
       4.2.3.  Overload Report Delivery . . . . . . . . . . . . . . . 11
       4.2.4.  Non-Adjacent Overload Scopes . . . . . . . . . . . . . 13
     4.3.  Non-adjacent Overload Control Recommendations  . . . . . . 14
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 15
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 15
   7.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 15
     7.1.  Normative References . . . . . . . . . . . . . . . . . . . 15
     7.2.  Informative References . . . . . . . . . . . . . . . . . . 16
   Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . . 16
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.  Introduction

   When a Diameter [RFC6733] server or agent becomes overloaded, it
   needs to be able to gracefully reduce its load, typically by
   requesting other nodes to reduce the number of Diameter requests for
   some period of time.

   The Diameter Overload Control Requirements [I-D.ietf-dime-overload-
   reqs] describe requirements for overload control mechanisms.
   Requirement 31 states that Diameter nodes must be able to report
   overload with sufficient granularity to avoid forcing available
   capacity to go unused.  Requirement 34 requires the ability to report
   overload across Diameter nodes that do not support the mechanism.
   These requirements introduce significant and interrelated
   complexities to potential solutions.  This document describes the
   related issues.  The author hopes that this document will assist the
   working group's decision process related to these requirements.

   At the time of this writing, there have been two proposals for
   Diameter overload control mechanisms.  "A Mechanism for Diameter
   Overload Control" (MDOC) [I-D.roach-dime-overload-ctrl] defines a
   mechanism that piggybacks overload and load state information over
   existing Diameter messages.  "The Diameter Overload Control
   Application" (DOCA) [I-D.korhonen-dime-ovl] defines a mechanism that
   uses a new and distinct Diameter application to communicate similar
   information.

      While there are significant differences between the two proposals,
      they carry similar information.  In many ways, the issues related
      to Requirements 31 and 34 apply to both proposals.  This
      discussion is not specific to one proposal or the other, unless
      explicitly mentioned.

2.  Documentation Conventions

   This document uses terms defined in [RFC6733] and
   [I-D.ietf-dime-overload-reqs].  In particular, the terms "client",
   "server", "upstream", and "downstream" are used as defined in RFC
   6733.  In addition, this document uses the following terms:

   Overload: A condition where a Diameter node needs a reduction in
             the number of requests that it must handle.

   Overload Report: A request to reduce traffic that contributes to an
             overload condition.

   Overload Scope: The set of requests that may contribute to an
             overload condition.

   Reporting Node: The node that sends an overload report.  Also known
             as an "overloaded node".

   Reacting Node: A node that consumes and possibly acts on an overload
             report.

   Adjacent Overload Report: An overload report sent between adjacent
             Diameter peers.

   Non-Adjacent Overload Report: An overload report sent between
             Diameter nodes separated by one or more intermediary nodes
             (i.e., agents or proxies).

3.  Overload Scopes

   Diameter overload may affect some requests and not others.  The
   Diameter overload requirements [I-D.ietf-dime-overload-reqs] list
   several scenarios that illustrate overload that affects some requests
   but not others.  We refer to the set of requests affected by a
   particular overload event as the "overload scope" (or "scope") of the
   event.  The overload requirements require an extensible scope
   mechanism, with support for at least scopes of type "Diameter node",
   "Realm", and "Diameter Application".

   A scope indication in an overload report is a set of classifiers
   that identify requests likely to contribute to the overload
   condition.  In general, this could include any aspect of a Diameter
   message that a reacting node can observe.  For example, requests
   could be classified by Attribute Value Pair (AVP) values or next-hop
   routing decisions.

   The ability to express the scope of an overload condition is only
   useful when reacting nodes can act on the information.  There are
   only a small number of actions a reacting node may take to mitigate
   load.  Essentially these actions boil down to reducing the number of
   requests that match the scope, either by sending fewer requests in
   the first place, or by routing around the problem.  The former is
   limited by the node's ability to select between requests that match
   the overload scope, and requests that do not.  The latter is limited
   by the node's ability to predict or influence how a request will be
   routed.

      Reacting nodes will most likely need to take additional
      application-specific actions to mitigate overload conditions.  If
      a client reduces the number of messages it sends, it almost
      certainly has to take additional application-specific steps that
      affect its own client application.  Depending on the application,
      it might refuse some client application requests, redirect some of
      its own clients to different services (e.g., offloading mobile
      data sessions to local WiFi networks), or trigger an overload
      condition in the client application protocol (e.g., the Session
      Initiation Protocol (SIP)).

   This section discusses the meanings of the required scope-types, and
   analyses their implications for the selected mechanism.

3.1.  Types of Overload Scopes

   There are several different kinds, or types, of overload scopes.  The
   type of a scope defines how the reacting node interprets it.

3.1.1.  Diameter Node Scope

   The "Diameter Node" scope-type indicates that a particular Diameter
   node is overloaded.  Other nodes should mitigate the overload by
   reducing the number of requests that will land on the overloaded
   node, either by sending fewer requests, or by attempting to route
   requests around the overloaded node.

   In practice, the reporting node may have three distinct relationships
   with the reacting node.  The reporting node may be a Diameter peer,
   meaning it has a direct transport layer connection with the reacting
   node.  It may be an endpoint, that is, a Diameter server (or client,
   in the case of server-to-client requests).  Finally, it may be a non-
   adjacent agent, that is, a node that is neither a peer nor an
   endpoint.  Each of these cases is effectively a separate scope-type,
   since each requires different behaviors from reacting nodes.

3.1.1.1.  Peer-Node Scope-Type

   In the case of a peer, the reacting node simply sends fewer requests
   directly to the peer.  If it has other peers that are candidates for
   the requests, it may reroute requests to them.  We refer to this
   scope-type as "Peer-Node".

   The "Peer-Node" scope-type can further be broken down by transport
   connection.  Large-scale Diameter nodes are often implemented as
   clusters of IP hosts, which may or may not share their knowledge
   about upstream overload conditions.  Certain IP hosts in a cluster
   could become overloaded when others do not.  Therefore it may be
   useful to specify a "Peer-Connection" scope-type, to request
   reduction of traffic on a specific transport (i.e.  TCP or SCTP)
   connection.

3.1.1.2.  Destination-Host Scope-Type

   If the overloaded node is an endpoint from the reacting node's
   perspective, the best the reacting node can do is reduce the number
   of requests that contain a Destination-Host AVP that matches the
   overloaded node.  Rerouting will not help in general, since the
   requests will simply take different routes to arrive at the
   overloaded server.  Unless the destination node is a direct peer, the
   reacting node cannot do much about requests that don't contain a
   Destination-Host AVP in the first place, since it cannot predict
   whether these requests will land on the overloaded endpoint.  We
   refer to this scope-type as "Destination-Host".  While Destination-
   Host scopes may offer less utility to reacting nodes than Peer-Node
   scopes, they are still useful for requests bound to a particular
   server, for example, mid-session requests for a session-stateful
   application.
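
   The reasoning above can be sketched as a small classification
   function.  This is only an illustrative sketch, not part of either
   proposal: the function name, the string labels it returns, and the
   "overloaded_is_peer" flag are assumptions made for this example.

```python
from typing import Optional


def classify_request(destination_host: Optional[str],
                     overloaded_host: str,
                     overloaded_is_peer: bool) -> str:
    """Classify one outgoing request against a Destination-Host scope.

    The returned labels are hypothetical names used only in this sketch.
    """
    if destination_host == overloaded_host:
        # Bound to the overloaded server: rerouting will not help, so the
        # request is a candidate for throttling.
        return "throttle-candidate"
    if destination_host is not None:
        # Explicitly addressed to some other host: outside the scope.
        return "unaffected"
    if overloaded_is_peer:
        # No Destination-Host AVP, but the reacting node selects the next
        # hop itself and can steer the request to another peer.
        return "steer-away"
    # No Destination-Host AVP and the overloaded server is not a peer:
    # the reacting node cannot predict where the request will land.
    return "unpredictable"
```

   Only the first case is directly actionable for a non-adjacent
   endpoint, which is why the Destination-Host scope-type mainly helps
   with requests bound to a particular server.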

   Diameter agents that implement certain topology-hiding schemes may
   modify Origin-Host AVPs inserted by servers, and use some local
   mechanism to bind sessions to specific servers.  The "Destination-
   Host" type may not function correctly in this case.  MDOC specifies a
   "session-group" scope-type, where a topology hiding agent can assign
   a common identifier to sessions that are fate-shared in some way,
   such as being bound to the same server.  If that server becomes
   overloaded, the agent can send an overload report that matches
   requests in all sessions with the matching identifier.  This scope-
   type may be useful under certain circumstances, but may also be
   complex to implement.  Further discussion is needed to determine if
   the session-group type should be included in the base mechanism.
   Since the mechanism is required to allow extensible scope-types,
   session-groups could still be added in the future.
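
   To make the session-group idea more concrete, the following sketch
   shows one way a topology-hiding agent might track fate-shared
   sessions.  It is not taken from MDOC: the class name, the method
   names, the "group-N" identifier format, and the tuple form of the
   report are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class SessionGroupAgent:
    """Hypothetical topology-hiding agent that groups sessions by server."""

    def __init__(self) -> None:
        self._group_of_server: Dict[str, str] = {}
        self._sessions_in_group: Dict[str, Set[str]] = defaultdict(set)

    def bind_session(self, session_id: str, server: str) -> str:
        """Bind a session to a server and return its opaque group id."""
        if server not in self._group_of_server:
            # The identifier reveals nothing about the hidden server name.
            self._group_of_server[server] = (
                f"group-{len(self._group_of_server) + 1}")
        group = self._group_of_server[server]
        self._sessions_in_group[group].add(session_id)
        return group

    def overload_report(self, server: str) -> Tuple[str, str]:
        """Build a (scope-type, scope-value) pair for an overloaded server."""
        return ("session-group", self._group_of_server[server])

    def sessions_matching(self, report: Tuple[str, str]) -> Set[str]:
        """Return the sessions whose requests fall inside the report's scope."""
        scope_type, scope_value = report
        if scope_type != "session-group":
            return set()
        return set(self._sessions_in_group.get(scope_value, set()))
```

   Even in this simplified form, the agent has to keep per-session
   state, which hints at the implementation complexity mentioned above.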

3.1.1.3.  Non-Adjacent Nodes

   The reacting node cannot in general predict which requests will
   impact a particular non-adjacent agent, other than by guessing that a
   certain percentage of requests for a particular realm or application
   might traverse it.  Those examples would be better handled with scope
   types designed for that purpose, e.g.  "Realm" or "Diameter
   Application".

3.1.2.  Realm Scope

   The "Realm" scope-type indicates overload for all servers that handle
   requests for the particular Diameter realm.  That is, it impacts all
   requests with the particular realm in the Destination-Realm AVP.

   The Realm scope-type is useful for declaring a global overload
   condition within a network serving a single realm.  It is also useful
   for requesting third-parties to reduce Diameter traffic sent to a
   particular realm, for example, in roaming scenarios.

   Since the Realm scope-type indicates overload for an entire realm,
   reacting nodes should reduce the number of messages sent for the
   realm.  Rerouting traffic does not make sense for the Realm scope
   type, since it would almost never be useful for Diameter nodes to
   reroute traffic destined for an overloaded realm to a different,
   non-overloaded realm.  Client applications might, however, be able to
   choose to use services from a different operator if the Diameter
   realm of one operator reports an overload condition.

3.1.3.  Diameter Application Scope

   The "Diameter Application" scope-type indicates overload for a
   particular Diameter application.  That is, it impacts all requests
   with the matching value in an Application-Id AVP.

   The Diameter Application scope-type is useful for declaring an
   overload condition that affects a specific Diameter service,
   typically, but not necessarily, in a specific realm.

   Since the Diameter Application scope-type indicates overload for an
   entire application, reacting nodes should reduce the number of
   requests sent for that application.  Similarly to the Realm scope-
   type, it will rarely if ever make sense for a Diameter node to
   reroute traffic to a different Diameter application.

3.1.4.  Origin-Host Scope

   While most scope-types refer to where a request is likely to go, the
   "Origin-Host" scope-type refers to where the request originates.
   That is, any request with a matching Origin-Host AVP would match.
   The Origin-Host scope type is useful for situations where a specific
   client or set of clients sends an excessive number of requests.  An
   overload report with an Origin-Host scope would tell matching clients
   to reduce traffic, or agents to throttle requests that came from
   matching clients.

      Note that the Origin-Host scope-type is not explicitly mentioned
      in the requirements document.  The author includes it here because
      others have mentioned the need in conversation.

3.2.  Scope Values

   Scope labels in an overload report will typically take the form of a
   scope-type and a value.  For example, if the "example.com" realm is
   overloaded for all services, the overload report would indicate a
   scope-type of "Realm" and a scope-value of "example.com".

   A possible exception is the "Peer-Connection" scope-type.  Since an
   overload report with a Peer-Connection scope is only actionable by
   one of the peers connected via the specified connection, it makes
   sense to treat the Peer-Connection scope-type as always having a
   value of "this connection".

   There has been discussion among working group participants about
   whether scope-values are really needed for a piggy-backed overload-
   control mechanism.  The discussion boils down to a question about
   whether an overload-report indicates overload just for the realm,
   application, etc, of the Diameter message carrying the report, or
   whether it can indicate overload for other realms, applications, etc.
   MDOC allows values for most scope-types, even though it is a
   piggy-backed mechanism.

   Implicit scope values would preclude the ability to signal just a
   realm, just an application, or just a connection, without signaling
   all three in combination.  The overload control requirements
   explicitly require the ability to specify each of these.  An implicit
   scope value approach would violate those requirements.

   Scope-values are required for application-based mechanisms.  For
   example, an overload report with a Diameter application scope will
   almost always need to talk about Diameter applications other than the
   "overload-control" application.

3.3.  Combining Scopes

   Diameter nodes will commonly need to construct overload reports that
   apply to a combination of scopes.  For example, if a given realm is
   overloaded for a subset of the applications it supports, it might
   indicate both a realm scope and one or more Diameter application
   scopes.

   Logically, combining multiple scopes of different types reduces the
   overall set of requests to which the overload report would apply.
   Combining multiple scopes of the same type increases the applicable
   set.  A function that determines the requests affected by an overload
   report could model this as a logical "and" or "intersection" operator
   for combining scopes of different types, and a logical "or" or
   "union" operator for combining scopes of the same type.
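
   The matching rule described above can be sketched as follows.  This
   is a minimal illustration under stated assumptions: a scope is
   modeled as a (scope-type, scope-value) pair, a request is modeled as
   a mapping from scope-type name to the value observed in that
   request, and the application names "app-1" through "app-3" are
   placeholders rather than real Application-Ids.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Scope = Tuple[str, str]  # (scope-type, scope-value); hypothetical model


def request_matches(scopes: Iterable[Scope], request: Dict[str, str]) -> bool:
    """Return True if the request falls inside the combined scopes."""
    values_by_type: Dict[str, Set[str]] = defaultdict(set)
    for scope_type, scope_value in scopes:
        # Scopes of the same type are unioned ("or").
        values_by_type[scope_type].add(scope_value)
    # Scopes of different types are intersected ("and").  Note that an
    # empty scope list matches every request in this sketch.
    return all(request.get(scope_type) in values
               for scope_type, values in values_by_type.items())


report_scopes = [
    ("Realm", "example.com"),
    ("Diameter Application", "app-1"),
    ("Diameter Application", "app-2"),
]
in_scope = request_matches(
    report_scopes,
    {"Realm": "example.com", "Diameter Application": "app-2"})
out_of_scope = request_matches(
    report_scopes,
    {"Realm": "example.com", "Diameter Application": "app-3"})
```

   Here "in_scope" is true because the request matches the realm and
   one of the two listed applications, while "out_of_scope" is false
   because the application does not match either listed value.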

   We need further discussion about whether all possible combinations
   should be allowed.  For example, it may or may not make sense to
   combine a "Peer-Connection" scope with other scopes, or to allow more
   than one "Peer-Connection" scope-value for a single overload report.

3.4.  Scope Extensibility

   [I-D.ietf-dime-overload-reqs] requires scope-types to be extensible.
   This requirement implies that the chosen mechanism or mechanisms must
   discuss how new scope-types can be added, how support for specific
   scope-types should be declared or negotiated, and which scope-types
   might be mandatory to support.

3.5.  Scope Recommendations

   In the author's opinion, the selected solution or solutions should
   support, at a minimum, the "Peer-Connection", "Peer-Node",
   "Destination-Host", "Realm", and "Diameter Application" scope-types.
   The working group should consider also adding the "Origin-Host"
   scope-type.

   The working group should consider whether the advantages of the
   "session-group" concept and scope-type are worth the complexity.

4.  Non-adjacent Overload Information

   Requirement 34 of [I-D.ietf-dime-overload-reqs] says that the
   selected Diameter overload control mechanism "SHOULD" be able to
   communicate overload and load information across intermediaries that
   do not support the mechanism.  This requirement complicates the
   solution effort in several ways: it affects how Diameter nodes
   negotiate support for overload control, how they address and route
   overload reports to the right places, and how they act on received
   overload reports.

   While the requirement does not explicitly say it, we interpret
   "intermediaries" in this context to mean Diameter agents.  The
   requirement is irrelevant for lower layer intermediaries (e.g.
   routers), and cannot be reasonably applied for non-Diameter entities,
   or hybrid entities such as gateways between Diameter and other
   protocols.

   The requirement to traverse non-supporting intermediaries is not
   necessarily the same thing as a requirement for end-to-end
   communication of overload reports between Diameter clients and
   servers.  Diameter agents can also originate and consume overload
   reports.  Therefore, we refer to this requirement as "Non-adjacent
   Overload Control".

4.1.  Use-Cases for Non-adjacent Overload Control

   There are two primary use-cases for non-adjacent overload control.

4.1.1.  Interconnect

   The first significant non-adjacent use-case is the interconnect
   scenario described in section 2.3 of the overload control
   requirements [I-D.ietf-dime-overload-reqs].  Two or more Diameter
   network operators communicate with each other across a third-party
   interconnect provider that brokers Diameter traffic between the
   operators.

   If the interconnect provider does not support Diameter overload
   control, each operator network becomes an island of overload control,
   similar to those in the non-supporting agent use-case (Section
   4.1.2).  Even if the interconnect provider does support overload
   control, the operators may not trust it to generate and act on
   overload reports on the operators' behalf, and may prefer to
   exchange overload and load information directly with each other.

   The interconnect use-case may introduce additional security concerns.
   While the non-supporting agent use case typically (but not
   necessarily) occurs inside a single administrative domain, the
   interconnect case will almost always involve sending overload reports
   across multiple administrative domains.  Since a malicious or
   incorrect overload report can effectively shut down Diameter
   processing, the current lack of a viable solution for end-to-end
   integrity protection of Diameter messages may be a problem.

4.1.2.  Non-Supporting Agents

   [I-D.ietf-dime-overload-reqs] requires the solution to function in
   networks where not all Diameter elements support it.  That is, the
   solution must allow gradual deployment, and must not require a flag-
   day cutover.  If non-adjacent overload control is not supported, one
   or more non-supporting Diameter Agents can divide a network into
   overload control islands, where overload information is communicated
   inside each island, but not among separate islands.
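
   The island effect can be illustrated by computing the connected
   groups of supporting nodes in a small topology.  This sketch assumes
   purely hop-by-hop reporting; the function name, the adjacency-map
   representation, and the node names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def overload_islands(links: Dict[str, Set[str]],
                     supporting: Set[str]) -> List[FrozenSet[str]]:
    """Group supporting nodes into islands reachable hop by hop.

    `links` maps each node to its directly connected peers.  Overload
    information is assumed to flow only between adjacent supporting nodes.
    """
    islands: List[FrozenSet[str]] = []
    seen: Set[str] = set()
    for start in sorted(supporting):
        if start in seen:
            continue
        island: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in island:
                continue
            island.add(node)
            # Only follow links to peers that also support the mechanism.
            stack.extend(peer for peer in links.get(node, set())
                         if peer in supporting and peer not in island)
        seen |= island
        islands.append(frozenset(island))
    return islands


# client -- agent-a -- agent-x (non-supporting) -- agent-b -- server
topology = {
    "client": {"agent-a"},
    "agent-a": {"client", "agent-x"},
    "agent-x": {"agent-a", "agent-b"},
    "agent-b": {"agent-x", "server"},
    "server": {"agent-b"},
}
islands = overload_islands(
    topology, {"client", "agent-a", "agent-b", "server"})
```

   In this example the single non-supporting agent splits the network
   into two islands: one containing "client" and "agent-a", and one
   containing "agent-b" and "server".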

      In the author's strictly personal opinion, the non-supporting
      agent use case is less compelling than the interconnect case.  The
      non-supporting agent case would typically occur inside one
      administrative domain.  The operator of that domain has
      considerably more control over the implementations used in the
      domain than it might have for third-party domains.

4.2.  Issues with Non-Adjacent Overload Control

4.2.1.  Topology Issues

   Many of the issues with non-adjacent overload control derive from the
   fact that a Diameter node is unlikely to know the topology of the
   Diameter network past its immediate peers.  In a trivial topology,
   that is, a Diameter network with only clients and servers, this is
   not a problem.  But if the immediate peer is a Diameter agent, a node
   is unlikely to know what next hop the agent will select for a given
   Diameter message.  This is particularly difficult if the agent hides
   topology in either direction, or uses dynamic peer discovery.  While
   a node may be able to infer the path a given message will take in
   some specific cases (e.g., for mid-session messages), it cannot do
   this in general.  And even those specific cases may fail if an agent
   on the message path performs topology hiding.

   This lack of topology knowledge impacts the way that nodes can
   negotiate overload-control support, the ways they send overload
   reports, and the ways a reacting node can act to mitigate overload.
   A non-adjacent overload-control mechanism will need to solve the
   topology issues, either by offering ways to discover non-adjacent
   topologies, or offering ways to constrain overload-control relevant
   parts of such topologies in ways where a node could reasonably know
   them in advance.

4.2.2.  Support Negotiation

   Diameter nodes need to negotiate or otherwise indicate their support
   for overload control to other nodes.  This includes indicating
   support for overload control in general, as well as potentially
   indicating support of certain parameters of the overload control
   solution.  For example, a node may need to indicate which overload
   algorithms it supports.  This becomes complex if two non-adjacent
   nodes need to negotiate support.

   In a Diameter application-based solution, support for the overload
   control application would occur during the capabilities exchange
   between peers.  Diameter capabilities exchange occurs strictly
   between peers; Diameter offers no mechanism for indicating support of
   a given Application-ID between non-adjacent nodes.

   Diameter allows non-negotiated use of an arbitrary Application-Id
   between non-adjacent nodes across Diameter agents that implement the
   Diameter Relay application.  In theory, this means that an
   application-based, non-adjacent overload control could only traverse
   Diameter relays, or Diameter proxies that explicitly support the
   overload-control Application-Id.  In the latter case, we assume that
   a proxy will not indicate support for the overload-control
   Application-Id unless it supports the overload-control mechanism;
   such a proxy cannot be considered a non-supporting agent.

   In practice, a Diameter agent can act as a proxy for some purposes
   and a relay for others.  If a Diameter proxy indicates support for
   the Diameter relay application, we assume that it will relay any
   arbitrary application.  This means it can be considered a relay for
   the purposes of overload control.
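
   The assumptions in the preceding paragraphs can be summarized in a
   short sketch that classifies an agent by the Application-Ids it
   advertises.  The relay Application-Id value 0xffffffff comes from
   RFC 6733.  Everything else is an assumption for illustration: no
   Application-Id has been assigned for overload control, so the value
   below is a placeholder, and the function name and returned labels
   are hypothetical.

```python
from typing import Set

RELAY_APPLICATION_ID = 0xFFFFFFFF  # Relay Application-Id from RFC 6733

# Placeholder only: no overload-control Application-Id is assigned.
HYPOTHETICAL_OVERLOAD_APP_ID = 999999


def agent_role(advertised_app_ids: Set[int], overload_app_id: int) -> str:
    """Classify an agent for application-based overload control."""
    if overload_app_id in advertised_app_ids:
        # Assumed to implement the mechanism, so it is not a
        # non-supporting agent.
        return "supporting"
    if RELAY_APPLICATION_ID in advertised_app_ids:
        # Assumed to relay any application, including overload control.
        return "relay"
    # A proxy that advertises neither cannot be traversed.
    return "blocking"
```

   Under these assumptions, a proxy that advertises both its own
   applications and the relay application is treated as a relay, so
   application-based overload messages could still cross it.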

   For both application-based and piggybacked solutions, a supporting
   node needs to know the other nodes with which it should negotiate.
   For overload-control between Diameter peers, this is easy; a node
   exchanges support information with its immediate peers.  But for
   non-adjacent overload control, this is more difficult for reasons
   discussed in Section 4.2.1.

   Therefore, for non-adjacent overload control negotiation, each
   supporting node either needs advance knowledge of all nodes with
   which it may negotiate overload-control support, or it needs a
   mechanism for discovering that knowledge dynamically.

4.2.3.  Overload Report Delivery

   With hop-by-hop overload control mechanisms, overload report
   addressing and delivery is relatively simple.  A node sends overload
   reports directly to its peers.  This becomes more complex for non-
   adjacent overload-control.

   For application-based overload control, nodes could address overload
   reports to specific endpoint nodes using the Destination-Host AVP.
   Doing so would be subject to the same non-adjacent topology issues
   described in Section 4.2.1.  That is, a node can only send overload
   reports to non-adjacent clients or servers that it knows about,
   either from prior knowledge (i.e., provisioning) or because it has
   previously observed Diameter messages from them.

   An application-based mechanism could possibly address reports to non-
   adjacent Diameter agents using the Destination-Host AVP. This would
   effectively make the agent into an endpoint for the overload-control
   application.

   A piggy-backed mechanism will have more difficulty addressing non-
   adjacent overload reports.  A piggy-backed mechanism sends overload
   reports in already existing Diameter requests; that is, requests that
   have their own purposes and destinations independent of the overload
   report.  Thus, nodes can only select the destination of an overload
   report by bundling it into a Diameter message that was already going
   to that destination.  While a piggy-backed mechanism might be able to
   send overload reports across quiescent transport connections using
   watchdog (DWR/DWA) messages, these messages cannot be exchanged
   between non-adjacent nodes.

      In some cases, the limit of sending overload reports to
      destinations to which existing traffic is bound may be acceptable.
      If a node is contributing to an overload condition, then it's
      reasonable to assume that node is regularly exchanging traffic
      with the overloaded node.  However, there may be cases where an
      overload report causes a connection to become quiescent.  If the
      reporting node needed to tell a reacting node that the condition
      has resolved or improved, it would need to send a new report
      across the now quiescent connection.  There may also be cases
      where a reacting node redirects traffic along a different path,
      causing a previously quiescent node to suddenly start sending
      requests to the overloaded node.  Thus, without careful selection
      of the overload report scope, an overloaded node may find itself
      engaged in a game of Whack-a-Mole [Whac-a-Mole] with previously
      quiescent non-adjacent nodes.

   For both piggy-backed and application-based solutions, non-adjacent
   overload control introduces a need to identify the sender of a
   report, or at least determine whether the report is from an adjacent
   or non-adjacent node.  This is not required for purely hop-by-hop
   solutions, since the sender could always be assumed to be the peer.

   For example, a non-adjacent report with a "Peer-Connection" scope
   does not make sense.  If a node receives one, it should probably
   ignore it.  But in order to make that decision, it must be able to
   distinguish a non-adjacent report from an adjacent one.  For example,
   in an application-based mechanism,

4.2.4.  Non-Adjacent Overload Scopes

   A reacting node will typically attempt to mitigate an overload
   condition by either reducing the number of requests that contribute
   to the condition, or by rerouting part of that traffic to avoid the
   problem.  In both cases, the reacting node is limited by its ability
   to determine which Diameter requests contribute to the overload
   condition in the first place.  The overload scope concept (Section 3)
   offers a way for overloaded nodes to indicate what traffic is likely
   to contribute to the overload and should be abated.

   Not all of the scope-types described in Section 3 make sense for non-
   adjacent overload control.  The "connection" scope-type is an obvious
   example, since the reacting node will never share a transport
   connection with a non-adjacent node; this is the very definition of
   non-adjacent nodes.

   Since a Diameter node cannot control in general how requests are
   forwarded to non-adjacent nodes, the "host" scope-type also does not
   work well, especially when there are multiple possible destinations
   up or downstream from the adjacent peer.  For example, in Figure 1,
   Node A sends Diameter requests to Nodes B and C across a non-
   supporting agent.  If Node B becomes overloaded but Node C does not,
   Node A cannot reroute requests to Node C, since it has little ability
   to influence where the agent will forward any given request.  If
   Node A tries to reduce traffic by 50%, the agent will likely still
   send half of the remaining traffic to Node B. If B and C are
   endpoints, Node A may in some cases be able to use the Destination-
   Host AVP for this purpose (in which case the "Destination-Host"
   scope-type would be more appropriate), but this does not help if B
   and C are agents rather than servers.

                      +--------+       +--------+
                      | Node B |       | Node C |
                      +----+---+       +---+----+
                           |               |
                           +-------+-------+
                                   |
                           +-------+--------+
                           | Non-Supporting |
                           |  Agent         |
                           +-------+--------+
                                   |
                                   |
                              +----+----+
                              | Node  A |
                              +---------+
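
   The Figure 1 scenario can be illustrated with a little arithmetic.
   The sketch below assumes, purely for illustration, that Node A offers
   100 requests per second and that the non-supporting agent splits
   traffic evenly between Nodes B and C; neither figure comes from this
   document, and the function name is hypothetical.

```python
def load_per_node(offered, split):
    """Return the request rate each upstream node receives.

    offered: requests per second sent by Node A.
    split: fraction of traffic the agent forwards to each node.
    """
    return {node: offered * share for node, share in split.items()}


# Assumption: the non-supporting agent splits traffic evenly.
split = {"B": 0.5, "C": 0.5}

before = load_per_node(100, split)
# Node A reacts to B's overload by cutting its own traffic by 50%.
after = load_per_node(100 * 0.5, split)
```

   Under these assumptions Node C loses half of its traffic even though
   it was never overloaded, and Node B still receives half of whatever
   Node A continues to send.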

   Scope-types that classify traffic by origin or final destinations,
   such as "Origin-Host", "Destination-Realm", "Application-ID", and
   "Destination-Host", can be used for non-adjacent overload control.
   In general, scope-types that may denote non-adjacent intermediary
   devices, such as "host", cannot, nor can scope-types that refer only
   to peers, e.g., "Peer-Connection".
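
   A reacting node could apply this rule with a simple filter.  The
   following is a minimal sketch: the scope-type names are taken from
   the text above, but the function, its parameters, and the idea of
   passing adjacency as a boolean are assumptions made for illustration.

```python
# Scope-type names are taken from the text; the grouping is a sketch.
NON_ADJACENT_OK = {
    "Origin-Host",
    "Destination-Realm",
    "Application-ID",
    "Destination-Host",
}


def accept_report(scope_type, from_adjacent_peer):
    """Decide whether a reacting node should act on a report.

    Reports from adjacent peers may use any scope-type. Reports from
    non-adjacent nodes are only honored for scope-types that classify
    traffic by origin or final destination.
    """
    if from_adjacent_peer:
        return True
    return scope_type in NON_ADJACENT_OK
```

   Note that the filter depends on the reacting node already knowing
   whether the sender is adjacent, which is itself one of the open
   issues for non-adjacent overload control.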

   Even for destination-oriented scope-types, the sender of an overload
   report must be authoritative for the indicated scope.  That is, it
   must have full knowledge of the congestion state for the scope.  For
   example, if Nodes B and C both serve the realm "example.com", and B
   becomes 50% overloaded while C does not, B cannot simply report 50%
   overload at realm scope.  If it did, Node A would reduce its
   generated traffic by 50%.  Since the realm as a whole needs only
   about a 25% reduction (it can still handle 75% of the offered load,
   assuming B and C share traffic evenly), this would leave the realm
   operating beneath available capacity.
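
   The realm-wide figure can be worked out as a traffic-weighted sum.
   The sketch below assumes that B and C each carry half of the realm's
   traffic and that "50% overloaded" means B needs a 50% reduction in
   its own load; both are assumptions, as is the function name.

```python
def realm_reduction(nodes):
    """Traffic-weighted reduction needed across a whole realm.

    nodes maps a node name to (share_of_realm_traffic, needed_reduction),
    both expressed as fractions between 0 and 1.
    """
    return sum(share * reduction for share, reduction in nodes.values())


# Assumption: B and C each carry half of the realm's traffic.
example_com = {"B": (0.5, 0.5), "C": (0.5, 0.0)}
needed = realm_reduction(example_com)
```

   Under these assumptions the realm as a whole needs a 25% reduction,
   that is, it can still carry 75% of the offered load, so a 50% realm-
   scope report from B alone would overstate the problem.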

      The need to be authoritative for an indicated scope is also true
      for strictly hop-by-hop mechanisms.  But in a hop-by-hop
      mechanism, it is easier for an intervening agent to learn the
      overload state of upstream nodes.  In the example, if the agent
      supported the overload control mechanism, it would most likely
      receive reports from Nodes B and C, and could then construct
      downstream reports that incorporate the state of B, C, and its own
      local state.  This contrasts with the non-adjacent case where B
      must understand the current state of C even though it is not in
      the path of overload reports from C.

   Therefore, a given node must only report overload for scopes for
   which it has full knowledge of the load and overload state.  That is,
   it must be a "scope authority" for any scope it reports.  In the
   example, nodes B and C (and any other nodes serving "example.com")
   would be required to share current load and overload state.  The
   state-sharing requirement could be substantial for high-capacity
   nodes.
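
   The "scope authority" condition amounts to a completeness check on
   shared state.  A minimal sketch, using hypothetical function names
   and a plain dictionary in place of whatever state-sharing mechanism a
   real deployment would use:

```python
def missing_state(scope_members, known_state):
    """Return the members of a scope whose overload state is unknown."""
    return sorted(set(scope_members) - set(known_state))


def is_scope_authority(scope_members, known_state):
    """A node is authoritative only if no member's state is missing."""
    return not missing_state(scope_members, known_state)


# Node B knows its own state but has not heard from Node C.
members = ["B", "C"]
state_at_b = {"B": 0.5}
```

   Here Node B is not a scope authority for the realm until it also
   learns the state of Node C.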

   When a node reports overload for a certain scope, reacting nodes will
   treat the overload condition as uniform across the entire scope.  For
   example, if a node reports overload for an entire realm, reacting
   nodes will reduce traffic equally for all servers that serve that
   realm.  If the servers are unequally overloaded, they must use a more
   granular scope-type, for example, "Destination-Host".
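
   One way a scope authority might choose between a realm-wide report
   and per-server reports is sketched below.  The tuple layout and the
   function name are hypothetical; only the scope-type names come from
   this document.

```python
def choose_reports(realm, reductions):
    """Pick the coarsest scope that still describes the overload accurately.

    reductions maps each server in the realm to its needed reduction
    (a fraction between 0 and 1).
    """
    values = set(reductions.values())
    if len(values) == 1:
        # Uniform overload: one realm-wide report is accurate.
        return [("Destination-Realm", realm, values.pop())]
    # Unequal overload: fall back to one report per overloaded server.
    return [
        ("Destination-Host", host, reduction)
        for host, reduction in sorted(reductions.items())
        if reduction > 0
    ]
```

   The per-server branch produces more reports, but it avoids throttling
   servers that still have spare capacity.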

4.3.  Non-adjacent Overload Control Recommendations

   A hop-by-hop mechanism allows for very flexible and fine-grained
   overload control.  It solves or simplifies a number of issues, such
   as negotiation of support and parameters, requirements for topology
   knowledge, end-to-end security, etc., by avoiding them in the first
   place.  Adding non-adjacent support to such a mechanism would
   complicate it considerably.

   Non-adjacent overload control mechanisms are better suited to
   connecting islands of overload control.  Such mechanisms work well
   for larger scopes and relatively static topologies.

   The author believes that we are unlikely to find a single solution
   that works well for both hop-by-hop and non-adjacent overload
   control.  While a single solution is more desirable in general, a
   single solution that works well for both cases is likely to be
   extremely complicated.  Therefore, the working group should consider
   a separate mechanism for the non-adjacent delivery of overload
   reports.

   If the group chooses to accept two separate solutions, we should be
   able to specify a single data model and set of AVPs that work for
   both, with some restrictions.  (For example, the non-adjacent
   solution would likely forbid the use of the "Peer-Connection" scope-
   type.)

5.  IANA Considerations

   This draft makes no requests of IANA.

6.  Security Considerations

   Overload reports induce Diameter nodes to reduce or reroute traffic.
   For large scopes, a single erroneous or malicious overload report
   could effectively shut down Diameter processing for an entire realm.
   A Diameter overload control solution needs mechanisms to ensure that
   overload reports are only accepted from trusted sources, and that
   nothing tampers with the reports en route.

   For hop-by-hop approaches, the transport connection can be protected
   with TLS or IPsec.  But this will not help for non-adjacent
   reporting, since no such transport connection exists.

   While such work is in progress in the DIME working group, Diameter
   currently has no viable mechanism for end-to-end authentication and
   integrity protection.  The working group should consider either
   making non-adjacent overload control contingent on a generic Diameter
   end-to-end protection mechanism, or adding a specialized protection
   mechanism to any resulting non-adjacent overload control solution.

7.  References

7.1.  Normative References

   [RFC6733]  Fajardo, V., Arkko, J., Loughney, J. and G. Zorn,
              "Diameter Base Protocol", RFC 6733, October 2012.

   [I-D.ietf-dime-overload-reqs]
              McMurry, E. and B. Campbell, "Diameter Overload Control
              Requirements", Internet-Draft draft-ietf-dime-overload-
              reqs-06, April 2013.

7.2.  Informative References

   [I-D.roach-dime-overload-ctrl]
              Roach, A. and E. McMurry, "A Mechanism for Diameter
              Overload Control", Internet-Draft draft-roach-dime-
              overload-ctrl-03, May 2013.

   [I-D.korhonen-dime-ovl]
              Korhonen, J. and H. Tschofenig, "The Diameter Overload
              Control Application (DOCA)", Internet-Draft draft-
              korhonen-dime-ovl-01, February 2013.

   [Whac-a-Mole]
              "Whack-a-Mole Colloquial Usage",
              <http://en.wikipedia.org/wiki/Whack-a-mole#Colloquial_usage>.

Appendix A.  Contributors

   Eric McMurry and Robert Sparks made significant contributions to the
   concepts in this draft.

Author's Address

   Ben Campbell
   Tekelec
   17210 Campbell Rd.
   Suite 250
   Dallas, TX 75252
   US
   
   Email: ben@nostrum.com
