Diameter Overload Rate Control
RFC 8582

Note: This ballot was opened for revision 10 and is now closed.

(Ben Campbell) Yes

(Spencer Dawkins) Yes

Comment (2019-01-22 for -10)
Thank you for doing this work, and for basing it on SIP Overload Control. It's nice when protocol designers adopt good ideas from each other. 

There are three SHOULDs in Section 5.1, Reporting Node Overload Control State, I'd like to understand better. 

   A reporting node that uses the rate abatement algorithm SHOULD
   maintain reporting node Overload Control State (OCS) for each
   reacting node to which it sends a rate Overload Report (OLR).


^^ This one - I'm guessing that this is "SHOULD unless you're still writing code upgrading an implementation that treats all reacting nodes the same way", based on this next sentence, but I'm guessing. Why wouldn't you do this?

      This is different from the behavior defined in [RFC7683] where a
      single loss percentage sent to all reacting nodes.

   A reporting node SHOULD maintain OCS entries when using the rate
   abatement algorithm per supported Diameter application, per targeted
   reacting node and per report type.

^^ Your answer to my previous question is likely to help me understand this one, but I'm still guessing at reasons why you wouldn't do this.

   A rate OCS entry is identified by the tuple of Application-Id, report
   type and DiameterIdentity of the target of the rate OLR.

   The rate OCS entry SHOULD include the rate allocated to the reacting
   note.

^^ I'm really interested in this one - does the rate abatement algorithm work without knowing the rate that's allocated? Assuming that it does work, I'm still guessing at why you wouldn't do this.

   A reporting node that has selected the rate overload abatement
   algorithm MUST indicate the rate requested to be applied by DOIC
   reacting nodes in the OC-Maximum-Rate AVP included in the OC-OLR AVP.

   All other elements for the OCS defined in [RFC7683] and
   [I-D.ietf-dime-agent-overload] also apply to the reporting nodes OCS
   when using the rate abatement algorithm.
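
To make the three SHOULDs above concrete, here is a minimal Python sketch of a reporting node's rate OCS, keyed by the tuple the quoted text names (Application-Id, report type, DiameterIdentity). The class and method names are hypothetical and not taken from the draft; the allocated rate is optional here only to mirror the SHOULD being questioned.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Key layout follows the quoted text:
# (Application-Id, report type, DiameterIdentity of the OLR target).
OcsKey = Tuple[int, str, str]


@dataclass
class RateOcsEntry:
    # Hypothetical field; None models an entry that does not store the rate.
    allocated_rate: Optional[int]  # requests per second


class ReportingNodeOcs:
    """Illustrative per-reacting-node OCS table for the rate algorithm."""

    def __init__(self) -> None:
        self._entries: Dict[OcsKey, RateOcsEntry] = {}

    def set_rate(self, app_id: int, report_type: str,
                 reacting_node: str, rate: Optional[int]) -> None:
        # One entry per application, per report type, per reacting node.
        self._entries[(app_id, report_type, reacting_node)] = RateOcsEntry(rate)

    def rate_for(self, app_id: int, report_type: str,
                 reacting_node: str) -> Optional[int]:
        entry = self._entries.get((app_id, report_type, reacting_node))
        return entry.allocated_rate if entry else None


# Illustrative values only: the application id and host names are made up.
ocs = ReportingNodeOcs()
ocs.set_rate(4, "host", "client-a.example.com", 200)
ocs.set_rate(4, "host", "client-b.example.com", 50)
```

Because each reacting node can be given a different rate, the reporting node needs some record of what it told each one; that is the practical difference from a single loss percentage sent to everyone.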

Ignas Bagdonas No Objection

Deborah Brungard No Objection

Alissa Cooper No Objection

Benjamin Kaduk No Objection

Comment (2019-01-23 for -10)
Thanks for this well-written document!  My comments are all essentially
editorial in nature.

One comment of a general nature regards the usage of the word "indicate" --
usually when I read "indicate" I expect that to be part of some
protocol message or other formal data structure, but IIUC the OCS is
entirely a local matter, so "indicating" something in the OCS could be
equally well said as "storing" or "noting" or similar.  (I do see other
similar usage of "indicate" in RFC 7683, so it's unclear that there are
really any grounds for changing the usage in this document.)

Section 4

nit: Saying that nodes MUST indicate support for *both* loss and rate seems
to duplicate the requirement from RFC 7683 and would potentially complicate
future updates.  The descriptive note about "nodes supporting the rate
feature will support both" seems a better way to phrase things.

Section 5.1

Is keeping track of how much a reacting node is actually sending considered
to not be part of the OCS (as opposed to the allocated rate, which is part
of the OCS as noted here)?

Section 6.2

   This extension does not define new overload report types.  The
   existing report types of host and realm defined in [RFC7683] apply to
   the rate control algorithm.  The peer report type defined in
   [I-D.ietf-dime-agent-overload] also applies to the rate control
   algorithm.

side note: I'm curious how the directionality is such that the report type
applies to the algorithm, as opposed to the other way around.

Section 7.1

   Upon receiving the overload report with a target maximum Diameter
   request rate, each reacting node applies abatement treatment for new
   Diameter requests towards the reporting node.

(nit?) My (hasty) reading of 7683 is that "abatement treatment" means
either diversion or throttling, and that traffic processed normally is not
considered to receive "abatement treatment".  If that reading is correct,
then this text is suggesting that no new requests receive normal treatment
after the reception of an OLR with a target rate, which does not seem quite
right.

Section 7.2

   Note that the value of OC-Maximum-Rate AVP (in request messages per
   second) for the rate algorithm provides an upper bound on the traffic
   sent by the reacting node to the reporting node.

I see that this is not using normative language, and that the following
paragraph does clarify the caveats, but "upper bound" usually is read as
"strict upper bound", and there are several ways in which this bound could
(at least temporarily) not be strict.  Perhaps "loose upper bound" is
better phrasing.

Section 7.3.1

Perhaps note explicitly that "//" denotes comments?

   In determining whether or not to transmit a specific message, the
   reacting node can use any algorithm that limits the message rate to
   the OC-Maximum-Rate AVP value in units of messages per second.  For
   ease of discussion, we define T = 1/[OC-Maximum-Rate] as the target
   inter-Diameter request interval.  It may be strictly deterministic,
   or it may be probabilistic.  It may, or may not, have a tolerance

nit: The intervening sentence defining 'T' seems to change the binding of
"It" away from "the algorithm".

   Note that when the OC-Maximum-Rate value is 0 with a non-zero OC-
   Validity-Duration, then the reacting node should apply abatement
   treatment to 100% of Diameter requests destined to the overloaded
   reporting node.  However, when the OC-Validity-Duration value is 0,
   the reacting node should stop applying abatement treatment.

nit: this paragraph seems like it would be better placed elsewhere, as its
content is independent of any particular throttling algorithm.
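
A small Python sketch of the rule in the quoted paragraph, written independently of any throttling algorithm. The function name and the idea of expressing the result as a fraction of offered traffic are assumptions for illustration, not something the draft defines.

```python
def abatement_fraction(max_rate: float, validity_duration: int,
                       offered_rate: float) -> float:
    """Fraction (0.0 to 1.0) of requests that need abatement treatment.

    max_rate and offered_rate are in requests per second;
    validity_duration is in seconds. All names are hypothetical.
    """
    if validity_duration == 0:
        # A zero OC-Validity-Duration ends the overload condition.
        return 0.0
    if max_rate == 0:
        # Zero rate with a non-zero duration: abate every request.
        return 1.0
    if offered_rate <= max_rate:
        # Traffic already fits under the requested rate.
        return 0.0
    # Only the excess over the requested rate needs abatement.
    return 1.0 - max_rate / offered_rate
```

The last two branches also illustrate the Section 7.1 point above: under this reading, requests within the requested rate are processed normally and only the excess receives abatement treatment.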

   Reporting nodes with a very large number of reacting nodes, each with
   a relatively small arrival rate, will generally benefit from a
   smaller value for TAU in order to limit queuing (and hence response
   times) at the reporting node when subjected to a sudden surge of
   traffic from all reacting nodes.  Conversely, a reporting node with a
   relatively small number of reacting nodes, each with proportionally
   larger arrival rate, will benefit from a larger value of TAU.

Am I correct in assuming that "larger" and "smaller" values of TAU here are
to be measured with respect to T (i.e., as a ratio)?  This may be worth
stating more explicitly.
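
To illustrate why the ratio of TAU to T is the interesting quantity, here is a hedged Python sketch of a generic leaky-bucket limiter. It is a simplified illustration, not a transcription of the draft's pseudocode; the class, attribute, and helper names are assumptions.

```python
class LeakyBucket:
    """Generic leaky-bucket rate limiter (illustrative only)."""

    def __init__(self, max_rate: float, tau: float) -> None:
        self.T = 1.0 / max_rate  # target inter-request interval, seconds
        self.tau = tau           # tolerance, seconds
        self.x = 0.0             # current bucket level, seconds
        self.lct = 0.0           # time of the last admitted request

    def admit(self, now: float) -> bool:
        # Drain the bucket by the time elapsed since the last admission.
        x_prime = self.x - (now - self.lct)
        if x_prime > self.tau:
            return False  # bucket too full: apply abatement treatment
        # Admit the request and charge one interval T to the bucket.
        self.x = max(0.0, x_prime) + self.T
        self.lct = now
        return True


def admitted_in_burst(max_rate: float, tau: float, burst: int) -> int:
    """Count how many of `burst` simultaneous requests at t=0 are admitted."""
    bucket = LeakyBucket(max_rate, tau)
    return sum(bucket.admit(0.0) for _ in range(burst))
```

In this sketch a back-to-back burst of 1 + floor(TAU/T) requests is admitted, so a given TAU means something different at different rates. With a rate of 4 requests per second (T = 0.25 s), TAU = 0 admits one request from a burst and TAU = 0.5 s (2T) admits three.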

Section 8.3

Do you want to add this requirement as a "Note" on the IANA registry
itself?

Section 9

Other than what Mirja has already noted, I only have one minor remark.

It seems that an attacker that can set up reacting nodes has a slightly
different way to disrupt legitimate traffic when "rate" is used vs. "loss",
but the details of any attack depend on implementation behavior at the
reporting node (e.g., whether it divides its total capacity evenly amongst
reacting nodes or uses a more complicated allocation scheme).  And since an
attacker that can set up new reacting nodes is almost certainly able to
send traffic from those nodes, in practice there is no substantial
difference, so the decision to ignore this difference and just refer to the
7683 security considerations seems justified.
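
As a worked illustration of the "divides its total capacity evenly" case, the Python sketch below shows how legitimate reacting nodes' combined share shrinks when extra reacting nodes appear. The even-split policy, function name, and numbers are assumptions chosen for the example; the draft does not mandate any allocation scheme.

```python
from typing import Dict, List


def even_share(total_capacity: float, reacting_nodes: List[str]) -> Dict[str, float]:
    """Split capacity (requests per second) evenly across reacting nodes."""
    if not reacting_nodes:
        return {}
    per_node = total_capacity / len(reacting_nodes)
    return {node: per_node for node in reacting_nodes}


# Hypothetical node names and capacity.
legit = ["legit-1", "legit-2", "legit-3", "legit-4"]
rogue = ["rogue-%d" % i for i in range(6)]

before = even_share(1000, legit)
after = even_share(1000, legit + rogue)

legit_before = sum(before[n] for n in legit)
legit_after = sum(after[n] for n in legit)
```

Here the four legitimate nodes go from 250 requests per second each to 100 each, so their aggregate allocation drops from 1000 to 400 without the attacker sending a single request. A more elaborate allocation scheme would change the numbers, which is why the details depend on the reporting node's implementation.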

Suresh Krishnan No Objection

Mirja Kühlewind No Objection

Comment (2019-01-21 for -10)
The security considerations of RFC 7683 have their own section on non-compliant nodes (Section 10.3). While this is discussed in RFC 7683, I think it is especially important for this document to remind the reader that there may be non-compliant nodes that send at a higher rate than indicated. I would recommend adding one more statement to the security considerations section of this document and potentially pointing the reader explicitly at Section 10.3 of RFC 7683.

Two comments on normative language:

1) Section 5.6: "Any algorithm implemented MUST result in the
      correct rate of traffic being sent to the reporting node."
I would recommend changing this to something like:
"Any algorithm implemented MUST correctly limit the maximum
 rate of traffic being sent to the reporting node."
Otherwise I would think this is hard to implement in practice.

2) Section 7.2: "... the reporting node MUST periodically evaluate its overload state..." 
Not sure if the normative language is really appropriate here, as this does not impact interoperability, nor can it be checked. If anything, I would recommend a "SHOULD" instead.

And two more editorial comments:

1) As section 7.3 only describes (in quite some detail) an example algorithm, I would rather have put this in an appendix. But I guess that's a matter of taste...

2) I don't think section 8.2. is needed.

(Terry Manderson) No Objection

Alexey Melnikov No Objection

(Eric Rescorla) No Objection

Alvaro Retana No Objection

Adam Roach No Objection

Comment (2019-01-22 for -10)
Thanks to everyone who worked on this document. I have a handful of editorial
comments on its contents that the editor may wish to consider when revising it
to address the I-D nits identified by the document shepherd.

---------------------------------------------------------------------------

§1:

>  of reporting nodes when subjected to rapidly changing loads.  The

Nit: "...rapidly-changing..."

>  One of the benefits of a rate based algorithm over the loss algorithm

Nit: "...rate-based..."

>  to distribute it's load over the set of reacting nodes from which it

Nit: "...its load..."

>  specify the amount of traffic on a per reacting node basis implies

Nit: "...per-reacting-node basis..."

>  traffic to that reacting node.  If the number of reacting node
>  changes, either because new nodes are added, nodes are removed from

Nit: "...number of reacting nodes..."

>  Conveyance (DOIC) solution [RFC7683] to add support for the rate
>  based overload abatement algorithm.

Nit: "...rate-based..."

---------------------------------------------------------------------------

§4:

>  As defined in [RFC7683], a DOIC reporting node supporting the rate
>  feature MUST select a single abatement algorithm

Consider whether normatively reiterating normative language from another
specification makes sense. In the general case, this is a bad idea, since it
opens the door to conflicting normative definitions of behavior. Non-normative
restatement of behavior with a citation to the document that has the normative
description is typically safer.

---------------------------------------------------------------------------

§5.1:

>     This is different from the behavior defined in [RFC7683] where a
>     single loss percentage sent to all reacting nodes.

Nit: "...percentage is sent..." (consider also: "...where a reporting node
sends a single loss percentage to all reacting nodes.")

---------------------------------------------------------------------------

§5.2:

>  OC-OLR AVP as an element of the abatement algorithm specific portion

Nit: "...abatement-algorithm-specific portion..."

---------------------------------------------------------------------------

§5.3:

>  A reporting node that has selected the rate overload abatement
>  algorithm and enters an overload condition MUST indicate rate as the
>  abatement algorithm in the resulting reporting node OCS entries.
>
>  A reporting node that has selected the rate abatement algorithm and
>  enters an overload condition MUST indicate the selected rate in the
>  resulting reporting node OCS entries.

These paragraphs are similar enough that it's kind of tricky to pull out the
intended distinction being made. Consider:

   A reporting node that has selected the rate overload abatement
   algorithm and enters an overload condition MUST indicate rate as the
   abatement algorithm and MUST indicate the selected rate in the resulting
   reporting node OCS entries.

---------------------------------------------------------------------------

§5.6:

>     Other algorithms for controlling the rate MAY be implemented by
>     the reacting node.  Any algorithm implemented MUST result in the
>     correct rate of traffic being sent to the reporting node.

It's not clear why this paragraph is indented. In some RFCs, it's conventional
to indent non-normative editor's notes to help with clarity. The presence of
normative language in this paragraph points away from that usage. Consider
either un-indenting this paragraph, or explaining the way in which this document
uses indented paragraphs (e.g., in the Introduction)

---------------------------------------------------------------------------

§7.  Rate Based Abatement Algorithm

Nit: "Rate-Based..."



---------------------------------------------------------------------------

§8.1:

>  New AVPs defined by this specification are listed in Section 6.  All
>  AVP codes are allocated from the 'Authentication, Authorization, and
>  Accounting (AAA) Parameters' AVP Codes registry.

This is a bit confusing -- it's not clear to me whether the information in §6.3
requires IANA action. It would probably be best to be a bit more explicit in
this section about specifically which actions IANA needs to take.

---------------------------------------------------------------------------

§8.3:

>  All DOIC report types defined in the future MUST indicate whether or
>  not the rate algorithm can be used with that report type.

It is also unclear what actions IANA is expected to perform based on this input.

---------------------------------------------------------------------------

§10:

Either remove or (preferably) populate this section.

Martin Vigoureux No Objection