Telechat Review of draft-ietf-dime-agent-overload-10
review-ietf-dime-agent-overload-10-genart-telechat-gont-2017-03-25-00

Request Review of draft-ietf-dime-agent-overload
Requested rev. no specific revision (document currently at 11)
Type Telechat Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2017-03-14
Requested 2017-02-14
Draft last updated 2017-03-25
Completed reviews Opsdir Last Call review of -08 by Will LIU
Secdir Last Call review of -08 by Watson Ladd
Secdir Last Call review of -09 by Ólafur Guðmundsson
Genart Telechat review of -10 by Fernando Gont
Assignment Reviewer Fernando Gont
State Completed
Review review-ietf-dime-agent-overload-10-genart-telechat-gont-2017-03-25
Reviewed rev. 10 (document currently at 11)
Review result Ready with Issues
Review completed: 2017-03-25

Review

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please wait for direction from your
document shepherd or AD before posting a new version of the draft.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-dime-agent-overload-??
Reviewer: Fernando Gont
Review Date: 2017-03-26
IETF LC End Date: 2017-01-23
IESG Telechat date: Not scheduled for a telechat

Summary:
The document is well-written, and should be ready once the issue below
(which requires a clarification) is resolved. The document also seems to
use RFC 2119 keywords inconsistently; this should be fixed before
progressing the document.

Major issues:
* Section 3.1.2, page 6:

>    When the client has an active and a standby connection to the two
>    agents then an alternative strategy for responding to an overload
>    report from an agent is to change the standby connection to active
>    and route all traffic through the new active connection.

Can't this scheme lead to large oscillations of load that essentially
result in the following cycle:
a) one agent is overloaded;
b) the client switches to the standby agent, overloading it;
c) right after that, the client has to switch back to the former agent,
and the process repeats.

Of the options presented, this one looks like the least desirable.
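To make the concern concrete, the following toy simulation models the
cycle above. Its assumptions are hypothetical and not taken from the
draft: the offered load is constant, an agent reports overload whenever
the traffic routed to it exceeds its capacity, and the client moves all
traffic to the standby agent on every overload report. For contrast, it
also checks whether an even split across both agents would overload
either one.

```python
from typing import List


def naive_switchover(offered_load: float, capacity: float, steps: int) -> List[str]:
    """Return the active agent at each step when the client moves ALL
    traffic to the standby agent whenever the active one reports overload.

    Hypothetical model: an agent reports overload in a step if the load
    routed to it exceeds its capacity; load is constant across steps.
    """
    agents = ["A", "B"]
    active = 0
    history = []
    for _ in range(steps):
        history.append(agents[active])
        if offered_load > capacity:  # active agent sends an overload report
            active = 1 - active      # standby becomes active, takes all traffic
    return history


def split_is_overloaded(offered_load: float, capacity: float) -> bool:
    """Whether each agent would be overloaded if load were split evenly."""
    return offered_load / 2 > capacity


print(naive_switchover(150, 100, 6))  # ['A', 'B', 'A', 'B', 'A', 'B']
print(split_is_overloaded(150, 100))  # False
```

With an offered load of 150 and a per-agent capacity of 100, the naive
policy flips between agents on every step, while an even split would keep
both agents below capacity. The point is only that the switchover rule,
as described, has no damping.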


Minor issues:
* Section 5.2.3, page 12:
>    The reacting node does not delete an OCS when receiving an answer
>    message that does not contain an OC-OLR AVP (i.e. absence of OLR
>    means "no change").
>
>    The reacting node sets the abatement algorithm based on the OC-Peer-
>    Algo AVP in the received OC-Supported-Features AVP.

Should these two items be "MUSTs", too?
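For reference, the behaviour in the quoted text can be sketched as
follows, which may help decide whether each step is normative. The
function and class names are hypothetical, AVPs are modelled as plain
dictionary entries rather than real Diameter encodings, and the
OC-Peer-Algo value is treated as an opaque value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OverloadControlState:
    """Hypothetical per-peer OCS held by a reacting node."""
    abatement_algorithm: Optional[object] = None  # opaque OC-Peer-Algo value
    reduction_percentage: Optional[int] = None


def process_answer(ocs_table: Dict[str, OverloadControlState],
                   peer: str, answer: dict) -> None:
    """Update ocs_table from an answer message (AVPs as dict keys)."""
    features = answer.get("OC-Supported-Features", {})
    olr = answer.get("OC-OLR")
    state = ocs_table.get(peer)
    if state is None:
        if olr is None:
            return  # no existing OCS and no report: nothing to do
        state = ocs_table[peer] = OverloadControlState()
    # The abatement algorithm follows OC-Peer-Algo in OC-Supported-Features.
    if "OC-Peer-Algo" in features:
        state.abatement_algorithm = features["OC-Peer-Algo"]
    if olr is None:
        return  # absence of OLR means "no change": the OCS is NOT deleted
    state.reduction_percentage = olr.get("OC-Reduction-Percentage")
```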


* Section 6.2.1, page 15:
> 6.2.1.  OC-Report-Type AVP
>
>    The following new report type is defined for the OC-Report-Type AVP.
>
>    PEER_REPORT 2   The overload treatment should apply to all requests
>       bound for the peer identified in the overload report.  If the peer
>       identified in the overload report is not a peer to the reacting
>       endpoint then the overload report should be stripped and not acted
>       upon.

Should RFC2119 keywords be used here? -- FWIW, there are a number of
instances in the I-D where it seems the text switches from using RFC2119
keywords to not using them.
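As an illustration of what the PEER_REPORT text requires of a reacting
node, a minimal sketch follows. Only the value PEER_REPORT = 2 comes from
the quoted text; the report layout (a dict with a hypothetical "peer" key
naming the peer identified in the overload report) and the function name
are assumptions.

```python
from typing import Optional, Set

PEER_REPORT = 2  # OC-Report-Type value from Section 6.2.1 of the draft


def accept_peer_report(report: dict, my_peers: Set[str]) -> Optional[dict]:
    """Return the report to act upon, or None if it must be stripped.

    Hypothetical model: 'report' carries 'OC-Report-Type' and 'peer'; how
    the peer is actually encoded in the overload report is not modelled.
    """
    if report.get("OC-Report-Type") != PEER_REPORT:
        return report  # other report types are out of scope for this sketch
    if report.get("peer") not in my_peers:
        return None    # not a peer of the reacting endpoint: strip, do not act
    return report
```

Whether the "should" in the quoted text means SHOULD or MUST determines
whether an implementation may ever take the non-stripping branch for a
non-peer.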


Nits/editorial comments: 
* Section 5.1.2, page 9:
>       Note: The transaction state is used when the DOIC Node is acting
>       as a peer-report reporting node and needs send OC-OLR reports of
>       type peer in answer messages.

s/needs/needs to/