
IETF Recommendations Regarding Active Queue Management
draft-ietf-aqm-recommendation-11

Yes

(Alia Atlas)
(Brian Haberman)
(Jari Arkko)
(Stephen Farrell)

No Objection

(Barry Leiba)
(Richard Barnes)

Note: This ballot was opened for revision 09 and is now closed.

Alia Atlas Former IESG member
Yes
Yes (for -09) Unknown

                            
Alissa Cooper Former IESG member
Yes
Yes (2015-02-16 for -09) Unknown
Thanks for the hard work on this document. I have a few comments below.

-- General:
I think it would be useful to define "network devices" up front, and in particular to clarify whether endpoint devices are subsumed in this category. Are the recommendations in this document meant to apply to queues in tablet/smartphone/laptop OSes as well as in routers, switches, etc.?

-- Sec 1.2:
"instead it provides recommendations on how to select appropriate
   algorithms and recommends that algorithms should be used that a
   recommended algorithm is able to automate any required tuning for
   common deployment scenarios."

Seems like there are some extra words here.

-- Sec 3:
"There is a growing set of UDP-based applications whose congestion
       avoidance algorithms are inadequate or nonexistent (i.e, a flow
       that does not throttle its sending rate when it experiences
       congestion).  Examples include some UDP streaming applications
       for packet voice and video, and some multicast bulk data
       transport.  If no action is taken, such unresponsive flows could
       lead to a new congestion collapse.  Some applications can even
       increase their traffic volume in response to congestion (e.g. by
       adding forward error correction when loss is experienced), with
       the possibility that they contribute to congestion collapse."

Would be nice to have a citation or two in this paragraph (though I can see why you might not want to).

"Lastly, some applications (e.g. current web browsers) open a
       large numbers of short TCP flows for a single session.  This can
       lead to each individual flow spending the majority of time in the
       exponential TCP slow start phase, rather than in TCP congestion
       avoidance.  The resulting traffic aggregate can therefore be much
       less responsive than a single standard TCP flow."

I note that HTTP/2 is on its way to publication and there are a large number of existing implementations, so the characterization of "current web browsers" seems a bit off. I would suggest something like "(e.g. web browsers primarily supporting HTTP 1.1)."
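The quoted slow-start point can be made concrete with a minimal sketch that counts the round trips a short transfer spends in slow start. It is idealized and rests on stated assumptions: an initial window of 10 segments (as in RFC 6928), a window that doubles every RTT, no loss, and a hypothetical slow-start threshold of 64 segments. The function name and parameters are illustrative only.

```python
def slow_start_profile(total_segments, initial_window=10, ssthresh=64):
    """Return (RTTs spent in slow start, whether the transfer finished there).

    Idealized model (assumption): cwnd doubles once per RTT while
    cwnd < ssthresh; no loss, no receive-window limit, no delayed ACKs.
    """
    sent, cwnd, rtts = 0, initial_window, 0
    while sent < total_segments and cwnd < ssthresh:
        sent += cwnd  # one full window of segments is sent per RTT
        cwnd *= 2     # slow start roughly doubles cwnd each RTT
        rtts += 1
    # If everything was sent before cwnd reached ssthresh, the flow
    # never entered congestion avoidance.
    return rtts, sent >= total_segments
```

Under these assumptions, a 30-segment response (roughly 44 KB at a 1460-byte MSS) completes in two round trips without ever leaving slow start. This is the behaviour the quoted paragraph attributes to browsers that open many short connections, whereas a long transfer reaches ssthresh after three round trips and then spends most of its life in congestion avoidance.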

-- Sec 7:
I think there's actually a really important privacy aspect that should be called out here, which is that by virtue of recommending that AQM algorithms not be dependent on specific transport or application behaviors, network devices need not gain insight into upper layer protocol information for the purpose of supporting AQM. That is, the document's explicit recommendation for algorithms to be able to operate in a transport- and application-agnostic fashion is a privacy-enhancing feature.

-- Sec 9:
I'm a little surprised that almost all of the research referenced in this document is from the 1990s, given recent attention that has been paid to this topic.
Brian Haberman Former IESG member
Yes
Yes (for -09) Unknown

                            
Jari Arkko Former IESG member
Yes
Yes (for -09) Unknown

                            
Martin Stiemerling Former IESG member
Yes
Yes (2015-02-11 for -09) Unknown
The authors have acknowledged Elwyn's GEN-ART review [1] and they will integrate the comments in an updated version after the IESG review. 

[1] https://mailarchive.ietf.org/arch/msg/ietf/gHzWBxmv64q6PkbQ0AFW5q6TvpU
Spencer Dawkins Former IESG member
Yes
Yes (2015-02-16 for -09) Unknown
I appreciate very much the work on this document. I'm a Yes, with some niggling.

In this text:

Abstract

   The note largely repeats the recommendations of RFC 2309, and
   replaces these after fifteen years of experience and new research.
   
I'm thinking that doesn't match the "replaces" language in section 1.4, which I think is about right. Perhaps something like

   The note replaces the recommendations of RFC 2309 based on 
   fifteen years of experience and new research.

In this text:

1.1.  Congestion Collapse

   The original fix for Internet meltdown was provided by Van Jacobsen.
   Beginning in 1986, Jacobsen developed the congestion avoidance
   mechanisms [Jacobson88] that are now required for implementations of
   the Transport Control Protocol (TCP) [RFC0768] [RFC1122].  

I'm wondering if RFC 7414 would be a helpful reference here, and elsewhere in the document. I'm bemused by the use of RFC 793 as the reference for TCP later in this document, since RFC 793 TCP behaves nothing like TCP as characterized here. 

I know I'm confused by the reference to RFC 768 - that's UDP, as cited correctly elsewhere in the document.

In this text:

   2.  Non-Responsive Flows

       The User Datagram Protocol (UDP) [RFC0768] provides a minimal,
       best-effort transport to applications and upper-layer protocols
       (both simply called "applications" in the remainder of this
       document) and does not itself provide mechanisms to prevent
       congestion collapse and establish a degree of fairness [RFC5405].

I'm not entirely comfortable with the idea that non-responsive flows use UDP transport (especially in our tunneled world). If you guys think this is OK, I'll hold my nose, but if you wanted to say anything about "other flows that are as non-responsive as UDP transport", I'd think that would be helpful. 

It certainly fits at least as well as the "large number of short-lived TCP flows that are much less responsive" paragraph that you end this section with, which probably fits better under the following list item, 

   3.  Transport Flows that are less responsive than TCP
   
In this text:

   It is essential that all Internet hosts respond to loss [RFC5681],
   [RFC5405][RFC4960][RFC4340].  Packet dropping by network devices that
   are under load has two effects: It protects the network, which is the
   primary reason that network devices drop packets.  The detection of
   loss also provides a signal to a reliable transport (e.g.  TCP, SCTP)
   that there is potential congestion using a pragmatic heuristic; "when
   the network discards a message in flight, it may imply the presence
   of faulty equipment or media in a path, and it may imply the presence
   of congestion.  To be conservative, a transport must assume it may be
   the latter."  Unreliable transports (e.g. using UDP) need to
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   similarly react to loss [RFC5405]
   ^^^^^^^^^^^^^^^^^^^^^^^
   
would it be more correct to say "Applications using unreliable transports (e.g. UDP)"?
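The sketch below illustrates what "similarly react to loss" can mean for an application running over UDP. It uses a simple additive-increase/multiplicative-decrease (AIMD) rate controller. AIMD is an illustrative assumption chosen for brevity; neither the draft nor RFC 5405 prescribes this particular controller, and the class, method, and parameter names are hypothetical.

```python
class AimdRateController:
    """Hypothetical AIMD sending-rate controller for a UDP application."""

    def __init__(self, rate_bps=1_000_000.0, increase_bps=50_000.0,
                 decrease_factor=0.5, min_rate_bps=10_000.0):
        self.rate_bps = rate_bps
        self.increase_bps = increase_bps        # additive step per loss-free interval
        self.decrease_factor = decrease_factor  # multiplicative cut applied on loss
        self.min_rate_bps = min_rate_bps        # floor so the flow never stalls entirely

    def on_interval(self, loss_detected):
        """Update the rate once per feedback interval (e.g. about one RTT)."""
        if loss_detected:
            # Treat loss as a congestion signal and back off multiplicatively.
            self.rate_bps = max(self.min_rate_bps,
                                self.rate_bps * self.decrease_factor)
        else:
            # No loss seen: probe for more capacity additively.
            self.rate_bps += self.increase_bps
        return self.rate_bps
```

Cutting the rate multiplicatively on loss and raising it additively otherwise gives the same broad shape as TCP congestion avoidance [RFC5681], which is why it is a common starting point. A real application would also need to detect loss, for example from sequence numbers echoed in receiver feedback. The sketch leaves that part out.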
Stephen Farrell Former IESG member
Yes
Yes (for -09) Unknown

                            
Ted Lemon Former IESG member
Yes
Yes (2015-02-19 for -09) Unknown
I'm quite surprised that the introduction doesn't mention the problems that high or unpredictable latency can cause with flows that are attempting to do congestion control at the ends (e.g., TCP).   If I were reading this without already knowing about that, I would assume that the goal of this document is to reduce latency for the benefit of applications that require low latency, like VoIP and gaming.   It would be nice if the introduction made mention of the issue of high latency as it affects TCP flows.

The document also talks about congestion collapse as a future risk to be prevented, but I think that this isn't telling the whole story: users of the Internet see localized congestion collapse quite frequently, and have done for quite some time.   It's essentially normal network behavior in hotels, cafes and on airplanes: anywhere where available bandwidth is substantially short of demand.   I don't think this is a problem with technical accuracy, but I think someone reading this document who isn't an expert on congestion control might not realize that this document is talking about that specific sort of failure mode as well as failures deep in the network.

I'm really happy to see this document being published.  The above comments are just suggestions based on my particular concerns about congestion, and do not reflect any degree of expertise, so if they seem exceptionally clueless you should just ignore them.
Adrian Farrel Former IESG member
No Objection
No Objection (2015-02-18 for -09) Unknown
Very readable. Thanks.
Barry Leiba Former IESG member
No Objection
No Objection (for -09) Unknown

                            
Benoît Claise Former IESG member
(was Discuss) No Objection
No Objection (2015-02-24 for -10) Unknown
Hi Gorry,

Thanks for engaging.
>
> Benoit,
>
> We think we have resolved the remaining issues and would like to propose
> text that we think could address your DISCUSS:
>
> We think our point was that tuning should not be required
> *in*the*normal*case*, not
> that they should *never* require tuning (I'm not sure we have created
> anything that
> is 100% auto-tuning). 
If it were "never", then the sentence would be

  3.  AQM algorithm deployment MUST NOT require tuning of initial or
configuration parameters.

> I'm OK with his phrasing in both cases, but would
> suggest the
> words "in common use cases" should be added:
>
>
>   3.  AQM algorithm deployment SHOULD NOT require tuning of initial or
> configuration
> parameters in common use cases.
>
I believe that "in common use cases" is redundant with (and somewhat confusing alongside) the SHOULD in your proposal.
SHOULD (RFC 2119):

    3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
       may exist valid reasons in particular circumstances to ignore a
       particular item, but the full implications must be understood and
       carefully weighed before choosing a different course.

However, I don't want to be picky on that point. I'll let your responsible AD decide.
My main point is covered. I'll clear that DISCUSS point.
>
> 4.3 AQM algorithm deployment SHOULD NOT require tuning in common use cases.
I don't see this change in the v10.
There is an important word in here: "deployment" as opposed to "deployed" in the current 4.3 section title (4.3. AQM algorithms deployed SHOULD NOT require operational tuning)
"Deployment" brings in the notion of initial deployment, as opposed to "deployed".
This is the reason why I propose:

NEW:
4.3 AQM algorithm deployment SHOULD NOT require operational tuning

I hope you will include this change, but it's not DISCUSS-worthy IMO.
Same remark as above regarding "AQM algorithm deployment".

Regards, Benoit
Joel Jaeggli Former IESG member
No Objection
No Objection (2015-02-19 for -09) Unknown
I see relatively little mention anywhere, this document included, that the time scale for queue management and the time scale for application responsiveness to congestion signals are wildly different: e.g., one is measured in microseconds, while the other is bounded by the RTT. Queue sizing and policing around aberrant events (for example, micro-as loops driven by a prefix withdrawal) aren't really dealt with at all.
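The gap in time scales can be shown with rough arithmetic that compares the time to serialize one packet on a fast link with a typical round-trip time. The figures used here (1500-byte packets, a 10 Gb/s link, a 50 ms RTT) are illustrative assumptions, not values from the draft or the ballot, and the function names are hypothetical.

```python
def serialization_time_us(packet_bytes, link_bps):
    """Time to clock one packet onto the link, in microseconds."""
    return packet_bytes * 8 / link_bps * 1e6


def packet_times_per_rtt(rtt_ms, packet_bytes, link_bps):
    """Number of back-to-back packet transmissions that fit in one RTT."""
    rtt_us = rtt_ms * 1e3
    return rtt_us / serialization_time_us(packet_bytes, link_bps)


# Illustrative assumptions: 1500-byte packets, 10 Gb/s link, 50 ms RTT.
per_packet = serialization_time_us(1500, 10e9)
ratio = packet_times_per_rtt(50, 1500, 10e9)
print(f"{per_packet:.1f} us per packet, ~{ratio:,.0f} packet times per RTT")
```

With these numbers a packet takes about 1.2 µs to serialize, so roughly 40,000 packet times elapse within one 50 ms RTT. An end-to-end reaction to a drop or mark therefore lags the queue dynamics by several orders of magnitude.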
Kathleen Moriarty Former IESG member
No Objection
No Objection (2015-02-18 for -09) Unknown
Thanks for your work on this draft, it looks good.  There are some tiny nits that the SecDir reviewer found that you might want to consider:
https://www.ietf.org/mail-archive/web/secdir/current/msg05357.html
Pete Resnick Former IESG member
No Objection
No Objection (2015-02-18 for -09) Unknown
This document does not have the pre-5378 boilerplate. Have all of the authors of 2309 actually signed the appropriate things, or does this document need the pre-5378 boilerplate?
Richard Barnes Former IESG member
No Objection
No Objection (for -09) Unknown