DDoS Open Threat Signaling (DOTS) Architecture
draft-ietf-dots-architecture-18

Note: This ballot was opened for revision 16 and is now closed.

Roman Danyliw Yes

Benjamin Kaduk Yes

Comment (2020-02-05 for -16)
Thanks for this well-written document!  It's a great high-level summary of
DOTS and I just have some fairly minor comments.

There might be a bit of mismatch between describing the signal channel
session as associated with "an ephemeral security association" in Section
3.1 and as "expected to be long-lived" in Section 3.2.4.1.

Section 2.2.3

   The DOTS gateway MUST perform full stack DOTS session termination and
   reorigination between its client and server side.  The details of how
   this is achieved are implementation specific.  The DOTS protocol does
   not include any special features related to DOTS gateways, and hence
   from a DOTS perspective, whenever a DOTS gateway is present, the DOTS
   session simply terminates/originates there.

Does the 'cdid' count as a "special feature"?

Section 2.3.1

   An example is a DOTS gateway at the network client's side, and
   another one at the server side.  The first gateway can be located at
   a CPE to aggregate requests from multiple DOTS clients enabled in an

nit: "CPE" does not appear as "well known" at
https://www.rfc-editor.org/materials/abbrev.expansion.txt and should be
expanded on first use.

Section 3.2.3

We could mention that the recursing gateway (e.g., Cn in Figure 12) must
still be authorized to request mitigation for the resources (also)
controlled by client Cc (though perhaps the closing discussion about there
typically being an SLA among client, recursed, and recursing domain suffices).

Section 3.2.4.1

   DOTS client to initialize a new DOTS session.  This challenge might
   in part be mitigated by use of resumption via a PSK in TLS 1.3
   [RFC8446] and DTLS 1.3 [I-D.ietf-tls-dtls13] (session resumption in
   TLS 1.2 [RFC5246] and DTLS 1.2 [RFC6347]), but keying material must
   be available to all DOTS servers sharing the anycast Service Address
   in that case.

"which has operational challenges of its own", perhaps.

   session may involve diverting traffic to a scrubbing center.  If the
   DOTS session flaps due to anycast changes as described above,
   mitigation may also flap as the DOTS servers sharing the anycast DOTS
   service address toggles mitigation on detecting DOTS session loss,
   depending on whether the client has configured mitigation on loss of
   signal.

I am not sure if we've mentioned configuring mitigation on loss of signal
yet.  A forward reference to Section 3.3.3 might help.

Section 3.2.5

   Network address translators (NATs) are expected to be a common
   feature of DOTS deployments.  The Middlebox Traversal Guidelines in
   [RFC8085] include general NAT considerations for DOTS deployments
   when the signal channel is established over UDP.

nit: the guidelines in 8085 are not specifically about DOTS deployments, so
probably we should say "that are applicable to" DOTS deployments.

Section 3.2.5.1

   request accurate mitigation scopes.  To that aim, the DOTS client can
   rely on mechanisms, such as [RFC8512] to retrieve static explicit
   mappings.  This document does not prescribe the method by which

nit: no comma.

Section 3.3.3

   The impact of mitigating due to loss of signal in either direction
   must be considered carefully before enabling it.  Signal loss is not
   caused by links congested with attack traffic alone, and as such
   mitigation requests triggered by signal channel degradation in either

nit: I think this could be parsed as "links are congested by attack traffic
and other traffic", whereas we intend to say that "attack traffic is not the
only possible cause of link congestion".  Perhaps "Attack traffic congesting
links is not the only reason why signal could be lost" is more clear.

Section 5

   DOTS is at risk from three primary attack vectors: agent
   impersonation, traffic injection and signal blocking.  These vectors

We seem to only partially discuss countermeasures for these attacks in the
rest of the section; one piece that seems noteworthy in its absence is the
requirement (already described in the body text) to authenticate the peer
and perform authorization checks on client requests.  Mitigating against
signal blocking is in general hard, but we could consider mentioning again
that the automated mitigation on loss of signal discussed in Section 3.3.3
is an option, albeit one with risks of its own.

Section 8.2

One could perhaps argue that RFC 4033 and RFC 6887 should be normative
("[RFC4033] must be used where [...]", "[RFC6887] may be used to [...]").

There's a stronger case that RFC 4786 should be normative, as we use a BCP
14 keyword allowing its deployment.

Deborah Brungard No Objection

Alissa Cooper No Objection

Comment (2020-02-05 for -16)
Section 3.2.5.4: "as long as the name is internally and externally resolvable by the same name." I get what this means but I think it could be stated in a less circular fashion.

(Suresh Krishnan) No Objection

Warren Kumari No Objection

Comment (2020-02-05 for -16)
Firstly, thank you for a well-written and easy-to-understand document -- I personally always find architecture-type documents helpful...

I do have a few non-blocking comments:
1: "For example, if the DOTS client domain leverages the DDoS mitigation service of its Internet Transit Provider (ITP), the ITP knows the prefixes assigned to the DOTS client domain. However, if the DDoS Mitigation is offered by a third party DDoS mitigation service provider, it does not know the resources owned by the DOTS client domain."
This is vastly oversimplifying the real world, to the point that it is harmful / propagates dangerous misconceptions. ISPs may or may not know the prefixes assigned to their customers - they really *should* know the prefixes that the client is choosing to announce to them, but the client may have other prefixes which they only announce through other transits (yes, in this case this ISP would only be providing mitigations for prefixes announced to it, but this isn't clear). In addition, if the DDoS mitigation is provided by a 3rd party, it could know what resources are owned by the client -- in fact, it kind of has to if it is going to agree to mitigate for those prefixes. Note that I *almost* made this a DISCUSS point - I really really think that this bit should be either carefully revised, or, better yet, just struck...

2: "Signal loss is not caused by links congested with attack traffic alone, and as such mitigation requests triggered by signal channel degradation in either direction may incur unnecessary costs, in network performance and operational expense alike."
The "operational expense" is very vague - enabling DDoS mitigations is almost definitely going to cause a user-visible impact, especially in the case where the mitigator announces a BGP route to attract traffic. Is this covered by 'operational expense'? This section also leaves out the fact that there is likely a financial impact.

3: "The signal and data channels are loosely coupled, and may not terminate on the same DOTS server." - s/may not/might not/. Every time I'm sitting on a plane and the safety briefing says that oxygen masks will fall from the ceiling and that "the bag may not inflate" I have visions of IETFers (and similar pedants!) sitting there and squeezing the bag to ensure that it doesn't... "the bag *might* not inflate" is what is intended, and is also what you want :-P

(Mirja Kühlewind) No Objection

Comment (2020-02-03 for -16)
Maybe double-check use of normative language. There seem to be a few occasions where normative language could be used but isn't.

Barry Leiba No Objection

Comment (2020-02-04 for -16)
A well done document; thanks.  I have just a few minor comments:

— Section 1.1.1 —
You don’t *quite* have the BCP 14 boilerplate verbatim; please fix that.

— Section 1.3 —

   o  The signal and data channels are loosely coupled, and may not
      terminate on the same DOTS server.

I suggest “might not”, lest someone misread it to mean that they are not permitted to (the strict English meaning of “may not”).  Look for “may not” elsewhere also: I saw it in Section 2 as well, and one or two other places.

— Section 2 —

   Thus, DOTS neither specifies how an attack target decides it is under
   DDoS attack, nor does DOTS specify how a mitigator may actually
   mitigate such an attack.

The structure of this “neither...nor” doesn’t work.

NEW
   Thus, DOTS specifies neither how an attack target decides it is under
   DDoS attack, nor how a mitigator may actually mitigate such an attack.
END

Alvaro Retana No Objection

(Adam Roach) No Objection

Comment (2020-02-04 for -16)
Thanks for the work that went into creating this architecture document.
I found it a useful introduction to DOTS.

---------------------------------------------------------------------------

§3.2.5:

Without needing to go into too much detail, it seems that this section would
benefit from citations to RFC 6886, RFC 7659, and ISO/IEC 29341-1-2:2017 as
alternate means to learn about NAT mappings.

Éric Vyncke No Objection

Comment (2020-02-03 for -16)
Dear authors,

Thank you for the work put into this document. As a side note, I really liked the section about the manual/over-the-phone part of it.

Until now, I have read only this document (dots-architecture) from the dots WG, so please excuse my ignorance of the details. But I have a couple of non-blocking questions where your reply will be welcome and appreciated:

Q1) Is the monetary cost part of the DOTS signaling? (I.e., the mitigator telling the target that it will cost so many EUR per hour)

Q2) For DOTS used in an under-attack network, did you consider recommending dual-stack signaling to cope with the rare case where IPv4 is disrupted while IPv6 still works? (Of course, if the DoS is plain flooding this probably won't help a lot; and the converse case exists.)

Q3) While I appreciate the value of Anycast DOTS server, hence UDP is mostly required for signaling transport, I wonder whether the choice of UDP (often used, AFAIK, for volumetric attacks as it is easier to spoof) is a good choice compared to TCP or DSCP or ...

Q4) When having multiple DOTS servers, I assume that the case of a dual-stack DOTS server is also covered. Therefore, a word on Happy Eyeballs (RFC 8305) would probably be useful **IF** applicable.

Regards

-éric
