Last Call Review of draft-ietf-wish-whip-09
review-ietf-wish-whip-09-genart-lc-worley-2023-08-08-00
review-ietf-wish-whip-09-genart-lc-worley-2023-08-08-00
I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last call comments. For more information, please see the FAQ at <https://trac.ietf.org/trac/gen/wiki/GenArtfaq>. Document: draft-ietf-wish-whip-09 Reviewer: Dale R. Worley Review Date: 2023-08-08 IETF LC End Date: 2023-08-08 IESG Telechat date: [not known] Summary: This draft is on the right track but has open issues, described in the review. The exposition could be clarified in several places. A few of the clarifications could be considered technical issues. I list the issues in textual order, with an initial summary of the most important issues. * Summary of the major issue: A very important aspect of this document is that it defines the usage of "ICE via SDP via HTTP", equivalently, "supporting ICE offer/answer over HTTP" (as opposed to the original "ICE via SDP via SIP"). That definition likely will have broader usage than just WHIP. But the document suffers from the "expert's disease", in that the authors clearly have deep knowledge of ICE and consequently only document the details of the ICE processing without providing any of the framework. This leaves naive readers to reconstruct the big picture. I was going to add "As far as I can tell, all of the needed details are listed somewhere in section 4", but as you can see below, once I wrote an outline of what is needed, there are a few points that don't seem to be specified, at least not to the point that I recognized it.) I suggest that section 4 be organized by: - stating that it is defining "supporting ICE offer/answer over HTTP" (so that it can be clearly referenced by later specifications) - specifying the abstract operations involved (with the mapping to specific operations listed either in this list or a later list): - - starting the ICE negotiation (via POST carrying application/sdp) - - updating the ICE when trickling (via PATCH carrying application/trickle-ice-sdpfrag) - - restarting ICE (via PATCH carrying (via PATCH carrying application/trickle-ice-sdpfrag, but it's not clear how this is distinguished, RFC 8840 does not specify it; perhaps because it carries ice-ufrag/ice-pwd attributes (see section 4.1)?) - - termination (via DELETE from the client; not clear how from the server) - specifying how WHIP constrains the ICE abstraction (Media server MUST not do trickle ICE updates, not clear if Media server MAY do ICE restarts, Media server MAY not support trickle ICE or ICE restart, etc.) - discussion of how out-of-order requests may arise (which isn't at all clear to me, as requests seem to be generated only by the server, and carried by TCP, so they seem to be inherently serialized) - discussion of how ETags MAY/MUST be handled, as that is comprehensible only when one looks at the serialization needs of all of the operations together - specifying the particular features of each of the operations, including any particulars of request and response fields and what response codes are to be used for what error situations * Summary of minor issues that might have technical content: At various points in section 4 the text refers to a device sending a request or generating a response but in many of them, the text doesn't state whether the device is on the client or server end, or might be both. I assume that the nature of the operation implies what devices might perform it based on text elsewhere, but it's hard to fully understand on the first reading pass. If possible, each sentence should make this clear. -- Then again, if section 4 is reorganized to describe generic ICE/SDP/HTTP usage, attention should be paid that the generic usage may define how e.g. the server does an operation, even if WHIP servers may not do that operation. Section 4 lists a collection of 4xx and 5xx response codes to be used in certain situations. Are these set due to the specific semantics of the codes or just to ensure that the recipient can tell what the cause of the error was? (HTTP suffers from having a large number of error responses but all of them have vague semantics. Unfortunately, there doesn't seem to be an HTTP response header that can carry codes defined for ICE usage specifically.) Section 4 requires the WHIP server to reject certain specific request methods, but no others. Verify that you intend this specific limitation. Section 4.1 references sections 6.6.2, 2.3, and 3.1 of RFC 9110. All of these references appear to be incorrect, in that those sections do not discuss the topics at hand. Section 4.3 references section 6.4.7 of RFC 9110, which does not exist. Some of these appear to be referencing sections in RFC 7231, which is obsoleted by RFC 9110. Section 4.1 talks about out-of-order PATCH requests but is unclear about what may cause them, as the immediate implication is that somehow the client is sending overlapping PATCH requests. Probably the authors fully understand the allowed sequences of operations that cause this but the text needs to clarify what is being described. Section 4.1 has inconsistent statements about a WHIP resource's response code if the client requests an ICE restart but the resource does not support ICE restart. Section 4.2 extends RFC 8842, and this document should be marked as doing so. In Section 4.4, I notice that there's no explicit differentiation between the STUN and the TURN server information in returned Link header fields. Please verify that either (1) the presence of username and credential-type differentiates the two cases, or (2) the Media client/server do not need to differentiate the two cases. Section 4.7 requires POST responses to include Link headers for any WHIP extensions supported by the server. Perhaps OPTIONS responses should be required to do so also. The designators of WHIP extensions are described as URIs. In fact, they are URNs, and it would be more precise to refer to them as URNs in the text. * All issues (most of them exposition/editorial), in text order: 1. Introduction It also describes how to negotiate media flows using the Offer/Answer Model with the Session Description Protocol (SDP) [RFC3264] as well as the formats for data sent over the wire (e.g., media types, codec parameters, and encryption). I think "as well as" should be "including". RTSP [RFC7826], which is based on RTP, is not compatible with the SDP offer/answer model [RFC3264]. This statement might be clarified, as the naive reader (me) considers offer/answer to also be based on RTP. I suspect that the critical meaning is "RTSP, unlike SDP offer/answer, has no provisions for negotiating the characteristics of the media session." This document proposes a simple protocol for supporting WebRTC as media ingestion method which: I would amend to "... a simple protocol, WHIP, for ..." to have a clear assertion in the introduction of what WHIP is. I would also add "HTTP", as the preceding text describes what WHIP *isn't* based on: This document proposes a simple protocol, WHIP, based on HTTP, for supporting WebRTC as media ingestion method which: -- * Allows for ingest both in traditional media platforms and in WebRTC end-to-end platforms with the lowest possible latency. Verify that "ingest" is used in the field as shorthand for "ingestion" (rather than being a typo). The text seems to use both words with the same meaning; it should probably use only one. * Is usable both in web browsers and in native encoders. I would say "standalone encoders", but perhaps this is the industry terminology. 2. Terminology For clarity, I would move Figure 1 to after paragraph 1 of section 2 and amend it to something like this. Or perhaps better, relocate the current section 3 before section 2, so it can provide the context for the glossary. +--------------+ +-------------+ +---------------+ +--------------+ | Media client | | WHIP client | | WHIP server | | Media server | +------+-------+ +--+----------+ +---------+---+-+ +------+-------+ | | | | | | | | +-+-------------+ | | | | | WHIP resource | | | | | +--------|------+ | | | | | | | |HTTP POST (SDP Offer) | | | | +------------------------>+ | | | |201 Created (SDP answer) | | | | +<------------------------+ | | | | ICE REQUEST | | +---------------------------------------------------------------->+ | | ICE RESPONSE | | |<----------------------------------------------------------------+ | | DTLS SETUP | | |<===============================================================>| | | RTP/RTCP FLOW | | +<--------------------------------------------------------------->+ | HTTP DELETE | +----------------------------------->| | 200 OK | <------------------------------------+ The reason to move it before the definitions is that definitions would be a lot clearer with the diagram already in mind. The "WHIP client" is separated into two components, one dealing with HTTP and SDP and one dealing with RTP, with both being involved in ICE, in the same way that the server is divided into two components with the same division of responsibilities. I suspect that historically, the server-side is conceptualized that the "Media Server" exists a priori with the "WHIP endpoint" being added to it as an extra unit, but the client-side is conceptualized as being a unitary system. That structure makes the specification harder to understand, though, and I think showing the client and server with more parallel structures would clarify the specification. The "WHIP resource" is shown as being a dependent of the "WHIP server". The "WHIP endpoint" is renamed "WHIP server". At least in the SIP world, "endpoint" is used to refer to *both* ends of one connection, and so the term "WHIP endpoint" would be expected to include the WHIP client as well. The capitalization of "Media server" is made consistent with the other terms. (See also the entry in the glossary.) Although conceptually the left two items (Media client and WHIP client) and the right three items (WHIP server, WHIP resource, and Media server) collectively form the two end-systems of the client/server relationship, I have not tried to invent names for the two collections because there doesn't seem to be a need to reference them as wholes. These changes do require adjusting the terminology in the rest of the document. 3. Overview See comments on section 2. 4. Protocol Operation The SDP offer SHOULD use the "sendonly" attribute and the SDP answer MUST use the "recvonly" attribute in any case. You may want to extend this to "SHOULD use the "sendonly" attribute but MAY use the "sendrecv" attribute", since the other possible values are useless. "in any case" seems redundant given the presence of "MUST". POST /whip/endpoint HTTP/1.1 I recommend preceding each of the HTTP messages with a description of what the endpoints are. E.g. "POST request from WHIP client to WHIP server" and "201 Created response from WHIP server to WHIP client". Figure 2: HTTP POST doing SDP O/A example Perhaps more informative as "Example WHIP POST executing SDP offer/answer". Once a session is setup, ICE consent freshness [RFC7675] SHALL be used to detect non graceful disconnection and DTLS teardown for session termination by either side. I think the meaning is "Once a session is set up, ICE consent freshness [RFC7675] MUST be maintained. Both endpoints MUST monitor freshness and MUST tear down the connection if freshness is lost." (Note I don't know how ICE consent freshness is done, so I am assuming that no further details need be specified to ensure interoperability.) Probably the 2nd following paragraph ("A Media Server terminating...") can be combined with this paragraph. Also, expand it to place the same requirement on the Media client. The WHIP endpoints MUST return an "405 Method Not Allowed" response for any HTTP GET, HEAD or PUT requests on the endpoint URL in order to reserve its usage for future versions of this protocol specification. This and the 2nd following paragraph only forbid a few methods, and allow an implementation to implement any others as extensions. Please verify that these are exactly what you need. Also, the wording should be "reserve their usage" because the referent is "any ... requests". The WHIP endpoints MUST support OPTIONS requests for Cross-Origin Resource Sharing (CORS) as defined in [FETCH] and it SHOULD include an "Accept-Post" header with a mime type value of "application/sdp" on the "200 OK" response to any OPTIONS request received as per [W3C.REC-ldp-20150226]. I think this would be less awkward as The WHIP endpoints MUST support OPTIONS requests for Cross-Origin Resource Sharing (CORS) as defined in [FETCH]. The "200 OK" response to any OPTIONS request SHOULD include an "Accept-Post" header with a mime type value of "application/sdp" as per [W3C.REC-ldp-20150226]. 4.1. ICE and NAT support Text might be added to the beginning of this section similar to the text at the beginning of section 4.2, as WHIP puts specific restrictions on ICE usage for simplicity's sake. It seems like "NAT" should be replaced by something else, as "NAT" is not mentioned elsewhere in the document. "STUN/TURN" seems to be natural, but there is a section for that (4.4). Perhaps this section should be titled "ICE usage". Also, several subsections of section 4 seem to be about usage and restrictions of various protocols; perhaps they should have parallel names e.g. "ICE usage", "WebRTC usage", "STUN/TURN usage". In order to simplify the protocol, there is no support for exchanging gathered trickle candidates from Media Server ICE candidates once the SDP answer is sent. I think this would be clearer as follows, since that moves the fact we are talking about the Media server earlier in the sentence: In order to simplify the protocol, there is no support for the Media server sending further trickle candidates once the SDP answer is sent. -- The WHIP client MAY perform trickle ICE or ICE restarts as per [RFC8838] by sending an HTTP PATCH request ... PATCH is an unusual method, defined in RFC 5789, and it seems like there should be a reference for it here. Also, you probably want to insert here a statement whether the WHIP resource MAY *initiate* ICE restarts; as far as I can tell, the text does not address this. Also, you want to directly state that the semantics (meaning) of an ICE update/restart request and thus how it is to be processed, is determined by the body, as specified in RFC 8840. The current text leaves that silently implied. If the WHIP resource supports either Trickle ICE or ICE restarts, but not both, it MUST return a "405 Not Implemented" response for the HTTP PATCH requests that are not supported. If the WHIP resource does not support the PATCH method for any purpose, it MUST return a "501 Not Implemented" response, as described in [RFC9110] Section 6.6.2. The canonical description for "405" is "Method not allowed". But I would reorganize and combine these paragraphs along these lines: If the WHIP resource supports either Trickle ICE or ICE restarts, but not both, it MUST return a "405 Method not allowed" response for HTTP PATCH requests for the feature that is not supported. If neither feature is supported, it MUST return a "501 Not Implemented" response for such HTTP PATCH requests. This allows PATCH requests for other purposes, if any of those are ever defined. BTW, the reference "[RFC9110] Section 6.6.2" ("Trailer") seems to be unrelated to the text; please verify it. As the HTTP PATCH request sent by a WHIP client may be received out- of-order by the WHIP resource, ... To clarify this, I would expand it to: The WHIP client MAY send overlapping HTTP PATCH requests to one WHIP resource. As the HTTP PATCH request sent by a WHIP client may be received out-of-order by the WHIP resource, ... However it is unclear to me why the client would want to do this. Perhaps the concept is that the server may also be sending PATCH requests for trickle ICE etc.? In that case, you would want to phrase it more like: As the WHIP client and the WHIP resource may be both be sending HTTP PATCH requests without coordination, an HTTP PATCH request sent by a WHIP client may be received out-of-order by the WHIP resource, ... In any case, you should ensure you understand the possible complications and explain them here to the reader. Also, the final sentence of the paragraph is: Note that including the ETag in the original "201 Created" response is only REQUIRED if the WHIP resource supports ICE restarts and OPTIONAL otherwise. I *think* this means that the entirety of the preceding paragraph is conditional "If the WHIP resource supports ICE restarts, the following complications are possible ... and so the WHIP resource must return ETag header fields ...". Whatever the causation/conditions are, they should be put at the start of the paragraph. A WHIP client sending a PATCH request for performing trickle ICE MUST include an "If-Match" header field with the latest known entity-tag as per [RFC9110] Section 3.1. When the PATCH request is received by the WHIP resource, it MUST compare the indicated entity-tag value with the current entity-tag of the resource as per [RFC9110] Section 3.1 and return a "412 Precondition Failed" response if they do not match. Beware that this paragraph appears to be a continuation of the preceding paragraph, in that the sentence "Note ..." appears to be a condition on it also. If that is true, when you relocate the condition to before the first paragraph, the condition should itself be paragraph-level, so it is clear that its scope is not ended by the end of the first paragraph. A WHIP resource receiving a PATCH request with new ICE candidates, but which does not perform an ICE restart, MUST return a "204 No Content" response without body. According to the previous text, if a WHIP resource receives a PATCH requesting an ICE restart but it does not implement ICE restart, it MUST return either 405 or 501. This needs to be clarified; perhaps there is a difference between "does not support" and "does not perform"? If the Media Server does not support a candidate transport or is not able to resolve the connection address, it MUST accept the HTTP request with the "204 No Content" response and silently discard the candidate. Note that the Media server does not generate HTTP responses in any case. I assume the meaning is "the WHIP resource must accept ...". I think the intended meaning is If a WHIP resource receives a PATCH request containing a new ICE candidate with a transport that the Media Server does not support or is unable to resolve the address, the resource must MUST silently discard the candidate and process the remainder of the request. Typically, it will accept the HTTP request with the "204 No Content" response. The last sentence is written as it is because the resource may have some other reason for rejecting the PATCH. A WHIP client sending a PATCH request for performing ICE restart MUST contain an "If-Match" header field with a field-value "*" as per [RFC9110] Section 3.1. Naively, this seems questionable, as presumably all ICE operations need to be serialized. Or is it that ICE restart causes all preceding ICE updates to become irrelevant? Again, some explanation of the allowed sequencing of ICE operations and the necessary serialization/dependencies would be good introduction. Also, the "200 OK" response for a successful ICE restart MUST contain the new entity-tag corresponding to the new ICE session in an ETag response header field and MAY contain a new set of ICE candidates for the Media Server. This reads oddly because it combines "new ICE session" with the possibility that the triggering PATCH request may not include new ICE candidates. What is the implied processing if there are no candidates? There are two possibilities, one that the new ICE session starts with no candidates (in that direction), the other is that the previously specified set of candidates is retained. The latter possibility implies that the new session depends on the previous session and introduces serialization requirements. If the ICE request cannot be satisfied by the WHIP resource, the resource MUST return an appropriate HTTP error code and MUST NOT terminate the session immediately. You probably mean "If the ICE update or restart request ...". Also, you probably mean "... and MUST continue the previous ICE session." In either case, the session MUST be terminated if the ICE consent expires as a consequence of the failed ICE restart as per [RFC7675] Section 5.1. Probably intensify to "In any case, ...". Because the WHIP client needs to know the entity-tag associated with the ICE session in order to send new ICE candidates, it MUST buffer any gathered candidates before it receives the HTTP response to the initial POST request or the PATCH request with the new entity-tag value. I think this could be clarified by putting more of the sentence in time order: Because the WHIP client needs to know the entity-tag associated with the ICE session in order to send a PATCH containing new ICE candidates, it MUST wait until it receives the HTTP response to the initial POST request or the previous PATCH request before it can send any candidates that were gathered after the previous request was sent. -- ... the STUN requests will contain invalid ICE information and will be rejected by the server. When this situation is detected by the WHIP Client, it MUST ... Given that this is a MUST, we need a more rigid specification of the specific responses that compel this behavior in the client. Also, "the server" is used, but apparently in the sense "the STUN server", not "the Media server" as elsewhere in the text. Also, it seems to be implied that the Media server will not do ICE restart if it receives these errors from the STUN server. Is that true? 4.2. WebRTC constraints In the specific case of media ingestion into a streaming service, Given that the abstract says "ingestion of content into streaming services and/or CDNs", this appears to confine the paragraph to discussing a subset of WHIP implementations. This clause should probably be revised to clarify that it covers all WHIP usages. Both the WHIP client and the WHIP endpoint SHALL use SDP bundle [RFC9143]. ... Please verify that all proper specifications are both "MUST support" and "MUST use". E.g., there is "... will use RTP/RTCP multiplexing ..." which should say "SHALL"/"MUST". ... bundled "m=" sections as per [RFC8858] i. The ending "i" seems like a typo, but please verify it isn't the truncation of intended text. ... in a single MediaStream as defined in [[!RFC8830]] ... It appears that RFC 8830 is an intended reference but it is not listed in section 8. When a WHIP client sends an SDP offer, it SHOULD insert an SDP "setup" attribute with an "actpass" attribute value, as defined in [RFC8842]. However, if the WHIP client only implements the DTLS client role, it MAY use an SDP "setup" attribute with an "active" attribute value. If the WHIP endpoint does not support an SDP offer with an SDP "setup" attribute with an "active" attribute value, it SHOULD reject the request with a "422 Unprocessable Entity" response. NOTE: [RFC8842] defines that the offerer must insert an SDP "setup" attribute with an "actpass" attribute value. However, the WHIP client will always communicate with a Media Server that is expected to support the DTLS server role, in which case the client might choose to only implement support for the DTLS client role. Note this means that this document updates RFC 8842, and should be marked as such. This part seems to be incomplete or inconsistent. Clearly, a WHIP server MUST support the DTLS server role, and so it MUST be able to accept "a=setup:active". This means that the described 422 error response should never occur. 4.3. Load balancing and redirections WHIP endpoints and Media Servers might not be colocated on the same server, so it is possible to load balance incoming requests to different Media Servers. In stead of the passive "it is possible ...", it would be clearer to say "the WHIP server may load balance incoming requests to multiple Media servers." WHIP clients SHALL support HTTP redirection via the "307 Temporary Redirect" response as described in [RFC9110] Section 6.4.7. RFC 9110 does not contain a section 6.4.7. 4.4. STUN/TURN server configuration A reference to each STUN/TURN server will be returned using the "Link" header field [RFC8288] with a "rel" attribute value of "ice- server". I notice that there's no explicit differentiation between the STUN and the TURN server information. Please verify that either (1) the presence of username and credential-type differentiates the two cases, or (2) the Media client/server do not need to differentiate the two cases. Figure 5: Example ICE server configuration It seems the caption should be "Example STUN/TURN server configuration". The generation of the TURN server credentials may require performing a request to an external provider, which can both add latency to the OPTIONS request processing and increase the processing required to handle that request. In order to prevent this, the WHIP Endpoint SHOULD NOT return the STUN/TURN server configuration if the OPTIONS request is a preflight request for CORS, that is, if The OPTIONS request does not contain an Access-Control-Request-Method with "POST" value and the the Access-Control-Request-Headers HTTP header does not contain the "Link" value. I think the normative structure can be made clearer as: In order to avoid adding latency and processing to an OPTIONS request, the WHIP server SHOULD NOT return return the STUN/TURN server configuration in OPTIONS responses unless either (1) the configuration can be determined without extra processing or latency (e.g. by performing a request to an external provider), or (2) if the OPTIONS request is an XXX, and contains an Access-Control-Request-Method with "POST" value and the the Access-Control-Request-Headers HTTP header contains the "Link" value. "is an XXX" optional and is to clarify the situation that exception (2) supports. From the current text, I think it should be "is not a preflight request for CORS". If "preflight request for CORS" is retained, please add a reference to "[FETCH]" here, as the only other mention of CORS in the document which has the reference to "[FETCH]" is a long way away. It might be also possible to configure the STUN/TURN server URIs with long term credentials provided by either the broadcasting service or an external TURN provider on the WHIP client, overriding the values provided by the WHIP endpoint. This should start with that it is the WHIP client that is being configured: The WHIP client MAY be configured with the STUN/TURN server URIs and long term credentials provided by either the broadcasting service or an external TURN provider, overriding the values provided by the WHIP endpoint. 4.7. Protocol extensions The WHIP endpoint MUST return one "Link" header field for each extension, with the extension "rel" type attribute and the URI for the HTTP resource that will be available for receiving requests related to that extension. This seems awkward to me. Perhaps The WHIP endpoint MUST return one "Link" header field for each extension that it supports, with the "rel" attribute having the extension URN as its value and the URI being suitable for that extension. The part "the URI being suitable for that extension" recognizes that we can't predict what the semantics of the URI will be for a future extension, but the extension will define the semantics. Protocol extensions supported by the WHIP server MUST be advertised to the WHIP client in the "201 Created" response to the initial HTTP POST request sent to the WHIP endpoint. It seems like a good idea to require protocol extension support declaration in OPTIONS responses as well. 5. Security Considerations The writers should do a spelling-check pass over section 5, as there are enough misspellings to inconvenience the Editor. On top of that, the WHIP protocol exposes a thin new attack surface expecific [sic] of the REST API methods used within it: It seems like the following three items could be condensed, as most of the text enumerates problems that are well-known or clear from the context. Perhaps something like: * HTTP flooding and resource exhaustion: Both the WHIP server and the WHIP resources SHOULD implement a rate limit and avalanche control mechanism for incoming requests. * Security of WHIP resource URLs: Servers for WHIP resource URLs SHOULD either implement authentication and authorization similarly to WHIP servers (see section 4.5) or use URLs that are capabilities, in that they are unguessable and the possession of the URL shows the client has the right to access the resource. 6.1. Link Relation Type: ice-server The link relation type below has been registered by IANA per Section 4.2 of [RFC8288]. I do not see "ice-server" in https://www.iana.org/assignments/link-relations/link-relations.xhtml, so it appears the wording should be "IANA is asked to register ...". 6.3.1. Specification Template * The designated contact shall be responsible for reviewing and enforcing uniqueness. Should this be "designated expert"? Compare with sections 6.4.1 and 6.4.2. 6.4.1. Registration Procedure The RFC SHOULD include any attributes defined. If this means what I think it does, it should be expanded to something like The RFC MUST include the syntax and semantics of any extension-specific attributes that may be provided in a Link header field advertising the extension. [END]