Ballot for draft-ietf-httpbis-semantics-19

Comment (2021-06-10 for -16) Sent

** Section 2.2. Per “Additional (social) requirements are placed on implementations …”, what’s a “social” requirement?

** Section 4.2.2.

Resources made available via the "https" scheme have no shared
identity with the "http" scheme. They are distinct origins with
separate namespaces. However, an extension to HTTP that is defined
to apply to all origins with the same host, such as the Cookie
protocol [RFC6265], can allow information set by one service to
impact communication with other services within a matching group of
host domains.

It would be worth reiterating that it might be risky for extensions to treat http + https origins on the same host uniformly.
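
To make the distinction concrete, here is a minimal sketch (not taken from the draft) of an origin comparison in which the scheme takes part. The helper names `origin`, `same_origin`, and `same_host` are hypothetical, and the default ports are assumed to be 80 for "http" and 443 for "https".

```python
from urllib.parse import urlsplit

# Assumed default ports for the two schemes discussed in Section 4.2.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(uri):
    """Return a (scheme, host, port) triple for an http(s) URI."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def same_origin(a, b):
    # Scheme, host, and port must all match.
    return origin(a) == origin(b)

def same_host(a, b):
    # Host-only grouping, the kind of matching a host-scoped extension might use.
    return origin(a)[1] == origin(b)[1]

print(same_origin("http://example.com/", "https://example.com/"))  # False
print(same_host("http://example.com/", "https://example.com/"))    # True
```

An extension that keys its state on something like `same_host` alone would treat the two URIs above as one group, which is the risk being pointed out.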

** Section 5.1. Convention question. Per “Field names … ought to be registered within the …” HTTP field name registry, I have a question about the strength of the recommendation based on the use of the verb “ought” – is that RECOMMENDED? SHOULD? Section 8.3.1, 8.3.2, 8.4.1, 8.8.3, etc use a similar construct.

** Section 7.2. Does this text allow for the possibility of both a Host and :authority being included?

** Section 9.2.1. In addition to the example of access log files filling the disk, there could be significant CPU load if the target is a script.

** Section 17.1. The text provides helpful caution by stressing that “… the user's local name resolution service [is used] to determine where it can find authoritative responses. This means that any attack on a user's network host table, cached names, or name resolution libraries becomes an avenue for attack on establishing authority for "http" URIs.” The subsequent text highlights DNSSEC as improving authenticity. It seems that the integrity provided by DoT or DoH would also be relevant.

** Section 17.2

Users need to be aware that intermediaries are no more trustworthy than the people who run them; HTTP itself cannot solve this problem.

No disagreement with the sentiment, but I would recommend not framing it in terms of the trustworthiness of the _people_ (i.e., an intermediary's poor security or privacy practices are not necessarily due to a lack of trustworthiness of the engineers operating the service; perhaps these services are also run in jurisdictions where confidence in them should be reduced).

NEW
Users need to be aware that intermediaries are no more trustworthy than the entities that operate them and the policies governing them; HTTP itself cannot solve this problem.

** Section 17.5. More generically describe the attack

OLD
Failure to limit such processing can result in buffer overflows, arithmetic overflows, or increased vulnerability to
denial-of-service attacks.

NEW
Failure to limit such processing can result in arbitrary code execution due to buffer overflows or arithmetic overflows, or in increased vulnerability to denial-of-service attacks.

** Editorial

-- Section 3.5. Should s/advance configuration choices/advanced configuration choices/?

-- Section 4.2.2.
A client MUST ensure that its HTTP requests for an "https" resource
are secured, prior to being communicated, and that it only accepts
secured responses to those requests. Note that the definition of
what cryptographic mechanisms are acceptable to client and server are
usually negotiated and can change over time.

This text goes from referring to “secured” in the first sentence to “[acceptable] cryptographic mechanisms” in the second sentence. To link them, perhaps s/are secured/are cryptographically secured/

-- Section 6.5. Typo. s/section_ are are/section_ are/

-- Section 11.1. The text (at least when read as a .txt) isn’t showing RFC7617 or RFC7616 as references.

-- Section 14.1.1. Typo. s/gramar/grammar/

Comment (2021-06-16 for -16) Not sent

Like Martin I learnt HTTP through osmosis - although "learnt" might be overselling it. Perhaps "randomly searched stack-overflow for some related words" would be a better description.
When I initially saw the length of this document I was somewhere between apprehensive and distraught, but I figured that I'd get some revenge by providing lots and lots of nitty comments that you'd have to address... but, apparently, I was thwarted in that. There is very little for me to complain about, and others have already done so, so.... um...

<Warren shakes fist for having been forced to learn something... >

Comment (2021-06-16 for -16) Sent

Big thanks to the editors and contributors of this document.

I found this document to be very well written, with the right level of description, which surely makes the developer's life a bit easier, especially by having all the important considerations and recommendations in one place.

I have the following observations:

* Server push is mentioned in Section 1.2. I was expecting some description in this document of how server push is realized, especially using the methods defined in this document.

* Section 4.2.2: it says-

"The origin server for an "https" URI is identified by the authority
component, which includes a host identifier and optional port number
([RFC3986], Section 3.2.2). If the port subcomponent is empty or not
given, TCP port 443 (the reserved port for HTTP over TLS) is the
default. "

How does this default work with HTTP/3, which uses UDP port 443?

* It felt like the security considerations section is missing considerations for the TRACE method. Section 9.3.8 says "A client MUST NOT generate fields in a TRACE request containing sensitive data"; I am just wondering whether that is a good enough warning.

* I support Roman's comment about the strength of the recommendation based on the use of the verb “ought”. This might be a bit more confusing to readers whose native language is not English. I would suggest using "recommend", "should", or "must" throughout the document instead of "ought to".

* Lars provided very good input on editorial fixes/nits; I will skip mine and hope his are addressed by the editors.

Comment (2021-06-11 for -16) Sent

Thank you for the work put into this document.

Martin Duke wrote a nice sentence that fits my experience with this document: "As someone who has learned HTTP via osmosis, it was very helpful to finally read it all collected in one place".

Please find below some non-blocking COMMENT points (but replies would be appreciated), and some nits.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

Is there any reason why the 'Forwarded' header (RFC 7239) is not listed ?

-- Section 3.3 --
"As a result, a server MUST NOT assume that two requests ..." unsure about the use of "as a result" as this paragraph is not really a conclusion of the previous §. Suggest to remove "As a result".

-- Section 3.9 --
Strongly suggest to update the example by avoiding OpenSSL/0.9.71 as its TLS version is probably insecure (or even historic) ;-)

-- Section 4.3.3 --
2nd and 3rd § differ as it seems that only H/2 and H/3 accept a cert with several entries. AFAIK, H/1 also accepts several subject names in one cert.

3rd § I am afraid that the meaning of the text about DNS query has completely escaped me :-(

-- Section 4.3.4 --
Should there be a reference to "CN-ID" ?

-- Section 4.3.5 --
I quickly read the relevant sections of RFC 3986 and RFC 5280 but I cannot find the information on whether 1/ IPv6 link-local 2/ IPv6 ULA addresses are valid and how to verify them. Should there be some text in this section ?

== NITS ==

-- Section 7.2 --
References to H/2 and H/3 have previously been given, no need to repeat ?

Comment (2021-06-16 for -16) Not sent

I regret that I haven't finished my review prior to the telechat.  I'll try to get my proper ballot in before Benjamin's DISCUSS is resolved, or at least before the approval is processed.

Yes (2021-07-25 for -17) Sent for earlier

Thank you for addressing my Discuss points (both on this document
and on -messaging, since the proper fix for that concern ended up
being in this document)!  I will duly ballot Yes, as promised.

However, I will retain unchanged the COMMENT section from my ballot
position on the -16, since on further inspection there are many of
them that do not seem to have been discussed in github yet.

I also had one new comment on the -17: in Section 10.2 we had some
good text cleanup (I think, prompted by one of my comments -- thank you!),
but the removed text included a note about how the semantics of a response
header field might be refined by the semantics of the request method and/or
the response status code.  That seems like it would be useful to have
mentioned, and I'm not sure if this text was replicated elsewhere.

=====BEGIN PREVIOUS COMMENT SECTION=====
This document updates RFC 3864, which is part of BCP 90.
However, this document is targeting Proposed Standard status, which
means it cannot become a part of BCP 90 as part of that update.
Did we consider splitting out the RFC 3864 updates into a separate,
BCP-level, document, that would become part of BCP 90?

Section 1.2

   HTTP/2 ([RFC7540]) introduced a multiplexed session layer on top of
   the existing TLS and TCP protocols for exchanging concurrent HTTP
   messages with efficient field compression and server push.  HTTP/3
   ([HTTP3]) provides greater independence for concurrent messages by
   using QUIC as a secure multiplexed transport over UDP instead of TCP.

My understanding was that h2 and h3 also use non-text-based headers, in
contrast to HTTP/1.1's "text-based messaging syntax" that we mention
earlier.  Is that non-text nature worth noting here?

Section 3.7

                                 Proxies are often used to group an
   organization's HTTP requests through a common intermediary for the
   sake of security, annotation services, or shared caching.  [...]

The term "security" can mean so many different things to different
audiences that its meaning in isolation is pretty minimal.  I suggest
finding a more specific term for the intended usage, perhaps relating to
an auditing, exfiltration protection, and/or content-filtering function.

   For example, an _interception proxy_ [RFC3040] (also commonly known
   as a _transparent proxy_ [RFC1919]) differs from an HTTP proxy
   because it is not chosen by the client.  Instead, an interception
   proxy filters or redirects outgoing TCP port 80 packets (and
   occasionally other common port traffic).  Interception proxies are
   commonly found on public network access points, as a means of
   enforcing account subscription prior to allowing use of non-local
   Internet services, and within corporate firewalls to enforce network
   usage policies.

Is this text still accurate in the era of https-everywhere and Let's
Encrypt?

Section 3.9

As Éric notes, OpenSSL 0.9.7l supports only SSL and TLSv1.0, which per
RFC 8996 is no longer permitted -- I concur with his recommendation to
update the example (potentially including Last-Modified).

Section 4.2.x

   The hierarchical path component and optional query component identify
   the target resource within that origin server's name space.

Would a BCP 190 reference be appropriate here (emphasizing that the name
space belongs to the origin server)?

Section 4.2.2

   The "https" URI scheme is hereby defined for minting identifiers
   within the hierarchical namespace governed by a potential origin
   server listening for TCP connections on a given port and capable of
   establishing a TLS ([RFC8446]) connection that has been secured for
   HTTP communication.  [...]

Is "capable" the correct prerequisite, or does the server need to
actually do so on that port?  (Given the following definition of
"secured", though, the ability to successfully do so would seem to
depend on the trust anchor configuration on the client, which is not
really something the server can control...)

Section 4.3.3

   Note, however, that the above is not the only means for obtaining an
   authoritative response, nor does it imply that an authoritative
   response is always necessary (see [Caching]).

Is it intentional that this paragraph diverges from the analogous
content in §4.3.2 (which also mentions Alt-Svc and other protocols
"outside the scope of this document")?

Section 5.3

      |  *Note:* In practice, the "Set-Cookie" header field ([RFC6265])
      |  often appears in a response message across multiple field lines
      |  and does not use the list syntax, violating the above
      |  requirements on multiple field lines with the same field name.
      |  Since it cannot be combined into a single field value,
      |  recipients ought to handle "Set-Cookie" as a special case while
      |  processing fields.  (See Appendix A.2.3 of [Kri2001] for
      |  details.)

The reference seems to conclude only that the situation for "Set-Cookie"
is underspecified, and doesn't really give me much guidance on what to
do if I receive a message with multiple field lines for "Set-Cookie".
(It does talk about the "Cookie" field and how semicolon is used to
separate cookie values, which implies that "Cookie" would get special
treatment to use semicolon to join field lines, but doesn't really give
me the impression that "Set-Cookie" should also have such treatment.)

Section 5.4

   A client MAY discard or truncate received field lines that are larger
   than the client wishes to process if the field semantics are such
   that the dropped value(s) can be safely ignored without changing the
   message framing or response semantics.

Is it worth saying anything about fields that the client does not
recognize?  (Per the previous discussion, the server needs to either
know that the client recognizes the field or only send fields that are
safe to ignore if unrecognized, if I understand correctly...)

Section 6.4.1

   In a response, the content's purpose is defined by both the request
   method and the response status code (Section 15).  For example, the
   content of a 200 (OK) response to GET (Section 9.3.1) represents the
   current state of the target resource, as observed at the time of the
   message origination date (Section 10.2.2), whereas the content of the
   same status code in a response to POST might represent either the
   processing result or the new state of the target resource after
   applying the processing.

Doesn't the last clause mean that there is some additional (meta)data
that can affect the content's purpose (e.g., a Content-Location field)?
Or how else would one know if the 200 POST response is the processing
result vs the new state?  It seems incomplete to just say "is defined by
both" and list only method and status code as the defining factors.

Section 7.6.3

[I had the same question as Martin Duke about default *TCP* port, and
the interaction with the scheme.  I see that it has been answered since
I initially drafted these notes, hooray.]

Section 7.7

   A proxy MUST NOT modify the "absolute-path" and "query" parts of the
   received target URI when forwarding it to the next inbound server,
   except as noted above to replace an empty path with "/" or "*".

I found where (in the discussion of normalization in §4.2.3) we say to
replace the empty path with "/" for non-OPTIONS requests.  I couldn't
find anywhere "above" where it was noted to replace an empty path with
"*" (presumably, for the OPTIONS requests), though.

Section 8.3

                          Implementers are encouraged to provide a means
   to disable such sniffing.

"encouraged to provide a means to disable" could be read as also
encouraging implementation of the (sniffing) mechanism itself.  Is it
actually the case that we encourage implementation of MIME sniffing?

Section 8.8.1

                                                                  A
   strong validator is unique across all versions of all representations
   associated with a particular resource over time.  [...]

My understanding is that, e.g., a cryptographic hash over the
representation and metadata would be intended to be a strong validator,
but for such a construction the "unique" property can only be guaranteed
probabilistically.  Are we comfortable with this phrasing that implies
an absolute requirement?
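
As a rough illustration of the construction I have in mind (a sketch under my own assumptions, not anything the draft specifies), an entity tag could be derived from a SHA-256 digest over the representation data and one piece of metadata. The function name `content_etag` and the choice of Content-Type as the metadata input are hypothetical.

```python
import hashlib

def content_etag(body: bytes, content_type: str) -> str:
    """Derive a quoted entity-tag from representation data and metadata."""
    digest = hashlib.sha256()
    digest.update(content_type.encode("utf-8"))
    digest.update(b"\x00")  # separator so metadata and body cannot run together
    digest.update(body)
    return '"' + digest.hexdigest() + '"'

v1 = content_etag(b"hello", "text/plain")
v2 = content_etag(b"hello!", "text/plain")
print(v1 != v2)
```

Two distinct representations yield distinct tags only with overwhelming probability, since digest collisions are not impossible, which is why "unique" reads as stronger than what such a validator can promise.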

Section 8.8.4

   *  SHOULD send the Last-Modified value in non-subrange cache
      validation requests (using If-Modified-Since) if only a Last-
      Modified value has been provided by the origin server.

   *  MAY send the Last-Modified value in subrange cache validation
      requests (using If-Unmodified-Since) if only a Last-Modified value
      has been provided by an HTTP/1.0 origin server.  The user agent
      SHOULD provide a way to disable this, in case of difficulty.

I'm failing to come up with an explanation for why it's important to
specifically call out the HTTP/1.0 origin server in the latter case.
What's special about an HTTP/1.1 origin server that only provided a
Last-Modified value and subrange cache validation requests that makes
the MAY inapplicable?  (What's the actual expected behavior for that
situation?)

Section 9.2.2

   A request method is considered _idempotent_ if the intended effect on
   the server of multiple identical requests with that method is the
   same as the effect for a single such request.  [...]

I sometimes worry that a definition of idempotent like this hides the
interaction of repeated idempotent requests with other requests
modifying the same resource.  A+A is equivalent to A, but A+B+A is often
not equivalent to A+B...
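
A toy model of the concern, assuming a resource store that is just a dictionary and PUT-like requests that overwrite a value (the names `put` and `run` are hypothetical and not from the draft):

```python
def put(state, key, value):
    """Apply one PUT-like request: overwrite the value stored at key."""
    new_state = dict(state)
    new_state[key] = value
    return new_state

def run(requests):
    """Apply a sequence of (key, value) requests to an empty store."""
    state = {}
    for key, value in requests:
        state = put(state, key, value)
    return state

A = ("/doc", "v1")
B = ("/doc", "v2")

print(run([A, A]) == run([A]))        # True: repeating A alone changes nothing
print(run([A, B, A]) == run([A, B]))  # False: the retry of A undoes B
```

Each request is idempotent in the sense of the definition, yet an automatic retry of A after B has been applied silently reverts B's change.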

Section 9.3.5

   Likewise, other implementation aspects of a resource might need to be
   deactivated or archived as a result of a DELETE, such as database or
   gateway connections.  In general, it is assumed that the origin
   server will only allow DELETE on resources for which it has a
   prescribed mechanism for accomplishing the deletion.

The specific phrasing of "only allow DELETE [...]" calls to mind (for
me) an expectation of authorization checks as well.  In some sense this
is no different than for POST or PUT, and thus may not be worth
particular mention here, but I thought I'd ask whether it makes sense to
mention authorization (and authentication).

   A client SHOULD NOT generate content in a DELETE request.  Content
   received in a DELETE request has no defined semantics, cannot alter
   the meaning or target of the request, and might lead some
   implementations to reject the request.

We had a similar paragraph earlier in the discussion of GET and HEAD,
but those paragraphs included a clause about "close the connection
because of its potential as a request smuggling attack" -- is DELETE not
at risk of use for request smuggling?

Section 10.1.1

   *  A server that responds with a final status code before reading the
      entire request content SHOULD indicate whether it intends to close
      the connection (e.g., see Section 9.6 of [Messaging]) or continue
      reading the request content.

The referenced section seems to cover the "close" connection option,
which is a positive signal of intent to close.  Is the absence of that
connection option to be interpreted as a positive signal of intent to
continue reading the request content, or is there some other positive
signal of such intent to continue reading?

Section 10.1.2

   A server SHOULD NOT use the From header field for access control or
   authentication.

It seems that the level of security provided by the From header field is
at most that of a bearer token, and that the natural choice of such
token is easily guessable (though unguessable choices are possible).
I'm having a hard time coming up with an IETF-consensus scenario where
it would make sense to use From for access control or authentication
(i.e., could this be MUST NOT instead?).

Section 10.1.3

                 Some servers use the Referer header field as a means of
   denying links from other sites (so-called "deep linking") or
   restricting cross-site request forgery (CSRF), but not all requests
   contain it.

I think we should say something about the effectiveness of Referer
checks as a CSRF mitigation mechanism.

                        Most general-purpose user agents do not send the
   Referer header field when the referring resource is a local "file" or
   "data" URI.  A user agent SHOULD NOT send a Referer header field if

This seems like a curious statement.  Are we expecting future
general-purpose user agents to emulate this behavior?  If so, then why
not recommend it explicitly?

   the referring resource was accessed with a secure protocol and the
   request target has an origin differing from that of the referring
   resource, unless the referring resource explicitly allows Referer to
   be sent.  A user agent MUST NOT send a Referer header field in an

How does a referring resource indicate that Referer should be sent?

Section 10.1.4

   The TE field value consists of a list of tokens, each allowing for
   optional parameters (except for the special case "trailers").

Should the prose mention the 'weight' part of the t-codings construction
(the "weight" production itself does not seem to be defined until §11.4.2)?

Section 10.1.5

   For example, a sender might indicate that a message integrity check
   will be computed as the content is being streamed and provide the
   final signature as a trailer field.  This allows a recipient to

Please pick one of "message integrity check" and "signature" and use it
consistently; these are both cryptographic terms of art (with different
meanings).

   Because the Trailer field is not removed by intermediaries, it can
   also be used by downstream recipients to discover when a trailer
   field has been removed from a message.

It seems that this usage is only possible if sending the Trailer field is a
binding commitment to emit the relevant trailer fields; otherwise the
recipient cannot distinguish between a removal by an intermediary and a sender
declining to generate the trailer field.

Section 10.1.6

         A user agent SHOULD send a User-Agent header field in each
   request unless specifically configured not to do so.

(I assume that a reference to client-hints (or UA-CH) was considered and
rejected.)

   A user agent SHOULD NOT generate a User-Agent header field containing
   needlessly fine-grained detail and SHOULD limit the addition of
   subproducts by third parties.  Overly long and detailed User-Agent
   field values increase request latency and the risk of a user being
   identified against their wishes ("fingerprinting").

client-hints might even be more appropriate as a reference here than it
would be above...or just in §17.13.

Section 10.2

It seems like it might be worth listing the fields already defined in the
previous section (as request context fields) that can also appear as response
context fields.

Section 12.2

   Reactive negotiation suffers from the disadvantages of transmitting a
   list of alternatives to the user agent, which degrades user-perceived
   latency if transmitted in the header section, and needing a second
   request to obtain an alternate representation.  Furthermore, this
   specification does not define a mechanism for supporting automatic
   selection, though it does not prevent such a mechanism from being
   developed as an extension.

I'm not sure that I understand how an HTTP extension would help specify
a mechanism for automatic selection in reactive negotiation; isn't this
just an implementation detail in the user-agent?

Section 12.5.1

      |  *Note:* Use of the "q" parameter name to control content
      |  negotiation is due to historical practice.  Although this
      |  prevents any media type parameter named "q" from being used
      |  with a media range, such an event is believed to be unlikely
      |  given the lack of any "q" parameters in the IANA media type
      |  registry and the rare usage of any media type parameters in
      |  Accept.  Future media types are discouraged from registering
      |  any parameter named "q".

This note seems like it would be more useful in the IANA media-types
registry than "some random protocol specification that uses media
types".

Section 12.5.3

   For example,

   Accept-Encoding: compress, gzip
   Accept-Encoding:
   Accept-Encoding: *
   Accept-Encoding: compress;q=0.5, gzip;q=1.0
   Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0

Are these supposed to be multiple standalone examples or one single example with
multiple field lines?  (I note that they appear in a single <sourcecode>
element in the XML source.)  If they are supposed to be one single
example, I would have expected some remark about the combination of "*"
and "*;q=0" (my understanding is that the q=0 renders codings not listed
as unacceptable, even despite the implicitly q=1 wildcard).
It seems that in other instances where we provide multiple examples in
a single artwork, the prefacing text is "Examples:" plural, which makes
some effort to disambiguate.
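
To show why the two readings differ, here is a simplified sketch of parsing an Accept-Encoding field value into (coding, qvalue) pairs. The parser name `parse_accept_encoding` is hypothetical, it assumes well-formed input, and it treats a missing qvalue as 1.0.

```python
def parse_accept_encoding(value):
    """Split an Accept-Encoding field value into (coding, qvalue) pairs."""
    prefs = []
    for member in value.split(","):
        member = member.strip()
        if not member:
            continue  # skip empty list elements
        coding, _, params = member.partition(";")
        q = 1.0  # assumed default when no qvalue is given
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.lower() == "q" and val:
                q = float(val)
        prefs.append((coding.strip().lower(), q))
    return prefs

lines = [
    "compress, gzip",
    "",
    "*",
    "compress;q=0.5, gzip;q=1.0",
    "gzip;q=1.0, identity; q=0.5, *;q=0",
]

# Reading 1: five standalone examples, each parsed on its own.
for line in lines:
    print(parse_accept_encoding(line))

# Reading 2: one message with five field lines, combined with commas.
combined = ", ".join(lines)
wildcards = [q for coding, q in parse_accept_encoding(combined) if coding == "*"]
print(wildcards)  # [1.0, 0.0]
```

Under the second reading the combined value carries both "*" and "*;q=0", which is the combination I would have expected the text to remark on.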

      |  *Note:* Most HTTP/1.0 applications do not recognize or obey
      |  qvalues associated with content-codings.  This means that
      |  qvalues might not work and are not permitted with x-gzip or
      |  x-compress.

This wording implies to me that there is a normative requirement
somewhere else that qvalues cannot be used with x-gzip and x-compress,
but I'm not sure where that would be.  (It's also a bit hard to
understand how x-gzip would be affected but not plain gzip, given that
§18.6 lists it as an alias for gzip ... additional restrictions don't
quite match up with an "alias" nature.)

Section 12.5.4

   It might be contrary to the privacy expectations of the user to send
   an Accept-Language header field with the complete linguistic
   preferences of the user in every request (Section 17.13).

This leaves me wondering how to improve on the situation and pick which
subset of requests to send the header field in.  I would expect that a
blind random sampling approach would not yield privacy improvements over
always sending them.

Section 12.5.5

   An origin server SHOULD send a Vary header field when its algorithm
   for selecting a representation varies based on aspects of the request
   message other than the method and target URI, unless the variance
   cannot be crossed or the origin server has been deliberately
   configured to prevent cache transparency.  [...]

I don't think I know what it means to "cross" a variance.  The example
(elided from this comment) about Authorization not needing to be
included gives some hint as to what is meant, but I still don't have a
clear picture.

Section 13.2.2

   5.  When the method is GET and both Range and If-Range are present,
       evaluate the If-Range precondition:

       *  if the validator matches and the Range specification is
          applicable to the selected representation, respond 206
          (Partial Content)

   6.  Otherwise,

       *  all conditions are met, so perform the requested action and
          respond according to its success or failure.

I think that if the If-Range doesn't match, we're supposed to ignore the
Range header field when performing the requested action, which doesn't
seem to match up with this unadorned directive to "perform the requested
action" (which would include the Range header field).
(We might also change point (5) to use the "if true" phrasing that the
other items use in the context of evaluating the precondition.)
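
A sketch of the behavior I would expect from steps 5 and 6, under simplifying assumptions: header names are looked up exactly as written, the If-Range value is a strong entity tag compared by exact string match, and any Range present is satisfiable. The function name `respond_to_get` is hypothetical.

```python
def respond_to_get(headers, current_etag):
    """Return the status code for a GET after If-Range evaluation."""
    has_range = "Range" in headers
    if has_range and "If-Range" in headers:
        if headers["If-Range"] == current_etag:
            return 206  # validator matches: honor the Range
        return 200      # validator does not match: ignore Range, send everything
    if has_range:
        return 206      # Range without If-Range is honored as-is
    return 200

print(respond_to_get({"Range": "bytes=0-9", "If-Range": '"v1"'}, '"v1"'))  # 206
print(respond_to_get({"Range": "bytes=0-9", "If-Range": '"v1"'}, '"v2"'))  # 200
```

The second call is the case in question: "perform the requested action" has to mean the action without the Range header field.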

Section 15.4.9

      |  *Note:* This status code is much younger (June 2014) than its
      |  sibling codes, and thus might not be recognized everywhere.
      |  See Section 4 of [RFC7538] for deployment considerations.

This document obsoletes RFC 7538; if we believe that content is still
useful we should probably consider incorporating it into this document.

Section 16.3.1

   Field names are registered on the advice of a Designated Expert
   (appointed by the IESG or their delegate).  Fields with the status
   'permanent' are Specification Required ([RFC8126], Section 4.6).

I would have expected IANA to ask for the phrase "Expert Review" to be
used for the general case (if they did not already), since that's the
relevant registration policy defined in RFC 8126.

   Registration requests consist of at least the following information:
[...]
   Specification document(s):
      Reference to the document that specifies the field, preferably

If the registration consists of "at least" a group of information that
includes a specification document, doesn't that mean the policy is
*always* "Specification Required", not just for permanent registrations?

   Provisional entries can be removed by the Expert(s) if - in
   consultation with the community - the Expert(s) find that they are
   not in use.  The Experts can change a provisional entry's status to
   permanent at any time.

(The ability to freely convert a provisional registration to permanent
seems to also require a specification document to always be present,
even for provisional registrations.)

Section 17

A few potential considerations that don't seem to be mentioned in the
subsections:

- Implementation divergence in handling multi-member field values when
  singletons are expected, could lead to security issues (in a similar
  vein as how request smuggling works)

- Though ETag is formally opaque to clients, any internal structure to
  the values could still be inspected and attacked by a malicious
  client.  We might consider giving guidance that ETag values should
  be unpredictable.

- When the same information is present at multiple protocol layers
  (e.g., the transport port number and the Host field value), in the
  general case, attacks are possible if there is no check for
  consistency of the values in the different layers.  It's often helpful
  to provide guidance on which entit(ies) should perform the check, to
  avoid scenarios where all parties are expecting "someone else" to do
  it.

- Relatedly, the port value is part of the https "origin" concept, but is not
  authenticated by the certificate and could be modified (in the
  transport layer) by an on-path attacker.  The safety of per-origin
  isolation relies on the server to check that the port intended by the
  client matches the port the request was actually received on.

- We mention that in response to some 3xx redirection responses, a
  client capable of link editing might do so automatically.  Doing so
  for http-not-s responses would allow for a form of privilege
  escalation, converting even a temporary access into more permanent
  changes on referring resources.

- We make heavy use of URIs and URI components; referencing the security
  considerations of RFC 3986 might be worthwhile
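
For the port-consistency point in the list above, a minimal sketch of the kind of server-side check I have in mind. The helper names `port_from_host` and `port_is_consistent` are hypothetical, the Host field value is assumed to be syntactically valid, and default ports are assumed to be 80 and 443.

```python
DEFAULT_PORTS = {"http": 80, "https": 443}

def port_from_host(host_field, scheme):
    """Extract the port the client intended from a Host field value."""
    default = DEFAULT_PORTS[scheme]
    if host_field.endswith("]"):
        return default  # bracketed IPv6 literal with no port
    _, sep, port = host_field.rpartition(":")
    if not sep or not port:
        return default  # no port, or an empty port after the colon
    return int(port)

def port_is_consistent(host_field, scheme, received_port):
    # Compare the port named by the client with the port the
    # connection actually arrived on.
    return port_from_host(host_field, scheme) == received_port

print(port_is_consistent("example.com", "https", 443))       # True
print(port_is_consistent("example.com:8443", "https", 443))  # False
```

A server that skips this comparison cannot tell whether an on-path attacker redirected the connection to a different port on the same host.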

Section 17.1

   Unfortunately, communicating authority to users can be difficult.
   For example, _phishing_ is an attack on the user's perception of
   authority, where that perception can be misled by presenting similar
   branding in hypertext, possibly aided by userinfo obfuscating the
   authority component (see Section 4.2.1).  [...]

We might also mention "confusable" domain names here as well (which are
possible even without resorting to IDNs).

Section 17.5

Should we also discuss situations where there might be redundant lengths
at different encoding layers (e.g., HTTP framing and MIME multipart
boundaries), in a similar vein to
https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34#section-10.8
?

Section 17.16.3

   Authentication schemes that solely rely on the "realm" mechanism for
   establishing a protection space will expose credentials to all
   resources on an origin server.  [...]

There's also not any clear authorization mechanism for the origin to claim
use of a given realm value, which can lead to the client sending
credentials for the claimed realm without knowing that the server should
be receiving such credentials.

Section 19.2

Should RFC 5322 be normative?  We rely on it for, e.g., the "mailbox"
ABNF construction.

Appendix A

[Just noting that I did not attempt to validate the ABNF, since the
shepherd writeup notes that they have been validated]

Appendix B.4

   Clarified that If-Unmodified-Since doesn't apply to a resource
   without a concept of modification time.  (Section 13.1.4)

I couldn't really locate which text was supposed to be providing this
clarification.


NITS

Section 3.1

                                              Most resources are
   identified by a Uniform Resource Identifier (URI), as described in
   Section 4.
[...]
   HTTP relies upon the Uniform Resource Identifier (URI) standard
   [RFC3986] to indicate the target resource (Section 7.1) and
   relationships between resources.

Are these two statements compatible?  (What is used for the non-URI
resource identification scenarios?)

Section 5.5

We seem to use the obs-text ABNF construction prior to its definition,
which is in Section 5.6.4.

Section 5.6.1.1

   In any production that uses the list construct, a sender MUST NOT
   generate empty list elements.  In other words, a sender MUST generate
   lists that satisfy the following syntax:

     1#element => element *( OWS "," OWS element )

Are the two formulations equivalent without some restriction on
'element' itself?
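
One way to see the concern is a small sketch that builds the sender syntax as
a regular expression for two different choices of 'element'. The function name
and the `maybe_empty` element are hypothetical; the `token` pattern follows
the tchar set. If 'element' can match the empty string, the expanded syntax
still admits what looks like an empty list element.

```python
import re

OWS = r"[ \t]*"

def sender_list_pattern(element: str) -> re.Pattern:
    """Build a regex for: element *( OWS "," OWS element )."""
    return re.compile(rf"(?:{element})(?:{OWS},{OWS}(?:{element}))*")

# 'token' requires at least one tchar, so it never matches the empty string.
token = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# Hypothetical element production that is allowed to be empty.
maybe_empty = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]*"

strict = sender_list_pattern(token)
loose = sender_list_pattern(maybe_empty)

# "a,,b" contains an empty element between the two commas.
print(strict.fullmatch("a,,b") is None)      # rejected when element is non-empty
print(loose.fullmatch("a,,b") is not None)   # accepted when element may be empty
```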

Section 6.4.2

   2.  If the request method is GET and the response status code is 200
       (OK), the content is a representation of the resource identified
       by the target URI (Section 7.1).

   3.  If the request method is GET and the response status code is 203
       (Non-Authoritative Information), the content is a potentially
       modified or enhanced representation of the target resource as
       provided by an intermediary.

   4.  If the request method is GET and the response status code is 206
       (Partial Content), the content is one or more parts of a
       representation of the resource identified by the target URI
       (Section 7.1).

   5.  If the response has a Content-Location header field and its field
       value is a reference to the same URI as the target URI, the
       content is a representation of the target resource.

I count two "target resource" and two "resource identified by the target
URI".  Is there an important distinction between those two phrasings or
could we normalize on a single term?

Section 7.3.3

   If no proxy is applicable, a typical client will invoke a handler
   routine, usually specific to the target URI's scheme, to connect
   directly to an origin for the target resource.  How that is
   accomplished is dependent on the target URI scheme and defined by its
   associated specification.

This document is the relevant specification for the "http" and "https"
URI schemes; a section reference to the corresponding procedures might
be in order.

Section 8.8.2.1

   An origin server with a clock MUST NOT send a Last-Modified date that
   is later than the server's time of message origination (Date).  If

I suspect some relevant details for this clock are covered in §10.2.2;
maybe a forward reference would be useful.
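
As an illustration of the quoted requirement, a minimal sketch of how an
origin server might cap the value it sends. The function name and parameters
are hypothetical, and the sketch assumes both timestamps are timezone-aware
UTC values taken from the server's own clock.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def last_modified_value(mtime: datetime, origination: datetime) -> str:
    """Return a Last-Modified field value that is never later than the
    message origination time (the value sent in Date)."""
    capped = min(mtime, origination)
    # format_datetime(..., usegmt=True) emits the IMF-fixdate style with "GMT";
    # it requires a datetime whose tzinfo is UTC.
    return format_datetime(capped.astimezone(timezone.utc), usegmt=True)
```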

Section 10.2

   The remaining response header fields provide more information about
   the target resource for potential use in later requests.

I didn't see a previous enumeration of fields such that "remaining" would
have meaning.
(Also, the whole top-level section seems to contain multiple sentences that
are nearly redundant.)

Section 10.2.2

   A recipient with a clock that receives a response message without a
[...]
   A recipient with a clock that receives a response with an invalid

Are we using "with a clock" as shorthand for "have a clock capable of
providing a reasonable approximation of the current instant in
Coordinated Universal Time"?  It might be worth clarifying whether this
phrasing, which differs from the one above, is intended to convey
different semantics.

Section 11.7.3

   The Proxy-Authentication-Info response header field is equivalent to
   Authentication-Info, except [...]

Is it worth calling out again that it can be sent as a trailer field, in
case someone specifically goes searching for trailer fields?

Section 13.2.1

   server that can provide a current representation.  Likewise, a server
   MUST ignore the conditional request header fields defined by this
   specification when received with a request method that does not
   involve the selection or modification of a selected representation,
   such as CONNECT, OPTIONS, or TRACE.

We do say "can be used with any method" regarding If-Match, earlier,
which is not very well aligned with this "MUST ignore".

Section 15.4

   5.  If the request method has been changed to GET or HEAD, remove
       content-specific header fields, including (but not limited to)
       Content-Encoding, Content-Language, Content-Location,
       Content-Type, Content-Length, Digest, ETag, Last-Modified.

The discussion in §8.8.3 seems to indicate that ETag is only used in
responses, not requests, so I'm not sure in what scenarios it would need
to be removed from the redirected request.
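
For reference, step 5 as quoted can be sketched as a simple filter over the
outgoing request fields. The function name and its parameters are
hypothetical; the field set is the one listed in the quoted text (which is
explicitly "not limited to" these names).

```python
# Field names taken from the quoted step 5; compared case-insensitively.
CONTENT_SPECIFIC = {
    "content-encoding", "content-language", "content-location",
    "content-type", "content-length", "digest", "etag", "last-modified",
}

def strip_content_fields(headers: dict, original_method: str, new_method: str) -> dict:
    """Drop content-specific fields only when the method has been changed
    to GET or HEAD; otherwise return the fields unchanged."""
    changed = new_method.upper() != original_method.upper()
    if not (changed and new_method.upper() in ("GET", "HEAD")):
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() not in CONTENT_SPECIFIC}
```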

      |  *Note:* In HTTP/1.0, the status codes 301 (Moved Permanently)
      |  and 302 (Found) were defined for the first type of redirect
      |  ([RFC1945], Section 9.3).  Early user agents split on whether
      |  the method applied to the redirect target would be the same as
      |  the original request or would be rewritten as GET.  Although
      |  HTTP originally defined the former semantics for 301 and 302
      |  (to match its original implementation at CERN), and defined 303
      |  (See Other) to match the latter semantics, prevailing practice
      |  gradually converged on the latter semantics for 301 and 302 as
      |  well.  The first revision of HTTP/1.1 added 307 (Temporary
      |  Redirect) to indicate the former semantics of 302 without being
      |  impacted by divergent practice.  For the same reason, 308
      |  (Permanent Redirect) was later on added in [RFC7538] to match
      |  301.  [...]

I had to read this text several times to find a way to understand it
that seems to make sense to me (but might still be wrong!).
I think part of my confusion is that the word "former" is being used in
two different senses (the first of the two choices, and the
historical/earlier version).  Perhaps it's more clear to just talk about
"method rewriting" (and not rewriting) instead of using the overloaded
term.
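
To show what phrasing in terms of "method rewriting" could look like, here is
a rough sketch of the behavior the note describes. The function name is
hypothetical, and two details are assumptions rather than something stated in
the quoted text: that 301/302 rewriting applies only to POST, and that a HEAD
request stays HEAD under 303.

```python
def redirected_method(status: int, method: str) -> str:
    """Return the method a user agent would use for the redirected request."""
    method = method.upper()
    if status == 303:
        # 303 (See Other): method rewriting is the defined behavior.
        # Assumption: HEAD is kept as HEAD, everything else becomes GET.
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Prevailing practice converged on rewriting for 301 and 302.
        # Assumption: only POST is rewritten to GET.
        return "GET" if method == "POST" else method
    if status in (307, 308):
        # 307 and 308: no method rewriting.
        return method
    raise ValueError(f"not a method-relevant redirect status: {status}")
```

In this framing, 307 is "302 without method rewriting" and 308 is "301
without method rewriting", which avoids the two senses of "former".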
=====END PREVIOUS COMMENT SECTION=====

Yes (2021-06-10 for -16) Sent

As someone who has learned HTTP via osmosis, it was very helpful to finally read it all collected in one place. Thank you. I especially appreciate the effort to document legacy undesirable behaviors, which implementers need to account for in their work.

(7.6.3) Via

"If a port is not provided, a recipient MAY interpret that as meaning it was received on the default TCP port, if any, for the received-protocol."

So if received-protocol is "3", it's a UDP port.

If received-protocol is "1" or "1.1", is the default port 80 or 443? IIUC the scheme isn't included to determine this.

(7.7) Message Transformations

"A proxy that transforms the content of a 200 (OK) response can inform downstream recipients that a transformation has been applied by changing the response status code to 203 (Non-Authoritative Information)"

Why not a normative word, instead of "can"?

(12.5.3) Is it correct that "identity" and having no field value for Accept-Encoding are synonymous?

"Servers that fail a request due to an unsupported content coding ought to respond with a 415 (Unsupported Media Type) status"

Why not s/ought to/SHOULD ?

(14.3) Why can only origin servers send "Accept-Ranges: bytes"? Why not intermediaries?

(15.3.7) "A sender that generates a 206 response with an If-Range header field"... (13.1.5) leads me to believe that only clients can send If-Range. So how can there be a response with If-Range?

(15.3.7.2) The last instance of THIS_STRING_SEPARATES has a trailing '--'. If this is intentional, it ought to be explained.
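
For context, the trailing "--" appears to be the MIME multipart closing
delimiter (RFC 2046), which marks the end of the last body part. A minimal
sketch of how a multipart/byteranges body is assembled makes the difference
between the two delimiter forms visible; the function name and the shape of
the `parts` argument are hypothetical.

```python
def byteranges_body(boundary: str, parts: list) -> bytes:
    """Assemble a multipart/byteranges body.

    `parts` is a list of (content_type, content_range, data) tuples,
    where data is bytes.
    """
    out = []
    for content_type, content_range, data in parts:
        # Each part starts with the plain delimiter: "--" + boundary.
        out.append(f"--{boundary}\r\n".encode("ascii"))
        out.append(f"Content-Type: {content_type}\r\n".encode("ascii"))
        out.append(f"Content-Range: {content_range}\r\n\r\n".encode("ascii"))
        out.append(data + b"\r\n")
    # The closing delimiter adds a trailing "--" after the boundary.
    out.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(out)
```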

(16.3.1) says field names SHOULD begin with a letter, but (16.3.2.1) says they SHOULD begin with "an alphanumeric character". More broadly, the "Field name:" description in (16.3.1) should probably refer to (16.3.2.1) unless I'm misunderstanding the scope of these sections.

(17.13) s/TCP behavior/TCP or QUIC behavior

(B) It would be good to mention here that accept-ext has been removed in (12.5.1), and accept-charset is deprecated in (12.5.2), if that is new to this spec.

Yes (for -16) Not sent

No Objection (for -16) Not sent

No Objection (2021-06-11 for -16) Sent

Dale Worley's GenART review hasn't been responded to yet.

Section 3. , paragraph 2, comment:
>    HTTP was created for the World Wide Web (WWW) architecture and has
>    evolved over time to support the scalability needs of a worldwide
>    hypertext system.  Much of that architecture is reflected in the

Given the degree to which HTTP is now used as a transport for things other than
the HTML web, the last part of this sentence seems dated.

Section 5.6.7. , paragraph 25, comment:
>    Recipients of a timestamp value in rfc850-date format, which uses a

Suggest to add an actual reference to RFC850.

Section 10.2.2. , paragraph 9, comment:
>    A recipient with a clock that receives a response with an invalid
>    Date header field value MAY replace that value with the time that
>    response was received.

"Invalid" as in not well-formed, or as in inaccurate?

Possible DOWNREF from this Standards Track doc to [Welch]. If so, the IESG
needs to approve it.

-------------------------------------------------------------------------------
All comments below are about very minor potential issues that you may choose to
address in some way - or ignore - as you see fit. Some were flagged by
automated tools (via https://github.com/larseggert/ietf-reviewtool), so there
will likely be some false positives. There is no need to let me know what you
did with these suggestions.

"Table of Contents", paragraph 1, nit:
> Table of Contents

Not all section headings consistently use title case.

Section 1.2. , paragraph 4, nit:
>    HTTP/2 ([RFC7540]) introduced a multiplexed session layer on top of

It's unusual to put a reference (also) in parentheses; suggest removing the
parentheses here and checking similar occurrences throughout the document.

Section 3.8. , paragraph 2, nit:
>    requests.  Any client or server MAY employ a cache, though a cache
>    cannot be used while acting as a tunnel.

cannot -> MUST NOT?

Section 3.8. , paragraph 7, nit:
-    caching to optimise regional and global distribution of popular
-                     ^
+    caching to optimize regional and global distribution of popular
+                     ^

Section 5.6.7. , paragraph 2, nit:
-    implementations, all three are defined here.  The preferred format is
-                                                      ^^^^^^^^^
+    implementations, all three are defined here.  The RECOMMENDED format is
+                                                      ^^^^^^^^^^^

Section 5.6.7. , paragraph 4, nit:
-    An example of the preferred format is
-                      ^^^^^^^^^
+    An example of the RECOMMENDED format is
+                      ^^^^^^^^^^^

Section 5.6.7. , paragraph 10, nit:
-    Preferred format:
+   RECOMMENDED format:

Section 8.3.1. , paragraph 6, nit:
-    the first is preferred for consistency (the "charset" parameter value
-                 ^^^^^^^^^
+    the first is RECOMMENDED for consistency (the "charset" parameter value
+                 ^^^^^^^^^^^

Section 8.8.3. , paragraph 9, nit:
-    preferable to send Etag as a header field unless the entity-tag is
-    ^^^^^^^^^^
+    RECOMMENDED to send Etag as a header field unless the entity-tag is
+    ^^^^^^^^^^^

Section 8.8.4. , paragraph 6, nit:
-    In other words, the preferred behavior for an origin server is to
-                        ^^^^^^^^^
+    In other words, the RECOMMENDED behavior for an origin server is to
+                        ^^^^^^^^^^^

Section 11.6.1. , paragraph 10, nit:
-    Some user agents do not recognise this form, however.  As a result,
-                                   ^
+    Some user agents do not recognize this form, however.  As a result,
+                                   ^

Section 14.1.1. , paragraph 2, nit:
-    following gramar is generic: each range unit is expected to specify
+    following grammar is generic: each range unit is expected to specify
+                 +

Section 15.5.5. , paragraph 2, nit:
-    permanent; the 410 (Gone) status code is preferred over 404 if the
-                                             ^^^^^^^^^
+    permanent; the 410 (Gone) status code is RECOMMENDED over 404 if the
+                                             ^^^^^^^^^^^

"B.2. ", paragraph 9, nit:
-    comma from the allowed set of charaters for a host name in received-
+    comma from the allowed set of characters for a host name in received-
+                                       +

Section 1.1. , paragraph 3, nit:
> sual to put a reference (also) in parenthesis; suggest to remove the parenth
>                                ^^^^^^^^^^^^^^
Did you mean "in parentheses"? "parenthesis" is the singular.

Section 1.1. , paragraph 3, nit:
> erence (also) in parenthesis; suggest to remove the parenthesis here and chec
>                               ^^^^^^^^^^^^^^^^^
The verb "suggest" is used with the gerund form.

Section 2.4. , paragraph 2, nit:
> the degree to which HTTP is now used a a transport for things other than the
>                                      ^^^
Two determiners in a row. Choose either "a" or "a".

Section 5.6.3. , paragraph 3, nit:
>  )) ; e.g., Jun 2 HTTP-date is case sensitive. Note that Section 4.2 of [Cach
>                                ^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 5.6.4. , paragraph 4, nit:
>  describe and route the message, a headers lookup table of key/value pairs fo
>                                    ^^^^^^^
An apostrophe may be missing.

Section 5.6.4. , paragraph 6, nit:
> nbounded stream of content, and a trailers lookup table of key/value pairs f
>                                   ^^^^^^^^
An apostrophe may be missing.

Section 6.2. , paragraph 6, nit:
> located within a _trailer section_ are are referred to as "trailer fields" (o
>                                    ^^^^^^^
Possible typo: you repeated a word.

Section 7.3.2. , paragraph 2, nit:
> age with a field value that is the lesser of a) the received value decrement
>                                    ^^^^^^
Use "least", "lessest", "littlest" to express an extreme with this adjective.

Section 14.1. , paragraph 2, nit:
> n use three-digit integer values outside of that range (i.e., 600..999) for
>                                  ^^^^^^^^^^
This phrase is redundant. Consider using "outside".

Section 14.4. , paragraph 8, nit:
> ubsections below, if the field would have been sent in a 200 (OK) response to
>                                ^^^^^^^^^^^^^^^
Did you mean "had been"?

Section 15.4. , paragraph 19, nit:
>  what (if any) content codings would have been accepted in the request. On t
>                                ^^^^^^^^^^^^^^^
Did you mean "had been"?

Section 16. , paragraph 1, nit:
> d. Registrations happen on a "First Come First Served" basis (see Section 4.4
>                                     ^^^^
It seems that a comma is missing.

Section 19.1. , paragraph 3, nit:
> Range units are compared in a case insensitive fashion. (Section 14.1) The p
>                               ^^^^^^^^^^^^^^^^
This word is normally spelled with a hyphen.

Section 19.1. , paragraph 16, nit:
> ferring to RFC 723x. * Remove acknowledgements specific to RFC 723x. * Move
>                               ^^^^^^^^^^^^^^^^
Do not mix variants of the same word ("acknowledgement" and "acknowledgment")
within a single text.

Section 19.1. , paragraph 16, nit:
> pecific to RFC 723x. * Move "Acknowledgements" to the very end and make them
>                              ^^^^^^^^^^^^^^^^
Do not mix variants of the same word ("acknowledgement" and "acknowledgment")
within a single text.

"Appendix A. ", paragraph 18, nit:
> s/6>) * In Section 16.6.1, advise to make new content codings self-descriptiv
>                                   ^^^^^^^
Did you mean "making"? Or maybe you should add a pronoun? In active voice,
"advise" + "to" takes an object, usually a pronoun.

"C.3. ", paragraph 1, nit:
> , RFC 7234, and RFC 7235. The acknowledgements within those documents still
>                               ^^^^^^^^^^^^^^^^
Do not mix variants of the same word ("acknowledgement" and "acknowledgment")
within a single text.

Uncited references: [RFC2145], [RFC7617], [RFC7234], [RFC2617].

Obsolete reference to RFC2145, obsoleted by RFC7230 (this may be on purpose).

Obsolete reference to RFC2068, obsoleted by RFC2616 (this may be on purpose).

These URLs point to tools.ietf.org, which is being deprecated:
 * https://tools.ietf.org/html/draft-ietf-httpbis-messaging-16
 * https://tools.ietf.org/html/draft-ietf-httpbis-cache-16
 * https://tools.ietf.org/html/draft-ietf-quic-http-34

These URLs in the document can probably be converted to HTTPS:
 * http://arxiv.org/abs/cs.SE/0105018

No Objection (2021-06-17 for -16) Sent

Apologies but I have not been able to fully review the entire draft!

However, I would like to thank the authors and WG for the time and effort they have put into updating the HTTP specs, which will make it much easier for HTTP implementers to check that they are up to date with the spec.

In my light reading of the doc, I did notice a few minor nits, which I will leave to the authors/WG to decide whether they wish to address:

2.5. Protocol Version

HTTP's major version number is incremented when an incompatible
message syntax is introduced. The minor number is incremented when
changes made to the protocol have the effect of adding to the message
semantics or implying additional capabilities of the sender.

It is perhaps fairly obvious, but do you want to say that
the minor version number is reset to 0 when the major version number is incremented?

4.2.3. http(s) Normalization and Comparison

Two HTTP URIs that are equivalent after normalization (using any
method) can be assumed to identify the same resource, and any HTTP
component MAY perform normalization. As a result, distinct resources
SHOULD NOT be identified by HTTP URIs that are equivalent after
normalization (using any method defined in Section 6.2 of [RFC3986]).

With the text in this section (4.2.3), I found it a bit unclear as to whether this normative text included the additional rules that were listed after it. My assumption is that it does, but I wasn't sure that the text was that clear on this point.
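
To illustrate the kind of scheme-specific rules in question, here is a
minimal sketch of an http(s) comparison that treats an omitted port as the
scheme default, an empty path as "/", and scheme and host as
case-insensitive. The function name is hypothetical, and the sketch
deliberately leaves out percent-encoding normalization and the other
RFC 3986 Section 6.2 methods.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_http_uri(uri: str) -> tuple:
    """Return a comparable (scheme, host, port, path, query) tuple."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"not an http(s) URI: {uri!r}")
    host = (parts.hostname or "").lower()      # host is case-insensitive
    port = parts.port or DEFAULT_PORTS[scheme]  # omitted port means the default
    path = parts.path or "/"                    # empty path is equivalent to "/"
    return (scheme, host, port, path, parts.query)

# Two URIs are treated as equivalent when their normalized tuples are equal.
same = normalize_http_uri("http://example.com:80/a") == normalize_http_uri("http://EXAMPLE.com/a")
```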

9.2.1. Safe Methods

"crash the server"

Flagging in case you want to express this differently (e.g., cause the server to fail)

9.2.3. Methods and Caching

This specification defines caching semantics for GET, HEAD, and POST,
although the overwhelming majority of cache implementations only
support GET and HEAD.

It is somewhat ambiguous as to which specification is being referred to by "This". Is it the semantics draft or the caching draft?
