
Analysis of Potential Solutions for Revealing a Host Identifier (HOST_ID) in Shared Address Deployments
draft-ietf-intarea-nat-reveal-analysis-10

Yes

(Brian Haberman)

No Objection

(Adrian Farrel)
(Gonzalo Camarillo)
(Jari Arkko)
(Stewart Bryant)

Note: This ballot was opened for revision 06 and is now closed.

Brian Haberman Former IESG member
Yes
Yes (for -06) Unknown

                            
Adrian Farrel Former IESG member
No Objection
No Objection (for -07) Unknown

                            
Barry Leiba Former IESG member
(was Discuss) No Objection
No Objection (2013-04-01 for -06) Unknown
Comments about the shepherd writeup:

"This document does not define a protocol. Hence there are no implementations of this document."

Well, but the document "analyzes a set of solution candidates to mitigate some of the issues encountered when address sharing is used."  There certainly could be implementations of some or all of these solution candidates, and it would be useful to know about those.

"In particular, as a result of WG consensus, this version of the 
draft does not make any recommendations as to a preferred solution
among those analyzed."

I would have appreciated a brief summary explaining why the working group did not want to make any recommendations -- that would have been a useful part of the working group discussion for the shepherd to summarize, especially given the answer to question 9 (and my former DISCUSS, because I didn't understand).

------------------------------------------------------------------------------
On the document:

-- Section 1 --

Total nit: in paragraph 1, "SPAM" is not an acronym, and the custom is to spell it with no caps, as "spam".  The canned meat is "SPAM" or "Spam", and those are trademarked.

-- Section 2 --

   HOST_ID is not designed to reveal the identity of a user, a
   subscriber, or an application.  HOST_ID is designed to identify a
   host under a shared IP address.

I understand that it's not *designed* to reveal the identity of a user.  Nevertheless, it's *likely* to reveal the identity of users often.  Because you address privacy issues in Section 3, I think this paragraph needs a forward reference to that section.

-- Section 3 --

   The proposals considered in this document add a measure of
   identifiability back to hosts that share a public IP address.  The
   extent of that uniqueness depends on what information is included in
   the HOST_ID.

The extent of what uniqueness?  Do you mean "the extent of that identifiability"?

   The volatility of the HOST_ID information is similar to the source IP
   address:

Later in the paragraph you say "internal IP address".  For clarity, you should be consistent in how you refer to the IP addresses on either side of the NAT.  Also fixing a nit, I think this should be, "The volatility of the HOST_ID information is similar to that of the internal IP address:".

-- Section 3.1 --

   Uniqueness of identifiers in HOST_ID:  It is recommended that
      HOST_IDs be limited to providing local uniqueness rather than
      global uniqueness.

I don't understand how that matters at all: as you point out above, if the HOST_ID is locally unique, the combination of it and the external IP address is globally unique.  Perhaps an explanation of what you're actually recommending and why might make this clearer.

In fact, I think that's true of all four items in this section: I'm having trouble understanding the real point of any of them, so a bit more explanation of what they're really trying to say, and including "why", would really help.

-- Section 4.1.1 --

You cite RFC 6864 here, and that document consistently refers to the field you're talking about as "the IPv4 ID field".  You're calling it "the IP-ID field".  Wouldn't it make sense to make your usage consistent with that of the document you're citing?

   Note that this field is not altered by some NATs; hence,
   there are some side effects such as counting hosts behind a NAT

Not altered: isn't that the point?  I thought the whole *point* of this was to provide a HOST_ID that survives the NAT and can be inspected outside.  Now I'm confused.

   Address-sharing devices using this solution would be required to
   indicate that out of band, possibly using a special DNS record.

What is the antecedent of "that"?  This seems very odd in a paragraph all on its own.
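To make the Section 4.1.1 mechanism concrete, here is a minimal Python sketch. It assumes a hypothetical address-sharing function that overwrites the 16-bit IPv4 ID field (header bytes 4-5) with a HOST_ID. The function names are illustrative and are not taken from the draft. The IPv4 header checksum covers the ID field, so the sketch also recomputes that checksum (the RFC 791 one's-complement sum).

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words (RFC 791)."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def set_ipv4_id(header: bytes, host_id: int) -> bytes:
    """Hypothetical: write a 16-bit HOST_ID into the IPv4 ID field and
    recompute the header checksum (bytes 10-11). Payload is left untouched."""
    if not 0 <= host_id <= 0xFFFF:
        raise ValueError("HOST_ID must fit in the 16-bit ID field")
    ihl = (header[0] & 0x0F) * 4  # header length in bytes
    buf = bytearray(header[:ihl])
    struct.pack_into("!H", buf, 4, host_id)
    struct.pack_into("!H", buf, 10, 0)  # zero the checksum before summing
    struct.pack_into("!H", buf, 10, ipv4_checksum(bytes(buf)))
    return bytes(buf) + header[ihl:]
```

The sketch ignores fragmentation. Rewriting the ID of only some fragments of a datagram would break reassembly, which is one of the complications the draft raises in Section 4.1.2.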

-- Section 4.1.2 --

   Complications may arise if the packet is fragmented before reaching
   the device injecting the HOST_ID.  To appropriately handle those
   packets, the address-sharing function will need to maintain a lot of
   state.

"The packet" and "those packets" disagree in number, making this confused.  Perhaps you mean, "To appropriately handle the multiple packets created by fragmentation, ..." ?

   one can argue this coordinated NAT scenario is not a typical
   deployment scenario but still using IP-ID as a channel to convey a
   HOST_ID is ill-advised.

I can't parse this, and don't know what you're trying to say (though I do get that the end result is that this is ill-advised).  Perhaps you need some punctuation?

   The risk to experience session failures due to handling a new TCP
   Option is low as measured in [Options].
   [I-D.abdo-hostid-tcpopt-implementation] provides a detailed
   implementation and experimentation report of a HOST_ID TCP Option.
   [I-D.abdo-hostid-tcpopt-implementation] investigated in depth the
   impact of activation HOST_ID on the host, the address-sharing
   function, and the enforcement of policies at the server side.
   [I-D.abdo-hostid-tcpopt-implementation] reports a failure ratio of
   0.103% among top 100000 websites.

I strongly suggest that you re-word this paragraph.  Having three citations to the same document in three consecutive sentences is really awkward, and the whole thing is choppy and hard to follow.

-- Section 5 --

   o  "Success ratio" indicates the ratio of successful communications
      with remote servers when the HOST_ID is injected using a candidate
      solution.

Measured how?  So these are actually coded and people did experiments?  Are there raw numbers that we can look at?

Ah, I see that you kinda-sorta try to answer this below the table.  Please re-organize the text (or maybe just use a "see below" reference) so that it's clearer what this row of the table means and where it comes from.  (Mostly, I'm sensing that it's a "wild guess", rather than anything remotely real.)

    (7)  The solution is a theoretical construct.

In other words, this is just an idea that someone threw out, which isn't completely baked?  It would have been *really* nice to have said that up in Sections 4.1, 4.6, and 4.7, rather than only including it as a note in this table!

-- Section 7 --

You should refer to Section 3 in here.  Perhaps insert a new third paragraph that says something like, "For more discussion of privacy issues related to HOST_ID, see Section 3."
Benoît Claise Former IESG member
No Objection
No Objection (2013-04-11 for -07) Unknown
No objection to the publication of this document, but please engage in the discussion on my COMMENT.

From the abstract
   The host identifier must be unique to each host under the same
   shared IP address.

And later on: 
   Because HOST_ID is used by a remote server to sort out the packets by
   sending host, HOST_ID must be unique to each host under the same IP
   address.

Shouldn't the HOST_ID be unique across IPv4 and IPv6, so unique per host, as the name implies?
I have in mind a scenario such as Multipath TCP with two addresses, one IPv4 and one IPv6.

If you agree, then the following sentence is not correct:

   The volatility of the HOST_ID information is similar to that of the
   internal IP address: a distinct HOST_ID may be used by the address-
   sharing function when the host reboots or gets a new internal IP
   address.

If you disagree, I believe that the term HOST_ID is wrong. What you describe is actually a HOST-IPADDRESS-ID, or maybe a HOST-NAT-ID.
Gonzalo Camarillo Former IESG member
No Objection
No Objection (for -07) Unknown

                            
Jari Arkko Former IESG member
No Objection
No Objection (for -07) Unknown

                            
Joel Jaeggli Former IESG member
No Objection
No Objection (2013-04-10 for -07) Unknown
No objection once the current DISCUSS points are resolved.
Martin Stiemerling Former IESG member
(was Discuss) No Objection
No Objection (2013-04-23 for -09) Unknown
Thanks for addressing my DISCUSS. 

Two comments left:
The text in Section 2 looks lost in that place and would be better placed in Section 4.3 (TCP): 
   HOST_ID mechanisms need to be aware of E2E issues and avoid
   interfering with them. One example of such interference would be
   injecting or removing TCP options of transited packets; another such
   interference involves terminating and re-originating TCP connections
   not belonging to the transit device.

Please expand E2E to end-to-end.
Pete Resnick Former IESG member
No Objection
No Objection (2013-04-10 for -07) Unknown
4.4

You talk about HTTP here, which more or less has the same mechanism as SIP, but you don't really go into SMTP. SMTP can do similar things, but the mechanisms are going to be much different than HTTP or SIP. It's not even alluded to in section 5. I'd say either analyze SMTP properly, say something like, "Related mechanisms could be developed for other application-layer protocols, but the discussion in this document is limited to HTTP and similar protocols", or drop it from the discussion completely.

4.5.1

   The solution, referred to as Proxy Protocol [Proxy], does not require
   any application-specific knowledge.  The rationale behind this
   solution is to prepend each connection with a line reporting the
   characteristics of the other side's connection as shown in the
   example depicted in Figure 2.   The header line shown in this example
   is for a TCP over IPv4 connection received from 192.0.2.1:56324 and
   destined to 192.0.2.15:443.  The "PROXY" string is used to identify
   the Proxy Protocol while "\r\n" indicates CRLF.
   
You wouldn't know what the above was talking about unless you read the Proxy document. I suggest:

   The solution, referred to as Proxy Protocol [Proxy], does not require
   any application-specific knowledge.  The rationale behind this
   solution is to insert identification data directly into the
   application data stream prior to the actual protocol data being sent,
   regardless of the protocol. Every application protocol would begin
   with a textual string of "PROXY", followed by some textual
   identification data, ending with a CRLF, and only then would the
   application data be inserted. Figure 2 shows an example of a line of
   data used for this, in this case for a TCP over IPv4 connection
   received from 192.0.2.1:56324 and destined to 192.0.2.15:443.
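The following minimal Python sketch builds and parses the version 1 (text) Proxy Protocol line described above. It assumes the field order of the HAProxy PROXY protocol specification: protocol family, source address, destination address, source port, destination port. The helper names are hypothetical. The parser handles only the TCP4/TCP6 form, not "PROXY UNKNOWN".

```python
def build_proxy_v1(family: str, src_ip: str, dst_ip: str,
                   src_port: int, dst_port: int) -> bytes:
    """Return a PROXY v1 header line to be sent before any application data."""
    if family not in ("TCP4", "TCP6"):
        raise ValueError("this sketch supports only TCP4 and TCP6")
    line = f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n"
    return line.encode("ascii")


def parse_proxy_v1(stream: bytes) -> tuple:
    """Split the PROXY v1 line from the application data that follows it.

    Returns (info_dict, remaining_bytes)."""
    end = stream.find(b"\r\n")
    if not stream.startswith(b"PROXY ") or end == -1:
        raise ValueError("stream does not begin with a PROXY v1 line")
    fields = stream[:end].decode("ascii").split(" ")
    if len(fields) != 6 or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed or unsupported PROXY v1 line")
    _, family, src_ip, dst_ip, src_port, dst_port = fields
    info = {"family": family, "src_ip": src_ip, "dst_ip": dst_ip,
            "src_port": int(src_port), "dst_port": int(dst_port)}
    return info, stream[end + 2:]  # skip the CRLF terminator
```

For the connection above, `build_proxy_v1("TCP4", "192.0.2.1", "192.0.2.15", 56324, 443)` yields `b"PROXY TCP4 192.0.2.1 192.0.2.15 56324 443\r\n"`. The receiver strips this line before handing the rest of the stream to the application protocol.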
Richard Barnes Former IESG member
No Objection
No Objection (2013-04-09 for -06) Unknown
A few minor revisions:

In Section 1. Adding a comma to "application proxies or A+P" would help clarify that the two are not equivalent.

In Section 3.1. The sentence after "Interference between HOST_IDs" didn't really parse for me.  Suggested revision: "If an address sharing system can inject multiple types of HOST_ID value at different layers, then all injected HOST_ID values must reference the same underlying data.  For example, one might reference a full IP address, while another references the lower 16 bits."
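The suggested rule can be illustrated with a minimal Python sketch. It assumes a hypothetical system that injects two HOST_ID encodings derived from the same internal IPv4 address: the full 32-bit address, and its lower 16 bits. The function names are illustrative only. A receiver that sees both values can check whether they are consistent with one underlying address.

```python
import ipaddress


def host_id_full(internal_addr: str) -> int:
    """Hypothetical HOST_ID encoding 1: the full internal IPv4 address."""
    return int(ipaddress.IPv4Address(internal_addr))


def host_id_low16(internal_addr: str) -> int:
    """Hypothetical HOST_ID encoding 2: the lower 16 bits of that address."""
    return host_id_full(internal_addr) & 0xFFFF


def references_same_host(full_value: int, low16_value: int) -> bool:
    """True if both injected values are consistent with one internal address."""
    return (full_value & 0xFFFF) == low16_value
```

Pairing `host_id_full("10.0.1.5")` with `host_id_low16("10.0.1.5")` passes the check. Pairing it with `host_id_low16("10.0.1.6")` fails, which is the kind of interference the rule is meant to rule out.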

In Section 4.2.2.  Change "host-hint" to "HOST_ID".

In Section 4.3.2., Second bullet.  It might also be worth noting that including the HOST_ID in the ACK would interfere with TCP Fast Open.  If the server wanted to deliver different data based on HOST_ID, then it would have to wait for the ACK before transmitting.
<http://tools.ietf.org/html/draft-ietf-tcpm-fastopen>

In Section 4.6.1.  It could be helpful to reference some current BEHAVE-related work here:
<http://tools.ietf.org/html/draft-ietf-behave-lsn-requirements>
<http://tools.ietf.org/html/draft-donley-behave-deterministic-cgn>

In Section 4.8.2., next-to-last bullet.  It might be more helpful to reference the NAT-specific logging documents being developed in BEHAVE:
http://tools.ietf.org/html/draft-ietf-behave-ipfix-nat-logging
http://tools.ietf.org/html/draft-ietf-behave-syslog-nat-logging
Sean Turner Former IESG member
No Objection
No Objection (2013-04-11 for -07) Unknown
I support Martin's #2.
Stephen Farrell Former IESG member
No Objection
No Objection (2013-04-09 for -06) Unknown
- abstract: saying "is used" makes the reader think that you
will provide a preferred solution, but since you're not going
to, "could be used" would be better.

- abstract: Saying "a remote server" without qualification is
odd - do you mean any other host on the Internet and do you
really mean a server only? Maybe an "authorized remote host"
would be better?

- section 2: it might be clearer if you clarify that you still
mean a host when that host is one computer in a home behind a
DSL modem. I think that's implied, when you say that HOST_ID
is not designed to reveal the identity of a subscriber, but
it'd be good to be extra clear on that.

- section 3: shouldn't possible solutions say if they allow
for some control somewhere as to which remote hosts can or
cannot freely access the HOST_ID? In particular some of the
possible solutions seem to differ in how the user's ISP could
or could not control some of that on behalf of the user.

- section 3: similarly the solutions differ as to whether or
not middleboxes or just the remote hosts with which a user
wants to interact can track users and that's also a
consideration esp. if TLS were used.

- section 3: a HOST_ID can also expose location information
depending on how it's done. (And without any geopriv-style
protection.) I think that should get a mention here too and
also be considered by HOST_ID specification documents.  I
think that also warrants a mention in 3.1.

- section 3: Shouldn't the honest end-user also be given some
element of control too? I think adding a design consideration
that different end-user preferences should be supported would
be good.

- section 3.1: I think all the above points on section 3 would
be fine additions to 3.1 (of course:-)

- section 5: I think "encrypted traffic" here means encryption
via TLS or above, is that right? What about IPsec? Anyway a
note on that would be good.

- section 5: the "success ratio" figures don't seem to
correlate that well with the earlier "analysis" sections which
surprised me. (just a comment)

- the end: ...is very abrupt:-) I guess the WG just couldn't
reach even a rough consensus on something to recommend? If so,
it might be worth saying a bit about that if it's not too
controversial as that might help future readers know what to
do with this document.
Stewart Bryant Former IESG member
No Objection
No Objection (for -06) Unknown

                            
Ted Lemon Former IESG member
No Objection
No Objection (2013-04-10 for -07) Unknown
The title of this document suggests that it is scoped to all address-sharing models, but in fact the text about host identifiers only accurately relates to DS-Lite. I think the document is relevant to all address-sharing models, but it really reads as if it were written specifically with DS-Lite in mind.

I'm just throwing this comment out to see what reaction it gets; I'm not convinced that there's an action item here.   If the authors really do intend to address all address-sharing models, some tweaking of the wording in section 2 would help, and maybe a mention in section 3 that in a MAP or lw4over6 scenario, port set allocation is mandatory and host identities don't need to be revealed at all—only the HG identity is actually revealed.