CAPPORT Architecture
draft-ietf-capport-architecture-08

Summary: Has 2 DISCUSSes. Has enough positions to pass once DISCUSS positions are resolved.

Martin Duke (was No Objection) Discuss

Discuss (2020-06-08)
Sec 2.3  says:
At minimum, the API MUST provide: (1) the state of captivity and (2) a URI for the Captive Portal Server.

But in section 5 of capport-api, user-portal-url is an optional field.

Both a capport-api author and a WG chair agreed that the architecture doc should be fixed, so I'm moving the DISCUSS here.
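
The mismatch can be made concrete with a minimal sketch of a client that enforces the architecture's wording literally. The field name user-portal-url comes from capport-api; the "captive" field name, the sample payload, and the helper check_minimum_fields are assumptions made for illustration only.

```python
import json
from typing import List


def check_minimum_fields(body: str) -> List[str]:
    """Return the architecture-mandated items missing from an API response."""
    data = json.loads(body)
    missing = []
    # (1) the state of captivity ("captive" is an assumed field name)
    if "captive" not in data:
        missing.append("captive")
    # (2) a URI for the Captive Portal Server
    if "user-portal-url" not in data:
        missing.append("user-portal-url")
    return missing


# A response that omits the optional field would violate the Sec 2.3 MUST.
print(check_minimum_fields('{"captive": false}'))
```

Under this reading, a response that capport-api permits is rejected by a client that follows the architecture text, which is the inconsistency to resolve.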
Comment (2020-06-08)
I found the terminology around “Captive Portal API server” and “Captive Portal Server” to be a little confusing, as these are similar terms. The latter also doesn’t get its own discussion in Section 2 and is confusingly called the “web portal server” in Figure 1.

After Figure 1, this seems to be consistently called the “web portal” (sec 2.6 and 4). In the API doc it is called a "user portal." It would be great to unify the terminology across the documents as a whole.

Benjamin Kaduk Discuss

Discuss (2020-06-08)
(1) and (2) should be easy to fix; (3) may well be "fixed" by telling me
I'm too naive :)

(1) Given that section 1 describes other options, the abstract should not
limit to just DHCP and RA as options for provisioning the API URL.

(2) Section 4.1 says that:

   5.  The Captive Portal API server indicates to the Enforcement Device
       that the User Equipment is allowed to access the external
       network.

but I believe this should be the "Captive Portal Server" (or, as the
previous point has it, the "web portal").

(3) Probably a "discuss discuss", but ... in Section 1 we have:

   *  Solutions SHOULD NOT require the forging of responses from DNS or
      HTTP servers, or any other protocol.  In particular, solutions
      SHOULD NOT require man-in-the-middle proxy of TLS traffic.

I'd like to understand the motivation for this one a little better.
Naively, it seems like we could get away with "MUST NOT require" while
still allowing it to be done.  Am I missing something obvious?
Comment (2020-06-08)
I'd like to see some more discussion of which signals are authenticated
and how, and what kind of authorization checks are possible.  In
well-run networks DHCP and RA signals should be relatively trustworthy,
but clients don't always have a good indicator for whether a given
network falls into that category.  Are there (other) mechanisms that can
be used to give trust in the authenticity of a given Captive Portal API
URI and that that API is authorized to provide unconstrained access
for the network in question?  We require TLS for accessing the API
server, but (as I note inline) there are more details that can be given
about this TLS usage.  What can be done to authenticate and authorize
the Captive Portal Server?  Most importantly (and most appropriately for
an architecture document), which of these properties are strictly
required vs. merely optional?  These are not Discuss-level points
because an architecture does not strictly-speaking need to specify all
of them, but having some indication of how we plan to achieve them would
give greater confidence that this architecture will be a useful one.

I'm happy to see the response to the genart reviewer's comment regarding
"a" vs. "the" capport architecture; thanks!

Abstract

   This document describes a CAPPORT architecture.  DHCP or Router
   Advertisements, an optional signaling protocol, and an HTTP API are
   used to provide the solution.  The role of Provisioning Domains

nit: there's perhaps a bit of a lack of parallelism in the list
structure, where we talk about specific mechanisms for provisioning
without describing the more abstract concept of provisioning, and list
that alongside an abstract mention of "a signaling protocol" and the
both-abstract-and-concrete "HTTP API".

Section 1

   Implementations generally require a web server, some method to allow/
   block traffic, and some method to alert the user.  Common methods of

nit: I'd suggest clarifying that this is "implementations of captive
portals" (or is it "captive networks"?).

   alerting the user involve modifying HTTP or DNS traffic.

nit: perhaps "at present" or "prior to this work"?  If I understand
correctly one of the goals of this work is to shift the balance of
captive portals away from these practices (while acknowledging that
fully eliminating them is not feasible in the near future).

   *  Solutions MAY allow a device to be alerted that it is in a captive
      network when attempting to use any application on the network.

I'm also not sure I understand this one, especially in light of the
following (paraphrased) "SHOULD allow learning of captivity before
application attempts to use the network".  What's the alternative to
"MAY allow", not-allowing such detection at all?

   *  The architecture MUST provide a path of incremental migration,
      acknowledging a huge variety of portals and end-user device
      implementations and software versions.

nit: "preexisting" or similar would go a long way here.

   *  Network provisioning protocols provide end-user devices with a

side note: using the word "provisioning" to describe things like DHCP
and RA feels odd to me, presumably due to my background and what I
expect provisioning to be.  I can see why it makes sense to use the term
for this purpose, though.  Perhaps an additional adjective could help
clarify what is meant, though I don't have a suggestion at hand.

      for this purpose are available in [RFC7710bis].  Other protocols
      (such as RADIUS), Provisioning Domains [I-D.pfister-capport-pvd],
      or static configuration may also be used.  A device MAY query this

side note: personally, I'd expand to "may also be used to convey this
API URI", though it's probably not required for clarity.

      The device MAY take immediate action to satisfy the portal
      (according to its configuration/policy).

side note: it's not entirely clear to me that we need a normative MAY
for this.

Section 2.1

   have Internet access).  The User Equipment communication is typically
   restricted by the Enforcement Device, described in Section 2.4, until
   site-specific requirements have been met.

It seems like these "site-specific requirements" must be the "Captive
Portal Conditions" that we just defined.

   *  SHOULD have a mechanism for notifying the user of the Captive
      Portal

It is pretty important that this mechanism be non-spoofable by, e.g.,
untrusted websites.  I think we should mention something about
"non-spoofable" here.

   *  MAY prevent applications from using networks that do not grant
      full network access.  E.g., a device connected to a mobile network
      may be connecting to a captive WiFi network; the operating system
      MAY avoid updating the default route until network access
      restrictions have been lifted (excepting access to the Captive

nit: maybe say in which direction the update would go and/or something
about why the move to wifi is desirable?

   None of the above requirements are mandatory because (a) we do not
   wish to say users or devices must seek full access to the captive
   network, (b) the requirements may be fulfilled by manually visiting
   the captive portal web application, and (c) legacy devices must
   continue to be supported.

side note: in my opinion, it's possible to support legacy devices in
practice without baking their limitations into the spec.

   If User Equipment supports the Captive Portal API, it MUST validate
   the API server's TLS certificate (see [RFC2818]).  An Enforcement

We should probably cite RFC 6125 here and say something about how the UE
gets a name to validate the server's certificate against (and what name
type to use).
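
One possible answer, sketched under assumptions: the UE takes the host component of the provisioned API URI as the reference identifier and validates it as a DNS name. Python's default ssl context already enforces chain validation and hostname matching; the URI and the helper name tls_settings_for_api are hypothetical.

```python
import ssl
from urllib.parse import urlsplit


def tls_settings_for_api(api_uri: str):
    """Derive the name to validate and a verifying TLS context from the API URI."""
    parts = urlsplit(api_uri)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("API URI must be an https URI with a host")
    # The default context requires a valid chain and checks the hostname.
    ctx = ssl.create_default_context()
    return parts.hostname, ctx


host, ctx = tls_settings_for_api("https://api.example.net/capport")
print(host, ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```

The returned host would be passed as server_hostname when wrapping the socket, so the name checked is the one the network provisioned.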

   [I-D.ietf-capport-api] for more information.  If certificate
   validation fails, User Equipment MUST NOT proceed with any of the
   behavior described above.

I'm not sure which behavior the "behavior described above" is.
"[accessing...] OCSP responders, CRLs, and NTP servers" doesn't seem
quite right since that's *how* you determine that certificate validation
fails, but the bits further up about "navigate [to] the Captive Portal
user interface" do not seem to clearly call out a single behavior or set
of behaviors by the UE.

Section 2.2.2

   Although still a work in progress, [I-D.pfister-capport-pvd] proposes
   a mechanism for User Equipment to be provided with PvD Bootstrap
   Information containing the URI for the JSON-based API described in
   Section 2.3.

I don't think "JSON-based" is supported by the text of § 2.3 (and isn't
really appropriate for an architecture doc in most cases, anyway).

Section 2.3

   The purpose of a Captive Portal API is to permit a query of Captive
   Portal state without interrupting the user.  This API thereby removes
   the need for User Equipment to perform clear-text "canary" HTTP
   queries to check for response tampering.

nit: probably don't need to be specific about HTTP, here.

   At minimum, the API MUST provide: (1) the state of captivity and (2)
   a URI for the Captive Portal Server.

Is there anything useful to say about the URI scheme for the captive
portal server URI?  I guess I could probably (grudgingly) come up with a
case where http-not-s would be tolerable, but given that we admit the
possibility of "payment" as a captive portal condition, I don't want us
to encourage sending payment or other sensitive information over schemes
inappropriate for such information.

   A caller to the API needs to be presented with evidence that the
   content it is receiving is for a version of the API that it supports.

What about evidence that the content it is receiving is intended to be
used with, and authorized to speak for, the network it is joining?

   When User Equipment receives Captive Portal Signals, the User
   Equipment MAY query the API to check the state.  The User Equipment

nit: we seem to use "the state of its captivity" most places.

   The API MUST use TLS to ensure server authentication.  The
   implementation of the API MUST ensure both confidentiality and
   integrity of any information provided by or required by it.

It's a little weird to split the TLS requirements between here and
Section 2.1, though I guess if we're splitting things by role it's
probably unavoidable.  (I made my RFC 6125 comment in Section 2.1 and it
probably doesn't need to appear in both places.)

Section 2.4

   *  May signal User Equipment using the Captive Portal Signaling
      protocol if certain traffic is blocked.

nit: I think that "optionally signals" might be a better fit for the
list structure as used in the other bullet points.

Section 2.5

   When User Equipment first connects to a network, or when there are
   changes in status, the Enforcement Device could generate a signal
   toward the User Equipment.  This signal indicates that the User
   Equipment might need to contact the API Server to receive updated
   information.  For instance, this signal might be generated when the
   end of a session is imminent, or when network access was denied.

Would this signal also be used when the UE has successfully met the
Captive Portal Conditions?

Section 2.6

   *  The User Equipment queries the API to learn of its state of
      captivity.  If captive, the User Equipment presents the portal
      user interface from the Web Portal Server to the user.

[we previously discussed this UE behavior as optional.  I don't mind
having the text be descriptive like this, since it's describing the
diagram, and the diagram is not binding on all UEs, but it seemed worth
noting just in case.]

Section 3.1

   An Identifier is a characteristic of the User Equipment used by the
   components of a Captive Portal to uniquely determine which specific
   User Equipment is interacting with them.  An Identifier MAY be a

Do we want to say anything about the scope within which the uniqueness
must hold?  ("No" is probably fine.)

Section 3.2.1

   Each instance of User Equipment interacting with the Captive Network
   MUST be given an identifier that is unique among User Equipment
   interacting at that time.

side note: "MUST be given" gets a knee-jerk "by whom?" response from me.
It's probably okay for this document to not specify, though, as it may
depend on the nature of the Captive Network.

   Over time, the User Equipment assigned to an identifier value MAY
   change.  Allowing the identified device to change over time ensures
   that the space of possible identifying values need not be overly
   large.

Is the identifier assigned to a given UE on the same network expected to
be able to change as well?  This may have some privacy considerations...
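
The two quoted properties (unique among UEs active at the same time, reusable over time) can be illustrated with a small sketch. The class IdentifierPool, the address values, and the UE labels are all hypothetical; the draft does not say who assigns identifiers or how.

```python
class IdentifierPool:
    """Hands out identifiers that are unique among currently active UEs."""

    def __init__(self, values):
        self._free = list(values)
        self._active = {}  # identifier -> UE label

    def attach(self, ue):
        if not self._free:
            raise RuntimeError("identifier space exhausted")
        ident = self._free.pop(0)
        self._active[ident] = ue
        return ident

    def detach(self, ident):
        # Releasing an identifier makes it available to a later UE.
        del self._active[ident]
        self._free.append(ident)


pool = IdentifierPool(["192.0.2.2", "192.0.2.3"])
a = pool.attach("ue-a")
b = pool.attach("ue-b")
pool.detach(a)
c = pool.attach("ue-c")  # reuses the value previously held by ue-a
```

Reuse keeps the value space small, but it also means that an identifier observed at two different times need not denote the same device.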

Section 3.2.2

   are active at the same time.  This property is particularly important
   when the User Equipment is extended externally to devices such as
   billing systems, or where the identity of the User Equipment could
   imply liability.

nit(?): is it the UE that is extended externally or the identifier
thereof?

Section 3.2.4

   In some situations, the User Equipment may have multiple IP
   addresses, while still satisfying all of the recommended properties.

nit: as written, "while still satisfying all of the recommended
properties" is describing the UE, but the context of Section 3.4
suggests that we want to be talking about the recommended properties for
identifiers.

Section 3.5

   Accessing the API MAY depend on contextual information.  However, the
   URIs provided in the API SHOULD be unique to the UE and not dependent
   on contextual information to function correctly.

Should the per-UE APIs and/or the mapping between UE and per-UE API be
unguessable?  (Do we want to reference Capability URLs
[https://www.w3.org/TR/capability-urls/]?)
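
As a rough sketch of what an unguessable per-UE URI could look like, assuming the API server mints a random token for each UE and keeps the mapping server-side: the base URI, the mapping table, and the function mint_api_uri are hypothetical.

```python
import secrets


def mint_api_uri(base: str, table: dict, ue_identifier: str) -> str:
    """Create a per-UE API URI whose path component cannot be guessed."""
    token = secrets.token_urlsafe(32)  # 32 bytes from the OS CSPRNG
    table[token] = ue_identifier       # mapping stays on the server
    return f"{base}/{token}"


table = {}
uri = mint_api_uri("https://api.example.net/capport", table, "192.0.2.2")
```

With this design the URI itself does not reveal the UE's identifier, and possession of the URI acts as the capability.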

Section 4

I might consider explicitly saying "non-normative" somewhere in here.

Section 4.1

   4.  If necessary, the User navigates the web portal to gain access to
       the external network.

nit: "navigates to"

Section 4.2

   3.  The User Equipment's UI indicates that the length of time left
       for its access has fallen below a threshold

   4.  The User Equipment visits the API again to validate the expiry
       time

side note: I feel like there's implicitly some User action in here,
though I don't know that we need to actually say anything about it.
(Otherwise we wouldn't have the UI indicating things.)
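
One way steps 3 and 4 could work without any signal is for the UE to keep local state from its last API query. The sketch below assumes the API reports a remaining session time in seconds; the class SessionTimer and the 300-second threshold are illustrative assumptions, not taken from the draft.

```python
class SessionTimer:
    """Tracks session expiry from the last API response (assumed in seconds)."""

    def __init__(self, seconds_remaining, queried_at, threshold=300.0):
        self.expires_at = queried_at + seconds_remaining
        self.threshold = threshold

    def should_requery(self, now):
        # True once the time left has fallen to the threshold or below.
        return self.expires_at - now <= self.threshold


timer = SessionTimer(seconds_remaining=3600, queried_at=0.0)
print(timer.should_requery(0.0), timer.should_requery(3300.0))
```

When should_requery becomes true the UE would notify the user and visit the API again to validate the expiry time.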

Section 4.3

   Whenever a new Portal URI is received by end User Equipment, it
   SHOULD discard the old URI and use the new one for future requests to
   the API.

What kind of validation/authorization checks need to be applied to the
new Portal URI?

(nit: we probably should check the terminology in this section; the
Section 1.2 lexicon would call this information the "Captive Portal API
Server URI" and not a "Portal URI".)

Section 7

This mechanism rather inherently requires having multiple entities track
the UE's identity (and, thus, likely be tracking a proxy for the user's
identity).  It seems appropriate to include some discussion of the
privacy considerations of this tracking, and whether/what kind of
anonymity support is appropriate!

Section 7.1

   Given that a user chooses to visit a Captive Portal URI, the URI
   location SHOULD be securely provided to the user's device.  E.g., the
   DHCPv6 AUTH option can sign this information.

I'm not sure that I understand the intent behind the "Given that"
construction here.  Is it trying to emphasize user choice, and thus the
need for informed choice?

Section 7.2

[In the vein of my previous remarks, there are many ways to use TLS, and
usually we provide more details on how we expect TLS to be used.]

Section 7.3

   The API MUST ensure the integrity of this information, as well as its
   confidentiality.

Who/what is the attacker(s) that we need to preserve confidentiality from?

Section 7.4

   *  Accesses to the API Server are rate limited, limiting the impact
      of a repeated attack.

One might consider a flooding attack that tries to get the UE to use all
its (rate-limited) connections to get some information that is not the
information that it's most important for the UE to have.  If there's
only a single operation that can be performed at the API Server (which I
believe is the intent?) there is no such attack, but it may be worth
mentioning that there is no such attack.
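
For reference, a rate limit of this kind is often implemented as a token bucket. The sketch below is one generic form with an injectable clock; the class name, rate, and burst values are assumptions and say nothing about how a given UE actually limits its queries.

```python
class TokenBucket:
    """Allows up to `burst` queries at once, refilled at `rate` per second."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=1.0, burst=2)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

With a single API operation every permitted query fetches the same state, so spent tokens delay that fetch but cannot divert it to less important information.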

Section 8.1

Interestingly, none of the places where we reference 7710bis have
surrounding text that clearly incurs a normative dependency.

Appendix A

We explain the use of the "canary" term here, but have already used it
twice (with no forward-reference) in the body of the document.

   Another test that can be performed is a DNS lookup to a known address
   with an expected answer.  If the answer differs from the expected
   answer, the equipment detects that a captive portal is present.  DNS
   queries over TCP or HTTPS are less likely to be modified than DNS
   queries over UDP due to the complexity of implementation.

Is the reader supposed to draw the conclusion that DoTCP/DoH provide
less-reliable captive-portal detection than Do53?  (I assume "TCP" is
not a typo for "TLS", here, though am unsure enough to want to check.)

   Malicious or misconfigured networks with a captive portal present may
   not intercept these requests and choose to pass them through or
   decide to impersonate, leading to the device having a false negative.

nit: I suggest "these 'canary' requests" to clarify which requests we're
talking about.

Barry Leiba Yes

Deborah Brungard No Objection

Roman Danyliw No Objection

Comment (2020-06-09)
I support Martin and Ben's DISCUSS positions.

Thanks for laying out the architecture to explain the subsequent protocol drafts.  A few areas of feedback:

** Section 2.1. Per “At this time we consider …”, to what is “at this time” referring (maybe this is referring to the WG scope)?  This might not age well as currently framed.

** Section 2.2.  The architecture doesn’t explicitly describe which component is responsible for provisioning the user equipment sufficiently so it can access the IP network anywhere.  I would have expected it to be the Provisioning Service.  Sections 2.1, 2.3 and 2.4 describe the role of these components in the architecture and their requirements.  Section 2.2 does not.  Instead it describes candidate technologies.  It would be helpful to say so explicitly.

** Section 2.3.  Perhaps this is too pedantic, but should the obvious be explicitly called out: the user equipment should only be able to check its own captivity status?  This would be some explicit notion of authorization.

** Section 2.3.  Per “A caller to the API needs to be presented with evidence that the content it is receiving is for a version of the API that it supports.”, is the caller the User Equipment, the web browser or the end user – does that distinction matter – does each layer need anything different?

** Section 3.2.1.  Per “Each instance of User Equipment interacting with the Captive Network MUST be given an identifier that is unique among User Equipment interacting at that time.”, is “unique among user equipment interacting at that time” the same as saying “unique among the identifiers currently in use in the Captive Network”?  It might be useful to frame this guidance within the scope of the previous definitions.

** Section 3.2.2.  The acceptable work factor for “hard” still isn’t clear here but I understand the difficulty of pinning it down while remaining flexible.

** Section 4.  Does this section provide normative guidance?  The introductory sentence suggests no by saying that this section describes “possible workflow[s]”.  However, Section 4.3 uses a normative SHOULD.

** Section 4.2.  Between step #2 and #3, did some kind of signaling happen to indicate that expiration is imminent, or did the UE keep state of some kind?  Keeping state isn’t mentioned as a UE requirement in Section 2.1.  Section 2.5. notes that a “signal might be generated when the end of a session is imminent”.

** Section 7.  This section would benefit from a discussion of the privacy impacts of the implicit identifiers embedded into the architecture (e.g., re-identification)

** Section 7.1. Per “If a user decides to incorrectly trust an attacking network ….”, you have an on-path attacker so additional risks include traffic redirection to arbitrary destinations to serve malicious payloads; traffic analysis and loss of confidentiality; inline traffic modification; etc.

** Section 7.2.  Per “The solution described here assumes that when the User Equipment needs to trust the API …”, why is this conditional?  Doesn’t the UE have to trust the API server?

** Section 7.3.  In addition to integrity and confidentiality, is there an authenticity requirement?  I ask because Section 2.1. noted that the UE “SHOULD [be] allow[ed] access to any services that User Equipment could need to contact to perform certificate validation.”

Murray Kucherawy No Objection

Comment (2020-06-06)
Pretty straightforward.  Again, nice work.

Some nits:

Although I see why you did it, the capitalization of the bullet list in Section 3.2 appears peculiar.

Also curious is that "User Equipment" is defined in Section 2.1, but not shortened to "UE" anywhere other than in Section 3.5.

In Section 4.1, what's an "RA"?

Warren Kumari No Objection

Alvaro Retana No Objection

Comment (2020-06-09)
I have some minor comments:


(1) Please expand CAPPORT.


(2) §1: s/This document standardizes an architecture/This document describes an architecture/   This is not a standards track document.


(3) §1: "MAY allow a device to be alerted"   Other parts of the document (even in the same section) talk about "devices can be notified" or "informs an end-user", while "alert" is not mentioned anywhere else.  Given that "alert" has the normative attachment, it would be nice to use consistent language.


(4) §2.1: "E.g....MAY avoid updating..."   s/MAY/may   This is an example, not a normative statement.


(5) §3.1: "An Identifier MAY be a field...Or, an Identifier MAY be an ephemeral property..."   s/MAY/may  These seem to be statements and not normative statements.

Éric Vyncke No Objection

Comment (2020-06-08)
Thank you for the work put into this document. The document is easy to read. I also appreciate the fact that "devices without user interfaces" are not ignored by this document.

Please find below a couple of non-blocking COMMENTs. A response/comment for those COMMENTs will be read with interest.

I hope that this helps to improve the document,

Regards,

-éric

== COMMENTS ==

Is there a reason why the words "captive portal" do not appear in the abstract? This would assist normal human beings (outside of the WG) to find the document.

I found no text about what happens to the traffic inside the captive network. Is it allowed even when still in captive mode?

-- Section 1.2 --
Even if the document supports "devices without user interfaces", I wonder why the I-D uses "User Equipment" rather than "Client Equipment" (which is also more aligned with "Server"). Nothing dramatic, just curious about the reason.

-- Section 2.1 --
"At this time we consider only devices with web browsers" while the previous text was about "devices without user interfaces". In the end, is this document for devices with or without a human interface?

-- Section 2.6 --
While the components are described as being optionally collocated, what about resiliency? I.e., having two different instances of one component.

-- Section 3.4.2 --
While I appreciate that the section contains text about multiple IPv6 addresses, I suggest mentioning the dual-stack use case explicitly.

-- Section 3.4 --
I was expecting to see the MAC address also used as an identifier. Is there any reason why it is not mentioned? If so, may I suggest documenting the absence of a MAC address section in the examples?

Robert Wilton No Objection

Comment (2020-06-11)
I found this document easy to read, but have a few comments.

I support the 3rd bullet of Ben's discuss.

I was surprised by the diagram in section 2.6, since it seems to imply that the Provisioning Service kicks everything off, but I would have expected the User Equipment to initiate the flow, which is articulated in the first step of section 4.1.  Hence, I think that the diagram could be clearer if it also showed the initial request from the client (as per the first step in 4.1).

Finally, I note that this document makes no mention of OAM considerations.  Having some text covering these aspects would probably be beneficial.

Erik Kline Recuse

Alissa Cooper No Record

Martin Vigoureux No Record

Magnus Westerlund No Record