NFS Version 4.1 Update for Multi-Server Namespace

Summary: Needs a YES. Has a DISCUSS. Needs 9 more YES or NO OBJECTION positions to pass.

Benjamin Kaduk Discuss

Discuss (2019-03-05)
I think this is a useful document to clarify the proper operation of NFS
4.1, and to clarify the distinction between trunking address changes and
migration, and intend to change my ballot to Yes once the following
issues are resolved.

The Security Considerations state or imply in multiple places that the
use of RPCSEC_GSS to access a designated server can be used to verify
the target name and address resolution, but this only holds when the
GSSAPI (and Kerberos) implementation refrains from using insecure DNS to
canonicalize the target principal hostname.  (Several major ones do not,
by default.)  It is irresponsible to not note this risk here.

Specifically, the following text from Section 1.3 of RFC 4120 is honored
more in the breach:

   Implementations of Kerberos and protocols based on Kerberos MUST NOT
   use insecure DNS queries to canonicalize the hostname components of
   the service principal names (i.e., they MUST NOT use insecure DNS
   queries to map one name to another to determine the host part of the
   principal name with which one is to communicate).

We are deleting Section 11.7 of 5661, but it talks about encoding
"opaque" values conveyed from server to server via client in big-endian
order, and don't seem to cover that in this document.  Is that problematic?

I also have a few other issues/questions that don't quite rise to
DISCUSS level, but are still important to address; I tagged those with
"IMPORTANT" in the comment section, so they don't get drowned in the
Comment (2019-03-05)
In general, my remarks from mv0-trunking-update about clarity of
references to "the current document" will apply to this document as
well.  But I mention this only so that the authors/AD can note it in
their interactions with the RFC Editor; I don't think we need to discuss
it at all at this point in the process.

There's some stuff in here that, taken literally, is removing
functionality and/or changing assumptions that a client could make, in a
way that is sufficiently incompatible with NFS 4.1 that it should
arguably be a manner for a new minor version of the protocol.  But, I'm
inclined to agree with the assessment of the WG that these are
essentially giant errata fixes and are being made to reflect the intent
of the protocol, remedying internal inconsistencies in 5661.

Section 2

Use the RFC 8174 boilerplate please.  (Sorry, I know this is the Nth
time you're hearing it.)

Section 3.1

   o  Each NFSv4 server is assumed to have a set of IP addresses to
      which NFSv4 requests may be sent by clients.  These are referred

nit: do we want to limit ourselves to only *IP* addresses?  I don't
remember to what extent RDMA connections do/don't use IP.

   o  Two network addresses connected to the same server such that those
      addresses can be used to support a single common session are
      referred to as session-trunkable.  Note that two addresses may be
      server-trunkable without being session-trunkable and that when two
      connections of different connection types are made to the same
      network address and are based on a single file system location
      entry they are always session-trunkable, independent of the
      connection type, as specified by [RFC5661], since their derivation
      from the same file system location entry together with the
      identity of their network addresses assures that both connections
      are to the same server and will return server-owner information
      allowing session trunking to be used.

We may need to hear from some of my IESG colleagues, but I'm not sure
there's anything in principle that prevents a load-balancer or similar
device that presents a single public-facing IP address but dispatches to
different backends based on TCP/UDP port (which could break this
assumption of session-trunkability).

Section 3.2

   o  Reorganization made necessary by the fact that two network access
      paths to the same file system instance needs to be distinguished
      clearly from two different replicas since the former share locking
      state and can share session state.

nit: there's maybe a singular/plural mismatch here, but I think the
clarity is best if we say "the case of two network access paths" and
"the case of two different replicas".

   o  The need for a clear statement regarding the desirability of
      transparent transfer of state together with a recommendation that
      either that or a single-fs grace period be provided.

nit: maybe expand that the transfer of state is across replicas.

   o  A clarification of the fs_locations_info attribute to specify
      which portions of the information provided apply to a specific
      network access path and which to the replica which that path is
      used to access.

nit: s/which to/to which/

Section 4.2

   o  In some cases, a server will have a namespace more extensive than
      its local namespace, by using features associated with attributes
      that provide file system location information.  These features,
      which allow construction of a multi-server namespace are all

nit: comma after "namespace"

   o  File system location elements are derived from location entries
      location element.  File system location entries that contain a
      host name, are resolved using DNS, and may result in one or more

nit: no comma after "host name"

Section 4.3

   NFSv4.1 contains RECOMMENDED attributes that provide information
   about how (i.e. at what network address and namespace position) a

nit: comma after "i.e.".

Section 4.4

We don't seem to talk about Section 4.5.3 at all, here.

Section 4.5

   Where a file system was previously absent, specification of file

nit: "previously absent" might be a strange wording; "is absent without
a transition to absent" is perhaps the intended scenario (though it's
also awkward wording).

Section 4.5.2

   o  The client may fetch the file system location attribute for the
      file system.  This will provide either the name of the server
      (which can be turned into a set of network addresses using DNS),
      or a set of server-trunkable location entries.  [...]

If the set of addresses in the server response are not all guaranteed to
be server-trunkable (which is my understanding, since replication
addresses for different instsances can also appear), then this text
should probably acknowledge that somehow.

Section 4.5.3

   Fs_locations_info includes a flag, FSLI4TF_RDMA, which, when set
   indicates that RPC-over-RDMA support is available using the specified
   location entry, by "stepping up" an existing TCP connection to
   include support for RDMA operation.  This flag makes it convenient
   for a client wishing to use RDMA to establish a TCP connection and
   then convert to use of RDMA.  [...]

nit: ambiguous parsing "use RDMA to establish a TCP connection", perhaps
s/ to/, as it can/

Section 4.5.5

   Although a single successor location is typical, multiple locations
   may be provided.  When multiple locations are provided, the client
   will typically use the first one provided.  If that is inaccessible
   for some reason, later ones can be used.  In such cases the client
   might consider that the transition to the new replica as a migration
   event, even though some of the servers involved might not be aware of
   the use of the server which was inaccessible.  In such a case, a
   client might lose access to locking state as a result of the access

Just to double-check: "such cases" are only when the first location is

Section 4.5.6

(soapbox) There already is an example of a global filesystem namespace,
in the form of AFS.  I trust the WG to make the right decision as to
whether an informational reference is appropriate, though, since I have
a conflict of interest.

   However, there are facilities provided that allow different clients
   to be directed to different sets of data, to enable adaptation to
   such client characteristics as CPU architecture.  These facilities
   are described in Section 11.10.3 of [RFC5661] and in Section 12.2.3
   of the current document.

IMPORTANT: There is no fundamental restriction of this technology to
that use, so some hedging language like "such as to enable adaptation"
may be in order.

Section 4.5.7

   o  Additions to the list of network addresses for the current file
      system instance need not be acted on promptly.  However the client
      can choose to use the new address whenever it needs to switch
      access to a new replica.

Is this "replica" correct, or are we talking about the case of switching
to a different endpoint for the current instance?

Section 5

   o  The additional Section 9 in the current document discusses the
      case in which a shift to a different replica is made and state is
      transferred to allow the client the ability to have continues

nit: "continued"

      access to the accumulated locking state on the new server.

nit: maybe a comma before "on the new server", and/or s/the accumulated/its

   o  Sections 11.7 and 11.7.1 of [RFC5661] are to be replaced by
      Sections 8 8.1 respectively of the current document These sections
      would appear as Section 11.9 and 11.9.1 in an eventual
      consolidated document.

nits: "8 and 8.1, respectively,", and full stop before "These".

   o  Section 11.77 of [RFC5661] is to be replaced by Section 8.9.

nit: "11.7.7"

Section 7

   o  When there are no potential replacement addresses in use but there
      are valid addresses session-trunkable with the one whose use is to
      be discontinued, the client can use BIND_CONN_TO_SESSION to access
      the existing session using the new address.  Although the target
      session will generally be accessible, there may be cases in which
      that session in no longer accessible, in which case a new session
      can be created to provide the client continued access to the
      existing instance.

IMPORTANT: When this new session is created, do I still get to use "the
same filehandles, stateids, and client IDs" as before and have
continuity of lock state?  If not, the intro paragraph needs to have an
appropriate disclaimer on that list.

Section 8

IMPORTANT: I think (but am not 100% sure) that some of the deletions
from Section 11.7.2 and subsections are nominally behavior changes by
their removal, such as preserving the fsid in both locations or fileid
values not changing across the transition.  Can someone help point me to
the right place for all of these bullet points?

Section 8.2

[There appears to be no difference between this text and Section 11.7.3
of RFC 5661.]

Section 8.5

[There appears to be no difference between this text and Section 11.7.6
of RFC 5661.]

Section 10.1

   Note that the specified actions only need to be taken if they are not
   already going on.  For example, when NFS4ERR_MOVED is received when
   accessing a file system for which transition recovery already going
   on, the client merely waits for that recovery to be completed while
   the receipt of SEQ4_STATUS_LEASE_MOVED indication only needs to
   initiate migration discovery for a server if it is not going on for
   that server.

nit: "it" is perhaps unclear; I suggest "if such discovery is not
already underway".

Section 10.3

   1.  Performing an EXCHANGE_ID directed at the location address.  This
       operation is used to register the client-owner with the server,

I'm not seeing "client-owner" used as a term much.  Do we mean a "client
id string" in the terminology introduced in Section 3.1? (Of course, we
do have client_owner and eia_clientowner and such as other options...)

Section 10.4

   In some situations, it is possible for a BIND_CONN_TO_SESSION to
   succeed without session migration having occurred.  If state merger
   has taken place then the associated client ID may have already had a
   set of existing sessions, with it being possible that the sessionid
   of a given session is the same as one that might have been migrated.

Is this a situation that the client can detect and act on?

Section 10.5

   Access to appropriate locking state should need no actions beyond
   access to the session.  [...]

I understand that this is an 8174 lowercase "should", but I think that
we would benefit from some clarity as to this being the expected state
in normal operation but that some exceptional circumstances (examples?)
could cause it to not be the case, and as such the following checks
are not just pro forma.

Section 11

   In the event of file system migration, when the client connects to
   the destination server, it needs to be able to provide the client
   continued to access the files it had open on the source server.
   There are two ways to provide this:

nit: I think the "it" in "it needs to be able to provide" formally binds
to the client, as the subject of the previous clause.  I think we mean
the destination server, though.

      These features are discussed separately in Sections 11.2 and 11.3,
      which discuss Transparent State Migration and session migration

nit: this text is indented as if it is part of the second bullet point,
but it seems to be general discussion that applies to both bullet

Section 11.2

   the file system being migrated.  In addition to client id string and
   verifier, the source server needs to provide, for each stateid:

IMPORTANT: There seem to be security considerations relating to trusting
the client id string (which is, IIRC, spoofable), that are not discussed
in the security considerations section.  Could an attacker, e.g., claim
to be a different client that the attacker knows is going to be
scheduled for migration?

Section 11.3

   One part of the necessary adaptation to these sorts of issues would
   restrict enforcement of normal slot sequence enforcement semantics
   until the client itself, by issuing a request using a particular slot
   on the destination server, established the new starting sequence for
   that slot on the migrated session.

(This means that any reordering that affected the first request received
on a particular slot would cause the loss/rejection of requests sent
prior to it, right?)

Section 12.2.1

   Note that both of these items specify attributes of the file system
   replica and should not be different when there are multiple
   fs_locations_server4 structures for the same replica, each specifying
   a network path to the chosen replica.

I'm not sure what the "both" is referring to, not seeing any class of
things listed where exactly two elements are present.

We may also want to specify some error-handling for when multiple
fs_locations_server4 structures appear for the same replica with
conflicting information.

   With the exception of the transport-flag field (at offset
   FSLIBX_TFLAGS with the fls_info array), all of this data applies to
   the replica specified by the entry, rather that the specific network
   path used to access it.


   o  Explicit definition of the various specific data items within XDR
      would limit expandability in that any extension within would
      require yet another attribute, leading to specification and
      implementation clumsiness.  In the context of the NFSv4 extension
      model in effect at the time fs_locations_info was designed (i.e.
      that described in [RFC5661]), this would necessitate a new minor
      to effect any Standards Track extension to the data in in

nit: "minor version"

   In light of the new extension model defined in [RFC8178] and the fact
   that the individual items within fls_info are not explicitly
   referenced in the XDR, the following practices should be followed
   when extending or otherwise changing the structure of the data
   returned in fls_info within the scope of a single minor version.

(This text and the items following it suggests that we should have an
IANA registry, which would then have a column for replica vs. path,
etc. Not a good fit for this document, though.)

Section 12.2.3

[There appears to be no difference between this text and Section 11.10.3
of RFC 5661.]

Section 13.1

                     They are all based upon attributes that allow one
   file system to specify alternate, additional, and new location
   information which specifies how the client may access to access that
   file system.

nit: s/to access//
Also maybe s/which/that/ but the RFC Editor is great about this.

Section 13.2

                                  Thus, the identifiers generated by two
   servers within that set can be assumed compatible so that, in some
   cases, identifiers by one server in that set that set may be
   presented to another server of the same scope.

Presumably this is "identifiers *generated* by one server"?

Section 13.3

Please mention the Section number of RFC 5661 here as well as
in the toplevel Section 13.

Setion 13.7.3

   to another client.  This can occur because the reclaim has been done
   outside of a grace period of implemented by the server, after the

nit: s/of implemented/implemented/

Section 14.3

   In situations in which the registration of the client_owner has not
   occurred previously, the client ID must first be used, along with the
   returned eir_sequenceid, in creating an associated session using

Do we want to mention the BIND_CONN_TO_SESSION usage here as well?

   A server MUST NOT provide the same client ID to two different
   incarnations of an eir_clientowner.

I see this is basically unchanged from 5661 so maybe it is out of scope,
but what does the server use to distinguish different "incarnations" or
an eir_clientowner?

Section 15.3

   It should be noted that there are situations in which a client needs
   to issue both forms of RECLAIM_COMPLETE.  An example is an instance
   of file system migration in which the file system is migrated to a
   server for which the client has no clientid.  As a result, the client
   needs to obtain a clientid from the server (incurring the
   responsibility to do RECLAIM_COMPLETE with rca_one_fs set to FALSE)
   as well as RECLAIM_COMPLETE with rca_one_fs set to TRUE to complete
   the per-fs grace period associated with the file system migration.

IMPORTANT: Is there an ordering requirement between the rca_one_fs ==

Section 15.4

   o  When servers have no support for becoming the destination server
      of a file system subject to migration, there is no possibility of
      a per-fs RECLAIM_COMPLETE being done legitimately and occurrences
      of it SHOULD be ignored.  However, the negative consequences of
      accepting mistaken use are quite limited as long as the does not
      issue it before all necessary reclaims are done.

As long as who does not issue it?

   o  When a server might become the destination for a file system being
      migrated, inappropriate use per-fs RECLAIM_COMPLETE is more
      concerning.  In the case in which the file system designated is

nit: "use of"

      not within a per-fs grace period, it SHOULD be ignored, with the
      negative consequences of accepting it being limited, as in the
      case in which migration is not supported.  However, if it should
      encounter a file system undergoing migration, it cannot be
      accepted as if it were a global RECLAIM_COMPLETE without
      invalidating its intended use.

We seem to be using "it" throughout here to refer both to
RECLAIM_COMPLETE and to the server, which is kind of confusing.

Section 16

It seems that neither RFC 5661 nor this document explicitly note that if
the attacker gets a client to use bad locations and sends the client to
an attacker-controlled server, the attacker can give back arbitrary data
for the filesystem hierarchy for the client to use.  Hopefully this is
obvious, but it may be worth explicitly noting as the motivation for
taking the location information so seriously.

I agree with the secdir reviewer that:

      In light of the above, a server should present file system
      location entries that correspond to file systems on other servers
      using a host name.  This would allow the client to interrogate the

should be an RFC 2119 "SHOULD" (not sure what the WG poll turned up).

I do see several good proposed changes in response to the secdir review;
I'd propose further changing "avoid corruption of data in flight" to
"avoid modification of data in flight" since "corruption" can have a
connotation of randon bit flips, but the intended meaning is that an RFC
3552 attacker can deliberately change things.

(I'm also fairly sympathetic to the desire to not change the MTI
algorithms in this document, even though they really are quite overdue
for updates.  My understanding is that this document is intended as a
glorified errata about the migration/trunking behavior and is trying to
not change anything else.)

Appendix B

      o  Sections 11.8, 11.8.1, 11.8.2, and 11.9, are unchanged although
         they would be renumber as Sections 11.13 (with included

nit: "renumbered"

(Spencer Dawkins) Yes

Deborah Brungard No Objection

(Mirja Kühlewind) No Objection

Comment (2019-03-07)
I didn't have the time to review this document in depth but I agree with others that the selected approach is hard to follow. I would have preferred a bis document or a stand-alone document that updates RFC5661 content-wise but does not have a patch-style and therefore can "simply" be consumed in addition to RFC5661.

(Alexey Melnikov) No Objection

Comment (2019-03-06)
No email
send info
I am borderline between Abstain and No Objections on usefulness of this document as a set of patches to another RFC.

(Ben Campbell) Abstain

Comment (2019-03-05)
Thanks for the huge amount of work in this draft, but I cannot recommend publishing it in it's current form. I am balloting "ABSTAIN" rather than "DISCUSS", so that I do not get in the way of publication if the rest of the IESG thinks it's fine as is. I apologize for not doing a more technical review, but I think the structural issues need to be addressed first.

I once spent several weeks over a summer in high school applying page updates to a shelf full of US Tax Code binders in a CPAs office. The IRS, (or whoever published the code) would send out boxes of new page-sets, each labeled for which original pages it would replace.  When we were done in the CPA office, we had a coherent set of documents, at least to the degree you can apply words like "coherent" to tax codes.

This draft does something akin to that, except that we don't end up with a coherent document as a result. If the RFC Editor applied patches from RFCs that  "UPDATE" other RFCs to render final text,  things would be different. But that doesn't happen with RFCs.  That being said, I don't object to RFC updates in general, as long as those updates are simple and fairly self-contained. But this draft is over 100 pages of intricate updates, reorganizations, and explanatory text. This will make life very hard for readers that were not involved in the standards process.

I think the right approach for an update of this complexity is to just do a 5661bis draft, and use this one as an input to that. Such a bis draft could be something as simple as applying the changes in this draft (and maybe trunking-update)  and publishing the result.

Alissa Cooper Abstain

Comment (2019-03-06)
I agree with Ben Campbell and am therefore abstaining. It seems like the proper path to a more broadly usable specification would have been to obsolete RFC 5661 and produce a bis with the changes applied. I don't see an explanation anywhere for why that was not done.

The RFC 8174 boilerplate should be used in Section 2.

The shepherd write-up notes an issue with "minor copyright dates." The issue is not one of dates but rather about whether the pre-RFC5378 disclaimer is needed or not.

(Suresh Krishnan) Abstain

Comment (2019-03-07)
Thanks to the authors for doing the hard work in creating this document. I sincerely tried to give this document a proper review but, like my abstaining co-ADs, I gave up along the way. The biggest issues for me were the reorganizations, lack of information about interoperability and backward compatibility with unupdated implementations, code snippet replacements with no apparent change (e.g. section 14.1 replacing 18.35.1, section 14.2 replacing 18.35.2. etc.) and so on. I also think that doing a new version of RFC5661 with the changes applied is the right way to go about this, but I will abstain so that the WG and the responsible AD can make the call.

(Terry Manderson) Abstain

Comment (2019-03-07)
My review of this document took me some time as I had to do a side my side read with RFC5661. While not normally a bad way to implement 'updates' I found myself lost more times than not. The complexity of the updates in this way loses readability and technical coherence, even though a lot of thought and effort has been invested in this work. I think that the loss of readability and technical coherence should be avoided for many reasons if not based alone on what we currently see in the set of documents which describes DNS. On this document I am balloting ABSTAIN as I feel that there are better ways to handle the depth breadth of this update, however I will not stand in the way of publication.

(Eric Rescorla) Abstain

Comment (2019-03-07)
I concur with my colleagues that a 100 page document consisting of diffs is not really something we can expect people to process

Alvaro Retana Abstain

Comment (2019-03-06)
I agree with Ben's opinion.  As it is being discussed on the nsfv4 list, there are enough issues and changes to rfc5661 to justify a bis [1], the most significant seems to be what is already included in this document.


(Adam Roach) Abstain

Comment (2019-03-06)
I agree with Ben's comments that this format is hostile to implementors, and that the community would be far better served by a bis document.

Roman Danyliw No Record

Martin Duke No Record

Erik Kline No Record

Murray Kucherawy No Record

Warren Kumari No Record

Barry Leiba No Record

Martin Vigoureux No Record

Éric Vyncke No Record

Magnus Westerlund No Record

Robert Wilton No Record