Skip to main content

Last Call Review of draft-ietf-nfsv4-rfc3530-migration-update-07
review-ietf-nfsv4-rfc3530-migration-update-07-genart-lc-davies-2016-01-18-00

Request Review of draft-ietf-nfsv4-rfc3530-migration-update
Requested revision No specific revision (document currently at 08)
Type Last Call Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2016-01-18
Requested 2016-01-07
Authors David Noveck , Piyush Shivam , Chuck Lever , Bill Baker
Draft last updated 2016-01-18
Completed reviews Genart Last Call review of -07 by Elwyn B. Davies (diff)
Genart Last Call review of -07 by Elwyn B. Davies (diff)
Opsdir Telechat review of -07 by Victor Kuarsingh (diff)
Assignment Reviewer Elwyn B. Davies
State Completed
Review review-ietf-nfsv4-rfc3530-migration-update-07-genart-lc-davies-2016-01-18
Reviewed revision 07 (document currently at 08)
Result Almost Ready
Completed 2016-01-18
review-ietf-nfsv4-rfc3530-migration-update-07-genart-lc-davies-2016-01-18-00
  
  
    I am the assigned Gen-ART reviewer for this draft. The General Area
    


    Review Team (Gen-ART) reviews all IETF documents being processed
    


    by the IESG for the IETF Chair. Please wait for direction from your
    


    document shepherd or AD before posting a new version of the draft.
    





    For more information, please see the FAQ at
    







<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>

.
    





    Document:
    draft-ietf-nfsv4-rfc3530-migration-update-07.tx


    Reviewer:
    Elwyn Davies


    Review Date:
    2016/01/16


    IETF LC End Date:
    2016/01/18


    IESG Telechat date: 2016/01/18





    Summary:
    Almost ready.  There are a few minor issues and a considerable
    number of nits that need attention.  The text is extremely complex
    and in some places the logic is not entirely clear.  I would
    encourage the authors to revisit the various usages of RFC 2119
    language;  a number of the instances, especially several of the
    'SHOULDs', appear to refer to operational or administrative
    decisions rather than matters that affect the protocol - and are
    therefore not actionable by the receiving implementation.  If
    appropriate the language should be modified.





    Major issues:
    


    ------------------


    None





    Minor issues:


    ------------------


    Bits possibly not covered from RFC 7530:


    ---------------------------------------------------------


    With respect to s4:


    In s9.1.1, para 1 of RFC 7530 there is:







Breaking
   the lease state amounts to the server removing all lock, share
   reservation, and, where the server is not supporting the
   CLAIM_DELEGATE_PREV claim type, all delegation state associated with
   the same client with the same identity.  For a discussion of
   delegation state recovery, see 

Section 10.2.1

.





    There is nothing in the update about servers that don't support the
    CLAIM_DELEGATE_PREV claim type.  Should something be said about
    this?





    The general description of the distinction between open owners and
    lock owners in para 2, 3 and 4 of s9.1.1 is lost if the new text is
    taken as a true replacement of s9.1.1.  This is undesirable.  





    ------------------------------------------


    Items in the text of this draft:


    -----------------------------------------


    s5.1.3: When a file system is migrated, the source server is queried
    about the new location of the file system using
    GETATTR(fs_locations).  Since the file system involved has migrated,
    the server no longer has a connection with the relevant file system
    but still needs to respond to the GETATTR appropriately.  Does there
    need to be some discussion of how long the server needs to hang onto
    information that will allow it to respond to these requests,
    particularly since the suggested means of discovery uses an
    arbitrary file from the file system.  It is possible that I have
    missed something or my understanding/memory of how fs_locations
    works is faulty. 





    s7.4.5.1: It would be helpful to indicate the if-then structure of
    the bullets by making current bullets 3 and 4 sub-bullets of bullet
    2.   Indeed, on second thoughts, it is not clear which 'if' (either
    that in bullet 2 or bullet 3) the 'otherwise' in bullet 4 applies
    to.  Please make the logic clearer.





    s7.5, bullet 1: Surely the decision as to whether privacy is needed
    is an administrative/operational deployment decision?  Transfer
    integrity is vital and provision should be made for transfer privacy
    to be possible if the installation requires it, but I can't see that
    privacy needs to be ensured in all possible circumstances.





    =======================





    Nits/editorial comments:
    


    -----------------------------------





    General: s/e.g. /e.g., /g (5 instances); s/i.e. /i.e., /g  (5
    instances).





    General (in s5): s/openowner/open-owner/g; s/lockowner/lock-owner/g





    General: s/filesystem/file system/g (In line with usage in RFC
    7530).





    General: Lack of terminology section. There are a number of terms
    that might be advantageously incorporated into a terminology section
    - lock types, client id, stateid, incarnation verifier (and its
    relation to reboots/'boot verifier'), server (address) trunking,
    open owners, lock owners spring to mind.  








    s3 , para 6/s4.1, para 7: The terms 'client boot instance' and
    (subsequently) 'boot verifier' are not used in RFC 7530.  s9.1.1
    uses the term 'client incarnation verifier' (and this term is
    actually used in s4.1 , para 10 (second bullet after nfs_client_id4
    data structure definition).  I personally think is a more
    descriptive term.  I would be inclined to replace 'boot verifier'
    with 'incarnation verifier' throughout, and provide a definition in 
    a terminology section (see above).  This could also cover the case
    where the incarnation verifier is changed because the structure is
    destroyed rather then the client host rebooting (as noted in s4.6)
    although the intention is that it should only change on a reboot.





    s3, 2nd set of bullets, bullets 2 and 3; Also s4.9, para 1: s/client
    id/client id string/





    s3, last bullet: It would be worth pointing out that there is a
    complete revision of the definition of the SETCLIENTID operation. 
    This is relevant because there is a pointer to the definition of
    SETCLIENTID in s4.





    s4: Correct section names from RFC 7530..


    OLD:


    The replaced sections are named "client ID" and "Server Release of
    Clientid."


    NEW:


    The replaced sections are named "Client ID" and "Server Release of
    Client ID."


    END





    s4.1, bullet 1: s/client IP address/the client IP address/





    s4.1,bullet 2: I would replace 'save' by 'maintain a non-volatile
    record across reboots of' to make it clear what is intended.





    s4.1, last para:  The concept of 'server trunking' needs to be
    defined.  NFSv4.0 per RFC 7530 does not have this concept - it is
    introduced in NFSv4.1, thus somebody looking from the PoV of a 4.0
    implementer need not know anything about trunking.   





    s4.2: Section references to relevant parts of RFC 7530 that provide
    definitions of structures and operations would be helpful here. 





    s4.2, bullet 1 of second set of bullets (under struct
    nfs_client_id):







The id field is a variable-length string which uniquely identifies
      a specific client.  Although, we describe it as a string and it is
      often referred to as a "client string," it should be understood
      that the protocol defines this as opaque data.





    Later on s5.1 recommends that opaque data such as this should be
    encoded in network byte order.  It would be helpful to repeat or
    introduce this recommendation (make it a MUST???) here to ensure
    that the id will be identical on servers of whatever endianess. 
    Also the term 'network byte order' is primarily relevant to data for
    which the internal representation is a (binary) number.  A better
    way of saying this might be 'the encoding and decoding processes
    (e.g., using network byte order for the external representation) for
    the opaque data result in the same internal representation whatever
    the endianness of the originating and receiving machines' even if it
    is somewhat more long winded.





    s4.2, para 20 (across the boundary between p7 and p8):







   This shorthand client identifier (a client ID) is
   assigned by the server and should be chosen so that it will not
   conflict with a client ID previously assigned by same server.  This
   applies across server restarts or reboots.





    Would there be any advantage to recommending that servers should try
    to generate clientid4's that are unique to the server as far as
    possible as well as different across reboots, etc.?  It would appear
    to minimise the risk of clashes during migrations but this may be
    unnecessarily complicated and the client has to cope with the remote
    possibility anyway. I note that there is a statement in s4.8, para 4
    on page 19







      Given that it is already highly unlikely that the clientid XC is
      duplicated by distinct servers, the probability that SCn is
      duplicated as well has to be considered vanishingly small. 





    Adding the recommendation would support this statement.





    s4.2, para 7 on page 8:


    s/of administrative action,/of administrative action./ (swap comma
    for period/full stop)





    s4.2, last bullet: 







   o  Merger of state under the associated lease with another lease
      under a different clientid causes the clientid4 serving as the
      source of the merge to cease being recognized on its server.
      (Always returns NFS4ERR_STALE_CLIENTID)






    Should 'clientid causes the clientid4' be 'client ID causes the
    clientid4'?  


    There are 10 other instances of clientid (rather than clientid4)
    where the same question applies.





    s4.2, para after last bullet on p9:







In
   cases of server or client error resulting in this error, use of
   SETCLIENTID to establish a new lease is desirable as well.





    Which error is meant?  There isn't one mentioned in this para - I
    guess _STALE_CLIENTID but could be 'any of the above errors'.


     


    s4.2, page 9.







(See the section entitled "Server Failure and Recovery") 





    and







(see the section entitled "lock-owner" for
   details)





      I take these are intended to be sections in RFC 7530 (respectively
    9.6.2 and 9.1.5) -  adding these section numbers and the RFC ref
    would be helpful.





    s4.2, para last but two:







   In the last two cases, different recovery procedures are required.
   See Section 5.1.1 for details. 





    It isn't clear which cases you are referring to here... the last two
    bullets two paras before or the last two cases referred to in the
    previous para out of







   In the event of a server reboot, loss of lease state due to lease
   expiration, or administrative revocation of a clientid4, the client






    Please make this more explicit.








    s4.2, last para:







   See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM
   for a complete specification of these operations.





    Section references would help, plus a note that the definition of
    SETCLIENTID is now in this document (s7.4) rather than Section 16.33
    of [RFC7530].  SETCLIENTID_CONFIRM is at Section 16.34 of [RFC7530].


     


    s4.3, last para:







   In that event, when the server gets a SETCLIENTID specifying a client
   id string for which the server has a clientid4





    Which of the three events in the previous para is being referred
    to?  Presumably the deliberate change of the primcipal since the
    other two are not really recognizable in a controlled way. ;-)





    s4.4, 2nd bullet on page 11 (para 7):







      the client to be
      aware when two server IP addresses are connected to the same
      server (they return the same server name in responding to an
      EXCHANGE_ID).





    The comment about 'return[ing] the same server name' in NFSv4.1 is
    not the real reason the client is sure they are the same server. 
    Would suggest changing this to '(Section 2.10.5.1 of [RFC5661]
    explains how the client is able to assure itself that the
    connections are to the same logical server.)'  





    s4.4, 3rd bullet on page 11 (para 9):  Suggest adding a reference to
    the fs_locations discussions in RFC 7530 - Perhaps '(see Sections
    8.1 and 8.42 of [RFC7530])'.








    s4.4. 4th bullet on page 11 (para 10):







      in that the two different
      client id strings sent to different IP addresses may wind up on
      the same IP address, adding confusion.





    The phrase 'same IP address' doesn't really express what is going on
    IMO.  Perhaps 'same (logical) server'.





    s4.5, para 3: s/per server network addresses./per server network
    address./





    s4.5, last para:







Therefore, client implementations
   that support migration with transparent state migration SHOULD NOT
   use the non-uniform client id string approach, except where it is
   necessary for compatibility with existing server implementations 





    Arguably, this is an operational decision rather than something that
    affects the protocol... possibly s/SHOULD NOT/should not/. 





    s4.6, para 10: s/typically the reboot time./typically at reboot
    time./





    s4.7, para 6: s/the spec/this specification/





    s4.7, para 8:   Spurious double quote (") at end of para:







enable transparent state migration."








    s4.7, para 9: Missing period/full stop at end of para.








    s4.8, para 3: s/blind-sided/surprised/ (blind-sided is rather too
    idiomatic).





    s4.8, para 7: s/Servers MUST NOT do that./Servers MUST NOT behave in
    this way./





    s4.8, last para on page 17:







   In this algorithm, when SETCLIENTID is done it will use the common
   nfs_client_id4 and specify the current target IP address as part of
   the callback parameters.





    I am unclear why (and, indeed, how) the target IP address needs to
    be incorporated in the callback parameters.  AFAICS, the process
    described in this section does not examine a callback (by this I
    mean the results of one of the CB_xxx operations) as part of the
    process of trunking determination.   The choice of callback_ident
    (which appears to be the only relevant parameter) seems more likely
    to be appropriately selected in line with the statement later in the
    process (on page 19):







      The specific callback
      parameters chosen, in terms of cb_client4 and callback_ident, are
      up to the client and should reflect its preferences as to callback
      handling for the common clientid, in the event that X and IPn are
      trunked together.





    but chosen so that they would be appropriate if the server doesn't
    end up trunked with any other existing client ID.  Clearly the IP
    address would be a convenient semi-random value which would work to
    identify the callback uniquely eventually but the selected value
    doesn't seem to have any effect on the algorithm.


     


    s4.8, para 1 on page 18:







   Note that when the client has done previous SETCLIENTID's, to any IP
   addresses, with more than one principal or authentication flavor, we
   have the possibility of receiving NFS4ERR_CLID_INUSE, since we do not
   yet know which of our connections with existing IP addresses might be
   trunked with our current one.  In the event that the SETCLIENTID
   fails with NFS4ERR_CLID_INUSE, one must try all other combinations of
   principals and authentication flavors currently in use and eventually
   one will be correct and not return NFS4ERR_CLID_INUSE.






    I am not clear how this interacts with the variation of the
    definition of NFS4ERR_CLID_INUSE given in s7.2.  Would I expect the
    server to accept authentication flavors other than the one already
    used for another address leading to the same server?  s7.2 says that
    the server MAY accept a different flavor if it thinks the request is
    bona fide.  I don't understand what might or might not lead the
    server to think an alternative flavor was bona fide - and would it
    be likely to apply in this case?





    s4.9, last set of bullets: Should the draft also advise applying a
    one way function to the MAC address for privacy reasons?











    s4.8, page 18:







   o  If one or more matching clientid4's is found, none of which is
      marked unresolved, the new IP address X is entered and marked
      unresolved.  A SETCLIENTID_CONFIRM is done to X using XC and XV.

      After applying the steps below to each of the lead IP addresses
      with a matching clientid4, the address will have been resolved: It
      may have been determined to be part of an already known server as
      a new IP address to be added to an existing set of IP addresses
      for that server.  Otherwise, it will be recognized as a new
      server.  At the point at which this determination is made, the
      unresolved indication is cleared and any suspended SETCLIENTID
      processing is restarted






    I suspect that the second para in the section quoted here is not
    intended to be part of the bullet, and should be unindented.





    -----------------


    s5.1, para 2: It would be desirable to mention that the change from
    big endian in [RFC7530] to network byte order is purely
    terminological.





    s5.1.1, para 1:  







   In the case of migration, the servers involved in the migration of a
   filesystem SHOULD transfer all server state associated with the
   migrating filesystem from source to the destination server.





    Arguably this SHOULD is an operational decision.  It does not affect
    the protocol or the format of the bits on the wire.  Accordingly I
    suggest s/SHOULD/should/.  This corresponds with the discussion in
    [RFC7530].  On the other hand:







 This
   must be done in a way that is transparent to the client.





    is probably a MUST: s/This must be done/If state is transferred it
    MUST be done/





    s5.1.1, para 7: s/NFS version 4.0 protocol/NFS version 4.0 protocol
    (either in [RFC7530] or this update)/





    s5.1.1, para 12:







When a
   server implements migration and it does not transfer state
   information, it SHOULD provide a filesystem-specific grace period, to
   allow clients to reclaim locks associated with files in the migrated
   filesystem.





    I believe this is a MUST rather than a SHOULD.  The alternative of
    not doing it doesn't look good!  If so the last sentence of s5..1.1
    needs adapting to match.





    s5.1.1.1: The {v, X, c} notation needs to be defined (presumably as
    used in s7.4 and Section 16.33 of [RFC7530]).  Section references in
    this doc or  [RFC7530] ought to be provided for SETCLIENTID (**s7.4
    in this doc**) and SETCLIENTID_CONFIRM (s16.34 of [RFC7530]).





    s5.1.1.2, para 5: How does the server know that the client has been
    implemented to isolate owners to a particular filesystem?  Looking
    at the next para, I suspect that the server doesn't know - it merely
    observes that owners don't sprerad across multiple file systems at
    the time of migration but there is no guarantee - and it doesn't
    matter - that the client might not have owners that spread across
    filesystems some time.





    s5.1.1.2, para 7 on page 29:







Such support only needs to be
   provided for requests issued before the migration event whose status
   as the last by sequence is invalidated by the migration event.





    I can't parse the last line of this sentence.





    s5.1.1.2, 1st and 2nd bullets on page 30: s/the last request for the
    owner in effect/the sequence number for the last request for the
    owner in effect/;  s/longer one less the next sequence to be
    received./longer one less than the next sequence to be received./





    ------------------------


    s6.2, para 4:







   o  Some operations which modify locking state are not allowed to
      return NFS4ERR_DELAY.





    I think it be worth mentioning which operations these are in this
    para.  Inspection of Table 7 in RFC7530 indicates to me that the
    relevant operations are OPEN_CONFIRM and RELEASE_LOCKOWNER plus
    possibly RENEW.  GETFH, RESTOREFH, SAVEFH and ILLEGAL don't return
    NFS4ERR_DELAY but don't appear to affect the locking state.  I
    observe that all of these are mentioned explicitly except for
    OPEN_CONFIRM:  Does anything need to be said about this extra case?





    s6.2, para 6 on page 37: s/cannot change target filesystem locking
    state,/cannot change the target file system locking state,/





    s6.2, para 7 on page 37:







   o  Keeping track of the earliest request started which is still in
      execution (for example, by keeping a list of active requests
      ordered by request start time).  The server can then define T' to
      be the first time at which the earliest-started active request
      started after time T.






    I don't understand what this criterion is driving at.  The second
    sentence appears to be aimed at identifying the time associated with
    some action that is associated with the earliest-started request but
    doesn't actually say what it is AFAICS.





    s7.1: It would be helpful to identify the sections of RFC 7530 that
    are affected by the changes summarized here, identifying
    replacements and modifications.  It would also be worth reordering
    the summary to match with the order of subsections of s7.1.





    s7.1, bullets 1 and 2: s/CLID_INUSE/NFS4ERR_CLID_INUSE/





    s7.2: See the comment on s4.8, para 1 on page 18 above.





    s7.3, para 7: s/Thus, a client that is prepared to receive
    NFS4ERR_MOVED/Thus, if a client is prepared to receive NFS4ERR_MOVED





    s7.4.5.1: A pointer to Section 9.1.7 of [RFC7530] where the DRC is
    introduced would be helpful.  BTW I am not sure that a server is
    *allowed* to be without a DRC - s9.1.7 says it is critical to have
    something that looks like a DRC. 








    s7.6, para 1: Please reference Section 19 of [RFC7530].  The
    affected para is the penultimate rather than the ultimate para of
    Section 19 of [RFC7530] judging from the initial words.





    s7.6, para 2: 







See the section "Client Identity
      Definition" for further discussion.





    Please add a section reference (Section 4 of this document)





    s8: Suggest


    OLD:


    Is to be modified as specified in Section 7.6.


    NEW:


    The security considerations of [RFC7530] remain appropriate with the
    exception of the modification to the penultimate  paragraph
    specified in Section 7.6 of this document and the addition of the
    material in Section 7.5.


    END