NFSv4 Working Group                                          S. Faibish
Internet-Draft                                          EMC Corporation
Intended status: Proposed Standard                             D. Black
Expires: January 9, 2011                                EMC Corporation
Updates: 5661, 5662                                           M. Eisler
                                                             J. Glasgow
                                                           July 9, 2010

                       pNFS Access Permissions Check

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on January 9, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   ( in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this

Faibish et al.         Expires January 9, 2011                 [Page 1]

Internet-Draft      pNFS Access Permissions Check             July 2010

   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.


   This document extends the pNFS protocol to communicate errors caused
   by inability to access data servers referenced by layouts, including
   checks performed by both clients and the MDS. The extension provides
   means for clients to communicate client-detected access denial errors
   to the MDS, including the case in which a client requests direct NFS
   access via the MDS that the MDS cannot perform.

Table of Contents

   1. Introduction...................................................3
   2. Conventions used in this document..............................5
   3. Changes to Operation 51: LAYOUTRETURN (RFC 5661)...............5
      3.1. ARGUMENT (18.44.1)........................................5
      3.2. RESULT (18.44.2)..........................................6
      3.3. DESCRIPTION (18.44.3).....................................6
      3.4. IMPLEMENTATION (18.44.4)..................................7
   4. Security Considerations........................................8
   5. IANA Considerations............................................8
   6. Conclusions....................................................8
   7. References.....................................................9
      7.1. Normative References......................................9

Faibish et al.         Expires January 9, 2011                 [Page 2]

Internet-Draft      pNFS Access Permissions Check             July 2010

1. Introduction

   Figure 1 shows the overall architecture of a Parallel NFS (pNFS)

          |+-----------+                                 +-----------+
          ||+-----------+                                |           |
          |||           |       NFSv4.1 + pNFS           |           |
          +||  Clients  |<------------------------------>|    MDS    |
           +|           |                                |           |
            +-----------+                                |           |
                 |||                                     +-----------+
                 |||                                           |
                 |||                                           |
                 ||| Storage        +-----------+              |
                 ||| Protocol       |+-----------+             |
                 ||+----------------||+-----------+  Control   |
                 |+-----------------|||           |  Protocol  |
                 +------------------+||  Storage  |------------+
                                     +|  Devices  |

                           Figure 1 pNFS Architecture

   In this document, "storage device" is used as a general term for a
   data server and/or storage server for the file, block or object pNFS

   The current pNFS protocol [RFC5661] assumes that a client can access
   every storage device (SD) included in a valid layout sent by the MDS
   server, and provides no means to communicate client access failures
   to the MDS. Access failures can impair pNFS performance scaling and
   allow significant errors to go unreported. If the MDS can access all
   the storage devices involved, but the client doesn't have sufficient
   access rights to some storage devices, the client may choose to fall
   back to accessing the file system using NFSV4.1 without pNFS support;
   there are environments in which this behavior is undesirable,
   especially if it occurs silently. An important example is addition of
   a new storage device to which a large population of pNFS clients
   (e.g., 1000s) lack access permission. Layouts granted that use this
   new device, result in client errors, requiring that all I/Os to that
   new storage device be served by the MDS server. This creates a
   performance and scalability bottleneck that may be difficult to
   detect based on I/O behavior because the other storage devices are
   functioning correctly.

Faibish et al.         Expires January 9, 2011                 [Page 3]

Internet-Draft      pNFS Access Permissions Check             July 2010

   The preferable approach to this scenario is to report the access
   failures before any client attempts to issue any I/Os that can only
   be serviced by the MDS server.  This makes the problem explicit,
   rather than forcing the MDS, or a system administrator, to diagnose
   the performance problem caused by client I/O using NFS instead of
   pNFS. There are limits to this approach because complex mount
   structures may prevent a client from detecting this situation at
   mount time, but at a minimum, access problems involving the root of
   the mount structure can be detected.

   The most suitable time for the client to report inability to access a
   storage device is at mount time, but this is not always possible.
   If the application uses a special tag or a switch to the mount
   command (e.g., -pnfs) and syscall to declare its intention to use
   pNFS, at the client, the client can check for both pNFS support and
   device accessibility.

   This document introduces an error reporting mechanism that is an
   extension to the return of a pNFS layout; a pNFS client MAY use this
   mechanism to inform the MDS that the layout is being returned because
   one or more data servers are not accessible to the client. Error
   reporting at I/O time is not affected because the result of an
   inaccessible data server may not be an I/O error if a subsequent
   retry of the operation via the MDS is successful.

   There is a related problem scenario involving an MDS that cannot
   access some storage devices and hence cannot perform I/Os on behalf
   of a client. In the case of the block layout [RFC5663] if the MDS has
   no access to a storage device (e.g., LUN), MDS implementations
   generally do not export any filesystem using that storage device.  In
   contrast to the block layout, MDSs for the file [RFC5661] and object
   [RFC5664] layouts may be unable to access the storage devices that
   store data for an exported filesystem.  This enables a file or object
   layout MDS to provide layouts that contain client-inaccessible
   devices. For the specific case of adding a new storage device to a
   filesystem, MDS issuance of test I/Os to the newly added device
   before using it in layouts avoids this problem scenario, but does not
   cover loss of access to existing storage devices at a later time.

   In addition, [RFC5661] states that a client can write through or read
   from the MDS, even if it has a layout; this assumes that the MDS can
   access all the storage devices. This document makes that assumed
   access an explicit requirement.

Faibish et al.         Expires January 9, 2011                 [Page 4]

Internet-Draft      pNFS Access Permissions Check             July 2010

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   document are to be interpreted as described in RFC-2119 [RFC2119].

3. Changes to Operation 51: LAYOUTRETURN (RFC 5661)

   The existing LAYOUTRETURN operation is extended by introducing two
   new layout return types:

   o  LAYOUT4_RET_REC_FSID_NO_ACCESS at fsid scope; and

   o  LAYOUT4_RET_REC_FILE_NO_ACCESS at file scope.

   The former returns all layouts for the FSID and informs the server
   that the reason for the return is a storage device connectivity
   problem, and the latter performs the same function for an individual
   file layout.

3.1. ARGUMENT (18.44.1)

   The ARGUMENT specification of the LAYOUTRETURN operation in section
   18.44.1 of [RFC5661] is replaced by the following XDR code [XDR]:

     /* Constants used for new LAYOUTRETURN and CB_LAYOUTRECALL */
     const LAYOUT4_RET_REC_FILE      = 1;
     const LAYOUT4_RET_REC_FSID      = 2;
     const LAYOUT4_RET_REC_ALL       = 3;
     const LAYOUT4_RET_REC_FSID_NO_ACCESS    = 4;
     const LAYOUT4_RET_REC_FILE_NO_ACESSS    = 5;

     enum layoutreturn_type4 {

     struct layoutreturn_file4 {
           offset4         lrf_offset;
           length4         lrf_length;
           stateid4        lrf_stateid;

Faibish et al.         Expires January 9, 2011                 [Page 5]

Internet-Draft      pNFS Access Permissions Check             July 2010

           /* layouttype4 specific data */
           opaque          lrf_body<>;

     struct layoutreturn_fsid_no_access4 {
           deviceid4     lrfna_deviceid;
           nfsstat4      lrfna_status;

     struct layoutreturn_file_no_access4 {
           offset4         lrfna_offset;
           length4         lrfna_length;
           stateid4        lrfna_stateid;
           deviceid4       lrfna_deviceid;
           nfsstat4        lrfna_status;
           /* layouttype4 specific data */
           opaque          lrfna_body<>;

     union layoutreturn4 switch(layoutreturn_type4 lr_returntype) {
           case LAYOUTRETURN4_FILE:
                   layoutreturn_file4             lr_layout;
                   layoutreturn_fsid_no_access4   lr_fsid<>;
                   layoutreturn_file_no_accesss4  lr_layout;
           default:                               void;

3.2. RESULT (18.44.2)

   The RESULT of the LAYOUTRETURN operation is unchanged; see section
   18.44.2 of [RFC5661].

3.3. DESCRIPTION (18.44.3)

   The following text is added to the end of the LAYOUTRETURN operation
   DESCRIPTION in section 18.44.3 of [RFC5661]:

   There are two NO_ACCESS layoutreturn_type4 values that indicate lack
   of storage device access, LAYOUT4_RET_REC_FSID_NO_ACCESS and
   LAYOUT4_RET_REC_FILE_NO_ACCESS.  A client uses these values to return

Faibish et al.         Expires January 9, 2011                 [Page 6]

Internet-Draft      pNFS Access Permissions Check             July 2010

   all layouts for an FSID or to return a layout (or portion thereof)
   for a file, and in both cases to inform the server that the reason
   for the return is an inability to access one or more storage devices.
   The same stateid may be used or the client MAY force use of a new
   stateid in order to report a new error. An NFS error (nfsstat4) is
   included in the layoutreturn data structures for these two types to
   distinguish access permission problems from device inaccessibility:

   o  NFS4ERR_PERM SHOULD be used for access permission denial; and

   o  NFS4ERR_NXIO SHOULD be used for inability to access a device.

   Other NFS errors MAY be used when they are appropriate. All uses of
   these two layout return types that report errors SHOULD be logged by
   the client.

   The client MAY use the new LAYOUT4_RET_REC_FILE_NO_ACCESS instead of
   LAYOUT_RET_REC_FSID_NO_ACCESS when it has reason to believe that only
   one, or a small number of files are affected.  If the problem affects
   multiple devices, the client may use multiple file layout return
   operations to communicate the multiple devices encountering errors;
   each return operation SHOULD return a layout extent obtained from the
   device for which an error is being reported. In contrast,
   LAYOUT_RET_REC_FSID_NO_ACCESS includes an array of <device, status>
   pairs to enable errors to be reported for multiple devices in one
   operation so that the client is not required to repeat the FSID-
   scoped layout return operation to report multiple errors.

3.4. IMPLEMENTATION (18.44.4)

   The following text is added to the end of the LAYOUTRETURN operation
   IMPLEMENTATION in section 18.4.4 of [RFC5661]:

   A client that expects to use pNFS for a mounted filesystem SHOULD
   check for pNFS support at mount time.  This check SHOULD be performed
   by sending an OPEN request, a LAYOUTGET operation and a GETDEVICELIST
   operation, followed by layout-type-specific checks for accessibility
   of each storage device returned by GETDEVICELIST.  If the NFS server
   does not support pNFS, the LAYOUTGET operation will be rejected with
   an NFS4ERR_NOTSUPP error; in this situation it is up to the client to
   determine whether it is acceptable to proceed with NFS-only access.

   When an I/O fails because a storage device is inaccessible, the
   client SHOULD retry the failed I/O via the MDS.  In this situation,
   before retrying the I/O, the client SHOULD return the layout, or
   inaccessible portion thereof, and SHOULD indicate which storage
   device or devices was or were inaccessible.  If the client does not

Faibish et al.         Expires January 9, 2011                 [Page 7]

Internet-Draft      pNFS Access Permissions Check             July 2010

   return at least the inaccessible portion of the layout before the I/O
   retry via the MDS, and that I/O retry fails with NFS4ERR_PERM or
   NFS4ERR_NXIO, then the client MUST return at least the inaccessible
   portion of layout, as the MDS error indicates that the affected
   portion of that file is completely inaccessible to the client.

   Backwards compatibility may require a client to perform two layout
   return operations to deal with servers that don't understand the
   NO_ACCESS layoutreturn_type4 values and hence respond with
   NFS4ERR_INVAL. In this situation, the client SHOULD perform an
   ordinary FSID or file layout return operation and remember that the
   new return types are not to be used with that server.

   The metadata server (MDS) SHOULD NOT use storage devices in pNFS
   layouts that are not accessible to the MDS.  To the extent that an
   MDS can determine whether storage devices are accessible to clients,
   an MDS SHOULD NOT include a storage device in any pNFS layouts sent
   to a client that cannot access that storage device. At a minimum, the
   server SHOULD perform these storage device accessibility checks
   before exporting a filesystem that supports pNFS and when the device
   configuration for such an exported filesystem is changed (e.g., to
   add a storage device). A client MAY perform I/O via the MDS even when
   the client holds a layout that covers the I/O; servers MUST support
   this client behavior.

4. Security Considerations

   All control operations from the MDS to the storage devices, including
   any operations required for access permission checks, SHOULD be
   authenticated in order to maintain integrity of stored data.

5. IANA Considerations

   There are no IANA considerations in this document beyond pNFS IANA
   Considerations are covered in [RFC5661].

6. Conclusions

   This draft specifies additions to the pNFS protocol addressing client
   and MDS server inability to access storage devices used in pNFS
   layouts for all layout types.

Faibish et al.         Expires January 9, 2011                 [Page 8]

Internet-Draft      pNFS Access Permissions Check             July 2010

7. References

7.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5661] Shepler, S., Eisler, M., and D. Noveck, "Network File
             System (NFS) Version 4 Minor Version 1 Protocol",
   , January 2010.

   [RFC5663] Black, D., Glasgow, J., Fridella, S., "Parallel NFS (pNFS)
             Block/Volume Layout",,
             January 2010.

   [RFC5664] Halevy, B., Welch, B., Zelenka, J., "Object-Based Parallel
             NFS (pNFS) Operations",,
             January 2010

   [XDR]     Eisler, M., "XDR: External Data Representation Standard",
             STD 67, RFC 4506, May 2006.


   This draft includes ideas from discussions with the primary author of
   the pNFS object layout, Benny Halevy, and the Linux kernel pNFS
   maintainers, including Bruce Fields.

   This document was prepared using

Faibish et al.         Expires January 9, 2011                 [Page 9]

Internet-Draft      pNFS Access Permissions Check             July 2010

Authors' Addresses

   Sorin Faibish (editor)
   EMC Corporation
   32 Coslin Drive
   Southboro, MA 01772

   Phone: +1 (508) 305-8545

   David L. Black
   EMC Corporation
   176 South Street
   Hopkinton, MA 01748

   Phone: +1 (508) 293-7953

   Michael Eisler
   5765 Chase Point Circle
   Colorado Springs, CO  80919

   Phone: +1 (719) 599-9026

   Jason Glasgow
   5 Cambridge Center, Floors 3-6
   Cambridge, MA  02142

   Phone: +1 (617) 575-1599

Faibish et al.         Expires January 9, 2011                [Page 10]