Network Working Group                                     Robert Thurlow
Internet Draft                                            May 2003
Document: draft-ietf-nfsv4-repl-mig-proto-01.txt



           A Server-to-Server Replication/Migration Protocol



Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   Discussion and suggestions for improvement are requested.  This
   document will expire in November, 2003. Distribution of this draft is
   unlimited.

Abstract

   NFS Version 4 [RFC3530] provided support for client/server
   interactions to support replication and migration, but left
   unspecified how replication and migration would be done.  This
   document is an initial draft of a protocol which could be used to
   transfer filesystem data and metadata for use with replication and
   migration services for NFS Version 4.








Expires: November 2003                                          [Page 1]


Title               A Replication/Migration Protocol            May 2003


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
   1.1.  Changes Since Last Revision  . . . . . . . . . . . . . . . 3
   1.2.  Shortcomings . . . . . . . . . . . . . . . . . . . . . . . 4
   1.3.  Rationale  . . . . . . . . . . . . . . . . . . . . . . . . 4
   1.4.  Basic structure  . . . . . . . . . . . . . . . . . . . . . 4
   2.  Common data types  . . . . . . . . . . . . . . . . . . . . . 5
   2.1.  Session, file and checkpoint IDs . . . . . . . . . . . . . 5
   2.2.  Offset, length and cookies . . . . . . . . . . . . . . . . 5
   2.3.  General status . . . . . . . . . . . . . . . . . . . . . . 5
   2.4.  From NFS Version 4 [RFC3530] . . . . . . . . . . . . . . . 6
   3.  Session Management . . . . . . . . . . . . . . . . . . . . . 7
   3.1.  Capabilities negotiation . . . . . . . . . . . . . . . . . 7
   3.2.  Security Negotiation . . . . . . . . . . . . . . . . . . . 8
   3.3.  OPEN_SESSION call  . . . . . . . . . . . . . . . . . . . . 8
   3.4.  CLOSE_SESSION call . . . . . . . . . . . . . . . . . . .  11
   4.  Data transfer  . . . . . . . . . . . . . . . . . . . . . .  12
   4.1.  Data transfer operations . . . . . . . . . . . . . . . .  12
   4.2.  Data transfer phase overview . . . . . . . . . . . . . .  12
   4.3.  SEND call  . . . . . . . . . . . . . . . . . . . . . . .  13
   4.4.  Data transfer operation description  . . . . . . . . . .  15
   4.4.1.  SEND_METADATA operation  . . . . . . . . . . . . . . .  15
   4.4.2.  SEND_FILE_DATA operation . . . . . . . . . . . . . . .  15
   4.4.3.  SEND_FILE_HOLE operation . . . . . . . . . . . . . . .  16
   4.4.4.  SEND_LOCK_STATE operation  . . . . . . . . . . . . . .  16
   4.4.5.  SEND_SHARE_STATE operation . . . . . . . . . . . . . .  16
   4.4.6.  SEND_DELEG_STATE operation . . . . . . . . . . . . . .  17
   4.4.7.  SEND_REMOVE operation  . . . . . . . . . . . . . . . .  17
   4.4.8.  SEND_RENAME operation  . . . . . . . . . . . . . . . .  18
   4.4.9.  SEND_LINK operation  . . . . . . . . . . . . . . . . .  18
   4.4.10.  SEND_SYMLINK operation  . . . . . . . . . . . . . . .  18
   4.4.11.  SEND_DIR_CONTENTS operation . . . . . . . . . . . . .  19
   4.4.12.  SEND_CLOSE operation  . . . . . . . . . . . . . . . .  19
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . .  19
   6.  Security Considerations  . . . . . . . . . . . . . . . . .  19
   7.  Appendix A: XDR Protocol Definition File . . . . . . . . .  20
   8.  Normative References . . . . . . . . . . . . . . . . . . .  28
   9.  Informative References . . . . . . . . . . . . . . . . . .  29
   10.  Author's Address  . . . . . . . . . . . . . . . . . . . .  30

1.  Introduction

   This document describes a proposed protocol to perform the data
   transfer involved with replication and migration, as the problem was
   described in [DESIGN]; familiarity with that document is assumed.  It
   is not yet proven by implementation experience, but is presented for
   collective work and discussion.

   Though data replication and transfer are needed in many areas, this
   document will focus primarily on solving the problem of providing
   replication and migration support between NFS Version 4 servers.  It
   is assumed that the reader has familiarity with NFS Version 4
   [RFC3530].

1.1.  Changes Since Last Revision

   Since the -00 version of this draft, the following major changes have
   been made:

   o    The protocol no longer uses XDR-formatted messages sent via TCP;
        it now uses RPC calls and replies.

   o    The elements used to transfer data and metadata are now
        operation arguments to a unified SEND RPC, so that an array of
        information about a particular file may be sent in one RPC call.

   o    Session management has been simplified to a single OPEN_SESSION
        call and a single CLOSE_SESSION call.  Sessions may also be
        multiplexed over the same connection.

   o    The protocol should now work in a continuous replication mode,
        where a transfer session stays up indefinitely and changes can
        be passed rapidly to replicas.

   o    Support for transferring delegation state has been added.

   o    Support for transferring hard links and symbolic links has been
        added.

   o    Zero-filled regions or "holes" are now sent as separate
        operations, rather than being treated as a special case of data
        transfers.

   o    ACLs and the object type are handled as part of the RMattrs
        type, rather than being separate.

1.2.  Shortcomings

   This draft has the following known shortcomings:

   o    it does not deal with [RSYNC]-like behaviour, which can compare
        source and destination files

   o    it introduces a capabilities negotiation feature which is not
        complete enough to be useful

   o    it does not fully specify compression algorithms which can be
        used

   o    it does not specify how it works with minor revisions to NFS
        Version 4


1.3.  Rationale

   The protocol presented below is a simple bulk-data transfer protocol
   with minimal traffic in the reverse direction.  It is believed that
   optimal performance is best achieved by a well-implemented source
   server sending the smallest set of change information to the
   destination.  The advantages of this protocol over data formats such
   as tar/pax/cpio (as defined by IEEE 1003.1 or ISO/IEC 9945-1) are:

   o    NFSv4 Access Control Lists (ACLs) and named attributes can be
        transferred

   o    The richer NFSv4 metadata set can be transferred

   o    Restarting of transfers can be achieved

   o    The bandwidth requirements approach the smallest possible.


1.4.  Basic structure

   This replication/migration protocol is optimized for bulk data
   transfer with a minimum of overhead.  The ideal case is where the
   source server can stream filesystem data (or just the changes made)
   to the destination.  An alternate [RSYNC]-like mode which supports
   both servers comparing files to determine differences has been
   discussed, but is not present in this draft.

   Unlike the previous version of this draft, this version specifies
   RPC [RFC1831] rather than just XDR [RFC1832] formatted messages over
   TCP.  Implementations MUST support operation over TCP and MAY support
   UDP and other transports supported by RPC.

   The protocol permits multiple "sessions" per TCP connection by using
   session identifiers in each RPC.  Sessions can be terminated and
   restarted at a later time.  Sessions used to update replicas can also
   be left in place continuously, so that changes to the master can be
   reflected on the replicas in near-real-time.

   The SEND RPC has been optimized by permitting an array of data and
   metadata updates to be sent in one RPC call, while the response
   permits the source server to know how far the destination got in
   applying the updates.

2.  Common data types


2.1.  Session, file and checkpoint IDs

   RMsession_id permits multiplexing transfer sessions on a single
   authenticated connection; the value is chosen arbitrarily by the
   source server.  RMcheckpoint is used to track the last RPC known to
   the destination so that restart can be done; a timestamp is supplied
   to help choose the earliest checkpoint.  RMfile_id is intended to be
   identical to the NFSv4 fileid attribute.

   typedef uint64_t RMsession_id;

   typedef uint64_t RMfile_id;

   struct RMcheckpoint {
           nfstime4 time;
           uint64_t id;
   };
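
   As a rough illustration of how the timestamp might help choose the
   earliest checkpoint, the sketch below mirrors RMcheckpoint in Python.
   The ordering rule (by time, then by id) and the helper names
   checkpoint_key and earliest_checkpoint are assumptions made for this
   example; the draft does not define how checkpoints are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NfsTime4:
    """Mirror of nfstime4: seconds plus nanoseconds."""
    seconds: int
    nseconds: int


@dataclass(frozen=True)
class RMCheckpoint:
    """Mirror of struct RMcheckpoint."""
    time: NfsTime4
    id: int


def checkpoint_key(cp):
    # Assumed ordering: timestamp first, then id as a tie-break.
    return (cp.time.seconds, cp.time.nseconds, cp.id)


def earliest_checkpoint(a, b):
    """Return whichever checkpoint sorts first under the assumed ordering."""
    return min(a, b, key=checkpoint_key)
```

   A source and destination that disagree about the last checkpoint
   could both apply such a rule and restart from the result.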

2.2.  Offset, length and cookies

   These types are chosen for compatibility with NFSv4.


   typedef uint64_t        RMoffset;
   typedef uint64_t        RMlength;
   typedef uint64_t        RMcookie;

2.3.  General status

   The status in OPEN_SESSION and SEND responses, and the reason given
   in a CLOSE_SESSION call, shall be a value from this set.

   enum RMstatus {
           RM_OK = 0,
           RMERR_PERM = 1,
           RMERR_IO = 5,
           RMERR_EXISTS = 17
   };
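
   One way a destination might produce these values is by translating
   local filesystem errors.  The sketch below is only an illustration:
   the mapping policy (EACCES treated as RMERR_PERM, anything
   unrecognised reported as RMERR_IO) and the helper name
   status_from_oserror are assumptions, not part of this draft.

```python
import errno
from enum import IntEnum


class RMStatus(IntEnum):
    """Mirror of enum RMstatus."""
    RM_OK = 0
    RMERR_PERM = 1
    RMERR_IO = 5
    RMERR_EXISTS = 17


# Assumed policy: a small table of local errno values to RMstatus codes.
_ERRNO_TO_STATUS = {
    errno.EPERM: RMStatus.RMERR_PERM,
    errno.EACCES: RMStatus.RMERR_PERM,
    errno.EEXIST: RMStatus.RMERR_EXISTS,
    errno.EIO: RMStatus.RMERR_IO,
}


def status_from_oserror(exc):
    """Map an OSError to an RMStatus; unknown errors become RMERR_IO."""
    return _ERRNO_TO_STATUS.get(exc.errno, RMStatus.RMERR_IO)
```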

2.4.  From NFS Version 4 [RFC3530]

   The following definitions are imported from NFS Version 4.

   typedef uint32_t        bitmap4<>;
   typedef opaque          attrlist4<>;
   typedef opaque          utf8string<>;
   typedef opaque          utf8str_mixed<>;
   typedef opaque          utf8str_cis<>;

   struct nfstime4 {
           int64_t         seconds;
           uint32_t        nseconds;
   };

   enum nfs_ftype4 {
           NF4REG          = 1,    /* Regular File */
           NF4DIR          = 2,    /* Directory */
           NF4BLK          = 3,    /* Special File - block device */
           NF4CHR          = 4,    /* Special File - character device */
           NF4LNK          = 5,    /* Symbolic Link */
           NF4SOCK         = 6,    /* Special File - socket */
           NF4FIFO         = 7,    /* Special File - fifo */
           NF4ATTRDIR      = 8,    /* Attribute Directory */
           NF4NAMEDATTR    = 9     /* Named Attribute */
   };

   typedef uint32_t        acetype4;
   typedef uint32_t        aceflag4;
   typedef uint32_t        acemask4;
   struct nfsace4 {
           acetype4        type;
           aceflag4        flag;
           acemask4        access_mask;
           utf8string      who;
   };

   typedef nfsace4         fattr4_acl<>;

   struct fattr4 {
           bitmap4         attrmask;
           attrlist4       attr_vals;
   };


3.  Session Management

   Security flavors supported by the destination server may be known in
   advance, or may be discovered via an initial NULL RPC call which uses
   the SPNEGO GSS-API pseudo-mechanism as defined in [RFC2478].  A
   security flavor normally does not change through the life of the
   session.

   A transfer session is created or resumed with the OPEN_SESSION call
   and terminated normally or abnormally with the CLOSE_SESSION call.
   This is simpler than the previous draft of this protocol.  The
   OPEN_SESSION call permits negotiation of capabilities and of the
   checkpoint to be used for a restart, while CLOSE_SESSION permits
   abnormal as well as normal termination.

3.1.  Capabilities negotiation

   Parameters in the OPEN_SESSION call express certain capabilities of
   the source server and provide an indication of properties of the data
   to be transferred.  The destination server is responsible for
   reacting to these capabilities.  If the desired capabilities are not
   acceptable to the destination, the response can bid down capabilities
   by clearing capabilities bits, or reject the session by failing the
   RPC.  If the lowered capabilities bid by the destination server are
   not acceptable to the source server, the session should be terminated
   with CLOSE_SESSION.

   Currently, only three capabilities are specified; we expect to add
   more through working group effort.  Specified so far are the
   following:

   o    RM_UTF8NAMES - source server supports and expects to send
        filenames encoded in UTF-8 format.  If the destination server
        does not support UTF-8 filenames, it should convey that by
        clearing the flag.

   o    RM_FHPRESERVE - source server is willing to attempt to preserve
        filehandles by sending them as part of each SEND_METADATA
        operation.  If the destination can issue filehandles which it
        did not generate, and can work with the filehandle format used
        by the implementation identified by the RMimplementation field
        in the OPEN_SESSION arguments, it can accept this offer;
        otherwise it should clear the bit to indicate refusal.  Since
        the destination may refuse to preserve filehandles, the source
        server should either refuse to transfer data if the destination
        clears this flag, or advise clients of the possibility that
        filehandles will change via the [RFC3530] FH4_VOL_MIGRATION bit.

   o    RM_FILEID - in combination with RM_FHPRESERVE, the source server
        is willing to attempt to preserve file_ids as well.  If the
        destination can issue file_ids which it did not generate, and
        can work with the file_id format used by the implementation
        identified by RMimplementation field in the OPEN_SESSION
        arguments, it can accept this offer; otherwise it should clear
        the bit to indicate refusal.
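
   The bid-down exchange described in this section can be sketched as
   follows.  The numeric bit values are hypothetical (this section does
   not assign them), and the rule that RM_FILEID is dropped whenever
   RM_FHPRESERVE is refused is one reading of "in combination with",
   not a stated requirement.

```python
# Hypothetical bit assignments, for illustration only.
RM_UTF8NAMES = 0x1
RM_FHPRESERVE = 0x2
RM_FILEID = 0x4


def destination_bid(offered, supported):
    """Destination side: clear every offered bit it cannot honour."""
    bid = offered & supported
    # Assumed rule: RM_FILEID only makes sense alongside RM_FHPRESERVE.
    if not (bid & RM_FHPRESERVE):
        bid &= ~RM_FILEID
    return bid


def source_accepts(bid, required):
    """Source side: proceed only if every required bit survived the bid."""
    return (bid & required) == required
```

   If source_accepts returns False, the source would terminate the
   session with CLOSE_SESSION as described above.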


3.2.  Security Negotiation

   Security for this protocol is provided by the RPCSEC_GSS mechanism
   defined in [RFC2203], with the same mandatory-to-implement GSS-API
   mechanisms as [RFC3530], namely the Kerberos V5 and LIPKEY mechanisms
   defined in [RFC1964] and [RFC2847].  If a client and server implement
   more than one of these mechanisms, the first RPC call should be an
   RPC NULL procedure call with the RPCSEC_GSS auth flavor and the
   SPNEGO GSS-API pseudo-mechanism populated with the mechanisms
   acceptable to the client.  The server should respond with the
   preferred mechanism, if any, and this mechanism will be used for all
   sessions on this connection.

3.3.  OPEN_SESSION call

   SYNOPSIS

   OPEN_SESSIONargs -> OPEN_SESSIONres

   ARGUMENT

   struct RMnewsession {
           utf8string src_path;
           utf8string dest_path;
           uint64_t fs_size;
           uint64_t tr_size;
           uint64_t tr_objs;
   };

   struct RMoldsession {
           RMcheckpoint check_id;
           uint64_t rem_size;
           uint64_t rem_objs;
   };

   union RMopeninfo switch (bool new) {
    case TRUE:
           RMnewsession newinfo;
    case FALSE:
           RMoldsession oldinfo;
   };

   typedef uint64_t RMcapability;
   typedef utf8str_cis RMimplementation<>;

   struct OPEN_SESSIONargs {
           RMsession_id session_id;
           RMcomp_type comp_list<>;
           RMcapability capabilities;
           RMimplementation implementation;
           RMopeninfo info;
   };

   RESULT

   struct RMopenok {
           RMcheckpoint check_id;
           RMcomp_type comp_alg;
           RMcapability capabilities;
   };

   union RMopenresp switch (RMstatus status) {
    case RM_OK:
           RMopenok info;
    default:
           void;
   };

   struct OPEN_SESSIONres {
           RMsession_id session_id;
           RMopenresp response;
   };


   OPEN_SESSION is a request to create or resume a transfer session to
   send the full or incremental contents of one filesystem.  For either
   new or resuming sessions, the source server supplies the following
   information:

   o    session_id - a unique number assigned by the source server to
        the transfer session, or the number of the session to be
        resumed.

   o    comp_list - a list of compression types the source server can
        use to compress data.

   o    capabilities - the bitmask used to negotiate as described in
        Section 3.1.

   o    implementation - a descriptor of the operating system and
        filesystem implementation, with version information, used by the
        source server; this is to permit preservation of filehandles and
        fileids if the destination server runs a compatible version.
        This field is constructed at the pleasure of the source server
        and need only be parsed properly by a destination server running
        the same operating system code.

   For new sessions, the source server supplies the following
   information:

   o    src_path - full path name to the filesystem on source server

   o    dest_path - full path name to the filesystem on the destination
        server

   o    fs_size - total size of the filesystem data

   o    tr_size - amount of filesystem data to be sent during this
        transfer session

   o    tr_objs - number of objects to be sent or updated in this
        transfer session

   For resuming sessions, the source server supplies the following
   information:

   o    check_id - checkpoint ID for the last RPC believed sent

   o    rem_size - remaining amount of filesystem data to be sent

   o    rem_objs - remaining number of objects to be sent or updated

   The response from the destination server may reject the session
   proposal with an error code, may accept the proposal outright, or may
   bid down capabilities or state that it needs to start from an earlier
   checkpoint than that proposed by the source.  The destination will
   also choose a compression algorithm from the list the source
   provided.  The source may issue a CLOSE_SESSION call if capabilities
   negotiated down are not acceptable to it.  Once the OPEN_SESSION RPC
   has been completed, SEND RPCs with data transfer operations will be
   sent until a CLOSE_SESSION RPC is sent.
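
   A destination's choices when answering OPEN_SESSION might look like
   the sketch below.  The compression names are hypothetical
   (RMcomp_type values are not defined in this section), the "first
   match in the source's list" preference is an assumption, and
   checkpoints are modelled as (seconds, nseconds, id) tuples for
   brevity.

```python
def choose_compression(comp_list, supported):
    """Return the first algorithm in the source's list that the
    destination supports, or None if there is no overlap."""
    for alg in comp_list:
        if alg in supported:
            return alg
    return None


def choose_restart_checkpoint(proposed, last_committed):
    """Return the earlier of the source's proposed checkpoint and the
    last one the destination committed (tuples compare field by field)."""
    return min(proposed, last_committed)
```

   Returning the earlier checkpoint corresponds to the destination
   stating that it needs to start from an earlier point than the source
   proposed.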



3.4.  CLOSE_SESSION call

   SYNOPSIS

   CLOSE_SESSIONargs -> CLOSE_SESSIONres

   ARGUMENT

   struct RMbadclose {
           RMcheckpoint check_id;
           bool restartable;
   };

   union RMcloseinfo switch (RMstatus status) {
    case RM_OK:
           void;
    default:
           RMbadclose info;
   };

   struct CLOSE_SESSIONargs {
           RMsession_id session_id;
           RMcloseinfo info;
   };

   RESULT

   struct CLOSE_SESSIONres {
           RMsession_id session_id;
           RMcheckpoint check_id;
   };

   CLOSE_SESSION is used by the source server to terminate the session,
   either normally or abnormally.

   A normal close is handled by setting the RMcloseinfo status to RM_OK.
   Upon a normal close, a migration event is considered complete and the
   source will begin to refer clients to the destination server.

   An abnormal close is handled by setting the status to something other
   than RM_OK and supplying the last checkpoint the source server
   believes it sent plus an indication of whether it is possible to
   restart the transfer from that checkpoint.  The destination server
   responds with the last checkpoint it has successfully committed.  The
   destination server should attempt to save the state of the aborted
   session for a period of at least one hour.

4.  Data transfer


4.1.  Data transfer operations

   Data transfer is accomplished by the SEND RPC, which takes an array
   of unions to permit a variety of transfer operations to be sent in
   each RPC.  All operations must pertain to one filesystem object,
   since the RMfile_id is provided for each SEND RPC, not for each
   operation.  Each operation in the array has an RMstatus in the
   response, so the source server can track how much was done if the
   call failed.  Processing stops at the first failure, and the SEND
   RPC response status is set to the first failure status.

   The following transfer operations are supported:

   o    SEND_METADATA - send metadata about object

   o    SEND_FILE_DATA - send file data

   o    SEND_FILE_HOLE - send a description of a hole in a file

   o    SEND_LOCK_STATE - send file lock state

   o    SEND_SHARE_STATE - send share modes state

   o    SEND_DELEG_STATE - send delegation state

   o    SEND_REMOVE - send an object removal transaction

   o    SEND_RENAME - send an object rename transaction

   o    SEND_LINK - send an object link transaction

   o    SEND_SYMLINK - send an object symlink transaction

   o    SEND_DIR_CONTENTS - send names of objects in a directory

   o    SEND_CLOSE - signal completion of object


4.2.  Data transfer phase overview

   The source server processes filesystem objects in a known order that
   permits checkpointing and restarting after a problem or an operator
   abort.  Full transfers should be ordered so that objects which are
   needed, such as directories and link targets, are present when
   references are made to them.  Incremental transfers should be done in
   the order changes were made on the source server, if possible; if
   not possible, the order described for full transfers is acceptable.

   For files which are to be created or updated, SEND_METADATA is sent
   first, then SEND_FILE_DATA operations will be sent.  If outstanding
   lock, share or delegation state for an object exists on the source
   server, it will be sent via SEND_LOCK_STATE, SEND_SHARE_STATE or
   SEND_DELEG_STATE operations after all data has been transferred.
   SEND_CLOSE is used to signal that all changes to a file are complete.
   Directories are created with SEND_METADATA, but are not populated
   until their objects are created, so the SEND_METADATA is followed by
   SEND_CLOSE.

   Ideally, the source server will track all filesystem changes via a
   mechanism such as [DMAPI], and will be able to reflect remove, rename
   and link changes via SEND_REMOVE, SEND_RENAME and SEND_LINK
   operations.  If the source server cannot capture all create and
   remove operations on a directory reliably, SEND_DIR_CONTENTS should
   be used.  This operation lists all directory entries for a source
   server, so that the destination server can compute what items should
   be removed.  This is less reliable than being able to send
   SEND_REMOVE, SEND_RENAME and SEND_LINK operations, and should be used
   only when the underlying filesystem cannot record changes as they
   happen.

   Named attributes for a filesystem object are handled with
   SEND_METADATA operations with file type NF4NAMEDATTR.  This will be
   "nested", i.e. it will be understood that the named attribute is
   associated with the parent object handled.  SEND_CLOSE is used to
   indicate that all data and metadata of the named attribute have been
   transferred, and must be issued before another named attribute can be
   handled and before the SEND_CLOSE for the parent object is issued.
   Named attributes may not themselves have named attributes.
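
   The ordering rules in this section can be sketched as a function
   that lists the operations for one object.  The tuple representation
   and the name ops_for_object are illustrative only, and placing named
   attributes after the state operations is an assumption; the text
   only requires that each named attribute be closed before the parent.

```python
def ops_for_object(name, data_blocks, named_attrs=(), state_ops=()):
    """Return (operation, detail) tuples in the assumed sending order.

    named_attrs is an iterable of (attr_name, attr_data_blocks) pairs.
    """
    ops = [("SEND_METADATA", name)]
    ops += [("SEND_FILE_DATA", block) for block in data_blocks]
    ops += list(state_ops)  # lock/share/delegation state follows the data
    for attr_name, attr_blocks in named_attrs:
        # Each named attribute is nested inside the parent object and is
        # closed before the next attribute or the parent's SEND_CLOSE.
        ops.append(("SEND_METADATA", attr_name))
        ops += [("SEND_FILE_DATA", block) for block in attr_blocks]
        ops.append(("SEND_CLOSE", attr_name))
    ops.append(("SEND_CLOSE", name))
    return ops
```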

4.3.  SEND call

   SYNOPSIS

   SENDargs -> SENDres

   ARGUMENT

   union RMsendargs switch (RMoptype sendtype) {
    case OP_SEND_METADATA:
           SEND_METADATA metadata;
    case OP_SEND_FILE_DATA:
           SEND_FILE_DATA data;
    case OP_SEND_FILE_HOLE:
           SEND_FILE_HOLE hole;
    case OP_SEND_LOCK_STATE:
           SEND_LOCK_STATE lock;
    case OP_SEND_SHARE_STATE:
           SEND_SHARE_STATE share;
    case OP_SEND_DELEG_STATE:
           SEND_DELEG_STATE deleg;
    case OP_SEND_REMOVE:
           SEND_REMOVE remove;
    case OP_SEND_RENAME:
           SEND_RENAME rename;
    case OP_SEND_LINK:
           SEND_LINK link;
    case OP_SEND_SYMLINK:
           SEND_SYMLINK symlink;
    case OP_SEND_DIR_CONTENTS:
           SEND_DIR_CONTENTS dirc;
    case OP_SEND_CLOSE:
           void;
   };

   struct SEND1args {
           RMsession_id session_id;
           RMcheckpoint check_id;
           RMfile_id file_id;
           RMsendargs sendarray<>;
   };

   RESULT

   union RMsendres switch (RMoptype sendtype) {
    case OP_SEND_METADATA:
    case OP_SEND_FILE_DATA:
    case OP_SEND_FILE_HOLE:
    case OP_SEND_LOCK_STATE:
    case OP_SEND_SHARE_STATE:
    case OP_SEND_DELEG_STATE:
    case OP_SEND_REMOVE:
    case OP_SEND_RENAME:
    case OP_SEND_LINK:
    case OP_SEND_SYMLINK:
    case OP_SEND_DIR_CONTENTS:
    case OP_SEND_CLOSE:
           RMstatus status;
   };

   struct SEND1res {
           RMsession_id session_id;
           RMcheckpoint check_id;
           RMfile_id file_id;
           RMsendres resarray<>;
           RMstatus status;
   };

   The SEND RPC batches data transfer operations together and sends them
   to the destination server to operate on one file and with one
   checkpoint.  The destination server may fail a call in the middle of
   the array by setting the return status for that operation to
   something other than RM_OK, and will not process further operations.
   The call will be failed with that status as well.
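
   The stop-at-first-failure behaviour might be implemented along the
   lines of the sketch below.  The callback apply_op is a hypothetical
   stand-in for the destination's per-operation handler, and returning
   statuses only for the operations actually attempted is an
   assumption; the draft does not say how unprocessed entries are
   reported.

```python
RM_OK = 0
RMERR_IO = 5


def process_send(ops, apply_op):
    """Apply operations in order and stop at the first failure.

    Returns (resarray, status): one status per operation attempted and
    the overall call status (the first failure, or RM_OK).
    """
    resarray = []
    for op in ops:
        status = apply_op(op)
        resarray.append(status)
        if status != RM_OK:
            return resarray, status
    return resarray, RM_OK
```

   The length of resarray tells the source how far the destination got
   in applying the updates.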

4.4.  Data transfer operation description


4.4.1.  SEND_METADATA operation

   SYNOPSIS

   struct SEND_METADATA {
           utf8string obj_name;
           RMattrs attrs;
   };

   SEND_METADATA announces that we are about to transfer information
   about a particular filesystem object.  If an object does not exist on
   the destination, it will be created with the given obj_name and
   attributes supplied.  If the object exists and is the correct
   type, its attributes will be updated.  If an object of the same name
   but a different type exists, it will be removed and recreated with
   this information.  If a SEND_METADATA has not followed a SEND_CLOSE,
   it may have the is_named_attr flag set, in which case the object is a
   named attribute of the most recent object identified by a
   SEND_METADATA.

4.4.2.  SEND_FILE_DATA operation

   SYNOPSIS

   struct SEND_FILE_DATA {
           RMoffset offset;
           RMlength length;
           opaque data<>;
   };

   SEND_FILE_DATA sends a block of data for a regular file.  The range
   is identified by the offset, length pair as starting at seek position
   'offset' and extending through 'offset+length-1', inclusive.
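
   As a small worked example of the inclusive range, the helper below
   (a hypothetical name, not part of the protocol) computes the first
   and last byte positions covered by an offset, length pair.

```python
def byte_range(offset, length):
    """Return (first, last) byte positions covered, both inclusive."""
    if length <= 0:
        raise ValueError("length must be positive for a non-empty range")
    return offset, offset + length - 1
```

   For instance, offset 4096 with length 8192 covers bytes 4096 through
   12287.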

4.4.3.  SEND_FILE_HOLE operation

   SYNOPSIS

   struct SEND_FILE_HOLE {
           RMoffset offset;
           RMlength length;
   };

   SEND_FILE_HOLE sends a description of a "hole", or a zero-filled and
   usually unallocated block of data.  A source server which does sparse
   allocation and which can learn via APIs what parts of a file are
   unallocated can use this to describe the hole without transferring
   the block of zeros.
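
   The effect of mixing SEND_FILE_DATA and SEND_FILE_HOLE can be
   sketched with an in-memory file image.  The tuple format and the
   name apply_extent_ops are assumptions for this example; a real
   destination would more likely leave the hole range unallocated than
   write zeros.

```python
def apply_extent_ops(ops, size):
    """Rebuild file contents from ("data", offset, payload) and
    ("hole", offset, length) tuples applied in order."""
    image = bytearray(size)  # starts zero-filled
    for op in ops:
        if op[0] == "data":
            _, offset, payload = op
            image[offset:offset + len(payload)] = payload
        elif op[0] == "hole":
            _, offset, length = op
            # Zero-fill the range; no payload crossed the wire for it.
            image[offset:offset + length] = bytes(length)
        else:
            raise ValueError("unknown op: %r" % (op[0],))
    return bytes(image)
```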

4.4.4.  SEND_LOCK_STATE operation

   SYNOPSIS

   enum RMlocktype {
           RM_NOLOCK = 0,
           RM_READLOCK = 1,
           RM_WRITELOCK = 2
   };

   struct SEND_LOCK_STATE {
           RMowner owner;
           RMclientid clientid;
           RMoffset offset;
           RMlength length;
           RMlocktype type;
           RMstateid id;
   };

   SEND_LOCK_STATE transfers ownership and range information about
   outstanding byte-range locks to the destination server.  The lock
   stateid is transferred so that the client need not reestablish the
   lock after migration.  RM_NOLOCK is included to support continuous
   replication by permitting locks on replicas to be cleared.
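
   A replica's handling of SEND_LOCK_STATE, including clearing via
   RM_NOLOCK, might resemble the sketch below.  The dict-based lock
   table and exact-range matching when clearing are assumptions; the
   draft does not say how a cleared lock is matched to an existing one.

```python
RM_NOLOCK, RM_READLOCK, RM_WRITELOCK = 0, 1, 2


def apply_lock_state(table, owner, clientid, offset, length, locktype, stateid):
    """Record or clear one byte-range lock in a dict-based lock table."""
    key = (owner, clientid, offset, length)
    if locktype == RM_NOLOCK:
        table.pop(key, None)  # clearing an absent lock is a no-op here
    else:
        table[key] = (locktype, stateid)
    return table
```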

4.4.5.  SEND_SHARE_STATE operation

   SYNOPSIS

   typedef uint32_t RMaccess;
   typedef uint32_t RMdeny;

   struct SEND_SHARE_STATE {
           RMowner owner;
           RMclientid client;
           RMaccess accmode;
           RMdeny denymode;
   };

   SEND_SHARE_STATE transfers ownership and mode information about
   outstanding share reservations to the destination server.

4.4.6.  SEND_DELEG_STATE operation

   SYNOPSIS

   enum RMdelegtype {
           RM_NODELEG = 0,
           RM_READDELEG = 1,
           RM_WRITEDELEG = 2
   };

   struct SEND_DELEG_STATE {
           RMclientid client;
           RMdelegtype type;
           RMstateid id;
   };

   SEND_DELEG_STATE transfers ownership and type information about
   outstanding file delegations to the destination server.  RM_NODELEG
   is included to support continuous replication by permitting
   delegations on replicas to be cleared.

4.4.7.  SEND_REMOVE operation

   SYNOPSIS

   struct SEND_REMOVE {
           utf8string name;
   };

   SEND_REMOVE documents a remove event on the object identified; upon
   receipt, the destination server will remove the object as well.

4.4.8.  SEND_RENAME operation

   SYNOPSIS

   struct SEND_RENAME {
           utf8string old_name;
           utf8string new_name;
   };

   SEND_RENAME documents a rename event on the object identified by
   old_name; upon receipt, the destination server will rename the object
   in the destination filesystem.  Full paths may be used relative to
   the root of the source filesystem.

4.4.9.  SEND_LINK operation

   SYNOPSIS

   struct SEND_LINK {
           utf8string old_name;
           utf8string new_name;
   };

   SEND_LINK documents the creation of a hard link from the old_name to
   the new_name; upon receipt, the destination server will link the
   objects in the destination filesystem.  Full paths may be used
   relative to the root of the source filesystem.

4.4.10.  SEND_SYMLINK operation

   SYNOPSIS

   struct SEND_SYMLINK {
           utf8string old_name;
           utf8string new_name;
   };

   SEND_SYMLINK documents the creation of a symbolic link named new_name
   whose target is old_name; upon receipt, the destination server will
   create that link in the destination filesystem.  The old_name
   value is not checked in any way and can be arbitrary textual data.
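
   The point that old_name is stored verbatim can be sketched as
   follows, assuming a POSIX-like destination.  The apply_symlink
   helper is hypothetical, and the handling of a leading "/" in
   new_name is an assumption.

```python
import os


def apply_symlink(dest_root, old_name, new_name):
    """Create the link new_name under dest_root pointing at old_name."""
    link_path = os.path.join(dest_root, new_name.lstrip("/"))
    # old_name is written as-is: it is not resolved or validated, and
    # the target does not need to exist on the destination.
    os.symlink(old_name, link_path)
    return link_path
```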

4.4.11.  SEND_DIR_CONTENTS operation

   SYNOPSIS

   struct SEND_DIR_CONTENTS {
           RMcookie cookie;
           bool eof;
           utf8string names<>;
   };

   SEND_DIR_CONTENTS is used to account for removals and renames when
   source servers cannot record the events such that they may be sent
   with SEND_REMOVE and SEND_RENAME.  The contents are listed, in no
   predictable order, so that the destination can determine which of its
   entries are no longer found on the source.  Each SEND_DIR_CONTENTS
   includes an opaque directory cookie to represent the starting
   location of the block on the source server; eof is set on the last
   block.  Any item existing on the destination that is not listed in a
   SEND_DIR_CONTENTS operation will be removed.
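
   The reconciliation described above can be sketched as follows.  The
   reconcile_directory name and the tuple layout are assumptions for
   illustration; waiting for the eof block before removing anything is
   also an assumption, made so that a partial listing never causes a
   removal.

```python
def reconcile_directory(blocks, dest_entries):
    """Return the destination entries to remove, in sorted order.

    blocks: iterable of (cookie, eof, names) tuples, mirroring the
        fields of SEND_DIR_CONTENTS.
    dest_entries: names currently present in the destination directory.
    """
    seen = set()
    complete = False
    for _cookie, eof, names in blocks:
        seen.update(names)
        if eof:
            complete = True
            break
    if not complete:
        # The listing is partial until the eof block arrives.
        raise ValueError("directory listing ended before eof")
    return sorted(set(dest_entries) - seen)
```

   With blocks listing "b", "a" and then "d" (eof set), a destination
   directory holding a, b, c, d and e would remove c and e.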

4.4.12.  SEND_CLOSE operation

   SYNOPSIS

   void;

   SEND_CLOSE is used to announce that all data and metadata changes for
   a particular object have been completed.

5.  IANA Considerations

   The replication/migration protocol will use a well-known RPC program
   number at which destination servers will register.  The author will
   acquire an RPC program number for this purpose.

6.  Security Considerations

   NFS Version 4 is the primary impetus behind a replication/migration
   protocol, so this protocol should mandate a strong security scheme in
   a manner comparable with NFS Version 4.  Implementations of this
   protocol MUST support the RPCSEC_GSS security flavor as defined in
   [RFC2203] and MUST also support the Kerberos V5 and LIPKEY mechanisms
   as defined in [RFC1964] and [RFC2847].  The particular mechanism
   chosen for sessions is determined by the use of SPNEGO [RFC2478] on
   the initial call, which should be a NULL RPC.

7.  Appendix A: XDR Protocol Definition File


   /*
    * Copyright (C) The Internet Society (1998,1999,2000,2001,2002).
    *  All Rights Reserved.
    */

   /*
    *      repl-mig.x
    */

   %#pragma ident  "@(#)repl-mig.x 1.4     03/05/27"

   /*
    * From RFC3530
    */
   typedef uint32_t        bitmap4<>;
   typedef opaque          attrlist4<>;
   typedef opaque          utf8string<>;
   typedef opaque          utf8str_mixed<>;
   typedef opaque          utf8str_cis<>;

   struct nfstime4 {
           int64_t         seconds;
           uint32_t        nseconds;
   };

   enum nfs_ftype4 {
           NF4REG          = 1,    /* Regular File */
           NF4DIR          = 2,    /* Directory */
           NF4BLK          = 3,    /* Special File - block device */
           NF4CHR          = 4,    /* Special File - character device */
           NF4LNK          = 5,    /* Symbolic Link */
           NF4SOCK         = 6,    /* Special File - socket */
           NF4FIFO         = 7,    /* Special File - fifo */
           NF4ATTRDIR      = 8,    /* Attribute Directory */
           NF4NAMEDATTR    = 9     /* Named Attribute */
   };

   typedef uint32_t        acetype4;
   typedef uint32_t        aceflag4;
   typedef uint32_t        acemask4;

   struct nfsace4 {
           acetype4        type;
           aceflag4        flag;
           acemask4        access_mask;
           utf8str_mixed   who;
   };

   typedef nfsace4         fattr4_acl<>;

   struct fattr4 {
           bitmap4         attrmask;
           attrlist4       attr_vals;
   };

   /*
    * For session, message, file and checkpoint IDs
    */
   typedef uint64_t RMsession_id;

   typedef uint64_t RMfile_id;

   struct RMcheckpoint {
           nfstime4 time;
           uint64_t id;
   };

   /*
    * For compression algorithm negotiation
    */
   enum RMcomp_type {
           RM_NULLCOMP = 0,
           RM_COMPRESS = 1,
           RM_ZIP = 2
   };

   /*
    * For capabilities negotiation
    */
   typedef utf8str_cis RMimplementation<>;
   typedef uint64_t RMcapability;
   const   RM_UTF8NAMES = 0x00000001;
   const   RM_FHPRESERVE = 0x00000002;

   /*
    * For general status
    */
   enum RMstatus {
           RM_OK = 0,
           RMERR_PERM = 1,
           RMERR_IO = 5,
           RMERR_EXISTS = 17
   };

   /*
    * Attributes
    */
   struct RMattrs {
           fattr4  attr;
           nfs_ftype4 obj_type;
           fattr4_acl obj_acl;
           bool is_named_attr;
   };

   /*
    * Offset, length and cookies
    */
   typedef uint64_t        RMoffset;
   typedef uint64_t        RMlength;
   typedef uint64_t        RMcookie;

   /*
    * Owner
    */
   typedef utf8str_mixed   RMowner;

   /*
    * Lock and share supporting definitions
    */
   struct RMclientid {
           utf8string name;
           opaque address<>;
   };

   struct RMstateid {
           uint32_t        seqid;
           opaque          other[12];
   };

   enum RMlocktype {
           RM_NOLOCK = 0,
           RM_READLOCK = 1,
           RM_WRITELOCK = 2
   };

   typedef uint32_t RMaccess;
   typedef uint32_t RMdeny;

   enum RMdelegtype {
           RM_NODELEG = 0,
           RM_READDELEG = 1,
           RM_WRITEDELEG = 2
   };

   /*
    * Protocol elements - session control
    */
   struct RMnewsession {
           utf8string src_path;
           utf8string dest_path;
           uint64_t fs_size;
           uint64_t tr_size;
           uint64_t tr_objs;
   };

   struct RMoldsession {
           RMcheckpoint check_id;
           uint64_t rem_size;
           uint64_t rem_objs;
   };

   union RMopeninfo switch (bool new) {
    case TRUE:
           RMnewsession newinfo;
    case FALSE:
           RMoldsession oldinfo;
   };

   struct OPEN_SESSIONargs {
           RMsession_id session_id;
           RMcomp_type comp_list<>;
           RMcapability capabilities;
           RMimplementation impl;
           RMopeninfo info;
   };

   struct RMopenok {
           RMcheckpoint check_id;
           RMcomp_type comp_alg;
           RMcapability capabilities;
   };

   union RMopenresp switch (RMstatus status) {
    case RM_OK:
           RMopenok info;
    default:
           void;
   };

   struct OPEN_SESSIONres {
           RMsession_id session_id;
           RMopenresp response;
   };

   struct RMbadclose {
           RMcheckpoint check_id;
           bool restartable;
   };

   union RMcloseinfo switch (RMstatus status) {
    case RM_OK:
           void;
    default:
           RMbadclose info;
   };

   struct CLOSE_SESSIONargs {
           RMsession_id session_id;
           RMcloseinfo info;
   };

   struct CLOSE_SESSIONres {
           RMsession_id session_id;
           RMcheckpoint check_id;
   };

   /*
    * Protocol elements - data transfer
    */
   enum RMoptype {
           OP_SEND_METADATA = 1,
           OP_SEND_FILE_DATA = 2,
           OP_SEND_FILE_HOLE = 3,
           OP_SEND_LOCK_STATE = 4,
           OP_SEND_SHARE_STATE = 5,
           OP_SEND_DELEG_STATE = 6,
           OP_SEND_REMOVE = 7,
           OP_SEND_RENAME = 8,
           OP_SEND_LINK = 9,
           OP_SEND_SYMLINK = 10,
           OP_SEND_DIR_CONTENTS = 11,
           OP_SEND_CLOSE = 12
   };

   /*
    * Data and metadata send items
    */
   struct SEND_METADATA {
           utf8string obj_name;
           RMattrs attrs;
   };

   struct SEND_FILE_DATA {
           RMoffset offset;
           RMlength length;
           opaque data<>;
   };

   struct SEND_FILE_HOLE {
           RMoffset offset;
           RMlength length;
   };

   struct SEND_LOCK_STATE {
           RMowner owner;
           RMclientid client;
           RMoffset offset;
           RMlength length;
           RMlocktype type;
           RMstateid id;
   };

   struct SEND_SHARE_STATE {
           RMowner owner;
           RMclientid client;
           RMaccess accmode;
           RMdeny denymode;
   };

   struct SEND_DELEG_STATE {
           RMclientid client;
           RMdelegtype type;
           RMstateid id;
   };

   struct SEND_REMOVE {
           utf8string name;
   };

   struct SEND_RENAME {
           utf8string old_name;
           utf8string new_name;
   };

   struct SEND_LINK {
           utf8string old_name;
           utf8string new_name;
   };

   struct SEND_SYMLINK {
           utf8string old_name;
           utf8string new_name;
   };

   struct SEND_DIR_CONTENTS {
           RMcookie cookie;
           bool eof;
           utf8string names<>;
   };

   /* no parameters for SEND_CLOSE */

   union RMsendargs switch (RMoptype sendtype) {
    case OP_SEND_METADATA:
           SEND_METADATA metadata;
    case OP_SEND_FILE_DATA:
           SEND_FILE_DATA data;
    case OP_SEND_FILE_HOLE:
           SEND_FILE_HOLE hole;
    case OP_SEND_LOCK_STATE:
           SEND_LOCK_STATE lock;
    case OP_SEND_SHARE_STATE:
           SEND_SHARE_STATE share;
    case OP_SEND_DELEG_STATE:
           SEND_DELEG_STATE deleg;
    case OP_SEND_REMOVE:
           SEND_REMOVE remove;
    case OP_SEND_RENAME:
           SEND_RENAME rename;
    case OP_SEND_LINK:
           SEND_LINK link;
    case OP_SEND_SYMLINK:
           SEND_SYMLINK symlink;
    case OP_SEND_DIR_CONTENTS:
           SEND_DIR_CONTENTS dirc;
    case OP_SEND_CLOSE:
           void;
   };

   union RMsendres switch (RMoptype sendtype) {
    case OP_SEND_METADATA:
    case OP_SEND_FILE_DATA:
    case OP_SEND_FILE_HOLE:
    case OP_SEND_LOCK_STATE:
    case OP_SEND_SHARE_STATE:
    case OP_SEND_DELEG_STATE:
    case OP_SEND_REMOVE:
    case OP_SEND_RENAME:
    case OP_SEND_LINK:
    case OP_SEND_SYMLINK:
    case OP_SEND_DIR_CONTENTS:
    case OP_SEND_CLOSE:
           RMstatus status;
   };

   struct SEND1args {
           RMsession_id session_id;
           RMcheckpoint check_id;
           RMfile_id file_id;
           RMsendargs sendarray<>;
   };

   struct SEND1res {
           RMsession_id session_id;
           RMcheckpoint check_id;
           RMfile_id file_id;
           RMsendres resarray<>;
           RMstatus status;
   };

   program RM_PROGRAM {
           version RM_V1 {
                   void
                           RMPROC1_NULL(void) = 0;
                   OPEN_SESSIONres
                           RMPROC1_OPEN_SESSION(OPEN_SESSIONargs) = 1;
                   CLOSE_SESSIONres
                           RMPROC1_CLOSE_SESSION(CLOSE_SESSIONargs) = 2;
                   SEND1res
                           RMPROC1_SEND(SEND1args) = 3;
           } = 1;
   } = 100273;
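
   The wire form of these definitions follows XDR [RFC1832].  As a
   rough sketch, the following encodes a SEND1args carrying a single
   SEND_FILE_HOLE using Python's struct module.  The helper names are
   hypothetical, and any field values passed in are chosen by the
   caller purely for illustration.

```python
import struct

OP_SEND_FILE_HOLE = 3  # from enum RMoptype


def xdr_uint32(value):
    """XDR unsigned int (also used for enums): 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_uint64(value):
    """XDR unsigned hyper: 8 bytes, big-endian."""
    return struct.pack(">Q", value)


def xdr_checkpoint(seconds, nseconds, ckpt_id):
    """RMcheckpoint: nfstime4 (int64, uint32) followed by a uint64 id."""
    return struct.pack(">qI", seconds, nseconds) + xdr_uint64(ckpt_id)


def xdr_send_file_hole(offset, length):
    """One RMsendargs arm: discriminant, then the SEND_FILE_HOLE body."""
    return (xdr_uint32(OP_SEND_FILE_HOLE)
            + xdr_uint64(offset) + xdr_uint64(length))


def xdr_send1args(session_id, checkpoint, file_id, ops):
    """SEND1args; sendarray<> is a 4-byte count followed by its items."""
    return (xdr_uint64(session_id) + checkpoint + xdr_uint64(file_id)
            + xdr_uint32(len(ops)) + b"".join(ops))
```

   Under these definitions a SEND1args with one hole occupies 60
   bytes: 8 for the session id, 20 for the checkpoint, 8 for the file
   id, 4 for the array count, and 20 for the operation.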

8.  Normative References


   [RFC1831]
   R. Srinivasan, "RPC: Remote Procedure Call Protocol Specification
   Version 2", RFC1831, August 1995.


   [RFC1832]
   R. Srinivasan, "XDR: External Data Representation Standard", RFC1832,
   August 1995.


   [RFC1964]
   J. Linn, "Kerberos Version 5 GSS-API Mechanism", RFC1964, June 1996.


   [RFC2203]
   M. Eisler, A. Chiu, L. Ling, "RPCSEC_GSS Protocol Specification",
   RFC2203, September 1997.


   [RFC2478]
   E. Baize, D. Pinkas, "The Simple and Protected GSS-API Negotiation
   Mechanism", RFC2478, December 1998.


   [RFC2847]
   M. Eisler, "LIPKEY - A Low Infrastructure Public Key Mechanism Using
   SPKM", RFC2847, June 2000.


   [RFC3530]
   S. Shepler, B. Callaghan, D. Robinson, R. Thurlow, C. Beame, M.
   Eisler, D. Noveck, "Network File System (NFS) Version 4 Protocol",
   RFC3530, April 2003.

9.  Informative References


   [RDIST]
   MagniComp, Inc., "RDist Home Page", http://www.magnicomp.com/rdist.


   [RSYNC]
   The Samba Team, "rsync web pages", http://samba.anu.edu.au/rsync.


   [DESIGN]
   R. Thurlow, "Server-to-Server Replication/Migration Protocol Design
   Principles" (work in progress),
   http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-repl-mig-design-00.txt,
   December 2002.


   [DMAPI]
   P. Lawthers, "The Data Management Applications Programming
   Interface",
   http://www.computer.org/conferences/mss95/lawthers/lawthers.htm, July
   1995.

10.  Author's Address

   Address comments related to this memorandum to:

        nfsv4-wg@sunroof.eng.sun.com

   Robert Thurlow
   Sun Microsystems, Inc.
   500 Eldorado Boulevard, UBRM05-171
   Broomfield, CO 80021

   Phone: 877-718-3419
   E-mail: robert.thurlow@sun.com