INTERNET DRAFT                                Mallikarjun Chadalapaka
draft-ietf-ips-iscsi-impl-guide-00.txt            Hewlett-Packard Co.

                                                 Expires January 2006

                        iSCSI Implementer's Guide

Status of this Memo
     By submitting this Internet-Draft, each author represents
     that any applicable patent or other IPR claims of which he or
     she is aware have been or will be disclosed, and any of which
     he or she becomes aware will be disclosed, in accordance with
     Section 6 of BCP 79.

     Internet-Drafts are working documents of the Internet
     Engineering Task Force (IETF), its areas, and its working
     groups.  Note that other groups may also distribute working
     documents as Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of
     six months and may be updated, replaced, or obsoleted by
     other documents at any time.  It is inappropriate to use
     Internet-Drafts as reference material or to cite them other
     than a "work in progress."

     The list of current Internet-Drafts can be accessed at

     The list of Internet-Draft Shadow Directories can be accessed

     iSCSI is a SCSI transport protocol and maps the SCSI family
     of application protocols onto TCP/IP.  RFC 3720 defines the
     iSCSI protocol.  This document compiles the clarifications to
     the original protocol definition in RFC 3720 to serve as a
     companion document for the iSCSI implementers. This document
     updates RFC 3720 and the text in this document supersedes the
     text in RFC 3720 when the two differ.

Chadalapaka             Expires January, 2006       [Page 1]

Internet-Draft                   iSCSI Guide         10 July 2005

     Table of Contents

     1       Definitions and acronyms ...............................3
     1.1     Definitions ............................................3
     1.2     Acronyms ...............................................3
     2       Introduction ...........................................5
     3       iSCSI semantics for SCSI tasks .........................6
     3.1     SCSI REPORT LUNS and Residual Overflow .................6
     4       Task Management ........................................8
     4.1     Requests Affecting Multiple Tasks ......................8
     4.1.1  Scope of affected tasks...............................8
     4.1.2  Updated semantics.....................................8
     4.1.3  Rationale behind the new semantics....................9
     5       iSCSI Error Handling and Recovery .....................11
     5.1     ITT ...................................................11
     5.2     Format Errors .........................................11
     5.3     Digest Errors .........................................11
     6       Security Considerations ...............................13
     7       IANA Considerations ...................................14
     8       References and Bibliography ...........................15
     8.1     Normative References ..................................15
     8.2     Informative References ................................15
     9       Editor's Address ......................................16
     10      Acknowledgements ......................................17
     11      Full Copyright Statement ..............................18
     12      Intellectual Property Statement .......................19

Chadalapaka            Expires January, 2006        [Page 2]

   Internet-Draft                    iSCSI Guide         10 July 2005

1  Definitions and acronyms

1.1  Definitions

        I/O Buffer ¡ A buffer that is used in a SCSI Read or Write
            operation so SCSI data may be sent from or received into
            that buffer.

1.2  Acronyms

        Acronym        Definition


        EDTL              Expected Data Transfer Length

        IANA           Internet Assigned Numbers Authority

        IETF           Internet Engineering Task Force

        I/O            Input - Output

        IP             Internet Protocol

        iSCSI          Internet SCSI

        iSER           iSCSI Extensions for RDMA

        ITT            Initiator Task Tag

        LO             Leading Only

        LU             Logical Unit

        LUN            Logical Unit Number

        PDU            Protocol Data Unit

        RDMA           Remote Direct Memory Access

        R2T            Ready To Transfer

        R2TSN          Ready To Transfer Sequence Number

        RFC            Request For Comments

        SAM            SCSI Architecture Model

        SCSI           Small Computer Systems Interface

   Chadalapaka            Expires January, 2006        [Page 3]

Internet-Draft                  iSCSI Guide         10 July 2005

     SN             Sequence Number

     SNACK          Selective Negative Acknowledgment - also

                    Sequence Number Acknowledgement for data

     TCP            Transmission Control Protocol

     TMF              Task Management Function

     TTT            Target Transfer Tag

Chadalapaka            Expires January, 2006        [Page 4]

   Internet-Draft               iSCSI Guide         10 July 2005

2  Introduction

   Several iSCSI implementations had been built after [RFC3720] was
   published and the iSCSI community is now richer by the resulting
   implementation expertise.  The goal of this document is to
   leverage this expertise both to offer clarifications to the
   [RFC3720] semantics and to address defects in [RFC3720] as
   appropriate.  This document intends to offer critical guidance
   to implementers with regard to non-obvious iSCSI implementation
   aspects so as to improve interoperability and accelerate iSCSI
   adoption.  This document, however, does not purport to be an
   all-encompassing iSCSI how-to guide for implementers, nor a
   complete revision of [RFC3720].  This document instead is
   intended as a companion document to [RFC3720] for the iSCSI

   iSCSI implementers are required to reference [RFC3722] and
   [RFC3723] in addition to [RFC3720] for mandatory requirements.
   In addition, [RFC3721] also contains useful information for
   iSCSI implementers.  The text in this document, however, updates
   and supersedes the text in all the noted RFCs whenever there is
   such a question.

   Chadalapaka            Expires January, 2006        [Page 5]

Internet-Draft              iSCSI Guide         10 July 2005

3  iSCSI semantics for SCSI tasks

3.1  SCSI REPORT LUNS and Residual Overflow

The specification of the SCSI REPORT LUNS command requires that
SCSI target limit the amount of data transferred to a maximum
size (ALLOCATION LENGTH) provided by the initiator in the REPORT
LUNS CDB.  If the Expected Data Transfer Length (EDTL) in the
iSCSI header of the SCSI Command PDU for a REPORT LUNS command
is set to at least as large as that ALLOCATION LENGTH, the SCSI
layer truncation prevents an iSCSI Residual Overflow from
occurring.  A SCSI initiator can detect that such truncation has
occurred via other information at the SCSI layer.  The rest of
the section elaborates this required behavior.

iSCSI uses the (O) bit (bit 5) in the Flags field of the SCSI
Response and SCSI Data-Out PDUs to indicate that that an iSCSI
target was unable to transfer all of the SCSI data for a command
to the initiator because the amount of data to be transferred
exceeded the EDTL in the corresponding SCSI Command PDU (see
Section 10.4.1 of [RFC 3720]).

The SCSI REPORT LUNS command requests a target SCSI layer to
return a logical unit inventory (LUN list) to the initiator SCSI
layer (see section 6.21 of SPC-3 [SPC3]).  The size of this LUN
list may not be known to the initiator SCSI layer when it issues
the REPORT LUNS command; to avoid transfer of more LUN list data
than the initiator is prepared for, the REPORT LUNS CDB contains
an ALLOCATION LENGTH field to specify the maximum amount of data
to be transferred to the initiator for this command.  If the
initiator SCSI layer has under-estimated the number of logical
units at the target, it is possible that the complete logical
unit inventory does not fit in the specified ALLOCATION LENGTH.
In this situation, section in [SPC3] requires that the
target SCSI layer "shall terminate transfers to the Data-In
Buffer" when the number of bytes specified by the ALLOCATION
LENGTH field have been transferred.

Therefore, in response to a REPORT LUNS command, the SCSI layer
at the target presents at most ALLOCATION LENGTH bytes of data
(logical unit inventory) to iSCSI for transfer to the initiator.
For a REPORT LUNS command, if the iSCSI EDTL is at least as
large as the ALLOCATION LENGTH, the SCSI truncation ensures that
the EDTL will accommodate all of the data to be transferred.  If

Chadalapaka            Expires January, 2006        [Page 6]

Internet-Draft              iSCSI Guide         10 July 2005

all of the logical unit inventory data presented to the iSCSI
layer ¡ i.e. the data remaining after any SCSI truncation - is
transferred to the initiator by the iSCSI layer, an iSCSI
Residual Overflow has not occurred and the iSCSI (O) bit MUST
NOT be set in the SCSI Response or final SCSI Data-Out PDU.
This is not a new requirement but is already required by the
combination of [RFC 3720] with the specification of the REPORT
LUNS command in [SPC3].

The LUN LIST LENGTH field in the logical unit inventory (first
field in the inventory) is not affected by truncation of the
inventory to fit in ALLOCATION LENGTH; this enables a SCSI
initiator to determine that the received inventory is incomplete
by noticing that the LUN LIST LENGTH in the inventory is larger
than the ALLOCATION LENGTH that was sent in the REPORT LUNS CDB.
A common initiator behavior in this situation is to re-issue the
REPORT LUNS command with a larger ALLOCATION LENGTH.

Chadalapaka            Expires January, 2006        [Page 7]

Internet-Draft                    iSCSI Guide         10 July 2005

4  Task Management

4.1  Requests Affecting Multiple Tasks

This section updates the original text in section 10.6.2 of
[RFC3720].  The clarified semantics are a superset of the
semantics of the original text in it the new text covers all
TMFs that can impact multiple tasks.

4.1.1  Scope of affected tasks

       ABORT TASK SET: All outstanding tasks for the I_T_L nexus
       identified by the LUN field in the ABORT TASK SET TMF
       Request PDU.
       CLEAR TASK SET: All outstanding tasks in the task set for
       the LU identified by the LUN field in the CLEAR TASK SET
       TMF Request PDU.  See [SPC3] for the definition of a "task

       LOGICAL UNIT RESET: All outstanding tasks from all
       initiators for the LU identified by the LUN field in the

       TARGET WARM RESET/TARGET COLD RESET: All outstanding tasks
       from all initiators across all LUs that the TMF-issuing
       session has access to on the SCSI target device hosting the
       iSCSI session.

Usage example: an "ABORT TASK SET TMF Request PDU" in the
preceding text is an iSCSI TMF Request PDU with the "Function"
field set to "ABORT TASK SET" as defined in [RFC3720].  Similar
usage is employed for other descriptions.

4.1.2  Updated semantics

consists of the following sequence of actions in the specified
order on each of the entities.

The initiator:


     b) Continues to respond to each TTT received for the affected

Chadalapaka            Expires January, 2006        [Page 8]

Internet-Draft                 iSCSI Guide         10 July 2005
  c) Receives any responses that the target may provide for some
       tasks among the affected tasks (may process them as usual
       because they are guaranteed to be valid).

     d) Receives the task management response concluding all the
       tasks in the set of affected tasks.

The Target:


     b) Waits for all currently valid target transfer tags of the
       affected tasks to be responded.

     c) Based on the CmdSN ordering, waits (concurrent with the
       wait in step (b)) for all commands of the affected tasks to
       be received.  In the case of target-scoped requests (i.e.
       TARGET WARM RESET and TARGET COLD RESET), all the commands
       that are not received, as at the end of step (b), in the
       command stream however can be considered to have been
       received with no command waiting period - i.e. the entire
       CmdSN space upto the CmdSN of the task management function
       can be "plugged" (refer section 6.9 on how aborting a
       specific task can implicitly plug the CmdSN of the task
       being aborted) at the end of step (b).

     d) Propagates the TMF request to and receives the response
       from the target SCSI layer.

     e) Takes note of last-sent StatSN on each of the connections
       in the iSCSI session(s) (one or more) sharing the affected
       tasks, and waits for acknowledgement of each StatSN (may
       solicit for acknowledgement by way of a NOP-In). If some
       tasks originate from non-iSCSI I_T_L nexuses then the means
       by which the target insures that all affected tasks have
       returned their status to the initiator are defined by the
       specific non-iSCSI transport protocol(s).

     f) Sends the task set management response to the issuing

4.1.3  Rationale behind the new semantics

There are fundamentally three basic objectives behind the
semantics specified in section 4.1.2.

Chadalapaka            Expires January, 2006        [Page 9]

Internet-Draft                  iSCSI Guide         10 July 2005

     1.  Maintaining an ordered command flow I_T nexus abstraction
        to the target SCSI layer even with multi-connection

         o  Target iSCSI processing of a TMF request must maintain
            the single flow illusion - steps c & d of the target
            behavior correspond to this objective.

     2.  Maintaining a single ordered response flow I_T nexus
        abstraction to the initiator SCSI layer even with multi-
        connection sessions.

         o  Target must ensure that the initiator does not see
            "old" task responses (that were placed on the wire
            chronologically earlier than the TMF response) after
            seeing the TMF response - step e of the target
            behavior corresponds to this objective.

     3.  Draining all active TTTs corresponding to affected tasks
        before the TMF is acted on.

         o  Targets are better off if the TTTs are
            deterministically retired before the affected tasks
            are terminated because that eliminates the possibility
            of  large-sized Data-out PDUs with stale TTTs arriving
            after the tasks are terminated.  Step b of the target
            behavior corresponds to this objective.

The only other notable thing in step c of the target behavior is
the "plugging" part - it is an optimization that says if all
tasks on the I_T nexus will be aborted anyway (as with a target
reset), there is no need to wait, the target can simply plug all
missing CmdSN slots and move on with TMF processing.  The first
objective (maintaining a single ordered command flow) is still
met with this optimization because target SCSI layer only sees
ordered commands.

Chadalapaka            Expires January, 2006        [Page 10]

Internet-Draft                 iSCSI Guide         10 July 2005

5  iSCSI Error Handling and Recovery

5.1  ITT

Section 10.19 in [RFC3720] mentions this in passing but noted
here again for making it obvious.  An ITT value of 0xffffffff is
reserved and MUST NOT be used by the initiator.  The only
instance it may be seen on the wire is in a target-initiated
NOP-In PDU (and the initiator response to that PDU if

5.2  Format Errors

Section 6.6 of [RFC3720] discusses format error handling.  This
section elaborates on the "inconsistent" PDU field contents
noted in [RFC3720].

All initiator-detected PDU construction errors MUST be
considered as format errors.  Some examples of such errors are:

- NOP-In with a valid TTT but an invalid LUN

- NOP-In with a valid ITT (i.e. a NOP-In response) and also a
valid TTT

- SCSI Response PDU with Status=CHECK CONDITION, but
DataSegmentLength = 0

5.3  Digest Errors

Section 6.7 of [RFC3720] discusses digest error handling.  It
states that "No further action is necessary for initiators if the discarded
PDU is an unsolicited PDU (e.g., Async, Reject)" on detecting a
payload digest error.  This is incorrect.

An Asynchronous Message PDU or a Reject PDU carries the next
StatSN value on an iSCSI connection, advancing the StatSN.  When
an initiator discards one of these PDUs due to a payload digest
error, the entire PDU including the header MUST be discarded.
Consequently, the initiator MUST treat the exception like a loss
of any other solicited response PDU ¡ i.e. it MUST use one of
the following options noted in [RFC3720]:

Chadalapaka            Expires January, 2006        [Page 11]

Internet-Draft               iSCSI Guide         10 July 2005

     a)     Request PDU retransmission with a status SNACK.

     b)     Logout the connection for recovery and continue the
            tasks on a different connection instance.

     c)     Logout to close the connection (abort all the commands
            associated with the connection).

Chadalapaka            Expires January, 2006        [Page 12]

Internet-Draft              iSCSI Guide         10 July 2005

6  Security Considerations

This document does not introduce any new security considerations
other than those already noted in [RFC3720].   Consequently, all
the iSCSI-related security text in [RFC3723] is also directly
applicable to this document.

Chadalapaka            Expires January, 2006        [Page 13]

Internet-Draft              iSCSI Guide         10 July 2005

7  IANA Considerations

This draft does not have any specific IANA considerations other
than those already noted in [RFC3720].

Chadalapaka            Expires January, 2006        [Page 14]

Internet-Draft                 iSCSI Guide         10 July 2005

8  References and Bibliography

8.1  Normative References

     [RFC3720] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka,
          M., and E. Zeidner, "Internet Small Computer Systems
          Interface (iSCSI)", RFC 3720, April 2004.

     [RFC3722] Bakke, M., "String Profile for Internet Small
          Computer Systems Interface (iSCSI) Names", RFC 3722, April

     [RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and
     F. Travostino, "Securing Block Storage Protocols over IP",
     RFC 3723, April 2004.

     [SPC3] T10/1416-D, SCSI Primary Commands-3.

8.2  Informative References

     [RFC3721] Bakke, M., Hafner, J., Hufferd, J., Voruganti, K.,
          and M. Krueger, "Internet Small Computer Systems Interface
          (iSCSI) Naming and Discovery", RFC 3721, April 2004.

     [iSER] Ko, M., Chadalapaka, M., Elzur, U., Shah, H., Thaler,
          P., J. Hufferd, "iSCSI Extensions for RDMA", IETF
          Internet Draft draft-ietf-ips-iser-04.txt (work in
          progress),  June 2005.

     [RFC2119] Bradner, S. "Key Words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119, March 1997.

     [SAM] ANSI X3.270-1998, SCSI-3 Architecture Model (SAM).

Chadalapaka            Expires January, 2006        [Page 15]

Internet-Draft                  iSCSI Guide         10 July 2005

9  Editor's Address

     Mallikarjun Chadalapaka
     Hewlett-Packard Company
     8000 Foothills Blvd.
     Roseville, CA 95747-5668, USA
     Phone: +1-916-785-5621

Chadalapaka            Expires January, 2006        [Page 16]

Internet-Draft                    iSCSI Guide         10 July 2005

10  Acknowledgements

     The IP Storage (ips) Working Group in the Transport Area of
     IETF has been responsible for defining the iSCSI protocol
     (apart from a host of other relevant IP Storage protocols).
     The editor acknowledges the contributions of the entire
     working group.

     The following individuals directly contributed to identifying
     [RFC3720] issues and/or suggesting resolutions to the issues
     clarified in this document: David Black (REPORT LUNS/overflow
     semantics), Gwendal Grignou (TMF scope), Mike Ko (digest
     error handling for Asynchronous Message), Dmitry Fomichev
     (reserved ITT).  This document benefited from all these

Chadalapaka            Expires January, 2006        [Page 17]

Internet-Draft                iSCSI Guide         10 July 2005

11  Full Copyright Statement

     Copyright (C) The Internet Society (2005).  This document is
     subject to the rights, licenses and restrictions contained in
     BCP 78, and except as set forth therein, the authors retain
     all their rights.

     This document and the information contained herein are
     provided on an "AS IS" basis and THE CONTRIBUTOR, THE

Chadalapaka            Expires January, 2006        [Page 18]

Internet-Draft                 iSCSI Guide         10 July 2005

12  Intellectual Property Statement

      The IETF takes no position regarding the validity or scope of
      any Intellectual Property Rights or other rights that might
      be   claimed to pertain to the implementation or use of the
      technology described in this document or the extent to which
      any license under such rights might or might not be
      available; nor does it represent that it has made any
      independent effort to identify any such rights.  Information
      on the procedures with respect to rights in RFC documents can
      be found in BCP 78 and BCP 79.

      Copies of IPR disclosures made to the IETF Secretariat and
      any    assurances of licenses to be made available, or the
      result of an   attempt made to obtain a general license or
      permission for the use of such proprietary rights by
      implementers or users of this   specification can be obtained
      from the IETF on-line IPR    repository at

      The IETF invites any interested party to bring to its
      attention    any copyrights, patents or patent applications,
      or other    proprietary rights that may cover technology that
      may be required   to implement this standard.  Please address
      the information to the IETF at

Chadalapaka            Expires January, 2006        [Page 19]