FTPEXT Working Group                                         G. Lundberg
Internet-Draft                                 WU-FTPD Development Group
Expiration Date: November 28, 2002                              May 2002


                          UTF-8 Option for FTP
                 draft-ietf-ftpext-utf-8-option-00.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   To view the list of IETF Internet-Draft Mirror Sites, see
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract

   This document specifies an extension to the File Transfer Protocol
   (FTP) which provides for inter-operation between existing
   implementations and those supporting the exchange of UTF-8 encoded
   pathnames, and clarifies certain issues involved with UTF-8 encoding.

   It introduces a new option, UTF-8, negotiated by use of the OPTS
   command.  Through use of this option, the user informs the server of
   its willingness to accept UTF-8 encoded pathnames.  The proposed
   extension requires that neither party transmit UTF-8 encoded
   pathnames without having first successfully negotiated this option.

   Implementation of this extension is RECOMMENDED.




Lundberg                Expires November 28, 2002               [Page 1]


Internet-Draft            UTF-8 Option for FTP                  May 2002


Table of Contents

      Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . .   1
   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . .   3
   2. UTF-8 Option . . . . . . . . . . . . . . . . . . . . . . . . .   5
   3. UTF-8 Encoding Issues  . . . . . . . . . . . . . . . . . . . .   7
   4. Misuse of CR NUL in Pathnames  . . . . . . . . . . . . . . . .   7
   5. ABNF for Pathnames . . . . . . . . . . . . . . . . . . . . . .   7
   6. IANA Considerations  . . . . . . . . . . . . . . . . . . . . .   8
   7. Security Considerations  . . . . . . . . . . . . . . . . . . .   8
      Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . .   8
      Normative References . . . . . . . . . . . . . . . . . . . . .   8
      Author's Address . . . . . . . . . . . . . . . . . . . . . . .   9
      Annex A - Specific Changes to Existing Specifications  . . . .  10
      Full Copyright Statement . . . . . . . . . . . . . . . . . . .  11

1. Introduction

   Internationalization of the File Transfer Protocol [RFC2640] enhances
   the capabilities of the FTP by removing the 7-bit restrictions on
   pathnames used in commands and responses.  It defines a single
   character set, in addition to NVT ASCII and EBCDIC, which is to be
   understandable by all systems, and specifies that character set to be
   ISO/IEC 10646:1993 (UCS) using UTF-8 encoding when exchanging
   pathnames.

   UTF-8 encoding is a file-safe encoding which may avoid the use of
   byte values that have special significance during the parsing of
   pathname character strings.  It represents each UCS character as a
   sequence of 1 to 6 bytes in length, using a modified Huffman encoding
   scheme.  For all sequences of one byte the most significant bit is
   ZERO.  For all sequences of more than one byte the number of ONE bits
   in the first byte, starting from the most significant bit position,
   indicates the number of bytes in the UTF-8 encoded sequence, followed
   by a ZERO bit.
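
   As an illustration only, the first-byte rule described above can be
   sketched in Python.  The function name utf8_sequence_length is
   hypothetical, and returning 0 for a byte that cannot begin a sequence
   is a convention of this sketch, not of any specification.

```python
def utf8_sequence_length(first_byte):
    """Return the sequence length implied by the first byte, or 0 if
    the byte cannot begin a sequence."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("not a byte value")
    if first_byte & 0x80 == 0:
        return 1  # most significant bit is ZERO: one-byte sequence
    # Count the ONE bits starting from the most significant position.
    ones = 0
    mask = 0x80
    while mask and first_byte & mask:
        ones += 1
        mask >>= 1
    # A single leading ONE bit marks a continuation byte, and 7 or 8
    # ONE bits leave no room for a 1 to 6 byte sequence.
    if ones == 1 or ones > 6:
        return 0
    return ones
```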

   A property of UTF-8 encoding is that its single byte sequence is
   consistent with the 7-bit ASCII character set.  [RFC2640] incorrectly
   asserts that this feature of UTF-8 encoding will allow existing
   implementations to inter-operate with implementations which support
   UTF-8 encoding.

   [RFC2640] further ignores problems of inter-operation by only
   requiring that conforming implementations must support UTF-8 encoding
   for the transfer and receipt of pathnames.  It contains a great deal
   of discussion of how a compliant implementation should treat UTF-8
   encoding under various conditions.  Unfortunately, other than to
   allow existing implementations to continue to use local character set
   encodings where pathnames encoded in those character sets are not
   UTF-8 encoded (and thus not ASCII), [RFC2640] gives no thought to the
   effect of UTF-8 encoding upon existing implementations.

   At a strictly protocol level, [RFC2640] is generally correct; the use
   of UTF-8 encoding should not inherently prevent most existing
   implementations from correctly reading the character sequence as a
   pathname.  In the best case, existing implementations, when presented
   with a UTF-8 encoded pathname, will treat it as an error and
   recover.  The possibility exists, however, that existing
   implementations will not detect the error, and either fail directly,
   pass the information to the host system causing a failure at that
   level, or treat the characters as invocations of special functions
   (such as end-of-line markers or command-line editing).  Experience
   has shown that presenting hosts and applications with unexpected
   character sequences may result in serious security issues [RR, EL,
   AD].

   The specifications of [RFC2640] provide the means for the server to
   indicate its willingness to accept UTF-8 encoded pathnames.  To
   restore inter-operation with existing implementations, the FTP should
   provide a means for the user to express its willingness to accept
   UTF-8 encoded pathnames, and servers should not transmit UTF-8
   encoded pathnames without prior authorization from the user.

   [RFC2640] also incorrectly requires interpreting the Telnet end-of-
   line sequence CR NUL as a pathname character.  [RFC1123] attempted to
   address this issue with inter-operation of the Telnet protocol, but
   its effect upon the FTP has been largely ignored.  The intention of
   [RFC1123] was to clarify that a server implementation must transmit
   the Telnet end-of-line sequence as CR LF, but that both user and
   server implementations must be prepared to accept either CR LF or CR
   NUL as representing Telnet end-of-line when received, and that the
   choice of whether to send CR LF or CR NUL is up to the user and
   should be configurable.

   The requirement in [RFC2640] that CR NUL be an allowed pathname
   character, when considered in the context of the [RFC1123]
   requirements, should cause existing implementations to incorrectly
   interpret the sequence as Telnet end-of-line, causing loss of
   synchronization.  Experience shows implementations often fail in a
   number of ways once they have lost synchronization.  While server
   implementations usually cope fairly well with the problem, user
   implementations often lock up.  It is possible, however, that an
   implementation will fail in some critical manner that may cause
   serious security issues.

   The receipt of unexpected UTF-8 encoded information alone raises the
   possibility of serious security problems.  When taken together with
   the misuse of the Telnet end-of-line sequence, the security
   implications increase dramatically since not only is unexpected
   information being received, but the sender has control of the
   protocol's primary sequence point.

   This document addresses the deficiencies of [RFC2640] by adding a new
   option which must be successfully negotiated prior to transmission of
   UTF-8 encoded pathnames, making specific clarifications in the use of
   UTF-8 encoding, and clarifying the proper interpretation of the
   sequence CR NUL as being Telnet end-of-line and thus not a usable
   pathname character.

   In the development of the protocol specified in this document, an
   alternative was considered: the server could interpret the LANG
   command as an indication that the user is willing to accept UTF-8
   encoded pathnames.  In this case, the server should provide the
   language EN (English) as an alternative so that the user can
   negotiate the LANG command without any change to the message text
   included with replies.  This approach unreasonably assumes that
   implementations will either adopt [RFC2640] in its entirety, or not
   at all.  While that may often be a safe assumption, the concept of
   UTF-8 encoded pathnames is logically distinct from the concept of the
   language used for free-form response text.  It unnecessarily limits
   implementers, forcing them to support the internationalization of
   response text when they desire only to allow internationalized
   pathnames.

   When reading the following specifications, the key words "MUST",
   "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
   "RECOMMENDED",  "MAY", and "OPTIONAL" are to be interpreted as
   described in RFC 2119 [RFC2119].

2. UTF-8 Option

   The user issues the OPTS UTF-8 command to indicate its willingness to
   send and receive UTF-8 encoded pathnames over the control connection.
   Prior to sending this command, the user should not transmit UTF-8
   encoded pathnames.

   The specifications of [RFC2640] apply only to pathnames sent over the
   control connection.  The OPTS UTF-8 command provides an optional
   parameter which modifies the behavior of the NLST command.  When this
   option is specified, pathnames transmitted over the data connection
   in response to the NLST command must be UTF-8 encoded, and the data
   transmission assumes TYPE L 8 character framing.  By using this
   optional parameter, responsibility for conversion to the local
   character set of the pathnames contained in the NLST data transfer
   shifts from the server implementation to the user.

   Before sending the UTF-8 option, the user should issue the FEAT
   command and examine the response to that command.  If the response
   contains the UTF-8 option, the user should take that option to mean
   the server is willing to transmit UTF-8 encoded pathnames, and may
   support the OPTS UTF-8 command to enable their use.  Note that the
   specification of the OPTS command, and the OPTS UTF-8 variant,
   provide a reliable means to determine support for UTF-8 encoded
   pathnames; no harmful effect occurs if the user does not issue the
   FEAT command.
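
   A minimal Python sketch of examining a FEAT reply follows.  It
   assumes the reply layout of [RFC2389], in which each feature line
   begins with a single space.  The function name feat_advertises is
   hypothetical, and the feature label is left as a parameter rather
   than fixed by this sketch.

```python
def feat_advertises(reply_lines, feature):
    """Return True if a FEAT reply lists `feature` (case-insensitive).

    `reply_lines` is the reply split into lines with the end-of-line
    sequence already removed.
    """
    wanted = feature.upper()
    for line in reply_lines:
        # Feature lines begin with a space; the first and last lines
        # of the reply begin with the reply code instead.
        if not line.startswith(" "):
            continue
        fields = line.strip().split(None, 1)
        if fields and fields[0].upper() == wanted:
            return True
    return False
```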

   The format of the OPTS UTF-8 command is:

      OPTS <SP> UTF-8 [ <SP> NLST ] <CRLF>

   The text of the command line is not case sensitive, but should be
   transmitted in upper case as shown.

   The UTF-8 option allows one optional parameter: NLST.  When present,
   NLST must be separated from the UTF-8 option by exactly one space
   character.
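
   The command format can be sketched in Python as follows.  The names
   build_opts_utf8 and parse_opts_utf8 are hypothetical.  The parser
   accepts any letter case, requires exactly one space before NLST, and
   in this sketch recognizes only CR LF as the line terminator.

```python
import re

# OPTS <SP> UTF-8 [ <SP> NLST ] <CRLF>, matched without regard to case.
_OPTS_UTF8 = re.compile(rb"OPTS UTF-8( NLST)?\r\n", re.IGNORECASE)


def build_opts_utf8(nlst=False):
    """Return the command line in upper case, as recommended."""
    return b"OPTS UTF-8" + (b" NLST" if nlst else b"") + b"\r\n"


def parse_opts_utf8(line):
    """Return None if `line` is not a well-formed OPTS UTF-8 command,
    otherwise True or False according to whether NLST was given."""
    match = _OPTS_UTF8.fullmatch(line)
    if match is None:
        return None
    return match.group(1) is not None
```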

   Possible replies to the OPTS UTF-8 command, and their meanings,
   include:

      200 Command okay.
      421 Service not available, closing control connection.
      500 Syntax error, command unrecognized.
      501 Syntax error in parameters or arguments.
      502 Command not implemented.

   The User-FTP process must not depend upon the actual text (if any)
   included with the reply.

   A Server-FTP process which does not implement the OPTS command will
   reply with either response code 500 or 502.  For compatibility with
   existing implementations, the User-FTP process must be prepared for
   this reply and must not transmit UTF-8 encoded pathnames.

   The response code 501 indicates the Server-FTP process does not
   implement the OPTS UTF-8 command or was unable to recognize the
   parameters given with the command.  For compatibility with existing
   implementations, the User-FTP process must be prepared for this reply
   and must not transmit UTF-8 encoded pathnames.

   Prior to transmitting response code 200 in response to the OPTS UTF-8
   command, the Server-FTP must not transmit UTF-8 encoded pathnames and
   should not accept them on commands: the Server-FTP should transmit
   either response code 501 or 553 in reply to any command which
   includes a pathname outside the range of 7-bit ASCII; and the Server-
   FTP should transmit response code 550 in reply to any command to
   which the server would otherwise have sent a UTF-8 encoded pathname.
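
   One way a Server-FTP process might apply these checks before the
   option has been negotiated is sketched below in Python.  The
   function names are hypothetical.  The sketch picks 553 where the
   text allows either 501 or 553, and represents an outgoing pathname
   as a Python string before encoding.

```python
def check_incoming_pathname(pathname, negotiated):
    """Return a rejection reply code for a pathname received on a
    command, or None if the pathname may be processed further."""
    if not negotiated and any(byte > 0x7F for byte in pathname):
        return 553  # 501 would serve equally well here
    return None


def check_outgoing_pathname(pathname, negotiated):
    """Return a rejection reply code if sending `pathname` (a str)
    would require UTF-8 encoding that has not been negotiated."""
    if not negotiated and not pathname.isascii():
        return 550
    return None
```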

   Upon receiving response code 200, the user should transmit only UTF-8
   encoded pathnames, and should expect to receive only UTF-8 encoded
   pathnames from the server.
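
   The reply handling described above, seen from the User-FTP process,
   might look like the following Python sketch.  The function name
   interpret_opts_utf8_reply is hypothetical.  Treating any reply other
   than 200 as "not negotiated" is an assumption of the sketch, chosen
   as the conservative reading of the text.

```python
def interpret_opts_utf8_reply(code, nlst_requested=False):
    """Return (utf8_on_control, utf8_in_nlst) for a reply to the
    OPTS UTF-8 command."""
    if code == 200:
        # UTF-8 is now in use on the control connection; NLST data is
        # UTF-8 encoded only if the NLST parameter was sent.
        return True, nlst_requested
    if code == 421:
        raise ConnectionError("server is closing the control connection")
    # 500, 501, 502, or anything unexpected: the option is not in
    # effect, so UTF-8 encoded pathnames must not be transmitted.
    return False, False
```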

   The user may issue the OPTS UTF-8 command without the NLST parameter
   to restore the operation of the NLST command.

   To terminate the use of UTF-8 encoding on the control connection, the
   user must either issue the REIN command or terminate the FTP session
   and begin anew.

3. UTF-8 Encoding Issues

   The UTF-8 encoding scheme allows the possibility of multiple
   encodings for a single character.

   For each character, there is a single, shortest form of the UTF-8
   encoding.  When transmitting UTF-8 encoded characters, the shortest
   form should be used.

   When interpreting received UTF-8 encoded information, the
   implementation should accept the non-shortest form as meaning the
   same character as the preferred, shortest form.  (One method would be
   for the implementation to actually replace the character with the
   shortest form encoding.)

   Implementations, however, must attach no special significance to any
   non-shortest form encodings.  In particular, the non-shortest form
   encodings for CR, LF and NUL are not to be treated as potential
   Telnet end-of-line characters.
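
   The replacement method mentioned above can be sketched in Python.
   The built-in decoder rejects non-shortest forms, so the sketch
   decodes by hand; the names decode_lenient, encode_shortest and
   normalise are hypothetical.  It is assumed that command lines are
   split on raw bytes before this step, so a non-shortest CR, LF or NUL
   never reaches the end-of-line logic.

```python
def decode_lenient(data):
    """Decode UTF-8 bytes to a list of code points, accepting
    non-shortest forms and sequences of up to 6 bytes."""
    points = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            length, value = 1, first
        else:
            # The count of leading ONE bits gives the sequence length.
            length, mask = 0, 0x80
            while mask and first & mask:
                length += 1
                mask >>= 1
            if length < 2 or length > 6:
                raise ValueError("invalid first byte at offset %d" % i)
            value = first & (0x7F >> length)
        if i + length > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for byte in data[i + 1:i + length]:
            if byte & 0xC0 != 0x80:
                raise ValueError("bad continuation after offset %d" % i)
            value = (value << 6) | (byte & 0x3F)
        points.append(value)
        i += length
    return points


def encode_shortest(points):
    """Encode code points using the shortest form for each."""
    out = bytearray()
    for cp in points:
        if cp < 0:
            raise ValueError("negative code point")
        if cp < 0x80:
            out.append(cp)
            continue
        # An n-byte sequence carries 5n + 1 payload bits.
        length = 2
        while cp >= 1 << (5 * length + 1):
            length += 1
        if length > 6:
            raise ValueError("code point too large")
        tail = []
        for _ in range(length - 1):
            tail.append(0x80 | (cp & 0x3F))
            cp >>= 6
        out.append((0xFF << (8 - length)) & 0xFF | cp)
        out.extend(reversed(tail))
    return bytes(out)


def normalise(data):
    """Replace every non-shortest form with its shortest form."""
    return encode_shortest(decode_lenient(data))
```

   Note that a pathname whose normalised form contains CR, LF or NUL
   would still be rejected by any later check on pathname characters.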

4. Misuse of CR NUL in Pathnames

   The assertion in [RFC2640] that CR NUL is not a Telnet end-of-line
   sequence is incorrect.

   The Telnet protocol requires the character CR to always be followed
   by either the character LF or NUL.  The design of the FTP requires
   that the sequences CR LF and CR NUL be treated as a Telnet end-of-
   line and all existing implementations properly recognize them as
   such.

   Pathnames must not include the character CR.  This applies to all
   uses of the character CR, whether alone or followed by any other
   character.
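
   The effect of treating both sequences as end-of-line can be seen in
   a short Python sketch of a control connection line splitter.  The
   name split_lines is hypothetical and the sketch ignores Telnet
   option negotiation entirely.

```python
def split_lines(buffer):
    """Split received bytes into complete lines, treating both CR LF
    and CR NUL as end-of-line.  Returns (lines, remainder), where
    remainder holds any bytes after the last end-of-line."""
    lines = []
    start = 0
    i = 0
    while i + 1 < len(buffer):
        if buffer[i] == 0x0D and buffer[i + 1] in (0x0A, 0x00):
            lines.append(buffer[start:i])
            i += 2
            start = i
        else:
            i += 1
    return lines, buffer[start:]
```

   Given the bytes of "RETR a" CR NUL "b" CR LF, this splitter yields
   two lines, "RETR a" and "b", rather than one command naming a
   pathname that contains CR NUL.  This is the loss of synchronization
   that results when a sender embeds CR in a pathname.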

5. ABNF for Pathnames

   The ABNF for pathnames presented in [RFC2640] is incorrect.

   When UTF-8 encoding is not present, the correct syntax is

      PATHNAME = *( %x20-7E )

   When UTF-8 encoding is in use, the correct syntax is

      PATHNAME = *( %x20-7E / %x80-FF )

   Note that both cases render moot all discussion about the use of the
   characters CR, LF, and NUL, in pathnames.
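
   A direct transcription of the two rules into Python might look like
   the sketch below.  The function name is_valid_pathname is
   hypothetical.  The check works on bytes and does not verify that
   bytes in the range 0x80 to 0xFF form well-formed UTF-8 sequences.

```python
def is_valid_pathname(pathname, utf8):
    """Check `pathname` (bytes) against the pathname syntax, with or
    without UTF-8 encoding in use."""
    for byte in pathname:
        if 0x20 <= byte <= 0x7E:
            continue  # %x20-7E is allowed in both cases
        if utf8 and 0x80 <= byte <= 0xFF:
            continue  # %x80-FF is allowed only with UTF-8 in use
        return False
    return True
```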

6. IANA Considerations

   The list of valid option names for the FTP OPTS command is believed
   to be first-come first-served, and managed outside the control of the
   Internet Assigned Numbers Authority (IANA).

7. Security Considerations

   While it should improve inter-operation, and therefore may improve
   security, the addition of the UTF-8 option itself should have no
   effect upon the security of the FTP, networks or hosts.

   The intention of this document is to address inter-operational issues
   in the existing protocol specifications.

   Some of those issues can lead to unexpected data appearing on the
   communications channel.  Experience shows this can lead to serious
   security issues, potentially including the compromise of hosts on the
   network.  While one would hope that implementations were hardened
   against such occurrences, some implementations may not be.  The
   importance of such hardening cannot be emphasized strongly enough.

Acknowledgments

   The following people provided significant assistance with the
   analysis of the problem, the proposed solution, and the preparation
   of this document:

      The members of the vulnerability handling team at the CERT
      Coordination Center.

Normative References

   [RFC854]    J. Postel and J. Reynolds, "TELNET PROTOCOL
               SPECIFICATION", STD 8, RFC 854, May 1983.

   [RFC959]    J. Postel and J. Reynolds, "FILE TRANSFER PROTOCOL
               (FTP)", STD 9, RFC 959, October 1985.

   [RFC1123]   IETF, "Requirements for Internet Hosts -- Application and
               Support", STD 3, RFC 1123, October 1989.

   [RFC2119]   S. Bradner, "Key words for use in RFCs to Indicate
               Requirement Levels", RFC 2119, BCP 14, March 1997.

   [RFC2234]   D. Crocker and P. Overell, "Augmented BNF for Syntax
               Specifications: ABNF", RFC 2234, November 1997.

   [RFC2389]   P. Hethmon and R. Elz, "Feature negotiation mechanism for
               the File Transfer Protocol", RFC 2389, August 1998.

   [RFC2640]   B. Curtin, "Internationalization of the File Transfer
               Protocol", RFC 2640, July 1999.

Informative References

   [RR]        R. Russell and S. Cunningham, "Hack Proofing Your
               Network: Internet Tradecraft", ISBN 1-928994-15-6,
               January 2000.

   [EL]        E. Labbate, "Vulnerability as a Function of Software
               Quality", http://rr.sans.org/code/quality.php, March
               2001.

   [AD]        A. Davis, et al, "Understanding the Risks of SNMP
               Vulnerabilities",
               http://www.lucent.com/livelink/255868_Whitepaper.pdf,
               March 2002.

Author's Address

   Gregory A. Lundberg
   WU-FTPD Development Group
   1441 Elmdale Drive
   Dayton, OH 45409-1615
   US

   Phone: +1 937 299 7653
   Email: lundberg@vr.net

Annex A - Specific Changes to Existing Specifications

   In summary, the specifications presented in this memo make the
   following specific changes with respect to the requirements of
   [RFC2640]:

    - add the option UTF-8, and require that implementations must not
      transmit UTF-8 encoded pathnames until after successful
      negotiation of the UTF-8 option;

    - clarify that UTF-8 encoding applies only to pathnames transmitted
      over the control connection, and provide a means to specify UTF-8
      encoding in the data transfer sent in response to the NLST
      command;

    - clarify the use of UTF-8 encoding with respect to non-shortest
      encodings;

    - clarify that the character sequence CR NUL is a Telnet end-of-line
      sequence and must not be treated as a pathname character;

    - specify the correct ABNF syntax for pathnames when UTF-8 encoding
      is, and is not, in use.

Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to The Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by The Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
   NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN
   WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by The
   Internet Society.