FTPEXT Working Group                                         G. Lundberg
Internet-Draft                                 WU-FTPD Development Group
Expiration Date: November 28, 2002                              May 2002
UTF-8 Option for FTP
draft-ietf-ftpext-utf-8-option-00.txt
Status of this Memo
This document is an Internet-Draft and is subject to all provisions
of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
To view the list of IETF Internet-Draft Mirror Sites, see
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
This document specifies an extension to the File Transfer Protocol
(FTP) which provides for inter-operation between existing
implementations and those supporting the exchange of UTF-8 encoded
pathnames, and clarifies certain issues involved with UTF-8 encoding.
It introduces a new option, UTF-8, negotiated by use of the OPTS
command. Through use of this option, the user informs the server of
its willingness to accept UTF-8 encoded pathnames. The proposed
extension requires that neither party transmit UTF-8 encoded
pathnames without having first successfully negotiated this option.
Implementation of this extension is RECOMMENDED.
Lundberg Expires November 28, 2002 [Page 1]
Internet-Draft UTF-8 Option for FTP May 2002
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. UTF-8 Option . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. UTF-8 Encoding Issues . . . . . . . . . . . . . . . . . . . . 7
4. Misuse of CR NUL in Pathnames . . . . . . . . . . . . . . . . . 7
5. ABNF for Pathnames . . . . . . . . . . . . . . . . . . . . . . 7
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8
7. Security Considerations . . . . . . . . . . . . . . . . . . . 8
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 8
Normative References . . . . . . . . . . . . . . . . . . . . . 8
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 9
Annex A - Specific Changes to Existing Specifications . . . . 10
Full Copyright Statement . . . . . . . . . . . . . . . . . . . 11
1. Introduction
Internationalization of the File Transfer Protocol [RFC2640] enhances
the capabilities of the FTP by removing the 7-bit restrictions on
pathnames used in commands and responses. It defines a single
character set, in addition to NVT ASCII and EBCDIC, which is to be
understandable by all systems, and specifies that character set to be
ISO/IEC 10646:1993 (UCS) using UTF-8 encoding when exchanging
pathnames.
UTF-8 encoding is a file-safe encoding which may avoid the use of
byte values that have special significance during the parsing of
pathname character strings. It represents each UCS character as a
sequence of 1 to 6 bytes in length, using a modified Huffman encoding
scheme. For all sequences of one byte the most significant bit is
ZERO. For all sequences of more than one byte the number of ONE bits
in the first byte, starting from the most significant bit position,
indicates the number of bytes in the UTF-8 encoded sequence, followed
by a ZERO bit.
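The lead-byte rule described above can be sketched as follows.  This
is only an illustration of the bit-counting rule as stated in this
section; the function name utf8_sequence_length is hypothetical and
is not part of any specification.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte.

    Hypothetical helper: applies the rule from the text (count the
    leading ONE bits) for the 1-to-6 byte scheme described above.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("not a byte value")
    if first_byte & 0x80 == 0:
        # Most significant bit is ZERO: a one-byte (ASCII) sequence.
        return 1
    # Count ONE bits starting from the most significant bit position.
    ones = 0
    mask = 0x80
    while mask and first_byte & mask:
        ones += 1
        mask >>= 1
    if ones == 1 or ones > 6:
        # 10xxxxxx is a continuation byte; 0xFE and 0xFF start nothing.
        raise ValueError("byte cannot begin a UTF-8 sequence")
    return ones
```

For example, the byte 0xC3 (binary 11000011) begins a two-byte
sequence, while 0x41 is a complete one-byte sequence.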
A property of UTF-8 encoding is that its single byte sequence is
consistent with the 7-bit ASCII character set. [RFC2640] incorrectly
asserts that this feature of UTF-8 encoding will allow existing
implementations to inter-operate with implementations which support
UTF-8 encoding.
[RFC2640] further ignores problems of inter-operation by only
requiring that conforming implementations must support UTF-8 encoding
for the transfer and receipt of pathnames. It contains a great deal
of discussion of how a compliant implementation should treat UTF-8
encoding under various conditions. Unfortunately, other than to
allow existing implementations to continue to use local character set
encodings where pathnames encoded in those character sets are not
UTF-8 encoded (and thus not ASCII), [RFC2640] gives no thought to the
effect of UTF-8 encoding upon existing implementations.
At a strictly protocol level, [RFC2640] is generally correct; the use
of UTF-8 encoding should not inherently prevent most existing
implementations from correctly reading the character sequence as a
pathname. In the best case, existing implementations, when presented
with a UTF-8 encoded pathname, will treat it as an error and recover.
The possibility exists, however, that existing implementations will
not detect the error, and either fail directly, pass the information
to the host system causing a failure at that level, or treat the
characters as invocations of special functions (such as end-of-line
markers or command-line editing). Experience has shown that
presenting hosts and applications with unexpected character sequences
may result in serious security issues [RR, EL, AD].
The specifications of [RFC2640] provide the means for the server to
indicate its willingness to accept UTF-8 encoded pathnames. To
restore inter-operation with existing implementations, the FTP should
provide a means for the user to express its willingness to accept
UTF-8 encoded pathnames, and servers should not transmit UTF-8
encoded pathnames without prior authorization from the user.
[RFC2640] also incorrectly requires interpreting the Telnet end-of-
line sequence CR NUL as a pathname character. [RFC1123] attempted to
address this issue with inter-operation of the Telnet protocol, but
its effect upon the FTP has been largely ignored. The intention of
[RFC1123] was to clarify that a server implementation must transmit
the Telnet end-of-line sequence as CR LF, but that both user and
server implementations must be prepared to accept either CR LF or CR
NUL as representing Telnet end-of-line when received, and that the
choice of whether to send CR LF or CR NUL is up to the user and
should be configurable.
The requirement in [RFC2640] that CR NUL be an allowed pathname
character, when considered in the context of the [RFC1123]
requirements, should cause existing implementations to incorrectly
interpret the sequence as Telnet end-of-line, causing loss of
synchronization. Experience shows implementations often fail in a
number of ways once they have lost synchronization. While server
implementations usually cope fairly well with the problem, user
implementations often lock up. It is possible, however, that an
implementation will fail in some critical manner that may cause
serious security issues.
The receipt of unexpected UTF-8 encoded information alone raises the
possibility of serious security problems. When taken together with
the misuse of the Telnet end-of-line sequence, the security
implications increase dramatically since not only is unexpected
information being received, but the sender has control of the
protocol's primary sequence point.
This document addresses the deficiencies of [RFC2640] by adding a new
option which must be successfully negotiated prior to transmission of
UTF-8 encoded pathnames, making specific clarifications in the use of
UTF-8 encoding, and clarifying the proper interpretation of the
sequence CR NUL as being Telnet end-of-line and thus not a usable
pathname character.
In the development of the protocol specified in this document, an
alternative was considered: the server could interpret the LANG
command as an indication that the user is willing to accept UTF-8
encoded pathnames. In this case, the server should provide the
language EN (English) as an alternative so that the user can
negotiate the LANG command without any change to the message text
included with replies. This approach unreasonably assumes that
implementations will either adopt [RFC2640] in its entirety, or not
at all. While that may often be a safe assumption, the concept of
UTF-8 encoded pathnames is logically distinct from the concept of the
language used for free-form response text. It unnecessarily limits
implementers, forcing them to support the internationalization of
response text when they desire only to allow internationalized
pathnames.
When reading the following specifications, the key words "MUST",
"MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
described in RFC 2119 [RFC2119].
2. UTF-8 Option
The user issues the OPTS UTF-8 command to indicate its willingness to
send and receive UTF-8 encoded pathnames over the control connection.
Prior to sending this command, the user should not transmit UTF-8
encoded pathnames.
The specifications of [RFC2640] apply only to pathnames sent over the
control connection. The OPTS UTF-8 command provides an optional
parameter which modifies the behavior of the NLST command. When this
option is specified, pathnames transmitted over the data connection
in response to the NLST command must be UTF-8 encoded, and the data
transmission assumes TYPE L 8 character framing. By using this
optional parameter, responsibility for conversion to the local
character set of the pathnames contained in the NLST data transfer
shifts from the server implementation to the user.
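A user implementation that has negotiated the NLST parameter might
process the listing along the following lines.  This is a minimal
sketch: the function name decode_nlst_listing is hypothetical, and it
assumes the listing carries one pathname per line terminated by
CR LF.

```python
from typing import List


def decode_nlst_listing(data: bytes) -> List[str]:
    """Decode an NLST data transfer whose pathnames are UTF-8 encoded.

    Assumption: one pathname per line, lines terminated by CR LF.
    Strict decoding is used, so malformed UTF-8 raises
    UnicodeDecodeError instead of being passed on silently.
    """
    names = []
    for raw in data.split(b"\r\n"):
        if raw:  # skip the empty piece after the final CR LF
            names.append(raw.decode("utf-8"))
    return names
```

Conversion of each decoded name to the local character set, which is
now the responsibility of the user, would follow this step.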
Before sending the UTF-8 option, the user should issue the FEAT
command and examine the response to that command. If the response
contains the UTF-8 option, the user should take that option to mean
the server is willing to transmit UTF-8 encoded pathnames, and may
support the OPTS UTF-8 command to enable their use. Note that the
specification of the OPTS command, and the OPTS UTF-8 variant,
provide a reliable means to determine support for UTF-8 encoded
pathnames; no harmful effect occurs if the user does not issue the
FEAT command.
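Examining the FEAT response might look like the sketch below.  It
assumes the multi-line reply layout of [RFC2389], in which each
feature line begins with a single space.  The function name
feat_advertises is hypothetical, and the feature label is passed as a
parameter because its exact spelling on the wire is an assumption
here.

```python
from typing import Iterable


def feat_advertises(reply_lines: Iterable[str], feature: str = "UTF-8") -> bool:
    """Return True if a FEAT reply lists the given feature label.

    Assumption: feature lines start with one space, and the label is
    the first word on the line, compared without regard to case.
    """
    wanted = feature.upper()
    for line in reply_lines:
        if not line.startswith(" "):
            continue  # the "211-..." and "211 ..." framing lines
        words = line.split()
        if words and words[0].upper() == wanted:
            return True
    return False
```

A False result only means the feature was not advertised; as noted
above, the reply to OPTS UTF-8 itself remains the reliable test.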
The format of the OPTS UTF-8 command is:
OPTS <SP> UTF-8 [ <SP> NLST ] <CRLF>
The text of the command line is not case sensitive, but should be
transmitted in upper case as shown.
The UTF-8 option allows one optional parameter: NLST. When present,
NLST must be separated from the UTF-8 option by exactly one space
character.
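The command format can be illustrated with a small builder and
parser.  This is a sketch only; the names build_opts_utf8 and
parse_opts_utf8 are hypothetical.  The parser accepts any letter case
and requires exactly one space before NLST, as stated above.

```python
import re
from typing import Optional

# OPTS <SP> UTF-8 [ <SP> NLST ] <CRLF>, matched without regard to case.
_OPTS_UTF8 = re.compile(rb"OPTS UTF-8( NLST)?\r\n", re.IGNORECASE)


def build_opts_utf8(nlst: bool = False) -> bytes:
    """Build the command line in upper case, as recommended."""
    return b"OPTS UTF-8" + (b" NLST" if nlst else b"") + b"\r\n"


def parse_opts_utf8(line: bytes) -> Optional[bool]:
    """Parse a received command line.

    Returns True if the NLST parameter is present, False if it is
    absent, and None if the line is not a well-formed OPTS UTF-8.
    """
    match = _OPTS_UTF8.fullmatch(line)
    if match is None:
        return None
    return match.group(1) is not None
```

A server would typically answer a None result with response code 501.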
Possible replies to the OPTS UTF-8 command, and their meanings,
include:
200 Command okay.
421 Service not available, closing control connection.
500 Syntax error, command unrecognized.
501 Syntax error in parameters or arguments.
502 Command not implemented.
The User-FTP process must not depend upon the actual text (if any)
included with the reply.
A Server-FTP process which does not implement the OPTS command will
reply with either response code 500 or 502. For compatibility with
existing implementations, the User-FTP process must be prepared for
this reply and must not transmit UTF-8 encoded pathnames.
The response code 501 indicates the Server-FTP process does not
implement the OPTS UTF-8 command or was unable to recognize the
parameters given with the command. For compatibility with existing
implementations, the User-FTP process must be prepared for this reply
and must not transmit UTF-8 encoded pathnames.
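The reply handling required of the User-FTP process can be sketched
as a single decision.  The function name utf8_enabled_after_opts is
hypothetical; treating any unlisted response code as a refusal is a
conservative assumption made for this sketch and is not stated in
the text.

```python
def utf8_enabled_after_opts(reply_code: int) -> bool:
    """Decide whether UTF-8 pathnames may be used after OPTS UTF-8.

    Only the numeric code is examined; the reply text is ignored.
    """
    if reply_code == 200:
        return True
    if reply_code == 421:
        # The server is closing the control connection.
        raise ConnectionError("service not available")
    # 500, 501 and 502 mean the option is unavailable.  Any other
    # code is unexpected and is also treated as a refusal
    # (an assumption of this sketch).
    return False
```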
Prior to transmitting response code 200 in response to the OPTS UTF-8
command, the Server-FTP must not transmit UTF-8 encoded pathnames and
should not accept them on commands: the Server-FTP should transmit
either response code 501 or 553 in reply to any command which
includes a pathname outside the range of 7-bit ASCII; and the Server-
FTP should transmit response code 550 in reply to any command to
which the server would otherwise have sent a UTF-8 encoded pathname.
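The server-side behavior before negotiation might be implemented as
two small checks.  This is a sketch under stated assumptions: both
function names are hypothetical, and the choice of 553 over 501 for
refused commands is arbitrary, since the text permits either.

```python
from typing import Optional


def _has_high_bytes(pathname: bytes) -> bool:
    """True if any byte lies outside the range of 7-bit ASCII."""
    return any(byte > 0x7F for byte in pathname)


def refuse_command_pathname(pathname: bytes, negotiated: bool) -> Optional[int]:
    """Return a response code to refuse a received pathname, or None.

    Assumption: 553 is chosen here; 501 would be equally permitted.
    """
    if not negotiated and _has_high_bytes(pathname):
        return 553
    return None


def refuse_reply_pathname(pathname: bytes, negotiated: bool) -> Optional[int]:
    """Return 550 if a reply would have to carry a non-ASCII pathname."""
    if not negotiated and _has_high_bytes(pathname):
        return 550
    return None
```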
Upon receiving response code 200, the user should transmit only UTF-8
encoded pathnames, and should expect to receive only UTF-8 encoded
pathnames from the server.
The user may issue the OPTS UTF-8 command without the NLST parameter
to restore the normal operation of the NLST command, in which the
server remains responsible for character set conversion of the
listing.
To terminate the use of UTF-8 encoding on the control connection, the
user must either issue the REIN command or terminate the FTP session
and begin anew.
3. UTF-8 Encoding Issues
The UTF-8 encoding scheme allows the possibility of multiple
encodings for a single character.
For each character, there is a single, shortest form of the UTF-8
encoding. When transmitting UTF-8 encoded characters, the shortest
form should be used.
When interpreting received UTF-8 encoded information, the
implementation should accept the non-shortest form as meaning the
same character as the preferred, shortest form. (One method would be
for the implementation to actually replace the character with the
shortest form encoding.)
Implementations, however, must attach no special significance to any
non-shortest form encodings. In particular, the non-shortest form
encodings for CR, LF and NUL are not to be treated as potential
Telnet end-of-line characters.
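The replacement method mentioned above can be sketched as a decoder
that accepts non-shortest forms followed by a shortest-form encoder.
The function names are hypothetical.  A hand-written decoder is used
because Python's built-in UTF-8 codec rejects non-shortest forms.
The sketch assumes it is applied only to a pathname that has already
been separated from its command line, so that a non-shortest CR, LF
or NUL can never influence end-of-line detection.

```python
from typing import List


def decode_lenient(data: bytes) -> List[int]:
    """Decode UTF-8 to code points, accepting non-shortest forms."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, value = 1, lead
        else:
            # Count the leading ONE bits to find the sequence length.
            length, mask = 0, 0x80
            while mask and lead & mask:
                length += 1
                mask >>= 1
            if length < 2 or length > 6:
                raise ValueError("invalid lead byte at offset %d" % i)
            value = lead & (0xFF >> (length + 1))
        if i + length > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for byte in data[i + 1:i + length]:
            if byte & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte")
            value = (value << 6) | (byte & 0x3F)
        code_points.append(value)
        i += length
    return code_points


def encode_shortest(code_point: int) -> bytes:
    """Encode one code point using the shortest UTF-8 form."""
    if code_point < 0x80:
        return bytes([code_point])
    limits = ((2, 0x800), (3, 0x10000), (4, 0x200000),
              (5, 0x4000000), (6, 0x80000000))
    for length, limit in limits:
        if code_point < limit:
            break
    else:
        raise ValueError("code point out of range")
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    lead = ((0xFF << (8 - length)) & 0xFF) | code_point
    return bytes([lead] + tail[::-1])


def normalize_pathname(data: bytes) -> bytes:
    """Replace every character with its shortest-form encoding."""
    return b"".join(encode_shortest(cp) for cp in decode_lenient(data))
```

For example, the two-byte sequence C0 AF is a non-shortest form of
"/" and is replaced by the single byte 2F, while a sequence already
in shortest form passes through unchanged.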
4. Misuse of CR NUL in Pathnames
The assertion in [RFC2640] that CR NUL is not a Telnet end-of-line
sequence is incorrect.
The Telnet protocol requires the character CR to always be followed
by either the character LF or NUL. The design of the FTP requires
that the sequences CR LF and CR NUL be treated as a Telnet end-of-
line and all existing implementations properly recognize them as
such.
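A receiver that honors both end-of-line sequences might frame the
control connection as in the following sketch.  The function name
split_control_lines is hypothetical; only the raw two-byte sequences
CR LF and CR NUL are recognized.

```python
from typing import List, Tuple


def split_control_lines(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split buffered control-connection data into complete lines.

    Both CR LF and CR NUL end a line.  Returns the complete lines
    (without their terminators) and any unterminated remainder, which
    the caller keeps until more data arrives.
    """
    lines = []
    start = 0
    i = 0
    while i + 1 < len(buffer):
        if buffer[i] == 0x0D and buffer[i + 1] in (0x0A, 0x00):
            lines.append(buffer[start:i])
            i += 2
            start = i
        else:
            i += 1
    return lines, buffer[start:]
```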
Pathnames must not include the character CR. This applies to all uses
of the character CR, whether alone or followed by any other character.
5. ABNF for Pathnames
The ABNF for pathnames presented in [RFC2640] is incorrect.
When UTF-8 encoding is not present, the correct syntax is
PATHNAME = *( %x20-7E )
When UTF-8 encoding is in use, the correct syntax is
PATHNAME = *( %x20-7E / %x80-FF )
Note that both cases render moot all discussion about the use of the
characters CR, LF, and NUL, in pathnames.
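The two productions translate directly into byte-level patterns.  The
sketch below is illustrative only; the function name
is_valid_pathname is hypothetical.  It checks the byte ranges of the
ABNF and does not verify that bytes above 7F form well-formed UTF-8
sequences.

```python
import re

# PATHNAME = *( %x20-7E )
_ASCII_PATHNAME = re.compile(rb"[\x20-\x7E]*")
# PATHNAME = *( %x20-7E / %x80-FF )
_UTF8_PATHNAME = re.compile(rb"[\x20-\x7E\x80-\xFF]*")


def is_valid_pathname(pathname: bytes, utf8_negotiated: bool) -> bool:
    """Check a pathname against the production for the current mode."""
    pattern = _UTF8_PATHNAME if utf8_negotiated else _ASCII_PATHNAME
    return pattern.fullmatch(pathname) is not None
```

Because neither range includes the values 00 through 1F or 7F, a
pathname containing CR, LF or NUL fails the check in both modes.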
6. IANA Considerations
The list of valid option names for the FTP OPTS command is believed
to be first-come first-served, and managed outside the control of the
Internet Assigned Numbers Authority (IANA).
7. Security Considerations
While it should improve inter-operation, and therefore may improve
security, the addition of the UTF-8 option itself should have no
effect upon the security of the FTP, networks or hosts.
The intention of this document is to address inter-operational issues
in the existing protocol specifications.
Some of those issues can lead to unexpected data appearing on the
communications channel. Experience shows this can lead to serious
security issues, potentially including the compromise of hosts on the
network. While one would hope that implementations were hardened
against such occurrences, some implementations may not be. The
importance of such hardening cannot be emphasized strongly enough.
Acknowledgments
The following people provided significant assistance with the
analysis of the problem, the proposed solution, and the preparation
of this document:
The members of the vulnerability handling team at the CERT
Coordination Center.
Normative References
[RFC854] J. Postel and J. Reynolds, "TELNET PROTOCOL
SPECIFICATION", STD 8, RFC 854, May 1983.
[RFC959] J. Postel and J. Reynolds, "FILE TRANSFER PROTOCOL
(FTP)", STD 9, RFC 959, October 1985.
[RFC1123] IETF, "Requirements for Internet Hosts -- Application and
Support", STD 3, RFC 1123, October 1989.
[RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", RFC 2119, BCP 14, March 1997.
[RFC2234] D. Crocker and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997.
[RFC2389] P. Hethmon and R. Elz, "Feature negotiation mechanism for
the File Transfer Protocol", RFC 2389, August 1998.
[RFC2640] B. Curtin, "Internationalization of the File Transfer
Protocol", RFC 2640, July 1999.
Informative References
[RR] R. Russell and S. Cunningham, "Hack Proofing Your
Network: Internet Tradecraft", ISBN 1-928994-15-6,
January 2000.
[EL] E. Labbate, "Vulnerability as a Function of Software
Quality", http://rr.sans.org/code/quality.php, March
2001.
[AD] A. Davis, et al, "Understanding the Risks of SNMP
Vulnerabilities",
http://www.lucent.com/livelink/255868_Whitepaper.pdf,
March 2002.
Author's Address
Gregory A. Lundberg
WU-FTPD Development Group
1441 Elmdale Drive
Dayton, OH 45409-1615
US
Phone: +1 937 299 7653
Email: lundberg@vr.net
Annex A - Specific Changes to Existing Specifications
In summary, the specifications presented in this memo make the
following specific changes with respect to the requirements of
[RFC2640]:
- add the option UTF-8, and require that implementations must not
transmit UTF-8 encoded pathnames until after successful
negotiation of the UTF-8 option;
- clarify that UTF-8 encoding applies only to pathnames transmitted
over the control connection, and provide a means to specify UTF-8
encoding in the data transfer sent in response to the NLST
command;
- clarify the use of UTF-8 encoding with respect to non-shortest
encodings;
- clarify that the character sequence CR NUL is a Telnet end-of-line
sequence and must not be treated as a pathname character;
- specify the correct ABNF syntax for pathnames when UTF-8 encoding
is, and is not, in use.
Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to The Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by The Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN
WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by The
Internet Society.