FTPEXT Working Group                                         B. Curtin
INTERNET DRAFT                      Defense Information Systems Agency
Expires 26 May 1997                                   26 November 1996
Internationalization of the File Transfer Protocol
<draft-ietf-ftpext-intl-ftp-00.txt>
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are
working documents of the Internet Engineering Task Force
(IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of
six months. Internet-Drafts may be updated, replaced, or
obsoleted by other documents at any time. It is not
appropriate to use Internet-Drafts as reference material or
to cite them other than as a "working draft" or "work in
progress".
To learn the current status of any Internet-Draft, please
check the 1id-abstracts.txt listing contained in the
Internet-Drafts Shadow Directories on ds.internic.net (US
East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West
Coast), or munnari.oz.au (Pacific Rim).
Distribution of this document is unlimited. Please send
comments to the FTP Extension working group (FTPEXT-WG) of
the Internet Engineering Task Force (IETF) at
<ftp-wg@hops.ag.utk.edu>. Subscription address is
<ftp-wg-request@hops.ag.utk.edu>. Discussions of the group
are archived at <URL:ftp://hops.ag.utk.edu/ftp-wg/archives/>.
Abstract
The File Transfer Protocol, as defined in RFC 959 [RFC959]
and RFC 1123 Section 4 [RFC1123], is one of the oldest and
most widely used protocols on the Internet. The protocol's primary
character set, 7 bit ASCII, has served the protocol well
through the early growth years of the Internet. However, as
the Internet becomes more global, there is a need to support
character sets beyond 7 bit ASCII.
This document addresses the internationalization (I18n) of
FTP, which includes supporting the multiple character sets
found throughout the Internet community. This is achieved
by extending the FTP specification and giving recommendations
for proper internationalization support.
Expires 26 May 1997 [Page 1]
INTERNET DRAFT FTP Internationalization 26 November, 1996
Table of Contents
1. INTRODUCTION
1.1 SCOPE
2.0 INTERNATIONALIZATION
2.1 INTERNATIONAL CHARACTER SET
2.2 TRANSFER ENCODING
2.3 TRANSLATIONS
2.3.1 ISO/IEC 8859-8 EXAMPLE
2.3.2 VENDOR CODEPAGE EXAMPLE
3. CONFORMANCE
3.1 INTERNATIONAL SERVERS
3.1.1 SERVER STRATEGIES EXAMPLES
3.2 INTERNATIONAL CLIENTS
4.0 SECURITY
5.0 ACKNOWLEDGEMENTS
BIBLIOGRAPHY
AUTHOR'S ADDRESS
1. Introduction
As the Internet grows throughout the world the requirement to
support character sets outside of the ASCII / Latin-1
character set becomes ever more urgent. For FTP, because of
the large installed base, it is paramount that this be done
without breaking existing clients and servers. This document
addresses this need. In doing so it defines a solution which
will still allow the installed base to interoperate with new
international clients and servers.
1.1 Scope
This document enhances the capabilities of the File Transfer
Protocol by defining a Universal Character Set (UCS), a UCS
transformation format (UTF), and removing the 7-bit
restrictions on pathnames used in client commands and server
responses.
2.0 Internationalization
The File Transfer Protocol was developed in a period when the
predominant character sets were 7 bit ASCII and 8 bit EBCDIC.
Today these character sets cannot support the wide range of
characters needed by multinational systems. Given that there
are a number of character sets in current use that provide
more characters than 7-bit ASCII, it makes sense to decide on
a convenient way to represent the union of those
possibilities. Working globally requires either support for a
number of character sets and the ability to translate between
them, or the use of a single preferred character set. To
assure interoperability this document recommends the latter
approach and defines a single character set, in addition to
NVT ASCII and EBCDIC, which is understandable by all systems.
For FTP this character set will be ISO/IEC 10646:1993 and the
UTF-8 encoding. For support of global compatibility it is
strongly recommended that clients and servers use UTF-8
encoding when performing operations on filenames. Clients and
servers are, however, under no obligation to perform any
translation on the contents of a file for operations such as
STOR or RETR.
A more thorough description of UTF-8, ISO/IEC 10646, and
UNICODE, beyond what is given in this document, can be found
in RFC 2044 [RFC2044].
2.1 International Character Set
The character set defined for international support of FTP
shall be the Universal Character Set as defined in ISO
10646:1993 [ISO-10646] as amended. This standard incorporates
the script and symbol character sets of many existing
international, national, and corporate standards. ISO/IEC
10646 defines two alternate forms of encoding, UCS-4 and
UCS-2. UCS-4 is a four byte (31 bit) encoding containing
2**31 code positions divided into 128 groups of 256 planes.
Each plane consists of 256 rows of 256 cells. UCS-2 is a 2
byte (16 bit) character set consisting of plane zero or the
Basic Multilingual Plane (BMP). Currently, no codesets have
been defined outside of the 2 byte BMP.
The Unicode standard version 2.0 [UNICODE] is consistent with
the UCS-2 subset of ISO/IEC 10646. The Unicode standard
version 2.0 includes the repertoire of IS 10646 characters,
amendments 1-7 of IS 10646, and editorial and technical
corrigenda.
NOTE -- implementers should be aware that ISO 10646 is amended
from time to time; 4 amendments have been adopted since the
initial 1993 publication, none of which significantly
affects this specification. A fifth amendment, now under
consideration, will introduce incompatible changes to the
standard: 6656 Korean Hangul syllables allocated between
code positions 3400 and 4DFF (hexadecimal) will be moved to
new positions (and 4516 new syllables added), thus making
references to the old positions invalid. Since the Unicode
consortium has already adopted the corresponding amendment
in Unicode 2.0, adoption of DAM 5 is considered likely and
implementers should probably consider the old code positions
as already invalid. Despite this one-time change, the
relevant standards bodies have committed themselves not to
change any allocated code position in the future. To encode
Korean Hangul irrespective of these changes, the conjoining
Hangul Jamo in the range 1100-11F9 can be used.
2.2 Transfer Encoding
UCS Transformation Format 8 (UTF-8) [UTF-8], also known as
UTF-2, will be used as a transfer encoding to transmit the
international character set. UTF-8 is a file safe encoding
which avoids the use of byte values which have special
significance during the parsing of file name character
strings. UTF-8 is an 8 bit encoding of the characters in the
UCS. Some of UTF-8's benefits are that it is compatible with
7 bit ASCII, so it doesn't affect programs that give special
meanings to various ASCII characters; it is immune to
synchronization errors; and it has enough space to support
large character sets.
UTF-8 encoding represents each UCS character as a sequence of
1 to 6 bytes in length. For all sequences of one byte the
most significant bit is ZERO. For all sequences of more than
one byte, the first byte begins with a run of ONE bits,
starting from the most significant bit position and followed
by a ZERO bit; the number of ONE bits in that run is the
number of bytes in the UTF-8 sequence. For example, the first
byte of a 3 byte UTF-8 sequence would have 1110 as its most
significant bits. Each additional byte (continuation byte) in
the UTF-8 sequence contains a ONE bit followed by a ZERO bit
as its most significant bits. The remaining free bit
positions in the first byte and in the continuation bytes are
used to identify characters in the UCS. The relationship
between UCS and UTF-8 is demonstrated in the following table:
   UCS-4 range (hex)      UTF-8 byte sequence (binary)
   0000 0000-0000 007F    0xxxxxxx
   0000 0080-0000 07FF    110xxxxx 10xxxxxx
   0000 0800-0000 FFFF    1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-001F FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   0020 0000-03FF FFFF    111110xx 10xxxxxx 10xxxxxx 10xxxxxx
                          10xxxxxx
   0400 0000-7FFF FFFF    1111110x 10xxxxxx 10xxxxxx 10xxxxxx
                          10xxxxxx 10xxxxxx
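The table can be read mechanically: the run of ONE bits at the
top of the first byte gives the length of the sequence. The
following is a minimal sketch of that lookup. The function name
utf8_seq_len is an assumption made for illustration and is not
defined by this document.

```c
/* Hypothetical helper: returns the total length (1 to 6) of the
   UTF-8 sequence introduced by first byte c, or 0 if c cannot
   begin a sequence (a continuation byte, 0xFE, or 0xFF). */
int utf8_seq_len(unsigned char c)
{
    if ((c & 0x80) == 0x00) return 1;   /* 0xxxxxxx */
    if ((c & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    if ((c & 0xFC) == 0xF8) return 5;   /* 111110xx */
    if ((c & 0xFE) == 0xFC) return 6;   /* 1111110x */
    return 0;                           /* 10xxxxxx, 0xFE, 0xFF */
}
```

Each test masks off one more bit than the previous one, so the
first match identifies the row of the table that applies.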
A beneficial property of UTF-8 is that its single byte
sequence is consistent with the ASCII character set. This
feature will allow a transition where old ASCII-only clients
can still interoperate with new servers which support the
UTF-8 encoding.
Another feature is that the encoding rules make it very
unlikely that a character sequence from a different character
set will be mistaken for a UTF-8 encoded character sequence.
Clients and servers can use a simple routine to determine
whether a byte sequence is structurally valid UTF-8:
int utf8_valid(const unsigned char *buf, unsigned int len)
{
   const unsigned char *endbuf = buf + len;
   int trailing = 0;   /* trailing (continuation) bytes to follow */

   while (buf != endbuf)
   {
      unsigned char c = *buf++;
      if (trailing)
      {
         if ((c&0xC0) == 0x80) trailing--;
         else return 0;
      }
      else
      {
         if ((c&0x80) == 0x00) continue;
         else if ((c&0xE0) == 0xC0) trailing = 1;
         else if ((c&0xF0) == 0xE0) trailing = 2;
         else if ((c&0xF8) == 0xF0) trailing = 3;
         else if ((c&0xFC) == 0xF8) trailing = 4;
         else if ((c&0xFE) == 0xFC) trailing = 5;
         else return 0;
      }
   }
   return trailing == 0;
}
2.3 Translations
Translation from the local filesystem character set to UTF-8
will normally involve a two step process. First translate the
local character set to the UCS; then translate the UCS to
UTF-8.
The first step in the process can be performed by maintaining
a translation table which includes the local character set
code and the corresponding UCS code. For instance the ISO/IEC
8859-8 [ISO-8859] code for the Hebrew letter "VAV" is 0xE5.
The corresponding 4 byte ISO/IEC 10646 code is 0x000005D5.
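As an illustration of the first step, the sketch below maps a
small part of ISO/IEC 8859-8 to UCS-4 and back. It is not a
complete table: it covers only the ASCII range and the Hebrew
letters, which occupy the contiguous codes 0xE0 through 0xFA and
correspond to UCS codes 0x05D0 through 0x05EA. The function
names and the UCS4_UNMAPPED marker are assumptions made for this
example. A production server would more likely load a full
256-entry table from a mapping file.

```c
/* Hypothetical marker for "no mapping in this partial table". */
#define UCS4_UNMAPPED 0xFFFFFFFFUL

/* Partial ISO/IEC 8859-8 to UCS-4 mapping: ASCII and the Hebrew
   letters only. All other codes are reported as unmapped. */
unsigned long iso8859_8_to_ucs4(unsigned char c)
{
    if (c <= 0x7F)
        return (unsigned long) c;
    if (c >= 0xE0 && c <= 0xFA)
        return 0x05D0UL + (unsigned long) (c - 0xE0);
    return UCS4_UNMAPPED;
}

/* Reverse mapping. Returns the ISO/IEC 8859-8 code, or -1 when
   the UCS-4 code has no equivalent in this partial table. */
int ucs4_to_iso8859_8(unsigned long ucs4)
{
    if (ucs4 <= 0x7FUL)
        return (int) ucs4;
    if (ucs4 >= 0x05D0UL && ucs4 <= 0x05EAUL)
        return (int) (0xE0UL + (ucs4 - 0x05D0UL));
    return -1;
}
```

Returning an explicit "unmapped" result lets the caller decide
whether to reject the name or pass it through unchanged.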
The next step is to translate the UCS character code to the
UTF-8 encoding. The following routine can be used to
determine and encode the correct number of bytes based on the
UCS-4 character code:
int ucs4_to_utf8 (const unsigned long *ucs4_buf, unsigned int ucs4_len,
                  unsigned char *utf8_buf)
{
   /* utf8_buf must have room for up to 6 bytes per UCS-4 code.
      Returns the number of bytes written, or -1 if a code lies
      outside the UCS-4 range. */
   const unsigned long *ucs4_endbuf = ucs4_buf + ucs4_len;
   const unsigned char *utf8_start = utf8_buf;
   unsigned long ucs4_ch;

   while (ucs4_buf != ucs4_endbuf)
   {
      ucs4_ch = *ucs4_buf++;

      if ( ucs4_ch <= 0x7FUL )  /* ASCII chars no conversion needed */
         *utf8_buf++ = (unsigned char) ucs4_ch;
      else if ( ucs4_ch <= 0x07FFUL )  /* In the 2 byte utf-8 range */
      {
         *utf8_buf++ = (unsigned char) (0xC0UL + (ucs4_ch/0x40UL));
         *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
      }
      else if ( ucs4_ch <= 0xFFFFUL )
      {
         /* In the 3 byte utf-8 range. The values 0x0000FFFE,
            0x0000FFFF and 0x0000D800 - 0x0000DFFF do not occur
            in UCS-4 */
         *utf8_buf++ = (unsigned char) (0xE0UL + (ucs4_ch/0x1000UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x40UL)%0x40UL));
         *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
      }
      else if ( ucs4_ch <= 0x1FFFFFUL )  /* In the 4 byte utf-8 range */
      {
         *utf8_buf++ = (unsigned char) (0xF0UL + (ucs4_ch/0x040000UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x1000UL)%0x40UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x40UL)%0x40UL));
         *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
      }
      else if ( ucs4_ch <= 0x03FFFFFFUL )  /* In the 5 byte utf-8 range */
      {
         *utf8_buf++ = (unsigned char)
                       (0xF8UL + (ucs4_ch/0x01000000UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x040000UL)%0x40UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x1000UL)%0x40UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x40UL)%0x40UL));
         *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
      }
      else if ( ucs4_ch <= 0x7FFFFFFFUL )  /* In the 6 byte utf-8 range */
      {
         *utf8_buf++ = (unsigned char)
                       (0xFCUL + (ucs4_ch/0x40000000UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x01000000UL)%0x40UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x040000UL)%0x40UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x1000UL)%0x40UL));
         *utf8_buf++ = (unsigned char)
                       (0x80UL + ((ucs4_ch/0x40UL)%0x40UL));
         *utf8_buf++ = (unsigned char) (0x80UL + (ucs4_ch%0x40UL));
      }
      else
         return -1;  /* not a UCS-4 code */
   }
   return (int) (utf8_buf - utf8_start);
}
When moving from UTF-8 encoding to the local character set
the reverse procedure is used. First the UTF-8 encoding is
transformed into the UCS-4 character set. The UCS-4 is then
converted to the local character set from a translation table
(i.e. the opposite of the table used to form the UCS-4
character code).
To convert from UTF-8 to UCS-4 the free bits (those that do
not define UTF-8 sequence size or signify continuation bytes)
in a UTF-8 sequence are concatenated as a bit string. The
bits are then distributed into a four byte sequence starting
from the least significant bits. Those bit positions in the
four byte sequence not assigned a bit are set to ZERO. The
following routine converts the UTF-8 encoding to UCS-4
character codes:
int utf8_to_ucs4 (unsigned long *ucs4_buf, unsigned int utf8_len,
                  const unsigned char *utf8_buf)
{
   /* The input should first be checked with utf8_valid, so that
      every sequence is complete. Returns the number of UCS-4
      codes written, or -1 if a byte cannot start a sequence. */
   const unsigned char *utf8_endbuf = utf8_buf + utf8_len;
   const unsigned long *ucs4_start = ucs4_buf;

   while (utf8_buf != utf8_endbuf)
   {
      if ((*utf8_buf & 0x80) == 0x00)  /* ASCII chars no conversion
                                          needed */
      {
         *ucs4_buf++ = (unsigned long) *utf8_buf;
         utf8_buf++;
      }
      else if ((*utf8_buf & 0xE0) == 0xC0)  /* In the 2 byte utf-8
                                               range */
      {
         *ucs4_buf++ = (unsigned long) (((*utf8_buf - 0xC0) * 0x40UL)
                        + ( *(utf8_buf+1) - 0x80));
         utf8_buf += 2;
      }
      else if ((*utf8_buf & 0xF0) == 0xE0)  /* In the 3 byte utf-8
                                               range */
      {
         *ucs4_buf++ = (unsigned long) (((*utf8_buf - 0xE0) * 0x1000UL)
                        + (( *(utf8_buf+1) - 0x80) * 0x40UL)
                        + ( *(utf8_buf+2) - 0x80));
         utf8_buf += 3;
      }
      else if ((*utf8_buf & 0xF8) == 0xF0)  /* In the 4 byte utf-8
                                               range */
      {
         *ucs4_buf++ = (unsigned long) (((*utf8_buf - 0xF0) * 0x040000UL)
                        + (( *(utf8_buf+1) - 0x80) * 0x1000UL)
                        + (( *(utf8_buf+2) - 0x80) * 0x40UL)
                        + ( *(utf8_buf+3) - 0x80));
         utf8_buf += 4;
      }
      else if ((*utf8_buf & 0xFC) == 0xF8)  /* In the 5 byte utf-8
                                               range */
      {
         *ucs4_buf++ = (unsigned long) (((*utf8_buf - 0xF8) * 0x01000000UL)
                        + (( *(utf8_buf+1) - 0x80) * 0x040000UL)
                        + (( *(utf8_buf+2) - 0x80) * 0x1000UL)
                        + (( *(utf8_buf+3) - 0x80) * 0x40UL)
                        + ( *(utf8_buf+4) - 0x80));
         utf8_buf += 5;
      }
      else if ((*utf8_buf & 0xFE) == 0xFC)  /* In the 6 byte utf-8
                                               range */
      {
         *ucs4_buf++ = (unsigned long) (((*utf8_buf - 0xFC) * 0x40000000UL)
                        + (( *(utf8_buf+1) - 0x80) * 0x01000000UL)
                        + (( *(utf8_buf+2) - 0x80) * 0x040000UL)
                        + (( *(utf8_buf+3) - 0x80) * 0x1000UL)
                        + (( *(utf8_buf+4) - 0x80) * 0x40UL)
                        + ( *(utf8_buf+5) - 0x80));
         utf8_buf += 6;
      }
      else
         return -1;  /* continuation byte, 0xFE or 0xFF */
   }
   return (int) (ucs4_buf - ucs4_start);
}
2.3.1 ISO/IEC 8859-8 Example
This example demonstrates mapping the ISO/IEC 8859-8 character
set to UTF-8 and back to ISO/IEC 8859-8. As noted earlier,
the Hebrew letter "VAV" is translated from the ISO/IEC 8859-8
character code 0xE5 to the corresponding 4 byte ISO/IEC 10646
code of 0x000005D5 by a simple lookup of a
translation/mapping file.
The UCS-4 character code is transformed into UTF-8 using the
ucs4_to_utf8 routine described earlier by:
1. Because the UCS-4 character is between 0x80 and 0x07FF it
will map to a 2 byte UTF-8 sequence.
2. The first byte is defined by (0xC0 + (0x000005D5 / 0x40))
= 0xD7.
3. The second byte is defined by (0x80 + (0x000005D5 %
0x40)) = 0x95.
The UTF-8 encoding is transferred back to UCS-4 by using the
utf8_to_ucs4 routine described earlier by:
1. Because the first byte of the sequence, when the '&'
operator with a value of 0xE0 is applied, will produce
0xC0 (0xD7 & 0xE0 = 0xC0) the UTF-8 is a 2 byte sequence.
2. The four byte UCS-4 character code is produced by
(((0xD7 - 0xC0) * 0x40) + (0x95 - 0x80)) = 0x000005D5.
Finally, the UCS-4 character code is translated to ISO/IEC
8859-8 character code (using the translation table which
matches ISO/IEC 8859-8 to UCS-4 ) to produce the original
0xE5 code for the Hebrew letter "VAV".
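The arithmetic in the two numbered lists above can be checked
with a few lines of C. This is a minimal sketch restricted to
the 2 byte case. The function names encode_2byte and
decode_2byte are assumptions made for this example. The
expressions are the same ones used in the 2 byte branches of
ucs4_to_utf8 and utf8_to_ucs4.

```c
/* Encodes a UCS-4 code in the range 0x80 to 0x07FF as two UTF-8
   bytes. The caller is assumed to have checked the range. */
void encode_2byte(unsigned long ucs4, unsigned char out[2])
{
    out[0] = (unsigned char) (0xC0UL + (ucs4 / 0x40UL));
    out[1] = (unsigned char) (0x80UL + (ucs4 % 0x40UL));
}

/* Decodes a 2 byte UTF-8 sequence back to its UCS-4 code. The
   caller is assumed to have checked that in[0] is 110xxxxx and
   in[1] is 10xxxxxx. */
unsigned long decode_2byte(const unsigned char in[2])
{
    return (unsigned long) (((in[0] - 0xC0) * 0x40)
                            + (in[1] - 0x80));
}
```

Calling encode_2byte with 0x000005D5 yields the bytes 0xD7 and
0x95, and decode_2byte applied to those bytes returns
0x000005D5.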
2.3.2 Vendor Codepage Example
This example demonstrates the mapping of a codepage to UTF-8
and back to a vendor codepage. Mapping between vendor
codepages can be done in a very similar manner as described
above. For instance both the PC and Mac codepages reflect the
character set from the Thai standard TIS 620-2533. The
character code on both platforms for the Thai letter "SO SO"
is 0xAB. This character can then be mapped into the UCS-4 by
way of a translation/mapping file to produce the UCS-4 code
of 0x0E0B.
The UCS-4 character code is transformed into UTF-8 using the
ucs4_to_utf8 routine described earlier by:
1. Because the UCS-4 character is between 0x0800 and 0xFFFF
it will map to a 3 byte UTF-8 sequence.
2. The first byte is defined by (0xE0 + (0x00000E0B /
0x1000)) = 0xE0.
3. The second byte is defined by (0x80 + ((0x00000E0B /
0x40) % 0x40)) = 0xB8.
4. The third byte is defined by (0x80 + (0x00000E0B % 0x40))
= 0x8B.
The UTF-8 encoding is transferred back to UCS-4 by using the
utf8_to_ucs4 routine described earlier by:
1. Because the first byte of the sequence, when the '&'
operator with a value of 0xF0 is applied, will produce
0xE0 (0xE0 & 0xF0 = 0xE0) the UTF-8 is a 3 byte sequence.
2. The four byte UCS-4 character code is produced by
(((0xE0 - 0xE0) * 0x1000) + ((0xB8 - 0x80) * 0x40) +
(0x8B - 0x80)) = 0x00000E0B.
Finally, the UCS-4 character code is translated to either the
PC or MAC codepage character code (using the translation
table which matches codepage to UCS-4 ) to produce the
original 0xAB code for the Thai letter "SO SO".
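The divisions and remainders used in this example are by powers
of two, so the same 3 byte transformation can be written with
shifts and masks: dividing by 0x40 is a right shift by 6 bits
and taking the remainder modulo 0x40 is a mask with 0x3F. The
sketch below shows that equivalent form. The function names
encode_3byte and decode_3byte are assumptions made for this
example, and the shift-and-mask style is an alternative to the
routines given earlier, not a replacement defined by this
document.

```c
/* Encodes a UCS-4 code in the range 0x0800 to 0xFFFF as three
   UTF-8 bytes. The caller is assumed to have checked the range. */
void encode_3byte(unsigned long ucs4, unsigned char out[3])
{
    out[0] = (unsigned char) (0xE0 | (ucs4 >> 12));
    out[1] = (unsigned char) (0x80 | ((ucs4 >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (ucs4 & 0x3F));
}

/* Decodes a 3 byte UTF-8 sequence by masking off the marker
   bits and concatenating the 4 + 6 + 6 free bits. */
unsigned long decode_3byte(const unsigned char in[3])
{
    return ((unsigned long) (in[0] & 0x0F) << 12)
         | ((unsigned long) (in[1] & 0x3F) << 6)
         |  (unsigned long) (in[2] & 0x3F);
}
```

For the Thai letter "SO SO", encode_3byte applied to 0x00000E0B
yields the bytes 0xE0, 0xB8 and 0x8B, and decode_3byte applied
to those bytes returns 0x00000E0B.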
3. Conformance
File names are sequences of bytes. The character set of
names that are valid UTF-8 sequences is UTF-8. The character
set of other names is undefined.
Conforming internationalized clients and servers must either
support UTF-8 or support a local character set which is
supported by both the client and server. Clients and servers,
unless otherwise configured to support a specific native
character set, should check for a valid UTF-8 byte sequence
to determine if the pathname being presented is UTF-8.
3.1 International Servers
The 7-bit restriction on pathnames used in server responses
is dropped.
If servers and clients are not configured to share the same
character set, servers should use UTF-8 encoding for all
pathname transfers.
There are several plausible UTF-8 server implementation
strategies:
- A server that copies filenames transparently from a local
filesystem may continue to do so. It is then up to the local
file creators to use UTF-8 filenames.
- A server may translate filenames from a local character set
to UTF-8. Each filename will be translated to UTF-8 before it
is sent to the client.
- UTF-8 filenames received from the client must be translated
back if possible. Many existing servers interpret 8-bit
filenames as being in the local character set. They may
continue to do so for filenames that are not valid UTF-8.
A high-quality translating server will use the following
procedure:
   If fn is valid UTF-8 and can be translated to the local
   character set:
      Translate fn to the local character set, obtaining
      localfn.
      Attempt to operate on localfn.
      Upon success: Stop.
      Upon temporary error: Return an error message to the
      client. Stop.
      Attempt to operate on fn.
      Upon temporary error: Return an error message to the
      client. Stop.
   Otherwise:
      Attempt to operate on fn.
      Upon temporary error: Return an error message to the
      client. Stop.
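The procedure above can be sketched in C as follows. Everything
named here is hypothetical: the op_result codes, the to_local
and operate hooks, and the demo_ functions, which exist only to
exercise the sketch. The sketch assumes that a failure on
localfn which is neither a success nor a temporary error leads
to a second attempt with the name as received, and it leaves
the reporting of the final result to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for a filesystem operation. */
enum op_result { OP_SUCCESS, OP_TEMP_ERROR, OP_PERM_ERROR };

/* Hypothetical hooks supplied by the server. to_local returns 1
   and fills localfn when fn is valid UTF-8 and translatable to
   the local character set, otherwise it returns 0. operate
   performs the requested operation on a name. */
typedef int (*to_local_fn)(const char *fn, char *localfn, size_t cap);
typedef enum op_result (*operate_fn)(const char *name);

enum op_result operate_on_name(const char *fn, to_local_fn to_local,
                               operate_fn operate)
{
    char localfn[256];

    if (to_local(fn, localfn, sizeof localfn)) {
        enum op_result r = operate(localfn);
        if (r == OP_SUCCESS || r == OP_TEMP_ERROR)
            return r;   /* stop; the caller reports any error */
        /* otherwise fall through and try the name as received */
    }
    return operate(fn);
}

/* Stand-in hooks used only to exercise the sketch. */
int demo_to_local(const char *fn, char *localfn, size_t cap)
{
    const char *mapped;

    if (strcmp(fn, "utf8name") == 0) mapped = "localname";
    else if (strcmp(fn, "both") == 0) mapped = "nolocal";
    else return 0;
    if (strlen(mapped) + 1 > cap) return 0;
    strcpy(localfn, mapped);
    return 1;
}

enum op_result demo_operate(const char *name)
{
    if (strcmp(name, "localname") == 0 || strcmp(name, "both") == 0)
        return OP_SUCCESS;
    if (strcmp(name, "busy") == 0)
        return OP_TEMP_ERROR;
    return OP_PERM_ERROR;
}
```

Passing the hooks as function pointers keeps the decision logic
independent of any particular filesystem or translation table.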
3.1.1 Server Strategies Examples
There are a number of server strategies which might be
employed:
- Server's OS uses one fixed character set. In this case,
the server should easily be able to support built-in
translation to UTF-8. This is trivial where that fixed
character set is ASCII, ISO 8859-1, or UTF-8.
- Server supports charset labeling of files and/or
directories, such that different file names may have
different charsets. The server should attempt to translate
all file names to UTF-8, but if it can't then it should leave
that name in its raw form.
- Server's OS does not mandate the character set, but the
administrator configures it in the FTP server. The server
should be configured to use a particular translation table.
(The table may be external, but the server might have some
common choices built in.) This also allows the flexibility of
defining different charsets for different directories.
- Server's OS does not mandate the character set and it is
not configured. The server should simply use the raw bytes in
the file name. They might be ASCII or UTF-8.
- Server is a mirror, and wants to look just like the site it
is mirroring. It should save the exact file name bytes that
it received from the main server.
3.2 International Clients
The 7-bit restriction on pathnames used by client commands is
dropped.
While clients are not obligated to support all of the
characters or the associated glyphs defined in the UCS,
clients which are presented UTF-8 filenames by the server
should parse UTF-8 correctly, and attempt to display the
filename within the limitation of the resources available.
Unknown UTF-8 glyphs might be displayed as question marks, or
hex, or something else. This is a quality-of-implementation
issue.
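One possible fallback, for a client whose display can show only
ASCII, is sketched below. It writes one question mark for each
multi-byte UTF-8 character. The function name
utf8_to_ascii_display is an assumption made for this example,
and the sketch assumes the name has already been checked with
utf8_valid.

```c
#include <stddef.h>

/* Hypothetical fallback for an ASCII-only display. Copies ASCII
   bytes and writes one '?' per multi-byte UTF-8 character.
   Assumes name[0..len) is valid UTF-8 and that out has room for
   len + 1 bytes. Returns the number of characters written, not
   counting the terminating NUL. */
size_t utf8_to_ascii_display(const unsigned char *name, size_t len,
                             char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = name[i];

        if ((c & 0x80) == 0x00)
            out[n++] = (char) c;   /* ASCII byte */
        else if ((c & 0xC0) != 0x80)
            out[n++] = '?';        /* first byte of a sequence */
        /* continuation bytes (10xxxxxx) produce no output */
    }
    out[n] = '\0';
    return n;
}
```

Counting first bytes rather than all non-ASCII bytes keeps the
displayed length equal to the number of characters in the name.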
Client developers should be aware that it will be possible
for pathnames to contain characters from mixed scripts (e.g.
/Latin1DirectoryName/HebrewFileName). They should be prepared
to handle the Bi-directional (BIDI) display of these
character sets (i.e. left to right display for the directory
and right to left display for the filename).
Character semantics of other names shall remain undefined. If
a client detects that a server is non-UTF-8, it should change
its display appropriately. How a client implementation
handles non-UTF-8 names is a quality-of-implementation issue.
It may try to assume some other encoding, give the user a
chance to select an encoding, or save encoding assumptions for
a server from one FTP session to another.
Client implementation notes: Many existing clients interpret
8-bit filenames as being in the local character set. They may
continue to do so for filenames that are not valid UTF-8.
4.0 Security
This document addresses the support of character sets beyond
1 byte. Conformance to this document should not induce a
security threat.
5.0 Acknowledgements
The following people have contributed to this document:
Alex Belits
D. J. Bernstein
Martin J. Duerst
Mark Harris
Paul Hethmon
Alun Jones
James Matthews
Keith Moore
Benjamin Riefenstahl
(and others from the FTPEXT working group)
Bibliography
[ISO-8859]
ISO 8859. International standard -- Information
processing -- 8-bit single-byte coded graphic character
sets -- Part 1: Latin alphabet No. 1 (1987) -- Part 2:
Latin alphabet No. 2 (1987) -- Part 3: Latin alphabet
No. 3 (1988) -- Part 4: Latin alphabet No. 4 (1988) --
Part 5: Latin/Cyrillic alphabet (1988) -- Part 6:
Latin/Arabic alphabet (1987) -- Part 7: Latin/Greek
alphabet (1987) -- Part 8: Latin/Hebrew alphabet (1988)
-- Part 9: Latin alphabet No. 5 (1989) -- Part 10: Latin
alphabet No. 6 (1992)
[ISO-10646]
ISO/IEC 10646-1:1993. International standard --
Information technology -- Universal multiple-octet coded
character set (UCS) -- Part 1: Architecture and basic
multilingual plane.
[RFC959]
J. Postel, J. Reynolds, "File Transfer Protocol (FTP)",
RFC 959, October 1985.
[RFC1123]
R. Braden, "Requirements for Internet Hosts --
Application and Support", RFC 1123, October 1989.
[RFC2044]
F. Yergeau, "UTF-8, a transformation format of Unicode
and ISO 10646", RFC 2044, October 1996.
[UNICODE]
The Unicode Consortium, "The Unicode Standard - Version
2.0", Addison-Wesley Developers Press, July 1996.
[UTF-8]
ISO/IEC 10646-1:1993 AMENDMENT 2 (1996). UCS
Transformation Format 8 (UTF-8).
Author's Address
JIEO
Attn JEBBD (Bill Curtin)
Ft. Monmouth, N.J.
07703-5613
curtinw@ftm.disa.mil