Network Working Group                                         J. Klensin
Internet-Draft                                              M. Padlipsky
Expires: September 3, 2007                                 March 2, 2007


                 Unicode Format for Network Interchange
                     draft-klensin-net-utf8-03.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 3, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The Internet today is in need of a standardized form for the
   transmission of internationalized "text" information, paralleling the
   specifications for the use of ASCII that date from the early days of
   the ARPANET.  This document specifies that format, using UTF-8 with
   specification of normalization and specific line-ending sequences.







Klensin & Padlipsky     Expires September 3, 2007               [Page 1]


Internet-Draft               Network Unicode                  March 2007


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Background . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  5
     1.3.  Mailing List . . . . . . . . . . . . . . . . . . . . . . .  5
   2.  Net-Unicode  . . . . . . . . . . . . . . . . . . . . . . . . .  5
     2.1.  Definition . . . . . . . . . . . . . . . . . . . . . . . .  5
     2.2.  The ASCII NVT Definition . . . . . . . . . . . . . . . . .  6
   3.  Normalization  . . . . . . . . . . . . . . . . . . . . . . . .  7
   4.  Versions of Unicode  . . . . . . . . . . . . . . . . . . . . .  8
   5.  Applicability and Stability of this Specification  . . . . . .  9
     5.1.  Use in IETF Applications Specifications  . . . . . . . . .  9
     5.2.  The Unicode Applicability Dilemma  . . . . . . . . . . . .  9
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 11
   7.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 11
   8.  Change log . . . . . . . . . . . . . . . . . . . . . . . . . . 12
     8.1.  Changes from -00 to -01  . . . . . . . . . . . . . . . . . 12
     8.2.  Changes from -01 to -02  . . . . . . . . . . . . . . . . . 12
     8.3.  Changes from -02 to -03  . . . . . . . . . . . . . . . . . 12
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 12
     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 12
     9.2.  Informative References . . . . . . . . . . . . . . . . . . 14
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 15
   Intellectual Property and Copyright Statements . . . . . . . . . . 17


1.  Introduction

1.1.  Background

   This subsection contains a review of prior work in the ARPANET and
   Internet to establish a standard text type, work that establishes the
   context and motivation for the approach taken in this document.  The
   text is explanatory rather than normative: nothing in this section is
   intended to change or update any current specification.  Those who
   are uninterested in this review and analysis can safely skip to the
   next section.

   One of the earlier application design decisions made in the
   development of ARPANET, a decision that was carried forward into the
   Internet, was the decision to standardize on a single and very
   specific coding for "text" to be passed across the network [RFC0020].
   Hosts on the network were then responsible for translating or mapping
   from whatever character coding conventions were used locally to that
   common intermediate representation, with sending hosts mapping to it
   and receiving ones mapping from it to their local forms as needed.
   It is interesting to note that at the time the ARPANET was being
   developed, participating host operating systems used at least three
   different character coding standards: the antiquated BCD (Binary
   Coded Decimal), the then-dominant major manufacturer-backed EBCDIC
   (Extended BCD Interchange Code), and the then-still emerging ASCII
   (American Standard Code for Information Interchange).  Since the
   ARPANET was an "open" project and EBCDIC was intimately linked to a
   particular hardware vendor, the original Network Working Group agreed
   that its standard should be ASCII.  That ASCII form was precisely
   "7-bit ASCII in an 8-bit field", which was in effect a compromise
   between hosts that were natively 7-bit oriented (e.g., with five
   seven-bit characters in a 36 bit word), those that were 8-bit
   oriented (using eight-bit characters) and those that placed the
   seven-bit ASCII characters in 9-bit fields with two leading zero bits
   (four characters in a 36 bit word).

   More standardization was suggested in the first preliminary
   description of the Telnet protocol [RFC0097].  With the iterations of
   that protocol [RFC0137] [RFC0139] and the drawing together of an
   essentially formal definition somewhat later [RFC0318], a standard
   abstraction, the Network Virtual Terminal (NVT) was established.  NVT
   character-coding conventions (initially called "Telnet ASCII" and
   later called "NVT ASCII", or, more casually, "network ASCII")
   included the requirement that Carriage Return - Line Feed (CRLF) be
   the common representation for ending lines of text (given that some
   participating "Host" operating systems used the one natively, some
   the other, and at least one used both) and specified conventions for
   some other characters.  Also, since NVT ASCII was restricted to
   seven-bit characters, use of the high-order bit in octets was
   reserved for the transmission of control signaling information.

   At a very high level, the concept was that a system could use
   whatever character coding and line representations were appropriate
   locally, but text transmitted over the network as text must conform
   to the single "network virtual terminal" convention.  Virtually all
   early Internet protocols that presume transfer of "text" assume this
   virtual terminal model, although different ones assume or limit it in
   different ways.  Telnet, the command stream and ASCII Type in FTP
   [RFC0542], the message stream in SMTP transfer [RFC2821], and the
   strings passed to finger [RFC0742] and whois [RFC0954] are the
   classic examples.  More recently, HTTP [RFC2068] follows the same
   general model but permits 8 bit data and leaves the line end sequence
   unspecified (the latter has been the source of a significant number
   of problems).

   In our more internationalized world, "text" clearly no longer equates
   unambiguously to "network ASCII".  Fortunately, however, we are
   converging on Unicode [Unicode] [ISO10646] as a single international
   interchange character coding and no longer need to deal with per-
   script standards for character sets (e.g., one standard for each of
   Arabic, Cyrillic, Devanagari, etc., or even standards keyed to
   languages that are usually considered to share a script, such as
   French, German, or Swedish).  Unfortunately, though, while it is
   certainly time to define a Unicode-based text type for use as a
   common text interchange format, "use Unicode" involves even more
   ambiguity than "use ASCII" did decades ago.  Unicode can be
   transmitted in at least three standard and generally-recognized
   encoding forms:

   o  UTF-8 (a variable-length encoding with some additional properties)
      [RFC3629],

   o  UTF-16 (a variable-length encoding with all common characters
      having 16 bits, but some characters that require additional bits
      being expressed via a "surrogate" mechanism) [RFC2781],

   o  UTF-32 (all characters 32 bit; also known as UCS-4).
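
   As a rough illustration of how the three encoding forms listed
   above differ, the following Python sketch reports the number of
   octets each form needs for a single character.  The helper name
   octet_lengths and the sample characters are illustrative
   assumptions, not part of this document.  Big-endian codecs are
   used so that no byte order mark is emitted.

```python
def octet_lengths(ch: str) -> dict:
    """Return the number of octets one character needs in each form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32": len(ch.encode("utf-32-be")),
    }


if __name__ == "__main__":
    # An ASCII letter, a Latin-1 letter, a BMP symbol, and a
    # supplementary-plane character that needs UTF-16 surrogates.
    for ch in ("A", "\u00e0", "\u20ac", "\U0001d11e"):
        print("U+%04X" % ord(ch), octet_lengths(ch))
```

   Only UTF-8 leaves ASCII characters as single octets, which is one
   aspect of the compatibility with ASCII that motivates its use here.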


   Older forms and nomenclature, such as the 16 bit UCS-2, are now
   strongly discouraged.

   As with ASCII, any of these forms may have different line-ending
   conventions.

   This document proposes to establish "Net-Unicode" as a new
   standardized text transmission form for the Internet, to serve as an
   internationalized alternative for NVT ASCII when specified in new --
   and, where appropriate, updated -- protocols.  UTF-8 [RFC3629] is
   chosen for the coding because it has good compatibility properties
   with ASCII and for other reasons discussed in the existing IETF
   character set policy [RFC2277].

   Where there is a choice, Unicode and the text encoding specified
   here SHOULD be used in preference to the double-byte encoding of
   "extended ASCII" [RFC0698] or to the assorted per-language or per-
   country character coding systems.

1.2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.3.  Mailing List

   [[anchor4: RFC Editor: Please remove this subsection prior to
   publication.]]

   Along with related work on general internationalization issues, this
   document is being discussed on the discuss@apps.ietf.org mailing
   list.


2.  Net-Unicode

2.1.  Definition

   The Network Unicode (Net-Unicode) format is defined as follows:

   1.  Characters MUST be coded in UTF-8 as defined in [RFC3629].

   2.  Line-endings MUST be indicated by the sequence Carriage-Return
       (U+000D) followed by Line-Feed (U+000A).

   3.  Before transmission, all character sequences MUST be normalized
       according to Unicode method "NFC" (see Section 3).

   4.  As suggested in Section 6 of RFC 3629, the Byte Order Mark
       ("BOM") signature MUST NOT appear at the beginning of these text
       strings.
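
   The four rules above can be sketched in Python as follows.  This
   is a minimal illustration under stated assumptions, not a
   reference implementation: the function name to_net_unicode is
   hypothetical, and the sketch assumes that any bare CR or bare LF
   in the local text marks a line end (so it does not preserve the
   NVT "CR NUL" convention).

```python
import unicodedata

BOM = "\ufeff"  # U+FEFF, the byte order mark signature


def to_net_unicode(text: str) -> bytes:
    """Convert local text to Net-Unicode octets (illustrative sketch)."""
    # Rule 4: drop a leading BOM signature if one is present.
    if text.startswith(BOM):
        text = text[1:]
    # Rule 2: map local line endings to CR LF.
    # Assumption: CRLF, bare LF, and bare CR all mean "end of line".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\n", "\r\n")
    # Rule 3: normalize to NFC before transmission.
    text = unicodedata.normalize("NFC", text)
    # Rule 1: encode as UTF-8; the strict codec rejects lone surrogates.
    return text.encode("utf-8")
```

   For example, under these assumptions the string consisting of
   U+0061, U+0300, and a bare LF is sent as the octets C3 A0 0D 0A.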

   The NVT specification contained a number of additional provisions,
   e.g., for the optional use of backspacing and "bare CR" (sent as CR
   NUL) to generate overstruck character sequences.  The much greater
   number of precomposed characters in Unicode, the availability of
   combining characters, and the growing use of markup conventions of
   various types to show, e.g., emphasis (rather than attempting to do
   that via the use of special characters), should make such sequences
   largely unnecessary.  Because they were optional in NVT applications,
   they SHOULD be avoided if at all possible; if they are used, this
   specification does not change the NVT rules and conventions of RFC
   318 and RFC 854 [RFC0854] (see Section 2.2).  The most important of
   these rules is that CR MUST NOT appear without either LF (indicating
   end of line) or NUL (note that NUL, X'00', is hostile to programming
   languages that use that character as a string delimiter).

2.2.  The ASCII NVT Definition

   [[anchor7: Note in Draft: The material that follows is an
   extrapolation from the original NVT material.  Questions have been
   raised as to whether it is completely appropriate in today's
   environment (internationalized or not).  See the note at the end of
   this subsection -- further discussion is solicited.]]

   This specification is intended as an update to, and internationalized
   version of, the Net-ASCII definition.  As such, it is appropriate to
   review and, if necessary, update, the key elements of that
   definition.

   The first part of the section titled "THE NVT PRINTER AND KEYBOARD"
   in RFC 854 is generally considered to be the normative definition of
   the (ASCII) Network Virtual Terminal and hence of Net-ASCII.  In
   today's usage, and for the present specification, the following
   clarifications and updates to that list should be noted:

   1.  The "defined but not required" codes -- BEL, BS, HT, VT, FF --
       and the undefined control codes ("C0") SHOULD NOT be used unless
       required by exceptional circumstances.

   2.  CR MUST NOT appear except when immediately followed by either NUL
       or LF, with the latter (CR LF) designating the "new line"
       function.  Because page layout is better done in other ways and
       to avoid other types of confusion, CR NUL SHOULD preferably be
       avoided.

   3.  LF CR SHOULD NOT appear except as a side-effect of multiple CR LF
       sequences (e.g., CR LF CR LF).
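
   One possible way to check a stream of octets against the three
   rules above is sketched below in Python.  The function name
   check_nvt_controls, the tuple layout, and the severity labels are
   assumptions made for illustration; in particular, "CR NUL" is
   reported as "SHOULD NOT" as a paraphrase of rule 2.  Scanning
   octets rather than characters is safe because octet values below
   X'20' never occur inside a multi-octet UTF-8 sequence.

```python
# Octet values of the control codes that the rules treat specially.
NUL, LF, CR = 0x00, 0x0A, 0x0D


def check_nvt_controls(data: bytes) -> list:
    """Return (offset, severity, description) for each rule violation."""
    problems = []
    for i, octet in enumerate(data):
        prev = data[i - 1] if i > 0 else None
        nxt = data[i + 1] if i + 1 < len(data) else None
        if octet == CR:
            # Rule 2: CR only before LF or NUL; CR NUL is best avoided.
            if nxt == NUL:
                problems.append((i, "SHOULD NOT", "CR NUL"))
            elif nxt != LF:
                problems.append((i, "MUST NOT", "bare CR"))
        elif octet == LF:
            # Rule 3: LF CR is tolerated only inside CR LF CR LF.
            if nxt == CR and prev != CR:
                problems.append((i, "SHOULD NOT", "LF CR"))
        elif octet < 0x20 and octet != NUL:
            # Rule 1: BEL, BS, HT, VT, FF and the undefined C0 codes.
            problems.append((i, "SHOULD NOT", "control 0x%02X" % octet))
    return problems
```

   An empty result means that none of the three rules was triggered;
   it does not show that the data is otherwise valid Net-Unicode.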

   [[anchor8: Note in Draft: As mentioned above, it is not clear that
   these are the right restrictions today.  In particular, despite the
   general belief that it is better to specify formatting by markup
   rather than character codes, "FF" is fairly widely used and accepted
   by most printers (although not all of them and certainly not on-
   screen systems) as indicating a page eject.  Similarly, HT is still
   widely used despite ambiguities about the length or column position
   to which a tab applies and no standardized way to specify them.  At
   the other extreme, the discussion above does not mention the so-
   called "C1 Controls" at U+0080 through U+009F.  In addition, while
   the telnet IAC character itself is not a problem for UTF-8, telnet
   permits other command-introducer characters whose bit sequences in
   an octet may be part of valid UTF-8 characters.  Suggestions as to
   how to address the above issues are solicited.]]



3.  Normalization

   There are cases where strings of Unicode are fundamentally
   equivalent, essentially representing the same text.  These are called
   "canonical equivalents" in the Unicode Standard.  For example, the
   following pairs of strings are canonically equivalent:

   U+2126 OHM SIGN
   U+03A9 GREEK CAPITAL LETTER OMEGA

   U+0061 LATIN SMALL LETTER A, U+0300 COMBINING GRAVE ACCENT
   U+00E0 LATIN SMALL LETTER A WITH GRAVE

   Comparison of strings becomes much easier if any such cases are
   always represented by a single unique form.  The Unicode Consortium
   specifies a normalization method, known as NFC [NFC], which provides
   the necessary mappings and mechanisms to convert all canonically
   equivalent sequences to a single unique form.  Typically, this form
   produces precomposed characters for any sequences that can be
   represented in that fashion.  It also reorders other combining marks
   so that they have a unique and unambiguous order.
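
   The effect of NFC on the example pairs above can be observed with
   the normalize function of Python's standard unicodedata module.
   The helper name nfc_equal is an assumption made for illustration,
   and the results depend on the Unicode data shipped with the Python
   build in use.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after converting both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # OHM SIGN versus GREEK CAPITAL LETTER OMEGA.
    print(nfc_equal("\u2126", "\u03a9"))
    # "a" plus COMBINING GRAVE ACCENT versus the precomposed letter.
    print(nfc_equal("a\u0300", "\u00e0"))
    # Combining marks given in different orders are reordered by NFC.
    print(nfc_equal("a\u0301\u0323", "a\u0323\u0301"))
```

   A direct comparison of the unnormalized strings would report each
   of these pairs as different.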

   Systems conforming to this specification MUST NOT transmit any string
   containing any code point that is unassigned in the version of
   Unicode and NFC on which they are dependent.

   Section 2.1 requires that all Net-Unicode strings be transmitted in
   normalized form.  Some application implementations, however, rely
   on operating system libraries over which they have little control.
   That fact, together with the robustness principle, suggests that
   receivers should be prepared to receive unnormalized strings and
   should not react to them in excessive
   ways.
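
   A receiver that follows this advice might, as one possibility,
   normalize what it receives instead of rejecting it.  The Python
   sketch below assumes a hypothetical function name,
   receive_net_unicode, and returns a flag so that the caller can
   log, rather than fail on, unnormalized input.

```python
import unicodedata


def receive_net_unicode(octets: bytes):
    """Decode received text; return (NFC text, arrived_normalized)."""
    # Invalid UTF-8 still raises UnicodeDecodeError: tolerance here
    # covers normalization only, not malformed encodings.
    text = octets.decode("utf-8")
    normalized = unicodedata.normalize("NFC", text)
    # The flag is False when the sender did not normalize.
    return normalized, normalized == text
```

   Whether to log, count, or silently accept unnormalized input is a
   local policy decision that this sketch leaves to the caller.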




4.  Versions of Unicode

   In retrospect, one of the advantages of ASCII [X3.4-1978] when it was
   chosen was that the code space was full when the Standard was first
   published.  There was no practical way to add characters or change
   code point assignments without being obviously incompatible.  Unicode
   does not have that property: there are large blocks of space reserved
   for future expansion and new versions, with new characters and code
   point assignments, appear at regular intervals.

   While there are some security issues if people deliberately try to
   trick the system (see Section 6), Unicode version changes should not
   have a significant impact on the text stream specification of this
   document for the following reasons:

   o  The transformation between Unicode code table positions and the
      corresponding UTF-8 code is algorithmic; it does not depend on
      whether a code point has been assigned or not.

   o  The normalization specified here, NFC (see Section 3), performs a
      very limited set of mappings, much more limited than those of the
      more extensive NFKC used in, e.g., nameprep [RFC3491].
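
   To illustrate the first point, the Python sketch below encodes a
   code point using only the bit layout given in RFC 3629.  It never
   consults a table of assigned characters, so its output is the same
   whether or not a code point has been assigned.  The function name
   utf8_encode is an assumption made for illustration.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the RFC 3629 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:          # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

   For instance, utf8_encode(0x0378) yields the octets CD B8 by the
   same arithmetic as for any other two-octet code point; the
   assignment status of U+0378 plays no role in the computation.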

   The NFC tables may be updated over time as new characters are added,
   but the Unicode Consortium has guaranteed the stability of all NFC
   strings.  That is, if a string does not contain any unassigned
   characters, and it is normalized according to NFC, it will always be
   normalized according to all future versions of the Unicode Standard.
   The stability of the Net-Unicode format is thus guaranteed when any
   implementation that converts text into Net-Unicode format does not
   permit unassigned characters.

   Were Unicode to be changed in a way that violated these assumptions,
   i.e., that either invalidated the string order of RFC 3629 or
   changed the stability of NFC as stated above, this specification
   would not apply.  Put differently, this specification applies only to
   versions of Unicode starting with version 3.2 and extending to, but
   not including, any version for which changes are made in either the
   UTF-8 definition or NFC stability.

   Were such changes to be made, the IETF would be faced with either
   freezing on the last version of Unicode that predates them or
   replacing this specification with one that (i) was consistent with
   the new rules and (ii) specified a way to distinguish between
   strings that were created entirely according to the old rules and
   those that conform to newer ones.  Where this specification is
   referenced in a specification or implementation, otherwise
   unidentified UTF-8 strings are to be treated as conforming to it.



5.  Applicability and Stability of this Specification

5.1.  Use in IETF Applications Specifications

   During the development of this specification, there was some
   confusion about where it would be useful given that, e.g., MIME and
   HTTP have their own rules about UTF-8 character types.  There are
   three answers.  The first is that, in retrospect, it would have been
   better to have those protocols and content types standardized in the
   way specified here, even though it is certainly too late to change
   them at this time.  The second is that we have several protocols that
   are dependent on either Telnet or other arrangements requiring a
   standard, interoperable, string definition without specific content-
   labels of one sort or another.  Whois [RFC3912] is an example member
   of this group.  As consideration is given to upgrading them for non-
   ASCII use, this specification provides a possible normative reference
   that provides the same stability that NVT has provided the ASCII
   forms.  In particular, if this proposal is approved, or even appears
   to be getting significant traction, it may be followed by a Telnet
   option to specify this type of stream and, more likely, an FTP
   extension to permit a new "Unicode text" data TYPE.  Finally, and
   most important, having a preferred standard Internet definition for
   Unicode text streams -- rather than just one for transmission codings
   -- may help improve the specification and interoperability of
   protocols to be developed in the future.

5.2.  The Unicode Applicability Dilemma

   The IETF faces a practical dilemma with regard to versions of
   Unicode.  Each new version brings with it new characters and
   sometimes new combining characters.  Version 5.0 introduces the new
   concept of sequences of characters named as if they were individual
   characters (see [NamedSequences]).  The normalization represented by
   NFC is stable if all strings are transmitted and stored in normalized
   form (a requirement of this specification but of neither the IETF's
   UTF-8 Standard [RFC3629] nor internationalized domain names (IDNA
   [RFC3490])), if corrections are never made to character definitions
   or normalization tables, and if unassigned code points are never used.
   The latter is important because an unassigned code point always
   normalizes to itself.  However, if the same code point is assigned to
   a character in a future version, it may participate in some other
   normalization mapping.

   All would be well with this as described in Section 4 except for one
   problem: Applications typically do not perform their own conversions
   to Unicode and may not perform their own normalizations but instead
   rely on operating system or language library functions -- functions
   that may be upgraded or otherwise changed without changes to the
   application code itself.  Consequently, there may be no plausible way
   for an application to know which version of Unicode, or which version
   of the normalization procedures, it is utilizing, nor is there any
   way by which it can guarantee that the two will be consistent.

   Because of per-version changes in definitions and tables, IDNA is now
   tied to Unicode Version 3.2 [Unicode32] and IETF Standard UTF-8 is
   dependent on some definitions not changing after Unicode Version 4.0.
   The latter assumption seems fairly safe, but it is still an
   assumption.  This specification can reasonably be tied to Version 4.1
   [Unicode410] or even 5.0 [Unicode] but, in addition to the obvious
   disadvantages of having three IETF standards tied to three different
   versions of Unicode, the application implementation behavior
   described above makes these version linkages nearly meaningless in
   practice.

   In theory, one can get around this problem in four ways:

   1.  Freeze on a particular version of Unicode and try to insist that
       applications enforce that version by, e.g., containing lists of
       unassigned characters and prohibiting their use.  Of course, this
       would prohibit evolution to include newly-added scripts and the
       tables of unassigned code points would be cumbersome.

   2.  Require that every Unicode "text" string or file start with a
       version indication, somewhat akin to the "byte order mark"
       indicator.  It is unlikely that this provision would be
       practical.  More important, it would require that each
       application implementation be prepared either to support
       multiple normalization tables and versions or to reject text
       from Unicode versions with which it was not prepared to deal.

   3.  Devise a different set of normalization rules that would, e.g.,
       guarantee that no character assigned to a previously-unassigned
       code point in Unicode was ever normalized to anything but itself
       and use those rules instead of NFC.  It is not clear whether or
       not such a set of rules is possible or whether some other
       completely stable set of rules could be devised, perhaps in
       combination with restrictions on the ways in which characters
       were added in future versions of Unicode.

   4.  Devise a normalization process that is otherwise equivalent to
       NFC but that rejects code points that are unassigned in the
       current version of Unicode, rather than mapping those code points
       to themselves.  This would still leave some risk of incompatible
       corrections in Unicode and possibly a few edge cases, but it is
       probably stable enough for Internet use in the overwhelming
       number of cases.  This process has been discussed in the Unicode
       Consortium under the name "Stable NFC".

   None of these approaches seems ideal: the ideal procedure would be as
   stable and predictable as ASCII has been.  But that level is simply
   not feasible as long as Unicode continues to evolve by the addition
   of new code points and scripts.  The fourth option listed above
   appears to be a reasonable compromise.
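
   The general idea behind the fourth option can be approximated in
   Python as shown below.  This is only a sketch under stated
   assumptions, not the Unicode Consortium's definition of "Stable
   NFC": the function name stable_nfc is hypothetical, and the test
   for unassigned code points uses general category "Cn", which also
   covers noncharacters and reflects whatever Unicode version the
   running Python's unicodedata module was built with (reported by
   unicodedata.unidata_version).

```python
import unicodedata


def stable_nfc(text: str) -> str:
    """Apply NFC, but refuse text containing unassigned code points."""
    for ch in text:
        # "Cn" means "not assigned" in the local Unicode data tables.
        if unicodedata.category(ch) == "Cn":
            raise ValueError("unassigned code point U+%04X" % ord(ch))
    return unicodedata.normalize("NFC", text)
```

   Because the check depends on the local tables, two systems with
   different library versions may disagree about which strings are
   rejected, which is the applicability dilemma described in this
   section.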


6.  Security Considerations

   This specification provides a standard form for the use of Unicode as
   "network text".  The same security issues that apply to UTF-8, as
   discussed in [RFC3629], could be argued to apply to it, although it
   should be slightly less subject to some risks by virtue of requiring
   NFC normalization and generally being somewhat more restrictive.
   However, shifts in Unicode versions, as discussed in Section 5.2, may
   introduce other security issues.

   While not specifically a security issue, NVT requires, and hence
   this specification requires, that the CR character never appear
   alone: it must be followed either by LF (forming "newline") or by
   ASCII NUL (an octet with all bits zero).  The NUL may be
   problematic for some programming languages, and hence a trap for
   the unwary, unless caution is used.  This may be an additional
   reason to avoid the use of CR entirely, except in sequence with LF,
   as suggested above.

   The discussion about Unicode versions above (see Section 4 and
   Section 5.2) makes several assumptions about future versions of
   Unicode, about NFC normalization being applied properly, and about
   UTF-8 being processed and transmitted exactly as specified in RFC
   3629.  If any of those assumptions are not correct, then there are
   cases in which strings that would be considered equivalent do not
   compare equal.  Robust code should be prepared for those
   possibilities.


7.  Acknowledgments

   Many thanks to Mark Davis, Martin Duerst, and Michel Suignard for
   suggestions about Unicode normalization that led to the format
   described here and especially to Mark for providing the paragraphs
   that describe the role of NFC.  Thanks also to Mark, Doug Ewell, and
   Asmus Freytag for corrected text describing Unicode transmission
   forms and to Stephane Bortzmeyer, Frank Ellermann, Ted Hardie, and
   Bjoern Hoehrmann for a number of helpful comments and clarification
   requests.




8.  Change log

   [[anchor12: RFC Editor: Please remove this section before
   publication.]]

8.1.  Changes from -00 to -01

   o  Replaced the section on Normalization with text provided by Mark
      Davis

   o  Several small editorial changes and corrections.

8.2.  Changes from -01 to -02

   o  Added material explaining the relationship to Net-ASCII and the
      NVT.

   o  Brought the material on transmission forms into line with current
      practice and terminology.

   o  Made terminology more consistent.

   o  Inserted normalization text provided by Mark Davis.

   o  Rewrote and reorganized Unicode versioning material.

   o  Clarified relationships to existing protocols, stressing that this
      is not, in itself, a proposal to change any of them.

8.3.  Changes from -02 to -03

   o  Clarification of several relationships and updating to reflect
      mailing list comments and other work.

   o  Inserted a discussion and pair of placeholders about prohibited
      NVT characters.

   o  Several corrections of typographic and editorial errors and
      additions of relevant references.


9.  References

9.1.  Normative References

   [ISO10646]
              International Organization for Standardization,
              "Information Technology - Universal Multiple-Octet Coded
              Character Set (UCS) - Part 1: Architecture and Basic
              Multilingual Plane", ISO/IEC 10646-1:2000, October 2000.

   [NFC]      Davis, M. and M. Duerst, "Unicode Standard Annex #15:
              Unicode Normalization Forms", March 2005,
              <http://www.unicode.org/reports/tr15/>.

   [RFC0137]  O'Sullivan, T., "Telnet Protocol - a proposed document",
              RFC 137, April 1971.

   [RFC0139]  O'Sullivan, T., "Discussion of Telnet Protocol", RFC 139,
              May 1971.

   [RFC0318]  Postel, J., "Telnet Protocols", RFC 318, April 1972.

   [RFC0854]  Postel, J. and J. Reynolds, "Telnet Protocol
              Specification", STD 8, RFC 854, May 1983.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [Unicode]  The Unicode Consortium, "The Unicode Standard, Version
              5.0", 2007.

              Boston, MA, USA: Addison-Wesley.  ISBN 0-321-48091-0

   [Unicode32]
              The Unicode Consortium, "The Unicode Standard, Version
              3.0", 2000.

              (Reading, MA, Addison-Wesley, 2000.  ISBN 0-201-61633-5).
              Version 3.2 consists of the definition in that book as
              amended by the Unicode Standard Annex #27: Unicode 3.1
              (http://www.unicode.org/reports/tr27/) and by the Unicode
              Standard Annex #28: Unicode 3.2
              (http://www.unicode.org/reports/tr28/).

   [Unicode410]
              The Unicode Consortium, "The Unicode Standard, Version
              4.1.0", March 2005.

              Defined by: The Unicode Standard, Version 4.0 (Boston, MA,
              Addison-Wesley, 2003.  ISBN 0-321-18578-1), as amended by
              Unicode 4.0.1
              (http://www.unicode.org/versions/Unicode4.0.1) and by
              Unicode 4.1.0
              (http://www.unicode.org/versions/Unicode4.1.0).

9.2.  Informative References

   [ISO.646.1991]
              International Organization for Standardization,
              "Information technology - ISO 7-bit coded character set
              for information interchange", ISO Standard 646, 1991.

   [ISO.8859.2003]
              International Organization for Standardization,
              "Information processing - 8-bit single-byte coded graphic
              character sets - Part 1: Latin alphabet No. 1 (1998) -
              Part 2: Latin alphabet No. 2 (1999) - Part 3: Latin
              alphabet No. 3 (1999) - Part 4: Latin alphabet No. 4
              (1998) - Part 5: Latin/Cyrillic alphabet (1999) - Part 6:
              Latin/Arabic alphabet (1999) - Part 7: Latin/Greek
              alphabet (2003) - Part 8: Latin/Hebrew alphabet (1999) -
              Part 9: Latin alphabet No. 5 (1999) - Part 10: Latin
              alphabet No. 6 (1998) - Part 11: Latin/Thai alphabet
              (2001) - Part 13: Latin alphabet No. 7 (1998) - Part 14:
              Latin alphabet No. 8 (Celtic) (1998) - Part 15: Latin
              alphabet No. 9 (1999) - Part 16: Latin alphabet No. 10
              (2001)", ISO Standard 8859, 2003.

   [NamedSequences]
              The Unicode Consortium, "NamedSequences-4.1.0.txt", 2005,
              <http://www.unicode.org/Public/UNIDATA/
              NamedSequences.txt>.

   [RFC0020]  Cerf, V., "ASCII format for network interchange", RFC 20,
              October 1969.

   [RFC0097]  Melvin, J. and R. Watson, "First Cut at a Proposed Telnet
              Protocol", RFC 97, February 1971.

   [RFC0542]  Neigus, N., "File Transfer Protocol", RFC 542,
              August 1973.

   [RFC0698]  Mock, T., "Telnet extended ASCII option", RFC 698,
              July 1975.

   [RFC0742]  Harrenstien, K., "NAME/FINGER Protocol", RFC 742,
              December 1977.

   [RFC0954]  Harrenstien, K., Stahl, M., and E. Feinler, "NICNAME/
              WHOIS", RFC 954, October 1985.

   [RFC2068]  Fielding, R., Gettys, J., Mogul, J., Nielsen, H., and T.
              Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1",
              RFC 2068, January 1997.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC2781]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO
              10646", RFC 2781, February 2000.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
              April 2001.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [RFC3912]  Daigle, L., "WHOIS Protocol Specification", RFC 3912,
              September 2004.

   [X3.4-1978]
              American National Standards Institute (formerly United
              States of America Standards Institute), "USA Code for
              Information Interchange", ANSI X3.4-1968, 1968.

              ANSI X3.4-1968 has been replaced by newer versions with
              slight modifications, but the 1968 version remains
              definitive for the Internet.


Authors' Addresses

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA  02140
   USA

   Phone: +1 617 491 5735
   Email: john-ietf@jck.com

   Michael A. Padlipsky
   8011 Stewart Ave.
   Los Angeles, CA  90045
   USA

   Phone: +1 310-670-4288
   Email: the.map@alum.mit.edu


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).