X.400 Use of Extended Character Sets
RFC 1502

Document Type RFC - Historic (August 1993; No errata)
Last updated 2013-03-02
Stream IETF
Network Working Group                                      H. Alvestrand
Request for Comments: 1502                                  SINTEF DELAB
                                                             August 1993

                  X.400 Use of Extended Character Sets

Status of this Memo

   This RFC specifies an IAB standards track protocol for the Internet
   community, and requests discussion and suggestions for improvements.
   Please refer to the current edition of the "IAB Official Protocol
   Standards" for the standardization state and status of this protocol.
   Distribution of this memo is unlimited.

1.  Introduction

   Since 1988, X.400 has had the capacity for carrying a large number of
   different character sets in a message by using the body part
   "GeneralText" defined by ISO/IEC 10021-7.

   Since 1992, the Internet also has the means of passing around
   messages containing multiple character sets, by using the mechanism
   defined in RFC-MIME.

   This RFC defines a suggested method of using "GeneralText" in order
   to harmonize as much as possible the usage of this body part.

2.  General principles

2.1.  Goals

   The target of this memo is to define a way of using existing
   standards to achieve:

    (1)  in the short term, a standard for sending E-mail in the
         European languages (Latin letters with European accents,
         Greek and Cyrillic)

    (2)  in the medium term, extending this to cover the Hebrew and
         Arabic character sets

    (3)  in the long term, opening up true international E-mail by
         allowing the full character set specified in ISO-10646 to be
         used.

Alvestrand                                                      [Page 1]
RFC 1502          X.400 Use of Extended Character Sets       August 1993

   The author believes that this document gives a specification that can
   easily accommodate the use of any character set in the ISO registry,
   and, by giving guidance rules for choosing character sets, will help
   interworking.

2.2.  Families of character sets

2.2.1.  ISO 6937/T.61

   ISO 6937 is a coding technique used and recommended in T.51 and T.101
   (Teletex and Videotex service) and in X.500, providing a repertoire
   of 333 characters from the Latin script by use of non-spacing
   diacritical marks. It corresponds closely to CCITT recommendation
   T.61.

   The problem with that technique is that the character stream comes in
   two modes, i.e., some characters are coded with one byte and some
   with two (composite characters). This makes information processing
   systems such as an E-mail UA or GW more complex.
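
   The two-mode stream described above can be illustrated with a
   minimal Python sketch. It is not a complete ISO 6937 decoder: the
   function name decode_iso6937_subset and the table NON_SPACING are
   hypothetical, the table lists only a handful of diacritics, and only
   ASCII base letters are accepted. It shows why a reader must look at
   each byte to decide whether one or two bytes make up a character.

```python
import unicodedata

# Partial, illustrative table (an assumption of this sketch): ISO 6937
# non-spacing diacritic byte -> Unicode combining character.
NON_SPACING = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex accent
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
    0xCB: "\u0327",  # cedilla
}


def decode_iso6937_subset(data: bytes) -> str:
    """Decode ASCII bytes plus the composites listed in NON_SPACING."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in NON_SPACING:
            # Two-byte composite: the diacritic byte comes first and the
            # base letter second.
            if i + 1 >= len(data):
                raise ValueError("diacritic without a base letter")
            base = data[i + 1]
            if base >= 0x80:
                raise ValueError("base letter outside this sketch's range")
            # Unicode puts the combining mark after the base letter.
            out.append(chr(base) + NON_SPACING[byte])
            i += 2
        elif byte < 0x80:
            # One-byte character.
            out.append(chr(byte))
            i += 1
        else:
            raise ValueError(f"byte 0x{byte:02X} not covered by this sketch")
    # Compose base letter and mark into one code point where possible.
    return unicodedata.normalize("NFC", "".join(out))
```

   For example, decode_iso6937_subset(b"caf\xc2e") consumes five bytes
   but yields a four-character string, so byte offsets and character
   offsets no longer coincide.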

   It is also not extensible to other languages like Korean or Chinese,
   or even Greek, without invoking the character set switching
   techniques of ISO 2022.

2.2.2.  ISO 8859

   ISO 8859 defines a set of character sets, each suitable for use in
   some group of languages. Each character in ISO 8859 is coded in a
   single byte.

   There are currently 11 parts of ISO 8859, plus a "supplementary" set,
   registered as ISO IR 154. Most languages using single-byte characters
   can be written in one or another of the ISO 8859 sets.  There are
   sets covering Greek, Hebrew and Arabic, but there is still
   controversy over the problem of the rendering direction for Hebrew
   and Arabic.

   All the ISO 8859 sets include US-ASCII as a subset. All use 8 bits.
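
   The shared US-ASCII subset can be checked with a short Python
   sketch. It relies on the ISO 8859 codecs shipped with Python's
   standard library; the list CODECS and the helper names are
   assumptions made for this illustration, and only four of the parts
   are sampled.

```python
# Sample of ISO 8859 parts: Latin-1, Latin-2, Cyrillic and Greek.
CODECS = ["iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7"]

# Every byte value with the high bit clear.
ASCII_BYTES = bytes(range(0x80))
ASCII_TEXT = ASCII_BYTES.decode("ascii")


def decodes_ascii_identically(codec: str) -> bool:
    """True if the codec maps bytes 0x00-0x7F exactly as US-ASCII does."""
    return ASCII_BYTES.decode(codec) == ASCII_TEXT


def decode_high_byte(value: int) -> dict:
    """Decode one byte with the high bit set under each sampled codec."""
    return {codec: bytes([value]).decode(codec) for codec in CODECS}


if __name__ == "__main__":
    for codec in CODECS:
        print(codec, decodes_ascii_identically(codec))
    print(decode_high_byte(0xE6))
```

   The lower half decodes the same way under every part, while a byte
   such as 0xE6 yields a different letter depending on which part is
   assumed. That is why a receiver has to be told which ISO 8859 set a
   text uses.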

   ISO 8859 is regarded by many as a solution; for instance, the X
   Window System now comes with ISO-8859-1 as the "standard" character
   set, with the possibility of specifying others. But since the same
   applications often do not support character set switching within
   text, it is problematic to use these in a truly multilingual
   environment.  (Also, most fonts claiming to be "ISO-8859-1" in X11R5
   are actually 7-bit fonts. The implied lie is very unfortunate.)

   It turns out to work fine, however, if the second language is
   English, since this can be written in all ISO 8859 sets.

   Parts 3 and 4 of ISO 8859 have not seen wide acceptance, and it is
   expected that they will be discarded. They should therefore not be
   used.

   Note that an ISO 8859 set is actually 2 sets in the ISO sense:
   US-ASCII in the G0 set and another character set in the G1 set.  The
   overloading of the word "character set" is unfortunate, but
   traditional.
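
   The split into two sets can be sketched in Python by sorting byte
   values into the areas of the 8-bit code table. The sketch assumes
   the usual ISO 2022 arrangement for ISO 8859 text, with G0 invoked
   into the left half (GL) and G1 into the right half (GR); the
   function names code_area and split_by_area are hypothetical.

```python
def code_area(byte: int) -> str:
    """Name the area of the 8-bit code table that a byte value falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit value")
    if byte < 0x20:
        return "C0"  # control characters, left half
    if byte < 0x80:
        return "GL"  # graphic characters of the G0 set (US-ASCII here)
    if byte < 0xA0:
        return "C1"  # control characters, right half
    return "GR"      # graphic characters of the G1 set


def split_by_area(data: bytes) -> dict:
    """Group the bytes of a message by code-table area."""
    areas = {"C0": [], "GL": [], "C1": [], "GR": []}
    for byte in data:
        areas[code_area(byte)].append(byte)
    return areas
```

   Applied to text encoded in any ISO 8859 part, the GL bytes are the
   US-ASCII characters and the GR bytes are the characters specific to
   that part.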

2.2.3.  ISO 10646

   At the time of writing, ISO 10646 has just been accepted as an
   International Standard. It is basically a 32-bit character set, with
   all of the currently used characters being numbered by the first 16
   bits, leaving some room for expansion.
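
   The 32-bit layout can be illustrated with a short Python sketch. It
   uses Python's "utf-32-be" codec as a stand-in for the four-octet
   form of ISO 10646, which is an assumption of this example, and the
   helper names are hypothetical. The 16-bit claim reflects the state
   of the standard when this memo was written.

```python
def four_octet_form(ch: str) -> bytes:
    """Return the big-endian 32-bit code value of a single character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return ch.encode("utf-32-be")


def fits_in_16_bits(ch: str) -> bool:
    """True if the two high-order octets of the code value are zero."""
    return four_octet_form(ch)[:2] == b"\x00\x00"


if __name__ == "__main__":
    # A Latin, a Greek and a Cyrillic letter.
    for ch in ("A", "\u03a9", "\u0416"):
        print(four_octet_form(ch).hex(), fits_in_16_bits(ch))
```

   For such characters the two high-order octets are always zero, so a
   16-bit form carries the same information in half the space.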

   It is not possible to use ISO 10646 as a normal character set,
   because it does not conform to the rules for usage of byte values set
   down in ISO 2022 and other places; it uses the "control space" for