draft-yamamoto-charset-iso-2022-jp-00

Internet-Draft                                                   E. Wada
                                                    Fujitsu Laboratories
Expires in six months                                           J. Murai
                                                         Keio University
                                                             K. Yamamoto
                                                                   NAIST
                                                          November, 1997


          Japanese Character Encoding for Internet Messages

              <draft-yamamoto-charset-iso-2022-jp-00.txt>

Status of this Memo

    This document is an Internet-Draft.  Internet-Drafts are working
    documents of the Internet Engineering Task Force (IETF), its areas,
    and its working groups.  Note that other groups may also distribute
    working documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as ``work in
    progress.''

    To learn the current status of any Internet-Draft, please check the
    ``1id-abstracts.txt'' listing contained in the Internet-Drafts
    Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net
    (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
    Rim).

Abstract

    This memo describes the encoding used in electronic mail, NetNews,
    and world wide web messages in Japanese networks.  It was first
    specified by and used in JUNET then described in RFC 1468.  The name
    of this encoding was originally known as ''JUNET code'' and is now
    called ''ISO-2022-JP'' when used in the context of MIME.

    In ISO-2022-JP text, both one 7-bit byte Latin script (ASCII or JIS
    X 0201 Latin set) and two 7-bit bytes Kanji, Hiragana, Katakana and
    some other symbols and characters (JIS X 0208) are employed.
    Switching the graphic character sets is based on the extension
    techniques defined in ISO 2022.

    This memo eliminates some ambiguities of RFC 1468.  However, it
    NEVER introduces any essential changes against RFC 1468.  Since the
    encoding is now widely used in Japanese IP communities, backward
    compatibility is most important in this revision.

    This memo revises RFC 1468 on the following points:

    1. It is clarified that ISO-2022-JP does NOT conform to ISO 2022.
    2. The formal syntax is divided into two new yet compatible rules,

Yamamoto                                                        [Page 1]


Internet-Draft             ISO-2022-JP CHARSET             November 1997

        namely, ISO-2022-JP decoding syntax and ISO-2022-JP encoding
        syntax.
    3. The bit combinations permitted in JIS X 0208 are explicitly
        described in the syntax so that the invalid character positions
        will be excluded in ISO-2022-JP text.
    4. Recommended graphic character sets are specified.  That is, ASCII
        is RECOMMENDED rather than JIS X 0201 1976 Latin set. ONLY to
        embed YEN SIGN and OVER LINE, JIS 0201 Latin set MAY be used.
        Also, JIS X 0208 1983 (including 1990) is RECOMMENDED rather
        than JIS X 0208 1978.

1. Introduction

    Efforts to exchange Japanese text via electronic mail[MAIL] and
    NetNews[NETNEWS] messages started in the early days of JUNET[JUNET],
    consisted of networks arbitrarily connected by UUCP in Japan.  There
    was strong demand for exchanging ASCII[ASCII], JIS X 0201 Latin
    set[JISX0201-76], and JIS X 0208 graphic character sets.

    JIS X 0201 Latin set is identical to ASCII except for REVERSE
    SOLIDUS (BACKSLASH) and TILDE.  REVERSE SOLIDUS and TILDE are
    replaced by YEN SIGN and OVER LINE, respectively.  This set is
    Japan's national variant of ISO 646[ISO646].  The JIS X 0208 graphic
    character sets consist of Kanji, Hiragana, Katakana and some other
    symbols and characters.  Each character takes up two bytes.

    At that time, some implementations of message transfer agents (MTA)
    had 7-bit restriction in their transport connection.  Moreover,
    European countries used 8-bit text to convey their languages.  It
    was thus crucial to design a 7-bit safe format to carry Japanese
    text both for robustness against such MTAs and for distinction from
    European languages.

    Popular encodings for Japanese graphic character sets were EUC-JP
    and Shift_JIS.  However, they were not chosen for a transfer form of
    Japanese text because they are 8-bit.  To encode ASCII, JIS X 0201,
    JIS X 0208 1978[JISX0208-78] and 1983[JISX0208-83], JUNET code was
    designed.  It switches these four graphic character sets according
    to the extension techniques defined in ISO 2022[ISO2022], which is
    described later in this memo.

    During JUNET days, some systems erroneously used the escape sequence
    "ESC ( H" for JIS X 0201 1976 due to a typological error in the JIS
    book.  This escape sequence is officially registered for a Swedish
    graphic character set[ISOREG] and MUST NOT have been used in JUNET
    code.  JUNET had difficulty in eliminating this misusage.

    In 1993, the encoding was described in RFC 1468[OLDSPEC] and the
    name "ISO-2022-JP" was given to identify it in the context of
    MIME[MIME].

    The word "encoding" is ambiguous because it is used for both the
    extension techniques and for conversion of MIME's transport safe.
    For this reason, "MIME-encoding" is used to identify content

Yamamoto                                                        [Page 2]


Internet-Draft             ISO-2022-JP CHARSET             November 1997

    transfer encoding of MIME and "charset"[CHARSET] is introduced to
    describe ISO-2022-JP.  "ISO-2022-JP encoding" means creation of
    ISO-2022-JP text by MIME composers while "ISO-2022-JP decoding"
    refers to interpretation of ISO-2022-JP text by MIME viewers.

    At the time of writing RFC 1468, one more revision for JIS X 0208,
    namely 1990[JISX0208-90], was available.  RFC 1468, however, does
    NOT suggest using the revision identification sequence "ESC & @"
    even if two characters added in JIS X 0208 1990 are used.

    This choice was practical from the empirical point of view of JUNET
    code but it was a departure from the ISO 2022 platform.  Strictly
    speaking, ISO-2022-JP does NOT conform to ISO 2022 though the name
    is associated with ISO 2022.

    The purposes of this memo are mainly to clarify ISO-2022-JP encoding
    procedure and to fix a bug of ABNF[ABNF] in RFC 1468, which failed
    to express null lines.  To avoid confusion, no new escape sequence
    for JIS X 0208 1990 nor for JIS X 0212 1990[JISX0212-90] is
    introduced.  Such efforts should be elaborated in other rich
    charsets.

    NOTE: One more revision for JIS X 0208, namely 1997, is available
    [JISX0208-97].  Since it is not a new graphic character set, most
    parts of this memo do NOT explicitly talk about it.  Likewise, JIS X
    0201 1997[JISX0201-97] does NOT appear in the main body of this
    memo.  For more information, see Historical Note.

2. Standard Keywords

    This memo uses terms which are in capital letters.  When the terms
    "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", "MAY", and "RECOMMENDED"
    appear capitalized, they are being used to indicate particular
    requirements, whose definitions are found in [KEYWORDS].

3. Description

    For historical reasons, ISO-2022-JP can contain four graphic
    character sets, ASCII, JIS X 0201 1976 Latin set, JIS X 0208 1978
    and JIS X 0208 1983 (including 1990).  A charset should start with
    and should end in ASCII since many message systems assume so.
    Moreover, each line should be self-composite so that lines can be
    safely split.  ISO-2022-JP was designed to satisfy these
    requirements.  Its definition is as follows:

    ISO-2022-JP text MUST start with ASCII and MUST end with ASCII.
    Each line of ISO-2022-JP text MUST end with ASCII or JIS X 0201 1976
    (i.e. before CRLF).  The next line starts in the graphic character
    set that was switched to before the end of the previous line.

    ISO-2022-JP is NOT completely self-composite because a starting
    graphic character set of a line is not known if the previous line is
    missing.  This is one of the reasons to recommend ASCII, instead of
    JIS X 0201 1976, in ISO-2022-JP.

Yamamoto                                                        [Page 3]


Internet-Draft             ISO-2022-JP CHARSET             November 1997


    An escape sequence is used to specify a start boundary of a graphic
    character set different form the preceding graphic character set.
    The following table gives the escape sequences and the graphic
    character sets used in ISO-2022-JP text.  The "ISOREG" is the
    registration number in ISO's registry[ISOREG].

       ESC Seq    Graphic Character Set          ISOREG

       ESC ( B    ASCII                             6
       ESC ( J    JIS X 0201 1976 (Latin set)      14
       ESC $ @    JIS X 0208 1978                  42
       ESC $ B    JIS X 0208 1983                  87

    For example, the escape sequence "ESC $ B" indicates that the bytes
    following this escape sequence are some characters of JIS X 0208
    1983.  To switch back to ASCII, the escape sequence "ESC ( B" is
    used.

    If Japanese text represented within the four graphic character sets
    is transferred by SMTP[SMTP] or NNTP[NNTP], it is STRONGLY
    RECOMMENDED to be in ISO-2022-JP form.

    Not all implementations distinguish JIS X 0208 1978 and 1983.
    Moreover, if ISO-2022-JP text including both ASCII and JIS X 0201
    1976 AND/OR ISO-2022-JP text including both JIS X 0208 1978 and JIS
    X 0208 1983 is converted to EUC-JP or Shift_JIS, original
    ISO-2022-JP text could not be re-produced.

    So, to implement best interoperability, message composers SHOULD use
    ASCII and JIS X 0208 1983 ONLY.  However, message composers MAY use
    JIS X 0201 1976 to embed YEN SIGN and OVER LINE.  Message viewers
    MUST accept all four graphic character sets.  Please note that the
    best strategy for interoperability is TO BE LIBERAL IN WHAT YOU
    RECEIVE AND BE CONSERVATIVE IN WHAT YOU SEND.

    JIS X 0201 Kana set MUST NOT be used in ISO-2022-JP text.  It may
    erroneously appear in ISO-2022-JP text with the following sequences:

        <8-bit Kana set char>
        SO <7-bit Kana set char> SI
        SO <8-bit Kana set char> SI
        ESC ( I <7-bit Kana set char> ESC ( B
        ESC ) I <8-bit Kana set char>

    JIS X 0212 1990 MUST NOT be embedded in ISO-2022-JP text.  It may
    erroneously appear following "ESC $ ( D" in ISO-2022-JP text.  To
    embed JIS X 0212 1990, other charsets such as ISO-2022-JP-2
    [ISO-2022-JP-2] SHOULD be used.

    IMPLEMENTATION NOTE: Even if a message composer does not provide any
    input method for graphic character sets which is not used in
    ISO-2022-JP, such as JIS X 0201 Kana set and JIS X 0212 1990, there
    are ways to include them by accident.  Typical cases are citing and

Yamamoto                                                        [Page 4]


Internet-Draft             ISO-2022-JP CHARSET             November 1997

    forwarding a message including JIS X 0201 Kana set and/or JIS X 0212
    1990 (but labeled as ISO-2022-JP).

    IMPLEMENTATION NOTE ON CITATION: If JIS X 0201 Kana set is cited, it
    MUST be removed anyway.  One possible deletion is to convert the
    characters to corresponding characters in JIS X 0208 1983.  Please
    note that as of this writing there doesn't exist a charset to
    contain JIS X 0201 Kana set.  When JIS X 0212 1990 is cited, MIME
    composers MUST take one of the followings: (1) Remove all characters
    in JIS X 0212 1990 and specify ISO-2022-JP for the charset
    parameter.  (2) Specify other richer charset such as ISO-2022-JP-2.

    IMPLEMENTATION NOTE ON FORWARDING: Since this memo talks about
    ISO-2022-JP text only, forwarding an entire message is outside the
    scope.  But MIME composers are REQUIRED to forward valid messages in
    the context of MIME.  Mismatch between the charset parameter and its
    content body MUST be eliminated.  The situation would be mixed up
    when the invalid message was digitally signed.  But again,
    procedures to create valid messages are outside the scope of this
    memo.

4. Formal Syntax

    Two sets of formal syntax rules are defined in this memo.  One is
    ISO-2022-JP decoding syntax for message viewers and the other is
    ISO-2022-JP encoding syntax for message composers.  ISO-2022-JP
    decoding syntax is designed to be redundant to maintain backward
    compatibility with existing message composers while ISO-2022-JP
    encoding syntax is designed to be necessary and sufficient.  Readers
    are assumed to be familiar with the notational conventions defined
    in [ABNF] and [MAIL2].

4.1 Rules for decoding syntax

    ISO-2022-JP decoding syntax that embeds ASCII, JIS X 0201 1976, JIS
    X 0208 1978, JIS X 0208 1983 (including 1990) is defined as follows
    (see also Figure 1):

    iso-2022-jp-text-d = *( line-d CRLF ) line-d
        ; When text is terminated by CRLF, the last 'line-d' is empty.

    line-d = *single-byte-char-d
             *(*double-byte-segment-d single-byte-segment-d)

    single-byte-segment-d = single-byte-designator-d *single-byte-char-d

    single-byte-designator-d = %d27 %d40 ( %d66 / %d74 )
        ; ESC "(" ( "B" / "J" )

    single-byte-char-d = %d0-9 / %d11-12 / %d14-26 / %d28-127
        ; All bit combinations of ASCII code excluding
        ; LF(%d10), CR(%d13), and ESC(%d27).

    double-byte-segment-d = double-byte-designator-d *double-byte-char-d

Yamamoto                                                        [Page 5]


Internet-Draft             ISO-2022-JP CHARSET             November 1997


    double-byte-designator-d = %d27 %d36 ( %d64 / %d66 )
        ; ESC "$" ( "@" / "B" )

    double-byte-char-d = %d33-126.33-126
        ; All bit combinations of 94 x 94 graphic character set

    NOTE: Though RFC 1468[OLDSPEC] allows bare CR and bare LF to be used
    as single byte char, this memo explicitly prohibits use of those
    characters as bare characters.  This conforms to RFC 2045[MIME]
    section 2.7.  If a message viewer finds unsafe characters in
    'single-byte-char-d' such as NUL(%d0), LF(%d10), CR(%13), SO(%d14),
    SI(%d15), SS2(%d25), ESC(%d27), SS3(%d29), the viewer SHOULD ignore
    them.  If the message viewer finds invalid characters in
    'double-byte-char-d', the viewer SHOULD also ignore them.  For more
    information, refer to 'double-byte-char-e'.

                          +---------------------->
                          |         +----------> |
                          |    +--+ |     +--+ | |
             +----------> | +->|1S|->  +->|1C|->->-> exit
             |     +--+ | | |  +--+    |  +--+ | |
    line-d -->  +->|1C|->-> |          <-------+ v
                |  +--+ |   |       +----------> |
                <-------+   |  +--+ |     +--+ | |
                            +->|2S|->  +->|2C|->->
                            |  +--+    |  +--+ | |
                            |          <-------+ |
                            <--------------------+

                   Figure 1  'line-d' for decoding syntax

    NOTE: 1S, 1C, 2S, and 2C stand for 'single-byte-designator-d',
    'single-byte-char-d', 'double-byte-designator-d', and
    'double-byte-char-d', respectively.

4.2 Rules for encoding syntax

    To create ISO-2022-JP text or string, only characters in ASCII and
    JIS X 0208 1983 (including 1990) SHOULD be used.  But if YEN SIGN
    and/or OVER LINE of JIS X 0201 1976 are to be used, the following
    rules SHOULD be followed:

        (1) It is RECOMMENTED to use YEN SIGN(%d33.111) and OVER
        LINE(%d33.49) of JIS X 0208 1983 instead of YEN SIGN(%d92) and
        OVER LINE(%d126) of JIS X 0201 1976, respectively.

        (2) If sequences of JIS X 0201 1976 Latin set can be produced,
        YEN SIGN and/or OVER LINE of the set MAY be used preceded by the
        designator.

        (3) Otherwise, YEN SIGN and OVER LINE of JIS X 0201 1976 SHOULD
        be converted to REVERSE SOLIDUS and TILDE of ASCII,
        respectively.

Yamamoto                                                        [Page 6]


Internet-Draft             ISO-2022-JP CHARSET             November 1997


    IMPLEMENTATION NOTE: Display of YEN SIGN and OVER LINE of JIS X 0201
    1976 is highly dependent on the receiving system.  REVERSE SOLIDUS
    and TILDE of ASCII may be displayed.  And again, if ISO-2022-JP text
    including both ASCII and JIS X 0201 1976 is converted to EUC-JP or
    Shift_JIS, original ISO-2022-JP text could not be re-produced.  This
    would cause verification failure of digital signature.

    The following is ISO-2022-JP encoding syntax to accomplish rule (1)
    or (3) above (see also Figure 2)

    iso-2022-jp-text-e = *( line-e CRLF ) line-e

    line-e = *single-byte-char-e [*i-segment-e  f-segment-e]

    i-segment-e = double-byte-designator-e 1*double-byte-char-e
                  single-byte-designator-e 1*single-byte-char-e

    f-segment-e = double-byte-designator-e 1*double-byte-char-e
                  single-byte-designator-e *single-byte-char-e

    single-byte-designator-e = %d27 %d40 %d66
        ; ESC "(" "B"

    single-byte-char-e = %d7-9 / %d11-12 / %d32-126
        ; BEL(%d7), BS(%d8), HT(%d9), VT(%d11), FF(%d12), SPC(%32), and
        ; all graphic characters.

    double-byte-designator-e = %d27 %d36 %d66
        ; ESC "$" "B"

    double-byte-char-e = %d33.33-126 /
        %d34.(33-46 / 58-65 / 74-80 / 92-106 / 114-121 / 126) /
        %d35.(48-57 / 65-90 / 97-122) /
        %d36.33-115 /
        %d37.33-118 /
        %d38.(33-56 / 65-88) /
        %d39.(33-65 / 81-113) /
        %d40.33-64 /
        %d48-78.33-126 /
        %d79.33-83 /
        %d80-115.33-126 /
        %d116.33-38
        ; The valid bit combinations of JIS X 0208 1990.











Yamamoto                                                        [Page 7]


Internet-Draft             ISO-2022-JP CHARSET             November 1997



                          +-------------------------------------->
                          |                         +----------> |
                          |                         |     +--+ | |
             +----------> |                         |  +->|1C|->->-> exit
             |     +--+ | |    +--+    +--+    +--+ |  |  +--+ |
    line-e -->  +->|1C|->->->->|2S|->->|2C|->->|1S|->  <-------+
                |  +--+ |   |  +--+ |  +--+ |  +--+ |     +--+
                <-------+   |       <-------+       +-->->|1C|->->
                            |                          |  +--+ | |
                            |                          <-------+ |
                            <------------------------------------+

                   Figure 2  'line-e' for encoding syntax

    NOTE: 1S, 1C, 2S, and 2C stand for 'single-byte-designator-e',
    'single-byte-char-e', 'double-byte-designator-e', and
    'double-byte-char-e', respectively.

5. MIME Considerations

    This section discusses how to use ISO-2022-JP text or string
    (i.e. excluding CRLF) in the context of MIME.  The scope is limited
    to the case where messages are transferred by SMTP[SMTP] or
    NNTP[NNTP].  Other protocols are outside the scope of this memo.

5.1 The Charset Parameter

    To identify the charset described in this memo, "ISO-2022-JP" MUST
    be used for the charset parameter wherever it is allowed.  Please
    note that the charset parameter is case-insensitive.

    For the charset parameter, MIME composers SHOULD choose as small
    charset as possible.  For example, if a content body is ASCII,
    US-ASCII SHOULD be specified instead of ISO-2022-JP.  This rule
    improves interoperability with other MIME viewers which do not
    support various charsets.

    To the contrary, MIME viewers SHOULD be able to handle text labeled
    ISO-2022-JP even if its content is ASCII only.  Remember to "be
    liberal in what you receive and be conservative in what you send".

5.2 Content Transfer Encoding

    ISO-2022-JP text is already in 7-bit form and MIME-encoding, such as
    Base64 and Quoted-Printable, makes ISO-2022-JP text unreadable for
    non-MIME viewers.  For this reason, MIME composers SHOULD choose
    "7bit" (case-insensitive) and SHOULD NOT MIME-encode ISO-2022-JP
    text with Base64 nor Quoted-Printable.

    Note that "7bit" means that no MIME-encoding is applied and its
    syntax is in 7-bit form.  MIME defines that "7bit" MIME-encoding is
    assumed if content transfer encoding is not present.  So, this field

Yamamoto                                                        [Page 8]


Internet-Draft             ISO-2022-JP CHARSET             November 1997

    can be omitted.

    In certain situation, Base64 or Quoted-Printable MAY be used.
    Base64 is RECOMMENDED to MIME-encode ISO-2022-JP text rather than
    Quoted-Printable.  This is because Base64 encoding is considered
    more suitable for ISO-2022-JP text than Quoted-Printable encoding
    for the following reasons: (1) Quoted-Printable encoding is designed
    for text which is mostly ASCII but ISO-2022-JP text is NOT.  (2)
    Quoted-Printable encoding produces various results while Base64
    encoding creates a unique output if folding is ignored.  This means
    that it is easier to implement interoperability for Base64 encoding
    than for Quoted-Printable encoding.

    IMPLEMENTATION NOTE: The keyword "=" of Quoted-Printable encoding is
    used in JIS X 0208 (see 'double-byte-char-e').  If only the ASCII
    portion of ISO-2022-JP text is MIME-encoded with Quoted-Printable,
    it is likely that such text cannot be MIME-decoded to the original
    since the "=" characters in JIS X 0208 portion are considered to be
    the keyword of Quoted-Printable.  If ISO-2022-JP text needs to be
    MIME-encoded with Quoted-Printable, the entire text must be
    MIME-encoded.  Some old implementations made this mistake.

    MIME viewers MUST be able to handle ISO-2022-JP text even if it is
    encoded with Base64 or Quoted-Printable.  Also, MIME viewers SHOULD
    be able to handle ISO-2022-JP text even if its MIME-encoding is
    specified as "8bit".

5.3 Header Extensions

    In a header or in MIME content headers where `encoded-word's are
    allowed, Japanese string (i.e. excluding CRLF) which consists of the
    four graphic character sets MUST be in the form defined in RFC
    2047[MIME].  For "charset", "ISO-2022-JP" (case-insensitive) SHOULD
    be specified.  ISO-2022-JP string MUST end with ASCII (see 'line-e').

    Message composers SHOULD use "B" encoding for ISO-2022-JP string,
    however, message viewers MUST be able to handle both "B" and "Q"
    encoding.

    IMPLEMENTATION NOTE: It should be noted that these requirements
    above apply to any fields in MIME content headers.  For example, RFC
    2047 mechanism MUST be used to represent ISO-2022-JP string when
    embedded in Content-Description: content header in a part of a
    multipart.

    ISO-2022-JP string MUST NOT be embedded directly in a header nor in
    MIME content headers.  EUC-JP string and Shift_JIS string SHOULD NOT
    be inserted even if they are in the form defined in RFC 2047.

    The reason why ISO-2022-JP string and "B" encoding is recommended to
    embed Japanese string in header fields is that RFC 1468 defines so.
    For historical reasons, not many message viewers support EUC-JP,
    Shift_JIS, and "Q" encoding,


Yamamoto                                                        [Page 9]


Internet-Draft             ISO-2022-JP CHARSET             November 1997

5.4 MIME Parameter Extensions

    With RFC 2184[PARAMETER] extensions, charsets other than US-ASCII
    can be embedded in values of MIME parameters.  As of this writing,
    the Japanese community does NOT have enough experience for RFC 2184
    extensions.  Thus, this memo does NOT define any requirements to
    embed Japanese string in the MIME parameters.  However, Japanese
    string MUST NOT be directly embedded in MIME parameters

6. Requirements for Message Transfer Agents (MTA)

    Several MTAs automatically convert any message to a local charset.
    This means that it is very likely that a charset other than
    ISO-2022-JP is converted to the local charset by a routine which
    assumes that incoming data is ISO-2022-JP text.  MTAs SHOULD NOT
    convert charsets in general.  If MTAs need to convert charsets, they
    MUST do so according to the charset parameter of MIME messages.

    Actually, some MTAs convert "ESC ( B" into "ESC ( J" and "ESC $ B"
    into "ESC $ @" interchangeably.  Gateways SHOULD NOT convert in this
    way.  Such MTAs are the primary cause of verification failure in
    digital signature.

7. Historical Note

    The JUNET code was originally described in the JUNET User's Guide
    [JUNET].

    JIS X 0208, which was originally called JIS C 6226, was published in
    1978.  It was revised in 1983, 1990 and 1997.

    The revision in 1983 was made because the Japanese Government
    published the new Kanji graphic character set called "Joyo Kanji Hyo
    (Kanji List for Daily Use)" for use in daily life, in the public
    documents, in newspapers in 1981.  Some of the character shapes were
    modified or simplified.  To harmonize JIS X 0208 1978 with Joyo
    Kanji, pairs of characters were swapped in the code positions and
    several characters had their shapes amended.  Moreover, since new
    characters were added to the Kanji set called "Jinmei You Kanji
    Beppyo (Additional Kanji List for Use with People's Name)", four
    characters were added to and a lot of special characters were
    included in JIS X 0208 1983.  This graphic character set was treated
    as a new one and a new escape sequence "ESC $ B" was assigned.

    Since the Kanji set for People's names was extended, two more Kanji
    characters were added at the end of JIS X 0208 in 1990 and the
    result is called JIS X 0208 1990.  The revised graphic character set
    is indeed backward compatible with the previous set.  Leaving the
    designation sequence unchanged, only the revision identification
    sequence "ESC & @" was assigned.

    To be strict, the designation sequence of JIS X 0208 1990 is "ESC &
    @ ESC $ B".  However, ISO-2022-JP encoding uses "ESC $ B" for
    practical reasons even though the two additional Kanji characters

Yamamoto                                                       [Page 10]


Internet-Draft             ISO-2022-JP CHARSET             November 1997

    are used.

    The purpose of revision in 1997 is to clarify references of each
    character.  No new characters were introduced.  As a graphic
    character set, JIS X 0208 1997 is equivalent to JIS X 0208 1990.
    The designation sequence is the same.

    Therefore, among versions of JIS X 0208, the version of 1978 is of a
    different graphic character set while the versions of 1983, 1990 and
    1997 are regarded as a family.

    JIS X 0201, originally called JIS C 6220, was published in 1976.  It
    was revised in 1997 but no change was introduced.  They are thus the
    same graphic character set.

8. Security Considerations

    ISO-2022-JP is NOT believed to introduce any new security holes to
    the Internet.

References

    [ABNF] D. Crocker and Paul Overell, "Augmented BNF for Syntax
        Specifications: ABNF", Internet-Draft,
        draft-ietf-drums-abnf-05.txt, 1997.

    [ASCII] American National Standards Institute, "Coded character set
        -- 7-bit American national standard code for information
        interchange", ANSI X3.4-1986.

    [CHARSET] N. Freed and J. Postel, "IANA Charset Registration
        Procedures", Internet-Draft, draft-freed-charset-reg-04.txt,
        September 1997.

    [ISO-2022-JP-2] M. Ohta and K. Handa, "ISO-2022-JP-2: Multilingual
        Extension of ISO-2022-JP", RFC 1554, December 1993.

    [ISO2022] International Organization for Standardization (ISO),
        "Information technology -- Character code structure and
        extension techniques", International Standard, Ref. No. ISO
        2022-1991(E).

    [ISO646] International Organization for Standardization (ISO),
        "Information technology -- ISO 7-bit coded character set for
        information interchange", International Standard,
        Ref. No. ISO/IEC 646:1991(E).

    [ISOREG] International Organization for Standardization (ISO),
        "International Register of Coded Character Sets To Be Used With
        Escape Sequences".

    [JISX0201-76] Japanese Industrial Standards Committee, "Code for
        information interchange", JIS C 6220 1976(aka JIS X 0201 1976).


Yamamoto                                                       [Page 11]


Internet-Draft             ISO-2022-JP CHARSET             November 1997

    [JISX0201-97] Japanese Industrial Standards Committee, "7-bit and
        8-bit coded character sets for information interchange", JIS X
        0201 1997.

    [JISX0208-78] Japanese Industrial Standards Committee, "Code of the
        Japanese graphic character set for information interchange", JIS
        C 6226 1978(aka JIS X 0208 1978).

    [JISX0208-83] Japanese Industrial Standards Committee, "Code of the
        Japanese graphic character set for information interchange", JIS
        C 6226 1983(aka JIS X 0208 1983).

    [JISX0208-90] Japanese Industrial Standards Committee, "Code of the
        Japanese graphic character set for information interchange", JIS
        X 0208 1990.

    [JISX0208-97] Japanese Industrial Standards Committee, "7-bit and
        8-bit double byte coded KANJI sets for information interchange",
        JIS X 0208 1997

    [JISX0212-90] Japanese Industrial Standards Committee, "Code of the
        supplementary Japanese graphic character set for information
        interchange", JIS X0212 1990.

    [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
        Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
        User's Guide (First Edition)"), February 1988.

    [KEYWORDS] S. Bradner, "Key words for use in RFCs to Indicate
        Requirement Levels", RFC 2119, March 1997.

    [MAIL] D. Crocker, "Standard for the Format of ARPA Internet Text
        Messages", STD 11, RFC 822, August 1982.

    [MAIL2] P. Resnick, "Message Format Standard",
        draft-ietf-drums-msg-fmt-02.txt, June 1997.

    [MIME] The primary definition of MIME. "MIME Part 1: Format of
        Internet Message Bodies", RFC 2045; "MIME Part 2: Media Types",
        RFC 2046; "MIME Part 3: Message Header Extensions for Non-ASCII
        Text", RFC 2047; "MIME Part 4: Registration Procedures", RFC
        2048; "MIME Part 5: Conformance Criteria and Examples", RFC 2049;
        November 1996.

    [NETNEWS] M. Horton and R. Adams, "Standard for Interchange of
        USENET Messages", RFC 1036, December 1987.

    [NNTP] B. Kantor and P. Lapsley, "Network News Transfer Protocol",
        RFC 977, February 1986.

    [OLDSPEC] J. Murai, M. Crispin, and E. van der Poel, "Japanese
        Character Encoding for Internet Messages", RFC 1468, June 1993.

    [PARAMETER] N. Freed and K. Moore, "MIME Parameter Value and Encoded

Yamamoto                                                       [Page 12]


Internet-Draft             ISO-2022-JP CHARSET             November 1997

        Word Extensions: Character Sets, Languages, and Continuations",
        RFC 2184, August 1997.

    [SMTP] J. Postel, "Simple Mail Transfer Protocol", RFC 821, August
        1982.

Acknowledgements

    Many people contributed to RFC 1468.  The original authors of RFC
    1468 wished to thank in particular Akira Kato, Masahiro Sekiguchi
    and Ken'ichi Handa.

    The authors of this memo would acknowledge the original work of Erik
    M. van der Poel and Mark Crispin.  Our deep gratitude goes to
    Noritoshi Demizu, Kenichi Handa, Jun'ichiro Ito, Shuhei Kobayashi,
    Tomohiko Morioka, Erik M. van der Poel, Shigeya Suzuki, and Akira
    Tanaka (in alphabetical order) for their contribution to this memo.

Authors' Addresses

    Eiiti WADA
    Fujitsu Laboratories, Ltd
    4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki 211-88 JAPAN

    Phone: +81-44-754-2608
    FAX:   +81-44-754-2580
    EMail: wada@u-tokyo.ac.jp

    Jun MURAI
    Keio University
    5322 Endo, Fujisawa 252 JAPAN

    Phone: +81-466-47-5111
    FAX:   +81-466-49-1101
    EMail: jun@wide.ad.jp

    Kazuhiko YAMAMOTO
    Graduate School of Information Science
    Nara Institute of Science and Technology(NAIST)
    8916-5 Takayama, Ikoma 630-01 JAPAN

    Phone: +81-743-72-5111
    FAX:   +81-743-72-5329
    EMail: Kazu@Mew.org











Yamamoto                                                       [Page 13]


Internet-Draft             ISO-2022-JP CHARSET             November 1997


Appendix

    An example of a MIME text object is as follows:

        Content-Type: text/plain; charset=iso-2022-jp
        Content-Transfer-Encoding: 7bit

        ISO-2022-JP text comes here.

    Another form of the example above is as follows:

        Content-Type: Text/Plain; Charset=ISO-2022-JP

        ISO-2022-JP text comes here.

    The following is an example of RFC 2047 header encoding

        To: june@wide.ad.jp
        Subject: =?iso-2022-jp?B?GyRCJDMkTkZ8S1w4bCQsRkkkYSRsJFAbKEI=?=
         OK
         =?iso-2022-jp?B?GyRCJEckOSEjGyhC?=
        From: Kazu Yamamoto (=?iso-2022-jp?B?GyRCOzNLXE9CSScbKEI=?=)
         <Kazu@Mew.org>































Yamamoto                                                       [Page 14]

Document	Document type	This is an older version of an Internet-Draft whose latest revision state is "Expired". Expired & archived
	Select version	00 01 02
	Compare versions
	Author
	RFC stream	(None)
	Other formats	txt pdf bibtex bibxml