Network Working Group                                          K. Tamaru
Request for Comments: 2237                         Microsoft Corporation
Category: Informational                                    November 1997

           Japanese Character Encoding for Internet Messages

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1997).  All Rights Reserved.

1. Abstract

   This memo defines an encoding scheme for Japanese characters,
   "ISO-2022-JP-1", which is used in electronic mail [RFC-822] and
   network news [RFC 1036]. This memo also lists the Japanese character
   sets that can be used in this encoding.

2. Requirements Notation

   This document uses terms that appear in capital letters to indicate
   particular requirements of this specification. Those terms are
   "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY". The meaning of
   each term is found in [RFC-2119].

3. Introduction

   RFC 1468 defines how Japanese characters are encoded, as this memo
   does. RFC 1468 specifies JIS X 0208 as the double-byte character set
   in ISO-2022-JP text.

   Today, many operating systems support proprietary extended Japanese
   characters or JIS X 0212. This includes the Unicode character set,
   which conforms to neither JIS X 0201 nor JIS X 0208. Because
   ISO-2022-JP offers only a limited set of Kanji characters, it
   restricts the ability to communicate precise information. Fortunately
   JIS (Japanese Industry Standard) defines JIS X 0212 as "code of the
   supplementary Japanese graphic character set for information
   interchange". Most Japanese characters used in regular electronic
   mail can be accommodated in JIS X 0201, JIS X 0208 and JIS X 0212.

   It is also recognized that there is a tendency to use Unicode.
   However, Unicode is not yet widely used, and old electronic mail
   systems impose certain limitations. Furthermore, the purpose of this
   memo is to add the capability of writing out JIS X 0212.

   This memo does not describe any representation of ISO-2022-JP-1
   version information in addition to JIS X 0212 support.

4. Description

   In "ISO-2022-JP-1" text, the initial character code of the message is
   ASCII. The "double-byte-seq" escape sequences (see the "Formal
   Syntax" section), ESC "$" "B" / ESC "$" "@" / ESC "$" "(" "D", are
   the only designators that indicate that the following characters are
   double-byte. A designation remains valid until another escape
   sequence appears. Using ESC "$" "@" for double-byte character
   encoding is strongly discouraged; new implementations SHOULD use
   only ESC "$" "B" for JIS X 0208 double-byte encoding.

   The end of "ISO-2022-JP-1" text MUST be in ASCII. It is also strongly
   recommended to switch back to ASCII, rather than JIS X 0201-Roman, at
   the end of each line if the line contains any non-ASCII character.
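
   As an illustration only (not part of this specification), the
   following Python sketch checks whether a single line leaves the
   decoder in ASCII. The name line_ends_in_ascii and the regular
   expression are assumptions introduced for this example; only the
   five escape sequences permitted in "ISO-2022-JP-1" are recognized.

```python
import re

# The five designators permitted in ISO-2022-JP-1 text; the group captures
# the bytes that follow ESC (0x1B).
ESC_SEQ = re.compile(rb"\x1b(\(B|\(J|\$@|\$B|\$\(D)")
ASCII_FINAL = b"(B"  # ESC ( B designates ASCII


def line_ends_in_ascii(line: bytes, starts_in_ascii: bool = True) -> bool:
    """Return True if the last designation in `line` is ASCII.

    Hypothetical helper: `line` is one line without its CRLF, and
    `starts_in_ascii` is the state inherited from the previous line.
    """
    designators = ESC_SEQ.findall(line)
    if not designators:
        # No escape sequence: the state is whatever the line started in.
        return starts_in_ascii
    return designators[-1] == ASCII_FINAL
```

   A line that switches to JIS X 0208 and never returns, or that
   returns to JIS X 0201-Roman instead of ASCII, yields False.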

   Since "ISO-2022-JP-1" is designed to add the capability of writing
   out JIS X 0212, "ISO-2022-JP" text MUST be used if the message does
   not contain any JIS X 0212 characters.
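
   One possible way to apply this rule when generating a message is
   sketched below in Python. The function name choose_charset is
   hypothetical; "iso2022_jp" and "iso2022_jp_1" are the codec names
   in the Python standard library, and the sketch assumes that their
   character repertoires match the two encodings discussed here.

```python
from typing import Tuple


def choose_charset(text: str) -> Tuple[str, bytes]:
    """Pick the charset label for `text` and return it with the encoded bytes.

    ISO-2022-JP is tried first; ISO-2022-JP-1 is used only when the text
    cannot be encoded without it (assumed here to mean JIS X 0212 is needed).
    """
    try:
        return "ISO-2022-JP", text.encode("iso2022_jp")
    except UnicodeEncodeError:
        # Raises UnicodeEncodeError again if the text fits neither encoding.
        return "ISO-2022-JP-1", text.encode("iso2022_jp_1")


# U+65E5 U+672C U+8A9E ("Japanese") is covered by JIS X 0208.
label, data = choose_charset("\u65e5\u672c\u8a9e")
```

   The label returned would then be used as the charset value of the
   message, so that JIS X 0212 support is announced only when needed.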

   JIS X 0201-Roman is not identical to ASCII; it differs in two
   characters.

   The following list shows the escape sequences and character sets that
   can be used in "ISO-2022-JP-1" text. The reg# column gives the
   registration number in the ISO 2375 Register, which allows
   double-byte ideographic scripts to be encoded within the ISO/IEC 2022
   code structure.

   reg# character set     ESC sequence                  designated to
   6    ASCII             ESC 2/8 4/2                   ESC ( B    G0
   42   JIS X 0208-1978   ESC 2/4 4/0                   ESC $ @    G0
   87   JIS X 0208-1983   ESC 2/4 4/2                   ESC $ B    G0
   14   JIS X 0201-Roman  ESC 2/8 4/10                  ESC ( J    G0
   159  JIS X 0212-1990   ESC 2/4 2/8 4/4               ESC $ ( D  G0

   Other restrictions are given in the Formal Syntax below.
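
   The table above can be expressed as a lookup structure. The Python
   sketch below is illustrative only: the names ESCAPES and
   split_segments are assumptions made for this example, and the
   function only separates a byte string into runs by designated
   character set, without validating the characters themselves.

```python
# Escape sequence -> (ISO 2375 registration number, character set),
# transcribed from the table above. All sets are designated to G0.
ESCAPES = {
    b"\x1b(B": (6, "ASCII"),
    b"\x1b$@": (42, "JIS X 0208-1978"),
    b"\x1b$B": (87, "JIS X 0208-1983"),
    b"\x1b(J": (14, "JIS X 0201-Roman"),
    b"\x1b$(D": (159, "JIS X 0212-1990"),
}


def split_segments(data: bytes):
    """Split `data` into (character set, bytes) runs; text starts in ASCII."""
    segments = []
    charset = "ASCII"
    start = i = 0
    while i < len(data):
        if data[i] != 0x1B:
            i += 1
            continue
        for seq, (_reg, name) in ESCAPES.items():
            if data.startswith(seq, i):
                if i > start:
                    segments.append((charset, data[start:i]))
                charset = name
                i += len(seq)
                start = i
                break
        else:
            raise ValueError(f"unsupported escape sequence at offset {i}")
    if start < len(data):
        segments.append((charset, data[start:]))
    return segments
```

   For example, the bytes "a" ESC $ B "0!" ESC ( B "b" are split into
   an ASCII run, a JIS X 0208-1983 run and a final ASCII run.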

5. Formal Syntax

   The notational conventions used here are identical to those used in
   STD 11, RFC 822 [RFC822].

   The * (asterisk) convention is as follows:
          l*m something
   meaning at least l and at most m something, with l and m taking
   default values of 0 and infinity, respectively.
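
   For readers more familiar with regular expressions, the repetition
   prefix corresponds to a bounded quantifier. The Python sketch below
   is an illustration under that assumption; the function name
   quantifier is hypothetical and is not part of the RFC 822 notation.

```python
import re


def quantifier(prefix: str) -> str:
    """Translate a repetition prefix such as "1*3" into a regex quantifier."""
    low, star, high = prefix.partition("*")
    if star != "*":
        raise ValueError("repetition prefix must contain '*'")
    # A missing lower bound defaults to 0; a missing upper bound is unbounded.
    return "{" + (low or "0") + "," + high + "}"


# "1*3" applied to the letter "a": at least one and at most three "a".
pattern = re.compile("a" + quantifier("1*3"))
```

   Under this mapping a bare "*" becomes "{0,}", that is, zero or more
   repetitions, which matches the default values stated above.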

   iso-2022-jp-1-text  = *( line CRLF ) [line]

   line                = (*single-byte-char *segment
                        single-byte-seq *single-byte-char) /
