Network Working Group                                           J. Murai
Request for Comments: 1468                               Keio University
                                                              M. Crispin
                                                       Panda Programming
                                                         E. van der Poel
                                                               June 1993

           Japanese Character Encoding for Internet Messages

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard.  Distribution of this memo is
   unlimited.

Introduction

   This document describes the encoding used in electronic mail [RFC822]
   and network news [RFC1036] messages in several Japanese networks. It
   was first specified by and used in JUNET [JUNET]. The encoding is now
   also widely used in Japanese IP communities.

   The name given to this encoding is "ISO-2022-JP", which is intended
   to be used in the "charset" parameter field of MIME headers (see
   [MIME1] and [MIME2]).

Description

   The text starts in ASCII [ASCII], and switches to Japanese characters
   through an escape sequence. For example, the escape sequence ESC $ B
   (three bytes, hexadecimal values: 1B 24 42) indicates that the bytes
   following this escape sequence are Japanese characters, which are
   encoded in two bytes each.  To switch back to ASCII, the escape
   sequence ESC ( B is used.
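
   As an illustration only, the switching described above can be
   observed with Python's standard "iso2022_jp" codec, which is assumed
   here to follow this memo. The sample string (ASCII text around three
   Kanji, written as Unicode escapes) and all variable names are
   inventions of this sketch, not part of the specification.

```python
ESC = b"\x1b"
TO_JIS_X_0208_1983 = ESC + b"$B"   # three bytes: 1B 24 42
TO_ASCII = ESC + b"(B"             # three bytes: 1B 28 42

# "abc", then three Kanji, then "xyz".
text = "abc\u65e5\u672c\u8a9exyz"
encoded = text.encode("iso2022_jp")

# Locate the two escape sequences that bracket the Japanese run.
start = encoded.index(TO_JIS_X_0208_1983)
end = encoded.index(TO_ASCII)
kanji_bytes = encoded[start + 3:end]

print(encoded[:start])     # ASCII part before the switch
print(len(kanji_bytes))    # two bytes for each Japanese character
print(encoded[end + 3:])   # ASCII part after switching back
```

   Decoding the result with the same codec is expected to give back the
   original string, since the escape sequences carry all of the
   switching state.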

   The following table gives the escape sequences and the character sets
   used in ISO-2022-JP messages. The ISOREG number is the registration
   number in ISO's registry [ISOREG].

       Esc Seq    Character Set                  ISOREG

       ESC ( B    ASCII                             6
       ESC ( J    JIS X 0201-1976 ("Roman" set)    14
       ESC $ @    JIS X 0208-1978                  42
       ESC $ B    JIS X 0208-1983                  87
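
   The table can be expressed as a lookup structure. The following
   sketch lists, in order, the character sets that a byte string
   switches to. The names ESCAPE_TABLE and charsets_used are
   hypothetical, and the bytes-per-character column is added for
   illustration. The initial ASCII state is implicit and is therefore
   not reported.

```python
# Bytes following ESC -> (character set, ISOREG number, bytes per character).
# The last column is an addition of this sketch, not part of the table.
ESCAPE_TABLE = {
    b"(B": ("ASCII", 6, 1),
    b"(J": ('JIS X 0201-1976 ("Roman" set)', 14, 1),
    b"$@": ("JIS X 0208-1978", 42, 2),
    b"$B": ("JIS X 0208-1983", 87, 2),
}

def charsets_used(data: bytes) -> list:
    """Return the character sets selected by escape sequences, in order."""
    found = []
    pos = data.find(b"\x1b")
    while pos != -1:
        designator = data[pos + 1:pos + 3]
        if designator not in ESCAPE_TABLE:
            raise ValueError(f"escape sequence not in ISO-2022-JP: {designator!r}")
        found.append(ESCAPE_TABLE[designator][0])
        pos = data.find(b"\x1b", pos + 3)
    return found

sample = b"abc\x1b$B\x46\x7c\x1b(Bxyz"
print(charsets_used(sample))
```

   Searching for the ESC byte alone is sufficient in this sketch
   because the two-byte characters are restricted to printable 7-bit
   values and so cannot contain ESC.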

   Note that JIS X 0208 was called JIS C 6226 until the name was changed
   on March 1st, 1987. Likewise, JIS C 6220 was renamed JIS X 0201.

   The "Roman" character set of JIS X 0201 [JISX0201] is identical to
   ASCII except for backslash (\) and tilde (~). The backslash is
   replaced by the Yen sign, and the tilde is replaced by overline. This
   set is Japan's national variant of ISO 646 [ISO646].
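
   A minimal sketch of the two differences follows. It maps 7-bit
   "Roman" bytes to Unicode text. The Unicode code points chosen for
   the Yen sign (U+00A5) and the overline (U+203E) are assumptions of
   this sketch, as is the function name roman_to_unicode; this memo
   does not define a Unicode mapping.

```python
# The two code positions where JIS X 0201 "Roman" differs from ASCII.
ROMAN_DIFFERENCES = {
    0x5C: "\u00a5",  # backslash position -> YEN SIGN (assumed mapping)
    0x7E: "\u203e",  # tilde position -> OVERLINE (assumed mapping)
}

def roman_to_unicode(data: bytes) -> str:
    """Map JIS X 0201 "Roman" bytes (7-bit) to Unicode text."""
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"not a 7-bit value: {byte:#04x}")
        # Every other position is identical to ASCII.
        chars.append(ROMAN_DIFFERENCES.get(byte, chr(byte)))
    return "".join(chars)

print(roman_to_unicode(b"100\\") == "100\u00a5")
```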

   The JIS X 0208 [JISX0208] character sets consist of Kanji, Hiragana,
   Katakana and some other symbols and characters. Each character takes
   up two bytes.

   For further details about the JIS Japanese national character set
   standards, refer to [JISX0201] and [JISX0208].  For further
   information about the escape sequences, see [ISO2022] and [ISOREG].

   If there are JIS X 0208 characters on a line, there must be a switch
   to ASCII or to the "Roman" set of JIS X 0201 before the end of the
   line (i.e., before the CRLF). This means that the next line starts in
   the character set that was switched to before the end of the previous
   line.

   Also, the text must end in ASCII.
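
   The two restrictions above can be checked mechanically. The
   following is a sketch under stated assumptions, not a complete
   validator: the function name check_line_rules and the wording of
   the messages are hypothetical, and after reporting a line that ends
   in a two-byte set the sketch simply assumes ASCII for the next line
   so that it can continue.

```python
ESC = 0x1B
SINGLE_BYTE = (b"(B", b"(J")   # ASCII, JIS X 0201 "Roman"
DOUBLE_BYTE = (b"$@", b"$B")   # JIS X 0208-1978, JIS X 0208-1983

def check_line_rules(message: bytes) -> list:
    """Return a list of rule violations found in an ISO-2022-JP text."""
    problems = []
    current = b"(B"                        # the text starts in ASCII
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        i = 0
        while i < len(line):
            if line[i] != ESC:
                i += 1
                continue
            designator = line[i + 1:i + 3]
            if designator in SINGLE_BYTE + DOUBLE_BYTE:
                current = designator
            else:
                problems.append(f"line {number}: unknown escape sequence")
            i += 3
        # A line must not reach its CRLF while a two-byte set is selected.
        if current in DOUBLE_BYTE:
            problems.append(f"line {number}: ends in a two-byte character set")
            current = b"(B"                # choice of this sketch: resynchronise
    if current != b"(B":
        problems.append("text does not end in ASCII")
    return problems

print(check_line_rules(b"abc\x1b$B\x46\x7c\x1b(B\r\nxyz\r\n"))
```

   Note that a line may legitimately end in the "Roman" set, in which
   case the next line starts in that set; only the end of the whole
   text is required to be in ASCII.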

   Other restrictions are given in the Formal Syntax below.

Formal Syntax

   The notational conventions used here are identical to those used in
   RFC 822 [RFC822].

   The * (asterisk) convention is as follows:

       l*m something

   meaning at least l and at most m somethings, with l and m taking
   default values of 0 and infinity, respectively.
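
   For readers more familiar with regular expressions, the repetition
   convention corresponds roughly to a bounded quantifier. The helper
   below is a hypothetical illustration (the name repeat is not taken
   from RFC 822) and covers only the repetition operator, not the rest
   of the notation.

```python
import re
from typing import Optional

def repeat(pattern: str, l: int = 0, m: Optional[int] = None) -> str:
    """Build a regex fragment for the form "l*m pattern"."""
    upper = "" if m is None else str(m)   # no upper bound means infinity
    return "(?:%s){%d,%s}" % (pattern, l, upper)

one_or_more = re.compile(repeat("[a-z]", 1))   # like 1*something
print(bool(one_or_more.fullmatch("abc")))
print(bool(one_or_more.fullmatch("")))
```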

   message             = headers 1*( CRLF *single-byte-char *segment
                         single-byte-seq *single-byte-char )
                                           ; see also [MIME1] "body-part"
                                           ; note: must end in ASCII

   headers             = <see [RFC822] "fields" and [MIME1] "body-part">

   segment             = single-byte-segment / double-byte-segment

   single-byte-segment = single-byte-seq 1*single-byte-char

   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

   single-byte-seq     = ESC "(" ( "B" / "J" )

   double-byte-seq     = ESC "$" ( "@" / "B" )

   CRLF                = CR LF

                                                    ; ( Octal, Decimal.)

   ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)

   SI                  = <ISO 2022 SI, shift-in>    ; (    17,      15.)

   SO                  = <ISO 2022 SO, shift-out>   ; (    16,      14.)
