Network Working Group                                         Jun Murai
Internet Draft                                             Mark Crispin
                                                      Erik van der Poel
                                                       25th August 1992

           Japanese Character Encoding for Internet Messages
Status of this Memo
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress."
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other
Internet Draft.
This draft document will be submitted to the RFC editor as an
informational document. This document will expire before 2nd March
1993. Distribution of this memo is unlimited. Please send comments
to ietf-822@dimacs.rutgers.edu.
Introduction
This document describes the encoding used in plain text electronic
mail and network news in several Japanese networks. It was first
specified by and used in JUNET [JUNET]. The encoding is now also
widely used in Japanese IP communities.
This document provides a name for the encoding which is intended to
be used in the "charset" parameter field of MIME [MIME] messages.
This document only describes the encoding of plain text. The encoding
of other subtypes of text, such as rich text, is not discussed here.
Murai et al Expires 2nd March 1993 [Page 1]
Internet Draft Updated 25th August 1992
Informal Description
The message body starts in ASCII, and switches to Japanese characters
through an escape sequence. For example, the escape sequence ESC $ B
(three bytes) indicates that the bytes following this escape sequence
are Japanese characters, which are encoded in two bytes each. To
switch back to ASCII, the escape sequence ESC ( B is used.
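As a rough illustration of the switching described above, the
following Python sketch encodes a short mixed string and locates the
two escape sequences. It assumes that Python's built-in "iso2022_jp"
codec, which is not part of this document, follows the encoding
described here. The sample string is arbitrary.

```python
# Sketch: observe the escape sequences around a run of Japanese text.
# Assumes Python's built-in "iso2022_jp" codec (not defined by this memo).
ESC = b"\x1b"

sample = "abc\u65e5\u672c\u8a9exyz"   # "abc" + three Kanji + "xyz"
encoded = sample.encode("iso2022_jp")

# Locate the switch into two-byte mode and the switch back to ASCII.
start = encoded.index(ESC + b"$B")
end = encoded.index(ESC + b"(B")

kanji_bytes = encoded[start + 3:end]   # bytes between the two escapes
print(encoded)
print(len(kanji_bytes))                # two bytes per Japanese character
```

Every byte of the result is below 128, which is what allows the
encoding to pass through 7-bit mail transports.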
The following table gives the escape sequences and the character sets
used in JUNET messages.
   Escape Sequence    Character Set

   ESC ( B            ASCII
   ESC ( J            JIS X 0201-1976 (left-hand part)
   ESC $ @            JIS X 0208-1978
   ESC $ B            JIS X 0208-1983
The left-hand part of JIS X 0201-1976 is identical to ASCII except
for backslash (\) and tilde (~). The backslash is replaced by the Yen
sign, and the tilde is replaced by macron (overline). This set is
Japan's national variant of ISO 646.
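The two differences from ASCII can be sketched as a small lookup
table. The helper name decode_jis_roman is hypothetical, and the
choice of Unicode code points for the Yen sign and the overline
(U+00A5 and U+203E) is an assumption of this example, not something
this document specifies.

```python
# Sketch: decode bytes in the left-hand part of JIS X 0201 to Unicode.
# The Unicode code points chosen for the two differing characters
# (U+00A5 YEN SIGN, U+203E OVERLINE) are an assumption of this example.
JIS_ROMAN_OVERRIDES = {0x5C: "\u00a5", 0x7E: "\u203e"}


def decode_jis_roman(data: bytes) -> str:
    """Decode 7-bit JIS X 0201 left-hand bytes (hypothetical helper)."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit byte: {b:#x}")
        # Every position other than 0x5C and 0x7E matches ASCII.
        out.append(JIS_ROMAN_OVERRIDES.get(b, chr(b)))
    return "".join(out)
```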
The JIS X 0208 character sets consist of Kanji, Hiragana, Katakana
and some other symbols and characters. Each character takes up two
bytes.
For further details about the JIS Japanese national character set
standards, refer to the JIS standards themselves. For further
information about the escape sequences, see ISO 2022 [ISO2022].
If there are JIS X 0208 characters on a line, there must be a switch
to ASCII or to the left-hand part of JIS X 0201 before the end of the
line (i.e. before the CRLF). This means that the next line starts in
the character set that was switched to before the end of the previous
line. Other restrictions are given in the Formal Description below.
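The end-of-line rule above can be checked with a small state machine.
This is only a sketch: the function name and its "double" parameter
are hypothetical, the line is assumed to be passed without its CRLF,
and unrecognized escape sequences are simply skipped as ordinary
bytes.

```python
# Sketch: check that a line is back in a one-byte set before its CRLF.
ESC = 0x1B
SINGLE = {b"(B", b"(J"}   # ASCII, JIS X 0201 left-hand part
DOUBLE = {b"$@", b"$B"}   # JIS X 0208-1978, JIS X 0208-1983


def ends_in_single_byte_mode(line: bytes, double: bool = False) -> bool:
    """Return True if `line` (without its CRLF) ends in a one-byte set.

    `double` is the mode in effect at the start of the line.
    """
    i = 0
    while i < len(line):
        seq = line[i + 1:i + 3] if line[i] == ESC else b""
        if seq in SINGLE or seq in DOUBLE:
            double = seq in DOUBLE   # the escape selects the new mode
            i += 3
        else:
            i += 2 if double else 1  # skip one character
    return not double
```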
Formal Description
This section provides a formal description of the JUNET encoding. In
the event that this description is not consistent with the above
informal description, this formal description shall take precedence.
The notational conventions used here are identical to those used in
RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
line                = *text *1( *segment single-byte-seq *text ) CRLF

segment             = single-byte-segment / double-byte-segment
single-byte-segment = single-byte-seq 1*text
double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

single-byte-seq     = ESC "(" ( "B" / "J" )
double-byte-seq     = ESC "$" ( "@" / "B" )

                                                 ; ( Octal, Decimal.)
ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)
one-of-94           = <any char in 94-char set>  ; (41-176, 33.-126.)
CHAR                = <any ASCII character>      ; ( 0-177,  0.-127.)
text                = <any CHAR, including bare
                       CR & bare LF, but NOT
                       including CRLF>
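One possible reading of the grammar above can be written as a regular
expression. The sketch below is illustrative only: it assumes that
"text" never contains ESC (the grammar as written allows any CHAR),
and the function name is_valid_line is hypothetical. The "*1( ... )"
group becomes an optional group, since it means zero or one
occurrence.

```python
import re

# Sketch: a regular expression for one reading of the "line" rule.
# Assumption: "text" excludes ESC so that escapes are unambiguous.
TEXT = rb"[\x00-\x1a\x1c-\x7f]"       # CHAR minus ESC
SB_SEQ = rb"\x1b\((?:B|J)"            # single-byte-seq
DB_SEQ = rb"\x1b\$(?:@|B)"            # double-byte-seq
PAIR = rb"[\x21-\x7e]{2}"             # one-of-94 one-of-94
SEGMENT = (rb"(?:" + SB_SEQ + TEXT + rb"+"
           rb"|" + DB_SEQ + rb"(?:" + PAIR + rb")+)")
LINE = re.compile(
    TEXT + rb"*(?:" + SEGMENT + rb"*" + SB_SEQ + TEXT + rb"*)?\r\n"
)


def is_valid_line(line: bytes) -> bool:
    """Return True if `line`, including its CRLF, fits the grammar."""
    # "text" may hold a bare CR or bare LF, but never a CRLF pair.
    if not line.endswith(b"\r\n") or b"\r\n" in line[:-2]:
        return False
    return LINE.fullmatch(line) is not None
```

A line that enters a double-byte set and never returns to a
single-byte set before the CRLF is rejected, as is a double-byte
segment with an odd number of bytes.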
Additional restrictions that are difficult to express in the above
notation are as follows.
Adjacent segments should have different escape sequences. For
example, the following is not recommended:
ESC $ B .... ESC $ B ....
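A sender could detect such repeated escape sequences before
transmission. The following sketch reports the byte offset of each
escape sequence that merely repeats the one already in effect. The
function name is hypothetical, and only the four escape sequences
listed in this document are considered.

```python
import re

# Sketch: find escape sequences that repeat the one already in effect.
# Only the four sequences named in this memo are recognized.
ESCAPE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")


def redundant_escapes(line: bytes) -> list[int]:
    """Return offsets of escape sequences identical to the previous one."""
    offsets = []
    current = None
    # ESC never occurs inside two-byte character data (each byte is
    # 33.-126.), so scanning for ESC cannot split a character.
    for m in ESCAPE.finditer(line):
        if m.group() == current:
            offsets.append(m.start())
        current = m.group()
    return offsets
```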
MIME Considerations
The name given to the JUNET character encoding is "ISO-2022-JP". This
name is intended to be used in MIME messages as follows:
Content-Type: text/plain; charset=iso-2022-jp
The JUNET encoding is already in 7-bit form, so it is not necessary
to use a Content-Transfer-Encoding header. It should be noted that
applying the Base64 or Quoted-Printable encoding will render the
message unreadable in current JUNET software.
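As an illustration, a message labelled in this way might be built
with the "email" package of Python's standard library. This is a
sketch under the assumption that the package's default policy is in
use. Note that the library adds an explicit "7bit"
Content-Transfer-Encoding header rather than omitting the header; it
does not apply Base64 or Quoted-Printable to the body.

```python
from email.message import EmailMessage

# Sketch: label a plain text body with the charset name from this memo.
# Assumes the default policy of Python's standard "email" package.
msg = EmailMessage()
msg.set_content("\u65e5\u672c\u8a9e\n", charset="iso-2022-jp")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```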
Background Information
The JUNET encoding was described in the JUNET User's Guide [JUNET]
(JUNET Riyou No Tebiki Dai Ippan).
The encoding is based on the particular usage of ISO 2022 [ISO2022]
announced by 4/1 (see ISO 2022 for details). However, the escape
sequence normally used for this announcement is not included in JUNET
messages.
References
[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded character sets
-- Code extension techniques", International Standard, 1986, Ref. No.
ISO 2022-1986 (E)
[JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
User's Guide (First Edition)"), February 1988
[MIME] Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
Internet Mail Extensions): Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", Proposed (Internet) standard,
June 1992, RFC 1341
[RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
Text Messages", Internet standard, August 1982, RFC 822
Security Considerations
Security considerations are not discussed in this memo.
Acknowledgements
Many people assisted in drafting this document. The authors wish to
thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
Handa.
Authors' Addresses
Jun Murai
Keio University
5322 Endo, Fujisawa
Kanagawa 252 Japan
Fax: +81 (466) 49-1101
EMail: jun@wide.ad.jp
Mark Crispin
Panda Programming
6158 Lariat Loop NE
Bainbridge Island, WA 98110-2098
USA
Phone: +1 (206) 842-2385
EMail: MRC@PANDA.COM
Erik M. van der Poel
A-105 Park Avenue
4-4-10 Ohta, Kisarazu
Chiba 292 Japan
Phone: +81 (438) 22-5836
Fax: +81 (438) 22-5837
EMail: erik@poel.juice.or.jp