Chinese Character Encoding for Internet Messages
RFC 1922

Document Type RFC - Informational (March 1996; No errata)
Last updated 2013-03-02
Stream Legacy
Network Working Group                                            HF. Zhu
Request for Comments: 1922                                    Tsinghua U
Category: Informational                                           DY. Hu
                                                              Tsinghua U
                                                                ZG. Wang
                                                                    CITS
                                                                 TC. Kao
                                                                     III
                                                              WCH. Chang
                                                                     III
                                                              M. Crispin
                                                            U Washington
                                                              March 1996

            Chinese Character Encoding for Internet Messages

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard.  Distribution of this memo is
   unlimited.

Abstract

   This memo describes methods of transporting Chinese characters in
   Internet services which transport text, such as electronic mail
   [RFC-822], network news [RFC-1036], telnet [RFC-854] and the World
   Wide Web [RFC-1866].

Introduction

   As Internet use spreads among Chinese speakers around the world, the
   need to send documents containing Chinese characters over the
   Internet has grown.  The methods described in this document provide
   a means of transporting existing Chinese character sets while
   leaving room for future extension.

   This document describes two encodings, ISO-2022-CN and
   ISO-2022-CN-EXT.  These are designed with interoperability in mind
   and are encouraged in this document for current Chinese interchange;
   they are 7-bit, support both simplified and traditional characters
   using both GB and CNS/Big5, and do not impose any unusual quoting
   requirements on ASCII characters.

   This document also gives detailed descriptions of two related
   encodings, CN-GB and CN-Big5, and a brief description of
   ISO/IEC 10646 [ISO-10646].  CN-GB and CN-Big5 are

Zhu, et al                   Informational                      [Page 1]
RFC 1922               Chinese Character Encoding             March 1996

   currently used as the internal codes for Chinese documents.
   ISO-10646 is the universal multi-octet character set defined by ISO;
   we feel that in the future it may become the preferred technology for
   Chinese documents and electronic mail when it is widely available.

Specification

1.    7-bit Chinese encodings: ISO-2022-CN and ISO-2022-CN-EXT

1.1.  Description

   ISO-2022-CN is based on ISO 2022 [ISO-2022], similar to earlier work
   on ISO-2022-JP [RFC-1468] and ISO-2022-KR [RFC-1557] for the Japanese
   and Korean languages respectively.  It is 7-bit, and supports both
   simplified Chinese characters using GB 2312-80 [GB-2312] and
   traditional Chinese characters using the first two planes of CNS
   11643 [CNS-11643], as well as ASCII [ASCII] characters.

   ISO-2022-CN-EXT is a superset of ISO-2022-CN that additionally
   supports other GB character sets and planes of CNS 11643.

   Since ISO-2022-CN and ISO-2022-CN-EXT are 7-bit encodings, they do
   not require the 8-bit SMTP extensions.  ISO-2022-CN supports all the
   Chinese characters that appear in Big5 [BIG5].

1.2.  ISO-2022-CN

   The starting code of ISO-2022-CN is ASCII.  ASCII and Chinese
   characters are distinguished by designations (ESC sequences) and
   shift functions.

   Designations define the Chinese character sets used in the text.
   There are three kinds of designations: SOdesignation, SS2designation
   and SS3designation.

   The SOdesignation is in the form ESC $ ) <F>, where <F> is the "final
   character" assigned to the character set by ISO (refer to the ISO
   registry [ISOREG] for more details).  The SS2designation is in the
   form ESC $ * <F>, and the SS3designation is in the form ESC $ + <F>.
   A designation overrides any previous designation of the same kind
   for subsequent bytes in the text.
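
   As an illustration only, the three designation forms above can be
   built as byte strings.  This is a minimal sketch, not part of the
   specification: the helper name designation() and the table
   _INTERMEDIATE are hypothetical, and the final character "A" used in
   the demonstration is the one the ISO registry assigns to GB 2312-80,
   which this section does not itself list.

```python
ESC = b"\x1b"

# Intermediate bytes for each kind of designation, as given in the text:
#   SOdesignation  -> ESC $ ) <F>
#   SS2designation -> ESC $ * <F>
#   SS3designation -> ESC $ + <F>
_INTERMEDIATE = {"SO": b"$)", "SS2": b"$*", "SS3": b"$+"}


def designation(kind: str, final: str) -> bytes:
    """Return the escape sequence for a designation of the given kind.

    `kind` is "SO", "SS2" or "SS3"; `final` is the one-character final
    character <F> assigned to the character set.
    """
    if kind not in _INTERMEDIATE:
        raise ValueError("kind must be 'SO', 'SS2' or 'SS3'")
    if len(final) != 1 or not final.isascii():
        raise ValueError("final must be a single ASCII character")
    return ESC + _INTERMEDIATE[kind] + final.encode("ascii")


# Assumption: "A" is the final character for GB 2312-80.
print(designation("SO", "A").hex(" "))  # 1b 24 29 41
```

   Every byte of the resulting sequence is below hexadecimal 80, so the
   designation itself stays within 7 bits.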

   There are four kinds of shifts: SI, SO, SS2 and SS3.  Shift functions
   specify how to interpret the subsequent bytes.

   The shift SI (one byte with hexadecimal value 0F) declares that
   subsequent bytes are interpreted in ASCII.

   The shift SO (one byte with hexadecimal value 0E) declares that
   subsequent bytes are interpreted in the character set defined by
   SOdesignation.
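
   The effect of SI and SO can be sketched as a small splitter that
   walks a byte string and records which bytes fall under which shift.
   This is a simplified illustration under stated assumptions: the names
   split_runs() and to_gb2312_text() are hypothetical, designation
   escape sequences and the single shifts are not handled, and the
   decoding helper assumes the SOdesignation in effect selected
   GB 2312-80.

```python
SI = 0x0F  # shift in: subsequent bytes are ASCII
SO = 0x0E  # shift out: subsequent bytes use the SO-designated set


def split_runs(data: bytes):
    """Split `data` into (mode, payload) runs; mode is "ASCII" or "SO"."""
    runs = []
    mode = "ASCII"  # the starting code of ISO-2022-CN is ASCII
    current = bytearray()
    for byte in data:
        if byte in (SI, SO):
            # A shift ends the current run and selects the next mode.
            if current:
                runs.append((mode, bytes(current)))
                current = bytearray()
            mode = "ASCII" if byte == SI else "SO"
        else:
            current.append(byte)
    if current:
        runs.append((mode, bytes(current)))
    return runs


def to_gb2312_text(payload: bytes) -> str:
    """Decode a 7-bit SO run, ASSUMING GB 2312-80 was designated.

    Python's "gb2312" codec expects the 8-bit form, so the high bit is
    set on every byte before decoding.
    """
    return bytes(b | 0x80 for b in payload).decode("gb2312")


message = b"Hi \x0eVP\x0f!"
for mode, payload in split_runs(message):
    text = payload.decode("ascii") if mode == "ASCII" else to_gb2312_text(payload)
    print(mode, repr(text))
```

   In this sketch the two bytes between SO and SI are read as one
   two-byte Chinese character, while the bytes outside that span are
   read as ASCII.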

   The shift SS2 (two bytes with hexadecimal values 1B 4E) declares that
   the subsequent TWO bytes are interpreted in the character set defined