Korean Character Encoding for Internet Messages
RFC 1557

 
Document Type RFC - Informational (December 1993; No errata)
Last updated 2013-03-02
Stream Legacy
Network Working Group                                            U. Choi
Request for Comments: 1557                                       K. Chon
Category: Informational                                            KAIST
                                                                 H. Park
                                                     Solvit Chosun Media
                                                           December 1993

            Korean Character Encoding for Internet Messages

Status of this Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

Introduction

   This document describes the encoding method used to represent
   Korean characters in both the header and body parts of Internet mail
   messages [RFC822].  This encoding method was specified in 1991 and
   has been in use since then.  It is now widely used in Korean IP
   networks.

   This document also describes the name of the encoding method which is
   to be used in order to match the message header and body format of
   MIME [MIME1, MIME2].

   This document describes only the encoding method for plain text.
   Other text subtypes, rich text and similar forms of text, are beyond
   the scope of this document.

Description

   It is assumed that the starting code of the message is ASCII.  ASCII
   and Korean characters can be distinguished by use of the shift
   function.  For example, the code SO will alert us that the upcoming
   bytes will be a Korean character as defined in KSC 5601.  To return
   to ASCII the SI code is used.

   Therefore, the escape sequence, shift function and character set used
   in a message are as follows:

           SO           KSC 5601
           SI           ASCII
           ESC $ ) C    Appears once at the beginning of a line
                            before any appearance of SO characters.

Choi, Chon & Park                                               [Page 1]
RFC 1557               Korean Character Encoding           December 1993

   Each character in the KSC 5601 [KSC5601] character set, which
   includes Hangul, Hanja (Chinese ideographic characters), graphic and
   foreign characters, etc., is two bytes long.

   For more information about Korean character sets, please refer to the
   KSC 5601-1987 document.  For more detailed information about the
   escape sequence and the shift function, refer to the ISO 2022
   [ISO2022] document.
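
   The shift mechanism described above can be illustrated with a minimal
   Python sketch.  It is not part of the specification: the function
   names encode_line and encode_body are hypothetical, and the sketch
   assumes that Python's built-in "euc_kr" codec supplies the KSC 5601
   two-byte codes with the 8th bit set, so clearing that bit yields the
   7-bit form.  It also assumes the designator is placed at the start of
   the first line, which is one of the placements the text permits.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"
DESIGNATOR = ESC + b"$)C"  # ESC $ ) C


def encode_line(line: str) -> bytes:
    """Encode one line (without its line ending) using SO/SI shifts."""
    out = bytearray()
    shifted = False  # True while we are inside an SO ... SI segment
    for ch in line:
        if ch in "\x1b\x0e\x0f":
            raise ValueError("ESC, SO and SI may not appear as text")
        if ord(ch) < 0x80:
            if shifted:
                out += SI  # return to ASCII
                shifted = False
            out += ch.encode("ascii")
        else:
            # Assumption: the euc_kr codec yields bytes in 0xA1..0xFE.
            raw = ch.encode("euc_kr")
            if not shifted:
                out += SO  # switch to KSC 5601
                shifted = True
            out += bytes(b & 0x7F for b in raw)  # clear the 8th bit
    if shifted:
        out += SI  # every line ends in the ASCII state
    return bytes(out)


def encode_body(lines) -> bytes:
    """Emit the designator once, then each encoded line followed by CRLF."""
    return DESIGNATOR + b"".join(encode_line(l) + b"\r\n" for l in lines)
```

   For example, under these assumptions encode_body(["한글 test"])
   produces the designator, SO, the four bytes "GQ1[", SI, and then
   " test" followed by CRLF.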

Formal Syntax

   Where this document in its formal syntax does not agree with the
   description part, priority should be given to the formal syntax of
   the document.

   The notations used in this section of the document are according to
   those used in STD 11, RFC 822 [RFC822] with the same meaning.

        * (asterisk) has the following meaning :
             l*m "anything"

   The above means that "anything" has to be used at least l times and
   at most m times.  Default values for l and m are 0 and infinity,
   respectively.
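
   As an informal illustration of the repetition notation, the sketch
   below maps l*m "anything" onto a Python regular expression
   quantifier.  The helper name repeat is hypothetical and the mapping
   is only an analogy; it is not part of RFC 822 or of this document.

```python
import re


def repeat(pattern: str, low: int = 0, high=None) -> str:
    """Return a regex for l*m pattern; high=None means no upper bound."""
    upper = "" if high is None else str(high)
    return f"(?:{pattern}){{{low},{upper}}}"


# 1*2 "ab" accepts one or two copies of "ab", and nothing else.
one_or_two = re.compile(repeat("ab", 1, 2))
```

   With the defaults, repeat("x") corresponds to *"x", that is, zero or
   more occurrences.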

   body            = *e-line *1( designator *( e-line / h-line ))

   designator      = ESC "$" ")" "C"

   e-line          = *text CRLF

   h-line          = *text 1*( segment *text ) CRLF

   segment         = SO 1*( one-of-94 one-of-94 ) SI

                                               ; ( Octal, Decimal.)

   ESC             = <ISO 2022 ESC, escape>    ; ( 33, 27.)

   SO              = <ASCII SO, shift out>     ; ( 16, 14.)

   SI              = <ASCII SI, shift in>      ; ( 17, 15.)

   SP              = <ASCII SP, space>         ; ( 40, 32.)

   one-of-94       = <any char in 94-char set> ; (41-176, 33.-126.)

   CHAR            = <any ASCII character>     ; ( 0-177, 0.-127.)

   text            = <any CHAR, including bare CR & bare LF, but NOT
                      including CRLF, and not including ESC, SI, SO>
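
   The grammar above can be checked mechanically.  The following Python
   sketch is one possible reading of it, not a reference implementation:
   the names decode_line and decode_body are hypothetical, and the
   sketch assumes Python's "euc_kr" codec can map each pair of one-of-94
   bytes (with the 8th bit restored) back to a character.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F
DESIGNATOR = b"\x1b$)C"


def decode_line(line: bytes, designated: bool) -> str:
    """Decode one e-line or h-line (CRLF already removed)."""
    out = []
    i = 0
    while i < len(line):
        b = line[i]
        if b == SO:
            if not designated:
                raise ValueError("SO before the designator")
            end = line.find(b"\x0f", i + 1)  # the segment must close with SI
            if end == -1:
                raise ValueError("segment not closed before end of line")
            seg = line[i + 1:end]
            # segment = SO 1*( one-of-94 one-of-94 ) SI
            if not seg or len(seg) % 2 or any(not 0x21 <= c <= 0x7E for c in seg):
                raise ValueError("segment is not a run of one-of-94 pairs")
            # Assumption: setting the 8th bit gives the EUC-KR form.
            out.append(bytes(c | 0x80 for c in seg).decode("euc_kr"))
            i = end + 1
        elif b in (ESC, SI) or b > 0x7F:
            raise ValueError("byte not allowed in text")
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


def decode_body(data: bytes) -> str:
    """Decode a body; the designator may appear once, at the start of a line."""
    designated = False
    lines = []
    for line in data.split(b"\r\n"):
        if not designated and line.startswith(DESIGNATOR):
            designated = True
            line = line[len(DESIGNATOR):]
        lines.append(decode_line(line, designated))
    return "\r\n".join(lines)
```

   A second designator, an SO that precedes the designator, or a segment
   left open at the end of a line is rejected with ValueError in this
   sketch, since none of these can be derived from the syntax.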

MIME and RFC 1522 Considerations

   The name to be used for the Hangul encoding scheme in the contents is
   "ISO-2022-KR".  This name when used in MIME message form would be:

                Content-Type: text/plain; charset=iso-2022-kr

   Because the Hangul encoding is inherently 7-bit, the
   Content-Transfer-Encoding header does not need to be used.  However,
   current Hangul message software does not support Base64 or
   Quoted-Printable encoding applied to already encoded Hangul
   messages.
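
   As a rough illustration of the header usage above, the Python sketch
   below assembles a message by hand and reads the charset back with the
   standard library's email package.  The function name build_message is
   hypothetical, the MIME-Version line is an assumption about a typical
   MIME message rather than something this document requires, and the
   body bytes are assumed to be already encoded as described earlier.

```python
from email import message_from_bytes


def build_message(body: bytes) -> bytes:
    """Prefix an already encoded 7-bit body with minimal MIME headers."""
    if any(b > 0x7F for b in body):
        raise ValueError("an ISO-2022-KR body must be 7-bit")
    headers = (
        b"MIME-Version: 1.0\r\n"
        b"Content-Type: text/plain; charset=iso-2022-kr\r\n"
    )
    # No Content-Transfer-Encoding header is added: the body is 7-bit.
    return headers + b"\r\n" + body


raw = build_message(b"\x1b$)C\x0eGQ1[\x0f\r\n")
msg = message_from_bytes(raw)
charset = msg.get_content_charset()
```

   Here charset holds the value taken from the Content-Type header, and
   the message carries no Content-Transfer-Encoding header.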

   The Hangul encoded in the header part of the message is Korean EUC
   [EUC-KR].  In the EUC-KR encoding, the bytes with 8th bit set will be