IANA Charset Registration Procedures
RFC 2278
Type: RFC - Informational (January 1998; No errata)
Obsoleted by: RFC 2978
Was: draft-freed-charset-reg (individual)
Authors: Jon Postel, Ned Freed
Last updated: 2013-03-02
Stream: Legacy
Network Working Group                                           N. Freed
Request for Comments: 2278                                      Innosoft
BCP: 19                                                        J. Postel
Category: Best Current Practice                                      ISI
                                                            January 1998


                  IANA Charset Registration Procedures

Status of this Memo

   This document specifies an Internet Best Current Practices for the
   Internet Community, and requests discussion and suggestions for
   improvements. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1998). All Rights Reserved.

1. Abstract

   MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various other
   modern Internet protocols are capable of using many different
   charsets. This in turn means that the ability to label different
   charsets is essential. This registration procedure exists solely to
   associate a specific name or names with a given charset and to give
   an indication of whether or not a given charset can be used in MIME
   text objects. In particular, the general applicability and
   appropriateness of a given registered charset is a protocol issue,
   not a registration issue, and is not dealt with by this registration
   procedure.

2. Definitions and Notation

   The following sections define various terms used in this document.

2.1. Requirements Notation

   This document occasionally uses terms that appear in capital letters.
   When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
   appear capitalized, they are being used to indicate particular
   requirements of this specification. A discussion of the meanings of
   these terms appears in [RFC-2119].

2.2. Character

   A member of a set of elements used for the organisation, control, or
   representation of data.

2.3. Charset

   The term "charset" (see historical note below) is used here to refer
   to a method of converting a sequence of octets into a sequence of
   characters. This conversion may also optionally produce additional
   control information such as directionality indicators.
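
   As a minimal sketch of why the charset label matters, the Python
   fragment below decodes the same octets under two different labels.
   Python's codec names are used here only as stand-ins for registered
   charset names, and the sample octets and the helper decode_as are
   assumptions made for illustration; none of this is part of the
   registration procedure itself.

```python
# Illustrative only: Python codec names stand in for registered charset
# labels, and the sample octets are an assumption of this sketch.
OCTETS = bytes([0x63, 0x61, 0x66, 0xE9])  # "caf" followed by the octet 0xE9


def decode_as(octets: bytes, charset: str) -> str:
    """Convert a sequence of octets into characters using the named charset."""
    return octets.decode(charset)


# The same octets yield different characters depending on the label.
latin1_text = decode_as(OCTETS, "iso-8859-1")  # 0xE9 -> U+00E9 (e with acute)
greek_text = decode_as(OCTETS, "iso-8859-7")   # 0xE9 -> U+03B9 (Greek iota)

print(repr(latin1_text), repr(greek_text))
```

   Without a label, a receiver has no way to tell which of the two
   results the sender intended.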
   Note that unconditional and unambiguous conversion in the other
   direction is not required, in that not all characters may be
   representable by a given charset and a charset may provide more than
   one sequence of octets to represent a particular sequence of
   characters.

   This definition is intended to allow charsets to be defined in a
   variety of different ways, from simple single-table mappings such as
   US-ASCII to complex table switching methods such as those that use
   ISO 2022's techniques. However, the definition associated with a
   charset name must fully specify the mapping to be performed. In
   particular, use of external profiling information to determine the
   exact mapping is not permitted.

   HISTORICAL NOTE: The term "character set" was originally used in MIME
   to describe such straightforward schemes as US-ASCII and ISO-8859-1
   which consist of a small set of characters and a simple one-to-one
   mapping from single octets to single characters. Multi-octet
   character encoding schemes and switching techniques make the
   situation much more complex. As such, the definition of this term
   was revised to emphasize the conversion aspect of the process, and
   the term itself has been changed to "charset" to emphasize that it is
   not, after all, just a set of characters. A discussion of these
   issues as well as specification of standard terminology for use in
   the IETF appears in RFC 2130.

2.4. Coded Character Set

   A Coded Character Set (CCS) is a mapping from a set of abstract
   characters to a set of integers. Examples of coded character sets
   are ISO 10646 [ISO-10646], US-ASCII [US-ASCII], and the ISO-8859
   series [ISO-8859].

2.5. Character Encoding Scheme

   A Character Encoding Scheme (CES) is a mapping from a Coded Character
   Set or several coded character sets to a set of octets.
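
   The distinction between a CCS and a CES, and the point made in
   section 2.3 that conversion from characters back to octets need not
   be unconditional or unambiguous, can be sketched in Python. This is
   an illustration under assumptions: Python's Unicode code points are
   treated as the ISO 10646 integers, Python codecs stand in for
   charsets, and the helper names code_point, to_octets, and
   representable are hypothetical.

```python
# Illustrative only: Python's code points stand in for the ISO 10646
# coded character set, and Python codecs stand in for charsets.

def code_point(char: str) -> int:
    """CCS step: map one abstract character to an integer."""
    return ord(char)


def to_octets(text: str, charset: str) -> bytes:
    """CES step (via a Python codec): map characters to octets."""
    return text.encode(charset)


def representable(text: str, charset: str) -> bool:
    """Report whether every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


e_acute = "\u00e9"                          # LATIN SMALL LETTER E WITH ACUTE
e_acute_cp = code_point(e_acute)            # the CCS integer, 0xE9
e_acute_utf8 = to_octets(e_acute, "utf-8")  # two octets under UTF-8

# Not every character is representable in a given charset.
ascii_ok = representable(e_acute, "us-ascii")

# A charset may offer several octet sequences for the same characters:
# in ISO-2022-JP a redundant "ESC ( B" (designate ASCII) changes the
# octets but not the decoded characters.
plain = b"A".decode("iso2022_jp")
with_escape = b"\x1b(BA".decode("iso2022_jp")

print(hex(e_acute_cp), e_acute_utf8, ascii_ok, plain == with_escape)
```

   The integer produced by the CCS step is independent of how it is
   later serialized; only the CES step determines the octets on the
   wire.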
   A given CES is typically associated with a single CCS; for example,
   UTF-8 applies only to ISO 10646.

3. Registration Requirements

   Registered charsets are expected to conform to a number of
   requirements as described below.

3.1. Required Characteristics

   Registered charsets MUST conform to the definition of a "charset"
   given above. In addition, charsets intended for use in MIME content
   types under the "text" top-level type must conform to the
   restrictions on that type described in RFC 2045. All registered