IANA Charset Registration Procedures
RFC 2278

Document Type RFC - Informational (January 1998; No errata)
Obsoleted by RFC 2978
Was draft-freed-charset-reg (individual)
Last updated 2013-03-02
Network Working Group                                           N. Freed
Request for Comments: 2278                                      Innosoft
BCP: 19                                                        J. Postel
Category: Best Current Practice                                      ISI
                                                            January 1998

                              IANA Charset
                        Registration Procedures

Status of this Memo

   This document specifies an Internet Best Current Practices for the
   Internet Community, and requests discussion and suggestions for
   improvements.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

1.  Abstract

   MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various other
   modern Internet protocols are capable of using many different
   charsets. This in turn means that the ability to label different
   charsets is essential. This registration procedure exists solely to
   associate a specific name or names with a given charset and to give
   an indication of whether or not a given charset can be used in MIME
   text objects. In particular, the general applicability and
   appropriateness of a given registered charset is a protocol issue,
   not a registration issue, and is not dealt with by this registration
   procedure.

2.  Definitions and Notation

   The following sections define various terms used in this document.

2.1.  Requirements Notation

   This document occasionally uses terms that appear in capital letters.
   When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
   appear capitalized, they are being used to indicate particular
   requirements of this specification. A discussion of the meanings of
   these terms appears in [RFC-2119].

Freed & Postel           Best Current Practice                  [Page 1]
RFC 2278                  Charset Registration              January 1998

2.2.  Character

   A member of a set of elements used for the organization, control, or
   representation of data.

2.3.  Charset

   The term "charset" (see historical note below) is used here to refer
   to a method of converting a sequence of octets into a sequence of
   characters. This conversion may also optionally produce additional
   control information such as directionality indicators.

   Note that unconditional and unambiguous conversion in the other
   direction is not required, in that not all characters may be
   representable by a given charset and a charset may provide more than
   one sequence of octets to represent a particular sequence of
   characters.
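
   As an illustration only, the following Python sketch shows both
   properties using codecs from the Python standard library.  The choice
   of US-ASCII and UTF-7 as examples, and the helper name
   "representable", are assumptions made here and do not come from this
   document.

```python
# Octets -> characters is always defined for valid input.
assert b"abc".decode("us-ascii") == "abc"


def representable(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        # At least one character has no octet sequence in this charset.
        return False


# A charset may offer several octet sequences for the same characters:
# UTF-7 can carry "A" directly or inside a base64 shift sequence.
direct = b"A"
shifted = b"+AEE-"
same_text = direct.decode("utf-7") == shifted.decode("utf-7")
```

   Here "representable" reports whether the reverse conversion exists at
   all, and "same_text" records that two different octet sequences
   decode to the same characters.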

   This definition is intended to allow a variety of character
   encodings, from simple single-table mappings such as US-ASCII to
   complex table-switching methods such as those that use ISO 2022's
   techniques, to be used as charsets.  However, the
   definition associated with a charset name must fully specify the
   mapping to be performed.  In particular, use of external profiling
   information to determine the exact mapping is not permitted.
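
   The contrast between a single-table mapping and a table-switching
   method can be sketched as follows.  This is a minimal illustration
   that assumes Python's standard "iso2022_jp" codec as a stand-in for
   an ISO 2022 based charset; the variable names are hypothetical.  All
   switching information travels in the octet stream itself, so the
   charset name alone determines the mapping.

```python
ESC = b"\x1b"

# Single-table mapping: in US-ASCII each octet maps to one character.
assert b"$3".decode("us-ascii") == "$3"

# Table switching: in ISO-2022-JP the escape sequence ESC $ B selects
# JIS X 0208, so the same two octets now form one two-octet character.
# ESC ( B switches back to ASCII.
switched = ESC + b"$B" + b"$3" + ESC + b"(B"
plain = b"$3"

text_switched = switched.decode("iso2022_jp")
text_plain = plain.decode("iso2022_jp")
```

   The octets 0x24 0x33 decode to two ASCII characters in the initial
   state and to a single hiragana character after the switch, without
   any information from outside the octet sequence.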

   HISTORICAL NOTE: The term "character set" was originally used in MIME
   to describe such straightforward schemes as US-ASCII and ISO-8859-1
   which consist of a small set of characters and a simple one-to-one
   mapping from single octets to single characters. Multi-octet
   character encoding schemes and switching techniques make the
   situation much more complex. As such, the definition of this term was
   revised to emphasize the conversion aspect of the process, and the
   term itself was changed to "charset" to emphasize that it is
   not, after all, just a set of characters. A discussion of these
   issues as well as specification of standard terminology for use in
   the IETF appears in RFC 2130.

2.4.  Coded Character Set

   A Coded Character Set (CCS) is a mapping from a set of abstract
   characters to a set of integers. Examples of coded character sets are
   ISO 10646 [ISO-10646], US-ASCII [US-ASCII], and the ISO-8859 series
   [ISO-8859].
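
   As a rough illustration, a CCS can be pictured as a table from
   characters to integers.  The sketch below assumes that Python's
   built-in ord(), which returns Unicode code points, is an acceptable
   stand-in for the ISO 10646 assignments; the names "ccs_10646" and
   "in_us_ascii" are hypothetical.

```python
# ord() returns the integer assigned to a character, which is the
# character-to-integer mapping that a CCS provides.
ccs_10646 = {ch: ord(ch) for ch in "A\u00e9\u20ac"}


def in_us_ascii(ch: str) -> bool:
    """Return True if ch is assigned an integer in the US-ASCII range."""
    # US-ASCII as a CCS covers only the integers 0 through 127.
    return ord(ch) < 128
```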

2.5.  Character Encoding Scheme

   A Character Encoding Scheme (CES) is a mapping from a Coded Character
   Set or several coded character sets to a set of octet sequences. A
   given CES is typically associated with a single CCS; for example,
   UTF-8 applies only to ISO 10646.
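
   The following sketch illustrates the CES step for a single ISO 10646
   code point.  It assumes Python's standard "utf-8" and "utf-16-be"
   codecs; UTF-16 is used here only as a second example of a CES over
   the same CCS and is not discussed in this document.  The function
   name "octets" is hypothetical.

```python
def octets(code_point: int, ces: str) -> bytes:
    """Map one ISO 10646 code point to octets under the named CES."""
    return chr(code_point).encode(ces)


# The same CCS integer yields different octets under different CESs.
utf8 = octets(0xE9, "utf-8")
utf16 = octets(0xE9, "utf-16-be")
```

   The integer 0xE9 is fixed by the CCS; only the choice of CES decides
   which octets appear on the wire.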

3.  Registration Requirements

   Registered charsets are expected to conform to a number of
   requirements as described below.

3.1.  Required Characteristics

   Registered charsets MUST conform to the definition of a "charset"
   given above.  In addition, charsets intended for use in MIME content
   types under the "text" top-level type must conform to the
   restrictions on that type described in RFC 2045. All registered