Tags for the Identification of Languages
RFC 1766

Document Type RFC - Proposed Standard (March 1995; No errata)
Obsoleted by RFC 3066, RFC 3282
Last updated 2013-03-02
Stream IETF
Network Working Group                                      H. Alvestrand
Request for Comments: 1766                                       UNINETT
Category: Standards Track                                     March 1995

                Tags for the Identification of Languages

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract


   This document describes a language tag for use in cases where it is
   desired to indicate the language used in an information object.

   It also defines a Content-Language: header, for use in the case where
   one desires to indicate the language of something that has RFC-822-
   like headers, like MIME body parts or Web documents, and a new
   parameter to the Multipart/Alternative type, to aid in the usage of
   the Content-Language: header.

1.  Introduction

   There are a number of languages spoken by human beings in this world.

   A great number of these people would prefer to have information
   presented in a language that they understand.

   In some contexts, it is possible to have information in more than one
   language, or it might be possible to provide tools for assisting in
   the understanding of a language (like dictionaries).

   A prerequisite for any such function is a means of labelling the
   information content with an identifier for the language in which it
   is written.

   In the tradition of solving only problems that we think we
   understand, this document specifies an identifier mechanism, and one
   possible use for it.

Alvestrand                                                      [Page 1]
RFC 1766                      Language Tag                    March 1995

2.  The Language tag

   The language tag is composed of 1 or more parts: A primary language
   tag and a (possibly empty) series of subtags.

   The syntax of this tag in RFC-822 EBNF is:

    Language-Tag = Primary-tag *( "-" Subtag )
    Primary-tag = 1*8ALPHA
    Subtag = 1*8ALPHA

   Whitespace is not allowed within the tag.

   All tags are to be treated as case insensitive; there exist
   conventions for capitalization of some of them, but these should not
   be taken to carry meaning.
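
   As an illustration only, the grammar above can be checked with a
   regular expression. The sketch below assumes that ALPHA means the
   ASCII letters A-Z and a-z; the names is_well_formed and same_tag are
   hypothetical helpers, not part of this specification. It tests
   syntax only and does not check whether a tag is actually registered.

```python
import re

# Language-Tag = Primary-tag *( "-" Subtag ), where each part is 1*8ALPHA.
# Assumption: ALPHA is restricted to ASCII letters.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if the whole string matches the Language-Tag syntax."""
    # fullmatch rejects embedded whitespace and trailing characters.
    return _TAG_RE.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Compare two tags without regard to case."""
    return a.lower() == b.lower()
```

   For example, is_well_formed("en-US") holds, while "en US" and "en-"
   are rejected; same_tag("EN-us", "en-US") holds because case carries
   no meaning.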

   The namespace of language tags is administered by the IANA according
   to the rules in section 5 of this document.

   The following registrations are predefined:

   In the primary language tag:

    -    All 2-letter tags are interpreted according to ISO standard
         639, "Code for the representation of names of languages" [ISO
         639].

    -    The value "i" is reserved for IANA-defined registrations

    -    The value "x" is reserved for private use. Subtags of "x"
         will not be registered by the IANA.

    -    Other values cannot be assigned except by updating this
         standard.

   The reason for reserving all other tags is to be open towards new
   revisions of ISO 639; the use of "i" and "x" is the minimum we can do
   here to be able to extend the mechanism to meet our requirements.
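
   The rules for the primary tag can be sketched as a small
   classifier. This is a minimal illustration: the function name
   classify_primary and the returned labels are hypothetical, and the
   sketch does not verify that a 2-letter value is actually listed in
   ISO 639.

```python
def classify_primary(tag: str) -> str:
    """Classify the primary tag of a language tag (illustrative labels)."""
    # Tags are case insensitive, so compare in lower case.
    primary = tag.split("-", 1)[0].lower()
    if len(primary) == 2:
        return "iso-639"      # interpreted according to ISO 639
    if primary == "i":
        return "iana"         # IANA-defined registrations
    if primary == "x":
        return "private"      # private use, never registered
    return "unassigned"       # reserved; not assignable at present
```

   Under these assumptions, "en-US" is classified as "iso-639",
   "i-cherokee" as "iana", and "x-foo" as "private".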

   In the first subtag:

    -    All 2-letter codes are interpreted as ISO 3166 alpha-2
         country codes denoting the area in which the language is
         used.

    -    Codes of 3 to 8 letters may be registered with the IANA by
         anyone who feels a need for it, according to the rules in
         chapter 5 of this document.
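
   The first-subtag rules can be sketched in the same way. The
   function name classify_first_subtag and its labels are hypothetical.
   The sketch treats subtags of "x" as private, since those are never
   registered, and reports 1-letter first subtags as "unspecified"
   because the rules above do not cover them. It does not check that a
   2-letter code is a real ISO 3166 code.

```python
from typing import Optional


def classify_first_subtag(tag: str) -> Optional[str]:
    """Classify the first subtag, or return None if there is none."""
    parts = tag.split("-")
    if len(parts) < 2:
        return None                 # primary tag only
    if parts[0].lower() == "x":
        return "private"            # subtags of "x" are never registered
    first = parts[1]
    if len(first) == 2:
        return "iso-3166-country"   # ISO 3166 alpha-2 country code
    if 3 <= len(first) <= 8:
        return "iana-registrable"   # may be registered with the IANA
    return "unspecified"            # length not covered by the rules
```

   With these assumptions, "en-US" yields "iso-3166-country" and
   "no-nynorsk" yields "iana-registrable".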

   The information in the subtag may for instance be:

    -    Country identification, such as en-US (this usage is
         described in ISO 639)

    -    Dialect or variant information, such as no-nynorsk or en-

    -    Languages not listed in ISO 639 that are not variants of
         any listed language, which can be registered with the i-
         prefix, such as i-cherokee

    -    Script variations, such as az-arabic and az-cyrillic

   In the second and subsequent subtag, any value can be registered.

   NOTE: The ISO 639/ISO 3166 convention is that language names are
   written in lower case, while country codes are written in upper case.
   This convention is recommended, but not enforced; the tags are case
   insensitive.
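
   The recommended capitalization can be applied mechanically. In the
   sketch below, the function name recommended_case is hypothetical.
   Lowercasing subtags that are not 2-letter first subtags is an
   assumption made here for tidiness; the convention itself speaks
   only of language names and country codes.

```python
def recommended_case(tag: str) -> str:
    """Rewrite a tag using the ISO 639 / ISO 3166 case convention."""
    parts = tag.split("-")
    out = [parts[0].lower()]              # language names in lower case
    for index, sub in enumerate(parts[1:], start=1):
        if index == 1 and len(sub) == 2:
            out.append(sub.upper())       # country codes in upper case
        else:
            out.append(sub.lower())       # assumption: others lower case
    return "-".join(out)
```

   For example, recommended_case("EN-us") gives "en-US". The result
   identifies the same tag as the input, since case carries no meaning.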

   NOTE: ISO 639 defines a registration authority for additions to and
   changes in the list of languages in ISO 639. This authority is: