Tags for the Identification of Languages
RFC 3066

 
Document Type RFC - Best Current Practice (January 2001; Errata)
Obsoleted by RFC 4646, RFC 4647
Obsoletes RFC 1766
Last updated 2013-03-02
Stream Legacy

Network Working Group                                      H. Alvestrand
Request for Comments: 3066                                 Cisco Systems
BCP: 47                                                     January 2001
Obsoletes: 1766
Category: Best Current Practice

                Tags for the Identification of Languages

Status of this Memo

   This document specifies an Internet Best Current Practices for the
   Internet Community, and requests discussion and suggestions for
   improvements.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

   This document describes a language tag for use in cases where it is
   desired to indicate the language used in an information object, how
   to register values for use in this language tag, and a construct for
   matching such language tags.

1. Introduction

   Human beings on our planet have, past and present, used a number of
   languages.  There are many reasons why one would want to identify the
   language used when presenting information.

   In some contexts, it is possible to have information available in
   more than one language, or it might be possible to provide tools
   (such as dictionaries) to assist in the understanding of a language.

   Also, many types of information processing require knowledge of the
   language in which information is expressed in order for that process
   to be performed on the information; for example spell-checking,
   computer-synthesized speech, Braille, or high-quality print
   renderings.

   One means of indicating the language used is by labeling the
   information content with an identifier for the language that is used
   in this information content.

Alvestrand               Best Current Practice                  [Page 1]
RFC 3066          Tags for Identification of Languages      January 2001

   This document specifies an identifier mechanism, a registration
   function for values to be used with that identifier mechanism, and a
   construct for matching against those values.

   The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC 2119].

2. The Language tag

2.1 Language tag syntax

   The language tag is composed of one or more parts: A primary language
   subtag and a (possibly empty) series of subsequent subtags.

   The syntax of this tag in ABNF [RFC 2234] is:

    Language-Tag = Primary-subtag *( "-" Subtag )

    Primary-subtag = 1*8ALPHA

    Subtag = 1*8(ALPHA / DIGIT)

   The productions ALPHA and DIGIT are imported from RFC 2234; they
   denote respectively the characters A to Z in upper or lower case and
   the digits from 0 to 9.  The character "-" is HYPHEN-MINUS (ABNF:
   %x2D).
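
   As an illustration only, the grammar above can be checked with a
   regular expression.  The following Python sketch is not part of this
   specification; the names LANGUAGE_TAG, is_well_formed, and split_tag
   are hypothetical and chosen for this example.  It tests syntax only
   and says nothing about whether a tag is assigned or registered.

```python
import re

# Language-Tag   = Primary-subtag *( "-" Subtag )
# Primary-subtag = 1*8ALPHA
# Subtag         = 1*8(ALPHA / DIGIT)
# ALPHA and DIGIT are ASCII only, so explicit character classes are used.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` matches the Language-Tag production."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


def split_tag(tag: str) -> tuple[str, list[str]]:
    """Split a well-formed tag into (primary subtag, subsequent subtags)."""
    if not is_well_formed(tag):
        raise ValueError(f"not a well-formed language tag: {tag!r}")
    primary, *rest = tag.split("-")
    return primary, rest
```

   Under these assumptions, "en-US" and "x-klingon" are accepted, while
   "en-" (empty subtag), "1en" (digit in the primary subtag), and
   "abcdefghi" (nine letters) are rejected.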

   All tags are to be treated as case insensitive; there exist
   conventions for capitalization of some of them, but these should not
   be taken to carry meaning.  For instance, [ISO 3166] recommends that
   country codes are capitalized (MN Mongolia), while [ISO 639]
   recommends that language codes are written in lower case (mn
   Mongolian).
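
   A minimal Python sketch of case-insensitive comparison follows.  It
   is illustrative only; the function name same_tag is hypothetical.
   Because well-formed tags contain only ASCII letters, digits, and
   HYPHEN-MINUS, lowercasing both sides is assumed to be sufficient.

```python
def same_tag(a: str, b: str) -> bool:
    """Compare two language tags without regard to case.

    Assumes both arguments are well-formed (ASCII-only) language tags,
    so str.lower() is an adequate normalization.
    """
    return a.lower() == b.lower()
```

   With this sketch, "MN" and "mn" compare equal as tags; which of them
   denotes a country and which a language is determined by the position
   of the subtag, not by its capitalization.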

2.2 Language tag sources

   The namespace of language tags is administered by the Internet
   Assigned Numbers Authority (IANA) [RFC 2860] according to the rules
   in section 3 of this document.

   The following rules apply to the primary subtag:

   - All 2-letter subtags are interpreted according to assignments found
     in ISO standard 639, "Code for the representation of names of
     languages" [ISO 639], or assignments subsequently made by the ISO
     639 part 1 maintenance agency or governing standardization bodies.
     (Note: A revision is underway, and is expected to be released as
     ISO 639-1:2000)

   - All 3-letter subtags are interpreted according to assignments found
     in ISO 639 part 2, "Codes for the representation of names of
     languages -- Part 2: Alpha-3 code [ISO 639-2]", or assignments
     subsequently made by the ISO 639 part 2 maintenance agency or
     governing standardization bodies.

   - The value "i" is reserved for IANA-defined registrations.

   - The value "x" is reserved for private use.  Subtags of "x" shall
     not be registered by the IANA.

   - Other values shall not be assigned except by revision of this
     standard.
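
   The rules above for the primary subtag can be sketched as a simple
   classifier.  This Python example is illustrative only: the function
   name and the returned labels are hypothetical, and the code does not
   consult the ISO 639 code lists, so it reports which rule governs a
   subtag rather than whether that subtag is actually assigned.

```python
def classify_primary_subtag(primary: str) -> str:
    """Return a label naming the rule that governs a primary subtag."""
    p = primary.lower()  # tags are case insensitive
    # Primary-subtag = 1*8ALPHA (ASCII letters only)
    if not (p.isascii() and p.isalpha() and 1 <= len(p) <= 8):
        raise ValueError(f"not a valid primary subtag: {primary!r}")
    if p == "i":
        return "iana-registered"  # reserved for IANA-defined registrations
    if p == "x":
        return "private-use"      # never registered by the IANA
    if len(p) == 2:
        return "iso-639-1"        # interpreted per ISO 639 2-letter codes
    if len(p) == 3:
        return "iso-639-2"        # interpreted per ISO 639-2 3-letter codes
    return "unassigned"           # only assignable by revising the standard
```

   For example, this sketch labels "mn" as "iso-639-1", "sgn" as
   "iso-639-2", "x" as "private-use", and a four-letter value such as
   "abcd" as "unassigned".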