Linguistic Guidelines for the Use of the Arabic Language in Internet Domains
draft-farah-adntf-ling-guidelines-04
This document is an Internet-Draft (I-D) that has been submitted to the Independent Submission stream.
This I-D is not endorsed by the IETF and has no formal standing in the
IETF standards process.
This is an older version of an Internet-Draft that was ultimately published as RFC 5564.
Authors: Abdulaziz H. Al-Zoman Ph.D., Ibaa Oueichek, Ayman El-Sherbiny, Mansour Farah
Last updated: 2015-10-14 (latest revision 2009-02-06)
RFC stream: Independent Submission
Intended RFC status: Informational
ISE state: (None)
Consensus boilerplate: Unknown
Document shepherd: (None)
IESG state: Became RFC 5564 (Informational)
Action holders: (None)
Telechat date: (None)
Responsible AD: Lisa M. Dusseault
Send notices to: rfc-editor@rfc-editor.org
Network Working Group A. El-Sherbiny
Internet-Draft M. Farah
Intended status: Informational UN-ESCWA
Expires: August 9, 2009 I. Oueichek
Syrian Telecom Establishment
A. Al-Zoman
SaudiNIC, CITC
February 5, 2009
Linguistic Guidelines for the Use of the Arabic Language in Internet
Domains
draft-farah-adntf-ling-guidelines-04.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
This Internet-Draft will expire on August 9, 2009.
El-Sherbiny, et al. Expires August 9, 2009 [Page 1]
Internet-Draft Arabic Character Guidelines February 2009
Abstract
This document constitutes a technical specification for the use of
Arabic in Internet domain names and provides linguistic guidelines
for Arabic Domain Names. It addresses linguistic issues specific to
the use of the Arabic language in domain names.
Table of Contents
1. Introduction
2. Arabic Language-Specific Issues
2.1. Linguistic Issues
2.1.1. Diacritics (tashkeel) and Shadda
2.1.2. Kasheeda or Tatweel (Horizontal Character Size Extension)
2.1.3. Character Folding
2.2. Supported Character Set
2.3. Arabic Linguistic Issues Affected By Technical Constraints
2.3.1. Numerals
2.3.2. The Space Character
3. Summary and Conclusion
4. Security Considerations
5. IANA Considerations
6. Acknowledgments
7. References
7.1. Normative References
7.2. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
In March 2003, the Internet Engineering Task Force (IETF) published a
set of RFCs for Internationalized Domain Names (IDN) [1], [2], [3],
which were intended to become the de facto standard for all
languages. In 2007 and 2008, new versions of Internet-Drafts
proposing revisions to the IDNA protocol were released, as follows:
o Internationalized Domain Names for Applications (IDNA):
Definitions, Background and Rationale [5]
o Internationalized Domain Names in Applications (IDNA): Protocol
[6]
o An updated IDNA criterion for right-to-left scripts [7]
o The Unicode Codepoints and IDNA [8]
These documents are known collectively as "IDNA2008".
This document constitutes a technical specification for implementing
the IDN standards for the Arabic language. It allows standard
language tables to be used for writing domain names in Arabic
characters and should therefore be considered a logical extension of
the IDN standards. It presents guidelines for the proper use of
Arabic characters with the IDN standards in an Arabic-language
context.
This document reflects the recommendations of the Arab Working Group
on Arabic Domain Names (AWG-ADN) established by the League of Arab
States (LAS), based on the standardisation efforts of the United
Nations Economic and Social Commission for Western Asia (UN-ESCWA)
and its Internet-Draft, "Guidelines for an Arabic Domain Name System
(ADNS)" [9]. It is also in full harmony with recent extensive
discussions held with the other major language communities that use
the Arabic script.
This document provides guidelines on how Arabic characters may be
used in registering Internet domain names and how language-specific
issues should be handled. A few rules are also recommended for
application at the protocol level.
The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY"
in this document are to be interpreted as described in RFC 2119 [4].
Comments on this document are solicited and should be addressed to
the working group's mailing list at ESCWA-ICTD@un.org and/or the
author(s).
2. Arabic Language-Specific Issues
The main objective of creating Arabic Domain Names is to provide a
vehicle for increasing Internet use among all strata of the
Arabic-speaking communities.
Conversely, domain names that are not user-friendly would make the
Internet seem even more obscure and foreign to the Arabic-speaking
communities, hindering the spread of the Internet and further
isolating these communities at the global level.
Hence, there have been intensive efforts, especially those
spearheaded by Dr. Al-Zoman with contributions from UN-ESCWA and its
Arabic Domain Names Task Force (ADN-TF), to reach consensus on a
number of linguistic issues, with the following goals:
o To define the accepted Arabic character set to be used for writing
domain names in Arabic, which is the subject of this document.
o To define the top-level domains of the Arabic domain name tree
structure (i.e., Arabic gTLDs and ccTLDs). This goal will be
handled in a separate document.
The first meeting of the AWG-ADN, held in Damascus in
January-February 2005, gave special attention to the following:
a. Simplification of domain names, whenever possible, to facilitate
the interaction of the Arabic user with the Internet.
b. Adoption of solutions that do not lead to confusion in either
reading or writing, provided that this does not compromise the
linguistic correctness of the words used.
c. Mixing Arabic and non-Arabic letters in a domain name label is
not acceptable.
2.1. Linguistic Issues
A number of linguistic issues have been raised concerning the use of
the Arabic language in domain names. This section highlights some of
them. It is based on the papers of Dr. Al-Zoman [10] [11] and the
report of the first meeting of the AWG-ADN [12]; for details, the
reader is encouraged to consult these references.
2.1.1. Diacritics (tashkeel) and Shadda
Tashkeel and Shadda are accent marks placed above or below Arabic
letters to indicate the proper pronunciation. They thus distinguish
words that share the same base letters but differ in meaning.
Neither Tashkeel nor Shadda is permitted in zone files when
registering domain names in the Arabic language, although both are
permitted in the current edition of IDNA2008. If necessary, they can
be supported or ignored in the user interface through local mappings
and stripped before IDNA processing.
Their Unicode code points are as follows:
U+064B ARABIC FATHATAN
U+064C ARABIC DAMMATAN
U+064D ARABIC KASRATAN
U+064E ARABIC FATHA
U+064F ARABIC DAMMA
U+0650 ARABIC KASRA
U+0651 ARABIC SHADDA
U+0652 ARABIC SUKUN
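As a rough illustration of the local mapping mentioned above, a user interface could delete the eight marks listed here before handing a label to IDNA processing. The sketch below assumes Python 3; the helper name `strip_tashkeel` is hypothetical and is not part of any IDNA library.

```python
# Hypothetical UI-side helper: remove Tashkeel and Shadda (U+064B..U+0652)
# before a label is passed to IDNA processing.
TASHKEEL_AND_SHADDA = range(0x064B, 0x0653)  # FATHATAN .. SUKUN, inclusive

# str.translate deletes every code point that is mapped to None.
_DELETE_MARKS = dict.fromkeys(TASHKEEL_AND_SHADDA)


def strip_tashkeel(label: str) -> str:
    """Return the label with all Tashkeel and Shadda marks removed."""
    return label.translate(_DELETE_MARKS)


if __name__ == "__main__":
    # "madrasa" (school) written with Fatha and Sukun marks
    vocalized = "\u0645\u064E\u062F\u0652\u0631\u064E\u0633\u064E\u0629"
    bare = strip_tashkeel(vocalized)
    # ['U+0645', 'U+062F', 'U+0631', 'U+0633', 'U+0629']
    print([f"U+{ord(ch):04X}" for ch in bare])
```

Stripping is only an input-time convenience; the label that reaches the zone file must already be free of these marks.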
2.1.2. Kasheeda or Tatweel (Horizontal Character Size Extension)
Kasheeda (U+0640 ARABIC TATWEEL) must not be used in Arabic domain
names and should be disallowed for Arabic-language domain names. The
Kasheeda is not a letter and has no effect on pronunciation. It is
used in Arabic writing to extend the horizontal length, or change the
shape, of the preceding letter for graphical purposes. Accordingly,
it has no value for writing domain names. The same applies to all
languages using the Arabic script. The authors recommend that it be
disallowed at the protocol level.
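A registry enforcing this rule could reject any label containing U+0640 before it reaches the zone file. The Python sketch below does so; the function name `check_no_tatweel` is hypothetical. It also shows that removing the Kasheeda leaves exactly the same letters, which is why it carries no value in a name.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL (Kasheeda)


def check_no_tatweel(label: str) -> None:
    """Raise ValueError if the label contains ARABIC TATWEEL."""
    pos = label.find(TATWEEL)
    if pos != -1:
        raise ValueError(
            f"{unicodedata.name(TATWEEL)} at index {pos} is not allowed"
        )


if __name__ == "__main__":
    plain = "\u0645\u062B\u0627\u0644"            # "mithal" (example)
    stretched = "\u0645\u0640\u062B\u0627\u0644"  # same word, Meem stretched
    check_no_tatweel(plain)  # accepted silently
    try:
        check_no_tatweel(stretched)
    except ValueError as err:
        print(err)  # ARABIC TATWEEL at index 1 is not allowed
    # Removing the Kasheeda leaves exactly the same letters.
    print(stretched.replace(TATWEEL, "") == plain)  # True
```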
2.1.3. Character Folding
Character folding is the process by which multiple letters (which may
be similar in shape) are folded into a single form. Examples for
Arabic characters include:
o Folding Teh Marbuta (U+0629) and Heh (U+0647) at the end of a
word;
o Folding different forms of Hamzah (U+0622, U+0623, U+0625,
U+0627);
o Folding Alef Maksura (U+0649) and Yeh (U+064A) at the end of a
word;
o Folding Waw with Hamzah Above (U+0624) and Waw (U+0648).
For the Arabic language, character folding is not acceptable because
it changes the meaning of words and violates spelling rules.
Replacing a character that is valid in domain names with another
valid character of similar shape yields a different word with a
different meaning. Folding would therefore let a single string
represent every word formed from the combinations of folded
characters, so that all the other words would be masked by a single
word [10].
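To make the masking argument concrete, the Python sketch below implements a hypothetical folding function from the pairs listed above (Teh Marbuta and Alef Maksura folded only in word-final position) and applies it to two illustrative word pairs chosen for this sketch. It shows the behaviour these guidelines reject; it is not a recommended mapping.

```python
# Hypothetical folding map built from the pairs listed in Section 2.1.3.
# These guidelines REJECT folding; this sketch only shows its effect.
FOLD_ANYWHERE = {
    "\u0622": "\u0627",  # ALEF WITH MADDA ABOVE -> ALEF
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
    "\u0624": "\u0648",  # WAW WITH HAMZA ABOVE -> WAW
}
FOLD_WORD_FINAL = {
    "\u0629": "\u0647",  # TEH MARBUTA -> HEH (end of word only)
    "\u0649": "\u064A",  # ALEF MAKSURA -> YEH (end of word only)
}


def fold(word: str) -> str:
    """Apply the hypothetical folding rules to a single word."""
    folded = "".join(FOLD_ANYWHERE.get(ch, ch) for ch in word)
    if folded and folded[-1] in FOLD_WORD_FINAL:
        folded = folded[:-1] + FOLD_WORD_FINAL[folded[-1]]
    return folded


if __name__ == "__main__":
    pairs = [
        ("\u0643\u0631\u0629", "\u0643\u0631\u0647"),  # kura "ball" / kurh "aversion"
        ("\u0639\u0644\u0649", "\u0639\u0644\u064A"),  # 'ala "on" / 'Ali (a name)
    ]
    for a, b in pairs:
        # Distinct words, yet folding maps both to one string.
        print(a != b, fold(a) == fold(b))  # True True
```

Under such a mapping only one member of each pair could ever be registered, which is precisely the masking described in [10].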
Misspellings and handwriting errors do lead to different characters
being confused, although this does not happen in published and
printed materials. One of the motivations of this effort is to
preserve the language, particularly with the spread of globalization.
In this context, character folding works against this motivation,
since it would have a negative effect on the principles and ethics of
the language. Technology should work to preserve the language, not to
destroy it. Thus, character folding should not be allowed. The case
of digits is treated separately in Section 2.3.1.
2.2. Supported Character Set
A domain name written in Arabic must be composed of a sequence of the
Unicode characters listed in Tables 1 and 2 below, with FULL STOP
(U+002E) separating the labels. The tables are based on Unicode
version 5.0 and are constructed using an inclusion-based approach:
characters that are not listed in the tables are prohibited.
+---------+-------------------------------------+
| Unicode | Character Name |
+---------+-------------------------------------+
| 0621 | ARABIC LETTER HAMZA |
| 0622 | ARABIC LETTER ALEF WITH MADDA ABOVE |
| 0623 | ARABIC LETTER ALEF WITH HAMZA ABOVE |
| 0624 | ARABIC LETTER WAW WITH HAMZA ABOVE |
| 0625 | ARABIC LETTER ALEF WITH HAMZA BELOW |
| 0626 | ARABIC LETTER YEH WITH HAMZA ABOVE |
| 0627 | ARABIC LETTER ALEF |
| 0628 | ARABIC LETTER BEH |
| 0629 | ARABIC LETTER TEH MARBUTA |
| 062A | ARABIC LETTER TEH |
| 062B | ARABIC LETTER THEH |
| 062C | ARABIC LETTER JEEM |
| 062D | ARABIC LETTER HAH |
| 062E | ARABIC LETTER KHAH |
| 062F | ARABIC LETTER DAL |
| 0630 | ARABIC LETTER THAL |
| 0631 | ARABIC LETTER REH |
| 0632 | ARABIC LETTER ZAIN |
| 0633 | ARABIC LETTER SEEN |
| 0634 | ARABIC LETTER SHEEN |
| 0635 | ARABIC LETTER SAD |
| 0636 | ARABIC LETTER DAD |
| 0637 | ARABIC LETTER TAH |
| 0638 | ARABIC LETTER ZAH |
| 0639 | ARABIC LETTER AIN |
| 063A | ARABIC LETTER GHAIN |
| 0641 | ARABIC LETTER FEH |
| 0642 | ARABIC LETTER QAF |
| 0643 | ARABIC LETTER KAF |
| 0644 | ARABIC LETTER LAM |
| 0645 | ARABIC LETTER MEEM |
| 0646 | ARABIC LETTER NOON |
| 0647 | ARABIC LETTER HEH |
| 0648 | ARABIC LETTER WAW |
| 0649 | ARABIC LETTER ALEF MAKSURA |
| 064A | ARABIC LETTER YEH |
| 0660 | ARABIC-INDIC DIGIT ZERO |
| 0661 | ARABIC-INDIC DIGIT ONE |
| 0662 | ARABIC-INDIC DIGIT TWO |
| 0663 | ARABIC-INDIC DIGIT THREE |
| 0664 | ARABIC-INDIC DIGIT FOUR |
| 0665 | ARABIC-INDIC DIGIT FIVE |
| 0666 | ARABIC-INDIC DIGIT SIX |
| 0667 | ARABIC-INDIC DIGIT SEVEN |
| 0668 | ARABIC-INDIC DIGIT EIGHT |
| 0669 | ARABIC-INDIC DIGIT NINE |
+---------+-------------------------------------+
Source: Supporting the Arabic Language in Domain Names [10]
Table 1: CHARACTERS FROM UNICODE ARABIC TABLE (0600-06FF)
+---------+-----------------+
| Unicode | Character Name |
+---------+-----------------+
| 0030 | DIGIT ZERO |
| 0031 | DIGIT ONE |
| 0032 | DIGIT TWO |
| 0033 | DIGIT THREE |
| 0034 | DIGIT FOUR |
| 0035 | DIGIT FIVE |
| 0036 | DIGIT SIX |
| 0037 | DIGIT SEVEN |
| 0038 | DIGIT EIGHT |
| 0039 | DIGIT NINE |
| 002D | HYPHEN-MINUS |
+---------+-----------------+
Source: Supporting the Arabic Language in Domain Names [10]
Table 2: CHARACTERS FROM UNICODE BASIC LATIN TABLE (0000-007F)
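Because the tables are inclusion-based, checking a name against them reduces to a set-membership test on every code point. The Python sketch below encodes Tables 1 and 2 as code-point ranges and splits the name on FULL STOP. The function names are hypothetical, and the sketch checks only the repertoire: it does not apply the numeral homogeneity rule of Section 2.3.1 or any IDNA2008 rules such as hyphen placement. Since no non-Arabic letters appear in the tables, the check also rejects labels that mix Arabic and non-Arabic letters.

```python
# Tables 1 and 2 expressed as code-point sets (range ends are exclusive).
ALLOWED = (
    set(range(0x0621, 0x063B))    # Table 1: HAMZA .. GHAIN
    | set(range(0x0641, 0x064B))  # Table 1: FEH .. YEH
    | set(range(0x0660, 0x066A))  # Table 1: ARABIC-INDIC DIGIT ZERO .. NINE
    | set(range(0x0030, 0x003A))  # Table 2: DIGIT ZERO .. NINE
    | {0x002D}                    # Table 2: HYPHEN-MINUS
)


def label_in_repertoire(label: str) -> bool:
    """True if the label is non-empty and uses only Table 1/2 characters."""
    return bool(label) and all(ord(ch) in ALLOWED for ch in label)


def name_in_repertoire(name: str) -> bool:
    """Split on FULL STOP (U+002E) and check every label."""
    return all(label_in_repertoire(label) for label in name.split("."))


if __name__ == "__main__":
    mithal = "\u0645\u062B\u0627\u0644"   # "mithal" (example)
    shabaka = "\u0634\u0628\u0643\u0629"  # "shabaka" (network)
    print(name_in_repertoire(f"{mithal}.{shabaka}"))        # True
    print(name_in_repertoire(f"{mithal}net.{shabaka}"))     # False: Latin letters
    print(name_in_repertoire(f"{mithal}\u064E.{shabaka}"))  # False: Fatha
```

Keeping the tables as data rather than as scattered conditionals makes it easy to compare the code against Tables 1 and 2 line by line.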
2.3. Arabic Linguistic Issues Affected By Technical Constraints
In this section, technical aspects of some linguistic issues are
discussed.
2.3.1. Numerals
Two sets of digits are used in the Arab countries:
o Set I: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), i.e., U+0030 to U+0039,
mostly used in the western part of the Arab world.
o Set II: (U+0660, U+0661, U+0662, U+0663, U+0664, U+0665, U+0666,
U+0667, U+0668, U+0669), mostly used in the eastern part of the
Arab world.
Both sets may be supported in the user interface; however, the rule
of numeral homogeneity must be observed. The rule specifies that
digits from the Arabic-Indic set (U+0660 to U+0669) should not be
allowed to mix with ASCII digits (U+0030 to U+0039) within the same
Arabic domain name label. Thus, the appearance of a digit from one
set in a label precludes the use of any digit from the other set in
that label.
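The homogeneity rule can be checked independently of the repertoire: a label passes as long as it does not contain digits from both sets. A minimal Python sketch, with a hypothetical function name:

```python
ASCII_DIGITS = frozenset(chr(cp) for cp in range(0x0030, 0x003A))         # Set I
ARABIC_INDIC_DIGITS = frozenset(chr(cp) for cp in range(0x0660, 0x066A))  # Set II


def digits_homogeneous(label: str) -> bool:
    """True unless the label mixes ASCII and Arabic-Indic digits."""
    has_ascii = any(ch in ASCII_DIGITS for ch in label)
    has_indic = any(ch in ARABIC_INDIC_DIGITS for ch in label)
    return not (has_ascii and has_indic)


if __name__ == "__main__":
    markaz = "\u0645\u0631\u0643\u0632"  # "markaz" (center)
    print(digits_homogeneous(markaz + "2009"))                     # True
    print(digits_homogeneous(markaz + "\u0662\u0660\u0660\u0669"))  # True
    print(digits_homogeneous(markaz + "20\u0660\u0669"))           # False
```

The explicit sets are used because `str.isdigit()` returns True for both ASCII and Arabic-Indic digits and so cannot tell the two sets apart.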
2.3.2. The Space Character
The space character is strictly disallowed in domain names, as it is
a control character. Instead, the hyphen (Al-sharta) (i.e.u+02D) is
proposed as a separator between Arabic words to avoid confusion that
can take place if the words are typed without a separator.
It is acceptable to use the hyphen to separate words within the same
domain name label.
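A user interface that accepts a space-separated phrase could convert it into a single hyphenated label before validation. The Python sketch below does this with a hypothetical helper, `phrase_to_label`; whether to apply such a mapping automatically is a local interface decision, not something this document requires.

```python
def phrase_to_label(phrase: str) -> str:
    """Join whitespace-separated words with HYPHEN-MINUS (U+002D)."""
    # str.split() with no argument splits on runs of whitespace and
    # drops leading/trailing whitespace, so no empty words are produced.
    return "-".join(phrase.split())


if __name__ == "__main__":
    phrase = "\u0645\u062B\u0627\u0644 \u0634\u0628\u0643\u0629"  # two words
    label = phrase_to_label(phrase)
    print(" " in label, "-" in label)  # False True
```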
3. Summary and Conclusion
The proposed guidelines are in full accordance with the IETF IDN
standards and take Arabic language-specific issues into account,
striking a compromise between the grammatical rules of the Arabic
language and the ease of use of the language on the Internet.
In summary, the guidelines specify that in Arabic domain names:
o Accent marks (Tashkeel and Shadda) are not permitted.
o Kasheeda (Tatweel) is not permitted.
o Character folding is not permitted.
o Numeral homogeneity is required: Arabic-Indic and ASCII digits
must not be mixed within a label.
o The hyphen must be used as a word separator instead of the space.
4. Security Considerations
No particular security considerations could be identified regarding
the use of Arabic characters in writing domain names. In particular,
any potential visual confusion between different character strings is
avoided using the guidelines proposed in this document.
5. IANA Considerations
This document has no action for IANA.
6. Acknowledgments
The ESCWA ICT Division provided support and funding for the
development of this document, with the objective of reaching a
comprehensive standard for Arabic Domain Names. Thanks are due to
SaudiNIC for its continuous efforts in supporting the development of
Arabic Domain Names.
John Klensin and Harald Alvestrand reviewed the document and provided
useful editorial and substantive support to enrich it.
7. References
7.1. Normative References
[1] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003.
[2] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile
for Internationalized Domain Names (IDN)", RFC 3491,
March 2003.
[3] Costello, A., "Punycode: A Bootstring encoding of Unicode for
Internationalized Domain Names in Applications (IDNA)",
RFC 3492, March 2003.
[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
7.2. Informative References
[5] Klensin, J., "Internationalized Domain Names for Applications
(IDNA): Definitions, Background and Rationale",
draft-ietf-idnabis-rationale-06 (work in progress),
September 2008.
[6] Klensin, J., "Internationalized Domain Names in Applications
(IDNA): Protocol", draft-ietf-idnabis-protocol-08 (work in
progress), September 2008.
[7] Alvestrand, H. and C. Karp, "An updated IDNA criterion for
right-to-left scripts", draft-ietf-idnabis-bidi-03 (work in
progress), July 2008.
[8] Faltstrom, P., "The Unicode Codepoints and IDNA",
draft-ietf-idnabis-tables-05 (work in progress), July 2008.
[9] United Nations Economic and Social Commission for Western Asia
(UN-ESCWA), "Guidelines for an Arabic Domain Name System
(ADNS)", draft-farah-adntf-adns-guidelines-03 (work in progress),
November 2007.
[10] Al-Zoman, A., "Supporting the Arabic Language in Domain Names",
October 2003, <http://www.arabic-domains.org/docs/NIC-docs/
SupportingArabicDomainNmaes.pdf>.
[11] Al-Zoman, A., "Arabic Top-Level Domains", paper presented at
the EGM on Promotion of Digital Arabic Content, United Nations
ESCWA, Beirut, July 2003.
[12] League of Arab States, "Report of the first meeting of AWG-ADN,
Damascus", February 2005,
<http://www.arabic-domains.org/ar/intrnational-entites.php>.
This document is in Arabic.
Authors' Addresses
Ayman El-Sherbiny
Information and Communication Technology Division ESCWA
UN-House
P.O. Box 11-8575
Beirut
Lebanon
Email: El-sherbiny@un.org
Mansour Farah
Information and Communication Technology Division ESCWA
UN-House
P.O. Box 11-8575
Beirut
Lebanon
Email: farah14@un.org
Ibaa Oueichek
Syrian Telecom Establishment
Damascus
Syria
Email: oueichek@scs-net.org
Abdulaziz H. Al-Zoman, PhD
SaudiNIC, General Directorate of Internet Services
IT Sector, CITC
King Abdulaziz City for Science and Technology
PO Box 6086
Riyadh 11442
Saudi Arabia
Email: azoman@citc.gov.sa
Copyright and License Notice
All IETF Documents and the information contained therein are provided
on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF Trust takes no position regarding the validity or scope of
any Intellectual Property Rights or other rights that might be
claimed to pertain to the implementation or use of the technology
described in any IETF Document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights.
Copies of Intellectual Property disclosures made to the IETF
Secretariat and any assurances of licenses to be made available, or
the result of an attempt made to obtain a general license or
permission for the use of such proprietary rights by implementers or
users of this specification can be obtained from the IETF on-line IPR
repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
any standard or specification contained in an IETF Document. Please
address the information to the IETF at ietf-ipr@ietf.org.