draft-ietf-idn-udns-00

Internet Draft                                             Dan Oscarsson
draft-ietf-idn-udns-00.txt                                 Telia ProSoft
Updates: RFC 2181, 1035, 1034, 2535                        9 July 2000
Expires: 9 January 2001

   Using the Universal Character Set in the Domain Name System (UDNS)

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


Abstract

   Since the Domain Name System (DNS) [RFC1035] was created there has
   been a desire to use characters other than ASCII in domain names.
   Lately this desire has grown very strong and several groups have
   started to experiment with non-ASCII names.

   By using the Universal Character Set (UCS) [ISO10646] this document
   updates the Domain Name System so that non-ASCII domain names can be
   used while still being compatible with the current (RFC 1035) DNS.



1. Introduction

   While the need for non-ASCII domain names has existed since the
   creation of the DNS, the need has increased considerably during the
   last few years. Currently there are at least two implementations
   using UTF-8 in use, and others using other methods.




Dan Oscarsson           Expires: 9 January 2001                 [Page 1]


Internet Draft               Universal DNS                   9 July 2000


   To avoid several different implementations of non-ASCII names in DNS
   that do not work together, and to avoid breaking the current ASCII
   only DNS, there is an immediate need to standardise how DNS shall
   handle non-ASCII names.

   The basic handling of character data in DNS has several properties
   that need to be preserved:
    - The DNS itself places only one restriction on the particular
      labels that can be used to identify resource records. That one
      restriction relates to the length of the label and the full name.
      The length of any one label is limited to between 1 and 63 octets.
      A full domain name is limited to 255 octets (including the
      separators).  [RFC2181]
    - Any binary string whatever can be used as the label of any
      resource record. Similarly, any binary string can serve as the
      value of any record that includes a domain name as some or all of
      its value (SOA, NS, MX, PTR, CNAME, and any others that may be
      added).  Implementations of the DNS protocols must not place any
      restrictions on the labels that can be used. In particular, DNS
      servers must not refuse to serve a zone because it contains labels
      that might not be acceptable to some DNS client programs.
      [RFC2181]
    - Names must be compared with case-insensitivity.  [RFC1035]
    - The original case should be preserved when possible as data is
      entered into the system. This also implies that responses should
      preserve case when possible. [RFC1035] Some of the reasons for
      this are:
        + Domain names are used for many purposes.
        + One is domain names where company names or trademarks could be
          used.  Very commonly companies and trademarks use a
          combination of upper and lower case to enhance the image of
          the name.  Many of them would prefer that when you, for
          example, look up the domain name for an IP address, the
          correct case is returned.
        + Another is the e-mail address defined in the SOA record.
          While many systems now do a case-insensitive comparison on
          the user name part of the e-mail address, there may still be
          those that don't.  Here too, e-mail addresses can be made
          more readable by mixing upper and lower case.
        + If you look up a host name from an IP address you may want to
          use the host name to compare with other data. Many services
          under Unix do this, and many of them are not
          case-insensitive. So they need the correct case returned.
        + There may be other uses of domain names that require them to
          be unchanged.
    - The characters in the ASCII character set should still be encoded
      as ASCII.




   This document specifies the updates needed to the DNS protocol, user
   interface issues and the effect on other protocols. It is intended to
   fulfil the requirements for internationalised domain names, which are
   currently being worked on by the IDN working group [IDNREQ].

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

    IDN: Internationalised Domain Name, here used to mean a domain name
      containing non-ASCII characters.

    ACE: ASCII Compatible Encoding. Used to encode IDNs in a way
      compatible with the ASCII host name syntax.

1.2 Previous versions of this document

   The first version of this document was available as draft-oscarsson-
   i18ndns-00.txt.


2. The DNS Protocol

   The DNS protocol is used when communicating between DNS servers and
   other DNS servers or DNS clients. User interface issues like the
   format of zone files or how to enter or display domain names are not
   part of the protocol.

   The update of the protocol defined here can be used immediately as it
   is fully compatible with the DNS of today.

   As at this stage not all requirements are completed and many ideas
   for handling IDNs are being discussed, this version of the draft
   includes several alternatives to some of the aspects of the DNS
   internationalisation, with comments to help in the discussions.
   Using an alternative may result in not being able to use IDNs
   immediately as it may require all DNS servers taking part in a query
   to be updated first.

   The goal of this update in the DNS protocol is to add IDN handling
   with as little disruption as possible to software unaware of domain
   names with non-ASCII characters in them. To be able to do this it is
   necessary for a DNS server to be able to identify whether it is
   talking to software that knows about IDNs or not. If the software
   does not understand IDNs, the DNS server must use an ASCII Compatible
   Encoding (ACE) of IDNs in the communication to avoid disrupting the
   software.



   ACE will be further discussed later.

2.1 Internationalisation aware software (IDN aware)

   Internationalisation aware DNS software (IDN aware) is software that
   handles the rules for handling international text as defined here.
   Only IDN aware software will get all requirements fulfilled.

   For a DNS server to know if it is talking to IDN aware software, IDN
   awareness must be signalled in some way that is compatible with the
   current non-IDN aware software (following [RFC1035]).

2.1.1 The IN bit

   Referring to section 4.1.1 in [RFC1035] and section 6.1 in [RFC2535],
   the DNS query/response format header is updated by allocating the
   last unallocated bit in the header. This bit is defined to be zero
   in old servers and resolvers. For a description of all fields, see
   the sections in the above RFCs.

                                           1  1  1  1  1  1
             0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                      ID                       |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |QR|   Opcode  |AA|TC|RD|RA|IN|AD|CD|   RCODE   |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                    QDCOUNT                    |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                    ANCOUNT                    |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                    NSCOUNT                    |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
            |                    ARCOUNT                    |
            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

   IDN aware software identifies itself in a query or a response by
   setting the IN bit in the DNS query/response format header.  As this
   bit is defined to be zero in old servers and resolvers they identify
   themselves as non-IDN aware.

   IDN aware software MUST set the IN bit in both queries and responses.
   Currently there is no need for a client to know that the DNS server
   knows about IDNs, but always setting the IN bit in responses will
   allow clients to use this information if the need arises in the
   future.
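
   As an illustration, the following Python sketch shows how the IN bit
   could be set and tested in the second 16-bit word of the header. The
   mask value 0x0040 follows from the bit position in the diagram above
   (bit 9, counting from the most significant bit). The function names
   are hypothetical and not taken from any existing DNS library.

```python
import struct

# Masks for the second 16-bit word of the DNS header. Bit 0 in the
# diagram is the most significant bit of the word.
QR_MASK = 0x8000
RD_MASK = 0x0100
IN_MASK = 0x0040  # bit 9 in the diagram, as proposed by this draft


def build_header(msg_id, flags, qdcount=1, ancount=0, nscount=0, arcount=0):
    """Pack the 12-octet DNS header, always setting the IN bit."""
    return struct.pack("!6H", msg_id, flags | IN_MASK,
                       qdcount, ancount, nscount, arcount)


def sender_is_idn_aware(message):
    """Return True if the IN bit is set in a received message header."""
    (flags,) = struct.unpack_from("!H", message, 2)
    return bool(flags & IN_MASK)


header = build_header(0x1234, RD_MASK)
print(header.hex())  # 123401400001000000000000
```

   A header produced by old software, where the bit is zero, makes
   sender_is_idn_aware() return False.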

2.1.2 Alternatives




   While the IN bit used above is the cleanest way, it is also the last
   free bit available in the DNS header since DNSSEC reserved the other
   two free ones. To avoid using the last free bit, here are some
   alternative ways.

   Note: These alternatives are here to help show why the above IN bit
   is the best choice and to help discussions about how to do it.

2.1.2.1 Using the RA, AA or TC bit

   Use one of the bits that only has a meaning in a response as defined
   by RFC 1035: the RA, AA or TC bit. One of these bits could be defined
   to be an IN bit in queries and keep the old meaning in responses.
   Non-IDN DNS servers following RFC 1035 can be expected to ignore
   these bits in queries.
   Some of the drawbacks are:
    - Some old software may not zero the value of the bits in queries
      and will therefore look as if they are IDN aware.
    - The bit cannot be used in a response to inform the client that the
      server is IDN aware.
    - Not recommended by [IANADNS].

2.1.2.2 Using the RCODE bits

   Use the bits of the RCODE value. It is defined by RFC 1035 to be used
   in responses. One of these bits could be defined to be an IN bit in
   queries, or a value for all bits could be used, while keeping the old
   meaning in responses. Non-IDN DNS servers following RFC 1035 can be
   expected to ignore these bits in queries.
   Some of the drawbacks are:
    - Their values are not defined in queries by RFC 1035. Most software
      could be expected to send them as zero as most zero the entire
      header before setting the values needed for the query. Instead of
      using just a bit one could use all bits. Using a value of all ones
      (all bits set to 1) would probably be a good flag even if old
      software fills it with random bits.
    - The bits cannot be used in a response to inform the client that
      the server is IDN aware.

2.1.2.3 Using the upper 8 bits of QTYPE

   The QTYPE is 16 bits but codes above 255 are currently not used.  It
   would therefore be possible to use one of the upper 8 bits as an IN
   bit.  Some of the drawbacks are:
    - Using an upper bit results in a new value for the QTYPE.
      Software following RFC 1035 will see this as an unknown QTYPE and
      return an error response, forcing a new query using the QTYPE
      without the bit set. This will result in an overhead because of
      the extra query/response that has to be done for all non-IDN aware
      servers and will make IDN queries slower and increase the DNS
      traffic. It will go away when all DNS servers are IDN aware, but
      that is many years in the future.

2.1.2.4 Using EDNS

   Using EDNS [RFC2671] additional information could be sent in the
   additional section of a query or response. The IN bit could then be
   sent in an OPT RR. An example of doing it this way can be found in
   [IDNE].  Some of the drawbacks are:
    - Most software today does not implement EDNS and returns an error
      response, forcing a new query to be sent without EDNS.  This will
      result in an overhead because of the extra query/response that has
      to be done for all non-IDN aware servers and will make IDN queries
      slower and increase the DNS traffic. It will go away when all DNS
      servers are EDNS aware, but that is many years in the future.
    - Even a query for an ASCII only domain name will have the query
      overhead described above. The OPT RR will have to be sent here
      also, otherwise the server cannot know if the client can handle
      IDNs in the response.


2.2 Character data

   Character data needs to be able to represent as many as possible of
   the characters in the world as well as being compatible with ASCII.
   It must also be well defined so that it can easily be handled and
   should be compact as only 63 octets are available per label without
   an extension of the protocol.

2.2.1 Native format of character data in the DNS protocol

   Character data used in the DNS protocol MUST, unless it is sent as an
   ACE to non-IDN aware software:
    - Use ISO 10646 (UCS) [ISO10646] as coded character set.
    - Be normalised using form C as defined in Unicode technical report
      #15 [UTR15]. See also [CHNORM].
    - Be encoded using the UTF-8 [RFC2279] character encoding scheme.

   In responses to non-IDN aware software, IDNs MUST be encoded using
   ACE.

   As all data is required to be normalised before being sent using the
   DNS protocol, a DNS server does not need to normalise any data coming
   in through a request. But it MUST normalise data loaded from a zone
   file.
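
   The conversion to the native format can be sketched in Python as
   below. This is only an illustration: the function name is
   hypothetical, and Python's unicodedata module follows a later version
   of Unicode than the one referenced by this document, so results for
   characters added or changed since then may differ.

```python
import unicodedata


def to_native(label):
    """Normalise a label to form C and encode it as UTF-8."""
    return unicodedata.normalize("NFC", label).encode("utf-8")


# "e" followed by COMBINING ACUTE ACCENT composes to U+00E9 under form C.
decomposed = "cafe\u0301"
wire = to_native(decomposed)
print(wire)  # b'caf\xc3\xa9'
```

   Pure ASCII input is unchanged by both steps, which keeps ASCII names
   encoded as ASCII.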




2.2.1.1 Alternative native formats

   Here are some alternative possibilities to the native format defined
   above and comments of why they were not selected, to help in
   discussions. See [IDNCOMP] for more examples.

2.2.1.1.1 Using other normalisation formats

   The Unicode normalisation form C does not remove any information in
   the character data while making it as compact as possible, and has a
   well defined order of combining characters.

   One could use form KC or some other normalisation format that removes
   all forms that are not significant when testing for domain name
   equivalence, but that would remove a lot of information that is
   important in printing or displaying them. It would be like replacing
   all upper case letters in ASCII names with the lower case version.

   As normalisation form C is both compact and does not destroy any
   information it is also well suited to be the standard format to be
   used in all protocols using UCS. It is the one recommended by W3C.

   Software can be reused if all protocols use the same normalisation
   form.

2.2.1.1.2 Using other encoding schemes than UTF-8

   While using another coded character set than UCS is possible,
   allowing only one simplifies software very much and reduces the
   possibility of incompatibilities. But UTF-8 is not that compact
   for several languages. For some a single row of UCS is enough and for
   some 16-bit values are best.  The native format could easily include
   a tag byte first defining the subset/encoding of UCS, allowing ISO
   8859-1, UTF-8, UCS-2 or UTF-16 to be used. This would allow a more
   compact format and therefore more characters within the 63 byte limit
   of a label, for some names.  It would not make software much more
   complex as the coded character set is still the same. But unless the
   length limits are unacceptable or cannot be overridden in an easy
   way, having just one scheme is easiest.

2.2.1.2 Handling length limits

   The current DNS protocol limits a label to 63 octets. As IDNs take
   more than one octet for some characters, an IDN cannot have 63
   characters in a label like an ASCII name can. For example a name
   using Hangul would have a maximum of 21 characters per label when
   encoded using UTF-8. There is currently no requirement on the minimum
   length that must be supported in IDNs, but even if the need does not
   appear immediately it will come.
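
   The Hangul figure can be checked with a few lines of Python. The
   syllable used is an arbitrary example; each Hangul syllable takes
   three octets in UTF-8.

```python
MAX_LABEL_OCTETS = 63

syllable = "\ud55c"  # HANGUL SYLLABLE HAN, U+D55C
octets_per_char = len(syllable.encode("utf-8"))
max_chars = MAX_LABEL_OCTETS // octets_per_char
print(octets_per_char, max_chars)  # 3 21
```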

   The limits imposed by RFC 1035 are 63 octets per label and 255 octets
   for the full name. The 255 limit is not a protocol limit but one to
   simplify implementations.

   When longer limits are allowed for IDNs, we still need to set a limit
   to simplify implementations. By following the limits defined in RFC
   1035 the limits for IDNs (and ASCII only names) are defined as:

   A label is limited to a maximum of 63 character code points in UCS
   normalised using Unicode form C.  The full name is limited to a
   maximum of 255 character code points normalised as for a label.
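
   A minimal Python sketch of such a check follows. It assumes the name
   is given as dotted text and that the separators count towards the 255
   limit, as they do for the octet limit quoted from [RFC2181]; this
   document does not state that for code points. The function name is
   hypothetical.

```python
import unicodedata

MAX_LABEL_CODE_POINTS = 63
MAX_NAME_CODE_POINTS = 255


def within_limits(name):
    """Check a dotted name against the code point limits above."""
    name = unicodedata.normalize("NFC", name)
    labels = name.rstrip(".").split(".")
    if any(not 1 <= len(label) <= MAX_LABEL_CODE_POINTS for label in labels):
        return False
    # Assumption: separators count towards the full-name limit.
    return len(name) <= MAX_NAME_CODE_POINTS


print(within_limits("\ud55c" * 63 + ".com"))  # True
print(within_limits("\ud55c" * 64 + ".com"))  # False
```

   Note that the first name above needs 189 octets for its first label
   in UTF-8, so it only fits on the wire with one of the long label
   mechanisms.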

   Longer labels in DNS are supported as follows:

2.2.1.2.1 EDNS long label

   An extended label type is defined that allows as a minimum 255 octets
   per label. It can be defined as in [IDNE].

   Even though a label now can have 255 octets, an ASCII only label may
   still only have 63 characters in it.

   All IDN aware software MUST implement the EDNS long label.

   As EDNS is not understood by older software, a query with an EDNS
   long label will fail there, and a long label can only be returned in
   a response to an IDN aware client. It is recommended that only IDNs
   that fit into the 63 octets of the standard label are used until
   enough DNS software has been upgraded, to avoid a lot of overhead in
   queries.

2.2.1.2.2 Alternative: encoded long label

   As an alternative to EDNS, it is possible to split a long label into
   several parts and encode it into several labels. It could be done
   like this: <UTF-8 code part1><DEL>.<DEL><UTF-8 code part 2>.com.
   Where the <DEL> octet indicates that part1 and part2 shall be joined
   together. This can be done inside the client (resolver) software and
   in the DNS server so that application software and the user will not
   be aware of it happening.

   While this solution can work through non-IDN aware software it is
   less clean than the EDNS solution. And there is always the
   possibility that some software assumes that part1 is a host name.
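
   The splitting and joining could look roughly like the Python sketch
   below. Several details are assumptions made for the illustration and
   are not specified here: splits are made on character boundaries, two
   octets are reserved in every part for the <DEL> markers, a label may
   be split into more than two parts, and names do not otherwise contain
   the DEL octet. The function names are hypothetical.

```python
DEL = b"\x7f"
MAX_LABEL_OCTETS = 63


def split_long_label(label):
    """Split a label into wire labels of at most 63 octets each.

    Consecutive parts are marked by a trailing DEL on the earlier part
    and a leading DEL on the later part.
    """
    parts, current = [], b""
    for char in label:
        encoded = char.encode("utf-8")
        # Reserve two octets so that any part can carry both markers.
        if len(current) + len(encoded) > MAX_LABEL_OCTETS - 2:
            parts.append(current)
            current = b""
        current += encoded
    parts.append(current)
    last = len(parts) - 1
    return [(DEL if i > 0 else b"") + part + (DEL if i < last else b"")
            for i, part in enumerate(parts)]


def join_labels(labels):
    """Rejoin wire labels whose DEL markers indicate a split."""
    joined = []
    for label in labels:
        if joined and joined[-1].endswith(DEL) and label.startswith(DEL):
            joined[-1] = joined[-1][:-1] + label[1:]
        else:
            joined.append(label)
    return [part.decode("utf-8") for part in joined]


long_label = "\ud55c" * 30  # 90 octets in UTF-8
wire_labels = split_long_label(long_label) + [b"com"]
print([len(w) for w in wire_labels])  # [61, 31, 3]
```

   Passing wire_labels to join_labels() gives back the original long
   label followed by "com".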

2.2.2 ASCII Compatible Encoding




   When responding to non-IDN aware software (for example most current
   DNS software and current SMTP implementations) there is a need for a
   transition mechanism to support them.

   As a lot of software and protocols assume only the ASCII letters,
   digits and hyphen in each label, the transition mechanism should use
   an ASCII Compatible Encoding (ACE) to encode the IDNs.

   This ACE can then be used by DNS and other protocols when
   communicating with non-IDN aware software.

   The selected ACE will be defined in a separate document, but some of
   the basic ways to do it are as follows:
    - Tag labels that are ACE names with a prefix or suffix.

      This would give names like: abcd.ra--XXXXX.com or abcd.XXXXX-.com
      where the XXXXX is the ACE name and the leading "ra--" or trailing
      "-" is the tag.

      Using this technique some labels may be using ACE and some not.

      One difficulty here is to have a tag that will never occur in a
      normal domain name.

      Labels longer than 63 octets could be split like this:
      abcd.ra--XXXXXZ.Zra--YYYY.com where during decoding the parts
      XXXXX and YYYY will be joined to form one label. This technique
      can probably mess up some programs that assume that a dot
      separates each label.
    - Use a special TLD.

      Here names would look like: XXXXX.YYYY.ACE where XXXXX.YYYY is the
      encoded name.

      Here we do not have the problem of selecting a tag that is unique
      as the TLD will be unique.
    - Using label aliases (not really an ACE).

      One could use user defined or automatically generated aliases for
      labels that are returned to non-IDN aware software. This way
      labels could be made meaningful.

      One problem with this is that to convert from the IDN to the ACE,
      you need access to the DNS to look up the alias. This may be
      difficult for some protocols that have no access to the DNS at the
      moment the conversion must be done.
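
   To make the first approach (tagging labels with a prefix) concrete,
   here is a Python sketch. The actual ACE is not defined in this
   document, so lower-cased Base32 of the UTF-8 octets is used purely as
   a stand-in encoding; it is an assumption of the example and not a
   proposal. The "ra--" tag is the example prefix used above, and the
   function names are hypothetical.

```python
import base64

ACE_PREFIX = "ra--"  # the example tag used in the text above


def to_ace_label(label):
    """Tag and encode a label only if it contains non-ASCII characters."""
    if label.isascii():
        return label
    encoded = base64.b32encode(label.encode("utf-8")).decode("ascii")
    return ACE_PREFIX + encoded.rstrip("=").lower()


def from_ace_label(label):
    """Reverse to_ace_label; untagged labels pass through unchanged."""
    if not label.startswith(ACE_PREFIX):
        return label
    body = label[len(ACE_PREFIX):].upper()
    body += "=" * (-len(body) % 8)  # restore the Base32 padding
    return base64.b32decode(body).decode("utf-8")


def to_ace_name(name):
    """Encode each label of a dotted name independently."""
    return ".".join(to_ace_label(part) for part in name.split("."))


def from_ace_name(name):
    """Decode each label of a dotted name independently."""
    return ".".join(from_ace_label(part) for part in name.split("."))
```

   With this sketch only the labels that need it are encoded, so a name
   can mix ACE and plain ASCII labels, as described above.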





2.3 Domain name matching

   One of the most difficult areas of making DNS universal is deciding
   which names are equivalent to one another. For ASCII this was easily
   solved by case-insensitivity. It is also easily solved for many other
   Latin based alphabets. But when you look at the whole world you get a
   mixture of rules, some conflicting, including case-insensitivity,
   half width/full width, final/non-final forms and much more.

   This type of matching will be called "equivalence matching"
   hereafter.

2.3.1 Equivalence matching rules

   To compare two domain names, both names must first be mapped to a
   format where all equivalent characters are mapped to one character so
   that the names then can be binary compared.

   This mapping is done from the native UCS normalised form C format as
   follows:
    1) Fold case to lower case.
    2) Do additional simplification.
    3) Normalise to Unicode form C again.

   For the UCS character code range 0-255 (ASCII and ISO 8859-1) the
   case folding MUST be done by following the one to one mapping as
   defined in the Unicode 3.0 Character Database [UDATA].  Case folding
   and simplification for the rest of UCS will be defined in a separate
   document.
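
   The three mapping steps can be sketched in Python as follows. Only
   the part that is defined here is implemented: case folding for code
   points 0-255. Folding and simplification for the rest of UCS are left
   as placeholders because they are to be defined elsewhere. The
   function names are hypothetical, and Python's case tables follow a
   later Unicode version than 3.0, which is assumed to agree with it for
   this range.

```python
import unicodedata


def fold_case(name):
    """Lower-case code points 0-255 and leave all others untouched."""
    return "".join(ch.lower() if ord(ch) < 256 else ch for ch in name)


def simplify(name):
    """Placeholder: the additional simplification is not yet defined."""
    return name


def match_key(name):
    """Map a form C name to the form used for binary comparison."""
    return unicodedata.normalize("NFC", simplify(fold_case(name)))


print(match_key("WWW.Example.COM"))  # www.example.com
print(match_key("\u00c5NGSTR\u00d6M.se") == match_key("\u00e5ngstr\u00f6m.SE"))  # True
```

   Two names are then considered equivalent when their match_key()
   values are identical.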


2.3.2 Matching of domain names in DNS servers

   To be able to handle correct domain name matching in lookups, the
   following MUST be followed by DNS servers:
    - Do matching on authoritative data using the full name equivalence
      matching needed for the characters used in the data.
    - On non-authoritative data, either do binary matching or
      case-insensitive matching on ASCII letters and binary matching on
      all others.
    - Implement the equivalence matching rules as defined above. Local
      variations are not allowed.

   The effect of the above is:
    - only servers handling authoritative data must implement
      equivalence matching of names. And they need only implement the
      subset needed for the subset of characters of UCS they support in
      their authoritative zones.
    - it normally gives fast lookup because data is usually sent like:
      resolver <-> server <-> authoritative server.
      While full equivalence matching can be complex and CPU consuming,
      the server in the middle will do caching with only simple and fast
      binary matching. So complex matching rules should not slow down
      DNS very much.
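
   The cheap matching allowed on non-authoritative data could be
   implemented along these lines. This Python sketch works directly on
   the UTF-8 octets and folds only the ASCII letters A-Z; all other
   octets are compared as they are. The function names are hypothetical.

```python
def ascii_fold(name):
    """Lower-case only the octets for the ASCII letters A-Z."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in name)


def cache_match(a, b):
    """Compare two UTF-8 encoded names as a caching server may."""
    return ascii_fold(a) == ascii_fold(b)


print(cache_match(b"WWW.\xc3\x85.se", b"www.\xc3\x85.SE"))  # True
print(cache_match(b"\xc3\x85.se", b"\xc3\xa5.se"))  # False
```

   The second comparison fails even though the two names differ only in
   the case of a non-ASCII letter. That is acceptable for a cache: the
   names are stored as two entries, and the authoritative server
   decides equivalence.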


2.4 Interoperability between IDN aware and non-IDN aware DNS software

   While the current non-IDN aware DNS software MUST allow UTF-8 encoded
   domain names (if they follow RFC1035, 2181) a lot of software using
   DNS may not (for example SMTP). To not break all the old software
   only expecting or allowing ASCII in domain names, the following rules
   MUST be followed by an IDN aware DNS server:
    - A query with the IN bit set is assumed to be from IDN aware
      software.
    - A query with domain names having valid non-ASCII UTF-8 characters
      is assumed to be from IDN aware software even if the IN bit is not
      set. (this is because the query can have been sent from an IDN
      aware resolver through a non-IDN aware server).
    - Always encode UTF-8 names into ASCII using ACE before sending
      them when responding to non-IDN aware software.
    - Never have ACE names in the response when responding to IDN aware
      software.
    - Always check for ACE names in requests.
    - Never do zone transfers to non-IDN aware software, if the zone
      contains non-ASCII.
    - Return the server failed error if a label cannot be encoded into
      ACE and fit in the 63 octets allowed.
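
   The first two rules, which decide whether a query comes from IDN
   aware software, can be sketched in Python as shown here. The mask
   value follows the header layout in section 2.1.1; the function names
   are hypothetical and the query name is assumed to be available as raw
   octets.

```python
IN_MASK = 0x0040  # the IN bit of the header flags word (section 2.1.1)


def has_valid_non_ascii_utf8(name):
    """True if the name holds non-ASCII octets that decode as UTF-8."""
    if all(octet < 0x80 for octet in name):
        return False
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def query_is_idn_aware(flags, qname):
    """Apply the first two server rules to a received query."""
    return bool(flags & IN_MASK) or has_valid_non_ascii_utf8(qname)


print(query_is_idn_aware(0x0100, b"example.com"))  # False
print(query_is_idn_aware(0x0100, b"\xc3\xa5.se"))  # True
```

   A server would use the result to choose between returning native
   UTF-8 names and ACE encoded names in its response.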

   An IDN aware DNS resolver MUST:
    - Decode any ACE names before sending them using the DNS protocol.
    - Decode any ACE names received in a response.

   The result of this is:
    - Old software gets an ASCII only domain name using only the old set
      of allowed characters.
    - Both IDN aware DNS servers and resolver software must handle up
      coding of domain names.
    - Domain names used from old software will work in other protocols
      only allowing ASCII names.
    - We may get old software that is never fixed as it still works.
    - We do not get rid of the user unfriendly "encode everything in
      ASCII" handling that many non-ASCII users complain about.

   Note: As a non-IDN aware DNS server only understands matching using
   ASCII case-insensitivity, it may cache IDN responses as different
   even though they are IDN equivalent. This will result in more data
   being cached but will not give invalid responses.


2.5 DNSSEC

   DNSSEC [RFC2535] is complex and not yet fully studied here,
   especially the canonical DNS name order and signing of RRsets.

   The canonical DNS name order sorts names with letters as lower case.
   In IDN this means to fold to lower case, normalise and simplify as is
   done in lookups.  This would mean that only a DNS server knowing the
   full equivalence rules could do the sorting. It would be better if
   this was not needed.

   Signing of RRsets is done on the canonical RR form. RFC 2535 is
   somewhat unclear on whether domain names inside the RDATA should be
   lower cased. If they are not, so that the original format of RDATA
   is preserved, signing should be no problem in IDN aware DNS software.

   The full handling of DNSSEC and IDN data may have to be described in
   a separate document.


3. Characters allowed in domain names

   The DNS protocol does not place any restriction on characters used
   in a domain name. However applications that make use of DNS data may
   have restrictions imposed on what particular values are acceptable in
   their environment. If the client has such restrictions, it is solely
   responsible for validating the data from the DNS to ensure that it
   conforms before it makes any use of that data. [RFC2181]

   For example domains, hosts and e-mail addresses are represented in
   DNS and may have different rules.

   As the whole idea of making DNS universal is to get domain names with
   non-ASCII characters, the original recommendation in DNS [RFC1035]
   for host/domain names needs to be updated. But the DNS itself may not
   place any restriction on the characters allowed in a domain name.
   Domain names are used for more than hosts and e-mail domains.

   It is recommended that domains, hosts and e-mail addresses all are
   extended to allow all letters, digits and some separators of UCS.

   This has to be defined in another document. A beginning to this is
   available in [NAMEPREP].




4. User interface issues

   Locally on a system or in a user interface a different character set
   than the one defined to be used in the DNS protocol may be used.
   Therefore software must map between the local character set and the
   character set of the protocol, so that human beings can understand
   it.

   This means that a zone file that is edited in a text editor by a
   person before being loaded into a DNS server must be allowed to be in
   the local character set. Software may not assume that the user can
   edit text encoded in UTF-8. A zone file transmitted between DNS
   software that is not handled by a human, can be transmitted using any
   format.

   When character data is presented to a human or entered by a human,
   software must, as well as possible, present it using the local
   character set and allow it to be entered using the local character
   set.  It is the responsibility of the software, not the human, to
   convert between the local character set and the one used in the
   protocol.
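
   A small Python sketch of this conversion is given below. The function
   names are hypothetical, and replacing unrepresentable characters with
   "?" on output is just one possible choice; a real implementation
   might prefer to show the ACE form instead.

```python
import unicodedata


def local_to_protocol(data, local_charset):
    """Convert text in the local character set to the protocol format."""
    text = data.decode(local_charset)
    return unicodedata.normalize("NFC", text).encode("utf-8")


def protocol_to_local(wire, local_charset):
    """Convert protocol data to the local character set for display.

    Characters the local set cannot represent are replaced with "?".
    """
    return wire.decode("utf-8").encode(local_charset, errors="replace")


print(local_to_protocol(b"\xe5.se", "iso-8859-1"))  # b'\xc3\xa5.se'
print(protocol_to_local(b"\xc3\xa5.se", "iso-8859-1"))  # b'\xe5.se'
```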

   The down coding defined above allows all names to be entered and
   displayed for all users, as long as at least the ASCII characters are
   supported.

4.1 Applications using DNS software

   If an application does a call to DNS, it must present the data to the
   users in the local character set used by the user, down coding if
   necessary. Software used to access DNS should give the application
   programmer both the possibility of doing queries and getting
   responses using local character set, and using UTF-8.

   APIs like getipnodebyname should be updated with an IDN flag that
   results in the name being returned using the current locale, instead
   of native UTF-8 or ASCII format.

5. Effect on other protocols

   As a domain name may now include non-ASCII characters, many other
   protocols that include domain names need to be updated, for example
   SMTP, HTTP and URIs. The ACE format can be used when interfacing with
   ASCII only software or protocols.  Protocols like SMTP could be
   extended using ESMTP and a UTF8 option that defines that all headers
   are in UTF-8.

   It is recommended that protocols updated to handle i18n do this by
   encoding character data in the same standard format as defined for
   DNS in this document (UCS normalised form C). Encoding it in ASCII
   or using tagged character sets should be avoided.

   DNS data does not only contain domain names; for example, e-mail
   addresses are also included. So an e-mail address would be expected
   to be changed to include non-ASCII both before and after the @-sign.

   Software needs to be updated to follow the user interface
   recommendations given above, so that a human will see the characters
   in their local character set, if possible.

5.1 An example: SMTP

   SMTP may be extended to allow UTF-8 in headers and addresses.  It
   will then have to, when transferring an e-mail to an SMTP system that
   has not been extended, encode e-mail addresses and IDNs into an ACE.

   In this case an e-mail address could look like:
   ra--XXXXX.surname@ra--YYYYY.com
   where ra--XXXXX is the ACE of the given name and ra--YYYYY is the ACE
   of one part of the domain name.

6. Security Considerations

   As always with data, if software does not check for data that can be
   a problem, security may be affected. As more characters than ASCII
   are allowed, software only expecting ASCII and with no checks may now
   get security problems.

7. References

   [RFC1034]  P. Mockapetris, "Domain Names - Concepts and Facilities",
              STD 13, RFC 1034, November 1987.

   [RFC1035]  P. Mockapetris, "Domain Names - Implementation and
              Specification", STD 13, RFC 1035, November 1987.

   [RFC2119]  S. Bradner, "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119, March 1997.

   [RFC2181]  R. Elz and R. Bush, "Clarifications to the DNS
              Specification", RFC 2181, July 1997.

   [RFC2279]  F. Yergeau, "UTF-8, a transformation format of ISO 10646",
              RFC 2279, January 1998.

   [RFC2535]  D. Eastlake, "Domain Name System Security Extensions",
              RFC 2535, March 1999.



   [RFC2671]  P. Vixie, "Extension Mechanisms for DNS (EDNS0)", RFC
              2671, August 1999.

   [ISO10646] ISO/IEC 10646-1:2000. International Standard --
              Information technology -- Universal Multiple-Octet Coded
              Character Set (UCS)

   [Unicode]  The Unicode Consortium, "The Unicode Standard -- Version
              3.0", ISBN 0-201-61633-5. Described at
              http://www.unicode.org/unicode/standard/versions/
              Unicode3.0.html

   [UTR15]    M. Davis and M. Duerst, "Unicode Normalization Forms",
              Unicode Technical Report #15, Nov 1999,
              http://www.unicode.org/unicode/reports/tr15/.

   [UTR21]    M. Davis, "Case Mappings", Unicode Technical Report #21,
              Dec 1999, http://www.unicode.org/unicode/reports/tr21/.

   [UDATA]    The Unicode Character Database,
              ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt.
              The database is described in
              ftp://ftp.unicode.org/Public/UNIDATA/
              UnicodeCharacterDatabase.html.

   [IDNREQ]   James Seng, "Requirements of Internationalized Domain
              Names", draft-ietf-idn-requirement.

   [IANADNS]  Donald Eastlake, Eric Brunner, Bill Manning, "Domain Name
              System (DNS) IANA Considerations",
              draft-ietf-dnsext-iana-dns.

   [IDNE]     Marc Blanchet, Paul Hoffman, "Internationalized domain
              names using EDNS (IDNE)", draft-ietf-idn-idne.

   [CHNORM]   M. Duerst, M. Davis, "Character Normalization in IETF
              Protocols", draft-duerst-i18n-norm.

   [IDNCOMP]  Paul Hoffman, "Comparison of Internationalized Domain Name
              Proposals", draft-ietf-idn-compare.

   [NAMEPREP] Paul Hoffman, "Preparation of Internationalized Host
              Names", draft-ietf-idn-nameprep.

8. Acknowledgements

   Paul Hoffman for giving many comments in our e-mail discussions.

   Ideas from drafts by Paul Hoffman, Stuart Kwan, James Gilroy and Kent
   Karlsson.

   Magnus Gustavsson, Mark Davis, Kent Karlsson and Andrew Draper for
   comments on my draft.

   Discussions and comments by the members of the IDN working group.



Author's Address

   Dan Oscarsson
   Telia ProSoft AB
   Box 85
   201 20 Malmo
   Sweden

   E-mail: Dan.Oscarsson@trab.se