Unicode in ABNF

Document Type Active Internet-Draft (individual)
Last updated 2017-03-13
Network Working Group                                         S. Leonard
Internet-Draft                                             Penango, Inc.
Updates: 5234 (if approved)                                    C. Newman
Intended Status: Experimental                                     Oracle
Expires: September 14, 2017                               March 13, 2017

                            Unicode in ABNF

   This experimental document adds support for Unicode strings in ABNF
   (Augmented Backus-Naur Form), and provides certain symbols related to
   Unicode code point ranges.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute working
   documents as Internet-Drafts. The list of current Internet-Drafts is
   at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft is a fork of

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Leonard & Newman              Experimental                      [Page 1]
Internet-Draft              Unicode in ABNF               March 13, 2017

1.  Introduction

   Augmented Backus-Naur Form (ABNF) [RFC5234] is a formal syntax that
   is popular among many Internet specifications. Many Internet
   documents employ this syntax along with the Core Rules defined in
   Appendix B.1 of [RFC5234]. ABNF is defined in terms of ASCII
   [ASCII86, RFC0020]; however, Unicode [UNICODE] has become
   increasingly popular--even required--as the Internet has evolved over
   the last two decades. Unicode (as UTF-8) will be permitted in the RFC
   series [IABNA], while [RFC5198] established Net-Unicode as the
   standard form for the use of Unicode as "network text". Protocols
   that originally were ASCII-based have been, or are being, extended to
   support Unicode. However, protocols that use Unicode in some way
   (e.g., permit UTF-8 content in a production) use different ABNF
   expressions, some of which do not conform to version 9.0.0 of the
   Unicode Standard and could therefore introduce interoperability or
   security problems.

   Many parties have expressed interest in incorporating [UNICODE] into
   ABNF, yet the questions remain: "How?" and "To what extent?"

   This document proposes standardized techniques for expressing Unicode
   code points using ABNF. This document intends to be very conservative
   in its approach: a conforming implementation only needs to know how
   to map between the Unicode scalar values and any Unicode encoding
   form. The Unicode Character Database (UCD, Section 4.1 of [UNICODE])
   is intentionally not required. ABNF text that uses the syntax in
   this document needs to be in a Unicode encoding form (Conformance
   Clause D89 of [UNICODE]), but ABNF text that just uses the rules or
   terminal values can be expressed in ASCII [RFC0020].
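
   As an illustration only, the mapping between Unicode scalar values
   and an encoding form might be sketched as follows. The function
   names are hypothetical and are not defined by this document; the
   sketch assumes Python's built-in codecs for UTF-8, UTF-16, and
   UTF-32, and it consults no Unicode Character Database properties.

```python
from typing import List


def encode_scalar(cp: int, form: str = "utf-8") -> bytes:
    """Map one Unicode scalar value to code units in an encoding form.

    `form` is a Python codec name such as "utf-8", "utf-16-be", or
    "utf-32-be" (an assumption of this sketch, not of the draft).
    """
    # Scalar values are U+0000..U+10FFFF minus the surrogates U+D800..U+DFFF.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: 0x%X" % cp)
    return chr(cp).encode(form)


def decode_scalars(data: bytes, form: str = "utf-8") -> List[int]:
    """Map well-formed code units back to a list of scalar values.

    Strict decoding raises UnicodeDecodeError on ill-formed input,
    for example an unpaired surrogate in UTF-16.
    """
    return [ord(ch) for ch in data.decode(form)]
```

   For example, encode_scalar(0x20AC) yields the three UTF-8 octets
   E2 82 AC, and decode_scalars() applied to those octets returns the
   single scalar value 0x20AC.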

2.  Unicode Code Points in ABNF

   (Consult Section 2.3 of [RFC5234] in relation to this paragraph.)
   Unicode has been expressed in several different ways in RFCs to date.
   This document establishes that in contexts where Unicode is specified
   as the coded character set [RFC2130], the terminal values %x00-10FFFF
   are to be used to represent the Unicode code points. Only the Unicode
   scalar values are to be used in specifications that follow this
   document; surrogate code points (%xD800-DFFF) are not to be used
   [[NB: directly]]. This technique aligns ABNF with W3C EBNF [XMLEBNF]
   and Unicode EBNF [UNICODE].
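
   One way to read the paragraph above is that the usable terminal
   values are %x00-10FFFF with the surrogate range %xD800-DFFF taken
   out, which leaves two contiguous ranges. The sketch below checks
   code points against those two ranges and prints them in ABNF
   notation. All names in it are hypothetical, and the two-range
   split is an illustration rather than a rule defined by this
   document.

```python
# Hypothetical constant: %x00-10FFFF minus the surrogates %xD800-DFFF.
SCALAR_RANGES = [(0x0000, 0xD7FF), (0xE000, 0x10FFFF)]


def to_abnf(ranges):
    """Render (low, high) pairs as ABNF value ranges joined by '/'."""
    return " / ".join("%x{:02X}-{:X}".format(lo, hi) for lo, hi in ranges)


def is_scalar_value(cp):
    """True if the integer cp falls in one of the scalar value ranges."""
    return any(lo <= cp <= hi for lo, hi in SCALAR_RANGES)


def all_scalar_values(text):
    """True if every code point in a Python string is a scalar value."""
    return all(is_scalar_value(ord(ch)) for ch in text)


if __name__ == "__main__":
    print(to_abnf(SCALAR_RANGES))
```

   A lone surrogate such as U+D800 fails is_scalar_value(), as does
   any value above 0x10FFFF.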

   (Consult Section 2.4 and Appendix B.2 of [RFC5234] in relation to
   this paragraph.)
   In contexts where Unicode is specified as the character set, the