Internet-Draft | CDDL grammar updates | May 2024
Bormann | Expires 18 November 2024
- Workgroup: CBOR
- Internet-Draft: draft-ietf-cbor-update-8610-grammar-05
- Updates: 8610 (if approved)
- Published: May 2024
- Intended Status: Standards Track
- Expires: 18 November 2024
Updates to the CDDL grammar of RFC 8610
Abstract
The Concise Data Definition Language (CDDL), as defined in RFC 8610 and RFC 9165, provides an easy and unambiguous way to express structures for protocol messages and data formats that are represented in CBOR or JSON.¶
The present document updates RFC 8610 by addressing errata and making other small fixes for the ABNF grammar defined for CDDL there.¶
About This Document
This note is to be removed before publishing as an RFC.¶
The latest revision of this draft can be found at https://cbor-wg.github.io/update-8610-grammar/. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-cbor-update-8610-grammar/.¶
Discussion of this document takes place on the CBOR Working Group mailing list (mailto:cbor@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/cbor/. Subscribe at https://www.ietf.org/mailman/listinfo/cbor/.¶
Source for this draft and an issue tracker can be found at https://github.com/cbor-wg/update-8610-grammar.¶
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 18 November 2024.¶
Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
1. Introduction
The Concise Data Definition Language (CDDL), as defined in [RFC8610] and [RFC9165], provides an easy and unambiguous way to express structures for protocol messages and data formats that are represented in CBOR or JSON.¶
The present document updates [RFC8610] by addressing errata and making other small fixes for the ABNF grammar defined for CDDL there.¶
2. Clarifications and Changes based on Errata Reports
Errata reports [Err6527] and [Err6543] concern details of the syntax for text string and byte string literals. This section addresses them by updating the ABNF for these literal syntaxes. In addition, [Err6526] needs to be applied (backslashes were lost during RFC processing in some text that explains backslash escaping).¶
These changes are intended to mirror the way existing implementations have dealt with the errata. They also take the opportunity presented by the necessary cleanup of the string literal grammar to make a backward-compatible addition to the syntax for hexadecimal escapes. The latter change is not automatically forward compatible. CDDL specifications that use this syntax do not necessarily work with existing implementations until those implementations are updated, which this specification recommends.¶
2.1. Err6527 (text string literals)
The ABNF used in [RFC8610] for the content of text string literals is rather permissive:¶
This allows almost any non-C0 character to be escaped by a backslash. However, it critically misses the \uXXXX and \uHHHH\uLLLL forms that JSON allows for specifying characters in hex (which should apply here according to Bullet 6 of Section 3.1 of [RFC8610]). (Note that we import from JSON the unwieldy \uHHHH\uLLLL syntax, which represents Unicode code points beyond U+FFFF by making them look like UTF-16 surrogate pairs; CDDL text strings do not use UTF-16 or surrogates.)¶
Both problems can be solved by updating the SESC production. We use the opportunity to add a popular form of directly specifying characters in strings: hexadecimal escape sequences of the form \u{hex}, where hex is the hexadecimal representation of the Unicode scalar value. The result is the new set of rules defining SESC in Figure 2:¶
(Notes: In ABNF, strings such as "A", "B", etc. are case-insensitive, as is intended here. We could have written %x62 (which, unlike "b", matches only a lowercase b) as %s"b" [RFC7405], but didn't, in order to maximize ABNF tool compatibility.)¶
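The escape forms discussed above can be illustrated with a decoder for the content of a text string literal. What follows is a minimal Python sketch and is not taken from this specification. The function name is hypothetical. It assumes the escape set described in the prose: the JSON single-character escapes, \uXXXX with surrogate pairs, and \u{hex} for a Unicode scalar value. For \u{hex}, it assumes any number of hex digits, including leading zeros. It does not check unescaped characters against SCHAR. Figure 2 remains normative.

```python
import re

# JSON-style single-character escapes (RFC 8259) kept by the updated SESC.
SIMPLE_ESCAPES = {'"': '"', '/': '/', '\\': '\\', 'b': '\b',
                  'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}
HEX4 = re.compile(r'[0-9A-Fa-f]{4}')
BRACED = re.compile(r'\{([0-9A-Fa-f]+)\}')  # assumption: any digit count


def is_scalar(cp: int) -> bool:
    """Unicode scalar value: 0..0x10FFFF excluding the surrogates."""
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def unescape_text(body: str) -> str:
    """Resolve the escapes in the content of a text string literal."""
    out, i = [], 0
    while i < len(body):
        if body[i] != '\\':
            out.append(body[i])
            i += 1
            continue
        esc = body[i + 1:i + 2]
        i += 2
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            continue
        if esc != 'u':  # e.g. \' or \x are not SESC escapes
            raise ValueError("escape not allowed: \\" + esc)
        m = BRACED.match(body, i)
        if m:  # \u{hex}
            cp = int(m.group(1), 16)
        else:  # \uXXXX
            m = HEX4.match(body, i)
            if not m:
                raise ValueError("malformed \\u escape")
            cp = int(m.group(0), 16)
            if 0xD800 <= cp <= 0xDBFF:  # high surrogate: \uLLLL must follow
                low = None
                if body.startswith('\\u', m.end()):
                    low = HEX4.match(body, m.end() + 2)
                if not low or not 0xDC00 <= int(low.group(0), 16) <= 0xDFFF:
                    raise ValueError("unpaired high surrogate")
                cp = 0x10000 + ((cp - 0xD800) << 10) + (int(low.group(0), 16) - 0xDC00)
                m = low
        if not is_scalar(cp):  # rejects lone low surrogates, > U+10FFFF
            raise ValueError("not a Unicode scalar value: U+%X" % cp)
        out.append(chr(cp))
        i = m.end()
    return ''.join(out)
```

For example, unescape_text(r'\u{1F073}') and unescape_text(r'\uD83C\uDC73') both return the single character U+1F073. In contrast, r"\'" is rejected, because that escape belongs to BCHAR rather than SESC.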
Now that SESC is more restrictively formulated, this also requires an update to the BCHAR production used in the ABNF syntax for byte string literals:¶
With SESC updated as above, \' would no longer be allowed in BCHAR, so it now needs to be included explicitly.¶
Updating BCHAR also provides an opportunity to address [Err6278], which points to an inconsistency in treating U+007F (DEL) between SCHAR and BCHAR. As U+007F is not printable, including it in a byte string literal is as confusing as for a text string literal, and it should therefore be excluded from BCHAR as it is from SCHAR. The same reasoning also applies to the C1 control characters, so we actually exclude the entire range from U+007F to U+009F. The same reasoning then also applies to text in comments (PCHAR). For completeness, all these should also explicitly exclude the code points that have been set aside for UTF-16's surrogates.¶
(Note that, apart from addressing the inconsistencies, there is no attempt to further exclude non-printable characters from the ABNF; doing this properly would draw in complexity from the ongoing evolution of the Unicode standard that is not needed here.)¶
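The exclusions described above amount to a character-class check. Below is a minimal Python sketch of that check for comment text (PCHAR). It assumes the ranges stated in the prose: C0 controls, U+007F through U+009F, and the surrogates are excluded. It also assumes the upper bound U+10FFFD used in the RFC 8610 ABNF. The function names are hypothetical, and the normative definitions remain the updated ABNF (see Appendix A).

```python
def is_nonascii_allowed(cp: int) -> bool:
    """Non-ASCII code points allowed unescaped (assumed ranges): from
    U+00A0 up to U+10FFFD, minus the surrogates U+D800..U+DFFF."""
    return 0xA0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFD


def is_pchar(ch: str) -> bool:
    """Comment text: printable ASCII or allowed non-ASCII. This excludes
    C0 controls (including tab), DEL, the C1 range, and surrogates."""
    cp = ord(ch)
    return 0x20 <= cp <= 0x7E or is_nonascii_allowed(cp)


def first_bad_comment_char(comment: str):
    """Return (index, 'U+XXXX') of the first disallowed character, or None."""
    for i, ch in enumerate(comment):
        if not is_pchar(ch):
            return i, "U+%04X" % ord(ch)
    return None
```

Keeping the non-ASCII ranges in one helper mirrors the idea of a single shared rule (such as the NONASCII rule mentioned in the Acknowledgments) that SCHAR, BCHAR, and PCHAR can all reference.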
2.2. Err6543 (byte string literals)
The ABNF used in [RFC8610] for the content of byte string literals lumps together byte strings notated as text with byte strings notated in base16 (hex) or base64 (but see also updated BCHAR production above):¶
2.2.1. Change proposed by Errata Report 6543
Errata report 6543 proposes to handle the two cases in separate productions (where, with an updated SESC, BCHAR obviously needs to be updated as above):¶
This potentially causes a subtle change, which is hidden in the WS production:¶
This allows any non-C0 character in a comment, so this fragment becomes possible:¶
   foo = h'
     43424F52 ; 'CBOR'
     0A ; LF, but don't use CR!
   '
The current text does not say unambiguously whether the three apostrophes need to be escaped with a \, as in:¶
   foo = h'
     43424F52 ; \'CBOR\'
     0A ; LF, but don\'t use CR!
   '
... which would be supported by the existing ABNF in [RFC8610].¶
2.2.2. No change needed after addressing Err6527 (Section 2.1)
This document takes the simpler approach of leaving the processing of the content of the byte string literal to a semantic step, performed after processing the syntax of the bytes/BCHAR rules as updated by Figure 2 and Figure 4.¶
The rules in Figure 7 are therefore applied to the result of this processing where bsqual is given as h or b64.¶
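The two-step processing can be illustrated with the example above. Below is a minimal Python sketch with hypothetical function names. It handles only the h qualifier and only the single-character escapes, not the \u forms. It first finds the end of the literal and resolves the backslash escapes (syntax). It then strips ";" comments and whitespace and decodes the hex digits (semantics). It also shows that, without escaping, the first apostrophe in the comment ends the literal.

```python
import re

# Escapes accepted in byte string literals given as text: the JSON-style
# single-character escapes from SESC plus the explicitly added \'.
SIMPLE = {"'": "'", '"': '"', '/': '/', '\\': '\\',
          'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}


def scan_literal(src, start):
    """Return (raw content, index after closing quote) for the '...'
    literal whose opening quote is at src[start]. A backslash always
    consumes the next character, so an escaped quote does not end it."""
    if src[start] != "'":
        raise ValueError("no opening quote")
    i = start + 1
    while i < len(src):
        if src[i] == '\\':
            i += 2
        elif src[i] == "'":
            return src[start + 1:i], i + 1
        else:
            i += 1
    raise ValueError("unterminated literal")


def unescape(raw):
    """Step 1 (syntax): resolve backslash escapes (no \\u forms here)."""
    out, i = [], 0
    while i < len(raw):
        if raw[i] == '\\':
            esc = raw[i + 1:i + 2]
            if esc not in SIMPLE:
                raise ValueError("escape not handled here: \\" + esc)
            out.append(SIMPLE[esc])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return ''.join(out)


def decode_h(text):
    """Step 2 (semantics, bsqual h): drop ';' end-of-line comments and
    whitespace, then decode the remaining base16 digits."""
    text = re.sub(r';[^\n]*', '', text)
    return bytes.fromhex(re.sub(r'\s+', '', text))


escaped = "foo = h'\n  43424F52 ; \\'CBOR\\'\n  0A ; LF, but don\\'t use CR!\n'"
raw, _ = scan_literal(escaped, escaped.index("'"))
print(decode_h(unescape(raw)))    # b'CBOR\n'

unescaped = escaped.replace("\\'", "'")
raw, _ = scan_literal(unescaped, unescaped.index("'"))
print(repr(raw))                  # '\n  43424F52 ; ' (literal ends early)
```

Because the comment is stripped only after unescaping, the escaped apostrophes inside it never reach the hex decoder.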
Note that this approach also works well with the use of byte strings in Section 3 of [RFC9165]. It does, however, require some care when copy-pasting ABNF that contains single quotes (which may also hide as apostrophes in comments) into CDDL models; these quotes need to be escaped or possibly replaced by %x27.¶
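One way to take that care mechanically is sketched below in Python. The helper name is hypothetical. It wraps arbitrary text, such as ABNF, into a CDDL byte string literal. It escapes backslashes and single quotes, and it escapes the control characters that the updated grammar excludes from BCHAR. It assumes that a bare LF may appear unescaped as a line break, as in the CRLF rule of RFC 8610.

```python
def to_cddl_byte_literal(text: str) -> str:
    """Wrap text in a CDDL byte string literal '...' (sketch only)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == '\\':
            out.append('\\\\')          # a bare \ would start an escape
        elif ch == "'":
            out.append("\\'")           # would otherwise close the literal
        elif ch == '\t':
            out.append('\\t')
        elif ch == '\r':
            out.append('\\r')           # escape CR rather than track CRLF pairs
        elif ch == '\n':
            out.append(ch)              # assumption: LF allowed as line break
        elif cp < 0x20 or 0x7F <= cp <= 0x9F:
            out.append('\\u%04X' % cp)  # other C0, DEL and C1 controls
        else:
            out.append(ch)
    return "'" + ''.join(out) + "'"
```

For example, the ABNF fragment q = "'" ; don't becomes 'q = "\'" ; don\'t'. The \uXXXX form is used for controls rather than \u{hex}, so the output also stays readable by implementations that do not yet support the new syntax.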
Finally, our approach lends support to extending bsqual in CDDL, similar to the way this is done for CBOR diagnostic notation in [I-D.ietf-cbor-edn-literals]. (Note that the processing of string literals is now quite similar between CDDL and EDN, except that CDDL has ";"-based end-of-line comments, while EDN has two comment syntaxes: in-line "/"-based and end-of-line "#"-based.)¶
The CDDL example in Figure 8 demonstrates various escaping techniques. Obviously, in the literals for a and x there is no need to escape the second character, an o, as \u{6f}; this is just for demonstration. Similarly, as shown in c and z, there is no need to escape the 🁳 or the ⌘. However, escaping them may be convenient in order to limit the character repertoire of the CDDL file itself to ASCII [STD80].¶
In this example, the rules a to c and x to z all produce strings with byte-wise identical content, where a to c are text strings and x to z are byte strings. Figure 9 illustrates this by showing the output generated from the start rule in Figure 8, using pretty-printed hexadecimal.¶
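The relationship between the text string and byte string results can be made concrete. Once the escapes are resolved, both kinds carry the same UTF-8 bytes, and their CBOR encodings differ only in the major type in the head (3 for text strings, 2 for byte strings). Below is a minimal Python sketch with a hand-written CBOR head encoder (hypothetical helper names, no CBOR library). It uses an example string containing U+1F073 and U+2318 (⌘) rather than the exact strings of Figure 8.

```python
import struct


def cbor_head(major: int, arg: int) -> bytes:
    """Initial byte plus argument of a CBOR data item (RFC 8949, Section 3)."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for ai, fmt in ((24, '>B'), (25, '>H'), (26, '>I'), (27, '>Q')):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | ai]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def cbor_text(s: str) -> bytes:
    """Major type 3: text string, UTF-8 encoded."""
    data = s.encode('utf-8')
    return cbor_head(3, len(data)) + data


def cbor_bytes(b: bytes) -> bytes:
    """Major type 2: byte string."""
    return cbor_head(2, len(b)) + b


s = '\U0001F073 + \u2318'   # the string after all escapes are resolved
print(cbor_text(s).hex())                    # 6af09f81b3202b20e28c98
print(cbor_bytes(s.encode('utf-8')).hex())   # 4af09f81b3202b20e28c98
```

Only the first byte differs (0x6a versus 0x4a). The ten content bytes are identical.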
3. Small Enabling Grammar Changes
The two subsections in this section specify two small changes to the grammar that are intended to enable certain kinds of specifications. These changes are backward compatible: CDDL files that comply with [RFC8610] continue to match the updated grammar. They are not necessarily forward compatible: CDDL specifications that make use of these changes cannot necessarily be processed by existing [RFC8610] implementations.¶
3.1. Empty data models
[RFC8610] requires a CDDL file to have at least one rule.¶
This makes sense when the file has to stand alone, as a CDDL data model needs to have at least one rule to provide an entry point (start rule).¶
With CDDL modules [I-D.ietf-cbor-cddl-modules], CDDL files can also include directives, and these might be the source of all the rules that ultimately make up the module created by the file. Any other rule content in the file has to be available for directive processing, making the requirement for at least one rule cumbersome.¶
Therefore, we extend the grammar as in Figure 11 and make the existence of at least one rule a semantic constraint, to be fulfilled after processing of all directives.¶
3.2. Non-literal Tag Numbers, Simple Values
The existing ABNF syntax for expressing tags in CDDL is:¶
This means tag numbers can only be given as literal numbers (uints). Some specifications operate on ranges of tag numbers. For example, [RFC9277] uses the range of tag numbers from 1668546817 (0x63740101) to 1668612095 (0x6374FFFF) to tag specific content formats. This currently cannot be expressed in CDDL. Similar considerations apply to simple values (#7.xx).¶
This update extends the syntax to:¶
For #6, the head-number stands for the tag number. For #7, the head-number stands for the simple value if it is in the ranges 0..23 or 32..255 (as per Section 3.3 of RFC 8949 [STD94], the simple values 24..31 are not used). For 24..31, the head-number stands for the "additional information"; e.g., #7.25 or #7.<25> is a float16, etc. (All ranges mentioned here are inclusive.)¶
So the above range can be expressed in a CDDL fragment such as:¶
   ct-tag<content> = #6.<ct-tag-number>(content)
   ct-tag-number = 1668546817..1668612095 ; or use 0x63740101..0x6374FFFF
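A validator implementing #6.<ct-tag-number> has to compare the decoded tag number against a range instead of a single literal. Below is a minimal Python sketch with hypothetical function names; it is not a complete CDDL validator. It decodes the CBOR head and performs that comparison. The range constants restate the values given above, and only the tag number is checked, not the tag content.

```python
# Tag number range from the example above (RFC 9277).
CT_TAG_MIN, CT_TAG_MAX = 1668546817, 1668612095   # 0x63740101..0x6374FFFF

SIZES = {24: 1, 25: 2, 26: 4, 27: 8}  # additional information -> argument bytes


def read_head(buf: bytes, pos: int = 0):
    """Decode one CBOR head (RFC 8949, Section 3).
    Returns (major type, argument, position after the head)."""
    major, ai = buf[pos] >> 5, buf[pos] & 0x1F
    if ai < 24:
        return major, ai, pos + 1
    if ai not in SIZES:
        raise ValueError("reserved or indefinite-length additional information")
    end = pos + 1 + SIZES[ai]
    if end > len(buf):
        raise ValueError("truncated head")
    return major, int.from_bytes(buf[pos + 1:end], 'big'), end


def matches_ct_tag(buf: bytes) -> bool:
    """Does buf start with a tag whose number #6.<ct-tag-number> accepts?"""
    major, tag_number, _ = read_head(buf)
    return major == 6 and CT_TAG_MIN <= tag_number <= CT_TAG_MAX
```

For example, a data item starting with DA 63 74 01 01 carries tag number 1668546817 and matches, while one starting with C2 (tag 2) does not.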
Notes:¶
- This syntax reuses the angle bracket syntax for generics. This reuse is innocuous, as a generic parameter/argument only ever occurs after a rule name (id), while here it occurs after a dot. (Whether there is potential for human confusion can be debated; the above example deliberately uses generics as well.)¶
- The updated ABNF grammar makes it a bit more explicit that the number given after the optional dot is special: for tags and simple values, it does not give the CBOR "additional information" as it does with other uses of # in CDDL. (Adding this observation to Section 2.2.3 of [RFC8610] is the subject of [Err6575]; it is correctly noted in Section 3.6 of [RFC8610].) In hindsight, a different character than the dot might have been chosen for this special case; however, changing the grammar now would be too disruptive.¶
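The head-number rules for #7 can be read as a small matching function. Below is a Python sketch of one reading of the rules in this section (hypothetical function name). A head-number of 24..31 is compared with the additional information. Any other head-number is compared with the simple value, whether encoded in one byte (0..23) or two bytes (32..255).

```python
def matches_major7(buf: bytes, head_number: int) -> bool:
    """Would #7.<head_number> accept the major type 7 item starting at buf[0]?"""
    ib = buf[0]
    if ib >> 5 != 7:
        return False
    ai = ib & 0x1F
    if 24 <= head_number <= 31:
        # head-number is the additional information, e.g., 25 = float16
        return ai == head_number
    if ai < 24:
        return ai == head_number            # one-byte simple value 0..23
    if ai == 24 and len(buf) > 1 and buf[1] >= 32:
        return buf[1] == head_number        # two-byte simple value 32..255
    return False                            # floats, break, or not well-formed
```

Under this reading, the float16 encoding of 1.0 (F9 3C 00) matches #7.25, and false (F4) matches #7.20.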
4. Security Considerations
The grammar fixes and updates in this document are not believed to create additional security considerations. The security considerations in Section 5 of [RFC8610] do apply. In particular, the potential for confusion increases in an environment that uses a mix of CDDL tools, some of which have been updated and some of which have not, especially with respect to the changes in Section 2.¶
5. IANA Considerations
This document has no IANA actions.¶
6. References
6.1. Normative References
- [RFC8610] Birkholz, H., Vigano, C., and C. Bormann, "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, <https://www.rfc-editor.org/rfc/rfc8610>.
- [STD68] Internet Standard 68, <https://www.rfc-editor.org/info/std68>. At the time of writing, this STD comprises the following: Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, <https://www.rfc-editor.org/info/rfc5234>.
- [STD94] Internet Standard 94, <https://www.rfc-editor.org/info/std94>. At the time of writing, this STD comprises the following: Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, <https://www.rfc-editor.org/info/rfc8949>.
6.2. Informative References
- [Err6278] "Errata Report 6278", RFC 8610, <https://www.rfc-editor.org/errata/eid6278>.
- [Err6526] "Errata Report 6526", RFC 8610, <https://www.rfc-editor.org/errata/eid6526>.
- [Err6527] "Errata Report 6527", RFC 8610, <https://www.rfc-editor.org/errata/eid6527>.
- [Err6543] "Errata Report 6543", RFC 8610, <https://www.rfc-editor.org/errata/eid6543>.
- [Err6575] "Errata Report 6575", RFC 8610, <https://www.rfc-editor.org/errata/eid6575>.
- [I-D.ietf-cbor-cddl-modules] Bormann, C. and B. Moran, "CDDL Module Structure", Work in Progress, Internet-Draft, draft-ietf-cbor-cddl-modules-02, <https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-modules-02>.
- [I-D.ietf-cbor-edn-literals] Bormann, C., "CBOR Extended Diagnostic Notation (EDN): Application-Oriented Literals, ABNF, and Media Type", Work in Progress, Internet-Draft, draft-ietf-cbor-edn-literals-08, <https://datatracker.ietf.org/doc/html/draft-ietf-cbor-edn-literals-08>.
- [RFC7405] Kyzivat, P., "Case-Sensitive String Support in ABNF", RFC 7405, DOI 10.17487/RFC7405, <https://www.rfc-editor.org/rfc/rfc7405>.
- [RFC9165] Bormann, C., "Additional Control Operators for the Concise Data Definition Language (CDDL)", RFC 9165, DOI 10.17487/RFC9165, <https://www.rfc-editor.org/rfc/rfc9165>.
- [RFC9277] Richardson, M. and C. Bormann, "On Stable Storage for Items in Concise Binary Object Representation (CBOR)", RFC 9277, DOI 10.17487/RFC9277, <https://www.rfc-editor.org/rfc/rfc9277>.
- [STD80] Internet Standard 80, <https://www.rfc-editor.org/info/std80>. At the time of writing, this STD comprises the following: Cerf, V., "ASCII format for network interchange", STD 80, RFC 20, DOI 10.17487/RFC0020, <https://www.rfc-editor.org/info/rfc20>.
Appendix A. Updated Collected ABNF for CDDL
This appendix is normative.¶
It provides the full ABNF from [RFC8610] with the updates applied in the present document.¶
Acknowledgments
Many thanks go to the submitters of the errata reports addressed in this document. In one of the ensuing discussions, Doug Ewell proposed to define an ABNF rule NONASCII, of which we have included the essence. Special thanks to the reviewers Marco Tiloca, Christian Amsüss (shepherd review), and Orie Steele (AD review).¶