Internet-Draft CDDL for CSVs December 2023
Bormann & Birkholz Expires 26 June 2024 [Page]
Workgroup:
Network Working Group
Internet-Draft:
draft-bormann-cbor-cddl-csv-04
Published:
Intended Status:
Standards Track
Expires:
Authors:
C. Bormann
Universität Bremen TZI
H. Birkholz
Fraunhofer SIT

Using CDDL for CSVs

Abstract

The Concise Data Definition Language (CDDL), standardized in RFC 8610, is defined to provide data models for data shaped like JSON or CBOR.

Another representation format that is quote popular is the CSV (Comma-Separated Values) file as defined by RFC 4180.

The present document shows a way how to use CDDL to provide a data model for CSV files.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 26 June 2024.

1. Introduction

The Concise Data Definition Language (CDDL), standardized in [RFC8610], is defined to provide data models for data shaped like JSON or CBOR.

Another representation format that is quote popular is the CSV file as defined by [RFC4180].

The present document shows how to use CDDL to provide a data model for CSV files.

1.1. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

This specification uses terminology from [RFC8610].

2. CSV generic data model

The CSV format is defined in [RFC4180]. The generic data model for the data in a CSV file can be described in CDDL as:

csv = [?header, *record]
header = [+header-field]
record = [+field]
header-field = text
field = text

Note that the elements of this data model describe the interpretation of the data after processing and removal of lexical structure such as newlines, commas, escape characters, and quotation marks.

For the purposes of a specific application, the data model level structure of each field may be described in a more elaborate way, e.g., as a number. A recent proposal, [I-D.ietf-cbor-cddl-more-control], provides some CDDL control operators that could be used to express the transformation between the text string in the CSV field and the number that this text string represents at the application data model level; this could be explored in future revisions of this specification. For now, the usage of anything but "text" for a field therefore MUST be accompanied by an instruction how to perform the translation. As a preferred choice, the JSON representation of the data model item, if it exists, MAY be chosen by that instruction.

Since the CSV media type text/csv defaults to using the US-ASCII character set (i.e., [STD80]; see Section 3 of [RFC4180]), many uses of CSV will need to specify the media type parameter charset. (Note that CDDL can describe text information that is in UTF-8 form, which includes US-ASCII as that is a subset of UTF-8. If a different form that is not a subset of UTF-8 is really still needed, some rules for conversion will need to be defined by the application.)

The media type parameter header MAY be used to indicate the presence or absence of a header line; if it is not given, the grammar MUST NOT be ambiguous about the presence of a header (i.e., it MUST be either mandatory or absent).

Note that the ABNF [STD68] in [RFC4180] does not quite handle the case that charset is not us-ascii. For the purposes of the present specification, the ABNF is understood to allow all characters from the charset except %x22 and %x2C in TEXTDATA. For the purposes of the present specification, the ABNF rule CRLF is read as:

CRLF = [CR] LF

as is hinted in Section 3 of [RFC4180].

3. Examples

A simplified CSV form definition of a SID file [I-D.ietf-core-sid] might look like this:

; header = absent

SID-File = [meta-record,
            ?description-record,
            *dependency-record,
            *range-record,
            *item-record]

meta-record = ["ietf-sid-file",
               module-name: text,
               module-revision: empty / text,
               sid-file-revision: empty / text,
               sid-file-status: empty / "unpublished" / "published"]

description-record = ["description",
                      description: empty / text]

dependency-record = ["dependency",
                     module-name: text,
                     module-revision: text]

range-record = ["range",
                entry-point: uint,
                size: uint]

item-record = [; "item", -- useful to elide for bulk of file
               sid: uint
               (
                 namespace: "module" / "identity" / "feature"
                 identifier: yang-identifier
                //
                 namespace: "data"
                 identifier: schema-node-path
               )
               status: empty / "stable" / "unstable" / "obsolete"]

yang-identifier = text .abnf ("yang-identifier" .det id-abnf)
schema-node-path = text .abnf ("schema-node-path" .det id-abnf)
id-abnf = '
  schema-node-path = "/" QID *( "/" OQID)
  yang-identifier = ID
  QID = ID ":" ID
  OQID = ID [":" ID]
  ID = I *C
  I = "_" / %x41-5a / %x61-7a
  C = I / %x30-39 / "-" / "."
'

empty = ""

This CDDL data model assumes that the text strings representing the numbers entry-point, size, and sid are converted to uint. (Note that, due to the way YANG-JSON [RFC7951] defines the representation of uint64 data items, these actually are text strings in JSON, which in CSV is indistinguishable from numbers. However, the CDDL model for the CSV files will be more useful if it takes into account typical CSV applications that automatically convert integer-like text strings into numbers.)

The result of representing in CSV the sid file ietf-system.sid (as defined in Appendix A of [I-D.ietf-core-sid]) is shown in Appendix A.

4. IANA Considerations

This document makes no requests of IANA.

5. Security considerations

The security considerations of [RFC8610] and [RFC4180] apply.

6. References

6.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC4180]
Shafranovich, Y., "Common Format and MIME Type for Comma-Separated Values (CSV) Files", RFC 4180, DOI 10.17487/RFC4180, , <https://www.rfc-editor.org/rfc/rfc4180>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/rfc/rfc8174>.
[RFC8610]
Birkholz, H., Vigano, C., and C. Bormann, "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, , <https://www.rfc-editor.org/rfc/rfc8610>.
[STD68]
Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, , <https://www.rfc-editor.org/rfc/rfc5234>.

6.2. Informative References

[I-D.ietf-cbor-cddl-more-control]
Bormann, C., "More Control Operators for CDDL", Work in Progress, Internet-Draft, draft-ietf-cbor-cddl-more-control-01, , <https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-more-control-01>.
[I-D.ietf-core-sid]
Veillette, M., Pelov, A., Petrov, I., Bormann, C., and M. Richardson, "YANG Schema Item iDentifier (YANG SID)", Work in Progress, Internet-Draft, draft-ietf-core-sid-24, , <https://datatracker.ietf.org/doc/html/draft-ietf-core-sid-24>.
[RFC7951]
Lhotka, L., "JSON Encoding of Data Modeled with YANG", RFC 7951, DOI 10.17487/RFC7951, , <https://www.rfc-editor.org/rfc/rfc7951>.
[RFC8792]
Watsen, K., Auerswald, E., Farrel, A., and Q. Wu, "Handling Long Lines in Content of Internet-Drafts and RFCs", RFC 8792, DOI 10.17487/RFC8792, , <https://www.rfc-editor.org/rfc/rfc8792>.
[STD80]
Cerf, V., "ASCII format for network interchange", STD 80, RFC 20, DOI 10.17487/RFC0020, , <https://www.rfc-editor.org/rfc/rfc20>.

Appendix A. Example: ietf-system.sid represented in CSV

This appendix shows the CSV file that is automatically generated from Appendix A of [I-D.ietf-core-sid]. (Note that plaintext-based RFCs are limited to 72 columns; therefore five long lines in the CSV file have been folded as defined in [RFC8792].)

=============== NOTE: '\' line wrapping per RFC 8792 ================

ietf-sid-file,ietf-system,2014-08-06,,
description,Example sid file
dependency,ietf-yang-types,2013-07-15
dependency,ietf-inet-types,2013-07-15
dependency,ietf-netconf-acm,2018-02-14
dependency,iana-crypt-hash,2014-08-06
range,1700,100
1700,module,ietf-system,
1701,identity,authentication-method,
1702,identity,local-users,
1703,identity,radius,
1704,identity,radius-authentication-type,
1705,identity,radius-chap,
1706,identity,radius-pap,
1707,feature,authentication,
1708,feature,dns-udp-tcp-port,
1709,feature,local-users,
1710,feature,ntp,
1711,feature,ntp-udp-port,
1712,feature,radius,
1713,feature,radius-authentication,
1714,feature,timezone-name,
1715,data,/ietf-system:set-current-datetime,
1775,data,/ietf-system:set-current-datetime/input,
1776,data,/ietf-system:set-current-datetime/input/current-datetime,
1717,data,/ietf-system:system,
1718,data,/ietf-system:system-restart,
1719,data,/ietf-system:system-shutdown,
1720,data,/ietf-system:system-state,
1721,data,/ietf-system:system-state/clock,
1722,data,/ietf-system:system-state/clock/boot-datetime,
1723,data,/ietf-system:system-state/clock/current-datetime,
1724,data,/ietf-system:system-state/platform,
1725,data,/ietf-system:system-state/platform/machine,
1726,data,/ietf-system:system-state/platform/os-name,
1727,data,/ietf-system:system-state/platform/os-release,
1728,data,/ietf-system:system-state/platform/os-version,
1729,data,/ietf-system:system/authentication,
1730,data,/ietf-system:system/authentication/user,
1731,data,/ietf-system:system/authentication/user-authentication-\
                                                               order,
1732,data,/ietf-system:system/authentication/user/authorized-key,
1733,data,/ietf-system:system/authentication/user/authorized-key/\
                                                           algorithm,
1734,data,/ietf-system:system/authentication/user/authorized-key/key\
                                                               -data,
1735,data,/ietf-system:system/authentication/user/authorized-key/\
                                                                name,
1736,data,/ietf-system:system/authentication/user/name,
1737,data,/ietf-system:system/authentication/user/password,
1738,data,/ietf-system:system/clock,
1739,data,/ietf-system:system/clock/timezone-name,
1740,data,/ietf-system:system/clock/timezone-utc-offset,
1741,data,/ietf-system:system/contact,
1742,data,/ietf-system:system/dns-resolver,
1743,data,/ietf-system:system/dns-resolver/options,
1744,data,/ietf-system:system/dns-resolver/options/attempts,
1745,data,/ietf-system:system/dns-resolver/options/timeout,
1746,data,/ietf-system:system/dns-resolver/search,
1747,data,/ietf-system:system/dns-resolver/server,
1748,data,/ietf-system:system/dns-resolver/server/name,
1749,data,/ietf-system:system/dns-resolver/server/udp-and-tcp,
1750,data,/ietf-system:system/dns-resolver/server/udp-and-tcp/\
                                                             address,
1751,data,/ietf-system:system/dns-resolver/server/udp-and-tcp/port,
1752,data,/ietf-system:system/hostname,
1753,data,/ietf-system:system/location,
1754,data,/ietf-system:system/ntp,
1755,data,/ietf-system:system/ntp/enabled,
1756,data,/ietf-system:system/ntp/server,
1757,data,/ietf-system:system/ntp/server/association-type,
1758,data,/ietf-system:system/ntp/server/iburst,
1759,data,/ietf-system:system/ntp/server/name,
1760,data,/ietf-system:system/ntp/server/prefer,
1761,data,/ietf-system:system/ntp/server/udp,
1762,data,/ietf-system:system/ntp/server/udp/address,
1763,data,/ietf-system:system/ntp/server/udp/port,
1764,data,/ietf-system:system/radius,
1765,data,/ietf-system:system/radius/options,
1766,data,/ietf-system:system/radius/options/attempts,
1767,data,/ietf-system:system/radius/options/timeout,
1768,data,/ietf-system:system/radius/server,
1769,data,/ietf-system:system/radius/server/authentication-type,
1770,data,/ietf-system:system/radius/server/name,
1771,data,/ietf-system:system/radius/server/udp,
1772,data,/ietf-system:system/radius/server/udp/address,
1773,data,/ietf-system:system/radius/server/udp/authentication-port,
1774,data,/ietf-system:system/radius/server/udp/shared-secret,

Acknowledgements

Rob Wilton, unknowingly, made us write this specification. We hope it will be useful. Laurent Toutain inspired the SID CDDL format with an example.

Authors' Addresses

Carsten Bormann
Universität Bremen TZI
Postfach 330440
D-28359 Bremen
Germany
Henk Birkholz
Fraunhofer SIT
Rheinstrasse 75
64295 Darmstadt
Germany