The RFC Prolog Database (Marc Petit-Huguenin)
slides-interim-2021-aidws-01-sessa-the-rfc-prolog-database-marc-petit-huguenin-00

Meeting Slides IAB Workshop on Analyzing IETF Data (aidws) Team
Date and time 2021-11-29 14:00
Title The RFC Prolog Database (Marc Petit-Huguenin)
State Active
Last updated 2023-01-24


Network Working Group                                  M. Petit-Huguenin
Internet-Draft                                    Impedance Mismatch LLC
Intended status: Experimental                          16 September 2021
Expires: 20 March 2022


                        The RFC Prolog Database
                   draft-petithuguenin-rfc-prolog-00

Abstract

   This document explores some techniques that can be used to mine
   various sources of IETF data in order to analyze how tools and
   formal description techniques are used at the IETF, how they
   contribute to fulfilling the IETF mission, and whether an effort to
   popularize a more systematic use of tools and formal description
   techniques could improve the ability of the IETF to fulfill its
   mission.  The foundation for these techniques is a publicly available
   and actively maintained dataset, expressed as a Prolog database,
   named "RFC-Prolog".

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 20 March 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components



Petit-Huguenin            Expires 20 March 2022                 [Page 1]

Internet-Draft                 RFC Prolog                 September 2021


   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Dataset Design
     2.1.  Immutable Metadata
       2.1.1.  The "rfc" Table
       2.1.2.  The "updates" Table
       2.1.3.  The "obsoletes" Table
       2.1.4.  The "keyword" Table
       2.1.5.  The "abstract" Table
       2.1.6.  The "reference" Table
     2.2.  Mutable Metadata
       2.2.1.  The "metadatum" Table
       2.2.2.  The "bcp" Table
       2.2.3.  The "std" Table
       2.2.4.  The "fyi" Table
       2.2.5.  The "erratum" Table
       2.2.6.  The "errata" Directory
     2.3.  The Manual Tables
       2.3.1.  The "technique" Table
       2.3.2.  The "use" Table
       2.3.3.  The "prevent" Table
   3.  Dataset Life Cycle
     3.1.  Update Schedule
     3.2.  Manual Update
   4.  References
   Author's Address

1.  Introduction

   The main reason for building the dataset described in this document
   is to collect data on what we call "techniques", data that is not
   available from direct sources.

   "Technique" is a vague term that can be explained with the following
   analogy.  If an (immutable) RFC were a dam (an immovable structure
   that separates a flow of water into two parts), then we could talk
   about "upstream" as the group of activities that led to the
   publication of an RFC.  Similarly, we could talk about "downstream"
   as the group of activities that start from the publication of an
   RFC.  The word "techniques" then covers activities, both upstream
   and downstream, that are or can be mechanized.

   We call the set of upstream techniques "tools".  This is the set of
   all tools that, at some point before the publication of an RFC,
   contributed to some part of that RFC.  Some examples of tools are:

   *  idnits.
   *  ABNF checkers.
   *  NS-2, NS-3.
   *  Programs written to extract examples from packet capture files.
   *  Security property provers.
   *  etc...

   We call the set of downstream techniques "fdts", for Formal
   Description Techniques.  This is the set of formalisms in an RFC that
   would permit mechanizing the activities related to that RFC.  Some
   examples of fdts are:

   *  Bit diagrams, from which code, tests, and validators can be
      derived.
   *  ABNF, from which code, tests, and validators can be derived.
   *  Equations, which can be used to predict or validate various
      parameters.
   *  Examples, which can be used to partially validate a protocol
      implementation.
   *  State machine descriptions, from which code, tests, and validators
      can be derived.
   *  etc...

   It is expected that the analysis of the techniques used by each RFC,
   together with the analysis of the techniques that could have been
   used to prevent errata, will bring some insight into the value of
   these techniques, and into whether there is a need to focus on
   improving the use of techniques at the IETF.

   More specifically, these analyses are meant to guide the development
   of the Computerate Specifying paradigm
   [I-D.petithuguenin-computerate-specifying].  Computerate Specifying
   can be seen as a way to bridge upstream and downstream techniques,
   not only by bringing together tools and executing them as part of
   the generation of an Internet-Draft, but also by generating the fdts
   that are part of that Internet-Draft.  "Computerate Specifying"
   literally means adding computer processing to the act of writing a
   specification, an RFC in our case.

   An example of this is ABNF [RFC5234].  Traditionally, ABNF is a
   downstream technique where a specific ABNF is assembled by hand and
   verified upstream with a tool like those listed at
   https://tools.ietf.org/.  Because updating the ABNF when the
   normative text changes, verifying it, and inserting it into the
   document are done with separate tools, it is easy to see how
   skipping one of these steps can lead to an incorrect result.

   An alternative is to use Computerate Specifying, which permits
   describing an ABNF in a domain-specific language that is embedded in
   the source of an Internet-Draft, making its verification part of the
   processing of that source.  The same processing also generates the
   text of the ABNF that is inserted in the generated Internet-Draft.
   Because the ABNF originates in the same source as the text it
   formalizes, discrepancies are less likely to happen.

   The analysis of the use of techniques in RFCs, or of the lack of use
   that resulted in errata, will guide which techniques should be
   supported by Computerate Specifying.

   In addition to the tables containing the metadata for the RFCs
   published by the RFC Editor, three additional tables need to be
   populated to be able to analyze techniques:

   *  A list of all techniques and their references.
   *  A table that associates tools and fdts with each existing RFC.
   *  A table that lists, for each erratum, which tools and fdts could
      have been used to prevent that erratum.

   Some parts of these tables can be extracted from the other tables,
   but a large part will have to be manually entered.

   Although it would be trivial to add a table that contains the
   authors of each RFC to the dataset, such a table is not part of the
   dataset, to discourage analyses of correlations between individuals
   and the various possible improvements that this dataset is meant to
   help discover.

2.  Dataset Design

   The RFC Prolog [RFC-Prolog] dataset is composed of a set of Prolog
   tables and files that are populated from various IETF sources and
   complemented by hand-filled tables.  Prolog was chosen because it
   permits expressing both a database and the queries that can be run
   on it in the same language.  [Clocksin03] is the classic introductory
   book on Prolog; [O_Keefe90] and [Sterling94] complete the trilogy of
   books indispensable to the advanced Prolog programmer.

   The dataset is designed to be used with XSB 4.0 [XSB] because of its
   ability to handle large in-memory databases, but should be usable
   with other Prolog implementations.

   The dataset is composed of tables and a directory that can be grouped
   into three categories: immutable metadata, mutable metadata, and
   manual tables.

2.1.  Immutable Metadata

   The "rfc", "updates", "obsoletes", "keyword", "abstract", and
   "reference" tables contain the immutable metadata extracted from
   published RFCs.  The first five tables are grouped in the "rfcs.P"
   file, and the last one is in the "references.P" file.

2.1.1.  The "rfc" Table

   Each fact in the "rfc" table is a compound term with the following 9
   arguments:

   1.  The RFC number, without any prefix, as a Prolog number.
   2.  The title of the RFC, as a Prolog atom.
   3.  The stream that published that RFC, as a Prolog atom.  The
       current list of streams can be found with this program:

      <CODE BEGINS>
      [ordsets],
      [rfcs],
      findall(S, rfc(_, _, S, _, _, _, _, _, _), L),
      list_to_ordset(L, Streams).
      <CODE ENDS>

   4.  The status at the time of publication, as a Prolog atom.  The
       current list of statuses can be found with this program:

      <CODE BEGINS>
      [ordsets],
      [rfcs],
      findall(S, rfc(_, _, _, S, _, _, _, _, _), L),
      list_to_ordset(L, Statuses).
      <CODE ENDS>

   5.  The canonical format for the RFC, as a Prolog atom.  The current
       list of canonical formats can be found with this program:

      <CODE BEGINS>
      [ordsets],
      [rfcs],
      findall(F, rfc(_, _, _, _, F, _, _, _, _), L),
      list_to_ordset(L, Formats).
      <CODE ENDS>

   6.  The date of publication for the RFC, as a compound term with
       functor "date" and the year (a Prolog number) and month (a Prolog
       number) as arguments.
   7.  The name of the IETF Working Group that produced the RFC, as a
       Prolog atom.
   8.  The name of the IETF Area that produced the RFC, as a Prolog
       atom.
   9.  The name of the last Internet-Draft that immediately preceded the
       publication of the RFC, as a Prolog atom.

      |  NOTE: April Fools' Day RFCs also include the day of
      |  publication.  The database will be updated to reflect that.
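
   A minimal sketch of a query over this table follows.  It does not
   consult "rfcs.P": the three rfc/9 facts are hypothetical placeholders
   whose numbers, titles, and atoms (ietf, proposed_standard, xml,
   examplewg, art, and so on) are invented for illustration, and
   rfcs_by_wg_year/3 is a hypothetical helper.  It lists, in order, the
   RFCs that a given Working Group published in a given year.

```prolog
% Hypothetical placeholder facts with the 9-argument shape of rfc/9.
% The numbers, titles, atoms, and draft names are illustrative only.
rfc(90001, 'Example Protocol', ietf, proposed_standard, xml,
    date(2021, 5), examplewg, art, 'draft-ietf-examplewg-protocol').
rfc(90002, 'Example Protocol Extensions', ietf, proposed_standard, xml,
    date(2021, 9), examplewg, art, 'draft-ietf-examplewg-extensions').
rfc(90003, 'Another Example', ietf, informational, xml,
    date(2020, 2), otherwg, sec, 'draft-ietf-otherwg-example').

% rfcs_by_wg_year(+Wg, +Year, -Rfcs): sorted RFC numbers published by
% Working Group Wg in Year.  The pattern date(Year, _) only matches
% the two-argument date form.
rfcs_by_wg_year(Wg, Year, Rfcs) :-
    findall(N, rfc(N, _, _, _, _, date(Year, _), Wg, _, _), L),
    sort(L, Rfcs).
```

   With these facts, rfcs_by_wg_year(examplewg, 2021, Rfcs) binds Rfcs
   to [90001, 90002].  Because the date pattern has two arguments, it
   would not match a three-argument date such as the one anticipated
   for April Fools' Day RFCs.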

2.1.2.  The "updates" Table

   Each fact in the "updates" table is a compound term with the
   following 2 arguments:

   1.  The RFC number of the RFC that updated another RFC, as a Prolog
       number.
   2.  The RFC number of the RFC that was updated, as a Prolog number.

   The following program defines update_chain/2, which, on
   backtracking, enumerates every chain of updates starting from a
   given RFC:

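   <CODE BEGINS>
   :- [rfcs].

   % update_chain(+Rfc, -Chain): Chain starts with Rfc and follows the
   % "updates" relation towards older RFCs.  Each solution returned on
   % backtracking is one chain, the last one being [Rfc] alone.
   update_chain(Rfc, [Rfc|Chain]) :-
     updates(Rfc, Prev),
     update_chain(Prev, Chain).
   update_chain(Rfc, [Rfc]).
   <CODE ENDS>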
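
   The update_chain/2 predicate also returns every shorter prefix of a
   chain.  The self-contained sketch below replaces the consult of
   "rfcs.P" with three hypothetical updates/2 facts, reuses the same
   update_chain/2 definition, and adds a hypothetical
   full_update_chain/2 predicate that keeps only the chains that cannot
   be extended further.

```prolog
% Hypothetical sample facts standing in for updates/2 from "rfcs.P":
% 90003 updates 90002, which updates 90001; 90004 also updates 90001.
updates(90003, 90002).
updates(90002, 90001).
updates(90004, 90001).

% update_chain(+Rfc, -Chain): each solution on backtracking is one
% chain of updates starting from Rfc.
update_chain(Rfc, [Rfc|Chain]) :-
    updates(Rfc, Prev),
    update_chain(Prev, Chain).
update_chain(Rfc, [Rfc]).

% last_element(+List, -Last): Last is the final element of List.
last_element([X], X).
last_element([_, Y|T], X) :-
    last_element([Y|T], X).

% full_update_chain(+Rfc, -Chain): only the chains that cannot be
% extended, i.e. whose last RFC does not update any other RFC.
full_update_chain(Rfc, Chain) :-
    update_chain(Rfc, Chain),
    last_element(Chain, Last),
    \+ updates(Last, _).
```

   With these facts, findall(C, full_update_chain(90003, C), Cs) binds
   Cs to [[90003, 90002, 90001]], whereas collecting update_chain(90003,
   C) directly also returns the shorter chains [90003, 90002] and
   [90003].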

2.1.3.  The "obsoletes" Table

   Each fact in the "obsoletes" table is a compound term with the
   following 2 arguments:

   1.  The RFC number of the RFC that obsoleted another RFC, as a Prolog
       number.
   2.  The RFC number of the RFC that was obsoleted, as a Prolog number.
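
   Because an obsoleting RFC can itself be obsoleted later, a common
   question is which RFC currently replaces a given one.  The following
   sketch answers it with a hypothetical latest_replacement/2
   predicate; the two sample facts (RFC 4234 obsoletes RFC 2234, and
   RFC 5234 obsoletes RFC 4234) are written by hand rather than loaded
   from "rfcs.P".

```prolog
% Sample facts: RFC 4234 obsoletes RFC 2234, and RFC 5234 obsoletes
% RFC 4234.
obsoletes(4234, 2234).
obsoletes(5234, 4234).

% latest_replacement(+Rfc, -Latest): follows "obsoleted by" links from
% Rfc until it reaches an RFC that no entry obsoletes.  If an RFC is
% obsoleted by several RFCs, each branch yields its own solution.
% Assumes the obsoletes/2 relation contains no cycles.
latest_replacement(Rfc, Latest) :-
    obsoletes(Newer, Rfc),
    latest_replacement(Newer, Latest).
latest_replacement(Rfc, Rfc) :-
    \+ obsoletes(_, Rfc).
```

   Here, latest_replacement(2234, Latest) yields Latest = 5234, and an
   RFC that was never obsoleted is returned unchanged.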

2.1.4.  The "keyword" Table

   Each fact in the "keyword" table is a compound term with the
   following 2 arguments:

   1.  The RFC number of an RFC, as a Prolog number.
   2.  The keyword, as a Prolog atom.

2.1.5.  The "abstract" Table

   Each fact in the "abstract" table is a compound term with the
   following 2 arguments:

   1.  The RFC number of an RFC, as a Prolog number.
   2.  The abstract, as a Prolog atom.
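
   The "keyword" and "abstract" tables lend themselves to simple topic
   searches.  The sketch below, with hypothetical placeholder facts and
   hypothetical predicate names, combines an exact keyword match with a
   case-sensitive substring search in abstracts, using the ISO builtin
   sub_atom/5.

```prolog
% Hypothetical placeholder facts shaped like keyword/2 and abstract/2.
keyword(90001, abnf).
keyword(90002, 'state machine').

abstract(90001, 'This document defines a message syntax using ABNF.').
abstract(90002, 'This document specifies a protocol state machine.').

% abstract_mentions(+Text, -Rfc): the abstract of Rfc contains the atom
% Text (case-sensitive).  once/1 avoids duplicate answers when Text
% occurs several times in the same abstract.
abstract_mentions(Text, Rfc) :-
    abstract(Rfc, Abstract),
    once(sub_atom(Abstract, _, _, _, Text)).

% rfc_about(+Keyword, +Text, -Rfcs): RFCs tagged with Keyword or whose
% abstract mentions Text, as a sorted list without duplicates.
rfc_about(Keyword, Text, Rfcs) :-
    findall(R, ( keyword(R, Keyword) ; abstract_mentions(Text, R) ), L),
    sort(L, Rfcs).
```

   For example, rfc_about(abnf, 'ABNF', Rfcs) binds Rfcs to [90001].  A
   case-insensitive search would require normalizing both atoms first,
   which this sketch does not do.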

2.1.6.  The "reference" Table

   Each fact in the "reference" table is a compound term with the
   following 3 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.
   2.  The status of the reference, as a Prolog atom.  The current list
       of statuses can be found with this program:

      <CODE BEGINS>
      [ordsets],
      [references],
      findall(S, reference(_, S, _), L),
      list_to_ordset(L, Statuses).
      <CODE ENDS>

   3.  The referenced document, as a compound term with one argument.
       The functor determines the type of resource, and the argument is
       the identifier for that resource.  The current list of all
       reference types can be found with this program:

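      <CODE BEGINS>
      :- [ordsets].
      :- [references].

      % types(-T): T is the functor (resource type) of a referenced
      % document.
      types(T) :-
        reference(_, _, I),
        functor(I, T, _).

      % reference_types(-Types): ordered set of all reference types.
      reference_types(Types) :-
        findall(T, types(T), L),
        list_to_ordset(L, Types).
      <CODE ENDS>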
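
   Once the reference types and statuses are known, the "reference"
   table can answer questions such as which RFCs normatively reference a
   given document.  In the following sketch, the status atoms normative
   and informative and the rfc/1 reference functor are assumptions made
   for illustration; the actual values are the ones returned by the
   queries above.  The facts are hypothetical placeholders, and
   referencing/3 is a hypothetical helper.

```prolog
% Hypothetical placeholder facts shaped like reference/3.  The status
% atoms and the rfc/1 reference functor are assumptions.
reference(90001, normative, rfc(5234)).
reference(90002, informative, rfc(5234)).
reference(90002, normative, rfc(2119)).

% referencing(?Status, +Doc, -Rfcs): sorted RFCs that reference Doc
% with the given Status; leave Status unbound to accept any status.
referencing(Status, Doc, Rfcs) :-
    findall(R, reference(R, Status, Doc), L),
    sort(L, Rfcs).
```

   With these facts, referencing(normative, rfc(5234), Rfcs) yields
   [90001], and leaving the status unbound, as in referencing(_,
   rfc(5234), Rfcs), yields [90001, 90002].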

2.2.  Mutable Metadata

   The "metadatum", "bcp", "std", "fyi", and "erratum" tables contain
   the mutable metadata extracted from files provided by the RFC Editor.
   The first four tables are grouped in the "metadata.P" file, and the
   last one is in the "errata.P" file.

2.2.1.  The "metadatum" Table

   Each fact in the "metadatum" table is a compound term with the
   following 2 arguments:

   1.  The RFC number, without any prefix, as a Prolog number.
   2.  The current status for the RFC, as a Prolog atom.

      |  NOTE: One piece of information missing from that table is the
      |  current email address that should be used to discuss the RFC.
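
   Since the "rfc" table records the status at publication and the
   "metadatum" table the current status, joining them shows which RFCs
   changed status after publication.  The sketch below uses
   hypothetical placeholder facts for both tables; the status atoms and
   the status_changed/3 predicate are illustrative assumptions.

```prolog
% Hypothetical placeholder facts: rfc/9 records the status at
% publication, metadatum/2 the current status.  The status atoms are
% illustrative only.
rfc(90001, 'Example Protocol', ietf, proposed_standard, xml,
    date(2015, 5), examplewg, art, 'draft-ietf-examplewg-protocol').
rfc(90002, 'Example Guidelines', ietf, informational, xml,
    date(2018, 1), examplewg, art, 'draft-ietf-examplewg-guidelines').

metadatum(90001, internet_standard).
metadatum(90002, informational).

% status_changed(?Rfc, ?Published, ?Current): the current status of
% Rfc differs from its status at the time of publication.
status_changed(Rfc, Published, Current) :-
    rfc(Rfc, _, _, Published, _, _, _, _, _),
    metadatum(Rfc, Current),
    Published \== Current.
```

   Here, findall(R-P-C, status_changed(R, P, C), L) binds L to
   [90001-proposed_standard-internet_standard].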

2.2.2.  The "bcp" Table

   Each fact in the "bcp" table is a compound term with the following 2
   arguments:

   1.  The RFC number, without any prefix, as a Prolog number.
   2.  The current BCP number for that RFC, as a Prolog number.

2.2.3.  The "std" Table

   Each fact in the "std" table is a compound term with the following 2
   arguments:

   1.  The RFC number, without any prefix, as a Prolog number.
   2.  The current STD number for that RFC, as a Prolog number.

2.2.4.  The "fyi" Table

   Each fact in the "fyi" table is a compound term with the following 2
   arguments:

   1.  The RFC number, without any prefix, as a Prolog number.
   2.  The current FYI number for that RFC, as a Prolog number.
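
   The "bcp", "std", and "fyi" tables map RFCs to subseries numbers, so
   listing the RFCs that make up a given BCP, or the subseries an RFC
   belongs to, is a short query.  The sample facts below reflect BCP 14
   (RFC 2119 and RFC 8174) and STD 68 (RFC 5234) and are written by
   hand rather than loaded from "metadata.P"; the predicate names and
   the bcp(N), std(N), and fyi(N) terms returned by subseries/2 are a
   hypothetical representation.

```prolog
% No FYI entries in this sample: declaring fyi/2 dynamic makes queries
% on it fail instead of raising an existence error.
:- dynamic fyi/2.

% Sample facts: BCP 14 is RFC 2119 and RFC 8174; STD 68 is RFC 5234.
bcp(2119, 14).
bcp(8174, 14).
std(5234, 68).

% rfcs_in_bcp(+Bcp, -Rfcs): sorted RFC numbers currently in BCP Bcp.
rfcs_in_bcp(Bcp, Rfcs) :-
    findall(R, bcp(R, Bcp), L),
    sort(L, Rfcs).

% subseries(?Rfc, ?Doc): Rfc belongs to the subseries document Doc,
% written as bcp(N), std(N), or fyi(N).
subseries(Rfc, bcp(N)) :- bcp(Rfc, N).
subseries(Rfc, std(N)) :- std(Rfc, N).
subseries(Rfc, fyi(N)) :- fyi(Rfc, N).
```

   With these facts, rfcs_in_bcp(14, Rfcs) binds Rfcs to [2119, 8174],
   and subseries(5234, Doc) yields Doc = std(68).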

2.2.5.  The "erratum" Table

   Each fact in the "erratum" table is a compound term with the
   following 7 arguments:

   1.  The erratum number, without any prefix, as a Prolog number.
   2.  The list of formats this erratum applies to, as a list of Prolog
       atoms.
   3.  The RFC number this erratum applies to, without any prefix, as a
       Prolog number.
   4.  The name of the reporter for this erratum, as a Prolog atom.
   5.  The date the erratum was reported, as a compound term made of the
       "date" functor and the year, month, and day, all 3 as Prolog
       numbers.
   6.  The type of the erratum.
   7.  The current status of the erratum.  If the status was modified,
       then the status is a compound term, with the name of the verifier
       as first argument and, if available, the date of the modification
       as second argument.
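
   Because the status argument is either a plain value or, after a
   modification, a compound term, queries on it need to handle both
   shapes.  The following sketch assumes that an unmodified status is
   an atom and that the functor of a modified status is the status
   name; the facts, atoms, and predicate names are hypothetical
   placeholders.

```prolog
% Hypothetical placeholder facts shaped like erratum/7.  The format,
% type, and status atoms are illustrative only.
erratum(90100, [txt], 90001, 'A. Reporter', date(2021, 3, 4),
        technical, reported).
erratum(90101, [txt, html], 90001, 'B. Reporter', date(2021, 6, 1),
        editorial, verified('C. Verifier', date(2021, 7, 2))).
erratum(90102, [txt], 90002, 'A. Reporter', date(2020, 1, 15),
        technical, rejected('C. Verifier')).

% status_name(+Status, -Name): an unmodified status is assumed to be a
% plain atom; a modified status is a compound term whose functor is
% assumed to name the status.
status_name(Status, Name) :-
    (   atom(Status)
    ->  Name = Status
    ;   functor(Status, Name, _)
    ).

% errata_with_status(+Name, -Eids): sorted erratum numbers by status.
errata_with_status(Name, Eids) :-
    findall(E, ( erratum(E, _, _, _, _, _, S), status_name(S, Name) ), L),
    sort(L, Eids).

% errata_count(+Rfc, -Count): number of errata filed against Rfc.
errata_count(Rfc, Count) :-
    findall(E, erratum(E, _, Rfc, _, _, _, _), L),
    length(L, Count).
```

   With these facts, errata_with_status(verified, Eids) binds Eids to
   [90101], and errata_count(90001, Count) binds Count to 2.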

2.2.6.  The "errata" Directory

   The text of each erratum is stored as an individual HTML file in the
   "errata" directory.  The name of the file is
   "errata/erratum<eid>.html", with <eid> replaced by the erratum
   identifier.
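
   The naming rule above can be turned into a small helper that
   computes the file name for a given erratum identifier.  The predicate
   name erratum_file/2 is hypothetical; the sketch uses only ISO
   builtins and does not check that the file exists.

```prolog
% erratum_file(+Eid, -Path): builds the relative path of the HTML file
% holding the text of erratum Eid: "errata/erratum<eid>.html".
erratum_file(Eid, Path) :-
    number_codes(Eid, Codes),
    atom_codes(Id, Codes),
    atom_concat('errata/erratum', Id, Base),
    atom_concat(Base, '.html', Path).
```

   For example, erratum_file(1234, Path) binds Path to
   'errata/erratum1234.html', which can then be passed to open/3 to read
   the erratum text.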

2.3.  The Manual Tables

   The "technique", "usage", and "prevention" tables contain either
   manually entered facts or the results of queries on the other tables.
   Each table is stored in its own file, respectively "techniques.P",
   "usages.P", and "preventions.P".

      |  NOTE: The format of these tables will probably change.

2.3.1.  The "technique" Table

   Each fact in the "technique" table is a compound term with the
   following 3 arguments:

   1.  The name of the technique, which is a one-argument compound term
       whose functor is either "tool" or "fdt" and whose argument is the
       name of the technique, as a Prolog atom.
   2.  An indication of whether the reference is solely about the
       technique or the technique is defined in a document that is
       mostly about something else, respectively as a "standalone" atom
       or an "adhoc" atom.
   3.  The reference of the technique, in the same format as in the
       "reference" table.
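
   Because the first argument wraps the technique name in a "tool" or
   "fdt" functor, the univ operator (=..) can select all techniques of
   one kind.  In the sketch below, only the ABNF entry points to a real
   document (RFC 5234); the other facts, the rfc/1 and doc/1 reference
   forms, and techniques_of_kind/2 are hypothetical placeholders.

```prolog
% Hypothetical placeholder facts shaped like technique/3.  The rfc/1
% and doc/1 reference forms are assumptions.
technique(fdt(abnf), standalone, rfc(5234)).
technique(fdt(state_machine), adhoc, doc(example_state_machine)).
technique(tool(idnits), standalone, doc(idnits)).

% techniques_of_kind(+Kind, -Names): sorted names of all techniques
% whose term has functor Kind (tool or fdt).
techniques_of_kind(Kind, Names) :-
    findall(Name, ( technique(T, _, _), T =.. [Kind, Name] ), L),
    sort(L, Names).
```

   Here, techniques_of_kind(fdt, Names) binds Names to [abnf,
   state_machine], and techniques_of_kind(tool, Names) binds it to
   [idnits].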

2.3.2.  The "use" Table

   Each fact in the "use" table is a compound term with the following 3
   arguments:

   1.  The RFC number, without any prefix, as a Prolog number.
   2.  An indication of whether the technique is used by reference or
       redefined, respectively as a "reference" atom or as a "redefine"
       atom.
   3.  The name of the technique, which is a one-argument compound term
       whose functor is either "tool" or "fdt" and whose argument is the
       name of the technique, as a Prolog atom.
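
   The "use" table supports questions such as which RFCs rely on a
   given technique and which of them redefine it instead of referencing
   an existing definition.  The facts and the predicate names in the
   following sketch are hypothetical placeholders.

```prolog
% Hypothetical placeholder facts shaped like use/3.
use(90001, reference, fdt(abnf)).
use(90002, redefine, fdt(abnf)).
use(90002, reference, tool(idnits)).

% rfcs_using(+Technique, -Rfcs): sorted RFCs that use Technique, either
% by reference or by redefining it.
rfcs_using(Technique, Rfcs) :-
    findall(R, use(R, _, Technique), L),
    sort(L, Rfcs).

% redefining_rfcs(+Technique, -Rfcs): sorted RFCs that redefine
% Technique instead of referencing an existing definition.
redefining_rfcs(Technique, Rfcs) :-
    findall(R, use(R, redefine, Technique), L),
    sort(L, Rfcs).
```

   With these facts, rfcs_using(fdt(abnf), Rfcs) binds Rfcs to [90001,
   90002], while redefining_rfcs(fdt(abnf), Rfcs) binds it to [90002].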

2.3.3.  The "prevent" Table

   Each fact in the "prevent" table is a compound term with the
   following 2 arguments:

   1.  The name of the technique, which is a one-argument compound term
       whose functor is either "tool" or "fdt" and whose argument is the
       name of the technique, as a Prolog atom.
   2.  The erratum number, without any prefix, as a Prolog number.
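
   Joining the "prevent" table with the "erratum" table links techniques
   back to the RFCs whose errata they could have prevented, which is the
   kind of analysis described in the Introduction.  The facts in the
   following sketch are hypothetical placeholders, and
   preventing_techniques/2 and preventable_count/2 are hypothetical
   helpers.

```prolog
% Hypothetical placeholder facts: erratum/7 as in Section 2.2.5 and
% prevent/2 as described above.
erratum(90100, [txt], 90001, 'A. Reporter', date(2021, 3, 4),
        technical, reported).
erratum(90101, [txt], 90002, 'B. Reporter', date(2021, 6, 1),
        technical, reported).

prevent(fdt(abnf), 90100).
prevent(tool(abnf_checker), 90100).
prevent(fdt(abnf), 90101).

% preventing_techniques(+Rfc, -Techniques): sorted techniques that
% could have prevented at least one erratum filed against Rfc.
preventing_techniques(Rfc, Techniques) :-
    findall(T, ( erratum(E, _, Rfc, _, _, _, _), prevent(T, E) ), L),
    sort(L, Techniques).

% preventable_count(+Technique, -Count): number of distinct errata
% that Technique could have prevented.
preventable_count(Technique, Count) :-
    findall(E, prevent(Technique, E), L0),
    sort(L0, L),
    length(L, Count).
```

   With these facts, preventing_techniques(90001, Ts) binds Ts to
   [fdt(abnf), tool(abnf_checker)], in the standard order of terms, and
   preventable_count(fdt(abnf), Count) binds Count to 2.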

3.  Dataset Life Cycle

   The dataset is distributed as a git repository that can be cloned
   with the following command:

   git clone git://shalmaneser.org/rfc-prolog

   This git repository is mirrored in various locations around the
   world.  The "dig +dnssec txt shalmaneser.org" command returns, for
   each of these locations, its GPS coordinates in decimal degrees and
   its shalmaneser.org subdomain.  This can be used to find the closest
   location and substitute its subdomain into the git URL above.

      |  NOTE: The git repository does not currently contain the manual
      |  tables.  These will be added at the same time as the
      |  conclusions of that work are submitted for public review.  The
      |  rfc-prolog dataset is distributed without these tables in case
      |  other parties want to use it for their own analysis.

3.1.  Update Schedule

   New RFCs, new errata, and modifications to mutable metadata require
   the dataset to be kept up to date.  New tables or code processing
   refinements should also be distributed in a timely manner.

   Until 29 November 2021, the git repository is updated each Monday
   before 5:00pm PT with a new commit that covers these changes.  After
   that date, the dataset will be updated each Saturday before 5:00pm
   PT.

   The date of the next update is also inserted in the commit message
   of the latest commit.

3.2.  Manual Update

   The code that is used to build the dataset is distributed together
   with the dataset, so the dataset can continue to be updated in case
   the current maintainer is unable to do so.

   The process to update the dataset is described in the README.adoc
   file distributed in the git repository.

4.  References

   [Clocksin03]
              Clocksin, W. F. and C. S. Mellish, "Programming in
              Prolog", Berlin; New York: Springer-Verlag, 2003.

   [I-D.petithuguenin-computerate-specifying]
              Petit-Huguenin, M., "The Computerate Specifying Paradigm",
              Work in Progress, Internet-Draft, draft-petithuguenin-
              computerate-specifying, 6 September 2021,
              <https://datatracker.ietf.org/doc/draft-petithuguenin-
              computerate-specifying/>.

   [O_Keefe90]
              O'Keefe, R. A., "The Craft of Prolog", Cambridge: MIT
              Press, 1990.

   [RFC-Prolog]
              Petit-Huguenin, M., "The RFC Prolog Dataset",
              <git://shalmaneser.org/rfc-prolog>.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 5234, DOI 10.17487/RFC5234,
              January 2008, <https://www.rfc-editor.org/info/rfc5234>.

   [Sterling94]
              Sterling, L. and E. Y. Shapiro, "The Art of Prolog",
              Cambridge, Mass.: MIT Press, 1994.

   [XSB]      "XSB", <http://xsb.sourceforge.net/>.

Author's Address

   Marc Petit-Huguenin
   Impedance Mismatch LLC

   Email: marc@petit-huguenin.org
   URI:   hallway@jabber.ietf.org/MPH