Proposal for an Opt-Out Vocabulary
draft-keller-aipref-vocab-01

Document Type Active Internet-Draft (candidate for aipref WG)
Author Paul Keller
Last updated 2025-04-10 (Latest revision 2025-03-30)
RFC stream Internet Engineering Task Force (IETF)
Intended RFC status (None)
Additional resources Mailing list discussion
Stream WG state Call For Adoption By WG Issued
Document shepherd (None)
IESG IESG state I-D Exists
Consensus boilerplate Unknown
Telechat date (None)
Responsible AD (None)
Send notices to (None)
AI Preferences                                                 P. Keller
Internet-Draft                                               Open Future
Intended status: Informational                             28 March 2025
Expires: 29 September 2025

                   Proposal for an Opt-Out Vocabulary
                      draft-keller-aipref-vocab-01

Abstract

   This document proposes a standardized vocabulary of use cases that
   can be targeted when expressing machine-readable opt-outs related to
   Text and Data Mining (TDM) and AI training.  The vocabulary is
   agnostic to specific opt-out mechanisms and enables declaring parties
   to communicate restrictions or permissions regarding the use of their
   digital assets in a structured and interoperable manner.  It defines
   three key use cases—TDM, AI Training, and Generative AI
   Training—which can be referenced by opt-out systems to ensure
   consistent interpretation across different implementations.

About This Document

   This note is to be removed before publishing as an RFC.

   The latest revision of this draft can be found at
   https://paul2keller.github.io/opt-out-vocab-id/draft-keller-aipref-
   vocab.html.  Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-keller-aipref-vocab/.

   Discussion of this document takes place on the AI Preferences
   Working Group mailing list (mailto:ai-control@ietf.org), which is
   archived at https://mailarchive.ietf.org/arch/browse/ai-control/.
   Subscribe at https://www.ietf.org/mailman/listinfo/ai-control/.

   Source for this draft and an issue tracker can be found at
   https://github.com/paul2keller/opt-out-vocab-id.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

Keller                  Expires 29 September 2025               [Page 1]
Internet-Draft                Opt-Out Vocab                   March 2025

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 29 September 2025.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Definitions
   4.  Vocabulary Structure
   5.  Proposed Vocabulary
     5.1.  Relationship with more specific instructions
     5.2.  Relationship between categories
   6.  Usage
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Acknowledgments
   Author's Address

1.  Introduction

   The purpose of this document is to provide a common vocabulary that
   can be used for machine-readable opt-outs by parties who wish to
   restrict the use of their assets for the purpose of AI training and
   other forms of Text and Data Mining (TDM).

   The elements of the vocabulary can be used to describe, in a
   standardized way, the types of uses that a declaring party may wish
   to restrict (or allow), thereby ensuring that opt-outs can be
   communicated, processed and stored in a consistent and interoperable
   manner.

   The vocabulary is agnostic to the technical implementations of opt-
   out systems and is designed to ensure that opt-out information can be
   effectively exchanged between different systems.  The vocabulary is
   intended to govern the use of works in the context of training AI
   models and other forms of TDM but does not concern itself with the
   collection of training data (crawling).  In particular, the
   vocabulary is not intended for expressing instructions or
   restrictions related to crawling for the purpose of building a search
   index, as more specific standards and protocols already exist for
   this purpose, including but not limited to [RFC9309].

   The vocabulary is intended to work both in contexts where opt-outs
   expressed by the declaring party give rise to legal obligations
   (such as rights reservations made by rightholders) and in contexts
   where this is not the case.  It is without prejudice to applicable
   laws and the applicability of exceptions and limitations.

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Definitions

   *  *Asset:* A digital file or stream of data, usually with associated
      metadata.

   *  *Declaring party:* The entity that expresses an opt-out with
      regard to an Asset.

4.  Vocabulary Structure

   The vocabulary consists of the overarching TDM (Text and Data Mining)
   category and a number of specific use cases that can be addressed
   independently.  The overarching category TDM is based on the
   definition of Text and Data Mining in Article 2(2) of [EUCD2019].

5.  Proposed Vocabulary

   The following categories are defined for use in the opt-out
   vocabulary:

   *  *TDM* : Text and Data Mining.  The act of using one or more assets
      in the context of any automated analytical technique aimed at
      analyzing text and data in digital form in order to generate
      information which includes but is not limited to patterns, trends
      and correlations.

   *  *AI Training* : The act of training AI models.

   *  *Generative AI Training* : The act of training General Purpose AI
      models that have the capacity to generate text, images or other
      forms of synthetic content, or the act of training other types of
      AI models that have the purpose of generating text, images or
      other forms of synthetic content.

   This list of specific use cases may be expanded in the future, should
   a consensus emerge between stakeholders, to include categories that
   address additional use cases as they emerge.  In addition to these
   categories defined in the vocabulary, it is also expected that some
   systems implementing this vocabulary may extend this list with
   additional categories for their particular needs.
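
   As an illustration only, the three categories could be represented in
   software as an enumeration, together with a small record type for a
   single declaration.  The following is a minimal Python sketch.  The
   names Category and Declaration, the string identifiers, and the
   "allowed" field are hypothetical and are not defined by this
   document, which remains agnostic to the opt-out mechanism.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The three categories listed above (string identifiers are hypothetical)."""
    TDM = "tdm"
    AI_TRAINING = "ai-training"
    GENERATIVE_AI_TRAINING = "genai-training"


@dataclass(frozen=True)
class Declaration:
    """One statement by a declaring party about one asset and one category."""
    asset: str            # identifier of the asset, e.g. a URL (assumption)
    declaring_party: str  # the entity expressing the opt-out
    category: Category    # the use that the statement targets
    allowed: bool         # False = opt-out (restrict), True = allow


# Example: opting a single asset out of Generative AI Training.
optout = Declaration(
    asset="https://example.com/photo.jpg",
    declaring_party="Example Rightholder",
    category=Category.GENERATIVE_AI_TRAINING,
    allowed=False,
)
print(optout.category.value, optout.allowed)
```

   A system that extends the list for its own needs would, under this
   sketch, keep its extra categories in a separate enumeration so that
   the shared terms stay recognizable to other implementations.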

5.1.  Relationship with more specific instructions

   The vocabulary does not preclude the use of other specific
   categories.  An opt-out based on this vocabulary shall not be
   interpreted as restricting the use of an asset strictly for the
   purpose of search and discovery, as long as no restriction is
   declared through search-specific means such as [RFC9309].

   When using this vocabulary, more specific instructions (whether based
   on the vocabulary or derived from other protocols) should be given
   preference over less specific ones.
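
   One possible reading of this preference rule, restricted to
   instructions based on the vocabulary itself, is sketched below in
   Python.  It assumes that a declaring party may attach either an
   opt-out or a permission to each category, and that the statement on
   the narrowest category covering a given use wins.  The function name,
   the string identifiers, and the True/False encoding are hypothetical.

```python
# Categories ranked from least to most specific, following the nesting
# described in Section 5.2.  The string identifiers are hypothetical.
SPECIFICITY = {"tdm": 0, "ai-training": 1, "genai-training": 2}

# Each use is covered by itself and by every broader category around it.
COVERED_BY = {
    "tdm": ["tdm"],
    "ai-training": ["ai-training", "tdm"],
    "genai-training": ["genai-training", "ai-training", "tdm"],
}


def resolve(use, statements):
    """Return True (allowed), False (opted out), or None (nothing declared).

    `statements` maps a category to True (allow) or False (opt-out).
    Among the statements that cover `use`, the most specific one wins.
    """
    applicable = [c for c in COVERED_BY[use] if c in statements]
    if not applicable:
        return None
    most_specific = max(applicable, key=SPECIFICITY.__getitem__)
    return statements[most_specific]


# A general TDM opt-out combined with a narrower permission.
statements = {"tdm": False, "genai-training": True}
print(resolve("genai-training", statements))  # True
print(resolve("ai-training", statements))     # False
print(resolve("tdm", statements))             # False
```

   Instructions derived from other protocols are outside this sketch;
   how their specificity compares with that of a vocabulary category is
   left to the system that combines them.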

5.2.  Relationship between categories

   The TDM category is the overarching category that includes the AI
   Training category.  Generative AI Training is a subset of the AI
   Training category.  Both AI Training and Generative AI Training are
   considered to be forms of TDM.  As such, when a declaring party opts
   out of TDM, they also opt out of these categories.  AI model
   developers processing opt-outs must therefore interpret an opt-out
   from TDM to also mean an opt-out from Generative AI Training and AI
   Training.

   The figure below shows the relationship between the currently defined
   categories:

+--------------------------------------------------------------------------+
|                                                                          |
|                          Text and Data Mining (TDM)                      |
|                                                                          |
| +--------------------------------------------+  +- - - - - - - - - - -+  |
| |  +--------------------------+              |  |                     |  |
| |  |                          |              |                           |
| |  |                          |              |  |    [possibly]:      |  |
| |  | Generative AI Training   |  AI Training |                           |
| |  |                          |              |  |  Other use cases    |  |
| |  |                          |              |                           |
| |  +--------------------------+              |  |                     |  |
| +--------------------------------------------+  +- - - - - - - - - - -+  |
|                                                                          |
+--------------------------------------------------------------------------+

              Figure 1: Overview of proposed vocabulary
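
   The nesting in Figure 1 can be sketched as a parent table that is
   walked upwards when an opt-out is checked.  This Python sketch covers
   opt-outs only.  The identifiers and the function name are
   hypothetical.  Treating an opt-out from AI Training as also covering
   Generative AI Training is an inference from the subset relationship
   and is not stated explicitly in the text above.

```python
# Parent of each category in the nesting shown in Figure 1.
# Identifiers are hypothetical; None marks the overarching category.
PARENT = {
    "tdm": None,
    "ai-training": "tdm",
    "genai-training": "ai-training",
}


def is_opted_out(use, opted_out):
    """Return True if `use`, or any category enclosing it, is opted out of."""
    current = use
    while current is not None:
        if current in opted_out:
            return True
        current = PARENT[current]
    return False


# An opt-out from TDM reaches both narrower categories.
print(is_opted_out("ai-training", {"tdm"}))     # True
print(is_opted_out("genai-training", {"tdm"}))  # True
# An opt-out from a narrower category does not reach a broader one.
print(is_opted_out("tdm", {"genai-training"}))  # False
```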

   Systems referencing the vocabulary must not introduce additional
   categories that include existing categories defined in the
   vocabulary, or otherwise introduce additional hierarchical
   relationships.
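
   A system that accepts locally defined categories could check this
   constraint when a category is registered.  The Python sketch below
   assumes one particular reading: an additional category sits directly
   inside TDM, like the "Other use cases" box in Figure 1, and encloses
   no other category.  The function, its parameters, and the example
   category names are hypothetical.

```python
# Nesting of the categories defined in this document (identifiers are
# hypothetical; None marks the overarching category).
CORE_PARENT = {"tdm": None, "ai-training": "tdm", "genai-training": "ai-training"}


def validate_extension(name, parent="tdm", includes=()):
    """Raise ValueError if a proposed extra category would add hierarchy.

    Assumed reading: an extension must sit directly under "tdm" and must
    not enclose any other category.
    """
    if name in CORE_PARENT:
        raise ValueError(f"{name!r} is already defined by the vocabulary")
    if includes:
        raise ValueError(
            f"{name!r} must not include other categories: {list(includes)}"
        )
    if parent != "tdm":
        raise ValueError(
            f"{name!r} must sit directly under 'tdm', not under {parent!r}"
        )
    return name


# Accepted: a flat sibling of AI Training inside TDM.
validate_extension("sentiment-analysis")

# Rejected: a category that would enclose an existing one.
try:
    validate_extension("all-ai-uses", includes=["ai-training"])
except ValueError as err:
    print("rejected:", err)
```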

6.  Usage

   The vocabulary may be used by declaring that an opt-out system or
   entity expressing or processing opt-outs uses the terms defined in
   the "Proposed Vocabulary" section above, directly or via mappings, in
   accordance with how they are defined in this document.
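
   Use "via mappings" could look like the following Python sketch, in
   which a system keeps its own labels and publishes a table that
   translates them into the terms of this vocabulary.  The local labels
   ("no-mining", "no-ml", "no-genai"), the vocabulary identifiers, and
   the function name are all hypothetical.

```python
# Hypothetical local labels used by some opt-out system, mapped onto the
# three categories of this vocabulary (identifiers are also hypothetical).
LOCAL_TO_VOCAB = {
    "no-mining": "tdm",
    "no-ml": "ai-training",
    "no-genai": "genai-training",
}


def to_vocabulary(local_labels):
    """Translate local labels into vocabulary categories.

    Unknown labels are rejected rather than silently dropped, so that an
    opt-out is never lost in translation.
    """
    unknown = sorted(set(local_labels) - set(LOCAL_TO_VOCAB))
    if unknown:
        raise KeyError(f"no mapping for local labels: {unknown}")
    return {LOCAL_TO_VOCAB[label] for label in local_labels}


print(sorted(to_vocabulary(["no-ml", "no-genai"])))
```

   Rejecting unmapped labels is a design choice of this sketch; a system
   could equally record them as local extensions.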

7.  Security Considerations

   TODO Security

8.  IANA Considerations

   This document has no IANA actions.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/rfc/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

   [RFC9309]  Koster, M., Illyes, G., Zeller, H., and L. Sassman,
              "Robots Exclusion Protocol", RFC 9309,
              DOI 10.17487/RFC9309, September 2022,
              <https://www.rfc-editor.org/rfc/rfc9309>.

9.2.  Informative References

   [EUCD2019] European Union, "Directive (EU) 2019/790 of the European
              Parliament and of the Council of 17 April 2019 on
              copyright and related rights in the Digital Single
              Market", 17 May 2019,
              <https://eur-lex.europa.eu/eli/dir/2019/790/oj>.

Acknowledgments

   The following individuals have been involved in the drafting of the
   proposal:

   *  Cullen Miller, Spawning.ai

   *  Sebastian Posth, Liccium

   *  Leonard Rosenthol, Adobe

   *  Laurent Le Meur, EDRLab

Author's Address

   Paul Keller
   Open Future
   Email: paul@openfuture.eu
