INTERNET-DRAFT                               Charles H. Lindsey
Usenet Format Working Group                  University of Manchester
                                             February 2000

                          News Article Format
                   <draft-ietf-usefor-article-03.txt>

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as "work
   in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This Draft defines the format of Netnews articles and specifies
   the requirements to be met by software which originates,
   distributes, stores and displays them. It is intended as a
   standards track document, superseding RFC 1036, which itself dates
   from 1987.

   Since the 1980s, Usenet has grown explosively, and many Internet and
   non-Internet sites now participate. In addition, this technology is
   now in widespread use for other purposes.

   Backward compatibility has been a major goal of this endeavour, but
   where this standard and earlier documents or practices conflict, this
   standard should be followed. In most such cases, current practice is
   already compatible with these changes.

[The use of the words "this standard" within this document when
referring to itself does not imply that this draft yet has pretensions to
be a standard, but rather indicates what will become the case if and
when it is accepted as an RFC with the status of a proposed or draft
standard.]




C. H. Lindsey                                                   [Page 1]


                          News Article Format              February 2000

[Remarks enclosed in square brackets and aligned with the left margin,
such as this one, are not part of this draft, but are editorial notes to
explain matters amongst ourselves, or to point out alternatives, or to
indicate work yet to be done.]

[Please note that this Draft describes "Work in Progress". Much remains
to be done, though the material included so far is unlikely to change in
any major way.]



                           Table of Contents



1.  Introduction ..................................................    5
  1.1.  Basic Concepts ............................................    5
  1.2.  Objectives ................................................    6
  1.3.  Historical Outline ........................................    6
  1.4.  Transport .................................................    6
2.  Definitions, Notations and Conventions ........................    7
  2.1.  Definitions.  .............................................    7
  2.2.  Textual Notations .........................................    8
  2.3.  Relation To Mail and MIME .................................    9
  2.4.  Syntax Notation ...........................................   10
  2.5.  Language ..................................................   12
3.  Changes to the existing protocols .............................   13
  3.1.  Principal Changes .........................................   13
  3.2.  Transitional Arrangements .................................   13
4.  Basic Format ..................................................   15
  4.1.  Syntax of News Articles ...................................   15
  4.2.  Headers ...................................................   16
    4.2.1.  Names and Contents ....................................   16
    4.2.2.  Header Properties .....................................   17
      4.2.2.1.  Experimental Headers ..............................   17
      4.2.2.2.  Inheritable Headers ...............................   18
      4.2.2.3.  Local Headers .....................................   18
      4.2.2.4.  Variant Headers ...................................   18
    4.2.3.  White Space and Continuations .........................   18
    4.2.4.  Comments ..............................................   19
    4.2.5.  Undesirable Headers ...................................   20
  4.3.  Body ......................................................   20
    4.3.1.  Body Format Issues ....................................   20
    4.3.2.  Body Conventions ......................................   21
  4.4.  Characters and Character Sets .............................   23
    4.4.1.  Character Sets within Article Headers .................   23
    4.4.2.  Character Sets within Article Bodies ..................   24
  4.5.  Size Limits ...............................................   24
  4.6.  Example ...................................................   25
5.  Mandatory Headers .............................................   26
  5.1.  Date ......................................................   26
    5.1.1.  Examples ..............................................   27
  5.2.  From ......................................................   27
    5.2.1.  Examples:  ............................................   27
  5.3.  Message-ID ................................................   27
  5.4.  Subject ...................................................   28
    5.4.1.  Examples ..............................................   29
  5.5.  Newsgroups ................................................   29
    5.5.1.  Forbidden newsgroup names .............................   31
  5.6.  Path ......................................................   32
    5.6.1.  Format ................................................   32
    5.6.2.  Adding a path-identity to the Path header .............   32
    5.6.3.  The tail-entry ........................................   34
    5.6.4.  Delimiter Summary .....................................   34
    5.6.5.  Suggested Verification Methods ........................   35
    5.6.6.  Example ...............................................   36
6.  Optional Headers ..............................................   37
  6.1.  Reply-To ..................................................   37
    6.1.1.  Examples ..............................................   37
  6.2.  Sender ....................................................   38
  6.3.  Organization ..............................................   38
  6.4.  Keywords ..................................................   38
  6.5.  Summary ...................................................   38
  6.6.  Distribution ..............................................   38
  6.7.  Followup-To ...............................................   40
  6.8.  References ................................................   40
    6.8.1.  Examples ..............................................   41
  6.9.  Expires ...................................................   41
  6.10.  Archive ..................................................   41
  6.11.  Control ..................................................   41
  6.12.  Approved .................................................   42
  6.13.  Replaces / Supersedes ....................................   42
    6.13.1.  Syntax and Semantics .................................   43
    6.13.2.  Message-ID version procedure .........................   44
      6.13.2.1.  Message version numbers ..........................   44
      6.13.2.2.  Implementation and Use Note ......................   46
      6.13.2.3.  The Message-Version NNTP extension ...............   47
      6.13.2.4.  Examples .........................................   48
  6.14.  Xref .....................................................   49
  6.15.  Lines ....................................................   50
  6.16.  User-Agent ...............................................   50
    6.16.1.  Examples .............................................   51
  6.17.  MIME headers .............................................   51
    6.17.1.  Syntax ...............................................   51
    6.17.2.  Content-Transfer-Encoding ............................   52
    6.17.3.  Content-Type .........................................   52
      6.17.3.1.  Message/partial ..................................   53
      6.17.3.2.  Message/rfc822 ...................................   53
      6.17.3.3.  Message/external-body ............................   54
      6.17.3.4.  Multipart types ..................................   54
    6.17.4.  Character Sets .......................................   54
    6.17.5.  Content Disposition ..................................   55
    6.17.6.  Definition of some new Content-Types .................   55
      6.17.6.1.  Application/news-transmission ....................   55
      6.17.6.2.  Message/news withdrawn ...........................   56
  6.18.  Obsolete Headers .........................................   56
7.  Control Messages ..............................................   57
  7.1.  The 'newgroup' Control Message ............................   57
    7.1.1.  The Body of the 'newgroup' Control Message ............   58
    7.1.2.  Application/news-groupinfo ............................   58
    7.1.3.  Initial Articles ......................................   60
    7.1.4.  Example ...............................................   61
  7.2.  The 'rmgroup' Control Message .............................   62
    7.2.1.  Example ...............................................   62
  7.3.  The 'mvgroup' Control Message .............................   62
    7.3.1.  Single group ..........................................   62
    7.3.2.  Multiple Groups .......................................   63
    7.3.3.  Examples ..............................................   64
  7.4.  The 'checkgroups' Control Message .........................   65
    7.4.1.  Application/news-checkgroups ..........................   66
  7.5.  Cancel ....................................................   66
  7.6.  Ihave, sendme .............................................   68
  7.7.  Obsolete control messages.  ...............................   69
8.  Duties of Various Agents ......................................   69
  8.1.  General principles to be followed .........................   69
  8.2.  Duties of an Injecting Agent ..............................   70
    8.2.1.  Proto-articles ........................................   70
    8.2.2.  Procedure to be followed by Injecting Agents ..........   70
  8.3.  Duties of a Relaying Agent ................................   72
  8.4.  Duties of a Serving Agent .................................   73
  8.5.  Duties of a Posting Agent .................................   73
  8.6.  Duties of a Followup Agent ................................   74
  8.7.  Duties of a Gateway .......................................   74
9.  Security Considerations .......................................   74
  9.1.  Attacks ...................................................   75
10.  References ...................................................   75
11.  Acknowledgements .............................................   77
12.  Contact Addresses ............................................   77
13.  Intellectual Property Rights .................................   78
Appendix A.1 - A-News Article Format ..............................   79
Appendix A.2 - Early B-News Article Format ........................   79
Appendix B - Collected Syntax .....................................   79

1.  Introduction

1.1.  Basic Concepts

   "Netnews" is a set of protocols for generating, storing and
   retrieving news "articles" (which resemble mail messages) and for
   exchanging them amongst a readership which is potentially widely
   distributed. It is organized around "newsgroups," with the
   expectation that each reader will be able to see all articles posted
   to each newsgroup in which he participates. These protocols most
   commonly use a flooding algorithm which propagates copies throughout
   a network of participating servers.  Typically, only one copy is
   stored per server, and each server makes it available on demand to
   readers able to access that server.
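
   The flooding behaviour described above can be pictured with a small
   sketch. It is illustrative only and defines nothing: the topology,
   the host names and the function name flood are hypothetical, and the
   use of the message identifier as the key for detecting duplicates is
   an assumption made for the sake of the example.

```python
from collections import deque

def flood(peers, origin, message_id):
    """Return the order in which servers store one article.

    peers maps each server name to the servers it feeds (a made-up
    topology); message_id is used only as the duplicate-detection key.
    """
    stored = {name: set() for name in peers}   # identifiers held per server
    order = []
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        if message_id in stored[server]:
            continue                 # already held: only one copy per server
        stored[server].add(message_id)
        order.append(server)
        queue.extend(peers[server])  # offer the article to every neighbour
    return order

# Hypothetical four-server network; every feed is bidirectional.
peers = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": ["a.example", "b.example", "d.example"],
    "d.example": ["c.example"],
}
print(flood(peers, "a.example", "<1@a.example>"))
```

   Each server is offered the article by several neighbours but stores
   it once, which is the property the paragraph above relies upon.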

   An important characteristic of Netnews is the lack of any requirement
   for a central administration or for the establishment of any
   controlling host to manage the network. A network which limits
   participation to some restricted set of hosts (within some company,
   for example) is a "closed" network; otherwise it is an "open"
   network. A set of hosts within a network which, by mutual
   arrangement, operates some variant (whether more or less restrictive)
   of the Netnews protocols is a "cooperating subnet".

   "Usenet" is a particular worldwide open network based upon the
   Netnews protocols, with the newsgroups being organised into
   recognized "hierarchies".  Anybody can join (it is simply necessary
   to negotiate an exchange of articles with one or more other
   participating hosts). Usenet "belongs" to those who administer the
   hosts of which it is comprised. There is no Cabal with overall
   authority to direct what is to be allowed. Nevertheless, there do
   exist agencies within Usenet that have authority to establish
   policies and to perform administrative functions, but such authority
   derives solely from the consent of those sites which choose to
   recognise it (and who can decline to exchange articles with sites
   which choose not to recognise it). Usually, the authority of such an
   agency is restricted to a particular hierarchy, or group of
   hierarchies.

   A "policy" is a rule intended to facilitate the smooth operation of a
   network by establishing parameters which restrict behaviour that,
   whilst technically unexceptionable, would nevertheless contravene
   some accepted standard of "Good Netkeeping". Since the ultimate
   beneficiaries of a network are its human readers, who will be less
   tolerant of poorly designed interfaces than mere computers, articles
   in breach of established policy can cause considerable annoyance to
   their recipients.

   Policies may well vary from network to network, from hierarchy to
   hierarchy within one network, and even between individual newsgroups
   within one hierarchy. It is assumed, for the purposes of this
   standard, that agencies with varying degrees of authority to
   establish such policies will exist, and that where they do not,
   policy will be established by mutual agreement.  For the benefit of
   networks and hierarchies without such established agencies, and to
   provide a basis upon which all agencies can build, this present
   standard often provides default policy parameters, usually
   introducing them by a phrase such as "As a matter of policy ...".

1.2.  Objectives

   The purpose of this present standard is to define the protocols to be
   used for Netnews in general, and for Usenet in particular, and to set
   standards to be followed by software that implements those protocols.

   It is NOT the purpose of this standard to define how the authority of
   various agencies to exercise control or oversight of the various
   parts of Usenet is established (that is itself a matter of policy).
   Nevertheless, it is assumed that such authorities will exist, and
   tools are provided within the protocols for their use.

1.3.  Historical Outline

   Network news originated as the medium of communication for Usenet,
   circa 1980.  Since then, Usenet has grown explosively, and many
   Internet and non-Internet sites participate in it.  In addition, the
   news technology is now in widespread use for other purposes, on the
   Internet and elsewhere.

   The earliest news interchange used the so-called "A News" article
   format.  Shortly thereafter, an article format vaguely resembling
   Internet mail was devised and used briefly.  Both of those formats
   are completely obsolete; they are documented in Appendices A.1 and
   A.2 for historical reasons only.  With publication of [RFC 850] in
   1983, news articles came to closely resemble Internet mail messages,
   with some restrictions and some additional headers. [RFC 1036] in
   1987 updated [RFC 850] without making major changes.
[There should also be some mention of B News and its Appendix.
Alternatively, these appendices may go into some separate informational
RFC.]

   A Draft popularly referred to as "Son of 1036" [Son-of-1036] was
   written in 1994 by Henry Spencer. That document formed the original
   basis for this standard. Much is taken directly from Son of 1036, and
   it is hoped that we have followed its spirit and intentions.

1.4.  Transport

   As in this standard's predecessors, the exact means used to transmit
   articles from one host to another is not specified. NNTP [NNTP] is
   the most common transmission method on the Internet, but much
   transmission takes place entirely independent of the Internet. Other
   methods in use include the UUCP protocol [RFC 976] extensively used
   in the early days of Usenet, FTP, downloading via satellite, tape
   archives, and physically delivered magnetic and optical media.

2.  Definitions, Notations and Conventions

2.1.  Definitions.

   An "article" is the unit of news, analogous to a [MESSFOR] "message".
   A "proto-article" is one that has not yet been injected into the news
   system.

   A "message identifier" (5.3) is a unique identifier for an article,
   usually supplied by the "posting agent" which posted it or, failing
   that, by the "injecting agent".  It distinguishes the article from
   every other article ever posted anywhere.  Articles with the same
   message identifier are treated as if they are the same article
   regardless of any differences in the body or headers.

   A "newsgroup" is a single news forum, a logical bulletin board,
   having a name and nominally intended for articles on a specific
   topic.  An article is "posted to" a single newsgroup or several
   newsgroups.  When an article is posted to more than one newsgroup, it
   is said to be "crossposted"; note that this differs from posting the
   same text as part of each of several articles, one per newsgroup.

   A newsgroup may be "moderated", in which case submissions are not
   posted directly, but mailed to a "moderator" for consideration and
   possible posting.  Moderators are typically human but may be
   implemented partially or entirely in software.

   A "hierarchy" is the set of all newsgroups whose names share a first
   component (as defined in 5.5).  The term "sub-hierarchy" is also used
   where several initial components are shared.
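
   As an illustration only, the two terms can be expressed as follows.
   The sketch assumes that the components of a newsgroup name are
   separated by "." (the actual definition is in 5.5); the function
   names are hypothetical.

```python
def components(newsgroup):
    """Split a newsgroup name into components (assumes "." separators)."""
    return newsgroup.split(".")

def hierarchy(newsgroup):
    """Return the first component, which names the hierarchy."""
    return components(newsgroup)[0]

def in_sub_hierarchy(newsgroup, prefix):
    """True if the newsgroup shares all the initial components of prefix."""
    wanted = components(prefix)
    return components(newsgroup)[:len(wanted)] == wanted

print(hierarchy("example.test.misc"))
print(in_sub_hierarchy("example.test.misc", "example.test"))
print(in_sub_hierarchy("example.testing", "example.test"))
```

   Comparing whole components rather than raw string prefixes keeps
   "example.testing" out of the "example.test" sub-hierarchy.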

   A "poster" is the person or software that composes and submits a
   possibly compliant article to a "posting agent". The poster is
   analogous to [MESSFOR]'s author(s).

   A "posting agent" is the software that assists posters to prepare
   proto-articles, in compliance with this standard. The proto-article
   is then passed on to an "injecting agent" for final checking and
   injection into the news stream.  If the article is not compliant, or
   is rejected by the injecting agent, then the posting agent informs
   the poster with an explanation of the error.

   A "reader" is the person or software reading news articles.

   A "reading agent" is software which presents articles to a reader.

   A "followup" is an article containing a response to the contents of
   an earlier article (the followup's "precursor").

   A "followup agent" is a combination of reading agent and posting
   agent that aids in the preparation and posting of a followup.

   An article's "reply address" is the address to which mailed replies
   should be sent. This is the address specified in the article's From
   header (5.2), unless it also has a Reply-To header (6.1).
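
   A minimal sketch of this rule follows. It assumes that the headers
   have already been parsed into a mapping from header name to content
   and that header names are matched without regard to case; both the
   mapping and the function name reply_address are hypothetical. The
   value returned is the raw header content, which a real reply agent
   would still have to parse for addresses.

```python
def reply_address(headers):
    """Return the content of Reply-To if present, otherwise of From."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "reply-to" in lowered:
        return lowered["reply-to"]
    return lowered["from"]   # From is a mandatory header (5.2)

print(reply_address({"From": "jo@site.example"}))
print(reply_address({"From": "jo@site.example",
                     "Reply-To": "jo@home.example"}))
```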

   A "reply agent" is a combination of reading agent and mailer that
   aids in the preparation and sending of an email response to an
   article.

   A "sender" is the person or software (usually, but not always, the
   same as the poster) responsible for the operation of the posting
   agent or, which amounts to the same thing, for passing the article to
   the injecting agent. The sender is analogous to [MESSFOR]'s sender.

   An "injecting agent" takes the finished article from the posting
   agent (often via the NNTP "post" command), performs some final checks
   and passes it on to a relaying agent for general distribution.

   A "relaying agent" is software which receives allegedly compliant
   articles from injecting agents and/or other relaying agents, and
   possibly passes copies on to other relaying agents and serving
   agents.

   A "news database" is the set of articles and related structural
   information stored by a serving agent and made available for access
   by reading agents.

   A "serving agent" receives an article from a relaying agent and files
   it in a news database. It also provides an interface for reading
   agents to access the news database.

   A "control message" is an article which is marked as containing
   control information; a relaying or serving agent receiving such an
   article may (subject to the policies observed at that site) take
   actions beyond just filing and passing on the article.

   A "gateway" is software which receives news articles and converts
   them to messages of some other kind (e.g. mail to a mailing list), or
   vice versa; in essence it is a translating relaying agent that
   straddles boundaries between different methods of message exchange.
   The most common type of gateway connects newsgroup(s) to mailing
   list(s), either unidirectionally or bidirectionally, but there are
   also gateways between news networks using this standard's news format
   and those using other formats.

2.2.  Textual Notations

   This standard contains explanatory NOTEs using the following format.
   These may be skipped by persons interested solely in the content of
   the specification.  The purpose of the notes is to explain why
   choices were made, to place them in context, or to suggest possible
   implementation techniques.

        NOTE: While such explanatory notes may seem superfluous in
        principle, they often help the less-than-omniscient reader grasp
        the purpose of the specification and the constraints involved.
        Given the limitations of natural language for descriptive
        purposes, this improves the probability that implementors and
        users will understand the true intent of the specification in
        cases where the wording is not entirely clear.

   "ASCII" is short for "the ANSI X3.4 character set" [ANSI X3.4].
   While "ASCII" is often misused to refer to various character sets
   somewhat similar to X3.4, in this standard "ASCII" means X3.4 and
   only X3.4. ASCII is a 7 bit character set. Please note that this
   standard requires that all agents be 8 bit clean; that is, they must
   accept and transmit data without changing or omitting the 8th bit.
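
   One way to picture the "8 bit clean" requirement is a probe that
   passes every possible octet value through a transfer step and checks
   that it comes back unchanged. This is a sketch only; the function
   names are hypothetical and the probe is not a conformance test.

```python
def is_8bit_clean(transfer):
    """True if transfer returns all 256 octet values unchanged."""
    every_octet = bytes(range(256))
    return transfer(every_octet) == every_octet

def strip_high_bit(data):
    """A deliberately broken transfer step that clears the 8th bit."""
    return bytes(octet & 0x7F for octet in data)

print(is_8bit_clean(bytes))            # identity copy of the data
print(is_8bit_clean(strip_high_bit))   # loses the 8th bit
```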

   Certain words, when capitalized, are used to define the significance
   of individual requirements. The key words "MUST", "SHOULD", "MAY" and
   the same words followed by "NOT" are to be interpreted as described
   in [RFC 2119].

        NOTE: The use of "MUST" always implies a requirement that would
        lead to interoperability problems if not followed, but the word
        "SHOULD", especially when it is applied to actions of posting
        and similar agents which the individual poster may easily
        override, is often used where a violation would do no more than
        breach established policy, or accepted standards of "Good
        Netkeeping". Moreover, even a "MUST" requirement imposed on a
        relaying or serving agent applies only to articles actually
        processed by that agent (since such an agent may always reject
        any article entirely for reasons of site policy).

   All numeric values are given in decimal unless otherwise indicated.
   Octets are assumed to be unsigned values for this purpose.

   Throughout this standard we will give examples of various
   definitions, headers and other specifications. It needs to be
   remembered that these samples are for the aid of the reader only and
   do NOT define any specification themselves.  In order to prevent
   possible conflict with "Real World" entities and people the top level
   domain of ".example" is used in all sample domains and addresses.
   The hierarchy of example.* is also used as a sample hierarchy.
   Information on the ".example" top level domain is in [RFC 2606].

2.3.  Relation To Mail and MIME

   The primary intent of this standard is to describe the news article
   format.  Insofar as news articles are a subset of the Mail message
   format augmented by some new headers, this standard incorporates many
   (though not all) of the provisions of [MESSFOR], with the aim of
   enabling news articles to pass through mail systems and vice versa,
   provided only that they contain the minimum headers required for the
   mode of transport being used. Unfortunately, the match is not
   perfect, but it is the intention of this standard that gateways
   between Mail and News should be able to operate with the minimum of
   tinkering.
[This standard has been designed to fit on top of the drafts currently
in preparation for Mail [MESSFOR].  It is expected that those drafts
will have progressed to the RFC stage by the time the present standard
is complete, at which time all references to [MESSFOR] in the present
text will be replaced by references to that RFC.]

   Likewise, this standard incorporates many (though not all) of the
   provisions of the MIME standards [RFC 2045] et seq which, though
   designed with Mail in mind, are mostly applicable to News.

2.4.  Syntax Notation

   This standard uses the Augmented Backus Naur Form described in [RFC
   2234].  A discussion of this is outside the bounds of this standard,
   but it is expected that implementors will be able to quickly
   understand it with reference to the defining document.

   Much of the syntax of News Articles is based on the corresponding
   syntax defined in [MESSFOR] or in the MIME specifications [RFC 2045]
   et seq, which is deemed to have been incorporated into this standard
   as required.  However, there are some important differences arising
   from the fact that [MESSFOR] does not recognise anything other than
   US-ASCII characters, that it does not recognise the MIME headers [RFC
   2045], and that it includes much syntax described as "obsolete".

        NOTE:  News parsers historically have been much less permissive
        than Mail parsers, and this is reflected in the modifications
        referred to, and in some further specific rules.

   The following syntactic forms therefore supersede the corresponding
   rules given in [MESSFOR] and [RFC 2045], thus allowing UTF-8
   characters [RFC 2044] to appear in certain contexts (the four rules
   beginning with "strict-" reflect the corresponding original rules from
   [MESSFOR]).

      UTF8-xtra-head  = %d192-253
      UTF8-xtra-tail  = %d128-191
      UTF8-xtra-char  = UTF8-xtra-head 1*UTF8-xtra-tail
      text            = %d1-9 /            ; all UTF-8 characters except
                        %d11-12 /          ; US-ASCII NUL, CR and LF
                        %d14-127 /
                        UTF8-xtra-char
      ctext           = NO-WS-CTL /        ; all of <text> except
                        %d33-39 /          ; SP, HTAB, "(", ")"
                        %d42-91 /          ; and "\"
                        %d93-126 /
                        UTF8-xtra-char
      qtext           = NO-WS-CTL /        ; all of <text> except
                        %d33 /             ; SP, HTAB, "\" and DQUOTE
                        %d35-91 /
                        %d93-126 /
                        UTF8-xtra-char
      utext           = NO-WS-CTL /        ; Non white space controls
                        %d33-126 /         ; The rest of US-ASCII
                        UTF8-xtra-char
      strict-text     = %d1-9 /            ; text restricted to
                        %d11-12 /          ; US-ASCII
                        %d14-127
      strict-qtext    = NO-WS-CTL /        ; qtext restricted to
                        %d33 /             ; US-ASCII
                        %d35-91 /
                        %d93-127
      strict-quoted-pair
                      = "\" strict-text
      strict-quoted-string
                      = [CFWS] DQUOTE
                           *([FWS] (strict-qtext / strict-quoted-pair))
                           [FWS] DQUOTE [CFWS]

        NOTE: There are sequences of octets which cannot legitimately
        occur in UTF-8, even a few permitted by the above syntax.  These
        SHOULD NOT be generated by posting agents but, where they occur
        inadvertently, they SHOULD be passed on untouched by other
        agents.
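
   The point made in the NOTE can be demonstrated with a sketch that
   transcribes the <UTF8-xtra-char> and <text> rules above into a
   regular expression over octets and compares the result with a real
   UTF-8 decoder. The names TEXT, matches_text and is_valid_utf8 are
   hypothetical, and the transcription is an illustration, not a
   normative restatement of the syntax.

```python
import re

# UTF8-xtra-char = UTF8-xtra-head 1*UTF8-xtra-tail
#   head = %d192-253 (0xC0-0xFD), tail = %d128-191 (0x80-0xBF)
UTF8_XTRA_CHAR = rb"[\xC0-\xFD][\x80-\xBF]+"

# text = %d1-9 / %d11-12 / %d14-127 / UTF8-xtra-char, repeated
TEXT = re.compile(
    rb"(?:[\x01-\x09\x0B\x0C\x0E-\x7F]|" + UTF8_XTRA_CHAR + rb")*"
)

def matches_text(octets):
    """True if the octets are a sequence of <text> per the rules above."""
    return TEXT.fullmatch(octets) is not None

def is_valid_utf8(octets):
    """True if a strict UTF-8 decoder accepts the octets."""
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

for sample in ("caf\u00e9".encode("utf-8"), b"\xc0\x80", b"abc\r\n"):
    print(sample, matches_text(sample), is_valid_utf8(sample))
```

   The second sample satisfies the syntax yet is rejected by the
   decoder, which is the kind of sequence that the NOTE says SHOULD NOT
   be generated but SHOULD be passed on untouched.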

   Wherever in this standard the syntax is stated to be taken from
   [MESSFOR], it is to be understood as the syntax defined by [MESSFOR]
   after making the above changes, but NOT including any syntax defined
   in section 4 ("Obsolete syntax") of [MESSFOR].  Software compliant
   with this standard MUST NOT generate any of the syntactic forms
   defined in that Obsolete Syntax, although it MAY accept such
   syntactic forms. Certain syntax from the MIME specifications [RFC
   2045] et seq is also considered a part of this standard (see 6.17).

   The following syntactic forms, taken from [RFC 2234] or from
   [MESSFOR], are repeated here for convenience only:

      ALPHA           = %x41-5A /          ; A-Z
                        %x61-7A            ; a-z
      CR              = %x0D               ; carriage return
      CRLF            = CR LF
      DIGIT           = %x30-39            ; 0-9
      HTAB            = %x09               ; horizontal tab
      LF              = %x0A               ; line feed
      SP              = %x20               ; space
      NO-WS-CTL       = %d1-8 /            ; US-ASCII control characters
                        %d11 /             ; which do not include the
                        %d12 /             ; carriage return, line feed,
                        %d14-31 /          ; and whitespace characters
                        %d127
      WSP             = SP / HTAB          ; Whitespace characters
      FWS             = ([*WSP CRLF] 1*WSP); Folding whitespace





C. H. Lindsey                                                  [Page 11]


                          News Article Format              February 2000

      atext           = ALPHA / DIGIT /
                        "!" / "#" /        ; Any character except
                        "$" / "%" /        ; controls, SP, and specials.
                        "&" / "'" /        ; Used for atoms
                        "*" / "+" /
                        "-" / "/" /
                        "=" / "?" /
                        "^" / "_" /
                        "`" / "{" /
                        "|" / "}" /
                        "~"
      atom            = [CFWS] 1*atext [CFWS]
      dot-atom        = [CFWS] dot-atom-text [CFWS]
      dot-atom-text   = 1*atext *( "." 1*atext )
      comment         = "(" *([FWS]
                           (ctext / quoted-pair / comment)) [FWS] ")"
      CFWS            = *([FWS] comment) (([FWS] comment) / FWS )
      DQUOTE          = %d34              ; quote mark
      quoted-pair     = "\" text
      quoted-string   = [CFWS] DQUOTE
                           *([FWS] (qtext / quoted-pair))
                           [FWS] DQUOTE [CFWS]
      unstructured    = *( [FWS] utext ) [FWS]
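
   As an illustration only, the atext and dot-atom-text rules can be
   approximated by regular expressions. The following Python sketch
   follows the atext definition of [MESSFOR]; the names ATEXT,
   DOT_ATOM_TEXT and is_dot_atom_text are hypothetical, and the
   optional surrounding CFWS is deliberately not handled.

```python
import re

# atext per [MESSFOR]: ALPHA / DIGIT plus the printable characters that
# are neither controls, SP, nor specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom-text = 1*atext *( "." 1*atext )
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(s: str) -> bool:
    """Return True if s matches dot-atom-text (no surrounding CFWS)."""
    return DOT_ATOM_TEXT.fullmatch(s) is not None
```

   A leading, trailing or doubled "." is rejected because each "."
   must be surrounded by at least one atext character.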

        NOTE: CFWS occurs at many places in the syntax in order to allow
        comments and extra whitespace to be inserted almost anywhere.
        The syntax is in fact ambiguous insofar as it may be impossible
        to tell in which of several possible ways a given comment or WS
        was produced. However, this does not lead to semantic ambiguity
        because, unless specifically stated otherwise, the presence or
        absence of a comment or additional WS has no semantic meaning
        and, in particular, it is a matter of indifference whether it
        forms a part of the syntactic construct preceding it or the one
        following it.

        NOTE: Following [RFC 2234], literal text included in the syntax
        is to be regarded as case-insensitive.  However, in
        contradistinction to [MESSFOR], the Netnews protocols are
        sensitive to case in some instances (as in newsgroup names, some
        header parameters, etc.). Care has been taken to indicate this
        explicitly where required.

2.5.  Language

   Various constant strings in this standard, such as header names and
   month names, are derived from English words.  Despite their
   derivation, these words do NOT change when the poster or reader
   employing them is interacting in a language other than English.
   Posting and reading agents MAY translate as appropriate in their
   interaction with the poster or reader, but the forms that actually
   appear in articles MUST be the English-derived ones defined in this
   standard.
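
   One practical consequence is that software should not rely on a
   locale-sensitive formatter when writing such strings into an
   article. The following Python sketch uses a fixed table of month
   abbreviations instead. The function name wire_date is hypothetical,
   and the three-letter forms and the layout are assumptions modelled
   on the date that appears in the attribution example in section
   4.3.2; the sketch is not a definition of any header syntax.

```python
from datetime import datetime, timezone

# English-derived month names, fixed regardless of the user's language.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def wire_date(dt: datetime) -> str:
    """Format a timezone-aware datetime in UTC with English months."""
    dt = dt.astimezone(timezone.utc)
    return (f"{dt.day} {MONTHS[dt.month - 1]} {dt.year} "
            f"{dt:%H:%M:%S} +0000")
```

   A reading agent remains free to translate the month name when
   displaying the date to its user.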



3.  Changes to the existing protocols

   This standard prescribes many changes, clarifications and new
   features since the protocols described in [RFC 1036] and [Son-of-
   1036].  It is the intention that they can be assimilated into Usenet
   as it presently operates without major interruption to the service,
   though some of the new features may not begin to show benefit until
   they become widely implemented. This section summarizes the main
   changes, and comments on some features of the transition.

3.1.  Principal Changes

     o The [MESSFOR] conventions for parenthesis-enclosed comments in
       headers are supported.
     o Whitespace is permitted in Newsgroups headers, permitting folding
       of such headers. Indeed, all news headers can now be folded.
     o An enhanced syntax for the Path header enables both the injection
       point of an article and the route it has taken to be determined
       with certainty.
     o Netnews is firmly established as an 8bit medium.
     o Large parts of MIME are recognised as an integral part of
       Netnews.
     o The charset for headers is always UTF-8. This will, inter alia,
       permit newsgroup-names with non-ASCII characters.
     o There is a new Control command 'mvgroup' to facilitate moving a
       group to a different place (name) in a hierarchy.
     o There are several new headers defined, such as Replaces and
       Author-Ids, leading to increased functionality.
     o There are numerous other small changes, clarifications and
       enhancements.
[Doubtless many other changes should be listed, but there is little
point in doing so until our text is nearing completion. The above gives
the flavour of what should be said.]

3.2.  Transitional Arrangements

   An important distinction must be made between serving and relaying
   agents, which are responsible for the distribution and storage of news
   articles, and user agents, which are responsible for interactions with
   users. It is important that the former should be upgraded to conform
   to this standard as soon as possible to provide the benefit of the
   enhanced facilities.  Fortunately, the number of distinct
   implementations of such agents is rather small, at least so far as
   the main "backbone" of Usenet is concerned, and many of the new
   features are already supported. Contrariwise, there are a great
   number of implementations of user agents, installed on a vastly
   greater number of small sites. Therefore, the new functionality has
   been designed so that existing agents may continue to be used,
   although the full benefits may not be realised until a substantial
   proportion of them have been upgraded.

   In the list which follows, care has been taken to distinguish the
   implications for both kinds of agent.


     o [MESSFOR] style comments in headers do not affect serving and
       relaying agents (note that the Newsgroups and Path headers do not
       contain them). Such comments are unlikely to hinder the proper
       display of headers in existing user agents, except in the case of
       the References header in agents which thread articles. Therefore,
       it is provided that they SHOULD NOT be generated except where
       permitted by the previous standards.
     o Because of its importance to all serving agents, the extension
       permitting whitespace and folding in Newsgroups headers SHOULD NOT
       be used until it has been widely deployed amongst relaying
       agents. User agents are unaffected.
     o The new style of Path header is already consistent with the
       previous standards. However, the intention is that relaying
       agents should henceforth reject articles in the old style, and so
       this should be offered as a configurable option for relaying
       agents. User agents are unaffected.
[Should that "should" be a "SHOULD" or a "MAY"?]

     o The vast majority of serving, relaying and transport agents are
       believed to be already 8bit clean (in the slightly restricted
       sense in which that term is used in the MIME standards). User
       agents that do not implement MIME may be disadvantaged, but no
       more so than at present when faced with 8bit characters (which
       currently abound in spite of the previous standards).
     o The introduction of MIME reflects a practice that is already
       widespread.  Articles in strict compliance with the previous
       standards (using strict US-ASCII) will be unaffected. Many user
       agents already support it, at least to the extent of widely used
       charsets such as ISO-8859-1. Users expecting to read articles
       using the more exotic charsets will need to acquire suitable
       reading agents. It is not intended, in general, that any single
       user agent will be able to display every charset known to IANA,
       but all such agents MUST support US-ASCII. Serving and relaying
       agents are not affected.
     o The use of the UTF-8 charset for headers will not affect any
       existing usage, since US-ASCII is a strict subset of UTF-8.
       Insofar as newsgroup names containing non-ASCII characters can
       now be expected to arise, support from serving and relaying
       agents will be necessary. It is believed that the customary
       storage structure used by serving agents can already cope
       (perhaps not ideally) with such names. Note that it is not
       necessary for serving and relaying agents to understand all the
       characters available in UTF-8, though it is desirable for them to
       be displayable for diagnostic purposes via some escape mechanism
       using, for example, the visible subset of US-ASCII. For users
       expecting to use the more exotic charsets available under UTF-8,
       the remarks already made in connection with MIME will apply.
     o The new Control: mvgroup command will need to be implemented in
       serving agents. It SHOULD be used in conjunction with pairs of
       matching rmgroup and newgroup commands (injected shortly after
       the mvgroup) until such time as mvgroup is widely implemented.
       The new Replaces header is also effectively a Control command,
       and transitional arrangements are provided which should be used
       in the meantime. User agents are unaffected.

     o The headers newly introduced by this standard can safely be
       ignored by existing software, albeit with loss of the new
       functionality.

4.  Basic Format

4.1.  Syntax of News Articles

   The overall syntax of a news article is:

      article           = 1*header separator body
      header            = header-name ":" 1*SP header-content CRLF
      header-name       = 1*name-character *( "-" 1*name-character )
      name-character    = ALPHA / DIGIT
      header-content    = USENET-header-content
                               *( ";" header-parameter ) /
                          other-header-content
      USENET-header-content
                        = <the header-content defined in this standard
                           (or an extension of it) for a specific
                           USENET header>
      other-header-content
                        = <a header-content defined (explicitly or
                           implicitly) by some other standard>
      header-parameter  = USENET-header-parameter /
                          other-header-parameter
      USENET-header-parameter
                        = <an other-header-parameter defined in
                           this standard for use in conjunction with
                           a specific USENET-header-content>
      other-header-parameter
                        = attribute "=" value
      attribute         = USENET-token / iana-token / x-token
      value             = token / quoted-string
      USENET-token      = <A token defined in this standard for
                           use in conjunction with a specific
                           USENET-header-parameter>
      iana-token        = <A token defined in an experimental
                           or standards-track RFC and registered with
                           IANA>
      x-token           = [CFWS] <the two characters "X-" or "x-"
                           followed, with no intervening white space,
                           by any token>
      token             = [CFWS] 1*<any (US-ASCII) CHAR except SP,
                                    CTLs or tspecials> [CFWS]
      tspecials         = "(" / ")" / "<" / ">" / "@" /
                          "," / ";" / ":" / "\" / DQUOTE /
                          "/" / "[" / "]" / "?" / "="
      separator         = CRLF
      body              = *( *998text CRLF )

   An article consists of some headers followed by a body. An empty line
   separates the two. The headers contain structured information about
   the article and its transmission. A header begins with a header-name
   identifying it, and can be continued onto subsequent lines as
   described in section 4.2.3.  The body is largely unstructured text
   significant only to the poster and the readers.

        NOTE: Terminology here follows the current custom in the news
        community, rather than the [MESSFOR] convention of referring to
        what is here called a "header" as a "header-field" or "field".

   Note that the separator line must be truly empty, not just a line
   containing white space. Further empty lines following it are part of
   the body, as are empty lines at the end of the article.
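
   The separation of headers from body can be sketched as follows. This
   Python fragment is illustrative only; the function name
   split_article is hypothetical, and the input is assumed to be
   already in the canonical CRLF form.

```python
def split_article(raw: bytes) -> tuple[bytes, bytes]:
    """Split a canonical (CRLF) article into header block and body."""
    # The first truly empty line appears as two consecutive CRLFs: the
    # one ending the last header line and the separator itself.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty separator line found")
    # Restore the CRLF that terminates the last header line.
    return head + b"\r\n", body
```

   A line containing only white space does not match, and any further
   empty lines after the separator are returned as part of the body.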

        NOTE: The syntax above defines the canonical form of a news
        article as a sequence of lines each terminated by CRLF. This
        does not prevent serving agents or transport agents from storing
        or handling the article in other formats (e.g. using a single LF
        in place of CRLF) so long as the overall effects achieved are as
        defined by this standard when operating on the canonical form.
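
   For instance, an agent that stores articles with a bare LF as the
   line terminator might convert to and from the canonical form along
   the following lines. This is a sketch in Python under that stated
   assumption; the names to_canonical and to_local are hypothetical.

```python
import re


def to_canonical(stored: bytes) -> bytes:
    """Convert LF-terminated storage to canonical CRLF line endings."""
    # Replace every LF that is not already preceded by CR.
    return re.sub(rb"(?<!\r)\n", b"\r\n", stored)


def to_local(canonical: bytes) -> bytes:
    """Convert a canonical article to LF-terminated local storage."""
    return canonical.replace(b"\r\n", b"\n")
```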

4.2.  Headers

4.2.1.  Names and Contents

   Despite the restrictions on header-name syntax imposed by the
   grammar, relayers and reading agents SHOULD tolerate header names
   containing any US-ASCII printable character other than colon (":",
   ASCII 58).
[To bring it into line with <optional-field> as given in [MESSFOR].]
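
   The difference between the strict grammar and the tolerant reading
   can be sketched with two regular expressions. The Python fragment
   below is illustrative; the names are hypothetical, and "printable"
   is assumed here to mean the octets 33 to 126, so that SP is
   excluded.

```python
import re

# header-name = 1*name-character *( "-" 1*name-character )
STRICT_NAME = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

# Tolerant form: any printable US-ASCII character except colon (58).
TOLERANT_NAME = re.compile(r"[\x21-\x39\x3B-\x7E]+")


def classify_header_name(name: str) -> str:
    """Return "strict", "tolerated" or "invalid" for a header name."""
    if STRICT_NAME.fullmatch(name):
        return "strict"
    if TOLERANT_NAME.fullmatch(name):
        return "tolerated"
    return "invalid"
```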

   Header-names SHOULD be either those for which a USENET-header-content
   is defined in this standard, or those defined in [MESSFOR], or those
   defined in any extension to either of these standards including, in
   particular, the MIME standards [RFC 2045] et seq., or experimental
   headers beginning with "X-" (as defined in 4.2.2.1).  Software SHOULD
   NOT attempt to interpret headers not described in this standard or in
   its extensions, but relaying agents MUST pass them on unaltered and
   reading agents MUST enable them to be displayed, at least optionally.

   The possibility of allowing header-parameters to appear in all
   headers is provided mainly for the purpose of allowing future
   extensions to existing headers, since only a very few USENET-header-
   parameters are actually defined in this standard. Observe that such
   header-parameters do not, in general, occur in headers defined in
   other standards, except for the MIME standards [RFC 2045] et seq. and
   their extensions. Nevertheless, compliant software MUST accept all
   such header-parameters in headers defined by this standard and its
   extensions (ignoring them if their meaning is unknown) and SHOULD
   accept (and ignore) them in all headers.
[but what about
address = mailbox / group
group = phrase ":" [mailbox-list] ";"
Does the following NOTE cover the situation?]



        NOTE: The presence of a ";" in a header-content does not
        indicate the presence of a header-parameter in the few
        situations where it can be parsed as part of some USENET-
        header-content or other-header-content.

   On the other hand, posting agents SHOULD NOT generate them (even
   those using x-tokens) except in those headers for which a USENET-
   header-parameter has been defined, or where that usage is permitted
   by some other standard (notably one of the MIME standards). This
   restriction is likely to be removed in a future version of this
   standard.

        NOTE: The given syntax is ambiguous insofar as a USENET-header-
        content that is defined to be <unstructured> could contain,
        within that <unstructured>, text of the form <*(";" header-
        parameter)>. The intention is therefore that any such apparent
        header-parameters are to be regarded as part of the
        <unstructured>. This standard therefore does not (and extensions
        to it SHOULD NOT) define any USENET-header-parameter to be
        associated with such an unstructured USENET-header-content.

   The order of headers in an article is not significant. However,
   posting agents are encouraged to put mandatory headers (section 5)
   first, followed by optional headers (section 6), followed by
   experimental headers and headers not defined in this standard or its
   extensions. Relaying agents MUST NOT change the order of the headers
   in an article, though they MAY add additional headers, preferably
   either before or after all the existing ones.

   Header-names are case-insensitive. There is a preferred case
   convention, which posters and posting agents SHOULD use: each
   hyphen-separated "word" has its initial letter (if any) in uppercase
   and the rest in lowercase, except that some abbreviations have all
   letters uppercase (e.g. "Message-ID" and "MIME-Version"). The forms
   used in this standard are the preferred forms for the headers
   described herein. Relaying and reading agents MUST, however, tolerate
   articles not obeying this convention.
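
   The preferred case convention might be applied by a posting agent
   roughly as follows. The Python sketch is illustrative; the function
   name and the ALL_CAPS table are hypothetical, and the table holds
   only the two abbreviations used as examples above.

```python
# Words written entirely in uppercase (only those cited as examples).
ALL_CAPS = {"id": "ID", "mime": "MIME"}


def preferred_case(name: str) -> str:
    """Return a header-name in the preferred case convention."""
    words = name.split("-")
    return "-".join(ALL_CAPS.get(w.lower(), w.capitalize())
                    for w in words)
```

   Since header-names are case-insensitive, comparisons should still be
   made on a case-folded form rather than on the output of such a
   function.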

4.2.2.  Header Properties

   There are four special properties that may apply to particular
   headers, namely: "experimental", "inheritable", "local", and
   "variant". When a header is defined, in this (or any future)
   standard, as having one (or possibly more) of these properties, it is
   subject to special treatment, as indicated below.

4.2.2.1.  Experimental Headers

   Experimental headers are those whose header-names begin with "X-".
   They are to be used for experimental Netnews features, or for
   enabling additional material to be propagated with an article. There
   are no established headers that are considered experimental headers;
   an established header cannot be experimental.


        NOTE: Some such headers may eventually be adopted as standard by
        some extension to this standard, at which point they will lose
        their "X-" prefix.

4.2.2.2.  Inheritable Headers

   Subject only to the overriding ability of the poster to determine the
   contents of the headers in a proto-article, headers with the
   inheritable property MUST be copied by followup agents (perhaps with
   some modification) into the followup article, and headers without
   that property MUST NOT be so copied.  Examples include:
     o Newsgroups (5.5) - copied from the precursor, subject to any
       Followup-To header.
     o Subject (5.4) - modified by prefixing with "Re: ", but otherwise
       copied from the precursor.
     o References (6.8) - copied from the precursor, with the addition
       of the precursor's Message-ID.
     o Distribution (6.6) - copied from the precursor.

        NOTE: The Keywords header is not inheritable, though some older
        newsreaders treated it as such.
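
   The treatment of the four example headers by a followup agent can
   be sketched as below. This Python fragment is illustrative only:
   the function name is hypothetical, headers are represented as a
   simple dictionary, not repeating an existing "Re: " prefix is an
   assumption of the sketch, and any special Followup-To values are
   not handled.

```python
def followup_headers(precursor: dict[str, str]) -> dict[str, str]:
    """Derive the inheritable headers of a followup (illustrative)."""
    out: dict[str, str] = {}
    # Newsgroups: copied, subject to any Followup-To header.
    out["Newsgroups"] = precursor.get("Followup-To",
                                      precursor["Newsgroups"])
    # Subject: prefixed with "Re: " (not repeated: an assumption).
    subject = precursor["Subject"]
    out["Subject"] = (subject if subject.startswith("Re: ")
                      else "Re: " + subject)
    # References: the precursor's References plus its Message-ID.
    refs = precursor.get("References", "")
    out["References"] = (refs + " " + precursor["Message-ID"]).strip()
    # Distribution: copied when present.
    if "Distribution" in precursor:
        out["Distribution"] = precursor["Distribution"]
    return out
```

   Headers without the inheritable property, such as Keywords, are
   simply never consulted. The poster retains the ability to override
   any of the resulting values.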

4.2.2.3.  Local Headers

   Headers with the local property are significant only to a particular
   serving agent (or perhaps a cooperating group of such agents). They
   MAY be removed by relaying agents before propagation, and MUST be
   removed (and replaced as necessary) by serving agents when received.
   The replaced header MAY be placed anywhere within the headers (though
   placing it first is recommended). The principal example is:
     o Xref (6.14) - used to keep track of the article locators of
       crossposted articles so that newsreaders can mark such articles
       as read.

4.2.2.4.  Variant Headers

   Headers with the variant property are modified as articles are
   propagated. The modified header MAY be placed anywhere within the
   headers (though placing it first is recommended). The principal
   example is:
     o Path (5.6) - augmented at each relaying agent that an article
       passes through.

4.2.3.  White Space and Continuations

[The following text is taken from [MESSFOR], adapted to the different
terminology used for this standard.]

   Each header is logically a single line of characters comprising the
   header-name, the colon with its following SP, and the header-content.
   For convenience, however, the header-content can be split into a
   multiple line representation; this is called "folding". The general
   rule is that wherever this standard allows for FWS or CFWS (but not
   simply SP or HTAB) a CRLF may be inserted before any WSP. For
   example, the header:
      Approved: modname@modsite.example (Moderator of comp.foo.bar)
   can be represented as:
      Approved: modname@modsite.example
         (Moderator of comp.foo.bar)

        NOTE: Though header-contents are defined in such a way that
        folding can take place between many of the lexical tokens (and
        even within some of them), folding SHOULD be limited to placing
        the CRLF at higher-level syntactic breaks, and SHOULD also avoid
        leaving trailing WSP on the preceding line. For instance, if a
        header-content is defined as comma-separated values, it is
        recommended that folding occur after the comma separating the
        structured items, even if it is allowed elsewhere.

   Folding MUST NOT be carried out in such a way that any line of a
   header is made up entirely of WSP characters and nothing else.
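
   Unfolding is the reverse operation: each CRLF that is immediately
   followed by WSP is removed, leaving the WSP in place. The following
   Python sketch shows this together with a check for the forbidden
   whitespace-only line. Both function names are hypothetical, and the
   input is assumed to be a single header in canonical CRLF form.

```python
import re


def unfold(header: bytes) -> bytes:
    """Join a folded header into its single logical line."""
    # Remove each CRLF that is immediately followed by SP or HTAB.
    return re.sub(rb"\r\n(?=[ \t])", b"", header)


def has_whitespace_only_line(header: bytes) -> bool:
    """True if any physical line of the header is empty or all WSP."""
    lines = header.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # discard the piece after the final CRLF
    return any(line.strip(b" \t") == b"" for line in lines)
```

   Applied to the folded Approved header shown above, unfold yields the
   logical line with the three leading spaces of the continuation line
   retained before the comment.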

   The colon following the header name on the first line MUST be
   followed by a WSP, even if the header is empty. If the header is not
   empty, at least some of the content MUST appear on the first line
   (this is to avoid the possibility of harm by any non-compliant agent
   that might eliminate a trailing SP). Posting agents MUST enforce
   these restrictions, but relaying agents SHOULD accept even articles
   that violate them.

        NOTE: This standard differs from [MESSFOR] in requiring WSP
        following the colon (this was also an [RFC 1036] requirement).

   Posters and posting agents SHOULD use SP, not HTAB, where white space
   is desired in headers (some existing software expects this), and MUST
   use SP immediately following the colon after a header-name. Relaying
   agents SHOULD accept HTAB in all such cases, however.
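
   A posting agent might enforce the first-line rules with a check
   along the following lines. The Python sketch is illustrative; the
   function name and the wording of the messages are hypothetical, and
   the header is assumed to be supplied as its physical lines with the
   CRLFs already removed.

```python
def header_problems(lines: list[str]) -> list[str]:
    """List violations of the first-line rules for one header."""
    problems: list[str] = []
    name, colon, rest = lines[0].partition(":")
    if not colon:
        return ["missing colon"]
    # Posting agents must place SP immediately after the colon.
    if not rest.startswith(" "):
        problems.append("colon not followed by SP")
    # A non-empty header must have some content on its first line.
    first_line_empty = rest.strip(" \t") == ""
    later_content = any(line.strip(" \t") for line in lines[1:])
    if first_line_empty and later_content:
        problems.append("no content on first line")
    return problems
```

   A relaying agent would normally not apply such a check, since it is
   asked to accept articles that violate these restrictions.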

   Since the white space beginning a continuation line remains a part of
   the logical line, headers can be "broken" into multiple lines only at
   FWS or CFWS. Posting agents SHOULD NOT break headers unnecessarily
   (but see 4.5).

4.2.4.  Comments

   Strings of characters which are treated as comments may be included
   in header-contents wherever the syntactic element CFWS occurs. They
   consist of characters enclosed in parentheses. Such strings are
   considered comments so long as they do not appear within a quoted-
   string. Comments may be nested.

   A comment is normally used to provide some human readable
   informational text, except at the end of an address which contains no
   phrase, as in
      fred@foo.bar.example (Fred Bloggs)
   as opposed to
      "Fred Bloggs" <fred@foo.bar.example> .
   The former is a deprecated, but commonly encountered, usage and
   reading agents SHOULD take special note of such comments as
   indicating the name of the person whose address it is. In all other
   situations a comment is semantically interpreted as a single SP.
   Since a comment is allowed to contain FWS, folding is permitted
   within it as well as immediately preceding and immediately following
   it. Also note that, since quoted-pair is allowed in a comment, the
   parenthesis and backslash characters may appear in a comment so long
   as they appear as a quoted-pair. Semantically, the enclosing
   parentheses are not part of the comment content; the content is what
   is contained between the two parentheses.

   Since comments have not hitherto been permitted in news articles,
   except in a few specified places, posters and posting agents SHOULD
   NOT insert them except in those places, namely following addresses in
   From and similar headers, and to indicate the name of the timezone in
   Date headers.  However, compliant software MUST accept them in all
   places where they are syntactically allowed.
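
   The rules for recognising comments (nesting, quoted-pairs, and the
   exemption of quoted-strings) can be sketched as a small scanner.
   The Python fragment below is illustrative only: the function name
   is hypothetical, it works on an already unfolded header-content,
   and it replaces every comment by a single SP without giving any
   special treatment to a comment that follows an address.

```python
def replace_comments(content: str) -> str:
    """Replace each outermost comment with a single SP."""
    out = []
    depth = 0          # current comment nesting depth
    in_quotes = False  # inside a quoted-string
    i = 0
    while i < len(content):
        ch = content[i]
        if ch == "\\" and i + 1 < len(content) and (depth or in_quotes):
            # quoted-pair: kept inside quoted-strings, dropped
            # together with the rest of the text inside comments
            if not depth:
                out.append(content[i:i + 2])
            i += 2
            continue
        if depth:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    out.append(" ")
        elif in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
            out.append(ch)
        elif ch == "(":
            depth = 1
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

   A reading agent wishing to show the name in the deprecated address
   form would need to capture the comment text rather than discard it.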

4.2.5.  Undesirable Headers

   A header whose content is empty is said to be an empty header.
   Relaying and reading agents SHOULD NOT consider presence or absence
   of an empty header to alter the semantics of an article (although
   syntactic rules, such as requirements that certain header names
   appear at most once in an article, MUST still be satisfied). Posting
   and injecting agents SHOULD delete empty headers from articles before
   posting them; relaying agents MUST pass them untouched.

   Headers that merely state defaults explicitly (e.g., a Followup-To
   header with the same content as the Newsgroups header, or a MIME
   Content-Type header with contents "text/plain; charset=us-ascii") or
   state information that reading agents can typically determine easily
   themselves (e.g.  the length of the body in octets) are redundant and
   posters and posting agents SHOULD NOT include them.

4.3.  Body

4.3.1.  Body Format Issues

   The body of an article MAY be empty, although posting agents SHOULD
   consider this an error condition (meriting returning the article to
   the poster for revision). A posting or injecting agent which does not
   reject such an article SHOULD issue a warning message to the poster
   and supply a non-empty body.  Note that the separator line MUST be
   present even if the body is empty.

        NOTE: Some existing news software is known to react badly to
        body-less articles, hence the request for posting and injecting
        agents to insert a body in such cases. The sentence "This
        article was probably generated by a buggy news reader" has
        traditionally been used in this situation.



   Note that an article body is a sequence of lines terminated by CRLFs,
   not arbitrary binary data, and in particular it MUST end with a CRLF.
   However, relaying agents SHOULD treat the body of an article as an
   uninterpreted sequence of octets (except as mandated by changes of
   CRLF representation and by control-message processing) and SHOULD
   avoid imposing constraints on it. See also section 4.5.

   Posters SHOULD avoid using control characters in US-ASCII (or other
   CCSs) except for tab (ASCII 9), formfeed (ASCII 12), and backspace
   (ASCII 8). Tab signifies sufficient horizontal white space to reach
   the next of a set of fixed positions; posters are warned that there
   is no standard set of positions, so tabs should be avoided if precise
   spacing is essential. Formfeed (which is sometimes referred to as the
   "spoiler character") signifies a point at which a reading agent
   SHOULD pause and await reader interaction before displaying further
   text.  Backspace SHOULD be used only for underlining, done by a
   sequence of underscores (ASCII 95) followed by an equal number of
   backspaces, signifying that the same number of text characters
   following are to be underlined. Posters are warned that underlining
   is not available on all output devices and is best not relied on for
   essential meaning. Reading agents SHOULD recognize underlining and
   translate it to the appropriate commands for devices that support it.
   Reading agents MUST NOT pass other control characters or escape
   sequences unaltered to the output device.
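
   The underlining convention can be recognised as sketched below.
   This Python fragment is illustrative only; the function name is
   hypothetical, and a run whose underscore and backspace counts
   differ is simply left as ordinary text, so that a reading agent
   would still have to deal with the stray backspaces before output.

```python
import re

# A run of N underscores followed by N backspaces marks the next N
# characters as underlined.
_MARK = re.compile(r"(_+)(\x08+)")


def parse_underlining(line: str) -> list[tuple[str, bool]]:
    """Split a line into (text, underlined) segments."""
    segments: list[tuple[str, bool]] = []
    pos = 0
    for m in _MARK.finditer(line):
        n = len(m.group(1))
        if len(m.group(2)) != n or m.start() < pos:
            continue  # not a well-formed marker; treat as plain text
        if m.start() > pos:
            segments.append((line[pos:m.start()], False))
        segments.append((line[m.end():m.end() + n], True))
        pos = m.end() + n
    if pos < len(line):
        segments.append((line[pos:], False))
    return segments
```

   Each segment can then be rendered with whatever underlining command
   the output device supports, or shown plainly where it supports
   none.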

4.3.2.  Body Conventions

   A body is by default an uninterpreted sequence of octets for most of
   the purposes of this standard. However, a MIME Content-Type header
   may impose some structure or intended interpretation upon it, and may
   also specify the character set in accordance with which the octets
   are to be interpreted.

   It is a common practice for followup agents to enable the
   incorporation of the followed-up article (the "precursor") as a
   quotation. This SHOULD be done by prefacing each line of the quoted
   text (even if it is empty) with the character ">" (or perhaps with
   "> " in the case of a previously unquoted line). This will result in
   multiple levels of ">" when quoted content itself contains quoted
   content, and it will also facilitate the automatic analysis of
   articles.
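
   A followup agent might apply the quoting convention as sketched
   below. The Python fragment is illustrative; the function name is
   hypothetical, the body is assumed to be held with LF line
   separators, and quoting an empty line with a bare ">" (to avoid
   trailing white space) is a choice made for this sketch.

```python
def quote_lines(body: str) -> str:
    """Quote every line of a precursor body (LF-separated here)."""
    quoted = []
    for line in body.split("\n"):
        if line == "" or line.startswith(">"):
            # Empty and already-quoted lines gain a bare ">".
            quoted.append(">" + line)
        else:
            # Previously unquoted lines gain "> ".
            quoted.append("> " + line)
    return "\n".join(quoted)
```

   Quoting the result a second time yields ">>" on once-quoted lines,
   which gives the multiple levels of ">" described above.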

        NOTE: Posters should edit quoted context to trim it down to the
        minimum necessary. However, followup agents SHOULD NOT attempt
        to enforce this beyond issuing a warning (past attempts to do so
        have been found to be notably counter-productive).

   The followup agent SHOULD also precede the quoted content by an
   "attribution line" (readers are warned, however, not to assume such
   lines are accurate, especially within multiply nested quotations). The
   following convention for such lines, whilst not mandated by this
   standard, is intended to facilitate their automatic recognition and
   processing by sophisticated reading agents. The attribution SHOULD
   contain the name or the email address of the precursor's poster, as
   in
      Joe D. Bloggs <jdbloggs@foo.example> wrote:
   or
      Helmut Schmidt <helmut@bar.example> schrieb:

   The attribution MAY also contain a single Newsgroup name (the one
   from which the followup is being made), the precursor's Message-ID
   and/or the precursor's Date and Time. Any of these that are present
   SHOULD precede the name and/or email address. However, the inclusion
   or not of such fields SHOULD always be under the control of the
   poster.

   To enable this line, and the Message-ID and the Email address within
   it, to be recognised (for example to enable suitable reading agents
   to retrieve the precursor or email its poster by clicking on them),
   the following conventions SHOULD be observed:
     o The precursor's Message-ID SHOULD be enclosed within <...> or
       <news:...>
     o The precursor's poster's Email address SHOULD be enclosed within
       <...>
     o The various fields may be separated by arbitrary text and they
       may be folded in the same way as headers, but attributions SHOULD
       always be terminated by a ":" followed by CRLF.

   Further examples:

      On comp.foo in <1234@bar.example> on 24 Dec 1997 16:40:20 +0000,
         Joe D. Bloggs <jdbloggs@bar.example> wrote:

      Am 24. Dez 1997 schrieb Helmut Schmidt <helmut@bar.example>:
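
   An attribution obeying these conventions could be generated as in
   the following Python sketch. The function name, its parameters and
   the connecting words "On", "in" and "on" are hypothetical choices
   (the conventions allow arbitrary text between the fields), and
   folding of long attributions is not attempted.

```python
def attribution(name, email, verb="wrote",
                newsgroup=None, message_id=None, date=None):
    """Build an attribution line ending in ":" (illustrative)."""
    parts = []
    if newsgroup:
        parts.append(f"On {newsgroup}")
    if message_id:
        # Accept the Message-ID with or without its angle brackets.
        parts.append(f"in <{message_id.strip('<>')}>")
    if date:
        parts.append(f"on {date}")
    who = f"{name} <{email}>"
    lead = " ".join(parts)
    # Optional fields precede the poster's name and address.
    return f"{lead}, {who} {verb}:" if lead else f"{who} {verb}:"
```

   Whether the optional fields are supplied at all would be left under
   the control of the poster, as required above.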

   A "personal signature" is a short closing text automatically added to
   the end of articles by posting agents, identifying the poster and
   giving his network addresses, etc. If a poster or posting agent does
   append such a signature to an article, it MUST be preceded by a
   delimiter line containing (only) two hyphens (ASCII 45) followed by
   one SP (ASCII 32). The signature is considered to extend from the
   last occurrence of that delimiter up to the end of the article (or up
   to the end of the part in the case of a multipart MIME body).
   Followup agents, when incorporating quoted text from a precursor,
   SHOULD NOT include the signature in the quotation. Posting agents
   SHOULD discourage (at least with a warning) signatures of excessive
   length (4 lines is a commonly accepted limit).
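
   The signature rule above can be sketched as follows. This is only an
   illustration, not part of the standard: the function names are
   hypothetical, and the body is assumed to be already decoded to text
   with CRLF line separators.

```python
from typing import Tuple


def split_signature(body: str) -> Tuple[str, str]:
    """Split a body into (text, signature) at the LAST "-- " delimiter line."""
    lines = body.split("\r\n")
    # Search from the end so that only the last delimiter counts.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == "-- ":          # two hyphens followed by one SP
            return "\r\n".join(lines[:i]), "\r\n".join(lines[i + 1:])
    return body, ""                    # no delimiter, hence no signature


def signature_too_long(signature: str, limit: int = 4) -> bool:
    """True when the signature exceeds the commonly accepted 4-line limit."""
    stripped = signature.rstrip("\r\n")
    count = len(stripped.split("\r\n")) if stripped else 0
    return count > limit
```

   A followup agent might quote only the first element of the returned
   pair, and a posting agent might warn when signature_too_long()
   returns true.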

        NOTE: It is undesirable to have more than one personal signature
        in an article body (even though the rule above admits the
        possibility by recognising only the last one). If, for some
        reason, a second signature is considered necessary, it MAY be
        preceded by a different delimiter (e.g.  "--- ").
[That is Clive's suggestion. Not to be included without further
support.]

4.4.  Characters and Character Sets

   Transmission paths for news articles MUST treat news articles as
   uninterpreted sequences of octets, excluding the values 0 (ASCII NUL)
   and 13 and 10 (ASCII CR and LF, which MUST ONLY appear in the
   combination CRLF which denotes a line separator).

        NOTE: this corresponds to the range of octets permitted for
        Mime "8bit data" [RFC 2045].  Thus raw binary data cannot be
        transmitted in an article body except by the use of a Content-
        Transfer-Encoding such as base64.
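
   The octet restrictions above can be checked mechanically. The
   following is a minimal sketch under the assumption that the whole
   article is available as one octet string; the function name is
   hypothetical.

```python
def transmissible(article: bytes) -> bool:
    """Check the octet rules of 4.4: no NUL, and CR/LF only as CRLF."""
    if b"\x00" in article:
        return False
    # Remove every well-formed CRLF; any CR or LF left over is a bare one.
    remainder = article.replace(b"\r\n", b"")
    return b"\r" not in remainder and b"\n" not in remainder
```

   Octets with the high bit set pass this check, which is what allows
   UTF-8 headers and "8bit" bodies to be carried unchanged.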

   An octet, or a sequence of octets, may represent a character in some
   Coded Character Set (CCS) as determined by some Character Encoding
   Scheme (CES) [RFC 2130].

   If it comes to a relaying agent's attention that it is being asked to
   pass an article using the Content-Transfer-Encoding "8bit" to a
   relaying agent that does not support it, it SHOULD report this error
   to its administrator. It MUST refuse to pass the article and MUST NOT
   re-encode it with different Mime encodings.

        NOTE: This strategy will do little harm. The target relaying
        agent is unlikely to be able to make use of the article on its
        own servers, and the usual flooding algorithm will likely find
        some alternative route to get the article to destinations where
        it is needed.

4.4.1.  Character Sets within Article Headers

   Within article headers, the CES is UTF-8 [ISO 10646] or [RFC 2279]
   and hence the CCS is the Universal Multiple-Octet Coded Character Set
   (UCS) [ISO 10646] (which is essentially a superset of Unicode
   [UNICODE] and expected to remain so). However, interpreting the
   octets directly as US-ASCII characters should ensure correct
   behaviour in most situations.

        NOTE: UTF-8 is an encoding for 16bit (and even 32bit) character
        sets with the property that any octet less than 128 immediately
        represents the corresponding US-ASCII character, thus ensuring
        upwards compatibility with previous practice.  Non-ASCII
        characters from UCS are represented by sequences of octets
        satisfying the syntax of a UTF8-xtra-char (2.4).  Only those
        octet sequences explicitly permitted by [RFC 2279] shall be
        used.  UCS includes all characters from the ISO-8859 series of
        character sets [ISO 8859] (which includes all Greek and Arabic
        characters) as well as the more elaborate characters used in
        Japan and China. See the following section for the appropriate
        treatment of UCS characters by reading agents.

   Notwithstanding the great flexibility permitted by UTF-8, there is
   need for restraint in its use in order that the essential components
   of headers may be discerned using reading agents that cannot present
   the full UCS range. In particular, header-names and tokens MUST be in
   US-ASCII, and certain other components of headers, as defined
   elsewhere in this standard - notably msg-ids, date-times, dot-atoms,
   domains and path-identities - MUST be in US-ASCII.  Comments, phrases
   (as in addresses) and unstructureds (as in Subject headers) MAY use
   the full range of UTF-8 characters. For newsgroup-names see 5.5.

   Where the use of non-ASCII characters, encoded in UTF-8, is permitted
   as above, they MAY also be encoded using the Mime mechanism defined
   in [RFC 2047], but this usage is deprecated within news articles
   (even though it is required in mail messages) since it is less
   legible in older reading agents which support neither it nor UTF-8.
   Nevertheless, reading agents SHOULD support this usage, but only in
   those contexts explicitly mentioned in [RFC 2047].
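
   As an illustration of how a reading agent might support both forms,
   the sketch below decodes a header content as UTF-8 and then, if any
   [RFC 2047] encoded-words appear, decodes those with the Python
   standard library. The function name is hypothetical. The sketch
   assumes that a header uses either raw UTF-8 or encoded-words, not a
   mixture of the two, and it does not restrict decoding to the contexts
   that [RFC 2047] permits.

```python
from email.header import decode_header, make_header


def display_header(raw: bytes) -> str:
    """Decode a header content: UTF-8 first, then RFC 2047 encoded-words."""
    text = raw.decode("utf-8", errors="replace")
    if "=?" not in text:
        return text                    # plain UTF-8, nothing more to decode
    # decode_header() splits out encoded-words; make_header() reassembles.
    return str(make_header(decode_header(text)))
```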

4.4.2.  Character Sets within Article Bodies

   Within article bodies, the CES and CCS implied by any Content-
   Transfer-Encoding and Content-Type headers [RFC 2045] SHOULD be
   applied by reading agents. In the absence of such headers, reading
   agents cannot be relied upon to display correctly more than the US-
   ASCII characters.
[Observe that reading agents are not forbidden to "guess", or to
interpret as UTF-8 regardless, which would be the simplest course for
them to take.]

        NOTE: It is not expected that reading agents will necessarily be
        able to present characters in all possible character sets,
        although they MUST be able to present all US-ASCII characters.
        For example, a reading agent might be able to present only the
        ISO-8859-1 (Latin 1) characters [ISO 8859], in which case it
        SHOULD present undisplayable characters using some distinctive
        glyph, or by exhibiting a suitable warning. Older reading agents
        that do not understand Mime headers or UTF-8 should be able to
        display bodies in US-ASCII (with some loss of human
        comprehensibility) except possibly when the Content-Transfer-
        Encoding is "8bit".

   Followup agents MUST be careful to apply appropriate encodings to the
   outbound followup. A followup to an article containing non-ASCII
   material is very likely to contain non-ASCII material itself.

4.5.  Size Limits

   Posting agents SHOULD endeavour to keep all header lines, so far as
   is possible, within 79 characters by folding them at suitable places
   (see 4.2.3).  However, posting agents MUST permit the poster to
   include longer headers if he so insists, and compliant software MUST
   support headers of at least 998 octets. Likewise, injecting agents
   SHOULD fold any headers generated automatically by themselves.
   Relaying agents MUST NOT fold headers (i.e. they must pass on the
   folding as received).

        NOTE: There is NO restriction on the number of lines into which
        a header may be split, and hence there is NO restriction on the
        total length of a header (in particular it may, by suitable
        folding, be made to exceed the 998 octets restriction pertaining
        to a single header line).
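
   A posting or injecting agent could fold a generated header along the
   following lines. This is a simplified sketch: it folds at any space
   and collapses runs of white space, whereas 4.2.3 governs where
   folding is actually permitted. The function name and parameters are
   hypothetical.

```python
def fold_header(name: str, content: str, limit: int = 79) -> str:
    """Fold "name: content" at spaces so that lines stay within `limit`
    characters where possible. Lines are joined by CRLF (no final CRLF)."""
    lines = []
    current = name + ":"
    for word in content.split():
        candidate = current + " " + word
        # Start a continuation line if the word would overflow the limit,
        # unless the current line holds only the header-name so far.
        if len(candidate) > limit and current != name + ":":
            lines.append(current)
            current = " " + word       # continuation lines begin with WSP
        else:
            current = candidate
    lines.append(current)
    return "\r\n".join(lines)
```

   A single word longer than the limit is left on a line of its own and
   is not broken, in keeping with the requirement that longer headers
   remain possible.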

   The syntax provides for the lines of a body to be up to 998 octets in
   length, not including the CRLF. All software compliant with this
   standard MUST support lines of at least that length, both in headers
   and in bodies, and all such software SHOULD support lines of
   arbitrary length. In particular, relaying agents MUST transmit lines
   of arbitrary length without truncation or any other modification.

        NOTE: The limit of 998 octets is consistent with the
        corresponding limit in [MESSFOR].

   In plain-text messages (those with no Mime headers, or those with a
   Mime Content-Type of text/plain) posting agents SHOULD endeavour to
   keep the length of body lines within some reasonable limit. The size
   of this limit is a matter of policy, the default being to keep within
   79 characters at most, and preferably within 72 characters (to allow
   room for quoting in followups).  Exceptionally, posting agents SHOULD
   NOT adjust the length of quoted lines in followups unless they are
   able to reformat them in a consistent manner.  Moreover, posting
   agents MUST permit the poster to include longer lines if he so
   insists.

        NOTE: Plain-text messages are intended to be displayed "as-is"
        without any special action (such as automatic line splitting) on
        the part of the recipient. The policy limit (e.g. 72 or 79)
        should be expressed as a number of characters (as they will be
        displayed by a reading agent) rather than as the number of
        octets used to encode them.
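
   The distinction between the two limits can be illustrated as follows.
   The sketch assumes a body encoded in UTF-8 and approximates
   "characters as displayed" by the number of code points, which is not
   exact for combining sequences. The function name and default values
   are hypothetical choices for illustration.

```python
def check_body_line(line: str, policy_limit: int = 72,
                    hard_limit: int = 998):
    """Return (within_policy, within_syntax) for one body line, no CRLF."""
    chars = len(line)                    # characters, as a reader sees them
    octets = len(line.encode("utf-8"))   # octets, as transmitted
    return chars <= policy_limit, octets <= hard_limit
```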

        NOTE: This standard provides no upper bound on the overall size
        of a single article, but neither does it forbid relaying agents
        from dropping articles of excessive length. It is, however,
        suggested that any limits thought appropriate by particular
        agents would be more appropriately expressed in megabytes than
        in kilobytes.

4.6.  Example

   Here is a sample article:

      Path: server.example/unknown.site2.example?site2.example/
        relay.site.example/site.example/injector.site.example%jsmith
      Newsgroups: example.announce,example.chat
      Message-ID: <9urrt98y53@site.example>
      From: Ann Example <a.example@site1.example>
      Subject: Announcing a new sample article.
      Date: Fri, 27 Mar 1998 12:12:50 +1300
      Approved: example.announce moderator <jsmith@site.example>
      Followup-To: example.chat
      Reply-To: Ann Example <a.example+replies@site1.example>
      Expires: Wed, 22 Apr 1998 12:12:50 -0700
      Organization: Site1, The Number one site for examples.
      User-Agent: ExampleNews/3.14 (Unix)
      Keywords: example, announcement, standards, RFC 1036, Usefor
      Summary: The URL for the next standard.


      Just a quick announcement that a new standard example article has
      been released; it is in the new USEFOR draft obtainable from
      ftp.ietf.org.
      Ann.

      --
      Ann Example <a.example@site1.example>   Sample Poster to the Stars
      "The opinions in this article are bloody good ones" - J. Clarke.

5.  Mandatory Headers

   An article MUST have one, and only one, of each of the following
   headers: Date, From, Message-ID, Subject, Newsgroups, Path.

   Note also that there are situations, discussed in the relevant parts
   of section 6, where References, Sender, or Approved headers are
   mandatory. In control messages, specific values are required for
   certain headers.

   For the overall syntax of headers, see section 4.1.  In the
   discussions of the individual headers, the content of each is
   specified using the syntax notation. The convention used is that the
   content of, for example, the Subject header is defined as <Subject-
   content>.

   A proto-article (see 8.2.1) may lack some of these mandatory headers,
   but they MUST then be supplied by the injecting agent.
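
   An injecting agent might test the "one, and only one" rule as in the
   sketch below. It assumes the header-names have already been extracted
   from the article and that they are matched without regard to case;
   the function name is hypothetical.

```python
MANDATORY = ("Date", "From", "Message-ID", "Subject", "Newsgroups", "Path")


def missing_or_repeated(header_names):
    """Map each mandatory header not occurring exactly once to its count."""
    counts = {name.lower(): 0 for name in MANDATORY}
    for name in header_names:
        key = name.lower()             # assumed case-insensitive matching
        if key in counts:
            counts[key] += 1
    return {name: counts[name.lower()] for name in MANDATORY
            if counts[name.lower()] != 1}
```

   An empty result means that all six headers are present exactly once.
   For a proto-article, the headers reported with a count of 0 are the
   ones the injecting agent has to supply.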

5.1.  Date

   The Date header contains the date and time that the article was
   prepared by the poster ready for transmission and SHOULD express the
   poster's local time. The content syntax makes use of syntax defined
   in [MESSFOR].

      Date-content        = date-time

        NOTE: It is a useful convention to follow the date-time with a
        comment containing the time zone in human-readable form. The use
        of folding in a date-time is deprecated, even though permitted
        by [MESSFOR].

   In order to prevent the reinjection of expired articles into the news
   stream, relaying and serving agents MUST refuse articles whose Date
   header predates the earliest articles of which they normally keep
   record, or which is more than 24 hours into the future (though they
   MAY use a margin less than that 24 hours). Relaying agents MUST NOT
   modify the Date header in transit.

5.1.1.  Examples

      Date: Fri, 2 Apr 1999 20:20:51 -0500 (EST)
      Date: 26 May 1999 16:13 +0000
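
   The acceptance test for the Date header might be sketched as below,
   using the date parser from the Python standard library (which follows
   mail syntax and may accept forms this standard does not). The names
   oldest_kept and future_margin are hypothetical parameters standing
   for local policy; now and oldest_kept are assumed to be
   timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def date_acceptable(date_content, now, oldest_kept,
                    future_margin=timedelta(hours=24)):
    """True if the date is neither too old nor too far in the future."""
    posted = parsedate_to_datetime(date_content)
    if posted.tzinfo is None:          # a "-0000" zone parses as naive
        posted = posted.replace(tzinfo=timezone.utc)
    return oldest_kept <= posted <= now + future_margin
```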

5.2.  From

   The From header contains the electronic address(es), and possibly the
   full name, of the article's author(s). The content syntax makes use
   of syntax defined in [MESSFOR], subject to the following revised
   definition of local-part.

      From-content        = mailbox-list
      addr-spec           = local-part "@" domain
      local-part          = dot-atom / strict-quoted-string

        NOTE: This syntax ensures that the local-part of an addr-spec is
        restricted to pure US-ASCII (and is thus in strict compliance
        with [MESSFOR]), whilst allowing any UTF-8 character to be used
        in a preceding quoted-string containing the author's full name.
        If some future extension to the Mail protocols should relax this
        restriction, one would expect the Netnews protocols to follow.

   Any mailbox in the From-content MUST belong to one of the poster(s)
   of the article, or be a mailbox which he is authorized by its owner
   to use, or be an address which ends in the top level domain of
   ".invalid" [RFC 2606].

5.2.1.  Examples

      From: John Smith <jsmith@site.example>
      From: "John Smith" <jsmith@site.example>, dave@isp.example
      From: "John D. Smith" <jsmith@site.example>, andrew@isp.example,
         fred@site2.example
      From: Jan Jones <jan@please_setup_your_system_correctly.invalid>
      From: Jan Jones <joe@guess-where.invalid>
      From: dave@isp.example (Dave Smith)

        NOTE: the last example shows a now deprecated convention of
        putting an author's full name in a comment following the
        mailbox, rather than in a phrase at the start of that mailbox.
        Observe that the quotes around the "John D. Smith" example were
        required, on account of the '.' character, and they would also
        have been required had any UTF8-xtra-char been present.
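
   For illustration, the mailboxes in a From-content can be extracted
   with the address parser in the Python standard library. That parser
   implements mail syntax, not the revised local-part given above, so
   this is only an approximation. The function names are hypothetical.

```python
from email.utils import getaddresses


def from_mailboxes(from_content):
    """Return (display-name, addr-spec) pairs from a From-content."""
    return getaddresses([from_content])


def is_invalid_address(addr_spec):
    """True when the address ends in the ".invalid" top level domain."""
    domain = addr_spec.lower().rsplit("@", 1)[-1]
    return domain.endswith(".invalid")
```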

5.3.  Message-ID

   The Message-ID header contains the article's message identifier, a
   unique identifier distinguishing the article from every other
   article. The content syntax makes use of syntax defined in [MESSFOR],
   subject to the following revised definition of no-fold-quote.

      Message-ID-content = msg-id
      id-left            = dot-atom-text / no-fold-quote
      no-fold-quote      = DQUOTE *( strict-qtext / strict-quoted-pair )
                              DQUOTE

        NOTE: This syntax ensures that a msg-id is restricted to pure
        US-ASCII (and is thus in strict compliance with [MESSFOR]).

   Following the provisions of [MESSFOR], an agent generating an
   article's message identifier MUST ensure that it is unique and that
   it is NEVER reused. Moreover, even though commonly derived from the
   domain name of the originating site (and domain names are case-
   insensitive), a message identifier MUST NOT be altered in any way
   during transport, or when copied (as into a References header), and
   thus a simple (case-sensitive) comparison of octets will always
   suffice to recognise that same message identifier wherever it
   subsequently reappears.

        NOTE: some old software may treat message identifiers that
        differ only in case within their id-right part as equivalent,
        and implementors of agents that generate message identifiers
        should be aware of this.
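
   The comparison rule, and the caution in the NOTE, can be illustrated
   as follows. The function names are hypothetical, and message
   identifiers are assumed to be held as octet strings including their
   angle brackets.

```python
def same_message_id(a: bytes, b: bytes) -> bool:
    """Message identifiers match only if they are octet-for-octet equal."""
    return a == b


def differs_only_in_id_right_case(a: bytes, b: bytes) -> bool:
    """True for two distinct ids that some old software might conflate."""
    left_a, _, right_a = a.rpartition(b"@")
    left_b, _, right_b = b.rpartition(b"@")
    return (a != b and left_a == left_b
            and right_a.lower() == right_b.lower())
```

   An agent generating identifiers could use the second test to avoid
   ever issuing two identifiers that old software would treat as equal.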

5.4.  Subject

   The Subject header contains a short string identifying the topic of
   the message. This is an inheritable header (4.2.2.2) to be copied
   into the Subject header of any followup, in which case the new
   header-content SHOULD then default to the string "Re: " (a "back
   reference") followed by the contents of the pure-subject of the
   precursor. Any leading "Re: " in the pure-subject MUST be stripped.

      Subject-content     = [ back-reference ] pure-subject
      pure-subject        = 1*( [FWS] utext )
      back-reference      = %x52.65.3A.20
                                    ; which is a case-sensitive "Re: "

   The pure-subject MUST NOT begin with "Re: ".

        NOTE: The given syntax differs from that prescribed in [MESSFOR]
        insofar as it does not permit a header content to be completely
        empty, or to consist of WSP only (see remarks in 4.2.5
        concerning undesirable headers).

   Followup agents MAY remove instances of non-standard back-reference
   (such as "Re(2): ", "Re:", "RE: ", or "Sv: ") from the Subject-
   content when composing the subject of a followup and add a correct
   back-reference in front of the result.

        NOTE: that would be "SHOULD remove instances" except that we
        cannot find a sufficiently robust and simple algorithm to do the
        necessary natural language processing.

   Followup agents MUST NOT use any other string except "Re: " as a back
   reference. Specifically, a translation of "Re: " into a local
   language or usage MUST NOT be used.

        NOTE: "Re" is an abbreviation for the Latin "In re", meaning "in
        the matter of", and not an abbreviation of "Reference" as is
        sometimes erroneously supposed.

   Agents SHOULD NOT depend on nor enforce the use of back references by
   followup agents. For compatibility with legacy news software the
   Subject-content of a control message (i.e. an article that also
   contains a Control header) MAY start with the string "cmsg ", and
   non-control messages MUST NOT start with the string "cmsg ". See also
   section 6.11.

5.4.1.  Examples

   In the following examples, please note that only "Re: " is mandated
   by this standard. "was: " is a convention used by many English-
   speaking posters to signal a change in subject matter.  Software
   should be able to deduce this information from References.

      Subject: Film at 11
      Subject: Re: Film at 11
      Subject: Godwin's law considered harmful (was: Film at 11)
      Subject: Godwin's law (was: Film at 11)
      Subject: Re: Godwin's law (was: Film at 11)
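
   A followup agent's default Subject could be derived as in the sketch
   below. It handles only the standard back-reference and leaves
   non-standard forms such as "RE: " untouched, which the text above
   permits. The function name is hypothetical.

```python
BACK_REFERENCE = "Re: "                # case-sensitive, per the syntax


def followup_subject(precursor_subject: str) -> str:
    """Build the default Subject-content for a followup."""
    pure = precursor_subject
    # Strip any leading back-reference so that exactly one is emitted.
    while pure.startswith(BACK_REFERENCE):
        pure = pure[len(BACK_REFERENCE):]
    return BACK_REFERENCE + pure
```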

5.5.  Newsgroups

   The Newsgroups header's content specifies which newsgroup(s) the
   article is posted to. It is an inheritable header (4.2.2.2) which
   SHOULD then become the default Newsgroups header of any followup,
   unless a Followup-To header is present to prescribe otherwise.

      Newsgroups-content  = newsgroup-name
                               *( *FWS ng-delim *FWS newsgroup-name )
                               *FWS
      newsgroup-name      = component *( "." component )
      component           = component-start
                               *( component-start / component-other )
      component-start     = Un-lowercase / Un-digit
      Un-lowercase        = <Unicode Letter, Lowercase> /
                            <Unicode Letter, Other>
      Un-digit            = <Unicode Number, Decimal Digit> /
                            <Unicode Number, Other>
      component-other     = "+" / "-" / "_"
      ng-delim            = ","
   where the <Unicode ...> items are as described in [UNICODE].

   The inclusion of folding white space within a Newsgroups-content is a
   newly introduced feature in this standard. It MUST be accepted by all
   conforming implementations (relaying agents, serving agents and
   reading agents).  Posting agents should be aware that such postings
   may be rejected by overly-critical old-style relaying agents. When a
   sufficient number of relaying agents are in conformance, posting
   agents SHOULD generate such whitespace in the form of <CRLF WS> so as
   to keep the length of lines in the relevant headers (notably
   Newsgroups and Followup-To) to no more than 79 characters (or
   other agreed policy limit - see 4.5).  Before such critical mass
   occurs, injecting agents MAY reformat such headers by removing
   whitespace inserted by the posting agent, but relaying agents MUST
   NOT do so.

   A newsgroup-name consists of one or more components. Components MAY
   contain non-ASCII letters, but these MUST be encoded in UTF-8 and not
   according to [RFC 2047].  A component MUST contain at least one
   letter (and MUST, according to the syntax, begin with a letter or
   digit). Components SHOULD begin with a letter.  Composite characters
   (made by overlaying one character with another) and format
   characters, as allowed in certain parts of Unicode and needed by
   certain languages, must use whatever canonical conventions apply to
   those parts of Unicode (such conventions are not defined in this
   Standard). The use of "_" in a component is deprecated. Serving
   agents MAY refuse to accept newsgroups using such a component.
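
   The syntax and the "at least one letter" rule can be sketched with
   the Unicode general categories exposed by the Python standard
   library (Ll, Lo, Nd and No, corresponding to the four <Unicode ...>
   items). The function names are hypothetical, and the result depends
   on the version of the Unicode tables shipped with the interpreter.
   No canonicalisation of composite characters is attempted.

```python
import unicodedata

START_CATEGORIES = {"Ll", "Lo", "Nd", "No"}   # component-start
LETTER_CATEGORIES = {"Ll", "Lo"}              # Un-lowercase
OTHER_CHARS = {"+", "-", "_"}                 # component-other


def valid_component(component: str) -> bool:
    """Check one component against the syntax and the letter rule."""
    if not component:
        return False
    if unicodedata.category(component[0]) not in START_CATEGORIES:
        return False
    has_letter = False
    for ch in component:
        category = unicodedata.category(ch)
        if category in LETTER_CATEGORIES:
            has_letter = True
        elif category not in START_CATEGORIES and ch not in OTHER_CHARS:
            return False
    return has_letter


def valid_newsgroup_name(name: str) -> bool:
    """Check every "."-separated component of a newsgroup-name."""
    return all(valid_component(c) for c in name.split("."))
```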

        NOTE: Components composed entirely of digits would cause
        problems for the commonly used implementation technique of using
        the component as the name of a directory, whilst also using
        sequential numbers to distinguish the articles within a group.
        Components containing other non-permitted characters could cause
        problems when newsgroup-names appear in URLs [RFC 1738] (for
        example an '@' character would prevent distinguishing between
        newsgroup-names and message identifiers).

        NOTE: According to the syntax, uppercase letters cannot occur in
        newsgroup-names, but this standard imposes no requirement on
        software to check this condition, since it would be unreasonable
        to expect it to do so in parts of Unicode for which it was not
        configured (in general, a table lookup is required). Rather, it
        is the responsibility of those creating new newsgroups (7.1) not
        to violate it. It is, moreover, to be expected that a newsgroup
        created in violation of this condition will not be propagated
        particularly well.

   Whilst there is no longer any technical reason to limit the length of
   a component (formerly, it was limited to 14 characters) nor to limit
   the total length of a newsgroup-name, it should be noted that these
   names are also used in the newsgroups line (7.1.2) where an overall
   policy limit applies, and moreover excessively long names can be
   exceedingly inconvenient in practical use.  Agencies responsible for
   individual hierarchies SHOULD therefore, as a matter of policy, set
   reasonable limits for the length of a component and of a newsgroup-
   name. In the absence of such explicit policies, the default figures
   are 30 characters and 71 characters respectively.
[If the checkpolicies proposal is included in the Standard, there should
be a reference to it here.]

        NOTE: The newsgroup-name as encoded in UTF-8 should be regarded
        as the canonical form. Reading agents may convert it to whatever
        character set they are able to display (see 4.4.1) and serving
        agents may possibly need to convert it to some form more
        suitable as a filename. Simple algorithms for both kinds of
        conversion are readily available.  Observe that the syntax does
        not allow comments within the Newsgroups header; this is to
        simplify processing by relaying and serving agents which have a
        requirement to process this header extremely rapidly.

   Posters SHOULD use only the names of existing newsgroups in the
   Newsgroups header. However, it is legitimate to cross-post to
   newsgroup(s) which do not exist on the posting agent's host, provided
   that at least one of the newsgroups DOES exist there, and followup
   agents SHOULD accept this (posting agents MAY accept it, but SHOULD
   at least alert the poster to the situation and request confirmation).
   Relaying agents MUST NOT rewrite Newsgroups headers in any way, even
   if some or all of the newsgroups do not exist on the relaying agent's
   host. Serving agents MUST NOT create new newsgroups simply because an
   unrecognised newsgroup-name occurs in a Newsgroups header (see 7.1
   for the correct method of newsgroup creation).

   The Newsgroups header is intended for use in Netnews articles rather
   than in mail messages. It MAY be used in a mail message to indicate
   that it is a copy also posted to the listed newsgroups, but it SHOULD
   NOT be used in a mail-only reply to a Netnews article (thus the
   "inheritable" property of this header applies only to followups to a
   newsgroup, and not to followups to the poster). Moreover, if a
   newsgroup-name contains any non-ASCII character, it MAY be encoded
   using the mechanism defined in [RFC 2047] when sent by mail but, if
   it is subsequently returned to the Netnews environment, it MUST then
   be re-encoded into UTF-8.

5.5.1.  Forbidden newsgroup names

   The following forms of newsgroup-name MUST NOT be used except for the
   specific purposes indicated:

     o Newsgroup-names having only one component. These are reserved for
       newsgroups whose propagation is restricted to a single host or
       local network, and for pseudo-newsgroups such as "poster" (which
       has special meaning in the Followup-To header - see section 6.7),
       "junk" (often used by serving agents), "control" (likewise),
       "revise" and "repost" (which have special meanings in the Xref
       header - see 6.14)

     o Any newsgroup-name beginning with "control." (used as pseudo-
       newsgroups by many serving agents)
     o Any newsgroup-name containing the component "ctl" (likewise)
     o "to" or any newsgroup-name beginning with "to." (reserved for the
       ihave/sendme protocol described in section 7.6, and for test
       messages sent on an essentially point-to-point basis)
     o Any newsgroup-name containing the component "all" (because this
       is used as a wildcard in some implementations)

   A newsgroup-name SHOULD NOT appear more than once in the Newsgroups
   header. The order of newsgroup names in the Newsgroups header is not
   significant, except for determining which moderator to send the
   article to if one of the groups is moderated (see 8.2).
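
   The reserved forms listed in 5.5.1 can be recognised as in the
   following sketch. The function name and the wording of the returned
   reasons are hypothetical. The sketch reports only that a name is
   reserved; it does not decide whether a particular use falls within
   the specific purposes indicated.

```python
def reserved_reason(name: str):
    """Return why a newsgroup-name is reserved, or None if it is not."""
    components = name.split(".")
    if len(components) == 1:
        return "single component (local or pseudo-newsgroup)"
    if components[0] == "control":
        return 'begins with "control."'
    if components[0] == "to":
        return 'begins with "to."'
    if "ctl" in components:
        return 'contains the component "ctl"'
    if "all" in components:
        return 'contains the component "all"'
    return None
```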

5.6.  Path

   The Path header shows the route taken by a message since its entry
   into the Netnews system. It is a variant header (4.2.2.4), each agent
   that processes an article being required to add one (or more) entries
   to it. This is primarily to enable relaying agents to avoid sending
   articles to sites already known to have them, in particular the site
   they came from, and additionally to permit tracing the route articles
   take in moving over the network, and for gathering Usenet statistics.
   Finally, the presence of a '%' delimiter in the Path header can be
   used to identify an article injected in conformance with this
   standard.

5.6.1.  Format

      Path-content        = *( path-identity [FWS] delimiter [FWS] )
                               tail-entry *FWS
      path-identity       = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )
      delimiter           = "/" / "?" / "%" / "," / "!"
      tail-entry          = 1*( ALPHA / DIGIT / "-" / "." / ":" / "_" )

        NOTE: A Path-content will inevitably contain at least one path-
        identity, except possibly in the case of a proto-article that
        has not yet been injected onto the network.

        NOTE: Observe that the syntax does not allow comments within the
        Path header; this is to simplify processing by relaying and
        injecting agents which have a requirement to process this header
        extremely rapidly.
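
   The Path-content syntax is simple enough to be parsed with a regular
   expression, as the sketch below shows. It approximates FWS as any
   run of SP, TAB, CR and LF, and the function name is hypothetical.

```python
import re

IDENTITY = r"[A-Za-z0-9._:-]+"         # path-identity and tail-entry
DELIMITER = r"[/?%,!]"
ENTRY = re.compile(rf"({IDENTITY})[ \t\r\n]*({DELIMITER})[ \t\r\n]*")


def parse_path(path_content: str):
    """Return ([(path-identity, delimiter), ...], tail-entry)."""
    entries = []
    pos = 0
    while True:
        match = ENTRY.match(path_content, pos)
        if match is None:
            break
        entries.append((match.group(1), match.group(2)))
        pos = match.end()
    tail = path_content[pos:].strip()
    if not re.fullmatch(IDENTITY, tail):
        raise ValueError("malformed Path-content")
    return entries, tail
```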

   A relaying agent SHOULD NOT pass an article to another relaying agent
   whose path-identity (or some known alias thereof) already appears in
   the Path-content. Since the comparison may be either case sensitive
   or case insensitive, relaying agents SHOULD NOT generate a name which
   differs from that of another site only in terms of case.

   A relaying agent MAY decline to accept an article if its own path-
   identity is already present in the Path-content or if the Path-
   content contains some path-identity whose articles the relaying agent
   does not want, as a matter of local policy.

        NOTE: This last facility is sometimes used to detect and decline
        control messages (notably cancel messages) which have been
        deliberately seeded with a path-identity to be "aliased out" by
        sites not wishing to act upon them.

5.6.2.  Adding a path-identity to the Path header

   When an injecting, relaying or serving agent receives an article, it
   MUST prepend its own path-identity followed by a delimiter to the
   beginning of the Path-content. In addition, it SHOULD then add CRLF
   and WSP if it would otherwise result in a line longer than 79
   characters.

   The path-identity added MUST be unique to that agent. To this end it
   SHOULD be one of:

   1. A fully qualified domain name (FQDN) associated (by the Internet
      DNS service [RFC 1034]) with an A record, which SHOULD identify
      the actual machine prepending this path-identity. Ideally, this
      FQDN should also be "mailable" in the sense that it enables the
      construction of a valid E-mail address of the form "usenet@<FQDN>"
      or "news@<FQDN>" [RFC 2142] whereby the administrators of that
      agent may be reached.

   2. A fully qualified domain name (FQDN) associated (by the Internet
      DNS service) with an MX record which MUST then enable the
      construction of a valid E-mail address of the form "usenet@<FQDN>"
      or "news@<FQDN>" whereby the administrators of that agent may be
      reached.

   3. A name registered previously in the UUCP maps database (found in
      the newsgroup comp.mail.maps), containing no '.' character.

   4. An encoding of an IP address - <dotted-quad> [RFC 820] or <ipv6-
      numeric> [RFC 2373] (the requirement to be able to use an <ipv6-
      numeric> is the reason for including ':' as an allowed character
      within a path-identity).

   5. A '.' followed by an arbitrary name not in the UUCP maps database,
      but believed to be unique and registered at least with all sites
      immediately downstream from the given site.

   Of the above options, nos. 1 to 3 are much to be preferred, unless
   there are strong technical reasons dictating otherwise. In
   particular, the injecting agent's path-identity MUST, as a special
   case, be an FQDN mailable address in the sense defined under option
   1, or with an associated MX record as in option 2.

   The injecting agent's path-identity MUST be followed by the special
   delimiter '%' which serves to separate the pre-injection and post-
   injection regions of the Path-content (see 5.6.3).

   In the case of a relaying or serving agent, the delimiter is chosen
   as follows.  When such an agent receives an article, it MUST
   establish the identity of the source and compare it with the leftmost
   path-identity of the Path-content. If it matches, a '/' should be
   used as the delimiter when prepending the agent's own path-identity.
   If it does not match then the agent should prepend two entries to the
   Path-content; firstly the true established path-identity of the
   source followed by a '?'  delimiter, and then, to the left of that,
   the agent's own path-identity followed by a '/' delimiter as usual.

   This prepending of two entries SHOULD NOT be done if the provided and
   established identities match.

   Any method of establishing the identity of the source may be used
   (but see 5.6.5 below), with the consideration that, in the event of
   problems, the agent concerned may be called upon to justify it.
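
   The choice between one and two prepended entries might be sketched
   as follows. This is an illustrative Python fragment under stated
   assumptions: the function and parameter names are hypothetical, the
   verified identity of the source is taken as already established by
   some means, names are compared case-insensitively, and registered
   aliases of the source are not considered.

```python
import re


def prepend_path(path_content: str, own_identity: str, verified_source: str) -> str:
    """Prepend a relaying agent's entry (or entries) to a Path-content."""
    # The leftmost path-identity ends at the first delimiter.
    leftmost = re.split(r"[/?%!,]", path_content, maxsplit=1)[0].strip()
    if leftmost.lower() == verified_source.lower():
        # The claimed and established identities match: one entry with '/'.
        return f"{own_identity}/{path_content}"
    # Mismatch: record the established source with '?', then our own
    # identity to the left of it with '/'.
    return f"{own_identity}/{verified_source}?{path_content}"
```

   Applied to the Path of 5.6.6 as received by ".foo-server", with the
   source established as bar.isp.example, this yields the two entries
   ".foo-server/bar.isp.example?" in front of the received content.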

        NOTE: The use of the '%' delimiter marks the position of the
        injecting agent in the chain. In normal circumstances there
        should therefore be only one '%' delimiter present, and
        injecting agents MAY choose to reject proto-articles with a '%'
        already in them. If, for whatever reason, more than one '%' is
        found, then the path-identity in front of the leftmost '%' is to
        be regarded as the true injecting agent.

5.6.3.  The tail-entry

   For historical reasons, the tail-entry (i.e. the rightmost entry in
   the Path-content) is regarded as a "user name", and therefore MUST
   NOT be interpreted as a site through which the article has already
   passed. Moreover, the Path-content is not an E-mail address and MUST
   NOT be used to contact the poster. Posting and/or injecting agents
   MAY place any string here. When it is not an actual user name, the
   string "not-for-mail" is often used, but in fact a simple "x" would
   be sufficient.

   Often this field will be the only entry in the region (known as the
   pre-injection region) after the '%', although there may be entries
   corresponding to machines traversed between the posting agent and the
   injecting agent proper. In particular, injecting agents that receive
   articles from many sources SHOULD include the identity of the source
   machine connecting to do the injection, and possibly other
   information enabling them to establish the circumstances of the
   injection (provided it does not conflict with any genuine site
   identifier). The '!'  delimiter may be used freely within the pre-
   injection region, although '/' and '?' are also appropriate if used
   correctly.
[If/when we invent some form of Injector-Info header, we may want to
revisit that paragraph.]

5.6.4.  Delimiter Summary

   The various delimiters are summarized below. The name immediately to
   the left of a delimiter is always that of the machine which added
   that delimiter.

   '/' The name immediately to the right is known to be the identity of
       the machine from which the article was received (either because
       the entry was made by that machine and we have verified it, or
       because we have added it ourselves).

   '?' The name immediately to the right is the claimed identity of the
       machine from which the article was received, but we were unable
       to verify it (and have prepended our own view of where it came
       from, and then a '/').

   '%' Everything to the right is the pre-injection region followed by
       the tail-entry.  The name on the left is the FQDN of the
       injecting agent. The presence of two '%'s in a path indicates a
       double-injection (see 8.2.2).

   '!' The name immediately to the right is unverified. The presence of
       a '!' to the left of the '%' indicates that the identity to the
       left is that of an old-style system not conformant with this
       standard.

   ',' Reserved for future use, treat as '/'.

   Other
       Old software may possibly use other delimiters, which should be
       treated as '!'.  But note in particular that ':', '-' and '_' are
       components of names, not delimiters, and FWS on its own MUST NOT
       be used as the sole delimiter.

        NOTE: Old Netnews relaying and injecting programs almost all
        delimit Path entries with the '!' delimiter, and these entries
        are not verified. As such, the presence of '%' as a delimiter
        will indicate that the article was injected by software
        conforming to this standard, and the presence of '!' as a
        delimiter to the left of a '%' will indicate that the message
        passed through systems developed prior to this standard. It is
        anticipated that relaying agents will reject articles in the old
        style once this new standard has been widely adopted.

5.6.5.  Suggested Verification Methods

   The following approaches for common transports are suggested in order
   to meet a site's verification obligations. They are not required, but
   following them should avoid the necessity for wasteful double-entry
   Path additions.

   If the incoming article arrives through some TCP/IP protocol such as
   NNTP, the IP address of the source will be known, and will likely
   already have been checked against a list of known FQDNs or IP
   addresses that the receiving site has agreed to peer with (this will
   have involved a DNS lookup of a known FQDN, following CNAME chains as
   required, to find an A record containing that source IP).

   1. Where the path-identity is an FQDN (or even an arbitrary name
      starting with a '.') it is now a simple matter to check that it is
      the proper FQDN for the source, or some known registered alias
      thereof. Alternatively, where the FQDN in the path-identity has an
      associated A record, an immediate DNS lookup as above can be used
      to verify it.

   2. Where the path-identity is an encoding of an IP address which does
      not immediately match the known IP address of the source, a
      reverse-DNS (in-addr.arpa PTR record) lookup may be done on the
      provided address, followed by a regular DNS "A" record lookup on
      the returned name. There may be A records for several IP
      addresses, of which one should match the path-identity and another
      should match the source.

   3. If the path-identity fails to match any known alias for the source
      (requiring the insertion of an extra path-identity for the true
      source followed by a '?'), simply doing a reverse DNS (PTR) lookup
      on the source IP address is not sufficient to generate the true
      FQDN. The returned name must be mapped back to A records to assure
      it matches the source's IP address.
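
   The lookups in methods 1 and 3 above might be sketched as follows.
   This is an illustrative Python fragment, not a required
   implementation: all function names are hypothetical, the resolver
   functions are passed in as parameters so that they can be replaced,
   and addresses are compared as text, which assumes that both sides
   use the same textual form.

```python
import socket


def forward_addresses(name):
    """All IP addresses to which `name` resolves (empty set on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(name, None)}
    except socket.gaierror:
        return set()


def reverse_name(ip):
    """Name returned by a reverse (PTR) lookup, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def identity_matches_source(identity, source_ip, forward=forward_addresses):
    """Method 1: does the FQDN in the path-identity resolve to the source IP?"""
    return source_ip in forward(identity)


def confirmed_source_name(source_ip, forward=forward_addresses,
                          reverse=reverse_name):
    """Method 3: PTR lookup, then map the name back to its addresses."""
    name = reverse(source_ip)
    if name is not None and source_ip in forward(name):
        return name          # forward lookup confirms the reverse lookup
    return None              # a PTR record alone is not sufficient
```

   A caller that obtains None from confirmed_source_name has no
   confirmed FQDN for the source and might fall back to an encoding of
   the source IP address.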

   If the incoming article arrives through some other protocol, such as
   UUCP, that protocol MUST include a means of verifying the source
   site. In UUCP implementations, commonly each incoming connection has
   a unique login name and password, and that login name (or some alias
   registered for it) would be expected as the path-identity.
[The above description may still contain more detail than we would wish.
My aim so far was to retain everything in Brad's original, but expressed
in a more palatable manner. We can now decide how much of it we want to
keep.]

5.6.6.  Example


      Path: foo.isp.example/
         .foo-server/bar.isp.example?10.123.12.2/old.site.example!
         barbaz/baz.isp.example%dialup123.baz.isp.example!x

        NOTE: That article was injected into the news stream by
        baz.isp.example (complaints may be addressed to
        usenet@baz.isp.example). The injector has taken care to record
        that it got it from dialup123.baz.isp.example. "x" is the
        default tail entry, though sometimes a real userid is put there.

        The article was relayed, perhaps by UUCP, to the machine known
        in the UUCP maps database as "barbaz".

        Barbaz relayed it to old.site.example, which does not yet
        conform to this standard (hence the '!' delimiter). So one
        cannot be sure that it really came from barbaz.

        Old.site.example relayed it to a site claiming to have the IP
        address [10.123.12.2], and claiming (by using the '/' delimiter)
        to have verified that it came from old.site.example.

        [10.123.12.2] relayed it to ".foo-server" which, not being
        convinced that it truly came from [10.123.12.2], did a reverse
        lookup on the actual source and concluded it was known as
        bar.isp.example (that is not to say that [10.123.12.2] was not a
        correct IP address for bar.isp.example, but simply that that
        connection could not be substantiated by .foo-server).  Observe
        that .foo-server has now added two entries to the Path.


        ".foo-server" is a locally significant name (observe the
        presence of the '.')  within the complex site of many machines
        run by foo.isp.example, so the latter should have no problem
        recognizing .foo-server and using a '/' delimiter.  Presumably
        foo.isp.example then delivered the article to its direct
        clients.

        It appears that foo.isp.example and old.site.example decided to
        fold the line, on the grounds that it seemed to be getting a
        little too long.
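
   For illustration, a Path-content such as the one above might be
   split into its entries as sketched below. This Python fragment is a
   sketch under stated assumptions: the function names are
   hypothetical, folding whitespace is simply discarded (FWS on its own
   is not a delimiter), and delimiters other than those listed in
   5.6.4 are not handled.

```python
import re


def parse_path(path_content: str):
    """Return ([(path-identity, delimiter), ...], tail-entry)."""
    unfolded = re.sub(r"\s+", "", path_content)    # drop folding whitespace
    parts = re.split(r"([/?%!,])", unfolded)       # keep the delimiters
    tail = parts[-1]                               # rightmost entry
    entries = list(zip(parts[0:-1:2], parts[1:-1:2]))
    return entries, tail


def injecting_agent(entries):
    """Path-identity in front of the leftmost '%', or None if absent."""
    for identity, delimiter in entries:
        if delimiter == "%":
            return identity
    return None


example = """foo.isp.example/
   .foo-server/bar.isp.example?10.123.12.2/old.site.example!
   barbaz/baz.isp.example%dialup123.baz.isp.example!x"""
entries, tail = parse_path(example)
```

   Here injecting_agent(entries) gives "baz.isp.example" and the
   tail-entry is "x".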

6.  Optional Headers

   The headers appearing in this section have established meanings and
   MUST be interpreted according to the definitions given here. None of
   them is required to appear in every article but some of them are
   required in certain types of article, such as followups. Any header
   defined in this (or any other) standard MUST NOT appear more than
   once in an article unless specifically stated otherwise.
   Experimental headers (4.2.2.1) and headers defined by cooperating
   subnets are exempt from this requirement.  See section 8 "Duties of
   Various Agents" for the full picture.

6.1.  Reply-To

   The Reply-To header specifies the address(es) to be used for
   personal replies to the author(s) of the article when this is
   different from the author's address(es) given in the From header. The
   content syntax makes use of syntax defined in [MESSFOR], but subject
   to the revised definition of local-part given in section 5.2.

      Reply-To-content    = From-content  ; see 5.2

   In the absence of Reply-To, the reply address(es) is the address(es)
   in the From header. For this reason a Reply-To SHOULD NOT be included
   if it just duplicates the From header.

        NOTE: Use of a Reply-To header is preferable to including a
        similar request in the article body, because reply agents can
        take account of Reply-To automatically.

   An address of "<>" in the Reply-To header MAY be used to indicate
   that the poster does not wish to receive email replies.

6.1.1.  Examples

      Reply-To: John Smith <jsmith@site.example>
      Reply-To: John Smith <jsmith@site.example>, dave@isp.example
      Reply-To: John Smith <jsmith@site.example>,andrew@isp.example,
         fred@site2.example
      Reply-To: Please do not reply <>




6.2.  Sender

   The Sender header specifies the mailbox of the entity which actually
   sent this article, if that entity is different from that given in the
   From header or if more than one address appears in the From header.
   This header SHOULD NOT appear in an article unless the sender is
   different from the author. This header is appropriate for use by
   automatic article posters. The content syntax makes use of syntax
   defined in [MESSFOR].

      Sender-content      = mailbox

6.3.  Organization

   The Organization header is a short phrase identifying the author's
   organization.

      Organization-content= 1*( [FWS] utext )

        NOTE: Posting and injecting agents are discouraged from
        providing a default value for this header unless it is
        acceptable to all posters using those agents. Unless this header
        contains useful information (including some indication of the
        author's physical location) posters are discouraged from
        including it.

6.4.  Keywords

   The Keywords field contains a comma separated list of important words
   and phrases intended to describe some aspect of the content of the
   article. The content syntax makes use of syntax defined in [MESSFOR].

      Keywords-content    = phrase *( "," phrase )

        NOTE: The list is comma separated NOT space separated.

6.5.  Summary

   The Summary header is a short phrase summarizing the article's
   content.

      Summary-content     = 1*( [FWS] utext )

   The summary SHOULD be terse. Authors SHOULD avoid trying to cram
   their entire article into the headers; even the simplest query
   usually benefits from a sentence or two of elaboration and context,
   and not all reading agents display all headers. On the other hand the
   summary should give more detail than the Subject.

6.6.  Distribution

   The Distribution header is an inheritable header (see 4.2.2.2) which
   specifies geographical or organizational limits to an article's
   propagation.

      Distribution-content= distribution *( dist-delim distribution )
      dist-delim          = ","
      distribution        = positive-distribution /
                               negative-distribution
      positive-distribution
                          = *FWS distribution-name *FWS
      negative-distribution
                          = *FWS "!" distribution-name *FWS
      distribution-name   = letter 1*distribution-rest
      distribution-rest   = letter / "+" / "-" / "_"

   Articles MUST NOT be passed between relaying agents or to serving
   agents unless the sending agent has been configured to supply and the
   receiving agent to receive BOTH of
       (a) at least one of the newsgroups in the article's Newsgroups
       header, and
       (b) at least one of the positive-distributions (if any) in the
       article's Distribution header and none of the negative-
       distributions.
   Additionally, reading agents MAY be configured so that unwanted
   distributions do not get displayed.

        NOTE: Although it would seem redundant to filter out unwanted
        distributions at both ends of a relaying link (and it is clearly
        more efficient to do so at the sending end), many sending sites
        have been reluctant, historically speaking, to apply such
        filters (except to ensure that distributions local to their own
        site or cooperating subnet did not escape); moreover they tended
        to configure their filters on an "all but those listed" basis,
        so that new and hitherto unheard of distributions would not be
        caught. Indeed many "hub" sites actually wanted to receive all
        possible distributions so that they could feed on to their
        clients in all possible geographical (or organizational)
        regions.

        Therefore, it is desirable to provide facilities for rejecting
        unwanted distributions at the receiving end. Indeed, it may be
        simpler to do so locally than to inform each sending site of
        what is required, especially in the case of specialized
        distributions (for example for control messages, such as cancels
        from certain issuers) which might need to be added at short
        notice.  The possibility for reading agents to filter
        distributions has been provided for the same reason.

   Exceptionally, ALL relaying agents are deemed willing to supply or
   accept the distribution "world", and NO relaying agent should supply
   or accept the distribution "local".  However, "world" SHOULD NEVER be
   mentioned explicitly since it is the default when the Distribution
   header is absent entirely.  "All" MUST NOT be used as a
   distribution-name.  Distribution-names SHOULD contain at least three
   characters, except when they are two-letter country names as in [ISO
   3166].  Distribution-names are case-insensitive (i.e. "US", "Us" and
   "us" all specify the same distribution).
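
   Condition (b) above, together with the special treatment of "world"
   and "local", might be checked along the following lines. This is a
   minimal Python sketch: the function names are hypothetical,
   "configured" stands for the set of distribution-names that the
   sending agent will supply and the receiving agent will accept, and
   condition (a) on newsgroups is not covered.

```python
def parse_distribution(content: str):
    """Split a Distribution-content into positive and negative name sets."""
    positive, negative = set(), set()
    for item in content.split(","):
        name = item.strip().lower()          # names are case-insensitive
        if name.startswith("!"):
            negative.add(name[1:].strip())
        elif name:
            positive.add(name)
    return positive, negative


def link_may_carry(content, configured):
    """True if condition (b) permits passing the article over this link."""
    # "world" is always acceptable; "local" never is.
    wanted = ({d.lower() for d in configured} | {"world"}) - {"local"}
    if content is None:
        return True                          # absent header means "world"
    positive, negative = parse_distribution(content)
    if negative & wanted:
        return False                         # a negative-distribution applies
    # With no positive-distributions the "(if any)" clause is satisfied.
    return not positive or bool(positive & wanted)
```

   For example, link_may_carry("!us", {"us"}) is false, whereas
   link_may_carry("!us", {"uk"}) is true.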


        NOTE: "Distribution: !us" can be used to cause an article to go
        to the whole of "world" except for "us".

   Posting agents SHOULD NOT provide a default Distribution header
   without giving the poster an opportunity to override it. Followup
   agents SHOULD initially supply the same Distribution header as found
   in the precursor.

6.7.  Followup-To

   The Followup-To header specifies which newsgroup(s) followups should
   be posted to.

      Followup-To-content = Newsgroups-content / "poster"

   The syntax is the same as that of the Newsgroups-content, with the
   exception that the magic word "poster" is allowed. In the absence of
   a Followup-To header, the default newsgroup(s) for a followup are
   those in the Newsgroups header, and for this reason the Followup-To
   header SHOULD NOT be included if it just duplicates the Newsgroups
   header.

   A Followup-To header consisting of the magic word "poster" indicates
   that the author requests no followups to be sent in response to this
   article, only personal replies to the article's reply address.
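
   A followup agent's choice of destination might be sketched as
   follows. This Python fragment is illustrative only: the function
   name and the representation of headers as a dictionary are
   assumptions, and "poster" is matched exactly since this section does
   not say whether it is case-sensitive.

```python
def followup_destination(headers):
    """Return ("mail", reply-address) or ("post", [newsgroup, ...])."""
    # Without a Followup-To header, followups go to the Newsgroups listed.
    target = headers.get("Followup-To", headers["Newsgroups"])
    if target.strip() == "poster":
        # Personal reply only: Reply-To if present, otherwise From (see 6.1).
        return ("mail", headers.get("Reply-To", headers["From"]))
    return ("post", [group.strip() for group in target.split(",")])
```

   Given only "Newsgroups: misc.test,alt.test", the result is a posting
   to both of those newsgroups.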

6.8.  References

   The References header lists the message identifiers of precursors,
   separated by CFWS. The content syntax makes use of syntax defined in
   [MESSFOR].

      References-content  = msg-id *( CFWS msg-id )

        NOTE: This differs from the syntax of [MESSFOR] by requiring at
        least one CFWS between the msg-ids (this was an [RFC 1036]
        requirement).

   A followup MUST have a References header, and an article that is not
   a followup MUST NOT have a References header. In a followup, if the
   precursor did not have a References header, the followup's
   References-content MUST be formed by the message identifier of the
   precursor. A followup to an article which had a References header
   MUST have a References header containing the precursor's References-
   content (subject to trimming as described below) plus the precursor's
   message identifier appended to the end of the list (separated from it
   by CFWS).

   Followup agents SHOULD NOT trim message identifiers out of a
   References header unless the number of message identifiers exceeds
   21, at which time trimming SHOULD be done by removing sufficient
   identifiers starting with the second so as to bring the total down to
   21. However, it would be wrong to assume that References headers
   containing more than 21 message identifiers will not occur.
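
   The construction and trimming rules above might be sketched as
   follows. This is an illustrative Python fragment: the function name
   is hypothetical, the message identifiers are assumed to be separated
   by plain white space (comments within CFWS are not handled), and the
   limit is applied to the list after the precursor's own message
   identifier has been appended.

```python
MAX_IDS = 21


def followup_references(precursor_references, precursor_msg_id):
    """Build the References-content for a followup."""
    ids = (precursor_references or "").split()
    ids.append(precursor_msg_id)             # precursor's id goes last
    if len(ids) > MAX_IDS:
        excess = len(ids) - MAX_IDS
        # Keep the first identifier; remove `excess` identifiers
        # starting with the second.
        ids = ids[:1] + ids[1 + excess:]
    return " ".join(ids)
```

   Thus a precursor carrying 21 identifiers gives a followup whose
   References header keeps the first of them, drops the second, and
   ends with the precursor's own message identifier.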

6.8.1.  Examples

      References: <i4g587y@site1.example>
      References: <i4g587y@site1.example> <kgb2231+ee@site2.example>
      References: <i4g587y@site1.example> <kgb2231+ee@site2.example>
         <222@site1.example> <87tfbyv@site7.example>
         <67jimf@site666.example>
      References: <i4g587y@site1.example> <kgb2231+ee@site2.example>
         <tisjits@smeghead.example>

6.9.  Expires

   The Expires header specifies a date and time when the article is
   deemed to be no longer relevant and could usefully be removed
   ("expired"). The content syntax makes use of syntax defined in
   [MESSFOR].

      Expires-content     = date-time

   An Expires header should only be used in an article if the requested
   expiry time is earlier or later than the time typically to be
   expected for such articles. Local policy for each serving agent will
   dictate whether and when this header is obeyed and authors SHOULD NOT
   depend on it being completely followed.

6.10.  Archive

   This optional header is a signal to automatic archival agents on
   whether this article is available for long-term storage.

      Archive-content     = [CFWS] ("no" / "yes" ) [CFWS]
      Archive-header-parameter
                          = Filename-token "=" value
                            ; for USENET-header-parameters see 4.1
      Filename-token      = [CFWS] "filename" [CFWS]

   Agents which see "Archive: no" MUST NOT keep the article past the
   Expires date. "Archive: yes" merely confirms what is already the
   default state. The optional Filename parameter MAY then be used to
   suggest a filename under which the article should be archived.
   Further extensions to this standard may provide additional parameters
   for administration of the archiving process.

6.11.  Control

   The Control header marks the article as a control message, and
   specifies the desired actions (other than the usual ones of storing
   and/or relaying the article).

      Control-content     = CONTROL-verb CONTROL-arguments
      CONTROL-verb        = <the verb defined in this standard
                               (or an extension of it) for a specific
                               CONTROL message>
      verb                = token
      CONTROL-arguments   = <the arguments defined in this standard
                               (or an extension of it) for a specific
                               CONTROL message>
      arguments           = *( CFWS value )  ; see 4.1
[Observe that <value> requires the use of a quoted-string if any
tspecials or NON-ASCII characters are involved. This is a restriction on
present usage, but follows MIME practice.]

   The verb indicates what action should be taken, and the argument(s)
   (if any) supply details. In some cases, the body of the article may
   also contain details. Section 7 describes all of the standard verbs.

   An article with a Control header MUST NOT also have a Replaces or
   Supersedes header.

        NOTE: The presence of a Subject header starting with the string
        "cmsg " and followed by a Control-content MUST NOT be construed,
        in the absence of a proper Control header, as a request to
        perform that control action (as may have occurred in some legacy
        software). See also section 5.4.

6.12.  Approved

   The Approved header indicates the mailing addresses (and possibly the
   full names) of the persons or entities approving the article for
   posting.

      Approved-content    = From-content  ; see 5.2

   Each mailbox contained in the Approved-content MUST be that of the
   person or entity in question, and one of those mailboxes MUST be that
   of the actual injector of the article.
[This is the start of an attempt to strengthen this header. It should be
a TOSSable offence to put a dummy or invalid address in here. Later,
when we have some form of authentication, I would hope to be able to say
more.]

   An Approved header is required in all postings to moderated
   newsgroups. If this header is not present in such postings, then
   relaying and serving agents MUST reject the article. Please see
   section 8.2.2 for how injecting agents should treat postings to
   moderated groups that do not contain this header.

   An Approved header is also required in certain control messages, to
   reduce the risk of accidental posting of same; see the relevant parts
   of section 7.


6.13.  Replaces / Supersedes

   These two headers contain one or more message identifiers that the
   current article is expected to replace or supersede. All listed
   articles MUST be treated as though a "cancel" control message had
   arrived for the article (but observe that a site MAY choose not to
   honour a "cancel" message, especially if its authenticity is in
   doubt).

6.13.1.  Syntax and Semantics

   The Replaces and Supersedes headers specify articles to be cancelled
   on arrival of this one. The content syntax makes use of syntax
   defined in [MESSFOR].

      Replaces-content    = msg-id *( CFWS msg-id )
      Replaces-header-parameter
                          = Usage-token "=" Usage-value
                            ; for USENET-header-parameters see 4.1
      Usage-token         = [CFWS] "usage" [CFWS]
      Usage-value         = [CFWS] ("replace" / "revise" / "repost" )
                               [CFWS]
      Supersedes-content  = msg-id

        NOTE: There is no "c" in "Supersedes".
[I could be persuaded of a better token than "usage". I did wonder about
"disposition". Observe that "usage" is now also used in
message/news-transmission.]

   If an article contains a Replaces header, then the old articles
   mentioned SHOULD simply be deleted by the serving agent, as in a
   cancel message (7.5), and the new article inserted into the system as
   any other new article would be.

   A Replaces-header-parameter is only meaningful when it occurs within
   a Replaces-content. If its Usage-value is "revise" or "repost" (or if
   the Replaces-header-parameter is absent, then by default) reading
   agents SHOULD NOT show the article as an "unread" article unless the
   replaced article(s) were themselves all unread, except when the
   reader has configured his reading agent otherwise.

   Moreover, if a Usage-value is "revise" or "repost", serving agents
   that generate a local Xref header MUST then include additional
   "revise" or "repost" information as set out in section 6.14.

        NOTE: A replacement with "usage=replace" is intended to be used
        in the case of an article that is sufficiently different from
        its predecessors that it is advisable for readers to see it
        again.  A replacement with "usage=revise" is intended to be used
        in the case of a minor change, unworthy of being brought to the
        attention of a reader who has already read one of its
        predecessors. A replacement with "usage=repost" is intended to
        be used in the case of an article identical to the one replaced
        (but possibly being reposted because the earlier one had likely
        expired).

        NOTE: A reader who elects to ignore all the articles available
        in a newsgroup (perhaps on the occasion of accessing that
        newsgroup for the first time) will likely have them all marked
        as "already read", unless the reading agent provides a distinct
        mark such as "never offered". This could lead to a later
        replacement with "revise" or "repost" for one of those articles
        being missed.

   The Supersedes header is obsolescent, is provided only for
   compatibility with existing software, and may be removed entirely in
   some future version of this standard. Its meaning is the same as that
   of a corresponding Replaces header with its Replaces-header-parameter
   set to "usage=replace", and whenever a Supersedes header is provided
   a matching Replaces header SHOULD be provided as well. Observe that
   the Supersedes header makes provision for only a single msg-id.

   Until the Replaces header has become widely implemented, software
   SHOULD generate Replaces headers with only one msg-id, and cancel
   control messages SHOULD be issued if needed for further identifiers.
   Moreover, until that time, any article containing a Replaces header
   SHOULD contain also a Supersedes header (or alternatively be
   accompanied by a Control cancel message) for that same msg-id, to
   ensure that older systems still at least remove the predecessor.

   When a message contains both a Replaces and a Supersedes header they
   MUST be for the same msg-id.  Furthermore, to resolve any doubt, the
   Replaces header shall be deemed to take priority.

   Whatever security or authentication mechanisms are required for a
   Control cancel message MUST also be required for an article with a
   Replaces or Supersedes header. In the absence or failure of such
   checks, the article SHOULD be discarded, or at most stored as an
   ordinary article.
[We can write something more constructive in here as soon as the
situation with regard to cancel-locks and signed headers has been
clarified.]

6.13.2.  Message-ID version procedure

   Whilst this procedure is not essential for the operation of Netnews,
   it SHOULD be supported by all serving agents. However, for the
   procedure to work, all the msg-ids in the Replaces-content MUST be
   those of successive replacements of the same original article, and
   all be generated as described below.
[Whilst the procedure about to be described will undoubtedly work, it
must be pointed out that life would be much simpler if there was only a
single msg-id allowed in a Replaces-content.]

6.13.2.1.  Message version numbers

   According to [MESSFOR], and omitting the obsolete forms, the syntax
   of the left hand side of a msg-id (the part before the "@") is given
   by:

      id-left-side        = dot-atom-text / no-fold-quote




   Consider this to be replaced by:

      id-left-side        = ( atom-text / no-fold-quote )
                               *( dollars-sequence )
      dollars-sequence    = version-number / random-dollars-sequence
      version-number      = "$" %d118 "=" 1*DIGIT ; $v=digits
      random-dollars-sequence
                          = "$" 1*atom-text

   Whilst this is admittedly ambiguous ("$" is already a possible value
   of atom-text) and does not in fact change what is allowable as an
   id-left-side, it does serve to allow dollars-sequences such as
   version-number (and any others that may be added by extensions to
   this standard) to be distinguished within a message identifier and
   utilized by agents which can understand them.  Observe that no-fold-
   quotes cannot occur within a dollars-sequence.

   Posters and/or posting agents when replacing (or superseding)
   articles SHOULD arrange that the message identifier of the
   replacement follows the convention below, generating what are
   known as "version-number" message identifiers. This is to enable the
   new version of the article to be retrieved by its original message
   identifier, notably when it occurs in a URL of the form
   <news:message-identifier> [RFC 1738].

   1. If the id-left-side of the most recent predecessor's message
      identifier contains a leftmost version-number "$v=<n>", where <n>
      is an integer version number, possibly followed by one or more
      random-dollars-sequences, the replacement message identifier
      should be obtained by replacing the <n> with the integer <n+1> and
      providing a different random-dollars-sequence(s). For example
      <foo$v=3$XYZ@faq-site.example> becomes
      <foo$v=4$PQR@faq-site.example>.

   2. If the id-left-side of the predecessor's message identifier does
      not contain a version-number, the replacement message identifier
      should be obtained by appending the string "$v=1", preferably
      followed by a random-dollars-sequence(s), to that id-left-side.
      For example <foo@faq-site.example> becomes
      <foo$v=1$ABC@faq-site.example>.

   Any random-dollars-sequence so added MUST NOT start with "$<l>=" for
   any letter <l>.

        NOTE: The presence of a random-dollars-sequence following the
        version-number is intended to prevent a malicious poster from
        preempting the posting of a replacement article by guessing its
        likely message identifier.
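
   The two rules above can be sketched as follows. This is only an
   illustration, not part of the convention: the function names, the
   six-character length of the random-dollars-sequence and the choice
   to discard everything after the old version-number are assumptions
   made for this example, and a quoted id-left-side is not treated
   specially.

```python
import re
import secrets
import string

# Leftmost "$v=<n>" within an id-left-side.
_VERSION = re.compile(r"\$v=(\d+)")
_ALPHABET = string.ascii_letters + string.digits


def random_dollars_sequence(length=6):
    """Return "$" plus random alphanumerics (hypothetical helper).

    The alphabet contains no "=", so the result never starts "$<l>=".
    """
    return "$" + "".join(secrets.choice(_ALPHABET) for _ in range(length))


def next_message_id(predecessor):
    """Derive a replacement's message identifier from its predecessor's."""
    text = predecessor.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("expected <id-left-side@id-right-side>")
    left, sep, right = text[1:-1].rpartition("@")
    if not sep:
        raise ValueError("message identifier has no '@'")
    match = _VERSION.search(left)
    if match:
        # Rule 1: bump <n>; the old random-dollars-sequences are dropped.
        root, version = left[:match.start()], int(match.group(1)) + 1
    else:
        # Rule 2: no version-number yet, so start at 1.
        root, version = left, 1
    return f"<{root}$v={version}{random_dollars_sequence()}@{right}>"
```

   Because the random part comes from the "secrets" module, the next
   identifier cannot be predicted from the previous one, which is the
   property the NOTE above asks for.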

   Attempts to fetch a replaced (or superseded) article by its message
   identifier SHOULD retrieve instead its most recent successor which
   has used the version-number convention. Some indication that a newer
   version than was asked for has been delivered SHOULD be provided.
   This is intended to ensure that "news:" URLs [RFC-1738] will continue
   to work even when an article has been replaced, but agents SHOULD
   then draw attention to the fact that the message identifier retrieved
   differed from that requested.

6.13.2.2.  Implementation and Use Note

[Here is the implementation technique that we discussed, based on the
use of a conventional History file. This is a sanity check for our own
use, not intended to go in the final text.

1. Ensure that the implementation of DBZ is not upset by an attempt to
store the same key a second time, and that such a key always retrieves
the latest record indexed by that key.

2. Additions to the History file are always made at the end. Removals or
changes to existing entries are only made by the expire program. An
entry for a Replaced (or otherwise cancelled) article will remain until,
first, the expire program removes the links to the articles that are no
longer stored, and later on removes the entire entry according to its
expiry date. For every entry containing a '$v=n' followed by random-
dollars-sequences there will be an immediately following entry identical
but for the omission of that '$v=n' and of the random-dollars-sequences.
Thus there may be several entries with identical message-ids but,
because of the change to DBZ just described, only the most recent will
ever be seen except by programs that access the History file directly,
rather than by its index.

3. When an article is Replaced, at the same time as the successor
article is entered into the History file, with '$v=7' say, a duplicate
entry (same article list) is entered under the same key, modified by
removing any leftmost '$v=n' and the following random-dollars-sequences
from it.

4. Provide a call to a routine which, if asked to retrieve any message
identifier with '$v=n' and finding it missing (or rather linked to no
stored groups), immediately tries again without the '$v=n' and its
random-dollars-sequences.  NOTE. We don't want this behaviour when
checking whether we already have an article offered to us by IHAVE, only
in response to an ARTICLE command. So this needs to be an extra call in
DBZ, in addition to the 'fetch' or 'dbzfetch' calls, to be used in the
proposed extension to the NNTP ARTICLE command. Observe that if the
requested '$v=n' is present and linked to stored articles (for whatever
reason) then you will be given exactly that version, even if later ones
are stored as well.

5. NOTE that I have dropped the idea of having '$v=0', because you can
never be sure that the very first issue of the FAQ used it, so you have
to provide the versionless root as well. If someone asks for '$v=0' (or
any '$v=n') the algorithm I gave will still find it via the root. So we
don't care what people put in URLs.

6. You are supposed to cancel the replaced/superseded article. If you
REALLY want to keep the old ones around a little longer, then this
implementation will not work if you want the latest to be retrieved
automatically - you will have to invent something much more complicated.

7. Having said all that, here follows a brief account of the same thing,
but short enough to be included in our document (the convention being
that implementation issues are hinted at, rather than being described in
full detail).]

   Typically, a news database will index a Replacement article both by
   its "version-number" message identifier (containing a "$v=" tag
   followed by a random-dollars-sequence) and by its "root" version
   (without the "$v=" tag or any following random-dollars-sequence).
   Thus when a request for an article comes in that is not present under
   the version-number requested, any article that is present and indexed
   by the corresponding root version can be retrieved instead. The
   indexing mechanism needs to be such that, although the root version
   may have at times referred to many different articles, it is always
   the latest that is retrieved.

        NOTE: The presence of a version-number in the message identifier
        of an article without a Replaces or Supersedes header causes no
        extra action (it is just an ordinary article). Observe also that
        if an article with the exact message identifier (even though it
        contains a version-number) is, for whatever reason, already
        present on the serving agent, that article will always be
        retrieved in preference to the one indexed by any root version.
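
   A toy model of such an index is sketched below, using an in-memory
   dictionary in place of a real History file and DBZ. The class and
   method names are invented for this illustration; the only behaviour
   taken from the text is that a Replacement is indexed under both
   keys, that the root key always yields the latest article, and that
   an exact match is preferred.

```python
import re

# Leftmost "$v=<n>" plus any dollars-sequences that follow it.
_VERSION_TAIL = re.compile(r"\$v=\d+.*$")


def root_version(message_id):
    """Return the message identifier with its version tail removed."""
    left, sep, right = message_id.strip("<>").rpartition("@")
    return f"<{_VERSION_TAIL.sub('', left)}{sep}{right}>"


class ArticleIndex:
    """In-memory stand-in for a history index (hypothetical)."""

    def __init__(self):
        self._by_id = {}

    def add(self, message_id, article, replacement=False):
        self._by_id[message_id] = article
        if replacement:
            # Second entry under the root; it shadows any earlier one.
            self._by_id[root_version(message_id)] = article

    def cancel(self, message_id):
        """Forget an article that has been replaced or cancelled."""
        self._by_id.pop(message_id, None)

    def fetch(self, message_id):
        """Exact identifier first, otherwise whatever the root holds."""
        if message_id in self._by_id:
            return self._by_id[message_id]
        return self._by_id.get(root_version(message_id))
```

   With this model a request for a cancelled "$v=1" article falls
   through to the root entry and so returns the latest Replacement.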

6.13.2.3.  The Message-Version NNTP extension

   The following Service Extension to the NNTP protocol is defined in
   accordance with the framework set out in [NNTP], and is to be
   registered with IANA.

   Name of the extension:                             Message-Version
   Extension Label (for the LIST EXTENSIONS command): MESSAGE-VERSION
   Additional keywords, syntax and parameters:        None

   In a server supporting this extension, the behaviour of the ARTICLE,
   HEAD, BODY and STAT commands when the parameter is a <message-id> is
   modified as follows.

   If the specified article is available on the server then it (or its
   Head, Body or Status as appropriate) is returned in the normal
   manner.  Otherwise, if the id-left-side of the <message-id> (the
   part before the '@') contains a leftmost "$v=<n>", where <n> is an
   integer version number, that "$v=<n>" and everything following it is
   stripped from that id-left-side and the article (Head, Body or
   Status) with the stripped <message-id> is returned instead.  If no
   article is available under either the original or the stripped
   <message-id>, a 430 response is given as usual.

        NOTE: If the client is concerned to know whether the article
        found was exactly the one requested or a replacement article
        corresponding to a stripped <message-id>, then it has only to
        compare the <message-id> requested with that returned in the 220
        (221, 222, or 223) response. The intent of this extension is to
        enable the retrieval of the current version of an article (such
        as a regularly posted FAQ) referenced by a "news:" URL [RFC-
        1738] which quotes the <message-id> of an earlier version.

        NOTE: This extension has no effect on the IHAVE command.
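
   The client-side comparison mentioned in the first NOTE might look
   like the sketch below. It assumes that the status line has the
   usual NNTP shape of a response code, an article number and the
   <message-id>, separated by spaces; the function name is invented
   for this example.

```python
def substituted_id(requested, status_line):
    """Return the delivered <message-id> if it differs from the request.

    Returns None when the article delivered is exactly the one asked
    for, or when the response is not 220, 221, 222 or 223.
    """
    fields = status_line.split()
    if len(fields) < 3 or fields[0] not in {"220", "221", "222", "223"}:
        return None
    delivered = fields[2]
    return delivered if delivered != requested else None
```

   A reading agent could use a non-None result to tell the user that
   a newer version than the one requested has been delivered.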

6.13.2.4.  Examples

   Example 1. The first edition of a FAQ is posted with a message
   identifier of the form:  <examplegroup-faq@faq-site.example>. The
   next (but identical) version, a month later, has:

      Message-ID: <examplegroup-faq$v=1$A1b@faq-site.example>
      Replaces: <examplegroup-faq@faq-site.example> ; usage=repost
      Supersedes: <examplegroup-faq@faq-site.example>

   Observe the inclusion of a Supersedes header as well, it being
   presumed that the Replaces header was not yet widely implemented at
   that time.

   The next one, another month later (and with some significant changes
   justifying the use of "replace" rather than "repost") has:

      Message-ID: <examplegroup-faq$v=2$B2b@faq-site.example>
      Replaces: <examplegroup-faq$v=1$A1b@faq-site.example>
         <examplegroup-faq@faq-site.example> ; usage=replace
      Supersedes: <examplegroup-faq$v=1$A1b@faq-site.example>

   The next one, another month later, has:

      Message-ID: <examplegroup-faq$v=3$C3c@faq-site.example>
      Replaces: <examplegroup-faq$v=2$B2b@faq-site.example>
         <examplegroup-faq$v=1$A1b@faq-site.example> ; usage=repost
      Supersedes: <examplegroup-faq$v=2$B2b@faq-site.example>

   Note that the only reason to include more than one message identifier
   in the Replaces header is to cover the case where a site has missed
   the previous Replacement. It is hardly necessary with such a long
   interval between the postings.

   Under the above, on systems using the version-number system (which is
   optional), requests for any message identifier in the chain will
   always return the most recent. As such the URL
   "news:examplegroup-faq@faq-site.example" will always work, making it
   suitable to appear in HTML documents.

   Example 2. A user posts a message <myuniquepart@mysite.example> to
   the net.  She notices a typo and, 2 minutes later, posts with:

      Message-ID: <myuniquepart$v=1$xxx@mysite.example>
      Replaces: <myuniquepart@mysite.example> ; usage=revise

   3 minutes later she sees another typo, and posts:

      Message-ID: <myuniquepart$v=2$yyy@mysite.example>
      Replaces: <myuniquepart$v=1$xxx@mysite.example>
         <myuniquepart@mysite.example> ; usage=revise

   The two bad versions will be replaced with the 3rd, even if a site
   never sees the 2nd due to batching or feed problems (thus the use of
   two message identifiers is quite useful in this case, in
   contradistinction to the first example). Requests for the original
   will return the 3rd.

6.14.  Xref

   The Xref header is a local header (4.2.2.3) which indicates where an
   article was filed by the last server to process it, and whether it is
   a Replacement (6.13) for an earlier article.

      Xref-content      = [CFWS] server-name 1*( CFWS location )
      server-name       = path-identity  ; see 5.6.1
      location          = newsgroup-name ":" article-locator
                             [ CFWS ( "revise" / "repost" )
                               ":" article-locator ]
      article-locator   = 1*( %x21-7E ) ; US-ASCII printable characters

   The server-name is included so that software can determine which
   serving agent generated the header. The locations specify what
   newsgroups the article was filed under (which may differ from those
   in the Newsgroups header) and where it was filed under them. The
   exact form of an article-locator is implementation-specific.

        NOTE: The traditional form of an article-locator is a decimal
        number, with articles in each newsgroup numbered consecutively
        starting from 1. NNTP demands that such a model be provided, and
        much other software expects it, but it seems desirable to permit
        flexibility for unorthodox implementations.

   Whenever an Xref header is created by an agent for an article which
   includes a Replaces header with "usage=revise" or "usage=repost"
   (6.13), it SHOULD include, within the location field of each
   newsgroup in the Newsgroups header of whichever of the old articles
   referenced in that Replaces header is still current, a corresponding
   "revise:<old-article-locator>" or "repost:<old-article-locator>" for
   the oldest article known to be being replaced, where <old-article-
   locator> is the article-locator under which that oldest article was
   filed. If the Replaces header has a "usage=replace" (explicit or
   implicit) the Xref header MUST NOT include any such reference to an
   <old-article-locator>.

        NOTE: This is to enable reading agents to avoid showing that
        article to users who have already read any of those older
        articles (see 6.13).  Because several replacements for a given
        article may arrive in the period between attempts by a reader to
        read a given newsgroup, it is useful to include the oldest one
        in the Xref header. The information necessary to determine this
        article can be obtained from the Xref header of the current
        version of the article just before it is deleted. Observe that a
        server that never received one of the replaced articles can
        still generate suitable information from whichever earlier
        version it actually has. This is why it is useful for a Replaces
        header to mention more than one earlier article, especially when
        replacements are being issued in quick succession.

        NOTE: "revise" and "repost" are case-insensitive.
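
   By way of illustration, an Xref-content conforming to the syntax
   above could be taken apart as follows. The sketch assumes that the
   CFWS between items is plain white space with no comments, and it
   treats any item whose name part is "revise" or "repost" as an old
   article-locator belonging to the preceding location; the function
   name and the dictionary keys are invented for this example.

```python
def parse_xref(content):
    """Split Xref-content into (server_name, list of location dicts)."""
    tokens = content.split()
    if len(tokens) < 2:
        raise ValueError("need a server-name and at least one location")
    server, locations = tokens[0], []
    for token in tokens[1:]:
        # The article-locator may itself contain ":", so split once only.
        name, sep, locator = token.partition(":")
        if not sep or not name or not locator:
            raise ValueError(f"malformed location: {token!r}")
        usage = name.lower()  # "revise" and "repost" are case-insensitive
        if usage in ("revise", "repost"):
            if not locations or "usage" in locations[-1]:
                raise ValueError(f"misplaced {usage} item: {token!r}")
            locations[-1]["usage"] = usage
            locations[-1]["old_locator"] = locator
        else:
            locations.append({"newsgroup": name, "locator": locator})
    if not locations:
        raise ValueError("no location found")
    return server, locations
```

   A reading agent could then compare "old_locator" with the locators
   it has already marked as read in that newsgroup.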

   An agent inserting an Xref header into an article MUST delete any
   previous Xref header(s). A relaying agent MAY delete it before
   relaying, but otherwise it SHOULD be ignored (and usually replaced)
   by any relaying or serving agent receiving it.

   An agent MUST use the same server-name in Xref headers as the path-
   identity it uses in Path headers.

6.15.  Lines

   The Lines header indicates the number of lines in the body of the
   article.

      Lines-content       = [CFWS] 1*digit

   The line count includes all body lines, including the signature if
   any, including empty lines (if any) at the beginning or end of the
   body, and including the whole of all Mime message and multipart parts
   contained in the body (the single empty separator line between the
   headers and the body is not part of the body). The "body" here is the
   body as found in the posted article as transmitted by the posting
   agent.
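
   The counting rule can be illustrated as below. The sketch assumes
   that the article is held as octets with CRLF line endings and that
   the first empty line separates the headers from the body; the
   function name is invented for this example.

```python
def count_body_lines(article: bytes) -> int:
    """Count body lines, excluding the separator after the headers."""
    _headers, separator, body = article.partition(b"\r\n\r\n")
    if not separator or not body:
        return 0
    lines = body.split(b"\r\n")
    if lines[-1] == b"":
        # The body ended with CRLF, so the last piece is not a line.
        lines.pop()
    return len(lines)
```

   Empty lines at the start or end of the body and the signature are
   all counted, as the text requires.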

   This header is to be regarded as obsolete, and it will likely be
   removed entirely in a future version of this standard. In the
   meantime, its use is deprecated.

6.16.  User-Agent

   The User-Agent header contains information about the user agent
   (typically a newsreader) generating the article, for statistical
   purposes and tracing of standards violations to specific software
   needing correction. Although optional, posting agents SHOULD normally
   include this header.

      User-Agent-content  = product-token *( CFWS product-token )
      product-token       = value ["/" product-version]  ; see 4.1
      product-version     = value

   This header MAY contain multiple product-tokens identifying the agent
   and any subproducts which form a significant part of the posting
   agent, listed in order of their significance for identifying the
   application. Product-tokens should be short and to the point - they
   MUST NOT be used for information beyond the canonical name of the
   product and its version.  Injecting agents MAY include product
   information for servers (such as INN/1.7.2), but serving and relaying
   agents MUST NOT generate or modify this header to list themselves.

        NOTE: Variations from [RFC 2616] which describes a similar
        facility for the HTTP protocol:

           1. use of arbitrary text or octets from character sets other
              than US-ASCII in a product-token may require the use of a
              quoted-string,

           2. "{" and "}" are allowed in a value (product-token and
              product-version) in Netnews,

           3. UTF-8 replaces ISO-8859-1 as charset assumption.

        NOTE: Comments should be restricted to information regarding the
        product named to their left such as platform information and
        should be concise. Use as an advertising medium (in the mundane
        sense) is discouraged.

6.16.1.  Examples

      User-Agent: tin/1.2-PL2
      User-Agent: tin/1.3-950621beta-PL0 (Unix)
      User-Agent: tin/unoff-1.3-BETA-970813 (UNIX) (Linux/2.0.30 (i486))
      User-Agent: tin/pre-1.4-971106 (UNIX) (Linux/2.0.30 (i486))
      User-Agent: Mozilla/4.02b7 (X11; I; en; HP-UX B.10.20 9000/712)
      User-Agent: Microsoft-Internet-News/4.70.1161
      User-Agent: Gnus/5.4.64 XEmacs/20.3beta17 ("Bucharest")
      User-Agent: Pluto/1.05h (RISC-OS/3.1) NewsHound/1.30
      User-Agent: inn/1.7.2
      User-Agent: telnet

        NOTE: This header supersedes the role performed redundantly by
        experimental headers such as X-Newsreader, X-Mailer, X-Posting-
        Agent, X-Http-User-Agent, and other headers previously used on
        Usenet for this purpose. Use of these experimental headers
        SHOULD be discontinued in favor of the single, standard User-
        Agent header which can be used freely both in Netnews and mail.
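
   For statistical purposes a User-Agent-content can be reduced to its
   product-tokens roughly as sketched below. The sketch skips nested
   comments but, as a simplifying assumption, does not handle quoted-
   strings or backslash escapes outside comments; the function name is
   invented for this example.

```python
def parse_user_agent(content):
    """Return [(product, version or None), ...], ignoring comments."""
    depth, kept = 0, []
    for ch in content:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
            if depth == 0:
                kept.append(" ")  # a comment acts as white space
        elif depth == 0:
            kept.append(ch)
    products = []
    for token in "".join(kept).split():
        # Only the first "/" separates the product from its version.
        name, sep, version = token.partition("/")
        products.append((name, version if sep else None))
    return products
```

   Applied to the examples above, the tokens come out in the order in
   which they were written, which is the order of their significance.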

6.17.  MIME headers

6.17.1.  Syntax

   The following headers, as defined within [RFC 2045] and its
   extensions, may be used within articles conforming to this standard.

        MIME-Version:
        Content-Type:
        Content-Transfer-Encoding:
        Content-ID:
        Content-Description:
        Content-Disposition:
        Content-MD5:

   Insofar as the syntax for these headers, as given in [RFC 2045], does
   not specify precisely where whitespace and comments may occur
   (whether in the form of WSP, FWS or CFWS), the usage defined in this
   standard, and failing that in [MESSFOR], and failing that in [RFC
   822] MUST be followed. In particular, there MUST NOT be any WSP
   between a header-name and the following colon and there MUST be a SP
   following that colon.

   The meaning of the various MIME headers is as defined in [RFC 2045]
   and [RFC 2046], and in extensions registered in accordance with [RFC
   2048].  However, their usage is curtailed as described in the
   following sections.

6.17.2.  Content-Transfer-Encoding

   Posting agents SHOULD specify "Content-Transfer-Encoding: 8bit" for
   all articles not written in pure US-ASCII and not requiring full
   binary. They MAY use "8bit" encoding even when "7bit" encoding would
   have sufficed. They SHOULD specify "base64" when the content type
   implies binary (i.e. content intended for machine, rather than human,
   consumption).

        NOTE: If a future extension to the MIME standards were to
        provide a more compact encoding of binary suited to transport
        over an 8bit channel, it could be considered as an alternative
        to base64 once it had gained widespread acceptance.

   Posting agents SHOULD NOT specify encoding "quoted-printable", but
   reading agents MUST interpret that encoding correctly.  Encoding
   "binary" MUST NOT be used (except in cooperating subnets with
   alternative transport arrangements) because this standard does not
   mandate a transport mechanism that could support it.

   Injecting and relaying agents MUST NOT change the encoding of
   articles passed to them. Gateways SHOULD ONLY change the encoding if
   absolutely necessary.
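
   A posting agent's choice of encoding under these rules might be
   sketched as follows. Whether the content type "implies binary" is
   left to the caller, since it depends on the Content-Type rather
   than on the octets themselves; the function name and its "is_binary"
   parameter are invented for this example.

```python
def choose_transfer_encoding(payload: bytes, is_binary: bool) -> str:
    """Pick a Content-Transfer-Encoding for a posting agent."""
    if is_binary:
        return "base64"
    if payload.isascii():
        # "8bit" would also be permitted here.
        return "7bit"
    return "8bit"
```

   Neither "quoted-printable" nor "binary" is ever returned, in line
   with the SHOULD NOT and MUST NOT above.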

6.17.3.  Content-Type

   The Content-Type: "text/plain" is the default type for any news
   article, but the recommendations and limits on line lengths set out
   in section 4.5 SHOULD be observed. The acceptability of other
   subtypes of Content-Type: "text" (such as "text/html") is a matter of
   policy (see 1.1), and posters SHOULD NOT use them unless established
   policy or custom in the particular hierarchies or groups involved so
   allows. Moreover, even in those cases, the material SHOULD, for the
   benefit of readers who see it only in its transmitted form, be
   "pretty-printed" so as to keep it within the line lengths recommended
   in section 4.5, and to keep any sequences which control its layout or
   style separate from the meaningful text.


   In the same way, Content-Types requiring special processing for their
   display, such as "application", "image", "audio", "video" and
   "multipart/related" are discouraged except in groups specifically
   intended (by policy or custom) to include them. Exceptionally, those
   application types defined in [RFC 1847] and [RFC 2015] for use within
   "multipart/signed" articles, and the type "application/pgp-keys" (or
   other similar types containing digital certificates) may be used
   freely but, contrary to [RFC 2015] and unless the article is intended
   to be sent by mail also, the Content-Transfer-Encoding SHOULD be left
   as "8bit" (or "7bit" as appropriate).

   Reading agents SHOULD NOT, unless explicitly configured otherwise,
   act automatically on Application types which could change the state
   of that agent (e.g. by writing or modifying files), except in the
   case of those prescribed for use in control messages (7.1.2 and
  ).

6.17.3.1.  Message/partial

   The Content-Type "message/partial" MAY be used to split a long news
   article into several smaller ones, but this usage is discouraged on
   the grounds that modern transport agents should have no difficulty in
   handling articles of arbitrary length.

   However, IF this feature is used, then the "id" parameter SHOULD be
   in the form of a unique message identifier (but different from that
   in the Message-ID header of any of the parts).  Contrary to the
   requirements specified in [RFC 2046], the Transfer-Encoding SHOULD be
   set to "8bit" at least in each part that requires it. The second and
   subsequent parts SHOULD contain References headers referring to all
   the previous parts, thus enabling reading agents with threading
   capabilities to present them in the correct order. Reading agents MAY
   then provide a facility to recombine the parts into a single article
   (but this standard does not require them to do so).

6.17.3.2.  Message/rfc822

   The Content-Type "message/rfc822" should be used for the
   encapsulation (whether as part of another news article or, more
   usually, as part of a mail message) of complete news articles which
   have already been posted to Netnews and which are for the information
   of the recipient, and do not constitute a request to repost them.

   In the case where the encapsulated article has Content-Transfer-
   Encoding "8bit", it will be necessary to change that encoding if it
   is to be forwarded over some mail transport that only supports
   "7bit". However, this should not be necessary for any mail transport
   that supports the 8BITMIME feature [SMTP].  Moreover, where the
   headers of the encapsulated article contain any UTF8-xtra-chars
   (2.4), it may not be possible to transport them over mail transports
   even where 8BITMIME is supported. In such cases, it will be necessary
   to encode those headers as provided in [RFC 2047] (notwithstanding
   that such usage is deprecated for news headers by this standard, and
   actually forbidden in the case of the Newsgroups header).

   In the event that the encapsulated article has to be encoded for
   either of these reasons, it may be necessary to reverse that encoding
   if certain forms of digital signatures have been employed, or if the
   article is to be reintroduced into some Netnews system (however, in
   the latter case, the Content-Type "application/news-transmission"
   should have been used instead).

        NOTE: It is likely, though not guaranteed, that headers
        containing UTF8-xtra-chars will pass safely through mail
        transports supporting 8BITMIME if the "message/rfc822" object is
        sent as an attachment (i.e.  as a part of a multipart) rather
        than as the top-level body of the mail message. Moreover, it is
        anticipated that future extensions to the mail standards will
        permit headers containing UTF8-xtra-chars to be carried without
        further ado over conforming transports.
[In fact, of current transports supporting 8BITMIME, only sendmail will
have problems with UTF-8 in top-level headers.]

6.17.3.3.  Message/external-body

   The Content-Type "message/external-body" could be appropriate for
   texts which it would be uneconomic (in view of the likely readership)
   to distribute to the entire network.

6.17.3.4.  Multipart types

   The Content-Types "multipart/mixed", "multipart/parallel" and
   "multipart/signed" may be used freely in news articles.  However,
   except where policy or custom so allows, the Content-Type:
   "multipart/alternative" SHOULD NOT be used, on account of the extra
   bandwidth consumed and the difficulty of quoting in followups, but
   reading agents MUST accept it.

   The Content-Type: "multipart/digest" is commended for any article
   composed of multiple messages more conveniently viewed as separate
   entities. The "boundary" should be composed of 28 hyphens (US-ASCII
   45) (which makes each boundary delimiter 30 hyphens, or 32 for the
   final one) so as to accord with current practice for digests [RFC
   1153].
[Actually, this conflicts with some present digest usage (such as the
news.answers rules), but should still be the right way to go. I suggest
this is left in for now (just to stake a claim), while we discuss the
matter with the news.answers moderators and the faq-maintainers.]
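
   The hyphen counts quoted for "multipart/digest" follow from the
   MIME rule that each boundary delimiter is "--" followed by the
   boundary, with a further "--" appended to the final one. A small
   check, with an invented function name:

```python
def digest_boundary_lines(hyphens=28):
    """Return (boundary, delimiter, final delimiter) for a digest."""
    boundary = "-" * hyphens
    delimiter = "--" + boundary   # MIME puts "--" before the boundary
    final = delimiter + "--"      # and "--" after it on the last one
    return boundary, delimiter, final
```

   With the default of 28 the three strings are 28, 30 and 32 hyphens
   long respectively.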

6.17.4.  Character Sets

   In principle, any character set may be specified in the "charset="
   parameter of a content type. However, character sets other than "us-
   ascii", "iso-8859-1" (and the corresponding parts of UTF-8) ought
   only to be used in hierarchies where the language customarily used so
   requires (and whose readers could be expected to possess agents
   capable of displaying them).



6.17.5.  Content Disposition

   Reading agents SHOULD honour any Content-Disposition header that is
   provided (in particular, they SHOULD display any part of a multipart
   for which the disposition is "inline", possibly distinguished from
   adjacent parts by some suitable separator). In the absence of such a
   header, the body of an article or any part of a multipart with
   Content-Type "text" SHOULD be displayed inline. Followup agents which
   quote parts of a precursor (see 4.3.2) SHOULD initially include all
   parts of the precursor that were displayed inline, as if they were a
   single part.

6.17.6.  Definition of some new Content-Types

   This standard defines (or redefines) several new Content-Types, which
   require to be registered with IANA as provided for in [RFC 2048].
   For "application/news-groupinfo" see 7.1.2, for "application/news-
   checkgroups" see 7.4.1, and for "application/news-transmission" see
   the following section.

6.17.6.1.  Application/news-transmission

   The Content-Type "application/news-transmission" is intended for the
   encapsulation of complete news articles where the intention is that
   the recipient should then inject them into Netnews. This Application
   type SHOULD be used when mailing articles to moderators and to mail-
   to-news gateways (see 8.2.2).

        NOTE: The benefit of such encapsulation is that it removes
        possible conflict between news and email headers and it provides
        a convenient way of "tunnelling" a news article through a
        transport medium that does not support 8bit characters.

   The MIME content type definition of "application/news-transmission"
   is:

   MIME type name:           application
   MIME subtype name:        news-transmission
   Required parameters:      none
   Optional parameters:      usage=moderate
                             usage=inject
                             usage=relay
   Encoding considerations:  A transfer-encoding (such as Quoted-
                             Printable or Base64) different from that of
                             the article transmitted MAY be supplied
                             (perhaps en route) to ensure correct
                             transmission over some 7bit transport
                             medium.
   Security considerations:  A news article may be a "control message",
                             which could have effects on the recipient
                             host's system beyond just storage of the
                             article. However, such control messages
                             also occur in normal news flow, so most
                             hosts will already be suitably defended
                             against undesired effects.
   Published specification:  [USEFOR]
   Body part:                A complete article or proto-article, ready
                             for injection into Netnews, or a batch of
                             such articles.

        NOTE: It is likely that the recipient of an "application/news-
        transmission" will be a specialised gateway (e.g. a moderator's
        submission address) able to accept articles with only one of the
        three usage parameters "moderate", "inject" and "relay", which
        is why they are optional, being redundant in most situations.
        Nevertheless, they MAY be used to signify the originator's
        intention with regard to the transmission, so removing any
        possible doubt.

   When the parameter "relay" is used, or implied, the body part MAY be
   a batch of articles to be transmitted together, in which case the
   following syntax MUST be used.

      batch             = 1*( batch-header article )
      batch-header      = "#!" SP "rnews" SP article-size CRLF
      article-size      = 1*digit

   where the "rnews" is case-sensitive. Thus a batch is a sequence of
   articles, each prefixed by a header line that includes its size. The
   article-size is a decimal count of the octets in the article,
   counting each CRLF as one octet regardless of how it is actually
   represented.

        NOTE: Despite the similarity of this format to an executable
        UNIX script, it is EXTREMELY unwise to feed such a batch into a
        command interpreter in anticipation of it running a command
        named "rnews"; the security implications of so doing would be
        disastrous.
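
   The batch syntax and the rule for article-size can be sketched as
   below, treating the batch purely as data. The sketch assumes that
   articles are held as octets with CRLF line endings; the function
   names are invented for this example.

```python
def article_size(article: bytes) -> int:
    """Octet count with every CRLF counted as one octet."""
    return len(article) - article.count(b"\r\n")


def build_batch(articles):
    """Prefix each article with its "#! rnews <size>" batch-header."""
    return b"".join(
        b"#! rnews %d\r\n" % article_size(a) + a for a in articles
    )


def split_batch(batch: bytes):
    """Recover the individual articles from a batch."""
    articles, pos = [], 0
    while pos < len(batch):
        eol = batch.find(b"\r\n", pos)
        if eol < 0:
            raise ValueError("unterminated batch-header")
        fields = batch[pos:eol].split(b" ")
        if (len(fields) != 3 or fields[:2] != [b"#!", b"rnews"]
                or not fields[2].isdigit()):
            raise ValueError("bad batch-header")
        remaining, start = int(fields[2]), eol + 2
        pos = start
        while remaining > 0:
            if pos >= len(batch):
                raise ValueError("truncated article")
            # A CRLF pair consumes two octets but counts as one.
            pos += 2 if batch[pos:pos + 2] == b"\r\n" else 1
            remaining -= 1
        articles.append(batch[start:pos])
    return articles
```

   Because the size is taken from the batch-header, a line within an
   article body that happens to begin with "#! rnews" is not mistaken
   for the start of the next article.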

6.17.6.2.  Message/news withdrawn

   The Content-Type "message/news", as previously registered with IANA,
   is hereby obsoleted and should be withdrawn. It was never widely
   implemented, and its default treatment as "application/octet-stream"
   by agents that did not recognise it was counterproductive. The
   Content-Type "message/rfc822" SHOULD be used in its place, as already
   described above.

6.18.  Obsolete Headers

   Persons writing new agents SHOULD ignore any former meanings of the
   following headers:

        Also-Control
        See-Also
        Article-Names
        Article-Updates


7.  Control Messages

   The following sections document the control messages.  "Message" is
   used herein as a synonym for "article" unless context indicates
   otherwise.  Group control messages are the sub-class of control
   messages that request some update to the configuration of the groups
   known to a serving agent, namely "newgroup", "rmgroup", "mvgroup"
   and "checkgroups", plus any others created by extensions to this
   standard.

   All of the group control messages MUST have an Approved header
   (6.12).  Moreover, in those hierarchies where appropriate
   administrative agencies exist (see 1.1), group control messages
   SHOULD NOT be issued except as authorized by those agencies.
[They SHOULD also use one of the authentication mechanisms which we
shall define when we get a Round Tuit.]

   The Newsgroups header of each control message MUST include the
   newsgroup-name(s) for the group(s) affected (i.e. groups to be
   created, modified or removed, or containing articles to be canceled).
   This is to ensure that the message propagates to all sites which
   receive (or would receive) that group(s). It MAY include other
   newsgroup-names so as to improve propagation (but this practice
   should be regarded as exceptional rather than normal).

   The descriptions below are generally phrased in terms suggesting
   mandatory actions, but any or all of these MAY be subject to local
   administrative restrictions, and MAY be denied or referred to an
   administrator for approval (either as a class or on a case-by-case
   basis). Analogously, where the description below specifies that a
   message or portion thereof is to be ignored, this action MAY include
   reporting it to an administrator.

   Relaying Agents MUST propagate even control messages that they do not
   understand.

   In the following sections, each type of control message is defined
   syntactically by defining its verb, its arguments, and possibly its
   body.

7.1.  The 'newgroup' Control Message

      newgroup-verb       = "newgroup"
      newgroup-arguments  = CFWS newsgroup-name [ CFWS newgroup-flag ]
      newgroup-flag       = "moderated"

   The "newgroup" control message requests that the specified group be
   created or changed. The newgroup-flag "moderated" is appended to mark
   the group as moderated. The absence of this flag marks the group as
   unmoderated. "Moderated" is the only such flag defined by this
   standard; other flags MAY be defined for use in cooperating subnets,
   but newgroup messages containing them MUST NOT be acted on outside of
   those subnets.


        NOTE: Specifically, some alternative flags such as "y" and "m",
        which are sent and recognised by some current software, are NOT
        part of this standard.  Moreover, some existing implementations
        treat any flag other than "moderated" as indicating an
        unmoderated newsgroup. Both of these usages are contrary to this
        standard.
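
   The handling of the newgroup-flag described above can be sketched in
   Python as follows. This is an illustration only: the name
   parse_newgroup is hypothetical, the argument is assumed to be the
   content of the Control header, and CFWS is simplified to plain white
   space (comments are not handled).

```python
def parse_newgroup(control):
    """Return (newsgroup_name, moderated), or None if the message carries
    a flag not defined by this standard and so is not to be acted on."""
    words = control.split()
    if len(words) not in (2, 3) or words[0] != "newgroup":
        raise ValueError("not a well-formed newgroup control message")
    name = words[1]
    if len(words) == 2:
        return name, False      # absence of the flag: unmoderated
    if words[2] == "moderated":
        return name, True
    # Any other flag (e.g. "y" or "m") is outside this standard; it is
    # NOT silently treated as meaning "unmoderated".
    return None
```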

   The message body comprises or includes an
   "application/news-groupinfo" (7.1.2) part containing machine- and
   human-readable information about the group.

   The newsgroup-name MUST conform to all requirements set out in
   section 5.5, and it is the responsibility of the newgroup message
   issuer to ensure this (since some of those requirements are hard to
   enforce mechanically). Moreover, the newsgroup-name SHOULD conform to
   whatever policies have been established by the administrative agency,
   if any, for that hierarchy.

   The newgroup command is also used to update the newsgroups-line or
   the moderation status of a group.

7.1.1.  The Body of the 'newgroup' Control Message

   The body of the newgroup message contains the following subparts,
   preferably in the order shown:

   1. An "application/news-groupinfo" part (7.1.2) containing the name
      and newsgroups-line of the group(s). This part MUST be present.

   2. Other parts containing useful information about the background of
      the newgroup message (typically of type "text/plain").

   3. Parts containing initial articles for the newsgroup. See section
      7.1.3 for details.

   In the event that there is only the single (i.e.
   "application/news-groupinfo") subpart present, it will suffice to
   include a "Content-Type: application/news-groupinfo" header amongst
   the headers of the control message.  Otherwise, a "Content-Type:
   multipart/mixed" header will be needed, and each separate part will
   then need its own Content-Type header.
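
   One possible way of locating these subparts, sketched with the
   standard Python "email" package, is shown below. The function name
   extract_newgroup_parts is hypothetical. The sketch looks only at the
   top level of the message, and assumes that the control message is
   available as a single string.

```python
import email


def extract_newgroup_parts(raw):
    """Return (groupinfo_text, [initial_article_text, ...])."""
    msg = email.message_from_string(raw)
    # A lone groupinfo part is carried directly as the message body;
    # otherwise the subparts of the multipart/mixed body are examined.
    parts = msg.get_payload() if msg.is_multipart() else [msg]
    groupinfo = None
    initial_articles = []
    for part in parts:
        ctype = part.get_content_type()
        if ctype == "application/news-groupinfo":
            groupinfo = part.get_payload()
        elif ctype == "application/news-transmission":
            initial_articles.append(part.get_payload())
        # Other parts (e.g. text/plain background) are ignored here.
    if groupinfo is None:
        raise ValueError("application/news-groupinfo part MUST be present")
    return groupinfo, initial_articles
```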

7.1.2.  Application/news-groupinfo

   The "application/news-groupinfo" body part contains brief information
   about a newsgroup, i.e. the group's name, its newsgroup-description
   and the moderation-flag.

        NOTE: The presence of the newsgroups-tag "For your newsgroups
        file:" is intended to make the whole newgroup message compatible
        with current practice as described in [Son-of-1036].

   The MIME content type definition of "application/news-groupinfo" is:

   MIME type name:           application
   MIME subtype name:        news-groupinfo
   Required parameters:      none
   Disposition:              by default, inline
   Encoding considerations:  "7bit" or "8bit" is sufficient and MUST be
                             used to maintain compatibility.
   Security considerations:  this type MUST NOT be used except as part
                             of a control message for the creation or
                             modification of a Netnews newsgroup
   Published specification:  [USEFOR]

   The content of the "application/news-groupinfo" body part is defined
   as:

      groupinfo-body      = [ newsgroups-tag CRLF ]
                               1*( newsgroups-line CRLF )
      newsgroups-tag      = %x46.6F.72 SP %x79.6F.75.72 SP
                               %x6E.65.77.73.67.72.6F.75.70.73 SP
                               %x66.69.6C.65.3A
                               ; case sensitive
                               ; "For your newsgroups file:"
      newsgroups-line     = newsgroup-name
                               [ 1*HTAB newsgroup-description ]
                               [ 1*WSP moderation-flag ]
      newsgroup-description
                          = 1*( [WSP] utext)
      moderation-flag     = %x28.4D.6F.64.65.72.61.74.65.64.29
                               ; case sensitive "(Moderated)"

   The whole groupinfo-body is intended to be interpreted as a text
   written in the UTF-8 character set.

   The "application/news-groupinfo" is used in conjunction with the
   "newgroup" (7.1) and "mvgroup" (7.3) control messages.  The
   newsgroup-name(s) in the newsgroups-line MUST agree with the
   newsgroup-name(s) in the "newgroup" or "mvgroup" control message (and
   thus there cannot be more than a single newsgroups-line except in the
   case of a "mvgroup" control message affecting a whole
   (sub)hierarchy).  The Content-Type "application/news-groupinfo" MUST
   NOT be used except as a part of such control messages.  Although
   optional, the newsgroups-tag SHOULD be included until such time as
   this standard has been widely adopted, to ensure compatibility with
   present practice.

   Moderated newsgroups MUST be marked by appending the case sensitive
   text " (Moderated)" at the end of the newsgroups-line. It is NOT
   recommended that the moderator's email address be included in the
   newsgroup-description, as has sometimes been done.
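
   A minimal Python sketch of a parser for the groupinfo-body follows.
   It is illustrative only: the function names are hypothetical, the
   body is assumed to have been decoded from UTF-8 already, and (as a
   leniency not found in the syntax) empty lines are skipped.

```python
NEWSGROUPS_TAG = "For your newsgroups file:"
MODERATION_FLAG = "(Moderated)"


def parse_newsgroups_line(line):
    """Return (newsgroup_name, description, moderated) for one line."""
    moderated = False
    head = line[:-len(MODERATION_FLAG)]
    # The flag counts only if it ends the line and follows white space.
    if line.endswith(MODERATION_FLAG) and head.endswith((" ", "\t")):
        moderated = True
        line = head.rstrip(" \t")
    # The name is separated from the description by one or more HTABs.
    name, _, description = line.partition("\t")
    return name, description.lstrip("\t"), moderated


def parse_groupinfo(body):
    """Return a list of (name, description, moderated) tuples."""
    lines = [ln for ln in body.splitlines() if ln]
    if lines and lines[0] == NEWSGROUPS_TAG:    # the tag is optional
        lines = lines[1:]
    if not lines:
        raise ValueError("at least one newsgroups-line is required")
    return [parse_newsgroups_line(ln) for ln in lines]
```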

   Although, in accordance with [MESSFOR] and section 4.5 of this
   standard, a newsgroups-line could have a maximum length of 998
   octets, as a matter of policy a far lower limit, expressed in
   characters, SHOULD be set. The current convention is to limit its
   length so that the newsgroup-name, the HTAB(s) (interpreted as
   8-character tabs that take one at least to column 24) and the
   newsgroup-description (excluding any moderation-flag) fit into 79
   characters.  However, this standard does not seek to enforce any such
   rule, and reading agents SHOULD therefore enable a newsgroups-line of
   any length to be displayed, e.g. by wrapping it as required.
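
   The layout convention just described might be checked as in the
   following Python sketch. The names are hypothetical, and two
   assumptions are made: "column 24" is taken to be the tab stop at a
   zero-based offset of 24, and each character is taken to occupy one
   display column. The line passed to fits_convention is expected to
   exclude any moderation-flag.

```python
TAB_WIDTH = 8       # HTABs are interpreted as 8-character tabs
DESC_COLUMN = 24    # assumed zero-based offset where the description starts
LIMIT = 79          # conventional maximum display width


def format_newsgroups_line(name, description):
    """Join name and description with enough HTABs to reach DESC_COLUMN."""
    missing = DESC_COLUMN - len(name)
    tabs = max(1, -(-missing // TAB_WIDTH))   # ceiling division, at least 1
    return name + "\t" * tabs + description


def fits_convention(line):
    """True if the line, with tabs expanded, needs at most LIMIT columns."""
    return len(line.expandtabs(TAB_WIDTH)) <= LIMIT
```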

        NOTE: The newsgroups-line is intended to provide a brief
        description of the newsgroup, written in the UTF-8 character
        set.  Since newsgroup-names are required to be expressed in
        UTF-8 when they appear in headers, and since [NNTP] requires the
        use of UTF-8 when such a description is transmitted by the LIST
        NEWSGROUPS command, it would also be convenient for servers that
        keep a "newsgroups" file to store them in that form, so as to
        avoid unnecessary conversions.

7.1.3.  Initial Articles

   Some subparts of a "newgroup" or "mvgroup" control message MAY
   contain an initial set of articles to be posted to the affected
   newsgroup(s) as soon as it has been created. These parts are
   identified by having the Content-Type "application/news-
   transmission", possibly with the parameter "usage=inject".  The body
   of each such part should be a complete proto-article, ready for
   posting. This feature is intended for the posting of charters,
   initial FAQs and the like to the newly formed group(s).

   The Newsgroups header of the proto-article MUST include the
   newsgroup-name of the newly created group (or one of them, if more
   than one). It MAY include other newsgroup-names. If the proto-article
   includes a Message-ID header, the message identifier in it MUST be
   different from that of any existing article and from that of the
   control message as a whole, though it MAY be derived from it by
   appending "$p=<n>", where <n> is an integer part number (see also
   6.13.2.1), immediately after its id-left-side (i.e.  before the "@").
   Alternatively such a message identifier MAY be derived by the
   injecting agent when the proto-article is posted. The proto-article
   SHOULD include the header "Distribution: local".
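
   The "$p=<n>" derivation can be illustrated by the following Python
   sketch (the function name is hypothetical):

```python
def derive_part_message_id(control_msgid, part_number):
    """Insert "$p=<n>" after the id-left-side, i.e. before the "@"."""
    left, at, right = control_msgid.rpartition("@")
    if not at:
        raise ValueError("message identifier contains no '@'")
    return "%s$p=%d%s%s" % (left, part_number, at, right)
```

   Applied to the message identifier of the control message and the
   part number 1, this yields an identifier that differs from that of
   the control message only in its id-left-side.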

   The proto-article SHOULD be injected at the serving agent that
   processes the control message AFTER the newsgroup(s) in question has
   been created. It MUST NOT be injected if the newsgroup is not, in
   fact, created (for whatever reason). It MUST NOT be submitted to any
   relaying agent for transmission beyond the server(s) upon which the
   newsgroup creation has just been effected (in other words, it is to
   be treated as having a "Distribution: local" header, whether such a
   header is actually present or not).

        NOTE: The "$p=<n>" convention, if applied uniformly, should
        ensure that initial articles relayed beyond the local server in
        contravention of the above prohibition will not propagate in
        competition with similar copies injected at other local servers.



        NOTE: It is not precluded that the proto-article is itself a
        control message or other type of special article, to be
        activated only upon creation of the new newsgroup. However,
        except as might arise from that possibility, any
        "application/news-transmission" within some nested "multipart/*"
        structure within the proto-article is not to be activated.
[Observe the possibility for initial Named articles (whatever they may
turn out to be) here.]

7.1.4.  Example

   A "newgroup" with bilingual charter and policy information:

      From: "example.all Administrator" <admin@example.invalid>
      Newsgroups: example.admin.groups,example.admin.announce
      Date: 27 Feb 1997 12:50:22 +0200
      Subject: cmsg newgroup example.admin.info moderated
      Approved: admin@example.invalid
      Control: newgroup example.admin.info moderated
      Message-ID: <ng-example.admin.info-19970227@example.invalid>
      Content-Type: multipart/mixed; boundary="nxtprt"
      Content-Transfer-Encoding: 8bit

      This is a MIME control message.
      --nxtprt
      Content-Type: application/news-groupinfo

      For your newsgroups file:
      example.admin.info      About the example.* groups (Moderated)

      --nxtprt
      Content-Type: application/news-transmission

      Newsgroups: example.admin.info
      From: "example.all Administrator" <admin@example.invalid>
      Subject: Charter for example.admin.info
      Message-ID: <ng-example.admin.info-19970227$p=1@example.invalid>
      Distribution: local
      Content-Type: multipart/alternative ;
         differences = content-language ;
         boundary = nxtlang

      --nxtlang
      Content-Type: text/plain; charset=us-ascii
      Content-Transfer-Encoding: 7bit
      Content-Language: en

      The group example.admin.info contains regularly posted
      information on the example.* hierarchy.

      --nxtlang
      Content-Type: text/plain; charset=iso-8859-1
      Content-Transfer-Encoding: 8bit
      Content-Language: de

      Die Gruppe example.admin.info enthaelt regelmaessig versandte
      Informationen ueber die example.*-Hierarchie.
      --nxtlang--
      --nxtprt--

7.2.  The 'rmgroup' Control Message

      rmgroup-verb        = "rmgroup"
      rmgroup-arguments   = CFWS newsgroup-name

   The "rmgroup" control message requests that the specified group be
   removed from the list of valid groups. The Content-Type of the body
   is unspecified; it MAY contain anything, usually an explanatory text.

        NOTE: It is entirely proper for a serving agent to retain the
        group until all the articles in it have expired, provided that
        it ceases to accept new articles.

7.2.1.  Example

   Plain "rmgroup":

      From: "example.all Administrator" <admin@example.invalid>
      Newsgroups: example.admin.groups, example.admin.announce
      Date: 4 Jul 1997 22:04 -0800 (PST)
      Subject: cmsg rmgroup example.admin.obsolete
      Message-ID: <rm-example.admin.obsolete-19970730@example.invalid>
      Approved: admin@example.invalid
      Control: rmgroup example.admin.obsolete

      The group example.admin.obsolete is obsolete. Please remove it
      from your system.

7.3.  The 'mvgroup' Control Message

      mvgroup-verb      = "mvgroup"
      mvgroup-arguments = CFWS ( mvgrp-groups / mvgrp-hrchy )
      mvgrp-groups      = newsgroup-name
                             CFWS newsgroup-name [ CFWS newgroup-flag ]
      mvgrp-hrchy       = groupnamepart ".*" CFWS groupnamepart ".*"
      groupnamepart     = newsgroup-name    ; syntactically
[The possibility remains of introducing an "unmoderated" version of the
newgroup-flag. As it stands now, a moderated group might inadvertently
become unmoderated as it was moved (if the issuer of the mvgroup was not
paying attention). But the same thing could as easily happen if
"unmoderated" was allowed, but not used, unless the flag was declared
not optional in a mvgroup.]

7.3.1.  Single group

   The "mvgroup" control message requests that the first specified group
   be moved to the second specified group. The message body MUST contain
   an "application/news-groupinfo" part (7.1.2) containing machine- and
   human-readable information about the new group, and possibly other
   subparts as for a newgroup control message.

   When this message is received, the new group SHOULD be created (and
   MUST be moderated if a newgroup-flag "moderated" is present) and all
   existing articles SHOULD be copied or moved to the new group; then
   the old, now empty group SHOULD be removed.

   If the old group does not exist, the message is ignored unless the
   new group does not exist either, in which case the message SHOULD be
   treated as if it had been an equivalent "newgroup" message.

   If both groups exist, the groups MAY be "merged". If this is done, it
   MUST be done correctly, i.e. implementations MUST take care that the
   messages in the group being deleted are renumbered accordingly to
   avoid overwriting articles in one group with those of the other, and
   that crossposted articles do not appear twice. Otherwise, the old
   group is just removed.
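
   The cases described above can be summarised by the following Python
   sketch. It is illustrative only; the function name, the returned
   labels and the can_merge parameter (standing for whether the
   implementation supports merging at all) are hypothetical.

```python
def mvgroup_action(old_exists, new_exists, can_merge=False):
    """Choose how a serving agent treats a single-group mvgroup."""
    if old_exists and not new_exists:
        return "move"          # create new, copy or move articles, remove old
    if not old_exists and not new_exists:
        return "newgroup"      # treat as an equivalent "newgroup" message
    if not old_exists:
        return "ignore"        # only the new group exists
    # Both groups exist: merge if supported, else just remove the old one.
    return "merge" if can_merge else "remove-old"
```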

        NOTE: Due to the severe difficulties of implementing this
        merging, those proposing to merge existing groups using this
        control message should be aware that it may not be implemented
        on many (if not most) sites, and should therefore be prepared
        for such disruption as may ensue.

   An indication that the old group was replaced by the new group MAY be
   retained by the serving agent so that continuity of service may be
   maintained, and clients made aware of the new arrangements.

        NOTE: Some serving agents that use an "active" file permit an
        entry of the form "oldgroup xxx yyy =newgroup", which enables
        any articles arriving for oldgroup to be diverted to newgroup,
        and could even enable users already subscribed to oldgroup to
        receive articles from newgroup instead.

   In all cases, the information conveyed in the "application/news-
   groupinfo" body part is applied to the new group.

   Until most serving agents conform to this standard, whenever a
   mvgroup control message for a single group is issued, a corresponding
   pair of rmgroup and newgroup control messages SHOULD be issued a few
   days later.

7.3.2.  Multiple Groups

   If both names end with the character sequence ".*", the mvgroup
   message requests that a whole (sub)hierarchy be moved.  The same
   procedure as for single groups (7.3.1) applies to each matched group,
   except that the moderation status of each old group MUST be copied to
   the corresponding new group.


   To avoid recursion, the new groups' names MUST NEVER match the old
   groups' name pattern; i.e., moving a whole (sub)hierarchy to a
   subhierarchy of the original hierarchy is explicitly disallowed.
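
   The following Python sketch illustrates how the mvgroup-arguments
   might be parsed and how the rule against recursion might be
   enforced. It is not normative: the function names are hypothetical
   and CFWS is simplified to plain white space.

```python
def parse_mvgroup(control):
    """Return (kind, old, new, flag); kind is "group" or "hierarchy"."""
    words = control.split()
    if not words or words[0] != "mvgroup":
        raise ValueError("not a mvgroup control message")
    args = words[1:]
    starred = [a.endswith(".*") for a in args[:2]]
    if len(args) == 2 and all(starred):
        old, new = args[0][:-2], args[1][:-2]
        # The new names MUST NEVER match the old groups' name pattern.
        if new == old or new.startswith(old + "."):
            raise ValueError("new hierarchy lies within the old one")
        return "hierarchy", old, new, None
    if len(args) in (2, 3) and not any(starred):
        flag = args[2] if len(args) == 3 else None
        return "group", args[0], args[1], flag
    raise ValueError("malformed mvgroup arguments")


def rename_group(group, old, new):
    """Map one group of the old (sub)hierarchy to its new name, or None."""
    if group.startswith(old + "."):
        return new + group[len(old):]
    return None
```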

   Until most serving agents conform to this standard, whenever a
   mvgroup control message for multiple groups is issued, a
   corresponding set of rmgroup and newgroup control messages for all
   the affected groups SHOULD be issued a few days later.

7.3.3.  Examples

   Plain "mvgroup":

      From: "example.all Administrator" <admin@example.invalid>
      Newsgroups: example.admin.groups, example.admin.announce
      Date: 30 Jul 1997 22:04 -0500 (EST)
      Subject: cmsg mvgroup example.oldgroup example.newgroup moderated
      Message-ID: <mvgroup-example.oldgroup-19970730@example.invalid>
      Approved: admin@example.invalid
      Control: mvgroup example.oldgroup example.newgroup moderated
      Content-Type: multipart/mixed; boundary=nxt

      --nxt
      Content-Type: application/news-groupinfo

      For your newsgroups file:
      example.newgroup        The new replacement group (Moderated)

      --nxt

      The moderated group example.oldgroup is replaced by
      example.newgroup. Please update your configuration.
      --nxt--

   More complex "mvgroup" for a whole hierarchy:

   The charter of the group example.talk.jokes contained a reference to
   example.talk.jokes.d, which is also being moved. So the charter is
   updated.

      From: "example.all Administrator" <admin@example.invalid>
      Newsgroups: example.admin.groups, example.admin.announce
      Date: 30 Jul 1997 22:04 -0500 (EST)
      Subject: cmsg mvgroup example.talk.* example.conversation.*
      Message-ID: <mvgroup-example.talk-19970730@example.invalid>
      Approved: admin@example.invalid
      Control: mvgroup example.talk.* example.conversation.*
      Content-Type: multipart/mixed; boundary=nxt

      --nxt
      Content-Type: application/news-groupinfo

      For your newsgroups file:
      example.conversation.boring     Boring conversations
      example.conversation.better     Better conversations
      example.conversation.jokes      Funny stuff
      example.conversation.jokes.d    Discussion of funny stuff

      --nxt
      Content-Type: application/news-transmission

      Newsgroups: example.conversation.jokes
      From: "example.all Administrator" <admin@example.invalid>
      Subject: Charter for renamed group example.conversation.jokes
      Distribution: local
      Message-ID: <mvgroup-example.talk-19970730$p=1@example.invalid>

      This group is to publish jokes and other funny stuff.
      Discussions about the articles posted here should be redirected
      to example.conversation.jokes.d; adding a Followup-To: header
      is recommended.

      --nxt--

7.4.  The 'checkgroups' Control Message

   The "checkgroups" control message contains a list of all the valid
   groups in a complete hierarchy.

      checkgroup-verb     = "checkgroups"
      checkgroup-arguments= [ chkscope ] [ chksernr ]
      chkscope            = 1*( CFWS ["!"] newsgroup-name )
      chksernr            = CFWS "#" 1*DIGIT

   The chkscope parameter(s) specifies the (sub)hierarchy(s) to which
   this "checkgroups" message applies. The chksernr parameter is a
   serial number, which can be any positive integer (e.g. a simple
   sequence number or the date in the form YYYYMMDD).  It SHOULD
   increase by an arbitrary value with every change to the group list
   and MUST NOT ever decrease.

        NOTE: This was added to circumvent security problems in
        situations where the Date header cannot be authenticated.

   Example:

      Control: checkgroups de !de.alt #248

        NOTE: Some existing software does not support the "chkscope"
        parameter.  Thus a "checkgroups" message SHOULD also contain the
        groups of other subhierarchies the sender is not responsible
        for. "New" software MUST ignore groups which do not fall into
        the scope of the "checkgroups" message.
[What systems, if any, currently support the chkscope parameter?]

   If no scope for the checkgroups message is given, it applies to all
   hierarchies for which group statements appear in the message.
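
   A Python sketch of how the checkgroup-arguments might be parsed and
   applied is given below. It is illustrative only. The function names
   are hypothetical, CFWS is simplified to plain white space, and the
   rule that the longest matching chkscope entry decides whether a
   group is in scope is an assumption of the sketch, not something this
   standard specifies.

```python
def parse_checkgroups(control):
    """Return (scope, serial): scope is a list of (include, name) pairs
    and serial is an int, or None when no chksernr is present."""
    words = control.split()
    if not words or words[0] != "checkgroups":
        raise ValueError("not a checkgroups control message")
    args = words[1:]
    serial = None
    if args and args[-1].startswith("#"):
        digits = args.pop()[1:]
        if not (digits.isascii() and digits.isdigit()):
            raise ValueError("malformed chksernr")
        serial = int(digits)
    scope = [(False, a[1:]) if a.startswith("!") else (True, a)
             for a in args]
    return scope, serial


def in_scope(group, scope):
    """Decide whether a group falls within the given chkscope."""
    best = None
    for include, name in scope:
        if group == name or group.startswith(name + "."):
            if best is None or len(name) > len(best[1]):
                best = (include, name)
    return best is not None and best[0]


def serial_acceptable(serial, last_seen):
    """A serial number lower than one already seen is not acceptable."""
    return last_seen is None or serial >= last_seen
```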



   The body of the message has the Content-Type "application/news-
   checkgroups".  It asserts that the newsgroups it lists are the only
   newsgroups in the specified hierarchies.

        NOTE: The checkgroups message is intended to synchronize the
        list of newsgroups stored by a serving agent, and their
        newsgroup-descriptions, with the lists stored by other serving
        agents throughout the network. However, it might be inadvisable
        for the serving agent actually to create or delete any
        newsgroups without first obtaining the approval of its
        administrators for such proposed actions.

7.4.1.  Application/news-checkgroups

   The "application/news-checkgroups" body part contains a complete list
   of all the newsgroups in a hierarchy, their newsgroup-descriptions
   and their moderation status.

   The MIME content type definition of "application/news-checkgroups"
   is:

   MIME type name:           application
   MIME subtype name:        news-checkgroups
   Required parameters:      none
   Disposition:              by default, inline
   Encoding considerations:  "7bit" or "8bit" is sufficient and MUST be
                             used to maintain compatibility.
   Security considerations:  this type MUST NOT be used except as part
                             of a checkgroups control message

   The content of the "application/news-checkgroups" body part is
   defined as:

      checkgroups-body    = *( valid-group CRLF )
      valid-group         = newsgroups-line ; see 7.1.2

   The whole checkgroups-body is intended to be interpreted as a text
   written in the UTF-8 character set.

   The "application/news-checkgroups" content type is used in
   conjunction with the "checkgroups" control message (7.4).
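
   How a serving agent might compare a checkgroups-body with its own
   list of groups is sketched below in Python. The function name is
   hypothetical. When no scope predicate is supplied, the sketch
   assumes that "hierarchy" means the first component of each listed
   newsgroup-name. The result is only a proposal, which would normally
   be referred to an administrator rather than acted upon directly.

```python
def plan_changes(local_groups, listed_groups, in_scope=None):
    """Return (to_create, to_remove) as two sorted lists of group names."""
    listed = set(listed_groups)
    if in_scope is None:
        # No chkscope: the message applies to every hierarchy for which
        # a group appears in the body.
        hierarchies = {g.split(".")[0] for g in listed}

        def in_scope(group):
            return group.split(".")[0] in hierarchies

    listed = {g for g in listed if in_scope(g)}   # ignore out-of-scope
    local = {g for g in local_groups if in_scope(g)}
    return sorted(listed - local), sorted(local - listed)
```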

        NOTE: The possibility of removing a complete hierarchy by means
        of an "invalidation" line beginning with a '!' is no longer
        provided by this standard. The intent of the feature was widely
        misunderstood and it was misused more often than it was used
        correctly. The same effect, if required, can now be obtained by
        the use of an appropriate chkscope argument in conjunction with
        an empty checkgroups-body.

7.5.  Cancel

   The cancel message requests that a target article be "canceled",
   i.e. be withdrawn from circulation or access. A cancel message may be
   issued in the following circumstances.

   1. The poster of an article (or, more specifically, any entity
      mentioned in the From header or the Sender header, whether or not
      that entity was the actual poster) is ALWAYS entitled to issue a
      cancel message for that article, and serving agents SHOULD honour
      such requests. Posting agents SHOULD facilitate the issuing of
      cancel messages by posters fulfilling these criteria.

   2. The agent which injected the article onto the network (more
      specifically, the entity identified by the path-identity in front
      of the leftmost '%' delimiter in the Path header) and, where
      appropriate, the moderator (more specifically, any entity
      mentioned in the Approved header) are ALWAYS entitled to issue a
      cancel message for that article, and serving agents SHOULD honour
      such requests.

   3. Other entities MAY be entitled to issue a cancel message for that
      article, in circumstances where established policy for any
      hierarchy or group in the Newsgroups header, or established custom
      within Usenet, so allows (such policies and customs are not
      defined by this standard). Such cancel messages MUST include an
      Approved header identifying the responsible entity. Serving agents
      MAY honour such requests, but SHOULD first take steps to verify
      their appropriateness.
[I think that accords with the accepted norms for 1st, 2nd and 3rd party
cancels (or is a moderator a 1st party?). Observe the use of an Approved
header in place of the present X-Cancelled-By (I cannot see that we need
a new header for that when Approved is available). The definitions given
are sufficient to establish which category a cancel was in, assuming
that nobody told any lies, and to establish who had committed abuse
otherwise. So far so good, but we now need authentication methods on top
of all that.]

[A future draft of this standard will contain provisions for a Cancel-
Lock header to enable verification of the authenticity of 1st (and even
2nd) party cancels, and means for digital signatures to establish the
authenticity of 3rd party cancels.]

[A future draft of this standard may also contain provision for a "block
cancel" message, with a list of messages to be canceled contained in its
body rather than in the headers. Whether this needs to have a Control
header at all, and whether the existing "nocem-on-spool" is adequate for
this purpose, and indeed whether NOCEM as such should be part of this,
or some other, standard are issues that are yet to be addressed.]

      cancel-verb         = "cancel"
      cancel-arguments    = CFWS message-id

   The argument identifies the article to be canceled by its message
   identifier.  The body SHOULD contain an indication of why the
   cancellation was requested. The cancel message SHOULD be posted to
   the same newsgroup(s), with the same distribution(s), as the article
   it is attempting to cancel.



   A serving agent that elects to honour a cancel message SHOULD delete
   the target article completely and immediately (or at the minimum make
   the article unavailable for relaying or serving) and also SHOULD
   reject any copies of this article that appear subsequently. See also
   sections 8.3 and 8.4.
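
   The behaviour of a serving agent that elects to honour a cancel
   message can be sketched in Python as follows. The class and method
   names are hypothetical, the store is an in-memory stand-in for a
   real article spool, and the checks that decide whether a cancel
   ought to be honoured at all are outside the scope of the sketch.

```python
class Spool:
    """Toy article store that remembers honoured cancels."""

    def __init__(self):
        self.articles = {}      # message identifier -> article text
        self.canceled = set()   # targets of honoured cancel messages

    def offer(self, msgid, article):
        """Accept an article unless it is a duplicate or was canceled."""
        if msgid in self.canceled or msgid in self.articles:
            return False
        self.articles[msgid] = article
        return True

    def honour_cancel(self, control):
        """Delete the target and reject any copies that appear later."""
        words = control.split()
        if len(words) != 2 or words[0] != "cancel":
            raise ValueError("malformed cancel control message")
        target = words[1]
        self.articles.pop(target, None)   # target may not have arrived yet
        self.canceled.add(target)
        return target
```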

        NOTE: The former requirement [RFC 1036] that the From and/or
        Sender headers of the cancel message should match those of the
        original article has been removed from this standard, since it
        only encouraged cancel issuers to conceal their true identity,
        and it was not usually checked or enforced by canceling
        software.  Therefore, both the From and/or Sender headers and
        any Approved header should now relate to the entity responsible
        for issuing the cancel message.

7.6.  Ihave, sendme

   The "ihave" and "sendme" control messages implement a crude batched
   predecessor of the NNTP [NNTP] protocol. They are largely obsolete on
   the Internet, but still see use in conjunction with some transport
   protocols such as UUCP, especially for backup feeds that normally are
   active only when a primary feed path has failed. There is no
   requirement for relaying agents that do not support such transport
   protocols to implement them.

        NOTE: The ihave and sendme messages defined here have ABSOLUTELY
        NOTHING TO DO WITH NNTP, despite similarities of terminology.

   The two messages share the same syntax:

      ihave-arguments     = *( msg-id SP ) relayer-name
      sendme-arguments    = ihave-arguments
      relayer-name        = path-identity  ; see 5.6.1
      ihave-body          = *( msg-id CRLF )
      sendme-body         = ihave-body

   Msg-ids MUST appear in either the arguments or the body, but NOT
   both. Relayers SHOULD generate the form putting msg-ids in the body,
   but the other form MUST be supported for backward compatibility.
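
   Both forms can be accepted by a parser along the lines of the
   following Python sketch. The function name is hypothetical; the
   first argument is assumed to be the content of the Control header
   and the second the body of the message.

```python
def parse_ihave_or_sendme(control, body):
    """Return (verb, relayer_name, [msg_id, ...])."""
    words = control.split()
    if len(words) < 2 or words[0] not in ("ihave", "sendme"):
        raise ValueError("not an ihave or sendme control message")
    # The last argument is the relayer-name; any before it are msg-ids.
    verb, *argument_ids, relayer_name = words
    body_ids = body.split()
    if argument_ids and body_ids:
        raise ValueError("msg-ids in both the arguments and the body")
    return verb, relayer_name, argument_ids or body_ids
```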

   The ihave message states that the named relaying agent has received
   articles with the specified message identifiers, which may be of
   interest to the relaying agents receiving the ihave message.  The
   sendme message requests that the agent receiving it send the articles
   having the specified message identifiers to the named relaying agent.

   These control messages are normally sent essentially as
   point-to-point messages, by using newsgroup-names in the Newsgroups
   header of the form "to." followed by one or more components in the
   form of a relayer-name (see section 5.5.1 which forbids "to" as the
   first component of a newsgroup-name). The control message SHOULD then
   be delivered ONLY to the relaying agent(s) identified by that
   relayer-name, and any relaying agent receiving such a message which
   includes its own relayer-name MUST NOT propagate it further. Each
   pair of relaying agents sending and receiving these messages MUST be
   immediate neighbors, exchanging news directly with each other. Each
   relaying agent advertises its new arrivals to the other using ihave
   messages, and each uses sendme messages to request the articles it
   lacks.

   To reduce overhead, ihave and sendme messages SHOULD be sent
   relatively infrequently and SHOULD contain reasonable numbers of
   message IDs. If ihave and sendme are being used to implement a backup
   feed, it may be desirable to insert a delay between reception of an
   ihave and generation of a sendme, so that a slightly slow primary
   feed will not cause large numbers of articles to be requested
   unnecessarily via sendme.
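
   One way to realise the delay suggested above is sketched below, as
   an illustration only. The class BackupFeed, its method names and the
   use of a caller-supplied clock value are all assumptions of this
   example and are not defined by this standard.

```python
class BackupFeed:
    """Hold msg-ids announced by ihave until a delay has elapsed."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.have = set()      # msg-ids already received from any feed
        self.pending = {}      # msg-id -> earliest time to request it

    def on_ihave(self, msg_ids, now):
        # Remember when each unknown article may first be requested.
        for msg_id in msg_ids:
            if msg_id not in self.have:
                self.pending.setdefault(msg_id, now + self.delay)

    def on_article(self, msg_id):
        # The primary feed (or anyone else) delivered the article.
        self.have.add(msg_id)
        self.pending.pop(msg_id, None)

    def due_for_sendme(self, now):
        # Msg-ids whose delay has expired and which are still missing.
        due = sorted(m for m, t in self.pending.items() if t <= now)
        for msg_id in due:
            del self.pending[msg_id]
        return due
```

   The list returned by due_for_sendme would be batched into a single
   sendme message, which also keeps the number of messages low.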

7.7.  Obsolete control messages.

   The following control message verbs are declared obsolete by this
   standard:

        sendsys
        version
        whogets
        senduuname
[There have been requests for some of these to be reinstated, but there
is no consensus, and in any case we would need authentication first.]

8.  Duties of Various Agents

   The following section sets out the duties of various agents involved
   in the creation, relaying and serving of Usenet articles.

   In this section, the word "trusted", as applied to the source of some
   article, means that an agent processing that article has verified, by
   some means, the identity of that source (which may be another agent
   or a poster).

        NOTE: In many implementations, a single agent may perform
        various combinations of the injecting, relaying and serving
        functions. Its duties are then the union of the various duties
        concerned.

8.1.  General principles to be followed

   There are two important principles that news implementors (and
   administrators) need to keep in mind. The first is the well-known
   Internet Robustness Principle:

        Be liberal in what you accept, and conservative in what you
        send.

   However, in the case of news there is an even more important
   principle, derived from a much older code of practice, the
   Hippocratic Oath (we will thus call this the Hippocratic Principle):

        First, do no harm.

   It is VITAL to realize that decisions which might be merely
   suboptimal in a smaller context can become devastating mistakes when
   amplified by the actions of thousands of hosts within a few minutes.

   In the case of gateways, the primary corollary to this is:

        Cause no loops.

8.2.  Duties of an Injecting Agent

   An Injecting Agent is responsible for taking a proto-article from a
   posting agent and either forwarding it to a moderator or injecting it
   into the relaying system for access by readers.

   As such, an injecting agent is considered responsible for ensuring
   that any article it injects conforms with the rules of this standard
   and the policies of any newsgroups or hierarchies that the article is
   posted to. It is also expected to bear some responsibility towards
   the rest of the network for the behaviour of its posters (and
   provision is therefore made for it to be easily contactable by
   email).

   To this end injecting agents MAY cancel articles which they have
   previously injected (see 7.5).

8.2.1.  Proto-articles

   A proto-article is one that has been created by a posting agent and
   has not yet been injected into the news system by an injecting agent.
   It SHOULD NOT be propagated in that form to other than injecting
   agents.  A proto-article has the same format as a normal article
   except that some of the following mandatory headers MAY be omitted:
   Message-ID, Date and Path. These headers MUST NOT contain invalid
   values; they MUST either be correct or not present at all.

   A proto-article SHOULD NOT contain the '%' delimiter in any Path
   header, except in the rare cases where an article gets injected
   twice. It MAY contain path-identities with other delimiters in the
   pre-injection portion of the Path header (5.6.3).
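
   The header rules for proto-articles can be sketched as follows. This
   is an illustration only: the list of mandatory headers is assumed
   from section 5 and should be checked against it, and the function
   names are hypothetical.

```python
# Mandatory headers as assumed from section 5 (check that section).
MANDATORY = ("From", "Date", "Newsgroups", "Subject", "Message-ID", "Path")
# Headers that 8.2.1 allows a proto-article to omit.
OMISSIBLE = ("Message-ID", "Date", "Path")


def missing_proto_headers(headers):
    """List mandatory headers that are absent and may not be omitted."""
    present = {name.lower() for name in headers}
    return [name for name in MANDATORY
            if name not in OMISSIBLE and name.lower() not in present]


def already_injected(headers):
    """True if the Path header already carries a '%' delimiter."""
    for name, value in headers.items():
        if name.lower() == "path":
            return "%" in value
    return False
```

   A posting agent could run both checks before handing an article to
   an injecting agent; neither check validates header contents.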

8.2.2.  Procedure to be followed by Injecting Agents

   An injecting agent receives proto-articles from posting and followup
   agents. It verifies them, adds headers where required and then either
   forwards them to a moderator or injects them by passing them to
   serving or relaying agents.

   If an injecting agent receives an otherwise valid article that has
   already been injected it SHOULD either act as if it is a relaying
   agent or else pass the article on to a relaying agent completely
   unaltered. Exceptionally, it MAY reinject the article, perhaps as a
   part of some complex gatewaying process (in which case it will add a
   second '%' delimiter to the Path header).  It MUST NOT forward an
   already injected article to a moderator.

   An injecting agent processes articles as follows:

   1. It SHOULD verify that the article is from a trusted source.
      However, it MAY allow articles in which headers contain "forged"
      email addresses, that is, addresses which are not valid for the
      known and trusted source, especially if they end in ".invalid".

   2. It MUST reject any article whose Date header is more than 24 hours
      into the past or into the future (cf. 5.1).

   3. It MUST reject any article that does not have the correct
      mandatory headers for a proto-article (5 and 8.2.1) present, or
      which contains any header that does not have legal contents.

   4. If the article is rejected, or is otherwise incorrectly formatted
      or unacceptable due to site policy, the posting agent MUST be
      informed (such as via an NNTP 44x response code) that posting has
      failed and the article MUST NOT then be processed further.

   5. The Message-ID and Date headers (and their content) MUST be added
      when not already present.

   6. A Path header with a tail-entry (5.6.3) MUST be correctly added if
      not already present (except that it SHOULD NOT be added if the
      article is to be forwarded to a moderator).

   7. The path-identity of the injecting agent with a '%' delimiter
      (5.6.2) MUST be prepended to the Path header; moreover, that
      path-identity MUST be an FQDN mailable address (5.6.2).
[At this point, we should mention the Injector-Info header if/when we
invent it.]

   8. The injecting agent MAY add other headers not already provided by
      the poster, but SHOULD NOT alter, delete or reorder any headers
      already present in the article (except for headers intended for
      tracing purposes). The injecting agent MUST NOT alter the body of
      the article in any way.
[An Injector-Info header would be mentioned as an example of tracing.]

   9. If the Newsgroups line contains one or more moderated groups and
      the article does NOT contain an Approved header, then the
      injecting agent MUST forward it to the moderator of the first
      (leftmost) moderated group listed in the Newsgroups line via
      email. The complete article SHOULD be encapsulated (headers and
      all) within the email, preferably using the Content-Type
      "application/news-transmission" (6.17.6.1).

   10.Otherwise, the injecting agent forwards the article to one or more
      relaying or serving agents.
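
   The numbered steps above can be sketched roughly as follows. This is
   a minimal illustration under stated assumptions, not a reference
   implementation: headers are held in a plain dict with exact-case
   names, the set of required headers is assumed from section 5, the
   tail-entry value "not-for-mail" is a placeholder, and source
   verification (step 1) and content checks are left out. The names
   inject and Rejected are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, make_msgid, parsedate_to_datetime

MAX_SKEW = timedelta(hours=24)                   # limit from step 2
REQUIRED = ("From", "Newsgroups", "Subject")     # assumed from section 5
TAIL_ENTRY = "not-for-mail"                      # placeholder tail-entry


class Rejected(Exception):
    """Posting failed; the posting agent must be told (step 4)."""


def inject(headers, injector_fqdn, moderated_groups, now=None):
    """Return (action, group, headers) for one proto-article."""
    now = now or datetime.now(timezone.utc)
    out = dict(headers)

    # Step 3: headers that a proto-article may not omit.
    for name in REQUIRED:
        if name not in out:
            raise Rejected("missing %s header" % name)

    # Step 2: reject a Date more than 24 hours in the past or future.
    if "Date" in out:
        try:
            posted = parsedate_to_datetime(out["Date"])
        except (TypeError, ValueError):
            raise Rejected("unparseable Date header")
        if posted.tzinfo is None:      # no usable zone given: assume UTC
            posted = posted.replace(tzinfo=timezone.utc)
        if abs(now - posted) > MAX_SKEW:
            raise Rejected("Date header out of range")

    # Step 5: supply Message-ID and Date when absent.
    if "Message-ID" not in out:
        out["Message-ID"] = make_msgid(domain=injector_fqdn)
    if "Date" not in out:
        out["Date"] = format_datetime(now)

    # Step 9: an unapproved article goes to the moderator of the first
    # (leftmost) moderated group; no Path is added then (step 6).
    if "Approved" not in out:
        for group in (g.strip() for g in out["Newsgroups"].split(",")):
            if group in moderated_groups:
                return ("moderator", group, out)

    # Steps 6, 7 and 10: ensure a tail-entry, prepend our path-identity
    # with the '%' delimiter, and hand the article on for relaying.
    out["Path"] = "%s%%%s" % (injector_fqdn, out.get("Path", TAIL_ENTRY))
    return ("relay", None, out)
```

   Raising an exception for every rejection keeps step 4 in one place:
   the caller catches Rejected and reports the failure to the posting
   agent, for instance with an NNTP 44x response.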



8.3.  Duties of a Relaying Agent

   A Relaying Agent accepts injected articles from injecting and other
   relaying agents and passes them on to relaying or serving agents
   according to mutually agreed policy. Relaying agents SHOULD accept
   articles ONLY from trusted agents.

   A relaying agent processes articles as follows:

   1. It MUST verify the leftmost entry in the Path header and then
      prepend its own path-identity with a '/' delimiter, and possibly
      also the verified path-identity of its source with a '?' delimiter
      (5.6.2).

   2. It MUST reject any article whose Date header is stale (see 5.1).

   3. It MUST reject any article that does not have the correct
      mandatory headers (section 5) present with legal contents.

   4. It SHOULD reject any article whose optional headers (section 6) do
      not have legal contents.

   5. It SHOULD reject any article that has already been sent to it (a
      database of message identifiers of recent messages is usually kept
      and matched against).

   6. It SHOULD reject any article that has already been Canceled,
      Superseded or Replaced by its author or by another trusted entity.

   7. It MAY reject any article without an Approved header posted to
      newsgroups known to be moderated (this practice is strongly
      recommended, but the information necessary to do it may not be
      available to all agents).

   8. It then passes articles which match mutually agreed criteria on to
      neighboring relaying and serving agents. However, it SHOULD NOT
      forward articles to sites whose path-identity is already in the
      Path header.

        NOTE: It is usual for relaying and serving agents to restrict
        the Newsgroups, Distributions, age and size of articles that
        they wish to receive.
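
   As an illustration of steps 1, 5 and 8 only, a relaying agent might
   be sketched as below. The class Relayer and its fields are
   hypothetical, the set of Path delimiters is assumed from 5.6, and
   verification of the leftmost Path entry is omitted.

```python
import re

# Characters assumed to separate path-identities in a Path header.
PATH_DELIMITERS = re.compile(r"[/?%,!]")


class Relayer:
    def __init__(self, path_identity, peers):
        self.path_identity = path_identity
        self.peers = list(peers)    # path-identities of neighbours
        self.history = set()        # message identifiers already seen

    def relay(self, headers):
        """Return (headers, targets), or None for a duplicate."""
        msg_id = headers["Message-ID"]
        if msg_id in self.history:          # step 5: already received
            return None
        self.history.add(msg_id)
        out = dict(headers)
        # Step 1: prepend our own path-identity with a '/' delimiter.
        out["Path"] = "%s/%s" % (self.path_identity, headers["Path"])
        # Step 8: skip neighbours that already appear in the Path.
        seen = set(PATH_DELIMITERS.split(out["Path"]))
        targets = [peer for peer in self.peers if peer not in seen]
        return out, targets
```

   Skipping neighbours already named in the Path is what keeps an
   article from being offered back to a site it has passed through.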

   If the article is rejected as being invalid, unwanted or unacceptable
   due to site policy, the agent that passed the article to the relaying
   agent SHOULD be informed (such as via an NNTP 43x response code) that
   relaying failed. In order to prevent a large number of error messages
   being sent to one location, relaying agents MUST NOT inform any other
   external entity that an article was not relayed UNLESS that external
   entity has explicitly requested that it be informed of such errors.

   In order to prevent overloading, relaying agents SHOULD NOT routinely
   query an external entity (such as a key-server) in order to verify an
   article.

[But do we want to say that it is OK if you then keep a decent-sized
cache?]

   Relaying agents MUST NOT alter, delete or rearrange any part of an
   article except for the Path and Xref headers.


8.4.  Duties of a Serving Agent

   A Serving Agent takes an article from a relaying or injecting agent
   and files it in a "news database". It also provides an interface for
   reading agents to access the news database. This database is normally
   indexed by newsgroup with articles in each newsgroup identified by an
   article-locater (usually in the form of a decimal number - see 6.14).

        NOTE: Since control messages are often of interest, but should
        not be displayed as normal articles in regular newsgroups, it is
        common for serving agents to make them available in a pseudo-
        newsgroup named "control" or in a pseudo-newsgroup in a sub-
        hierarchy under "control." (e.g. "control.cancel").

   A serving agent processes articles as follows:

   1. It MUST verify the leftmost entry in the Path header and then
      prepend its own path-identity with a '/' delimiter, and possibly
      also the verified path-identity of its source with a '?' delimiter
      (5.6.2).

   2. It MUST reject any article whose Date header is stale (see 5.1).

   3. It MUST reject any article that does not have the correct
      mandatory headers (section 5) present, or which contains any
      header that does not have legal contents.

   4. It SHOULD reject any article that has already been sent to it (a
      database of message identifiers of recent messages is usually kept
      and matched against).

   5. It SHOULD reject any article that has already been Canceled,
      Superseded or Replaced by its author or by another trusted entity,
      and delete any such article that it already has in its news
      database.

   6. It MUST reject any article without an Approved header posted to
      any moderated newsgroup which it is configured to receive, and it
      MAY reject such articles for any newsgroup it knows to be
      moderated.

   7. It SHOULD generate a correct Xref header (6.14) for each article.

   8. Finally, it stores the article in its news database.
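
   Steps 7 and 8 might be sketched as follows, purely as an
   illustration. The class NewsDatabase is hypothetical, and the Xref
   layout (server name followed by "newsgroup:number" pairs) is assumed
   from 6.14; article-locaters are taken to be per-group decimal
   counters.

```python
class NewsDatabase:
    """A toy news database indexed by newsgroup and article number."""

    def __init__(self, server_name, groups):
        self.server_name = server_name
        self.next_number = {g: 1 for g in groups}   # per-group counters
        self.articles = {g: {} for g in groups}     # group -> {n: article}

    def store(self, headers, body):
        """File an article; return its Xref content, or None if unwanted."""
        wanted = [g.strip() for g in headers["Newsgroups"].split(",")]
        carried = [g for g in wanted if g in self.articles]
        if not carried:
            return None
        # Allocate one article-locater per carried newsgroup.
        numbers = {}
        for group in carried:
            numbers[group] = self.next_number[group]
            self.next_number[group] += 1
        xref = self.server_name + " " + " ".join(
            "%s:%d" % (g, numbers[g]) for g in carried)
        stored = dict(headers, Xref=xref)
        for group in carried:
            self.articles[group][numbers[group]] = (stored, body)
        return xref
```

   Only newsgroups that the server carries receive a locater, so the
   Xref content can list fewer groups than the Newsgroups header.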

8.5.  Duties of a Posting Agent

   A Posting Agent is used to assist the poster in creating a valid
   proto-article and forwarding it to an injecting agent.

   Posting agents SHOULD ensure that proto-articles they create are
   valid Netnews articles according to this standard and other
   applicable policies.

   Posting agents meant for use by ordinary posters SHOULD reject any
   attempt to post an article which cancels, Supersedes or Replaces
   another article of which the poster is not the author.

8.6.  Duties of a Followup Agent

   A Followup Agent is a special case of a posting agent and as such is
   bound by all the posting agent's requirements plus additional ones.
   Followup agents MUST create valid followups, in particular by
   providing correctly adjusted forms of those headers described as
   inheritable (4.2.2.2), notably the Newsgroups header (5.5), the
   Subject header (5.4) and the References header (6.8), and they SHOULD
   observe appropriate quoting conventions in the body (see 4.3.2).

   Followup agents MUST by default follow the Followup-To header when
   deciding which newsgroups a followup is posted to; however posters
   MAY override the default if they wish.

   Followup agents MUST NOT attempt to send email to any address ending
   in ".invalid".  Followup agents SHOULD NOT email copies of the
   followup to the author of the precursor (or any other person) unless
   this has been explicitly requested.
[Mention the Mail-Copies-To header if/when we have that.]
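
   The default behaviour described in this section might be sketched as
   below. It is an illustration under assumptions: the "Re: " prefix
   rule is taken from 5.4, any special Followup-To values are not
   handled, and the function names are hypothetical.

```python
def make_followup(precursor):
    """Derive the inheritable headers of a followup from its precursor."""
    subject = precursor["Subject"]
    if not subject.startswith("Re: "):
        subject = "Re: " + subject
    # References: the precursor's References plus its own Message-ID.
    references = (precursor.get("References", "") + " "
                  + precursor["Message-ID"]).strip()
    return {
        # Followup-To, when present, overrides Newsgroups by default.
        "Newsgroups": precursor.get("Followup-To", precursor["Newsgroups"]),
        "Subject": subject,
        "References": references,
    }


def may_email(address):
    """False for any address ending in ".invalid"."""
    domain = address.rsplit("@", 1)[-1].rstrip(".").lower()
    return not (domain == "invalid" or domain.endswith(".invalid"))
```

   The poster remains free to edit the Newsgroups value returned here,
   which is how the override permitted above would be offered.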

8.7.  Duties of a Gateway

   NOT DONE
[Volunteers to write it? There is lots of useful material in [Son-of-
1036].]

9.  Security Considerations

[The following is taken from our previous draft, and is a much cut down
version of material in Son-of-1036. What else should be said, and should
more of the Son-of-1036 material be rescued?]

   There is no security. Don't fool yourself. Usenet is a prime example
   of an Internet Adhocratic-Anarchy; that is, an environment in which
   trust forms the basis of all agreements.  It works.

   Articles which are intended to have restricted distribution are
   dependent on the goodwill of every site receiving them.  The
   "Archive: no" header is available as a signal to automated archivers
   not to file an article, but that cannot be guaranteed.

   The Distribution header makes provisions for articles which should
   not be propagated beyond a cooperating subnet. The key security word
   here is "cooperating". When a machine is not configured properly, it
   may become uncooperative and tend to distribute all articles.

9.1.  Attacks

   The two categories of attack that news is most vulnerable to are
   Denial-of-Service (DoS) and exploitation of particular
   implementations. Many have argued that "spam" (massively crossposted
   or reposted articles) constitutes a DoS attack in its own right. This
   may be so.

   Sending off-topic messages is a matter for individual hierarchies and
   newsgroups to control. It is a violation of this standard to "forge"
   an email address, that is, to use a valid email address which you are
   not entitled to use. All invalid email addresses used in headers MUST
   end in the ".invalid" top-level-domain. This facility is provided
   primarily for those who wish to remain anonymous, but do not care to
   take the additional precautions of using more sophisticated anonymity
   measures.

   It is possible that legal penalties may apply to sending unsolicited
   commercial email and/or news articles. Check with your local legal
   authorities.

10.  References


   [ANSI X3.4] "American National Standard for Information Systems -
      Coded Character Sets - 7-Bit American National Standard Code for
      Information Interchange (7-Bit ASCII)", ANSI X3.4, 1986.

   [ISO 10646] "International Standard - Information technology -
      Universal Multiple-Octet Coded Character Set (UCS) - Part 1:
      Architecture and Basic Multilingual Plane", ISO/IEC 10646-1, 1993.

   [ISO 3166] "Codes for the representation of names of countries and
      their subdivisions -- Part 1: Country codes", ISO 3166, 1997.

   [ISO 8859] "International Standard - Information Processing - 8-bit
      Single-Byte Coded Graphic Character Sets":
         Part 1: Latin alphabet No. 1, ISO 8859-1, 1987.
         Part 2: Latin alphabet No. 2, ISO 8859-2, 1987.
         Part 3: Latin alphabet No. 3, ISO 8859-3, 1988.
         Part 4: Latin alphabet No. 4, ISO 8859-4, 1988.
         Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988.
         Part 6: Latin/Arabic alphabet, ISO 8859-6, 1987.
         Part 7: Latin/Greek alphabet, ISO 8859-7, 1987.
         Part 8: Latin/Hebrew alphabet, ISO 8859-8, 1988.

   [MESSFOR] P. Resnick, "Internet Message Format Standard", draft-
      ietf-drums-msg-fmt-07.txt, March 1998.

   [NNTP] S. Barber, "Network News Transport Protocol", draft-ietf-
      nntpext-base-*.txt.

   [RFC 1034] P. Mockapetris, "Domain Names - Concepts and Facilities",
      RFC 1034, November 1987.

   [RFC 1036] M. Horton and R. Adams, "Standard for Interchange of
      USENET Messages", RFC 1036, December 1987.

   [RFC 1153] F. Wancho, "Digest Message Format", RFC 1153, April 1990.

   [RFC 1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
      Resource Locators (URL)", RFC 1738, December 1994.

   [RFC 1847] J. Galvin, S. Murphy, S. Crocker, and N. Freed, "Security
      Multiparts for MIME: Multipart/Signed and Multipart/Encrypted",
      RFC 1847, October 1995.

   [RFC 2015] M. Elkins, "MIME Security with Pretty Good Privacy (PGP)",
      RFC 2015, October 1996.

   [RFC 2044] F. Yergeau, "UTF-8, a transformation format for Unicode
      and ISO 10646", RFC 2044, October 1996.

   [RFC 2045] N. Freed and N. Borenstein, "Multipurpose Internet Mail
      Extensions (MIME) Part One: Format of Internet Message Bodies",
      RFC 2045, November 1996.

   [RFC 2046] N. Freed and N. Borenstein, "Multipurpose Internet Mail
      Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

   [RFC 2047] K. Moore, "MIME (Multipurpose Internet Mail Extensions)
      Part Three: Message Header Extensions for Non-ASCII Text", RFC
      2047, November 1996.

   [RFC 2048] N. Freed, J. Klensin, and J. Postel, "Multipurpose
      Internet Mail Extensions (MIME) Part Four: Registration
      Procedures", RFC 2048, November 1996.

   [RFC 2119] S. Bradner, "Key words for use in RFCs to Indicate
      Requirement Levels", RFC 2119, March 1997.

   [RFC 2130] C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R.
      Atkinson, M. Crispin, and P. Svanberg, "The Report of the IAB
      Character Set Workshop held 29 February - 1 March, 1996", RFC
      2130, April 1997.

   [RFC 2142] D. Crocker, "Mailbox Names for Common Services, Roles and
      Functions", RFC 2142, May 1997.

   [RFC 2234] D. Crocker and P. Overell, "Augmented BNF for Syntax
      Specifications: ABNF", RFC 2234, November 1997.

   [RFC 2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646",
      RFC 2279, January 1998.

   [RFC 2373] R. Hinden and S. Deering, "IP Version 6 Addressing
      Architecture", RFC 2373, July 1998.

   [RFC 2606] D. Eastlake and A. Panitz, "Reserved Top Level DNS Names",
      RFC 2606, June 1999.

   [RFC 2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter,
      P. Leach, and T. Berners-Lee, "Hypertext Transfer Protocol --
      HTTP/1.1", RFC 2616, June 1999.

   [RFC 820] J. Postel and J. Vernon, "Assigned Numbers", RFC 820,
      January 1983.

   [RFC 822] D. Crocker, "Standard for the Format of ARPA Internet Text
      Messages.", STD 11, RFC 822, August 1982.

   [RFC 850] Mark R. Horton, "Standard for interchange of Usenet
      messages", RFC 850, June 1983.

   [RFC 976] Mark R. Horton, "UUCP mail interchange format standard",
      RFC 976, February 1986.

   [SMTP] John C. Klensin and Dawn P. Mann, "Simple Mail Transfer
      Protocol", draft-ietf-drums-smtpupd-*.txt.

   [Son-of-1036] Henry Spencer, "News article format and transmission",
      <ftp://ftp.zoo.toronto.edu/pub/news.txt.Z>, June 1994.

   [UNICODE] The Unicode Consortium, "The Unicode Standard - Version
      2.0", Addison-Wesley, 1996.

   [USEFOR] Charles H. Lindsey, "News Article Format", draft-ietf-
      usefor-article-format-*.txt.


11.  Acknowledgements

[It is intended to insert a list of those who have been prominent
contributors to the mailing list of the working group at this point.]

12.  Contact Addresses

Editor

      Charles H. Lindsey
      5 Clerewood Avenue
      Heald Green
      Cheadle
      Cheshire SK8 3JU
      United Kingdom
      Phone: +44 161 437 4506
      Email: chl@clw.cs.man.ac.uk

Working group chair

      David Barr
      Digital Island
      Email: barr@visi.com

   Comments on this draft should preferably be sent to the mailing list
   of the Usenet Format Working Group at

      usenet-format@landfield.com.

   This draft expires six months after the date of publication (see Page
   1) (i.e. in August 2000).

13.  Intellectual Property Rights

[The following are taken from RFC 2026. It is not entirely clear whether
all of this is necessary at this stage. Please can someone explain it to
me?]

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.  Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF Executive
   Director.

   Copyright (C) The Internet Society (date). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Appendix A.1 - A-News Article Format

[There is some text present in Son-of-1036 at this point which may well
be removed to a separate informational RFC giving a proper historical
background.]

Appendix A.2 - Early B-News Article Format

[The same applies to this.]

[Son-of-1036 also had appendices on "Obsolete Headers" and "Obsolete
Control Messages". Do we want these? These are already mentioned at
appropriate places in the draft.]

Appendix B - Collected Syntax

   TO BE DONE

























