IETF NNTP Working Group                            N. Ballou, Microsoft
Internet Draft                                    B. Hernacki, Netscape
Document: draft-ballou-nntpsrch-04.txt                  September, 1997


                    NNTP Full-text Search Extension


Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net, nic.nordu.net, ftp.isi.edu, or
   munnari.oz.au.

   A revised version of this draft document will be submitted to the
   RFC editor as a Proposed Standard for the Internet Community.
   Discussion and suggestions for improvement are requested. This
   document will expire before March 1998.  Distribution of this draft
   is unlimited.


1. Abstract

   This document describes a set of enhancements to the Network News
   Transfer Protocol [NNTP-977] that allow full-text searching of
   news articles in multiple newsgroups.  The proposed SEARCH command
   supports functionality similar to the [IMAP4] SEARCH command, minus
   user-specific search keys (i.e., ANSWERED, DRAFT, FLAGGED, KEYWORD,
   NEW, OLD, RECENT, SEEN) and minus search keys based on headers that
   do not exist in news (i.e., CC, BCC, TO).

   The availability of the extensions described here will be advertised
   by the server using the extension negotiation mechanism described in
   the new NNTP protocol specification currently being developed
   [NNTP-NEW].


2. Conventions used in this document

   In examples, "C:" and "S:" indicate lines sent by the client and
   server respectively.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC-2119].


3. Introduction

   The NNTP SEARCH command is sent from the client to the server to
   specify and initiate a full-text search on articles in one or more
   newsgroups. The NNTP SEARCH command is similar to the [IMAP4] SEARCH
   command, but omits the user-property and mail-specific header search
   keys.  The result of an NNTP SEARCH is OVER data, as specified in
   [NNTP-NEW], for each article that satisfies the search criteria.

   In addition, the PAT command is extended so that it can be used to
   full-text search articles within a single newsgroup. Both the
   headers and the body of the articles are searched.


3.1. New and Enhanced NNTP Commands

   This document defines one new NNTP command, two new options to the
   existing LIST command, and an enhancement to one existing command:

   *  SEARCH
   *  LIST SRCHFIELDS
   *  LIST SEARCHABLE
   *  PAT

   The SEARCH command runs a one-time search, returning overview-like
   data.

   The LIST SRCHFIELDS command returns the fields that the server
   allows in full-text searches.

   The LIST SEARCHABLE command allows the client to determine which
   newsgroups are full-text searchable.

   The PAT command allows the pseudo-header ":TEXT".  This specifies a
   full-text (headers and body) search of the articles in a single
   newsgroup.


4. Use of NNTP Extension Mechanism

   The NNTP extension mechanism allows a server to describe its
   capabilities. The following extensions are used to describe the
   capabilities described in this document.


4.1. SEARCH Extension

   The SEARCH extension means that the server supports the following
   commands: SEARCH, LIST SEARCHABLE, LIST SRCHFIELDS.


4.2. PATTEXT Extension

   The PATTEXT extension means that the server supports the :TEXT
   header in the PAT command, as described by this document.
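
   As a non-normative illustration, a client might map the advertised
   extension labels to the commands it is allowed to use. The sketch
   below assumes the labels have already been obtained through the
   [NNTP-NEW] negotiation mechanism; the table name, the function
   name, and the case-insensitive comparison are assumptions of this
   example, not part of the protocol.

```python
# Hypothetical client-side table: extension label -> commands it enables,
# as listed in sections 4.1 and 4.2 of this document.
EXTENSION_COMMANDS = {
    "SEARCH": ["SEARCH", "LIST SEARCHABLE", "LIST SRCHFIELDS"],
    "PATTEXT": ["PAT :TEXT"],
}


def enabled_commands(advertised_labels):
    """Return the commands enabled by the advertised extension labels.

    Assumption: labels are compared case-insensitively and labels that
    this document does not define are ignored.
    """
    commands = []
    for label in advertised_labels:
        commands.extend(EXTENSION_COMMANDS.get(label.upper(), []))
    return commands
```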


5. SEARCH Command

   Arguments: optional newsgroup specification
              searching criteria (one or more)

   Responses: 224 overview information follows
              412 no newsgroup selected
              462 error performing search
              480 authentication required
              501 command syntax error
              502 no permission

   The SEARCH command searches the newsgroups for articles that match
   the given searching criteria.  Searching criteria consist of one or
   more search keys. If there are articles that match the search
   criteria, the server responds with code 224 and returns OVER data
   for each matching article in a format similar to that described in
   [NNTP-NEW], with one exception: the article number field is changed
   to a format that supports searches over multiple newsgroups. The
   article ID field for SEARCH OVER data uses the format
   newsgroup:art-ID rather than just an article number as defined in
   [NNTP-NEW] (note: this is the same format used by the Xref header).

   A response of 501 indicates a syntax error in the search criteria. A
   response of 502 indicates that the user does not have permission to
   search one or more of the specified newsgroups. If the search
   criteria did not specify a newsgroup, and there is no current
   newsgroup (i.e., set using the NNTP GROUP command), then the server
   returns the error code 412, indicating that no newsgroup has been
   specified. A response of 462 indicates that the server encountered
   an error when processing the search.

   When multiple keys are specified, the result is the intersection
   (AND function) of all the messages that match those keys. For
   example, the criteria FROM "SMITH" SINCE 1-Feb-1994 refers to all
   articles from Smith that were placed in the newsgroup since February
   1, 1994. A search key may also be a parenthesized list of one or
   more search keys (e.g. for use with the OR and NOT keys).
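
   The combination rules above can be illustrated with a small,
   non-normative evaluator. It is a sketch only: the tuple
   representation of search keys, the article dictionary, the "LIST"
   tag for a parenthesized list, and the case-insensitive substring
   matching are all assumptions of this example. A parenthesized list
   is treated as the AND of its members, following the rule for
   multiple keys.

```python
from datetime import date


def matches(article, key):
    """Evaluate one search key (a tuple) against an article (a dict)."""
    op = key[0]
    if op == "ALL":
        return True
    if op in ("FROM", "SUBJECT", "BODY"):
        # Assumption: substring match, ignoring case.
        return key[1].lower() in article[op.lower()].lower()
    if op == "SINCE":
        return article["arrival"] >= key[1]
    if op == "NOT":
        return not matches(article, key[1])
    if op == "OR":
        return matches(article, key[1]) or matches(article, key[2])
    if op == "LIST":
        # Parenthesized list: every member must match.
        return all(matches(article, member) for member in key[1])
    raise ValueError("unsupported search key: %r" % (op,))


def matches_all(article, keys):
    """Multiple top-level keys are ANDed together."""
    return all(matches(article, key) for key in keys)


# Mirrors the criteria FROM "SMITH" SINCE 1-Feb-1994 from the text.
article = {
    "from": '"John Smith" <JSmith@xyz.com>',
    "subject": "RE: object-oriented langs",
    "body": "example body text",
    "arrival": date(1996, 11, 3),
}
print(matches_all(article, [("FROM", "SMITH"), ("SINCE", date(1994, 2, 1))]))
```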

   Server implementations MAY exclude [MIME-1] body parts with terminal
   content types other than TEXT and MESSAGE from consideration in
   SEARCH matching.

   The optional newsgroup specification consists of the word "IN"
   followed by either a wildcard character "*" - indicating a search
   over all newsgroups - or a list of newsgroup names separated by a
   comma. A newsgroup name can end with the wildcard string ".*"
   indicating a search over a sub-hierarchy of the newsgroup name
   space. If no newsgroup specification is given, the search is over
   the current newsgroup. If there is no current newsgroup, the server
   returns the 412 error code.
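
   A client-side sketch of assembling the command line, including the
   optional newsgroup specification, is shown below. It is
   illustrative only: the function name and its parameters are
   hypothetical, the search keys are assumed to be already formatted,
   and the line terminator is left to the caller.

```python
def build_search_command(search_keys, newsgroups=None):
    """Assemble a SEARCH command line (without the line terminator).

    search_keys: already-formatted search key strings, e.g. 'FROM "Smith"'.
    newsgroups:  None to search the current newsgroup, "*" for all
                 newsgroups, or a list of names, each optionally ending
                 in ".*" to cover a sub-hierarchy.
    """
    if not search_keys:
        raise ValueError("at least one search key is required")
    parts = ["SEARCH"]
    if newsgroups is not None:
        if isinstance(newsgroups, str):
            newsgroups = [newsgroups]
        if not newsgroups:
            raise ValueError("newsgroup list must not be empty")
        # Newsgroup names are separated by commas, with no spaces.
        parts.extend(["IN", ",".join(newsgroups)])
    parts.extend(search_keys)
    return " ".join(parts)
```

   For example, build_search_command(["ALL"], "*") yields the line
   "SEARCH IN * ALL".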

   The ON, BEFORE, and SINCE search criteria use the same date as used
   in the NNTP NEWNEWS command in [NNTP-NEW] - the date the article
   arrived on the server. A server indicates support for the ON,
   BEFORE, and SINCE search criteria by listing :Date in the LIST
   SRCHFIELDS response.
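
   The following non-normative sketch shows one way to parse a date
   argument and apply the ON, BEFORE, and SINCE comparisons to an
   arrival date. It assumes the server tracks arrival at day
   granularity and ignores time zones; the function names are
   hypothetical.

```python
import re
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
# date_day "-" date_month "-" date_year, as in the Formal Syntax section.
_DATE_TEXT = re.compile(r"([0-9]{1,2})-([A-Za-z]{3})-([0-9]{4})")


def parse_date_text(text):
    """Parse 'd-Mon-yyyy', optionally enclosed in double quotes."""
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1]
    match = _DATE_TEXT.fullmatch(text)
    if match is None:
        raise ValueError("not a valid date: %r" % (text,))
    day, month, year = match.groups()
    # MONTHS.index raises ValueError for an unknown month name.
    return date(int(year), MONTHS.index(month.lower()) + 1, int(day))


def date_key_matches(keyword, arrival, wanted):
    """Apply ON / BEFORE / SINCE to an article's arrival date."""
    if keyword == "ON":
        return arrival == wanted
    if keyword == "BEFORE":
        return arrival < wanted
    if keyword == "SINCE":
        return arrival >= wanted      # "within or later than"
    raise ValueError("not a date search key: %r" % (keyword,))
```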

   The defined search keys are as follows.  Refer to the Formal Syntax
   section for the precise syntactic definitions of the arguments.

      <message range> Articles with article numbers corresponding to
                      the specified range.

      ALL             All articles in the current newsgroup; the
                      default initial key for ANDing.

      BEFORE <date>   Articles whose server arrival date is earlier
                      than the specified date.

      BODY <string>   Articles that contain the specified string in
                      the body of the message.

      FROM <string>   Articles that contain the specified string in
                      the article structure's FROM field.

      HEADER <field-name> <string>
                      Articles that have a header with the specified
                      field-name (as defined in [RFC-822]) and that
                      contains the specified string in the [RFC-822]
                      field-body.

      LARGER <n>      Articles with a size larger than the specified
                      number of octets.

      NOT <search-key>
                      Articles that do not match the specified search
                      key.

      ON <date>       Articles whose server arrival date is within
                      the specified date.

      OR <search-key1> <search-key2>
                      Articles that match either search key.

      SENTBEFORE <date>
                      Articles whose [RFC-822] Date: header is
                      earlier than the specified date.

      SENTON <date>   Articles whose [RFC-822] Date: header is within
                      the specified date.

      SENTSINCE <date>
                      Articles whose [RFC-822] Date: header is within
                      or later than the specified date.

      SINCE <date>    Articles whose server arrival date is within or
                      later than the specified date.

      SMALLER <n>     Articles with a size smaller than the specified
                      number of octets.

      SUBJECT <string>
                      Articles that contain the specified string in
                      the article structure's SUBJECT field.

      TEXT <string>   Articles that contain the specified string in
                      the header or body of the message.

      Example: C: SEARCH FROM "Smith" SINCE 1-Feb-1994
               S: 224 overview information follows
               S: comp.object:573 \t RE: object-oriented langs \t \
                  "John Smith" <JSmith@xyz.com> \t Sun, 03 Nov 1996 \
                  14:25:05 -0800 \t <01cbc9d5f3c70$eab9a2cd@xyz.com> \
                  \t 4080 \t 33
               S: .

      Note: each field in OVER response is separated by a tab - shown
            as a \t in the example above.
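
   A client might split each response line as sketched below. This is
   illustrative only: the function name is hypothetical, the remaining
   OVER fields are returned as a plain list rather than interpreted,
   and surrounding whitespace is stripped from each field as a
   convenience of this example.

```python
def parse_search_line(line):
    """Split one SEARCH response line.

    Returns (newsgroup, article_number, remaining_fields), where the
    first tab-separated field has the form newsgroup:art-ID.
    """
    fields = line.rstrip("\r\n").split("\t")
    # Split on the last colon; the article number follows it.
    newsgroup, sep, number = fields[0].strip().rpartition(":")
    if not sep or not newsgroup or not number.isdigit():
        raise ValueError("expected newsgroup:art-ID, got %r" % (fields[0],))
    return newsgroup, int(number), [field.strip() for field in fields[1:]]
```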


5.1. Search Formal Syntax

   The search query syntax is derived from the search syntax defined
   for the IMAP4 protocol. It is somewhat different because of the way
   international character sets need to be encoded.

   The following syntax specification uses the augmented Backus-Naur
   Form (BNF) as described in [ABNF].

   Except as noted otherwise, all alphabetic characters are case-
   insensitive. The use of upper or lower case characters to define
   token strings is for editorial clarity only. Implementations MUST
   accept these strings in a case-insensitive fashion.

     astring       ::= atom / string

     atom          ::= 1*ATOM_CHAR

     ATOM_CHAR     ::= <any CHAR except atom_specials>

     atom_specials ::= "," / "(" / ")" / SPACE / CTL / "*" /
                       quoted_specials

     CHAR          ::= <any ASCII character except NUL,
                        0x01 - 0x7f>

     CTL           ::= <any ASCII control character and DEL,
                        0x00 - 0x1f, 0x7f>

     date          ::= date_text / <"> date_text <">

     date_day      ::= 1*2digit
                   ;; Day of month

     date_month    ::= "Jan" / "Feb" / "Mar" / "Apr" / "May" /
                       "Jun" / "Jul" / "Aug" / "Sep" / "Oct" /
                       "Nov" / "Dec"

     date_text     ::= date_day "-" date_month "-" date_year

     date_year     ::= 4digit

     digit         ::= "0" / digit_nz

     digit_nz      ::= "1" / "2" / "3" / "4" / "5" / "6" / "7" /
                       "8" / "9"

     header_fld_name ::= sstring

     mstring       ::= A MIME encoded string surrounded by double
                       quotes

     newsgroup     ::= atom [ ".*"]

     newsgroups    ::= "*" / newsgroup_list

     newsgroup_list ::= newsgroup [ "," newsgroup_list]

     number        ::= 1*digit
                   ;; Unsigned 32-bit integer
                   ;; (0 <= n < 4,294,967,296)

     nz_number     ::= digit_nz *digit
                   ;; Non-zero unsigned 32-bit integer
                   ;; (0 < n < 4,294,967,296)

     QUOTED_CHAR   ::= <any TEXT_CHAR except quoted_specials> /
                   "\" quoted_specials

     quoted_specials ::= <"> / "\"

     range         ::= nz_number / nz_number "-" [ nz_number ]
                   ;; Identifies a range of Articles.

     search        ::= "SEARCH" SPACE
                       ["IN" SPACE newsgroups SPACE]
                       1#search_key

     search_key    ::= "ALL" / "BODY" SPACE sstring /
                       "FROM" SPACE sstring / "ON" SPACE date /
                       "SINCE" SPACE date / "BEFORE" SPACE date /
                       "SUBJECT" SPACE sstring / "TEXT" SPACE sstring /
                       "HEADER" SPACE header_fld_name SPACE sstring /
                       "LARGER" SPACE number / "NOT" SPACE search_key /
                       "OR" SPACE search_key SPACE search_key /
                       "SENTBEFORE" SPACE date / "SENTON" SPACE date /
                       "SENTSINCE" SPACE date /
                       "SMALLER" SPACE number / range /
                       "(" 1#search_key ")"

     SPACE         ::= 1*<ASCII SP, space, 0x20>

     sstring       ::= astring / mstring

     string        ::= <"> *QUOTED_CHAR <">

     TEXT_CHAR     ::= <any CHAR except CR and LF>
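
   As a non-normative illustration of the astring rules, a client
   could emit a value as an atom when every character qualifies and
   fall back to a quoted string otherwise. The function names below
   are hypothetical, and values that would require an mstring
   (non-ASCII text) are rejected rather than encoded.

```python
# atom_specials: "," "(" ")" SPACE "*" and the quoted_specials (<"> and "\").
# Control characters are excluded by the range check in is_atom().
ATOM_SPECIALS = set(',()* "\\')


def is_atom(text):
    """True if text is 1*ATOM_CHAR under the grammar above."""
    return bool(text) and all(
        0x20 < ord(ch) < 0x7F and ch not in ATOM_SPECIALS for ch in text
    )


def format_astring(text):
    """Return text as an atom if possible, otherwise as a quoted string."""
    if is_atom(text):
        return text
    for ch in text:
        # TEXT_CHAR is any CHAR (0x01 - 0x7f) except CR and LF.
        if ch in "\r\n" or not (0 < ord(ch) <= 0x7F):
            raise ValueError("value needs an mstring, which this sketch omits")
    # Escape the quoted_specials with a backslash; backslash first.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```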


5.2. LIST SRCHFIELDS Command

   Arguments: none

   Responses: 224 data follows

   The LIST SRCHFIELDS command returns a list of which fields can be
   specified in full-text search queries on the server. The response is
   a list of searchable fields, one per line. A "." on its own line
   terminates the list.  The fields are either article headers or
   non-header fields supported by the query syntax.

   The three currently defined non-header fields are ":Body", ":Text",
   and ":Date". ":Text" means all the searchable text in the article,
   and indicates that the "TEXT" keyword is supported in the search
   query language. ":Body" means the body of the article, excluding the
   headers, and indicates that the "BODY" keyword is supported in the
   search query language. ":Date" means the date at which an article
   arrived on a server - similar to the date used in the NNTP NEWNEWS
   command - and indicates that the "ON", "SINCE", and "BEFORE"
   keywords are supported in the search query language.

   The "TEXT" and "BODY" search query fields are optional, but the
   server must indicate whether they are supported or not in the LIST
   SRCHFIELDS response.

     Example: C: LIST SRCHFIELDS
              S: 224 Data follows.
              S: From
              S: Date
              S: Subject
              S: :Text
              S: .
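
   A client could turn this response into the set of usable search
   keywords as sketched below. The sketch is non-normative: the
   function and table names are hypothetical, the input is assumed to
   be the response lines after the 224 status line, and field names
   are compared case-insensitively as an assumption of this example.

```python
# Non-header fields and the search keywords they indicate (section 5.2).
PSEUDO_FIELD_KEYWORDS = {
    ":text": ["TEXT"],
    ":body": ["BODY"],
    ":date": ["ON", "SINCE", "BEFORE"],
}


def supported_keywords(srchfields_lines):
    """Return (searchable_headers, supported_keywords) from the list body."""
    headers, keywords = [], []
    for line in srchfields_lines:
        field = line.strip()
        if field == ".":          # a "." on its own line ends the list
            break
        if not field:
            continue
        if field.startswith(":"):
            keywords.extend(PSEUDO_FIELD_KEYWORDS.get(field.lower(), []))
        else:
            headers.append(field)
    return headers, keywords
```

   Applied to the example above, this yields the headers From, Date,
   and Subject, and the single keyword TEXT.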


5.3. LIST SEARCHABLE Command

   Arguments: none

   Responses: 224 data follows

   The LIST SEARCHABLE command returns a list of strings that define
   which newsgroups are being indexed by the news server and are thus
   available for searching. In addition, the character sets allowed for
   each group are returned.

   When newsgroups are indexed, the server returns 224, followed by
   each portion of the tree that is indexed. If all groups are indexed,
   a line with "*" is returned. If only some parts of the newsgroup
   hierarchy are indexed, they are identified in the form
   <indexed-hierarchy>.*. Clients should not assume that these will
   always be top-level hierarchies.  A "." on its own line terminates
   the list.

     Example: C: LIST SEARCHABLE
              S: 224 Data follows.
              S: alt.*
              S: comp.lang.*
              S: mcom.*
              S: .
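
   A client could test whether a given newsgroup is covered by the
   returned list as sketched below. This is a non-normative example
   with a hypothetical function name. It assumes that a pattern such
   as comp.lang.* covers only groups below comp.lang and not a group
   named comp.lang itself, and that a line without a trailing ".*" is
   an exact group name; this document does not settle either point.

```python
def is_searchable(newsgroup, indexed_patterns):
    """True if newsgroup is covered by a LIST SEARCHABLE pattern."""
    for pattern in indexed_patterns:
        if pattern == "*":
            return True                     # every group is indexed
        if pattern.endswith(".*"):
            prefix = pattern[:-1]           # keep the dot: "comp.lang."
            if newsgroup.startswith(prefix):
                return True
        elif newsgroup == pattern:          # assumption: exact group name
            return True
    return False
```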


5.4. PAT Command Enhancement

   Arguments: header range|<message-id> [pat [pat...]]

   Responses: <same as PAT - see [NNTP-NEW]>

   The PAT command is enhanced in a simple way: the new value ":TEXT"
   is supported as a header when invoking the command. The :TEXT
   header requests a full-text search of the body and all headers of
   the specified articles. Other than adding a new header name, the
   PAT command arguments are the same as specified in [NNTP-NEW].

   If :TEXT isn't specified as the header, the response is the same as
   it always has been for PAT, with each result line containing the
   article number and the value of the header that matched the pattern.

   If the :TEXT header is specified, the constant string "TEXT" is
   returned in place of the value of the header that matched the
   pattern.

    Example: C: PAT :TEXT 1000-2000 searchtext
             S: 221 Header follows
             S: 1021 TEXT
             S: 1024 TEXT
             S: .
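
   Because the value column carries only the constant string "TEXT",
   a client needs just the article numbers. A non-normative sketch
   with a hypothetical function name follows; it assumes the input is
   the response lines after the status line.

```python
def parse_pat_text_response(lines):
    """Return the article numbers listed in a PAT :TEXT response body."""
    numbers = []
    for line in lines:
        line = line.strip()
        if line == ".":           # a "." on its own line ends the list
            break
        if not line:
            continue
        # Each result line is "<article number> TEXT".
        number, _, _value = line.partition(" ")
        numbers.append(int(number))
    return numbers
```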


6. Security Considerations

   The search commands must be implemented in a way that does not allow
   access to articles in newsgroups that a client is otherwise
   restricted from reading due to access control rules.


7. References

   [ABNF] DRUMS working group, Crocker, D., Editor, "Augmented BNF for
   Syntax Specifications: ABNF", draft-drums-abnf-02.txt (work in
   progress), Internet Mail Consortium, April 1997.

   [IMAP4] Crispin, M., "Internet Message Access Protocol - Version
   4rev1", RFC 2060, December 1996.

   [MIME-1] Freed, N., and N. Borenstein, "Multipurpose Internet Mail
   Extensions (MIME) Part One: Format of Internet Message Bodies",
   RFC 2045, November 1996.

   [NNTP-977] Kantor, B., and P. Lapsley, "Network News Transfer
   Protocol", RFC 977, February 1986.

   [NNTP-NEW] Barber, S., "Network News Transfer Protocol", Internet
   Draft, draft-ietf-nntpext-base-02.txt, September 1997.

   [RFC-822] Crocker, D., "Standard for the Format of ARPA Internet
   Text Messages", STD 11, RFC 822, August 1982.

   [RFC-2119] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", RFC 2119, Harvard University, March 1997.

8. Acknowledgments

   TBD

9. Authors' Addresses

   Nathaniel Ballou
   Microsoft
   One Microsoft Way
   Redmond, WA 98052
   Phone: +1 425-703-0574
   Email: NatBa@Microsoft.com

   Brian Hernacki
   Netscape Communications
   501 E. Middlefield Rd.
   Mountain View, CA 94043
   Phone: (650) 937-6738
   Email: bhern@netscape.com