INTERNET-DRAFT                                                 N. Ballou
Expires: July 15, 1997                                         Microsoft
<draft-ballou-nntpsrch-02.txt>                          January 29, 1997



                   NNTP Full-text Search Enhancements



1.  Status of this Memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups.  Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To  learn the current status   of any  Internet-Draft, please check  the
``1id-abstracts.txt''  listing  contained in  the Internet-Drafts Shadow
Directories on ds.internic.net  (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

2.  Abstract

This document describes a set of enhancements to the Network News
Transfer Protocol [NNTP-977] that allow full-text searching of news
articles in multiple newsgroups.  The proposed SEARCH command supports
functionality similar to the [IMAP4] SEARCH command, minus user-specific
search keys (i.e., ANSWERED, DRAFT, FLAGGED, KEYWORD, NEW, OLD, RECENT,
SEEN) and minus search keys based on headers that do not exist in news
(i.e., CC, BCC, TO).

The availability of the extensions described here will be advertised by
the server using the extension negotiation mechanism described in the
new NNTP protocol specification currently being developed [NNTP-NEW].















Ballou                                                          [Page 1]




3.  Introduction

The NNTP SEARCH command is sent from the client to the server to specify
and initiate a full-text search on articles in one or more newsgroups.
The NNTP SEARCH command is a subset of the [IMAP4] SEARCH command, with
user property and mail-specific header search keys not present in NNTP
SEARCH.  The result of an NNTP SEARCH is OVER data, as specified in
[NNTP-NEW], for each article that satisfies the search criteria.


4.  SEARCH Command Description

Arguments: optional character set specification
           searching criteria (one or more)

Responses: 224 overview information follows
           412 no news group selected
           421 no results found
           462 error performing search
           501 command syntax error
           502 no permission

The SEARCH command searches the newsgroup for articles that match the
given searching criteria.  Searching criteria consist of one or more
search keys.  If there are articles that match the search criteria, the
server responds with code 224 and returns OVER data for each matching
article in a format similar to that described in [NNTP-NEW].  The one
change from the [NNTP-NEW] OVER format is that the article ID field uses
a format that supports searches over multiple newsgroups: the article ID
field for SEARCH OVER data has the form newsgroup:art-ID rather than
just an article ID as defined in [NNTP-NEW].
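
As an illustration only, a client might split one SEARCH result line
into its newsgroup, article ID, and remaining OVER fields as sketched
below.  The names SearchHit and parse_search_over_line are hypothetical,
and the sketch assumes that newsgroup names never contain a colon, so
the last colon in the first field is the separator.

```python
from typing import NamedTuple


class SearchHit(NamedTuple):
    newsgroup: str
    article_number: int
    fields: list  # remaining OVER fields, in the order the server sent them


def parse_search_over_line(line: str) -> SearchHit:
    """Split one tab-separated SEARCH result line into its parts."""
    first, *rest = line.rstrip("\r\n").split("\t")
    # Assumption: newsgroup names contain no ":", so the last colon
    # separates the newsgroup from the article ID.
    newsgroup, sep, art_id = first.rpartition(":")
    if not sep or not newsgroup or not (art_id.isascii() and art_id.isdigit()):
        raise ValueError(f"not a newsgroup:art-ID field: {first!r}")
    return SearchHit(newsgroup, int(art_id), rest)


hit = parse_search_over_line("comp.object:573\tRE: object-oriented langs\r\n")
print(hit.newsgroup, hit.article_number)
```

The remaining fields are kept as an uninterpreted list because their
order and meaning are defined by [NNTP-NEW], not by this document.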

A response of 421 indicates  that there are  no articles that match  the
search  criteria.  A  response  of 501 indicates a   syntax error in the
search  criteria.  A response  of 502 indicates   that the user does not
have permission to search one  or more of the  specified newsgroups.  If
the search criteria did not specify a newsgroup, and there is no current
newsgroup  (i.e.,  set using the NNTP   GROUP command), then  the server
returns  the error  code 412,   indicating that  no newsgroup has   been
specified.   A response of 462  indicates that the server encountered an
error when processing the search.
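
The response codes above could be handled on the client side roughly as
follows.  This is a sketch, not part of the protocol: the names
SearchError and check_search_status are hypothetical, and treating 421
as an empty result rather than as an error is a design choice of the
sketch.

```python
# Response codes and texts as listed for the SEARCH command.
SEARCH_RESPONSES = {
    224: "overview information follows",
    412: "no news group selected",
    421: "no results found",
    462: "error performing search",
    501: "command syntax error",
    502: "no permission",
}


class SearchError(Exception):
    """Raised for any SEARCH response other than 224 or 421."""

    def __init__(self, code, text):
        super().__init__(f"{code} {text}")
        self.code = code


def check_search_status(status_line: str) -> bool:
    """Return True if result lines follow, False if nothing matched."""
    code_text, _, text = status_line.strip().partition(" ")
    code = int(code_text)
    if code == 224:
        return True
    if code == 421:
        return False
    raise SearchError(code, text or SEARCH_RESPONSES.get(code, "unknown response"))
```

A caller would read the result lines up to the terminating "." only
when check_search_status returns True.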

When multiple keys  are specified, the result  is the  intersection (AND
function) of all the  messages that match  those keys.  For example, the
criteria FROM "SMITH" SINCE 1-Feb-1994 refers to all articles from Smith
that were placed in the newsgroup since February 1,  1994.  A search key
may also be a parenthesized list of one  or more search  keys (e.g.  for
use with the OR and NOT keys).
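
The combination rules can be pictured with a small in-memory model in
which every search key is a predicate over one article.  The article
dictionaries, their "from" and "posted" fields, and all function names
below are made up for the illustration; a real server would evaluate
the keys against its own article store.

```python
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def parse_date_text(text):
    """Parse a date such as 1-Feb-1994, with or without double quotes."""
    day, month, year = text.strip('"').split("-")
    return date(int(year), MONTHS.index(month.lower()) + 1, int(day))


# Each search key is modelled as a predicate over one article.
def from_key(string):
    return lambda article: string.lower() in article["from"].lower()


def since_key(date_text):
    cutoff = parse_date_text(date_text)
    return lambda article: article["posted"] >= cutoff


def not_key(key):
    return lambda article: not key(article)


def or_key(key1, key2):
    return lambda article: key1(article) or key2(article)


def search(articles, *keys):
    """Multiple keys are ANDed: keep articles that match every key."""
    return [a for a in articles if all(key(a) for key in keys)]


ARTICLES = [  # made-up sample data
    {"id": 1, "from": '"John Smith" <JSmith@xyz.com>', "posted": date(1996, 11, 3)},
    {"id": 2, "from": '"John Smith" <JSmith@xyz.com>', "posted": date(1993, 12, 31)},
    {"id": 3, "from": '"Ann Jones" <AJones@xyz.com>', "posted": date(1995, 5, 20)},
]

# FROM "SMITH" SINCE 1-Feb-1994
both = search(ARTICLES, from_key("SMITH"), since_key("1-Feb-1994"))
# OR FROM "Jones" NOT SINCE 1-Feb-1994
either = search(ARTICLES, or_key(from_key("Jones"),
                                 not_key(since_key("1-Feb-1994"))))
print([a["id"] for a in both])    # [1]
print([a["id"] for a in either])  # [2, 3]
```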

Server  implementations  MAY exclude [MIME-1]  body  parts with terminal
content  types other than TEXT and  MESSAGE from consideration in SEARCH
matching.

The optional character set  specification consists of the word "CHARSET"
followed by a registered MIME character set.  It indicates the character
set of the strings that appear in the search criteria.  [MIME-2] strings
that   appear in  RFC 822/MIME  message   headers, and [MIME-1]  content
transfer  encodings,  MUST be decoded     before matching.  Except   for
US-ASCII, it    is not required  that  any  particular character  set be
supported.  If the server does  not support the specified character set,
it MUST return a tagged NO response (not a BAD).

In all search  keys that use strings,  a message matches  the key if the
string is a substring of the field.  The matching is case-insensitive.
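
One possible way to apply the decoding and matching rules to a header
field is sketched below, using the decode_header function from the
Python standard library to decode [MIME-2] encoded-words.  The helper
names decode_mime2 and field_matches are hypothetical, and the use of
Unicode case folding for the case-insensitive comparison is an
assumption of the sketch; this document does not say how case is to be
folded outside US-ASCII.

```python
from email.header import decode_header


def decode_mime2(value: str) -> str:
    """Return a header value with [MIME-2] encoded-words decoded."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:  # charset not known to this Python build
                chunk = chunk.decode("ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


def field_matches(field_value: str, needle: str) -> bool:
    """Case-insensitive substring match against the decoded field."""
    return needle.casefold() in decode_mime2(field_value).casefold()


print(field_matches("RE: object-oriented langs", "OBJECT"))
```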

The defined   search keys are as  follows.   Refer to the  Formal Syntax
section for the precise syntactic definitions of the arguments.

      <message set>  Messages with article identifiers
                     corresponding to the specified message sequence
                     number set.  This is only relevant for searches
                     on a single newsgroup.

      ALL            All messages in the newsgroup; the default initial
                     key for ANDing.

      BEFORE <date>  Messages whose internal date is earlier than the
                     specified date.

      BODY <string>  Messages that contain the specified string in the
                     body of the message.

      FROM <string>  Messages that contain the specified string in the
                     envelope structure's FROM field.

      HEADER <field-name> <string>
                     Messages that have a header with the specified
                     field-name (as defined in [RFC-822]) and that
                     contains the specified string in the [RFC-822]
                     field-body.

      LARGER <n>     Messages with an RFC822.SIZE larger than the
                     specified number of octets.

      NEWSGROUP <string>
                     Messages in the specified newsgroup.  The string
                     can either be a fully-qualified newsgroup name,
                     or a partial newsgroup name that ends with the
                     substring ".*" (i.e., search the newsgroup
                     hierarchy), or the string "*" (i.e., search all
                     newsgroups).


      NOT <search-key>
                     Messages that do not match the specified search
                     key.

      ON <date>      Messages whose internal date is within the
                     specified date.

      OR <search-key1> <search-key2>
                     Messages that match either search key.

      SENTBEFORE <date>
                     Messages whose [RFC-822] Date: header is earlier
                     than the specified date.

      SENTON <date>  Messages whose [RFC-822] Date: header is within the
                     specified date.

      SENTSINCE <date>
                     Messages whose [RFC-822] Date: header is within or
                     later than the specified date.

      SINCE <date>   Messages whose internal date is within or later
                     than the specified date.

      SMALLER <n>    Messages with an RFC822.SIZE smaller than the
                     specified number of octets.

      SUBJECT <string>
                     Messages that contain the specified string in the
                     envelope structure's SUBJECT field.

      TEXT <string>  Messages that contain the specified string in the
                     header or body of the message.

      UID <message set>
                     Messages with message identifiers corresponding to
                     the specified message identifier set.  Can only be
                     used when searching a single newsgroup.
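
The three forms of the NEWSGROUP string could be tested against a
newsgroup name as in the following sketch.  The function name is
hypothetical.  The sketch compares names exactly, and it assumes that a
hierarchy pattern such as "comp.*" does not match a group named just
"comp"; this document does not settle either point.

```python
def newsgroup_matches(pattern: str, newsgroup: str) -> bool:
    """Test a newsgroup name against a NEWSGROUP search string."""
    if pattern == "*":
        return True  # search all newsgroups
    if pattern.endswith(".*"):
        # Hierarchy search: keep the trailing "." so that "comp.*"
        # does not match "computers.misc".
        return newsgroup.startswith(pattern[:-1])
    return newsgroup == pattern  # fully-qualified newsgroup name


print(newsgroup_matches("comp.*", "comp.object"))
```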

   Example:    C: SEARCH FROM "Smith" SINCE 1-Feb-1994
               S: 224 overview information follows
               S: comp.object:573 \t RE: object-oriented langs \t \
                  "John Smith" <JSmith@xyz.com> \t Sun, 03 Nov 1996 \
                  14:25:05 -0800 \t <01cbc9d5f3c70$eab9a2cd@xyz.com> \t \
                  4080 \t 33
               S: .

   Note: the fields in an OVER response line are separated by tabs,
         shown as \t in the example above.  The line is wrapped here for
         display; a trailing \ marks each wrap.
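
A client could assemble the command line of the example above as
sketched here.  The helper names quote, format_date, and build_search
are hypothetical.  The quote helper follows the quoted string form of
the Formal Syntax section (backslash escapes for the double quote and
the backslash) and, as a simplification, refuses anything that is not
plain US-ASCII rather than producing a literal or a [MIME-2] string.

```python
from datetime import date

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def quote(text: str) -> str:
    # Quoted strings cannot carry CR, LF, NUL or 8-bit characters.
    if not text.isascii() or any(ch in "\r\n\0" for ch in text):
        raise ValueError("string cannot be sent as a quoted string")
    # Escape the backslash first so that the escapes added for the
    # double quote are not doubled.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def format_date(day: date) -> str:
    # date_text form: day "-" month "-" four-digit year
    return f"{day.day}-{MONTH_NAMES[day.month - 1]}-{day.year:04d}"


def build_search(*keys: str, charset: str = "") -> str:
    """Join already formatted search keys into one SEARCH command line."""
    words = ["SEARCH"]
    if charset:
        words += ["CHARSET", charset]
    words += keys
    return " ".join(words) + "\r\n"


line = build_search("FROM", quote("Smith"),
                    "SINCE", format_date(date(1994, 2, 1)))
print(repr(line))
```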


5.  Formal Syntax

The search query syntax is derived from the search syntax defined for
the IMAP4 protocol.  It is somewhat different because of the way
international character sets need to be encoded.

The following syntax specification  uses the augmented Backus-Naur  Form
(BNF) notation  as   specified  in  [RFC-822] with  one  exception;  the
delimiter used with the "#" construct is a  single space (SPACE) and not
one or more commas.

Except as   noted otherwise,  all    alphabetic characters   are   case-
insensitive.  The use of upper or  lower case characters to define token
strings is  for editorial  clarity  only.  Implementations   MUST accept
these strings in a case-insensitive fashion.

   astring         ::= atom / string

   atom            ::= 1*ATOM_CHAR

   ATOM_CHAR       ::= <any CHAR except atom_specials>

   atom_specials   ::= "(" / ")" / "{" / SPACE / CTL / list_wildcards /
                       quoted_specials

   CHAR            ::= <any 7-bit US-ASCII character except NUL,
                        0x01 - 0x7f>

   CHAR8           ::= <any 8-bit octet except NUL, 0x01 - 0xff>

   CRLF            ::= CR LF

   CTL             ::= <any ASCII control character and DEL,
                        0x00 - 0x1f, 0x7f>

   date            ::= date_text / <"> date_text <">

   date_day        ::= 1*2digit
                       ;; Day of month
   date_day_fixed  ::= (SPACE digit) / 2digit
                       ;; Fixed-format version of date_day

   date_month      ::= "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun" /
                       "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"

   date_text       ::= date_day "-" date_month "-" date_year

   date_year       ::= 4digit



   date_time       ::= <"> date_day_fixed "-" date_month "-" date_year
                       SPACE time SPACE zone <">

   digit           ::= "0" / digit_nz

   digit_nz        ::= "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" /
                       "9"

   header_fld_name ::= sstring

   list_wildcards  ::= "%" / "*"

   literal         ::= "{" number "}" CRLF *CHAR8
                       ;; Number represents the number of CHAR8 octets

   mstring         ::= <a [MIME-2] encoded string>

   number          ::= 1*digit
                       ;; Unsigned 32-bit integer
                       ;; (0 <= n < 4,294,967,296)

   nz_number       ::= digit_nz *digit
                       ;; Non-zero unsigned 32-bit integer
                       ;; (0 < n < 4,294,967,296)

   quoted          ::= <"> *QUOTED_CHAR <">

   QUOTED_CHAR     ::= <any TEXT_CHAR except quoted_specials> /
                       "\" quoted_specials

   quoted_specials ::= <"> / "\"

   search          ::= "SEARCH" SPACE ["CHARSET" SPACE astring SPACE]
                       1#search_key
                       ;; [CHARSET] MUST be registered with IANA

   search_key      ::= "ALL" / "BEFORE" SPACE date /
                       "BODY" SPACE sstring / "FROM" SPACE sstring /
                       "ON" SPACE date / "SINCE" SPACE date /
                       "SUBJECT" SPACE sstring / "TEXT" SPACE sstring /
                       "NEWSGROUP" SPACE sstring /
                       "HEADER" SPACE header_fld_name SPACE sstring /
                       "LARGER" SPACE number / "NOT" SPACE search_key /
                       "OR" SPACE search_key SPACE search_key /
                       "SENTBEFORE" SPACE date / "SENTON" SPACE date /
                       "SENTSINCE" SPACE date / "SMALLER" SPACE number /
                       "UID" SPACE set / set / "(" 1#search_key ")"




   sequence_num    ::= nz_number / "*"
                       ;; * is the largest number in use.  For message
                       ;; sequence numbers, it is the number of messages
                       ;; in the newsgroup.  For unique identifiers, it
                       ;; is the unique identifier of the last message
                       ;; in the newsgroup.



   set             ::= sequence_num / (sequence_num ":" sequence_num) /
                       (set "," set)
                       ;; Identifies a set of messages.  For message
                       ;; sequence numbers, these are consecutive
                       ;; numbers from 1 to the number of messages in
                       ;; the newsgroup.
                       ;; Comma delimits individual numbers, colon
                       ;; delimits between two numbers inclusive.
                       ;; Example: 2,4:7,9,12:* is 2,4,5,6,7,9,12,13,
                       ;; 14,15 for a newsgroup with 15 messages.

   SPACE           ::= <ASCII SP, space, 0x20>

   sstring         ::= <"> astring <"> / <"> mstring <">

   string          ::= quoted / literal

   TEXT_CHAR       ::= <any CHAR except CR and LF>

   time            ::= 2digit ":" 2digit ":" 2digit
                       ;; Hours minutes seconds






















7.  Bibliography

[NNTP-977]
     Kantor, B., and P. Lapsley, Network News Transfer Protocol,
     RFC 977, February 1986.

[NNTP-NEW]
     Barber, S., Network News Transfer Protocol, Internet-Draft,
     September 1996.

[IMAP4]
     Crispin, M., Internet Message Access Protocol - Version 4,
     RFC 1730, December 1994.


[MIME-1]
     Borenstein, N., and N. Freed, MIME (Multipurpose Internet Mail
     Extensions) Part One: Mechanisms for Specifying and Describing the
     Format of Internet Message Bodies, RFC 1521, Bellcore, Innosoft,
     September 1993.

[MIME-2]
     Moore, K., MIME (Multipurpose Internet Mail Extensions) Part Two:
     Message Header Extensions for Non-ASCII Text, RFC 1522, University
     of Tennessee, September 1993.


8.  Author's Address

   Nat Ballou
   Microsoft
   One Microsoft Way
   Redmond, WA 98052
   USA

   Phone: +1 206-703-0574
   Email: natba@microsoft.com


                  This Internet Draft expires April xx, 1997.











