INTERNET-DRAFT                                                     PICS
<draft-pics-labels-00.txt>                                      MIT/W3C
Expires May 21, 1996                                  November 21, 1995


                Label Syntax and Communication Protocols

Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

   Comments on this draft should be sent to
   "pics-spec-comments@w3.org".


1. Introduction

   This document has been prepared for the technical subcommittee of
   PICS (Platform for Internet Content Selection).  It defines a
   general format for labels that permits them to be embedded in
   RFC-822-style headers.  It defines three methods by which PICS
   labels may be transmitted:

   In a document
      One or more labels may be embedded in a document. We specify the
      format and note in particular how to use a META tag to embed
      labels in HTML documents.
   With a document
      An HTTP client can request that labels be sent along with a
      document.  An HTTP server can satisfy the request, by sending the
      labels in RFC-822-style headers.
   Separately
      A client can request labels from a "label bureau" that runs the
      HTTP protocol.  The labels may refer to items available through
      protocols other than HTTP, such as ftp, gopher, or netnews.  The
      simplest implementation of a label bureau is an off-the-shelf
      HTTP server running a special CGI script.


2. General Format

   A label consists of a _service identifier_, _label options_, and a
   _rating_.  The service identifier is the URL chosen by the rating
   service (see [1], "Rating Services and Rating Systems") as its
   unique identifier.  Label options give additional properties of the
   document being rated as well as the rating itself, such as the time
   the document was rated.  The rating itself is a set of
   attribute-value pairs that describe a document along one or more
   dimensions.  One or more labels may be distributed together as a
   list.  The general form for a label list (formatted for
   presentation, and not showing error status codes) is:

      (PICS-1.0
         <service url> [option...]
         labels [option...] ratings (<category> <value> ...)
                [option...] ratings (<category> <value> ...)
                ...
         <service url> [option...]
         labels [option...] ratings (<category> <value> ...)
                [option...] ratings (<category> <value> ...)
                ...
         ...)

   Label options are as follows (some options can be abbreviated, as
   shown):

   at _quoted-ISO-date_
      The last modification date of the item to which this rating
      applies, at the time the rating was assigned.  This can serve as
      a less expensive, but less reliable, alternative to the message
      integrity check (MIC) options.
   by _quotedname_
      An identifier for the person or entity within the rating service
      who is responsible for this particular label.
   comment _quotedname_
      Information for humans who may see the label; no associated
      semantics.
   complete-label _quotedURL_
   full _quotedURL_
      Dereferencing this URL returns a complete label that can be used
      in place of the current one.  The complete label has values for
      as many attributes as possible.  This is used when a short label
      is transmitted for performance purposes but additional
      information is also available.  When the URL is dereferenced it
      returns an item of type application/pics-labels that contains a
      labellist with exactly the one label.
   extension (optional _quotedURL_ _data_*)
   extension (mandatory _quotedURL_ _data_*)
      Future extension mechanism.  To avoid duplication of extension
      names, each extension is identified by a _quotedURL_.  The URL
      can be dereferenced to get a human-readable description of the
      extension.  If the extension is *optional* then software which
      does not understand the extension can simply ignore it; if the
      extension is *mandatory* then software which does not understand
      the extension should act as though no label had been supplied.
      Each item of _data_ must be one of a fixed set of simple-to-parse
      data types as specified in the detailed syntax below.
   for _quotedURL_
      The URL of the item to which this rating applies.
   generic _boolean_
   gen _boolean_
      This label can be applied to any URL starting with the prefix
      given in the *for* option.  This is used to supply ratings for
      entire sites or directories.
   MIC-md5 "_Base64-string_"
   md5 "_Base64-string_"
      A message integrity check (MIC) of the item being rated.  The MD5
      Message Digest Algorithm is used to compute the MIC.  See [2],
      "RFC 1321".
   on _quoted-ISO-date_
      The date on which this rating was issued.
   signature-PKCS "_Base64-string_"
      An RSA digital signature encompassing the label as transmitted,
      signed by the rating service that issued the label.  See section
      14, "MICs and Digital Signatures".
   until _quoted-ISO-date_
   exp _quoted-ISO-date_
      The date on which this rating expires.

3. Example

   For example, a label that uses the example rating system from the
   document [1] "Rating Services and Rating Systems" might be as
   follows:

      (PICS-1.0 "http://www.gcf.org"
         labels on "1994.11.05T08:15-0500"
            until "1995.12.31T23:59-0000"
            for "http://www.gcf.org/index.html"
            by "John Patrick"
            ratings (suds 0.5 density 0 color/hue 1))

   The same label may be transmitted more compactly by converting all
   of the line breaks and subsequent indentation characters into a
   single space, and by replacing the word "labels" with "l", "ratings"
   with "r" and long option names with their abbreviations.  It may be
   compressed for transmission purposes even further by removing all of
   the optional information to a separate document and referencing that
   document by a URL:

      (PICS-1.0 "http://www.gcf.org" l
         full "http://www.gcf.org/labels/13242123"
         r (suds 0.5 density 0 color/hue 1))

   Finally, the optional information may be omitted entirely, reducing
   the information content of the label but making the transmission
   even smaller.  The resulting label would then be:

   (PICS-1.0 "http://www.gcf.org" l r (suds 0.5 density 0 color/hue 1))


4. Detailed Syntax

   The following grammar, in modified BNF, describes the syntax of
   labels.  The methods by which labels are embedded in specific
   protocols are detailed below.

   Notes:

     1.  Whitespace is ignored except in quoted strings.
     2.  The string in a _transmit-name_ is case insensitive.  All
         other strings are case sensitive.
     3.  Option names ("on", "until", "at", etc.) are case insensitive.
     4.  This specification requires the use of US-ASCII.  Note that
         the document [1] "Rating Services and Rating Systems"
         describes how a service can map the US-ASCII transmit-names to
         descriptive strings using other character sets.
     5.  An option that appears in the _service-info_ applies to all
         labels in that _service-info_ unless overridden by an option
         in a specific _label_.  That is, a _label_ is effectively
         lexically nested within the enclosing _service-info_ for the
         purpose of understanding the applicable options.  This is most
         likely to be useful in the case of the "at", "by", "generic",
         "until" and experimental or future options.
     6.  Numbers in PICS labels may be integers or fractions with no
         greater range or precision than that provided by IEEE
         single-precision floating point numbers.
     7.  The _multi-value_ syntax *must* be used when the value on a
         particular (multi-valued) scale has either zero or more than
         one value.  It *may* be used for a single-valued or
         multi-valued field when there is exactly one value, but the
         more compact version may also be used in that case.
     8.  The only options that may occur more than once in a single
         label are "comment" and "extension"; if the "extension"
         option is supplied more than once, the _quotedURL_s defining
         the extensions must be distinct.


   labellist :: '(' 'PICS-1.0' _service-info_+ ')'
   service-info :: 'error' '(no-ratings' _explanation_* ')'
              | _serviceID_ _service-error_
              | _serviceID_ _option_* _labelword_ _label_*
   serviceID :: _quotedURL_
   labelword :: 'labels' | 'l'
   label :: _label-error_ | _single-label_ | '(' _single-label_* ')'
   single-label :: _option_* _ratingword_ '(' _rating_+  ')'
   ratingword :: 'ratings' | 'r'
   quotedURL :: '"' _URL_ '"' as described and extended in [1] "Rating
              Services and Rating Systems.
   option :: 'at' _quoted-ISO-date_
        | 'by' _quotedname_
        | 'comment' _quotedname_
        | 'complete-label' _quotedURL_ | 'full' _quotedURL_
        | 'extension' '(' _mand/opt_ _quotedURL_ _data_* ')'
        | 'generic' _boolean_          | 'gen' _boolean_
        | 'for' _quotedURL_
        | 'MIC-md5' "_base64-string_"  | 'md5' "_base64-string_"
        | 'on' _quoted-ISO-date_
        | 'signature-PKCS' "_base64-string_"
        | 'until' _quoted-ISO-date_    | 'exp' _quoted-ISO-date_
   mand/opt :: 'optional' | 'mandatory'
   data :: _quoted-ISO-date_ | _quotedURL_ | _number_ | _quotedname_
        | '(' _data_* ')'
   quoted-ISO-date :: '"'YYYY'.'MM'.'DD'T'hh':'mmStz'"'
     based on the ISO 8601:1988 date and time standard, restricted
     to the specific form described here:
     YYYY :: four-digit year
     MM :: two-digit month (01=January, etc.)
     DD :: two-digit day of month (01 through 31)
     hh :: two digits of hour (00 through 23) (am/pm NOT allowed)
     mm :: two digits of minute (00 through 59)
     S  :: sign of time zone offset from UTC ('+' or '-')
     tz :: four digit amount of offset from UTC
           (e.g., 1512 means 15 hours and 12 minutes)
     For example, "1994.11.05T08:15-0500" is a valid _quoted-ISO-date_
     denoting November 5, 1994, 8:15 am, US Eastern Standard Time.
     Note:  The ISO standard allows considerably greater flexibility
     than that described here.  PICS requires *precisely* the syntax
     described here -- neither the time nor the time zone may be
     omitted, none of the alternate formats are permitted, and the
     punctuation must be as specified here.
   rating :: _transmit-name_ _number_ | _transmit-name_ '(' _multi-value_* ')'
   multi-value :: _number_ | _number_ ':' _number_
   transmit-name :: [1*n]_alphanumpm_ ['/' _transmit-name_]
   number :: [_sign_]_unsignedint_['.' [_unsignedint_]]
   sign :: '+' | '-'
   unsignedint :: [1*n][0-9]
   quotedname :: ' " ' [1*n]_extendedalphanum_ ' " '
   alphanumpm :: 'A' | ... | 'Z' | 'a' | ... | 'z' | '+' | '-'
   extendedalphanum :: _alphanumpm_ | '.' | ' ' | ',' | ';' | ':'
              | '&' | '=' | '?' | '!' | '*' | '~' | '@' | '#'
   base64-string :: as defined in [3] "RFC 1521".
   service-error :: 'error' '(' 'request-denied' _explanation_* ')'
              | 'error' 'service-unavailable'
   label-error :: 'error' '(' request-denied' [_quotedURL_
              _explanation_*] ')'
              | 'error' '(' not-labeled' _quotedURL_* ')'
   explanation :: _quotedname_


5. Semantics of PICS Labels and Label Lists

   A _labellist_ is used to transmit a set of PICS labels.  The format
   specified here is intended to be registered with IANA as the MIME
   type "application/pics-labels."  It allows for transmission of both
   labels and reasons why labels are not available, and is the format
   used when labels must be conveyed in a document, along with a
   document, or from a PICS label bureau.  The _labellist_ will always
   be surrounded by parentheses and begin with the PICS version number
   (1.0 in this specification).

   A label list either specifies that there are no labels available at
   all ("error (no-ratings ...)") or is separated into sections of
   labels, one section for each rating service.  The URL of each
   service must be specified (the _serviceID_).  This is either
   followed by an error message indicating why no labels are available
   from that service (_service-error_) or an overall set of optional
   information (_option_*) followed by the keyword "labels" (or "l")
   and the _label_s from the service.  The optional information
   provided here applies to every label from the service, unless
   overridden in the specific label itself.

   A _label_ encompasses three separate cases.  The first is an error
   that applies to retrieving the label for a particular URL
   (_label-error_).  The second, and most common, is a _single-label_
   consisting of options (which override those specified with the
   service), the marker word "ratings" (or "r") and the ratings
   themselves (a list of category names and values).  Finally, in the
   special case where the ratings for an entire tree of documents have
   been requested, any number of _single-label_s can be transmitted,
   enclosed in parentheses.  This case is described in more detail in
   the section on "Requesting Labels Separately".

   A label may apply to a specific URL, or it may be generic.  A
   generic label implicitly rates every URL for which the specified one
   is a prefix.  For example, a generic label for the URL
   "http://www.gcf.org" implicitly rates every document available at
   that site.  A regular (non-generic) label for the same URL,
   "http://www.gcf.org", does not give any implicit ratings: it merely
   rates the organization's home page that is fetched by the command
   "GET / " sent by HTTP to the host "www.gcf.org".  A generic label
   *must* include the "for" option specifying the URL to which it
   applies.

   When a _multi-value_ is provided, any combination of numbers and
   ranges of numbers may be specified, with the endpoints of a range
   separated by a ":".  Thus, in the labellist

      (PICS-1.0 "http://www.gcf.org" l
         r (suds 0.5 density 0 color/hue 1 subject (0.5:2.5 3)))

   all subject values between 0.5 and 2.5 (including both endpoints)
   apply to the item, as does the subject value 3.  Given the example
   service description in [1], Rating Services and Rating Systems", all
   three document subjects apply, "soap", "water", and "soapdish".


6. RFC 822 Headers

   Many protocols, such as Internet electronic mail, the HyperText
   Transfer Protocol, and USENET News, use ASCII headers as described
   in RFC 822.  For use in such protocols, we define a new header,
   PICS-Label, used to contain the labels described in this document.
   The syntax is:

      PICS-Label: <labellist>

   where _labellist_ is described according to the syntax above.
   Continuation lines beginning with whitespace may be used following
   the specification given in RFC 822.


7. Embedding Labels in HyperText Markup Language (HTML)

   Labels may be embedded in HTML files as meta-information, using the
   META element defined in the HTML specification.  This embedding uses
   the HTTP header equivalency mechanism:

      <META http-equiv="PICS-Label" content='_labellist_'>

   (Note that the content attribute uses single quotes, because the
   PICS label syntax uses double quotes.  Any of the following
   characters appearing within the content must be escaped using SGML
   entities:

      '       &#39;           /* single quote */
      &       &amp;           /*  ampersand   */
      >       &gt;            /* greater than */

   See [4], the "HTML 2.0 Proposed Standard".


8. Sending Labels With A Document

   When an HTTP server sends a document to a client, it sends
   additional headers as well.  We specify how the client can request
   that one or more labels be included in a header.  HTTP servers
   should include PICS label headers only if requested to do so by the
   client, and should only include the labels from services requested
   by the client.

   Example:

   Client sends to HTTP server www.greatdocs.com:

      GET foo.html HTTP/1.0
      Accept-Protocol:
        {PICS-1.0 {params full
                    {services "http://www.gcf.org/ratings"}}}

   Server responds to client:

      HTTP/1.0 200 OK
      Date: Thursday, 30-Jun-95 17:51:47 GMT
      MIME-version: 1.0
      Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
      Protocol: {PICS-1.0 {headers PICS-Label}}
      PICS-Label:
         (PICS-1.0 "http://www.gcf.org" labels
         on "1994.11.05T08:15-0500"
         exp "1995.12.31T23:59-0000"
         for "http://www.gcf.org/index.html"
         by "George Sanderson, Jr."
         ratings (suds 0.5 density 0 color/hue 1))
      Content-type: text/html
      ...contents of foo.html...

   Explanation of example:

   The client requests the document foo.html.  In addition, the client
   requests the full label of the document from the rating service
   "http://www.gcf.org/ratings".  The server responds by sending back
   the label, in the PICS-Label header, as well as the document.  The
   format of the PICS-Label header field (a _labellist_) allows the
   server to respond either with a label or an explanation of why the
   label is not available, since it would be inappropriate for the
   server to generate an HTTP error status if the document is available
   but (some of) the labels are not.

   Following the usual HTTP distinction between HEAD and GET, a client
   that wishes to examine a rating before retrieving the full document
   can substitute the word HEAD for GET in the request.  The server
   responds with exactly the headers shown above, but does not send
   back the document "foo.html".


9. Detailed Syntax of HTTP Requests for Labels With Document

   The following grammar, in modified BNF, describes the syntax of the
   additional header line to be included in an HTTP request for a
   document and associated labels.

   accept-header ::
      'Accept-Protocol: {PICS-1.0 {params ' [_completeness_]
         _extension_* _services_ '}}'
   completeness :: 'minimal' | 'short' | 'full' | 'signed'
   extension :: '{' _token-or-quoted-string_+ '}'
      where the first _token-or-quoted-string_ is not 'services'.
   token-or-quoted-string :: _token_ | _quotedname_
   token :: [1*n]_alphanumpm_
   services :: '{' 'services' _quotedURL_+ '}'

   A request for a *minimal* label asks that all options be omitted,
   unless a generic label is returned, in which case the "generic" and
   "for" options must also be included in the label.  A *short* label
   includes everything that is included in a minimal label, plus
   additional options that the server deems appropriate.  A request for
   a *full* label asks that as much information as possible should be
   sent back in the label, either directly or through the use of a
   "complete-label" (or "full") option, but no "signature-PKCS" option
   is needed.

   A request for *signed* labels asks that all the information in a
   "full" label should be sent, along with a digital signature on the
   label itself.  In a signed label the information must be transmitted
   directly as part of the label (and included in the computation of
   the signature); the "complete-label" (or "full") option may be sent,
   but it would be redundant.  Details of signing labels are included
   in section 14, "MICs and Digital Signatures".

   It is acceptable for a server to ignore the _completeness_, either
   by delivering more or fewer options than requested.  If the
   _completeness_ is omitted, it should be treated as though "minimal"
   had been supplied.  For future extensibility, any alphanumeric
   string may be used for a value of the _completeness_ option.
   Servers which receive a value of _completeness_ that they do not
   recognize must treat it as though "minimal" had been specified.

   The _extension_s are for future extensions to the protocol; any
   extensions which are not understood by the server must be ignored by
   it.  It is recommended that experimental extensions use a URL, which
   dereferences to a description of the extension, as the initial
   _token-or-quoted-string_.

   Each _service_ specifies a rating service from which the client is
   requesting a label for the document.  There may be as many
   repetitions of the _service_ part of the query as desired.


10. Detailed Syntax For HTTP Response Headers for Labels With Document

   Two additional headers are specified:

   protocol-header :: 'Protocol: {PICS-1.0 {headers PICS-Label}}'
   label-header :: 'PICS-Label: ' _labellist_


11. Requesting Labels Separately

   PICS labels can also be retrieved separately from the documents to
   which they refer.  To request labels in this way, a client contacts
   a *label bureau*.  A label bureau is an HTTP server that understands
   a particular query syntax, defined below.  It can provide labels for
   documents that reside on other servers, and, indeed, for documents
   available through protocols other than HTTP.  It is anticipated that
   there will be "well-known" label bureaus which dispense (possibly
   for a fee) labels created by many rating services.

   Rating services are also encouraged to act as label bureaus,
   providing on-line access to their own labels. By default, the URL
   that identifies a rating service also identifies its label bureau.
   If a client requests the URL that identifies a rating service, a
   human-readable description of the service is returned, as specified
   in [1], "Rating Services and Rating Systems".  If, on the other
   hand, a client requests the same URL and includes query parameters
   as defined below, it should be interpreted as a request for labels.
   A rating service, however, is not required to act as a label bureau,
   and it may choose a different URL (perhaps even on a different HTTP
   server) to act as its label bureau.

   Sample Query:

   Imagine a rating service, identified by the URL
   "http://www.labels.org/Ratings", which decides to run a label bureau
   to dispense (at least) its own labels for documents.  The following
   sample request, made to the HTTP server "www.labels.org", is
   illustrative (line breaks are inserted for presentation purposes
   only):

      GET /Ratings?opt=generic&
         u="http%3A%2F%2Fwww.questionable.org%2Fimages"&
         s="http%3A%2F%2Fwww.gcf.org%2Fratings"&
         HTTP/1.0

   The query asks the label bureau "http://www.labels.org/Ratings" to
   send a single label that applies to everything in the images
   directory at site "www.questionable.org".  The desired label should
   have been created by the service "http://www.gcf.org/ratings".
   Notice the use of %3A to represent a ":" and %2F for "/".  This is
   required for encoding characters within a URL.  See [5], "RFC 1738".

   The label bureau responds by sending back a document of type
   "application/pics-labels."  The labels should be as complete as
   possible, either by including as many options as possible or by
   supplying the "complete-label" (or "full") option.


12. Detailed Syntax and Semantics of HTTP Query for Labels Separate
    From Documents

   The following grammar, in modified BNF, describes the syntax of the
   GET request to a label bureau:

   get :: 'get' _url-fragment_ '?' [_opt_] [_format_] _extension_*
      _url_+ _service_+
   url-fragment :: the part of the original URL after the host name, as
      specified in HTTP 1.0.
   opt :: 'opt=' _option_
   option :: 'generic' | 'normal' | 'tree' | 'generic+tree'
   format :: [and] 'format=' _form_
   form :: 'minimal' | 'short' | 'full' | 'signed'
   extension :: _token_ '=' _token-or-quoted-string_
      where the _token_ is not one of "opt", "format", "u", or "s"; and
      _token-or-quoted-string_ follows the quoting conventions
      specified in [5], "RFC 1738".
   token-or-quoted-string :: _token_ | _quotedname_
   token :: [1*n]_alphanumpm_
   url :: [and] 'u=' encodedURL
   service :: [and] 's=' encodedURL
   boolean :: 't' | 'f' | 'true' | 'false'
   and :: '&' this must be included unless it immediately follows the ?
      in the query.
   encodedURL :: a URL, with quotation as required for inclusion within
   another URL.  According to [5], "RFC 1738", quotation is done using
   "%xx" notation.  Alphabetic characters, digits, and the special
   characters $_-.+!*'(), need not be quoted, but other characters must
   be.  This *does* imply that the colon (:) must be encoded as %3A and
   slash (/) as %2F.

   Notes:

      1.  "opt=generic" requests generic labels.  For each requested
          URL, the desired response is a generic label that implicitly
          applies to all URLs matching it.  This is useful for
          requesting a rating of a site or directory.
      2.  "opt=tree" requests a tree of labels.  For each requested
          URL, the desired response is all labels for URLS that match
          it.  This is a way to request all the labels for items in a
          directory or a site.  In the response, everywhere a _label_
          would normally be expected in the response, a set of
          _simple-label_s will be returned, surrounded by parentheses.
      3.  "opt=generic+tree" requests all generic labels that apply to
          matching URLs.  This is a way to request generic labels for
          all of the directories at a site.  In the response,
          everywhere a _label_ would normally be expected in the
          response, a set of _simple-label_s will be returned,
          surrounded by parentheses.
      4.  "opt=normal", or omitting the "opt" completely, requests
          specific labels for the URLs specified.
      5.  It is permitted to include more than one URL in the request.
      6.  The "format=" specifies the optional information that should
          be transmitted with the labels.  It is treated precisely as
          the similar keywords would be when sent to a document server
          as the "completeness" (see section 9), except that the
          default is "full" (rather than "minimal").  Servers which
          receive a value of "completeness" that they do not recognize
          must treat it as though the default, "full" had been
          specified.


13. Detailed Syntax and Semantics of Response to Query for Labels
    Separate From Documents

   The label bureau responds by sending back a document of type
   "application/pics-labels".  Unless the document indicates an overall
   error, there should be one _service-info_ for each rating service
   requested in the query.  Each _service-info_ should have an error
   message or a label (or list of labels, in the case of a "tree"
   query) for each requested URL.

   The query's ordering must be preserved in the response. That is, the
   information from the rating services must be presented in the same
   order the rating services appear in the query, and the labels from
   each service must be presented in the same order the URLs appear in
   the query.  If a rating service or label is not provided, the error
   message should appear in the same position that the _service-info_
   or label would appear.  Because order is preserved, it is acceptable
   to omit from the labels the "for" option which indicates the URL
   being rated (*unless* the label is "generic" in which case, as
   always for generic labels, the "for" is required).  The client
   should match the label positionally with the URL for which it
   requested a rating.

   In response to a request for a generic label, only a generic label
   may be returned.  In response to a request for a regular label, a
   generic label for a URL that is a prefix of the requested URL may be
   returned.  For example, in response to a label request for URL
   "http://www.gcf.org/index.html" a generic label for the URL
   "http://www.gcf.org" may be returned.  In this case, it is required
   that the "for" and "generic" options be included in the label, to
   specify exactly what rating is being returned.

   For a tree request, all the labels sent in response to a particular
   URL are enclosed in parentheses, so the client can match them
   positionally with the single request URL.  The "for" option must be
   included in such labels to specify exactly which URLs the labels
   apply to.


14. MICs and Digital Signatures

   This section remains to be specified.  There are three particular
   difficulties that must be addressed:

      1.  On what data is the MIC included in the _mic-md5_ (or
          _md5_) option computed?  In particular, if the URL
          "ftp://www.somewhere.com/Pictures/Interesting/Look.gif"
          refers to a compressed GIF image, is the MIC computed on the
          compressed or uncompressed form?  Does it depend on the
          content-transfer-encoding?  The MIME type?
      2.  How is the label canonicalized before computing the digital
          signature?  Because header lines can be folded by various
          transports, it is important that a canonical form be
          carefully defined.  Clearly, it should not include the
          signature itself, but does it include all of the other
          optional fields?  Does a signed label necessarily imply a
          full label (hence the distinction should be dropped)?
      3.  How are the public keys for rating services distributed?  Can
          it be done using a variant on the same technique used for
          communicating with a label bureau or is a full certificate
          authority required?  What authority should be used or can
          multiple be used?  Is the service's URL a satisfactory
          distinguished name for use with a certificate authority?


15. Security Considerations

   Security considerations will be addressed in future revisions of
   this draft.


16. Glossary


   application/pics-service
      A new MIME data type used to describe a _rating service_,
      defined in [1], "Rating Services and Rating Systems".
   application/pics-labels
      A new MIME data type used to transmit one or more _labels_,
      defined in this document.
   BNF
      Backus-Naur Form (or Backus Normal Form).  A notation for
      describing a formal syntax, used extensively in describing
      programming languages and computer-readable data formats.
   category
      The part of a rating system which describes a particular
      criterion used for rating.  For example, a rating system might
      have three categories named "sexual material," "violence," and
      "vocabulary."  Also called a _dimension_.
   content label
      A data structure containing information about a given document's
      contents.  Also called a _rating_ or _content rating_.  The
      content label may accompany the document it is about or be
      available separately.
   content rating
      See _content label_.
   dimension
      See _category_.
   HTML
      HyperText Markup Language.  A means of representing _hypertext_
      documents.  Based on _SGML_.  See [4], the "HTML 2.0 Proposed
      Standard".
   HTTP
      HyperText Transfer Protocol.  Used for retrieving document
      contents and/or descriptive header information.
   hypertext
      Text, graphics, and other media connected through links.
   label
      See _content label_.
   MD5
      An algorithm, see [2], "RFC 1321", that can be used to compute a
      MIC.  PICS specifies this particular algorithm for use in PICS
      labels.
   MIC
      Message Integrity Check.  Also known as a "cryptographic
      checksum."  For PICS, the importance of a MIC is that a rating
      service can compute the MIC of a piece of information when the
      label is created and that MIC can be put into the label itself.
      A client can retrieve the label and the information to which it
      is supposed to be attached, recompute the MIC and compare it to
      the one in the label.  If they match, for all practical purposes,
      it is a proof that the label really belongs to the information
      that has been retrieved.  The particular algorithm specified by
      PICS to compute the MIC is MD5.
   MIME
      Multimedia Internet Message Extension.  A technique for sending
      arbitrary data through electronic mail on the Internet.  See [3],
      "RFC 1521".
   PICS
      Platform for Internet Content Selection, the name for both the
      suite of specification documents of which this is a part, and for
      the organization writing the documents.  For more information,
      see the PICS home page on the World Wide Web at:
      "http://www.w3.org/PICS".
   rating
      See _content label_.
   label bureau
      A computer system which supplies, via a computer network, ratings
      of documents.  It may or may not provide the documents
      themselves.
   rating server
      See _label bureau_.
   rating service
      An individual or organization that assigns labels according to
      some rating system, and then distributes them, perhaps via a
      label bureau or via CD-ROM.
   rating system
      A method for rating information.  A rating system consists of one
      or more _categories_.
   scale
      The range of permissible values for a category.
   SGML
      Standard Generalized Markup Language.  See ISO 8879.
   transmission name
      (of a _category_)  The short name intended for use over a
      network to refer to the category.  This is distinct from the
      category name in as much as the transmission name must be
      language-independent, encoded in ASCII, and as short as
      reasonably possible.  Within a single _rating system_ the
      transmission names of all categories must be distinct.
   URL
      Uniform Resource Locator.  Described in [5], "RFC 1738".  A URL
      describes the location and means of retrieval for a single
      document.  It consists of three components: the "scheme"
      (protocol used to retrieve a document, like "http" or "ftp"), a
      host name, and a hierarchical document name within that host.
      For example "http://www.w3.org/PICS" is the URL of the PICS home
      page.  The scheme for retrieving it is "http," the host is
      "www.w3.org" and the name within that host is "PICS".


17. References

   [1] PICS, "Rating Services and Rating Systems", Internet Draft,
       "draft-pics-services-00.txt", 11/21/95.
   [2] R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
       04/16/1992.
   [3] N. Borenstein, N. Freed, "MIME  (Multipurpose Internet Mail
       Extensions) Part One:  Mechanisms for Specifying and Describing
       the Format of Internet Message Bodies", RFC 1521, 09/23/1993.
   [4] T. Berners-Lee, D. Connolly, "Hypertext Markup Language - 2.0",
       RFC 1866, 11/03/1995.
   [5] T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource
       Locators (URLs)", RFC 1738, 12/20/94.


18. Acknowledgments

   Primary authors of this document:

   Tim Krauskopf, Spyglass
   Jim Miller, W3C
   Paul Resnick, AT&T
   G. Winfield Treese, OpenMarket

   Additional contributors:

   Brenda Baker, AT&T
   Tim Berners-Lee, W3C
   Roxana Bradescu, AT&T
   Daniel W. Connolly, W3C
   Roy Fielding, W3C
   Jay Friedland, SurfWatch
   Michael Gordon, Prodigy
   Wayne Gramlich, Sun
   Woodson Hobbs, NewView
   Rohit Khare, W3C
   Charlie Kim, Apple
   John C. Klensin, MCI
   Ann McCurdy, Microsoft
   Rich Petke, CompuServe
   Dave Raggett, W3C
   Bob Schloss, IBM
   David Singer, IBM
   Michael Smith, Prodigy
   Marcy Swenson, Providence Systems
   Jason Thomas, MIT


19. Author's Address

   PICS Technical Committee
   World Wide Web Consortium
   545 Technology Square
   Cambridge, MA  02139
   Phone: 617-253-3194
   EMail: pics-spec-comments@w3.org


Temporary Appendix A: Why HTTP For Label Bureaus

   This section is not expected to be contained in future versions of
   this document.

   Instead of extending HTTP, we considered proposals for
   special-purpose label transport protocols.  Before making a final
   decision, we constructed the following lists of pros and cons.

   Advantages of Using HTTP

   o  An existing HTTP server can be used as a PICS label bureau.  This
      is particularly useful in the short term.  CGI scripts at the
      HTTP server can handle the special header fields of a request for
      labels.
   o  A label returned from a label bureau and a label returned along
      with a document from an HTTP server can use identical label
      formats.
   o  Client programs that already support HTTP will have much less new
      code to implement.
   o  Client programs that do not support HTTP will have to support a
      new protocol in any case.  It may be easier to support HTTP than
      a newly defined label transport protocol, because of available
      software libraries.
   o  Several protocol elements are already fully specified by HTTP
      that would be required in any PICS protocol.

        o Date and time formats.
        o Content encoding types.
        o Character set and Internationalization issues.
        o Error/result conditions. Both result categories (extensible),
          as well as a sample set of messages are specified.
        o Handling of expiration dates for each URL queried.

   o  HTTP is quite stable, has not diverged, and is well accepted.
   o  Security and payment systems either exist or are being developed
      for HTTP.  A binary format may also be developed for speed. PICS
      need not reinvent such systems.
   o  Firewalls tend to allow HTTP headers to be transmitted already.
      A new protocol would take much longer to be accepted.
   o  A reliable connection (initially TCP based), ASCII-based protocol
      seems desirable initially.
   o  Current extensibility already defines how extensions to PICS
      itself should be accomplished.

   Advantages of Creating a New Protocol Instead of Using HTTP

   o  A new protocol would avoid any HTTP protocol wars.
   o  Label bureaus and clients would not need to be updated to
      accommodate HTTP changes.
   o  RFC 822 and other precedents could still be used in the design of
      a new protocol.
   o  A binary format could be considered initially for speed.
   o  UDP or other datagram lookups could be considered.


Temporary Appendix B: FAQ - Frequently Asked Questions

   This section is not expected to be contained in future versions of
   this document.

   Why is there no ftp, gopher, or netnews protocol for requesting
   labels along with a document?

      Labels can be sent as additional headers in any protocol that
      employs RFC 822 style headers.  We have not yet determined,
      however, convenient extensions to protocols other than HTTP to
      permit requests that ask for labels created by specific services.
      We may specify such extensions in the future.

   How do you get labels for items on FTP, Gopher, or netnews servers?
   Are we forcing all FTP implementations to implement all of HTTP as
   well?

      FTP, Gopher, and netnews servers need not distribute PICS labels.
      Labels for items on such servers can be retrieved from an
      HTTP-based label bureau.

      The PICS premise is that all compliant clients will have to
      implement some new protocol. The subset of HTTP which would be
      required for obtaining a PICS label can be minimal. HTTP will be
      no more difficult to implement in an FTP (or other) client than a
      brand-new protocol that provides similar features.

   Can existing HTTP servers be used as PICS label bureaus?

      Using CGI scripts, or with a small amount of added code in the
      HTTP server, an existing HTTP server can be configured to access
      a database of labels and return that information coded as
      additional HTTP Headers.  Most of the work is in the lookup and
      formatting of the labels themselves, not the modifications to
      HTTP.

   How do I design a really fast PICS label bureau?  Won't the overhead
   be too much?

      HTTP already explicitly defines the minimum fields required and
      then what rules must be followed when additional information is
      useful to the transaction.  For example, HTTP does not require
      that clients provide "Accept:" headers to indicate preferred MIME
      types for the content, but if they are provided, servers can
      match up available formats with the client's request.  An HTTP
      server may be designed to optimize throughput or to optimize the
      appearance of the result, or to adjust to the client software's
      preference.

      If you minimize the server's response to one line, plus the label
      information, you are already dealing with the minimum amount of
      data transfer possible to obtain a label. In addition, most
      performance issues for PICS will probably be addressed with
      caching, not by reducing lookup time for a single label.  Caching
      optimization requires meta-data which can be easily encoded
      within HTTP headers.

   How can we keep the PICS extensions from getting tied up in HTTP
   standardization?

      The management of header extensions for HTTP has been an issue of
      discussion and work by the HTTP group for some time.  The HTTP
      specification lays down specific rules for the handling of
      extensions which guarantee that those extensions will not be made
      invalid by any revisions of HTTP itself.  In addition, the W3C is
      working on a system (PEP) for managing and negotiating HTTP
      extensions even more intelligently.

      The worst risk seems to be that HTTP could be upgraded to a new
      revision level forcing some HTTP implementations to support
      multiple versions (1.0 and 2.0, for example) or forcing some PICS
      bureaus to update their protocol as well.  Hopefully a major
      update in HTTP would bring enough benefits for PICS to make any
      update worthwhile.

   What is PEP and Why is PICS Using It?

      The Protocol Extension Proposal from the World Wide Web
      Consortium uses a trio of header fields (Protocol,
      Accept-Protocol, and Content-Encoding) to allow a HTTP client and
      server to do sophisticated negotiation about the set of header
      fields and their meanings.  It is being proposed for use in HTTP
      1.2 and HTTP-ng, and is currently under careful scrutiny by the
      W3C Security Editorial Board to make sure that it contains the
      features necessary to provide security for general document
      transmission as well as electronic payments.

      PICS faces many of the same problems that face the security and
      electronic payment community. In PICS the issue revolves around
      the ability for the client to tell the server from which rating
      services it would like to have labels.  This is a simple
      negotiation problem of the kind PEP was designed to solve.
      Rather than invent an orthogonal mechanism it seemed best to use
      one that is already being proposed and investigated.

   What if PEP Does Not Catch On?

      If the general extension mechanism specified by PEP does not
      become a generic feature of HTTP servers, PICS label bureaus will
      need to look for the specific header line beginning
      Accept-Protocol: PICS/1.0 and process it to determine the rating
      request.  PICS clients will need to look for and process the
      specific header lines PICS-Label and PICS-Status.  We will also
      have to hope that no other group tries to extend HTTP in a way
      that uses headers named PICS-Label or PICS-Status.

This Internet Draft Expires on May 21, 1996