IETF conneg working group                                 Graham Klyne
Internet draft                                5GM/Content Technologies
Category: Work-in-progress                            12 February 1999
                                                  Expires: August 1999


                Identifying composite media features
               <draft-ietf-conneg-feature-hash-00.txt>


Status of this memo

  This document is an Internet-Draft and is in full conformance with
  all provisions of Section 10 of RFC2026.

  Internet-Drafts are working documents of the Internet Engineering
  Task Force (IETF), its areas, and its working groups.  Note that
  other groups may also distribute working documents as Internet-
  Drafts.

  Internet-Drafts are draft documents valid for a maximum of six
  months and may be updated, replaced, or obsoleted by other
  documents at any time.  It is inappropriate to use Internet- Drafts
  as reference material or to cite them other than as "work in
  progress."

  To view the list Internet-Draft Shadow Directories, see
  http://www.ietf.org/shadow.html.

Copyright Notice

  Copyright (C) The Internet Society 1999.  All Rights Reserved.

Abstract

  In "A syntax for describing media feature sets" [1], an expression
  format is presented for describing media feature capabilities as a
  combination of simple media feature tags [2].

  This document proposes an abbreviated format for a composite media
  feature set, based upon a hash of the feature expression describing
  that composite.










Klyne                      Work-in-progress                   [Page 1]


Internet Draft                   Identifying composite meadia features
12 February 1999



Table of contents

  1. Introduction ............................................2
     1.1 Organization of this document                        2
     1.2 Terminology and document conventions                 3
  2. Motivation and goals ....................................3
  3. Composite feature representation ........................4
     3.1 Feature set reference format                         4
     3.2 Hash value calculation                               5
     3.3 Dereferencing feature set expressions                6
       3.3.1 Inline feature set details                       6
       3.3.2 URI reference                                    6
     3.4 The birthday problem                                 7
  4. Feature set resolution and matching .....................8
  5. Examples ................................................8
  6. Internationalization considerations .....................8
  7. Security considerations .................................9
  8. Full copyright statement ................................9
  9. Acknowledgements ........................................10
  10. References .............................................10
  11. Author's address .......................................11
  Appendix A: Revision history ...............................11


1. Introduction

  In "A syntax for describing media feature sets" [1], an expression
  format is presented for describing media feature capabilities as a
  combination of simple media feature tags [2].

  This document proposes an abbreviated format for a composite media
  feature set, based upon a hash of the feature expression describing
  that composite.

1.1 Organization of this document

  Section 2 sets out somne of the background and goals for feature
  set references.

  Section 3 preents a syntax for feature set references, and
  describes how they are related to feature set expressions.

  Section 4 discusses how feature set references are used in conction
  with feature set matching.







Klyne                      Work-in-progress                   [Page 2]


Internet Draft                   Identifying composite meadia features
12 February 1999


1.2 Terminology and document conventions

  This section defines a number of terms and other document
  conventions, which are used with specific meaning in this memo.
  The terms are listed in alphabetical order.

  dereference
            the act of replacing a feature set reference with its
            corresponding feature set expression.

  feature set
            some set of media features described by a media feature
            assertion, as described in "A syntax for describing media
            feature sets" [1].  (See that memo for a more formal
            definition of this term.)

  feature set expression
            a string that describes some feature set, formulated
            according to the rules in "A syntax for describing media
            feature sets" [1] (and possibly extended by other
            specifications).

  feature set reference
            a brief construct that references some feature set.
            (See also: "dereference".)

  This specification uses syntax notation and conventions described
  in RFC2234 "Augmented BNF for Syntax Specifications: ABNF" [3].

       NOTE:  Comments like this provide additional nonessential
       information about the rationale behind this document.
       Such information is not needed for building a conformant
       implementation, but may help those who wish to understand
       the design in greater depth.

2. Motivation and goals

  The range of media feature capabilities of a message handling
  system can be quite extensive, and the corresponding feature set
  expression [1] can reach a significant size.

  A requirement has been identified to allow recurring feature sets
  to be identified by a single reference value, which can be combined
  with other elements in a feature set expression.  It is anticipated
  that mechanisms will be provided that allow the recipient of such a
  feature set reference to discover the corresponding feature set
  expression.





Klyne                      Work-in-progress                   [Page 3]


Internet Draft                   Identifying composite meadia features
12 February 1999


  Thus, the goals for this proposal are:

  o  to provide an abbreviated form for referencing an arbitrary
     feature set expression.

  o  the meaning of (i.e. the corresponding feature set expression) a
     feature set reference should be independent of any particular
     mechanism that may be used to dereference it.

  o  to be able to verify whether a given feature set expression
     corresponds to some feature set reference without having to
     perform an explicit dereferencing operation (i.e. without
     incurring additional network traffic).

  o  for protocol processors that conform to [1] to be able to
     sensibly handle a feature set reference without explicit
     knowledge of its meaning (i.e. the introduction of feature set
     references should not break existing feature expression
     processors).

  o  to allow, but not require, some indication of how to dereference
     a feature set reference to be included in a feature set
     expression.

  This proposal does not attempt to address the "override" or
  "default" problem.  (Also called "delegation", where a feature set
  may be referenced and selectively overridden.)

3. Composite feature representation

  This specification hinges on two central ideas:

  o  the use of auxiliary predicates (introduced in [1]) to form the
     basis of a feature set reference, and

  o  the use of a token based on a hash function computed over the
     referenced feature set expression.

3.1 Feature set reference format

  This specification introduces a special form of auxililiary
  predicate name with the following syntax:

     fname = "h." 1*HEXDIG

  The sequence of hexadecimal digits is the value of a hash function
  calculated over the corresponding feature set expression (see next
  section), represented as a hexadecimal number.




Klyne                      Work-in-progress                   [Page 4]


Internet Draft                   Identifying composite meadia features
12 February 1999


  Thus, within a feature set expression, a feature set reference
  would have the following form:

     (h.123456789abcdef0123456789abcdef0)

       NOTE:  Base64 representation (per MIME [4]) would be more
       compact (21 rather than 32 characters for the hash
       value), but an auxiliary predicate name is defined (by
       [1]) to have the same syntax as a feature tag, and the
       feature tag matching rules (per [2]) state that feature
       tag matching is case in sensitive.

3.2 Hash value calculation

  The hash value is calculated using the MD5 algorithm [6] over the
  text of the referenced feature set expression subjected to certain
  normalizations.  The feature expression must conform to the syntax
  given in "A syntax for describing media feature sets" [1] for
  'filter':

     filter = "(" filtercomp ")" *( ";" parameter )

  The steps for calculating a hash value are:

  1. Whitespace normalization:  all spaces, CR, LF, TAB and any other
     layout control characters that may be embedded in the feature
     expression string are removed (or ignored for the purpose of hash
     value computation).

  2. Case normalization:  all lower case letters in the feature
     expression, other than those contained within quoted strings, are
     converted to upper case.  That is, unquoted characters with
     values 97 to 122 (decimal) are changed to corresponding
     characters in the range 65 to 90.

  3. Hash computation:  the MD5 algorithm [6] is applied to the
     normalized feature expression string.

  The result obtained in step 3 is a 128-bit number that is converted
  to a hexadecimal representation to form the feature set reference.

       NOTE: under some circumstances, removal of ALL whitespace
       may result in an invalid feature expression string.  This
       should not be a problem as significantly different
       feature expressions are expected to differ in ways other
       than their whitespace.

       NOTE: case normalization is deemed appropriate since
       feature tag and token matching is case insensitive.



Klyne                      Work-in-progress                   [Page 5]


Internet Draft                   Identifying composite meadia features
12 February 1999


3.3 Dereferencing feature set expressions

  This memo does not mandate any particular mechanism for
  defeferencing a feature set reference.  It is expected that
  specific dereferencing mechanisms will be specified for any
  application that uses them.

  The following sections describe two specific ways that feature set
  dereferencing information may be incorporated into a feature set
  expression.  Both of these mechanisms are based on auxiliary
  predicate definitions within a "where" clause [1].

       NOTE: both of the forms described below may be used with
       feature set references that are not constructed as
       "h.<hash>" values described above.  The consequence of
       not using hash-based reference values is that feature set
       differences, changes or other errors may be undetectable.

3.3.1 Inline feature set details

  The feature set expression associated with a reference value may be
  specified directly in a "where" clause, using the auxiliary
  predicate definition syntax [1];  e.g.

     (& (dpi=100) (h.1234567890) )
     where
     (h.1234567890) :- (& (pix-x<=200) (pix-y<=150) )

  This form might be used on request (where the request mechanism is
  defined by the invoking application protocol), or when the
  originator believes the recipient may not understand the reference.

3.3.2 URI reference

  This and associates a URI with a feature set reference.

       NOTE:  How a calling application interprets the URI is
       not specified here.  For URIs that are URLs, one
       reasonable approach would be to use the URL scheme
       protocol to access the corresponding feature set
       expression.  But other mechanisms are possible.

       [[[e.g. RESCAP?]]]

  An auxiliary predicate name is defined to be a feature tag [1], and
  one allowable form for a feature tag is 'u.<URI>' [2].  Thus a
  standard form of auxiliary predicate definition can be used to
  associate a URI with a feature set reference:

     (h.1234567890) :- (u.http://www.acme.com/widget-feature/modelT)


Klyne                      Work-in-progress                   [Page 6]


Internet Draft                   Identifying composite meadia features
12 February 1999


  [[[The range of URI forms allowed by [2] is restricted, and that
  restriction would apply to the above proposal.  Another approach
  would be to introduce some new syntax...

  A new form of auxiliary predicate definition is introduced,
  extending the feature expression syntax [1]:

     named-pred =/ "(" fname ")" ":-" "<" URI ">"
     URI        =  <any string conforming to the definition of
                   'URI-reference' in RFC 2396 [5]>

  An example predicate definition using this form is:

     (h.1234567890) :- <http://www.acme.com/widget-feature/modelT>

  ...]]]

3.4 The birthday problem

       NOTE:  this entire section is commentary, and does not
       affect the feature set reference specification in any
       way.

  The use of a hash value to represent an arbitrary feature set is
  based on a presumption that no two distinct feature sets will yield
  the same hash value.

  There is clearly a small but distinct possibility that two
  different feature sets will indeed yield the same hash value.

  We assume that the hash function distributes hash values for
  feature sets with even very small differences randomly and evenly
  through the range of 2^128 (approximately 10^38) possible values.
  This is a fundamental property of a good digest algorithm like MD5.
  Thus, the chance that any two distinct feature set expressions
  yield the same hash is roughly 1 in 10^38.  This is negligible when
  compared with, say, the probability that a receiving system will
  fail having received data conforming to a negotiated feature set.

  But when the number of distinct feature sets in circulation
  increases, the probability of clashing hash values increases
  surprisingly.  This is illustrated by the "birthday paradox":
  given a random collection of just 23 people, there is a greater
  than even chance that there exists some pair with the same birthay.
  This topic is discussed further in sections 7.4 and 7.5 of Bruce
  Scheier's "Applied Cryptography" [7].

     [[[TODO: Include some numbers to illustrate actual probabilities
     of clash with 10^3, 10^6, 10^9, 10^12, 10^15, 10^18 feature sets
     in circulation.]]]


Klyne                      Work-in-progress                   [Page 7]


Internet Draft                   Identifying composite meadia features
12 February 1999


  If original feature set expressions are generated manually, or only
  in response to some manually constrained process, the total number
  of feature sets in circulation is likely to remain very small in
  relation to the total number of possible hash values.

  The outcome of all this is:  assuming that the feature sets are
  manually generated, even taking account of the birthday paradox
  effect, the probability of incorrectly identifying a feature set
  using a hash value is still negligibly small when compared with
  other possible failure modes.

4. Feature set resolution and matching

  This section discusses the use of feature references in conjunction
  with feature set matching [1].

  The definitive position on matching feature sets containing feature
  set references is given by dereferencing all of the references;
  i.e. every feature set reference is replaced by the corresponding
  expression.

  Sometimes, it may be desirable to process feature sets without
  performing dereferencing.  The rules below may facilitate this
  while achieving results that are consistent with the definitive
  position.

     (& ... (h.<hash>) (h.<hash>) ... ) -->  (& ... (h.<hash>) ... )
     (| ... (h.<hash>) (h.<hash>) ... ) -->  (& ... (h.<hash>) ... )
     (& ... (h.<hash>) (! (h.<hash>) ) ... ) -->  FALSE
     (| ... (h.<hash>) (! (h.<hash>) ) ... ) -->  TRUE

  If some referenced feature set is known to be TRUE or FALSE, then
  the corresponding references may be replaced by the corresponding
  TRUE or FALSE value.

  [[[Can more be said?]]]

5. Examples

  [[[TODO]]]

6. Internationalization considerations

  Feature set expressions are currently defined to consist of only
  characters from the US-ASCII repertoire;  under these circumstances
  this specification is not impacted by internationalization
  considerations.





Klyne                      Work-in-progress                   [Page 8]


Internet Draft                   Identifying composite meadia features
12 February 1999


  But, if future revisions of the feature set syntax permit non-US-
  ASCII characters (e.g. within quoted strings), then some canonical
  representation must be defined for the purposes of calculating hash
  values.  One choice might be to use a UTF-8 equivalent
  representation as the basis for calculating the feature set hash.
  Another choice might be to leave this as an application protocol
  issue (but this could lead to non-interoperable feature sets
  between different protocols).

  Another conceivable issue is that of up-casing the feature
  expression in preparation for computing a hash value.  This does
  not apply to the content of strings so is not likely to be an
  issue.  But if changes are made that do permit non-US-ASCII
  characters in feature tags or token strings, consideration must be
  given to properly defining how case conversion is to be performed.

7. Security considerations

  <<<TBD>>>

8. Full copyright statement

  Copyright (C) The Internet Society 1999.  All Rights Reserved.

  This document and translations of it may be copied and furnished to
  others, and derivative works that comment on or otherwise explain
  it or assist in its implementation may be prepared, copied,
  published and distributed, in whole or in part, without restriction
  of any kind, provided that the above copyright notice and this
  paragraph are included on all such copies and derivative works.
  However, this document itself may not be modified in any way, such
  as by removing the copyright notice or references to the Internet
  Society or other Internet organizations, except as needed for the
  purpose of developing Internet standards in which case the
  procedures for copyrights defined in the Internet Standards process
  must be followed, or as required to translate it into languages
  other than English.

  The limited permissions granted above are perpetual and will not be
  revoked by the Internet Society or its successors or assigns.

  This document and the information contained herein is provided on
  an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
  ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
  IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
  THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.





Klyne                      Work-in-progress                   [Page 9]


Internet Draft                   Identifying composite meadia features
12 February 1999


9. Acknowledgements

  This proposal is developed from a suggestion by Larry Masinter.
  Some of the ideas have been honed in early discussions with Martin
  Duerst, Al Gilman, Ted Hardie and Bill Newman.

10. References

[1]  "A syntax for describing media feature sets"
     Graham Klyne, 5GM/Content Technologies
     Internet draft: <draft-ietf-conneg-feature-syntax-00.txt>"
     Work in progress, September 1998.

[2]  "Media Feature Tag Registration Procedure"
     Koen Holtman, TUE
     Andrew Mutz, Hewlett-Packard
     Ted Hardie, NASA
     Internet draft: <draft-ietf-conneg-feature-reg-03.txt>
     Work in progress, July 1998.

[3]  RFC 2234, "Augmented BNF for Syntax Specifications: ABNF"
     D. Crocker (editor), Internet Mail Consortium
     P. Overell, Demon Internet Ltd.
     November 1997.

[4]  RFC 2045, "Multipurpose Internet Mail Extensions (MIME)
     Part 1: Format of Internet message bodies"
     N. Freed, Innosoft
     N. Borenstein, First Virtual
     November 1996.

[5]  RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax",
     Tim Berners-Lee, World Wide Web Consortium/MIT
     Roy T. Fielding, University of California, Irvine
     Larry Masinter, Xerox PARC
     August 1998.

[6]  RFC 1321, "The MD5 Message-Digest Algorithm",
     R. Rivest, MIT Laboratory for Computer Science and RSA Data
     Security, Inc.,
     April 1992.

[7]  "Applied Cryptography"
     Bruce Schneier
     John Wiley and Sons, 1996 (second edition)
     ISBN 0-471-12845-7 (cloth)
     ISBN 0-471-11709-9 (paper)





Klyne                      Work-in-progress                  [Page 10]


Internet Draft                   Identifying composite meadia features
12 February 1999


11. Author's address

  Graham Klyne
  5th Generation Messaging Ltd.    Content Technologies Ltd.
  5 Watlington Street              Forum 1, Station Road
  Nettlebed                        Theale
  Henley-on-Thames, RG9 5AB        Reading, RG7 4RA
  United Kingdom                   United Kingdom.
  Telephone: +44 1491 641 641      +44 118 930 1300
  Facsimile: +44 1491 641 611      +44 118 930 1301
  E-mail: GK@ACM.ORG



Appendix A: Revision history

  00a  10-Feb-1999  Initial draft.



































Klyne                      Work-in-progress                  [Page 11]