INTERNET-DRAFT                                    Donald E. Eastlake 3rd
                                                                Motorola
Expires March 2001                                        September 2000



                 Mapping Between Content-Types and URIs
                 ------- ------- ------- ----- --- ----
                     <draft-eastlake-cturi-00.txt>

                         Donald E. Eastlake 3rd



Status of This Document

   Distribution of this document is unlimited. Comments should be sent
   to the author.

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.  Internet-Drafts are
   working documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups.  Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.



Abstract

   Multipurpose Internet Mail Extensions (MIME) Content-Type headers
   and Uniform Resource Identifiers (URIs) are both used, in different
   contexts, to label entities.  A mapping is specified such that the
   union of their meaning can be expressed in either syntax.











D. Eastlake 3rd                                                 [Page 1]


INTERNET-DRAFT                      Mapping Between Content-Types & URIs


Table of Contents

      Status of This Document
      Abstract

      Table of Contents

      1. Introduction
      1.1 Definitions and Conventions
      1.2 Overview of Remaining Sections
      2. Simple Mapping
      2.1 Simple Mapping of Content-Type to URI
      2.1.1 The Basic Case
      2.1.2 More Complete Rules
      2.2 Simple Mapping of URI to Content-Type, The Basic Case
      2.3 Content-Type Mapping Special Case for Basic Closure
      2.4 URI Mapping Special Case for Basic Closure
      3. Controlled Mapping
      4. Troublesome Characters
      5. IANA Considerations and Potential Conflicts
      6. Security Considerations

      References

      Author's Address
      Expiration and File Name




























1. Introduction

   Both MIME types and URIs have come to be used for type labeling and
   similar information.

   The IETF Multipurpose Internet Mail Extensions (MIME) message body
   standards have developed into a general tagging and bagging
   mechanism.  This mechanism has spread from SMTP mail to USENET, HTTP,
   and other protocols. In MIME, the type of an object is given in a
   "Content-Type" header line [RFC 2045, 2046, 2048].  Such a line
   consists of a MIME type and optionally additional parameters.  A MIME
   type consists of a MIME top level type, a slash, and a MIME subtype.

   The original Uniform Resource Locator (URL [RFC 1738]), used to point
   to World Wide Web (WWW) resources, has meanwhile grown into the more
   general Uniform Resource Identifier (URI [RFC 2396]).  Increasingly
   URIs are used as general labels for algorithms, XML namespaces, web
   based protocol data types, etc.  (In some of these label uses, URIs
   are considered opaque while in other cases they are assumed to
   reference something which explicates their meaning.)

   In most protocol syntax cases where there are provisions for a "type
   label", the label is restricted to the syntax of a URI or the syntax
   of a Content-Type.  In many such cases, it will sometimes be useful
   to be able to express labels of the "other" syntax. That is, it may
   be useful in a URI syntax slot to also be able to express a MIME type
   or Content-Type and, conversely, it may be useful in a Content-Type
   syntax slot to also be able to express a URI. This document specifies
   how.

   Note that a URI or Content-Type could get converted back and forth
   multiple times between these two syntaxes. To stop this from
   resulting in ever longer and more complex tags, a check is specified
   so that if a conversion is of a previously converted syntax, the
   previous conversion is reversed, insofar as practical.

   To improve the repeatability of the results from single or multiple
   steps of syntax conversion, capitalization and punctuation
   recommendations are made where tokens are case insensitive or
   variable punctuation is allowed.

   Finally, in cases where the default conversion does not provide for
   sufficient control, optional elements are defined for inclusion in
   URIs and Content-Types that provide substantial control over the
   mapping output.









1.1 Definitions and Conventions

   Concerning URIs, please note the following:

       (1) In this document, the term URI is used to include URI
       Reference.  That is, it includes the case where an octothorpe
       ("#") followed by a fragment identifier is suffixed to a pure
       URI.

       (2) Only absolute URIs are mappable.  Relative URIs, with just a
       hierarchical part, are not included in URI as used in this
       document.  They must first be converted to absolute URIs as
       described in [RFC 2396].

       (3) For presentation purposes, URIs are shown inside angle
       brackets ("<...>") but these angle brackets are not actually a
       part of the URI.

   Concerning Content-Types, please note the following:

       Content-Type values are shown preceded by "Content-Type: " and,
       when long, they are line folded as per [RFC 822].  This prefix
       and line folding are for presentation and are not actually a part
       of the Content-Type.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC 2119].



1.2 Overview of Remaining Sections

   Sections 2 and 3 below give an explanation of the mapping specified
   more or less in English.  The material is organized to start with the
   simplest and most common rules and then add exceptions for special
   cases and additional user control.

   Section 4 lists characters that must be URI ("%") encoded when
   mapping from a URI to a Content-Type.

   Section 5 covers IANA Considerations and potential conflicts.

   Section 6 gives Security Considerations.










2. Simple Mapping

   This section describes simple mappings such that any MIME Type or
   Content-Type can be mapped into a URI and any URI can be mapped into
   a Content-Type.  Other than checks for multiple conversions, the
   mapping is simple. It can produce only a special scheme URI for the
   mapping of a Content-Type and only a special sub-type tree in the
   "application" top level type for the mapping of a URI.  Section 3
   below describes additional features optionally allowing much greater
   control over the result of the mapping.



2.1 Simple Mapping of Content-Type to URI

   Section 2.1.1 below describes the most basic case of converting a
   simple MIME type to a URI.  Section 2.1.2 extends this to converting
   a general Content-Type to a URI.  Section 2.3 adds the check
   necessary to recognize where the MIME type being converted is of the
   form indicating it was previously converted from a URI using basic
   mapping and is being converted back.



2.1.1 The Basic Case

   In the simplest case of a Content-Type consisting of just a MIME
   type, create a URI with scheme "ContentType" and a scheme dependent
   part consisting of the MIME type.  For example

       Content-Type: image/JPEG

   simply converts to

       <ContentType:image/jpeg>

   White space is not allowed in URIs so it must be removed.  Scheme
   names (the part before the first ":" in a URI) are case insensitive
   but for readability and repeatability, the capitalization
   "ContentType" SHOULD be used.  Similarly, MIME top level types and
   subtypes (the fields before and after the "/" in a MIME type field,
   respectively) are case insensitive but SHOULD be all lower cased when
   mapped to the URI form.

   Note: There is no "//" after the "ContentType:" scheme as used
       herein.  Such a "//" would imply a specific structuring of the
       scheme dependent part appearing in the URI after the
       "ContentType:" as defined in [RFC 2396].  Since that full
       structuring is not used, "//" is not used.  The meaning of URIs
       starting with "ContentType://" is reserved for future definition.




   Note: "Content-Type", with hyphen, is syntactically allowed as a
       scheme name.  However, [RFC 2717] reserves embedded hyphens in
       scheme names to indicate the prefix of an alternate tree of
       scheme names, so ContentType is used.
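
   The basic case can be sketched in a few lines of Python.  This is
   an illustration only and not part of the mapping; the function name
   mime_type_to_uri is hypothetical, and the sketch assumes the input
   carries no parameters and needs no URI escaping.

```python
def mime_type_to_uri(content_type: str) -> str:
    """Map a parameterless Content-Type value to a "ContentType:" URI."""
    value = content_type.strip()
    # The "Content-Type:" prefix is presentation only, so drop it if present.
    prefix = "content-type:"
    if value.lower().startswith(prefix):
        value = value[len(prefix):]
    # White space is not allowed in URIs, so remove all of it.
    value = "".join(value.split())
    # Top level types and subtypes SHOULD be lower cased in the URI form.
    return "ContentType:" + value.lower()


print(mime_type_to_uri("Content-Type: image/JPEG"))  # ContentType:image/jpeg
```

   The scheme name is always emitted with the recommended
   capitalization, which keeps repeated conversions stable.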



2.1.2 More Complete Rules

   A Content-Type header frequently includes more than just the
   mandatory MIME type.  It can also have type dependent parameters,
   including private parameters, such as

       Content-Type: text/plain; charset="us-ascii";
           x-mac-type="54455854"; x-mac-creator="4D4F5353"

       Content-Type: image/tiff; application=faxbw

   Content-Type parameters are mapped into a "query portion" suffix of
   the URI in much the same way that HTML form fields [HTML] are.  That
   is, they are concatenated to the MIME type after a "?" and, if there
   is more than one parameter, separated by "&". Thus the above
   Content-Types would be mapped into the following:

       <ContentType:text/plain?charset="us-ascii"&x-mac-type="54455854"&
           x-mac-creator="4D4F5353">

       <ContentType:image/tiff?application="faxbw">

   Parameter values in the mapped URI MUST always be enclosed in double
   quotes ('"').  If the Content-Type has a trailing ";" but no
   parameters, then "?" SHOULD NOT be added to the URI.
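
   A minimal Python sketch of these rules follows.  It assumes that
   parameters are separated by semicolons as in the [RFC 2045] grammar
   and that no character inside a parameter value needs escaping; the
   names content_type_to_uri and _PARAM are hypothetical.

```python
import re

# One parameter: ";" name "=" (quoted-string | token).  This pattern is a
# simplification of the RFC 2045 grammar, adequate for illustration only.
_PARAM = re.compile(r';\s*([^\s;="]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^\s;"]+)')


def content_type_to_uri(value: str) -> str:
    """Map a Content-Type value, with parameters, to a "ContentType:" URI."""
    mime_type, sep, rest = value.partition(";")
    uri = "ContentType:" + "".join(mime_type.split()).lower()
    params = []
    for name, raw in _PARAM.findall(sep + rest):
        # Values MUST be enclosed in double quotes in the mapped URI.
        if not raw.startswith('"'):
            raw = '"' + raw + '"'
        params.append(name + "=" + raw)
    # A trailing ";" with no parameters adds no "?" to the URI.
    if params:
        uri += "?" + "&".join(params)
    return uri


print(content_type_to_uri("image/tiff; application=faxbw"))
```

   The last line prints ContentType:image/tiff?application="faxbw".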



2.2 Simple Mapping of URI to Content-Type, The Basic Case

   This section describes the basic case of mapping a URI to a Content-
   Type.  Section 2.4 adds the check to see if the URI appears to be the
   result of a previous conversion from a Content-Type and if so undoes
   that conversion insofar as practical.

   In the basic case, a URI maps to a Content-Type with a top level MIME
   type of "application" and a MIME sub-type in the "uri." tree.  In
   addition, any "query" parameters in the URI are mapped to Content-
   Type parameters and if the URI ends with a fragment identifier, it is
   mapped to the special Content-Type parameter "URI-Fragment". Any
   special characters in the URI that might be troublesome (see section
   4) are encoded by replacing them with a "%" followed by two hex
   digits for the character code.




   Note: Current URI syntax permits scheme dependent parts in which "?"
       does not indicate a query section; however, no such syntaxes have
       been publicly defined.

   Some examples of the basic case follow:

       <http://example.com/tag42>

       <mailto:U@example.net?subject="misc"&body="line1%0D%0Aline2">

       <xyz://abc.test/def?h=ijk#lmn>

   convert to

       Content-Type: application/uri.http%3A%2F%2Fexample.com%2Ftag42

       Content-Type: application/uri.mailto%3AU%40example.net;
           subject="misc"; body="line1%250D%250Aline2"

       Content-Type: application/uri.xyz%3A%2F%2Fabc.test%2Fdef;
           h="ijk"; URI-Fragment="lmn"

   Content-Type parameter values extracted from the query portion of a
   URI MUST be surrounded with double quotes ('"').  When URI encoding,
   if the hex value has any letters (a-f) in it, they SHOULD be upper
   cased.

   [Is splitting off the Fragment worth it?  The "#" and fragment
   identifier could just be included in the constructed "uri." subtype.
   In fact the query stuff could also, eliminating the need for
   Content-Type parameters...  but I don't think query parameters or
   fragment identifiers in a URI constitute the same sort of type
   information in most cases and would be more accessible to most
   software as Content-Type parameters.]
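
   The basic URI to Content-Type case can be sketched in Python as
   follows.  The sketch is illustrative only: the names TROUBLESOME,
   encode_troublesome, and uri_to_content_type are hypothetical, the
   character set is taken from section 4, parameters are emitted
   separated by semicolons as in [RFC 2045], and it is an assumption
   that query values and the fragment are encoded the same way as the
   URI body.  The "ContentType:" special case of section 2.4 is not
   handled here.

```python
# Section 4: codes 0 through 32, code 127, and the listed punctuation.
TROUBLESOME = set('()<>@,;:\\/[]?%=') | {chr(c) for c in range(33)} | {chr(127)}


def encode_troublesome(text: str) -> str:
    """Replace troublesome characters with "%" and two upper case hex digits."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in TROUBLESOME else ch for ch in text
    )


def uri_to_content_type(uri: str) -> str:
    """Map an absolute URI to an "application/uri." Content-Type value."""
    body, _, fragment = uri.partition("#")
    body, _, query = body.partition("?")
    parts = ["application/uri." + encode_troublesome(body)]
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        # Drop existing surrounding quotes so they are not doubled.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        parts.append('{}="{}"'.format(name, encode_troublesome(value)))
    if fragment:
        parts.append('URI-Fragment="{}"'.format(encode_troublesome(fragment)))
    return "; ".join(parts)


print(uri_to_content_type("http://example.com/tag42"))
print(uri_to_content_type("xyz://abc.test/def?h=ijk#lmn"))
```

   Splitting at the first "#" before the first "?" follows the generic
   URI layout, in which the fragment always comes last.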



2.3 Content-Type Mapping Special Case for Basic Closure

   A URI may have been converted to a Content-Type and then get
   converted back.  To stop this from resulting in an ever more complex
   syntax, a check MUST be made to see if the MIME subtype of a
   Content-Type being converted is in the "uri." subtype tree (see
   section 2.2 above).  If so, the URI is computed from the subtype by
   stripping the "uri." prefix and performing one level of undoing URI
   encoding.  (Note: The top level MIME type is ignored in this case.)
   In addition, Content-Type parameters, if any, are added as a "query
   portion" and a "URI-Fragment" parameter is added as a fragment.






   For example:

       Content-Type: application/uri.mailto%3Auser%40host.example

       Content-Type: application/uri.http%3A%2F%2Fx.test; foo="123";
           bar="abcd"

       Content-Type:
           application/uri.http%3A%2F%2Fa%3Ab%40c.text%2Fx%2Fy;
           URI-Fragment="z"

   convert to

       <mailto:user@host.example>

       <http://x.test?foo="123"&bar="abcd">

       <http://a:b@c.text/x/y#z>
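
   The check and reversal described in this section can be sketched in
   Python as follows.  The names decode_once and uri_subtype_to_uri
   are hypothetical, parameters are assumed to be separated by
   semicolons as in [RFC 2045], and it is an assumption of this sketch
   that parameter values also have one level of encoding undone.

```python
import re

# Simplified RFC 2045 parameter pattern: ";" name "=" (quoted | token).
_PARAM = re.compile(r';\s*([^\s;="]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^\s;"]+)')


def decode_once(text: str) -> str:
    """Undo exactly one level of "%" encoding."""
    return re.sub(
        r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text
    )


def uri_subtype_to_uri(value: str):
    """Recover the URI from a "uri." subtype, or return None if not one."""
    mime_type, sep, rest = value.partition(";")
    # The top level type is ignored; only the subtype is examined.
    subtype = "".join(mime_type.split()).partition("/")[2]
    if not subtype.lower().startswith("uri."):
        return None
    uri = decode_once(subtype[4:])
    query, fragment = [], None
    for name, raw in _PARAM.findall(sep + rest):
        inner = raw[1:-1] if raw.startswith('"') else raw
        if name.lower() == "uri-fragment":
            fragment = decode_once(inner)
        else:
            query.append('{}="{}"'.format(name, decode_once(inner)))
    if query:
        uri += "?" + "&".join(query)
    if fragment is not None:
        uri += "#" + fragment
    return uri


print(uri_subtype_to_uri("application/uri.mailto%3Auser%40host.example"))
```

   A return value of None signals that the ordinary mapping of section
   2.1 applies instead.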

   Note: If a Content-Type or MIME Type is being written by a user and
       they know that there is a URI which is a more natural expression
       of the labeling desired, they can simply use an
       "application/uri." MIME Type to start with.



2.4 URI Mapping Special Case for Basic Closure

   It is desirable that an arbitrary Content-Type be recovered
   semantically intact when mapped to a URI and then that URI is mapped
   back to a Content-Type.  To achieve this, the following special case
   is added to the simple case described in section 2.2 above.

   If the URI scheme is "ContentType:", then the Content-Type is
   computed from the remaining part of the URI (the "scheme specific
   part"), by replacing the first question mark ("?") and all query
   section ampersands ("&") with semi-colon space ("; "), and then
   undoing one level of URI encoding, i.e., replacing percent sign ("%")
   followed by two hex digits with the character having that hex value.

   For example

       <ContentType:model/vnd.example.longish.subtype.name>

       <ContentType:text/plain?charset="US-ASCII"&x-obscure="value">

   map to

       Content-Type: model/vnd.example.longish.subtype.name





       Content-Type: text/plain; charset="US-ASCII"; x-obscure="value"

   Note: A URI produced by simple mapping from a normal Content-Type
       will never have a fragment suffix.

   Note: If a URI is being written by a user and they know that there is
       a Content-Type which is a more natural expression of the labeling
       desired, they can simply use a "ContentType:" scheme to start
       with.
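
   The special case above can be sketched in Python as follows.  The
   function name content_type_uri_to_content_type is hypothetical, and
   the sketch returns None for any other scheme so that a caller can
   fall back to the basic case of section 2.2.

```python
import re


def content_type_uri_to_content_type(uri: str):
    """Recover a Content-Type value from a "ContentType:" scheme URI."""
    scheme, _, rest = uri.partition(":")
    # Scheme names are case insensitive.
    if scheme.lower() != "contenttype":
        return None
    mime_type, _, query = rest.partition("?")
    value = mime_type
    if query:
        # The first "?" and every query "&" become "; ".
        value += "; " + query.replace("&", "; ")
    # Undo exactly one level of "%" encoding.
    return re.sub(
        r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), value
    )


print(content_type_uri_to_content_type(
    'ContentType:text/plain?charset="US-ASCII"&x-obscure="value"'
))
```

   The call shown prints the value
   text/plain; charset="US-ASCII"; x-obscure="value".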



3. Controlled Mapping

   [Is this controlled mapping stuff below too complex?  Would it be
   better to just have sections 2 and 3 above and drop controlled
   conversion?]

   As an additional feature, there may be cases where a URI is designed
   knowing that it might be converted to a Content-Type and it is
   desired to control the MIME type so that it would have a more
   appropriate top level than "application" or a more appropriate
   subtype than one in the "uri." tree. To accomplish this, a special
   URI query part parameter "MIME-Type" is defined. If a URI is not of
   scheme ContentType and this special parameter is found, then the MIME
   type is set to the parameter value and the URI body (all of the URI
   except "query" parameters and any fragment identifier) is preserved
   in a "URI-Body" Content-Type parameter.

   Similarly, there may be cases where a Content-Type is designed
   knowing that it might be converted to a URI and it is desired to
   control the URI scheme and non-query scheme dependent parts so that
   it is not necessary to have a scheme of "ContentType:" or scheme
   dependent part calculated as indicated in section 2.1. To accomplish
   this, a special Content-Type parameter "URI-Body" is defined.  If a
   Content-Type does not have a MIME subtype in the "uri." tree and this
   parameter is present, it controls the non-query portion of the URI
   mapped to and the original MIME type is preserved in a URI query
   parameter called "MIME-Type".

   For example

       Content-Type: application/xml; URI-Body="http://xml.example"

   would map to

       <http://xml.example?MIME-Type="application/xml">

   and





       <mailto:joe@blow.test?MIME-Type="message/rfc822"#123>

   would map to

       Content-Type: message/rfc822; URI-Body="mailto:joe@blow.test";
           URI-Fragment="123"
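
   Both directions of controlled mapping can be sketched in Python as
   follows.  The function names are hypothetical, parameters are
   assumed to be separated by semicolons as in [RFC 2045], and it is
   an assumption of this sketch that the "URI-Body" value and other
   parameters are carried over without any "%" encoding, as in the
   examples above.  Each function returns None when the controlling
   parameter is absent, so that simple mapping can be used instead.

```python
import re

# Simplified RFC 2045 parameter pattern: ";" name "=" (quoted | token).
_PARAM = re.compile(r';\s*([^\s;="]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^\s;"]+)')


def _unquote(raw: str) -> str:
    quoted = len(raw) >= 2 and raw[0] == raw[-1] == '"'
    return raw[1:-1] if quoted else raw


def controlled_uri_to_content_type(uri: str):
    """Use a "MIME-Type" query parameter to choose the MIME type."""
    body, _, fragment = uri.partition("#")
    body, _, query = body.partition("?")
    if body.partition(":")[0].lower() == "contenttype":
        return None
    mime_type, others = None, []
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        if name.lower() == "mime-type" and mime_type is None:
            mime_type = _unquote(value)
        else:
            others.append('{}="{}"'.format(name, _unquote(value)))
    if mime_type is None:
        return None
    parts = [mime_type, 'URI-Body="{}"'.format(body)] + others
    if fragment:
        parts.append('URI-Fragment="{}"'.format(fragment))
    return "; ".join(parts)


def controlled_content_type_to_uri(value: str):
    """Use a "URI-Body" parameter to choose the non-query URI part."""
    mime_type, sep, rest = value.partition(";")
    mime_type = "".join(mime_type.split()).lower()
    if mime_type.partition("/")[2].startswith("uri."):
        return None
    body, fragment, others = None, None, []
    for name, raw in _PARAM.findall(sep + rest):
        if name.lower() == "uri-body":
            body = _unquote(raw)
        elif name.lower() == "uri-fragment":
            fragment = _unquote(raw)
        else:
            others.append('{}="{}"'.format(name, _unquote(raw)))
    if body is None:
        return None
    uri = body + '?MIME-Type="{}"'.format(mime_type)
    uri += "".join("&" + p for p in others)
    if fragment is not None:
        uri += "#" + fragment
    return uri


print(controlled_content_type_to_uri(
    'application/xml; URI-Body="http://xml.example"'
))
```

   Applying one function to the output of the other returns the
   original label for inputs of the kind shown in this section.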



4. Troublesome Characters

   Troublesome characters are defined as those not permitted in a token
   in [RFC 2045] except double quote but including percent sign.  That
   is, any character code from 0 through 32 inclusive or character code
   127 or any of "(", ")", "<", ">", "@", ",", ";", ":", "\", "/", "[",
   "]", "?", "%", or "=" are troublesome characters.



5. IANA Considerations and Potential Conflicts

   This document allocates and specifies the following:

   (1) The "ContentType" URI scheme.

   (2) The "uri." MIME subtype tree.  Since this subtree is totally
       delegated to the URI specification, there are no independent
       publication or review requirements for it.  Any valid URI can be
       used after the "uri." in any MIME top level type, after
       troublesome characters (see section 4) in the URI are % escaped.

   (3) In the context of automatic URI to Content-Type conversion,
       a meaning is specified for the "MIME-Type" URI query section
       parameter.

   (4) In the context of automatic Content-Type to URI conversion, a
       meaning is specified for the "URI-Body" and "URI-Fragment"
       Content-Type parameters.

   Because this document authoritatively specifies the "ContentType" URI
   scheme and the "uri." MIME subtype tree, no conflict can arise due to
   other uses of them.

   However, there is no precedent for the specification of Content-Type
   parameters valid across all MIME types, such as URI-Body and
   URI-Fragment, and in fact [RFC 2046] denies their possibility.  Nor
   is there any precedent for the specification of a universal URI
   query parameter such as MIME-Type.  The probability that any
   different use is currently being made or will in the future have to
   be made of these seems low enough that it can be ignored.  It is
   possible that some processing systems are sensitive to the presence
   of parameters they do not understand and will indicate errors when
   presented with controlled mapping URIs or Content-Types.  However,
   Content-Type parameters and URI query parameters are usually handled
   on receipt by such mechanisms as storing the name-value pair in an
   associative array or as "environment variables" and ignoring extra
   parameters.  In fact, Content-Type processors are required by
   [RFC 2046] to ignore any parameters they do not understand and to
   ignore parameter order.



6. Security Considerations

   In some sense, the security considerations for MIME and content types
   [RFC 2046], URIs [RFC 2396], and for every individual MIME type and
   URI scheme can apply.  In addition, the deployment of mapping aware
   software may enable the introduction into or transmission through
   MIME or content type contexts of URI semantics, including possibly
   dangerous action schemes such as "mailto", and the introduction into
   or transmission through URI contexts of MIME and content type
   semantics, including possibly dangerous executable data types or the
   like.  Finally, implementation of controlled mapping may enable a
   malicious user, by adding one of the special parameters specified
   herein, to cause a surprising change in the semantics of a URI or
   Content-Type produced by the mapping from an apparently innocuous
   Content-Type or URI.




























References

   [HTML] - Dave Raggett, Arnaud Le Hors, Ian Jacobs, "HTML 4.01
   Specification", <http://www.w3.org/TR/html4>, December 1999.

   [RFC 822] - D. Crocker, "Standard for the format of ARPA Internet
   text messages", Aug-13-1982.

   [RFC 1738] - T. Berners-Lee, L. Masinter, M. McCahill, "Uniform
   Resource Locators (URL)", December 1994.

   [RFC 2045] - N. Freed & N. Borenstein, "Multipurpose Internet Mail
   Extensions (MIME) Part One: Format of Internet Message Bodies",
   November 1996.

   [RFC 2046] - N. Freed & N. Borenstein, "Multipurpose Internet Mail
   Extensions (MIME) Part Two: Media Types", November 1996.

   [RFC 2048] - N. Freed, J. Klensin & J. Postel, "Multipurpose Internet
   Mail Extensions (MIME) Part Four: Registration Procedures", November
   1996.

   [RFC 2119] - S. Bradner, "Key words for use in RFCs to Indicate
   Requirement Levels", March 1997.

   [RFC 2396] - T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
   Resource Identifiers (URI): Generic Syntax", August 1998.

   [RFC 2717] - R. Petke, I. King, "Registration Procedures for URL
   Scheme Names", November 1999.

   [RFC 2718] - L. Masinter, H. Alvestrand, D. Zigmond, R. Petke,
   "Guidelines for new URL Schemes", November 1999.





















Author's Address

   Donald E. Eastlake 3rd
   Motorola
   140 Forest Avenue
   Hudson, MA 01749 USA

   Telephone:   +1 508-261-5434 (w)
                +1 978-562-2827 (h)
   FAX:         +1 508-261-4777 (w)
   EMail:       Donald.Eastlake@motorola.com




Expiration and File Name

   This draft expires March 2001.

   Its file name is draft-eastlake-cturi-00.txt.































