INTERNET-DRAFT                                    Donald E. Eastlake 3rd
                                                                Motorola
Expires August 2001                                        February 2001




          Mapping Between MIME Types, Content-Types, and URIs
          ------- ------- ---- ------ -------------- --- ----
                     <draft-eastlake-cturi-02.txt>

                         Donald E. Eastlake 3rd



Status of This Document

   Distribution of this document is unlimited. Comments should be sent
   to the author.

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.  Internet-Drafts are
   working documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups.  Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.



Abstract

   Multipurpose Internet Mail Extension (MIME) Content-Type headers, the
   MIME types used therein, and Uniform Resource Identifiers (URIs) are
   being used, in different contexts, to label entities.  A mapping is
   specified such that the union of their meaning can be expressed in
   either URI or Content-Type syntax.










D. Eastlake 3rd                                                 [Page 1]


INTERNET-DRAFT                      Mapping Between Content-Types & URIs



Table of Contents

      Status of This Document....................................1
      Abstract...................................................1

      Table of Contents..........................................2

      1. Introduction............................................3
      1.1 Introduction to URIs and MIME Type/Content-Type........3
      1.2 Definitions and Conventions............................3
      1.3 Additional Features....................................4
      1.4 Overview of Remaining Sections.........................4
      2. Simple Mapping..........................................5
      2.1 Simple Mapping of Content-Type to URI..................5
      2.1.1 The Basic Case.......................................5
      2.1.2 More Complete Rules..................................6
      2.2 Simple Mapping of URI to Content-Type, The Basic Case..6
      2.3 Content-Type Mapping Special Case for Basic Closure....7
      2.4 URI Mapping Special Case for Basic Closure.............8
      3. Controlled Mapping......................................9
      4. Troublesome Characters.................................10
      5. IANA Considerations and Potential Conflicts............10
      6. Security Considerations................................11

      Appendix..................................................12

      References................................................13

      Author's Address..........................................14
      Expiration and File Name..................................14



1. Introduction

   Both MIME types and URIs have come to be used for type labeling and
   similar information.

   In most protocols where there are provisions for a general "type
   label", the label is restricted to the syntax of a URI or the syntax
   of a Content-Type.  In some cases, it will be useful to be able to
   express labels which already exist in the "other" syntax. That is, it
   may be useful in a URI syntax slot to also be able to express a MIME
   type or Content-Type and, conversely, it may be useful in a Content-
   Type syntax slot to also be able to express a URI. This document
   specifies how.



1.1 Introduction to URIs and MIME Type/Content-Type

   The IETF Multipurpose Internet Mail Extensions (MIME) message body
   standards have developed into a general tagging and bagging
   mechanism.  This mechanism has spread from SMTP mail to USENET, HTTP,
   and other protocols. In MIME, the type of an object is given in a
   "Content-Type" header line [RFC 2045, 2046, 2048].  Such a line
   consists of a MIME type and, optionally, additional parameters.  A
   MIME type consists of a MIME top level type, a slash, and a MIME
   subtype.

   The original Uniform Resource Locator (URL [RFC 1738]), used to point
   to World Wide Web (WWW) resources, has grown into the more general
   Uniform Resource Identifier (URI [RFC 2396]).  Increasingly URIs are
   used as general labels for algorithms, XML namespaces [XML NAME],
   web-based protocol data types, etc.  (In some of these label uses,
   URIs are considered opaque while in other cases they are assumed to
   be dereferenceable into something which explicates their meaning.)



1.2 Definitions and Conventions

   Concerning URIs, please note the following:

       (1) In this document, the term URI is used to include URI
       Reference.  That is, it includes the case where an octothorpe
       ("#") followed by a fragment identifier is suffixed to a pure
       URI.

       (2) Only absolute URIs are mappable.  Relative URIs, with just a
       hierarchical part, are not included in URI as used in this
       document.  They must first be converted to absolute URIs as
       described in [RFC 2396].

       (3) For presentation purposes, URIs are shown inside angle
       brackets ("<...>") but these angle brackets are not actually a
       part of the URI.

   Concerning Content-Types, please note the following:

       Content-Type values are shown preceded by "Content-Type: " and,
       when long, they are line folded as per [RFC 822].  This prefix
       and line folding are for presentation purposes and are not
       actually a part of the Content-Type.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC 2119].



1.3 Additional Features

   Note that a URI or Content-Type could get converted back and forth
   multiple times between these two syntaxes. To stop this from
   resulting in ever longer and more complex tags, a check is mandated
   so that if a conversion is of a previously converted syntax, the
   previous conversion is reversed, in so far as practical.

   To improve the repeatability of the results from single or multiple
   steps of syntax conversion, capitalization and punctuation
   recommendations are made where tokens are case insensitive or
   variable punctuation is allowed.

   Finally, in cases where the default conversion does not provide for
   sufficient control, optional elements are defined for inclusion in
   URIs and Content-Types that provide substantial control over the
   mapping output.



1.4 Overview of Remaining Sections

   Sections 2 and 3 below give an explanation of the mapping specified,
   more or less in English.  The material is organized to start with the
   simplest and most common rules and then add exceptions for special
   cases and additional user control.

   Section 4 lists characters that must be URI ("%") encoded when
   mapping from a URI to a Content-Type.

   Section 5 covers IANA Considerations and potential conflicts.

   Section 6 gives Security Considerations.

   The Appendix presents some sample code in Perl.



2. Simple Mapping

   This section describes simple mappings such that any MIME Type or
   Content-Type can be mapped into a URI and any URI can be mapped into
   a Content-Type.  Other than checks for multiple conversions, the
   mapping is simple. It can produce only a special scheme URI for the
   mapping of a Content-Type and only a special sub-type tree in the
   "application" top level type for the mapping of a URI.  Section 3
   below describes additional features optionally allowing much greater
   control over the result of the mapping.



2.1 Simple Mapping of Content-Type to URI

   Section 2.1.1 below describes the most basic case of converting a
   simple MIME type to a URI.  Section 2.1.2 extends this to converting
   a general Content-Type to a URI.  Section 2.3 adds the check
   necessary to recognize where the MIME type being converted is of the
   form indicating it was previously converted from a URI using basic
   mapping and is being converted back to a URI.



2.1.1 The Basic Case

   For the simplest case of a Content-Type consisting of just a MIME
   type, create a URI with scheme "ContentType" and a scheme dependent
   part consisting of the MIME type.  For example

       Content-Type: image/JPEG

   simply converts to

       <ContentType:image/jpeg>

   White space is not allowed in URIs so it must be removed.  Scheme
   names (the part before the first ":" in a URI) are case insensitive
   but for readability and repeatability, the capitalization
   "ContentType" SHOULD be used.  Similarly, MIME top level types and
   subtypes (the fields before and after the "/" in a MIME type field,
   respectively) are case insensitive but SHOULD be all lower cased when
   mapped to the URI form.

   Note: There is no "//" after the "ContentType:" scheme as used
       herein.  Such a "//" would imply a specific structuring of the
       scheme dependent part appearing in the URI after the
       "ContentType:" as defined in [RFC 2396].  Since that full
       structuring is not used, "//" is not used.  The meaning of URIs
       starting with "ContentType://" is reserved for future definition.

   Note: "Content-Type", with hyphen, is syntactically allowed as a
       scheme name.  However, [RFC 2717] reserves embedded hyphens in
       scheme names to indicate the prefix of an alternate tree of
       scheme names. Therefore, the un-hyphenated ContentType is used.



2.1.2 More Complete Rules

   A Content-Type header frequently includes more than just the
   mandatory MIME type.  It can also have type dependent parameters,
   including private parameters, such as

       Content-Type: text/plain; charset="us-ascii";
           x-mac-type="54455854"; x-mac-creator="4D4F5353"

       Content-Type: image/tiff; application=faxbw

   Content-Type parameters are mapped into a "query portion" suffix of
   the URI in much the same way that HTML form fields [HTML] are.  That
   is, they are concatenated to the MIME type after a "?" and, if there
   is more than one parameter, separated by "&". Thus the above
   Content-Types would be mapped into the following URIs:

       <ContentType:text/plain?charset="us-ascii"&x-mac-type="54455854"&
           x-mac-creator="4D4F5353">

       <ContentType:image/tiff?application="faxbw">

   Parameter values in the mapped URI MUST always be enclosed in double
   quotes ('"').  If the Content-Type has a trailing ";" but no
   parameters, then "?" SHOULD NOT be added to the URI.
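
   The rules of sections 2.1.1 and 2.1.2 can be sketched in Python as
   shown below.  This is an illustration only, not part of the
   specification.  The function name content_type_to_uri is
   hypothetical, and the sketch assumes that no parameter value
   contains ";" or needs URI encoding.  It does not perform the "uri."
   subtype check of section 2.3.

```python
def content_type_to_uri(content_type: str) -> str:
    """Map a Content-Type value to a "ContentType:" URI (simple case)."""
    # Assumption: no ";" occurs inside a quoted parameter value.
    pieces = [p.strip() for p in content_type.split(";")]
    # Remove all white space from the MIME type and lower-case it.
    mime_type = "".join(pieces[0].split()).lower()
    params = []
    for piece in pieces[1:]:
        if not piece:
            # A trailing ";" with no parameter adds nothing to the URI.
            continue
        name, _, value = piece.partition("=")
        # Drop any existing quotes, then always re-quote the value.
        value = value.strip().strip('"')
        params.append(f'{name.strip()}="{value}"')
    uri = "ContentType:" + mime_type
    if params:
        uri += "?" + "&".join(params)
    return uri
```

   With this sketch, "image/JPEG" yields <ContentType:image/jpeg> and a
   value with a trailing ";" but no parameters yields a URI with no
   "?".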



2.2 Simple Mapping of URI to Content-Type, The Basic Case

   This section describes the basic case of mapping a URI to a Content-
   Type.  Section 2.4 adds the check to see if the URI appears to be the
   result of a previous conversion from a Content-Type and if so undoes
   that conversion in so far as practical.

   In the basic case, a URI maps to a Content-Type with a top level MIME
   type of "application" and a MIME subtype in the "uri." tree.  In
   addition, any "query" parameters in the URI are mapped to Content-
   Type parameters and, if the URI ends with a fragment identifier, it
   is mapped to the special Content-Type parameter "URI-Fragment". Any
   special characters in the URI that might be troublesome (see section
   4) are encoded by replacing them with a "%" followed by two hex
   digits for the character code.

   Note: Current URI syntax permits scheme dependent parts in which "?"
       does not indicate a query section; however, no such syntaxes have
       been publicly defined.

   Some examples of the basic case follow:

       <http://example.com/tag42>

       <mailto:U@example.net?subject="misc"&body="line1%0D%0Aline2">

       <xyz://abc.test/def?h=ijk#lmn>

   convert to

       Content-Type: application/uri.http%3A%2F%2Fexample.com%2Ftag42

       Content-Type: application/uri.mailto%3AU%40example.net;
           subject="misc"; body="line1%250D%250Aline2"

       Content-Type: application/uri.xyz%3A%2F%2Fabc.test%2Fdef;
           h="ijk"; URI-Fragment="lmn"

   Content-Type parameter values extracted from the query portion of a
   URI MUST be surrounded with double quotes ('"').  When URI encoding,
   if the hex value has any letters (a-f) in it, they SHOULD be upper
   cased.
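
   A minimal Python sketch of this basic case follows.  It is an
   illustration under stated assumptions rather than a definitive
   implementation.  The names TROUBLESOME, encode, and
   uri_to_content_type are hypothetical, the character set is taken
   from section 4, and the sketch assumes that parameter values and
   the fragment are encoded in the same way as the rest of the URI.
   It does not perform the "ContentType:" scheme check of section 2.4.

```python
# Punctuation listed as troublesome in section 4 of this document.
TROUBLESOME = set('()<>@,;:\\/[]?%=')


def encode(text: str) -> str:
    """Replace troublesome characters with "%" and two upper-case hex digits."""
    out = []
    for ch in text:
        code = ord(ch)
        if code <= 32 or code == 127 or ch in TROUBLESOME:
            out.append(f"%{code:02X}")
        else:
            out.append(ch)
    return "".join(out)


def uri_to_content_type(uri: str) -> str:
    """Map a URI to an "application/uri." Content-Type (basic case)."""
    rest, _, fragment = uri.partition("#")
    body, _, query = rest.partition("?")
    parts = ["application/uri." + encode(body)]
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        value = value.strip('"')  # values are always re-quoted below
        parts.append(f'{name}="{encode(value)}"')
    if fragment:
        parts.append(f'URI-Fragment="{encode(fragment)}"')
    return "; ".join(parts)
```

   Because "%" is itself troublesome, an already encoded sequence such
   as "%0D" in a query value becomes "%250D", so one level of decoding
   recovers the original text.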



2.3 Content-Type Mapping Special Case for Basic Closure

   A URI may have been converted to a Content-Type and get converted
   back.  To stop this from resulting in an ever more complex syntax, a
   check MUST be made to see if the MIME subtype of a Content-Type being
   converted is in the "uri." subtype tree (see section 2.2 above).  If
   so, the URI is computed from the subtype by stripping the "uri."
   prefix and performing one level of undoing URI encoding.  (Note: The
   top level MIME type is ignored in this case.)  In addition, Content-
   Type parameters, if any, are added as a "query portion" and a "URI-
   Fragment" parameter is added as a fragment.

   For example:

       Content-Type: application/uri.mailto%3Auser%40host.example

       Content-Type: application/uri.http%3A%2F%2Fx.test; foo="123";
           bar="abcd"

       Content-Type:
           application/uri.http%3A%2F%2Fa%3Ab%40c.text%2Fx%2Fy;
           URI-Fragment="z"

   convert to

       <mailto:user@host.example>

       <http://x.test?foo="123"&bar="abcd">

       <http://a:b@c.text/x/y#z>

   Note: If a Content-Type or MIME Type is being written by a user and
       they know that there is a URI which is a more natural expression
       of the labeling desired, they can simply use an ".../uri." MIME
       Type to start with.
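
   The closure check above might be sketched in Python as follows.
   This is an illustration only.  The function name uri_subtype_to_uri
   is hypothetical, and the sketch assumes that parameter values are
   decoded in the same way as the subtype and that the "URI-Fragment"
   parameter name is matched without regard to case.  It returns None
   when the subtype is not in the "uri." tree, so the caller can fall
   back to the mapping of section 2.1.

```python
from typing import Optional
from urllib.parse import unquote


def uri_subtype_to_uri(content_type: str) -> Optional[str]:
    """Recover a URI from a Content-Type whose subtype is in the "uri." tree."""
    pieces = [p.strip() for p in content_type.split(";")]
    # The top level type is ignored; only the subtype is examined.
    _top, _, subtype = pieces[0].partition("/")
    if not subtype.lower().startswith("uri."):
        return None
    # Strip the prefix and undo exactly one level of URI encoding.
    uri = unquote(subtype[len("uri."):])
    query, fragment = [], ""
    for piece in pieces[1:]:
        if not piece:
            continue
        name, _, value = piece.partition("=")
        name = name.strip()
        value = unquote(value.strip().strip('"'))
        if name.lower() == "uri-fragment":
            fragment = value
        else:
            query.append(f'{name}="{value}"')
    if query:
        uri += "?" + "&".join(query)
    if fragment:
        uri += "#" + fragment
    return uri
```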



2.4 URI Mapping Special Case for Basic Closure

   It is desirable that an arbitrary Content-Type be recovered
   semantically intact when mapped to a URI and then that URI is mapped
   back to a Content-Type.  To achieve this, the following special case
   is added to the simple case described in section 2.2 above.

   If the URI scheme is "ContentType:", then the Content-Type is
   computed from the remaining part of the URI (the scheme specific
   part), by replacing the first question mark ("?") and all query
   section ampersands ("&") with semi-colon space ("; "), and then
   undoing one level of URI encoding, i.e., replacing percent sign ("%")
   followed by two hex digits with the character having that hex value.

   For example

       <ContentType:model/vnd.example.longish.subtype.name>

       <ContentType:text/plain?charset="US-ASCII"&x-obscure="value">

   map to

       Content-Type: model/vnd.example.longish.subtype.name

       Content-Type: text/plain; charset="US-ASCII"; x-obscure="value"

   Note: A URI produced by simple mapping from a normal Content-Type
       will never have a fragment suffix.

   Note: If a type label URI is being written by a user and they know
       that there is a Content-Type which is a more natural expression
       of the labeling desired, they can simply use a "ContentType:"
       scheme to start with.
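
   The special case of this section might be sketched in Python as
   follows.  This is an illustration only.  The function name
   contenttype_uri_to_content_type is hypothetical, and the sketch
   assumes the URI has no fragment, as is the case for URIs produced
   by simple mapping.  It returns None for any other scheme, so the
   caller can fall back to the mapping of section 2.2.

```python
from typing import Optional
from urllib.parse import unquote


def contenttype_uri_to_content_type(uri: str) -> Optional[str]:
    """Recover a Content-Type value from a "ContentType:" scheme URI."""
    scheme, sep, rest = uri.partition(":")
    # Scheme names are case insensitive.
    if not sep or scheme.lower() != "contenttype":
        return None
    mime_type, _, query = rest.partition("?")
    value = mime_type
    if query:
        # The first "?" and every query "&" become "; ".
        value += "; " + query.replace("&", "; ")
    # Undo exactly one level of URI encoding.
    return unquote(value)
```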



3. Controlled Mapping

   [Is this controlled mapping stuff below too complex?  Would it be
   better to just have section 2 above and drop controlled conversion?]

   As an additional feature, there may be cases where a URI is designed
   knowing that it might be converted to a Content-Type and it is
   desired to control the MIME type so that it would have a more
   appropriate top level than "application" and/or a more appropriate
   subtype than one in the "uri." tree. To accomplish this, a special
   URI query part parameter "MIME-Type" is defined. If a URI is not of
   scheme ContentType and this special parameter is found, then the MIME
   type is set to the parameter value and the URI body (all of the URI
   except "query" parameters and any fragment identifier) is preserved
   in a "URI-Body" Content-Type parameter.

   Similarly, there may be cases where a Content-Type is designed
   knowing that it might be converted to a URI and it is desired to
   control the URI scheme and non-query scheme dependent parts so that
   it is not of scheme "ContentType:" or does not have the scheme
   dependent part calculated as indicated in section 2.1. To accomplish
   this, a special Content-Type parameter "URI-Body" is defined.  If a
   Content-Type does not have a MIME subtype in the "uri." tree and this
   parameter is present, it controls the non-query portion of the URI
   mapped to and the original MIME type is preserved in a URI query
   parameter called "MIME-Type".

   For example

       Content-Type: application/xml; URI-Body="http://xml.example"

   would map to

       <http://xml.example?MIME-Type="application/xml">

   and

       <mailto:joe@blow.test?MIME-Type="message/rfc822"#123>

   would map to

       Content-Type: message/rfc822; URI-Body="mailto:joe@blow.test";
           URI-Fragment="123"
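
   Both directions of controlled mapping might be sketched in Python
   as follows.  This is an illustration under stated assumptions, not
   a definitive implementation.  The function names are hypothetical.
   The sketch assumes that the special parameter names are matched
   without regard to case, that "MIME-Type" is emitted first in the
   query, that no value contains ";" or "&", and, following the
   examples above, that no URI encoding is applied.  Each function
   returns None when controlled mapping does not apply.

```python
from typing import Optional


def controlled_content_type_to_uri(content_type: str) -> Optional[str]:
    """Map a Content-Type carrying a "URI-Body" parameter to a URI."""
    pieces = [p.strip() for p in content_type.split(";")]
    mime_type = pieces[0].lower()
    if mime_type.partition("/")[2].startswith("uri."):
        return None  # section 2.3 applies instead
    params = []
    for piece in pieces[1:]:
        if piece:
            name, _, value = piece.partition("=")
            params.append((name.strip(), value.strip().strip('"')))
    body = next((v for n, v in params if n.lower() == "uri-body"), None)
    if body is None:
        return None  # no control requested; use section 2.1
    query, fragment = [f'MIME-Type="{mime_type}"'], ""
    for name, value in params:
        if name.lower() == "uri-body":
            continue
        if name.lower() == "uri-fragment":
            fragment = value
        else:
            query.append(f'{name}="{value}"')
    uri = body + "?" + "&".join(query)
    return uri + "#" + fragment if fragment else uri


def controlled_uri_to_content_type(uri: str) -> Optional[str]:
    """Map a URI carrying a "MIME-Type" query parameter to a Content-Type."""
    if uri.partition(":")[0].lower() == "contenttype":
        return None  # section 2.4 applies instead
    rest, _, fragment = uri.partition("#")
    body, _, query = rest.partition("?")
    mime_type, others = None, []
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        value = value.strip('"')
        if name.lower() == "mime-type":
            mime_type = value
        else:
            others.append(f'{name}="{value}"')
    if mime_type is None:
        return None  # no control requested; use section 2.2
    parts = [mime_type, f'URI-Body="{body}"'] + others
    if fragment:
        parts.append(f'URI-Fragment="{fragment}"')
    return "; ".join(parts)
```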





4. Troublesome Characters

   Troublesome characters are defined as those not permitted in a token
   in [RFC 2045] with the addition of percent sign and the deletion of
   double quote.  That is, any character code from 0 through 32
   inclusive or character code 127 or any of "(", ")", "<", ">", "@",
   ",", ";", ":", "\", "/", "[", "]", "?", "%", or "=" are troublesome
   characters.
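
   The definition above can be checked mechanically.  The Python
   sketch below derives the punctuation set from the "tspecials" of
   [RFC 2045] by adding percent sign and removing double quote, and
   then applies it.  This is an illustration only, and the names
   TSPECIALS, TROUBLESOME_PUNCTUATION, is_troublesome, and
   percent_encode are hypothetical.

```python
# The "tspecials" characters excluded from a token by RFC 2045.
TSPECIALS = set('()<>@,;:\\"/[]?=')

# Section 4: add percent sign, delete double quote.
TROUBLESOME_PUNCTUATION = (TSPECIALS | {"%"}) - {'"'}


def is_troublesome(ch: str) -> bool:
    """True for codes 0 through 32, code 127, and the listed punctuation."""
    code = ord(ch)
    return code <= 32 or code == 127 or ch in TROUBLESOME_PUNCTUATION


def percent_encode(text: str) -> str:
    """Encode each troublesome character as "%" plus two upper-case hex digits."""
    return "".join(
        f"%{ord(ch):02X}" if is_troublesome(ch) else ch for ch in text
    )
```

   The derived set contains exactly the fifteen punctuation characters
   listed above.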



5. IANA Considerations and Potential Conflicts

   This document allocates and specifies the following:

   (1) The "ContentType" URI scheme.

   (2) The "uri." MIME subtype tree.  Since this subtree is totally
       delegated to the URI specification, there are no independent
       publication or review requirements for it.  Any valid URI can be
       used after the "uri." in any MIME top level type, after
       troublesome characters (see section 4) in the URI are % escaped.

   (3) In the context of automatic URI to Content-Type conversion,
       a meaning is specified for the "MIME-Type" URI query section
       parameter.

   (4) In the context of automatic Content-Type to URI conversion, a
       meaning is specified for the "URI-Body" and "URI-Fragment"
       Content-Type parameters.

   Because this document specifies the "ContentType" URI scheme and the
   "uri." MIME subtype tree, no conflict can arise due to other uses of
   them.

   However, there has been no precedent for the specification of
   Content-Type parameters valid across all MIME types, such as URI-Body
   and URI-Fragment, and in fact [RFC 2046] denies their possibility.
   Nor has there been any precedent for the specification of a universal
   URI query parameter such as MIME-Type.  The probability that any
   different use is currently being made or will in the future have to
   be made of these names is low enough that it can be ignored.  It is
   possible that some processing systems are sensitive to the presence
   of parameters they do not understand and will indicate errors when
   presented with controlled mapping URIs or Content-Types.  However,
   Content-Type parameters and URI query parameters are usually handled
   on receipt by such mechanisms as storing the name-value pair in an
   associative array or as "environment variables" and ignoring extra
   parameters.  In fact, Content-Type processors are required by [RFC
   2046] to ignore any parameters they do not understand and to ignore
   parameter order.



6. Security Considerations

   In some sense, the security considerations for MIME and content types
   [RFC 2046], URIs [RFC 2396], and for every individual MIME type and
   URI scheme can apply.

   In addition, the deployment of mapping aware software may enable the
   introduction into or transmission through MIME or Content-Type
   contexts of URI semantics, including possibly dangerous action
   schemes such as "mailto", and the introduction into or transmission
   through URI contexts of MIME and content type semantics, including
   possibly dangerous executable data types or the like.

   Finally, implementation of controlled mapping may enable a malicious
   user, by adding one of the special parameters specified herein, to
   cause a surprising change in the semantics of a URI or Content-Type
   produced by the mapping from an apparently innocuous Content-Type or
   URI.



Appendix

   The following Perl code implements most of the mapping given in
   Section 2 above:

   (will be in the next revision)



References

   [HTML] - Dave Raggett, Arnaud Le Hors, Ian Jacobs, "HTML 4.01
   Specification", <http://www.w3.org/TR/html4>, December 1999.

   [RFC 822] - D. Crocker, "Standard for the format of ARPA Internet
   text messages", Aug-13-1982.

   [RFC 1738] - T. Berners-Lee, L. Masinter, M. McCahill, "Uniform
   Resource Locators (URL)", December 1994.

   [RFC 2045] - N. Freed & N. Borenstein, "Multipurpose Internet Mail
   Extensions (MIME) Part One: Format of Internet Message Bodies",
   November 1996.

   [RFC 2046] - N. Freed & N. Borenstein, "Multipurpose Internet Mail
   Extensions (MIME) Part Two: Media Types", November 1996.

   [RFC 2048] - N. Freed, J. Klensin & J. Postel, "Multipurpose Internet
   Mail Extensions (MIME) Part Four: Registration Procedures", November
   1996.

   [RFC 2119] - S. Bradner, "Key words for use in RFCs to Indicate
   Requirement Levels", March 1997.

   [RFC 2396] - T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
   Resource Identifiers (URI): Generic Syntax", August 1998.

   [RFC 2717] - R. Petke, I. King, "Registration Procedures for URL
   Scheme Names", November 1999.

   [RFC 2718] - L. Masinter, H. Alvestrand, D. Zigmond, R. Petke,
   "Guidelines for new URL Schemes", November 1999.

   [XML NAME] - Tim Bray, Dave Hollander, Andrew Layman, "Namespaces in
   XML", <http://www.w3.org/TR/REC-xml-names>, 14 January 1999.



Author's Address

   Donald E. Eastlake 3rd
   Motorola
   155 Beaver Street
   Milford, MA 01757 USA

   Telephone:   +1 508-261-5434 (w)
                +1 508-634-2066 (h)
   FAX:         +1 508-261-4447 (w)
   EMail:       Donald.Eastlake@motorola.com




Expiration and File Name

   This draft expires August 2001.

   Its file name is draft-eastlake-cturi-02.txt.