draft-ietf-mhtml-info-09

Network Working Group                                       Jacob Palme
Internet Draft                                 Stockholm University/KTH
draft-ietf-mhtml-info-09.txt
Category-to-be: Informational
Expires: August 1998                                      February 1998



Sending HTML in MIME, an informational supplement to the RFC:
MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)


Status of this Memo


This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups.  Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
``work in progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of this
memo is unlimited.

Copyright (C) The Internet Society 1998. All Rights Reserved.


1.    Abstract

The memo "MIME Encapsulation of Aggregate Documents, such as HTML
(MHTML)" (draft-ietf-mhtml-rev-05.txt) specifies how to send packaged
aggregate HTML objects in MIME format. This memo is an accompanying
informational document, intended to be an aid to developers. This
document is not an Internet standard.

Issues discussed are implementation methods, caching strategies,
problems with rewriting of URIs, making messages suitable both for
mailers which can and which cannot handle Multipart/related, and
handling recipients which do not have full Internet connectivity.


2.    Table of Contents

1. Abstract
2. Table of Contents
3. Introduction
4. Implementation methods
4.1 Method 1: Combining viewer and MIME receiving program
4.2 Method 2: Rewriting the HTML
4.3 Method 3: Using a translation table
4.4 Method 4: Using a proxy HTTP server to retrieve referenced body
parts
4.5 Method 5: Putting the mail client into a proxy HTTP server
4.6 Other methods
4.7 Combined methods
4.8 Communication between document viewer and mail client
5. Problems with rewriting URIs when copying HTML documents
6. Caching of body parts
7. "Save as" command
8. Recipients which cannot handle the Multipart/related Content-Type
9. Use of the Content-Type: Multipart/alternative
9.1 Multipart/alternative inside Multipart/related
9.2 Multipart/alternative outside Multipart/related
9.3 Comparing the two methods
10. Recipient may not have full Internet connectivity
11. Encoding of non-ascii characters
12. Conversion from HTTP to MIME
13. Acknowledgments
14. References
15. Author's Address


Mailing List Information

Further discussion on this document should be done through the mailing
list MHTML@SEGATE.SUNET.SE.

To subscribe to this list, send a message to
   LISTSERV@SEGATE.SUNET.SE
which contains the text
SUB MHTML <your name (not your email address)>

Archives of this list are available by anonymous ftp from
   FTP://SEGATE.SUNET.SE/lists/mHTML/
The archives are also available by email. Send a message to
LISTSERV@SEGATE.SUNET.SE with the text "INDEX MHTML" to get a list of
the archive files, and then a new message "GET <file name>" to retrieve
the archive files.

Comments on less important details may also be sent to the editor, Jacob
Palme <jpalme@dsv.su.se>.

More information may also be available at URL:
HTTP://www.dsv.su.se/~jpalme/ietf/mhtml.html


3.    Introduction

[MHTML] specifies how to send packaged aggregate HTML objects in MIME
multipart format. This memo is an accompanying informational document,
intended to be an aid to developers. This document is not an Internet
standard.


4.    Implementation methods

The [MHTML] standard has been intentionally written to be implementable
both in cases where a HTML document viewer (web browser) and a program
receiving MIME objects, such as an email program, are combined, and when
they are separate programs. Implementation is of course easier if the
document viewer is combined with the MIME receiving client.

Different implementation methods are described below. Real
implementations may sometimes combine ideas from more than one of
these methods.

Note: Some document viewers can take a whole document of "Content-Type:
message" or "Content-Type: multipart" as one single file to be
displayed. When such viewers are known to be used, the problems
described below become much easier to handle: just submit the whole
combined MIME message as a single file to the viewer.

4.1   Method 1: Combining viewer and MIME receiving program

This is the architecturally simplest approach. A web-browser with a
built in MIME receiving program (such as an email program) will be able
to use its own document viewer capabilities to display HTML-formatted
messages. Since it is the same program, that program will more easily be
able to connect a URL in the HTML text to a body part in the message.

4.2   Method 2: Rewriting the HTML

    +----------+                          +--------+
    | Document |                          | Mail   |
    | viewer   |                          | client |
    +-------+--+                          +-+------+
            |                               |
         +--+-------------------------------+--+
         | +----------+  +--+  +--+            |
         | | Start    |  |  |  |  | Related    |        Figure 1
         | | HTML     |  |  |  |  | body part  |
         | | document |  |  |  |  | parts      |
         | +----------+  +--+  +--+            |
         +-------------------------------------+

If the document viewer is separate from the MIME receiving client, the
MIME client might turn over the HTML body part to the document viewer
and ask it to display it (Figure 1). One way of doing this is to store
the HTML body part in a file, and ask the document viewer to display
this file. If multipart/related is used, this can be implemented by
storing all the body parts within the multipart/related in an otherwise
empty folder/directory.

The mail client may have to rewrite the HTML, replacing URI-s with
(possibly relative) URL-s which the Document viewer can resolve as file
names in the same directory/folder where the HTML document itself is
stored when turning it over to the Document viewer. Problems with such
rewriting of URIs are discussed in chapter 5 below.
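
As an illustration only, the storing and rewriting steps of method 2
might be sketched in Python as follows. The function names
unpack_related and rewrite_html, the file naming scheme
part<N>.<subtype>, and the choice of treating the first body part as
the start object are assumptions made for this sketch; they are not
part of [MHTML]. The regular expression only finds URIs written as
quoted attribute values, so it shares the limitations discussed in
chapter 5.

```python
import os
import re
from email import message_from_string


def unpack_related(raw_message, directory):
    """Store each part of a multipart/related message as a file.

    Returns (start_path, mapping). The mapping translates the cid: URL
    and the Content-Location of each part to the local file name.
    """
    msg = message_from_string(raw_message)
    mapping = {}
    start_path = None
    for index, part in enumerate(msg.get_payload()):
        # Hypothetical naming scheme: part0.html, part1.gif, ...
        name = "part%d.%s" % (index, part.get_content_subtype())
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(part.get_payload(decode=True))
        if start_path is None:
            start_path = path  # assumption: the first part is the start
        cid = part.get("Content-ID")
        if cid:
            mapping["cid:" + cid.strip("<>")] = name
        location = part.get("Content-Location")
        if location:
            mapping[location.strip('"')] = name
    return start_path, mapping


def rewrite_html(html, mapping):
    """Replace quoted attribute values found in mapping by file names."""
    def substitute(match):
        uri = match.group(2)
        return match.group(1) + mapping.get(uri, uri) + match.group(3)
    return re.sub(r'(=\s*")([^"]*)(")', substitute, html)
```

A caller would read the start file, pass its text through
rewrite_html, write the result back, and then ask the document viewer
to open that file.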

4.3   Method 3: Using a translation table

    +----------+                         +--------+
    | Document |                         | Mail   |
    | viewer   |                         | client |
    +-------+--+                         +-+------+
            |                              |
         +--+------------------------------+-+
         | +--------+  +--+  +--+            |
         | | Trans- |  |  |  |  | Related    |        Figure 2
         | | lation |  |  |  |  | body part  |
         | | table  |  |  |  |  | parts      |
         | +--------+  +--+  +--+            |
         +-----------------------------------+

An alternative to rewriting the HTML file before turning it over to the
Document viewer may be to use a translation table, in case the Document
viewer has the capability to use such a table to rewrite URL-s on the
fly while displaying the document (Figure 2). This requires that the
Document viewer is capable of receiving CID: URL-s and resolving them
using this translation table in the same way as for other URL-s.
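
A translation table of this kind might be built as in the following
Python sketch. The names build_table and resolve are hypothetical, and
the sketch assumes that a relative Content-Location is made absolute
with the Content-Base of its own body part or, failing that, of the
enclosing message. How a real document viewer accepts such a table is
outside the scope of this illustration.

```python
from email import message_from_string
from urllib.parse import urljoin


def build_table(raw_message):
    """Map every URL that names a body part to that part's content."""
    msg = message_from_string(raw_message)
    top_base = (msg.get("Content-Base") or "").strip('"')
    table = {}
    for part in msg.get_payload():
        body = part.get_payload(decode=True)
        cid = part.get("Content-ID")
        if cid:
            table["cid:" + cid.strip("<>")] = body
        location = part.get("Content-Location")
        if location:
            base = (part.get("Content-Base") or top_base).strip('"')
            table[urljoin(base, location.strip('"'))] = body
    return table


def resolve(url, table):
    """Return the body part for url, or None if it is not in the table."""
    return table.get(url)
```

When resolve returns None, the viewer would fall back to its normal
retrieval of the URL.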

4.4   Method 4: Using a proxy HTTP server to retrieve referenced body
parts

    +--------+       +-----------+       +--------+
    | Proxy  |       | Data base |       | Mail   |
    | web    |-------| of cached |-------| server |
    | server |       | objects   |       |        |
    +----+---+       +-----------+       +----+---+
         |                                    |
    +----+-----+                         +----+---+   Figure 3
    | Document |                         | Mail   |
    | viewer   |                         | client |
    +-------+--+                         +-+------+
            |                              |
         +--+------------------------------+-+
         |         Start HTML object         |
         +-----------------------------------+

Yet another method is to use a proxy web server, to which the document
viewer requests are sent, and which will then use the cached body parts
instead of normal web retrieval from the network (Figure 3). If the
Document viewer is set to use this proxy server for all URL-s, including
CID URL-s, no rewriting of the HTML will be necessary.
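
The core of such a proxy might look like the following Python sketch,
which uses the standard http.server module. The names CACHED_PARTS,
lookup and PartProxy are hypothetical. The sketch assumes that the
document viewer sends the complete URL in the request line, as clients
normally do when talking to a proxy, and it answers 404 for anything
not cached, where a real proxy would forward the request to the
network.

```python
from http.server import BaseHTTPRequestHandler

# URL -> (content type, body); filled by the mail client (assumption).
CACHED_PARTS = {}


def lookup(url):
    """Return (content type, body) for a cached body part, or None."""
    return CACHED_PARTS.get(url)


class PartProxy(BaseHTTPRequestHandler):
    """Serve cached body parts to the document viewer."""

    def do_GET(self):
        # For a proxy request, self.path holds the complete URL.
        hit = lookup(self.path)
        if hit is None:
            # A real proxy would retrieve the URL from the network here.
            self.send_error(404, "Not a cached body part")
            return
        content_type, body = hit
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

The handler could be run with http.server.HTTPServer on a local port,
and the document viewer configured to use that port as its proxy.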

4.5   Method 5: Putting the mail client into a proxy HTTP server

     +--------+--------+
     | Proxy  | Mail   |
     |  HTTP  | client |
     | server |        |
     +--------+--------+
              |
        HTTP protocol              Figure 4
              |
         +----+-----+
         | Document |
         | Viewer   |
         +----------+

A mail client can also be included in an HTTP server (Figure 4). The
user will then not have to install any mail client software on his
personal computer; all the mail functionality is mapped onto HTTP and
HTML elements.

4.6   Other methods

The mail client and the document viewer can of course communicate in
other ways, such as using inter-process communication.

4.7   Combined methods

Several of the methods described above can also be combined. The mailer
might for example display simpler HTML documents itself, but
automatically or manually transfer the HTML documents to a separate HTML
viewer for more complex documents.

A common practice in HTML viewers is to simply ignore all markup which
the viewer does not understand. This practice, if implemented in a
mailer with limited HTML viewing capabilities, might mean that the user
is shown a very incomplete message without any warning that information
is missing. In this case, it is better to give the user some kind of
warning, combined with a command to view the letter with a separate HTML
viewer, or turn the document over automatically to a separate viewer
when the document contains markup which the mailer cannot render itself.

4.8   Communication between document viewer and mail client

Many document viewers (web browsers) have API-s to allow other programs
to communicate with them. There is however no accepted real or de-facto
standard for such API-s, which means that a mail program which relies on
such API-s will only be able to use those document viewers whose API
it supports.

Note however, that most of the methods described above can be
implemented with a very minimal such API. The only API function needed
is to be able to tell a document viewer, when it is started, to open a
particular file. And this API function is a standardized part of the
operating system on most platforms. In particular, methods 1 and 3
above use the functionality that a relative URL is resolved with the
location of the base document as base. This means that if the base
document is a file, relative URL-s will be resolved as FILE URL-s in the
same directory/folder where the HTML document itself is placed.

There is a need for buttons in the Web page which the user can use to
get back to the mail program again after reading the mail with the
document viewer. A common technique to achieve this is to define a new
MIME data type for this button. The document viewer is then configured
to transfer control to the mail client when the user pushes this button,
i.e. downloads a file of this new MIME type.

5.    Problems with rewriting URIs when copying HTML documents

Sending of HTML-formatted messages is based on the assumption that an
HTML document, together with in-line objects like images, applets and
frames, can be copied into a MIME message. Such copying may require
rewriting of URIs containing references between the different message
parts. The MHTML standard [MHTML] has been carefully prepared to allow
existing web pages to be copied without such rewriting, through the use
of the Content-Location MIME content heading field.

There is however a problem if the source HTML document contains relative
URIs in parameters to objects and applets, such as in the example below:

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: multipart/related; boundary="boundary-example-1";
                 type=Text/HTML
Content-Base: "http://www.ietf.cnri.reston.va.us"

--boundary-example-1
Content-Type: Text/HTML; charset=US-ASCII

  ... text of the HTML document...
<OBJECT
   CLASSID = "clsid:5220cb21-c88d-11cf-b347-00aa00a28331">
   <PARAM NAME="imageurl" VALUE="image.gif">
</OBJECT>
...etc...

--boundary-example-1
Content-Location: "image.gif"
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64

R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
..etc...

--boundary-example-1--

Only the object might know that the imageurl parameter is a relative
URI. It is nearly impossible for the HTML parser to understand that the
parameter is a relative URI. Simply searching for "image.gif" is not
robust, as the string "image.gif" may be used elsewhere. URIs in scripts
can also have similar problems.

One might envisage even more difficult cases: an applet might take a
parameter "subject" and another parameter "range", and when
subject="auto" and range="1-5" it could compute, and try to use,
auto1.gif, auto2.gif ... auto5.gif as relative URLs.

Some implementation methods described in chapter 4 above, for example
method 2 described in chapter 4.2, may require rewriting of the URIs in
the HTML document.

There is no perfect solution to this problem.

One way of alleviating the problem is to produce the original document
using only absolute URIs, preferably of the CID type, since they are
more easily identifiable.

Another way of alleviating the problem is to make all URIs and
Content-Locations into simple relative URIs containing file names only
(without paths, preferably using a file name format common to most
platforms, i.e. 1-6 ascii letters or digits, a period, and 1-3 extension
ascii letters or digits). An implementation using method 2 described in
chapter 4.2 above can then just store the parts as files in an empty
directory on the recipient computer with the Content-Locations as file
names, and then turn the start HTML file over to a document viewer, and
need not rewrite the URIs at all. This simple variant of use of the
MHTML standard is probably most robust, and those implementors who can
control the production of the HTML documents to be sent are thus
recommended to use this variant.


6.    Caching of body parts

Suppose a message contains body parts with the Content-Location header
as defined in [MHTML]. A receiving agent might then put this body part
into a web cache, with the URI in the Content-Location as its name, so
that later retrievals of this URI use the cached body parts. There is
however no guarantee that such a cached item is correct. Such caching is
thus not recommended for use in other ways than for resolution of links
within one particular MIME message.

The MHTML standard does not cover links between different messages, but
if you want to implement this, use of Content-ID and/or Message-ID,
rather than Content-Location, is recommended.

If incoming messages are stored in a store where messages can be
automatically deleted (purged), purging of body parts should not occur
before purging of the whole message, to which they belong.
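
The two recommendations above, resolving links only within one message
and purging body parts only together with their message, might be
reflected in a cache structure like the following Python sketch. The
class name MessageScopedCache and its methods are hypothetical, and
the sketch assumes that some message identifier, such as the
Message-ID, is available as a key.

```python
class MessageScopedCache:
    """Body parts cached per message, never shared between messages."""

    def __init__(self):
        self._messages = {}  # message id -> {URI: body}

    def store(self, message_id, uri, body):
        """Remember a body part under the message it arrived in."""
        self._messages.setdefault(message_id, {})[uri] = body

    def lookup(self, message_id, uri):
        """Return the part only if it belongs to the given message."""
        return self._messages.get(message_id, {}).get(uri)

    def purge(self, message_id):
        """Remove a message together with all of its body parts."""
        self._messages.pop(message_id, None)
```

Because lookup requires the message identifier, a body part received
in one message is never used to resolve a link in another message or
in ordinary web browsing.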

If an incoming message contains a body part which is linked via Content-
Location, then no HTTP lookup should be performed to check if the body
part is recent. The message should thus still contain the old HTML
document, even if the HTTP-available document has been revised.
(Example: "Here is the weather map of October 29, 1997"). Exceptions to
this are:

(a) If the linked document is not enclosed in the message, but referred
    to via Content-Type: message/external-body, then the latest version
    should be shown using ordinary HTTP caching conventions.

(b) If a new message is sent with a Supersedes reference to the old
    message, the old message should still show the old version of all
    the body parts, but it might be wise to inform the user that a
    superseding message is available.


7.    "Save as" command

Many HTML viewers have a "Save as" command to save a HTML document in a
local file. Usually, this command has two variants, "Save as text" which
converts the HTML document to plain text before saving it, and "Save as
source" which saves the HTML document as a HTML-formatted document.

These two variants may not be enough in the case of MHTML documents.
There is a third option, which might be named "Save as aggregate". This
option would save the HTML plus all related parts in a file with the
Content-Type: Multipart/related. The file would thus begin with the
heading of the Multipart/related body part.

There are two variants of this: Saving the document as it looked
when you got it, or saving the document including all inline body parts,
even those you had to retrieve from the Internet when showing the
message to the user. The second format is of special value, because it
provides an archiving format of the full document, allowing the user to
view it in the future as it looked at one particular time, even
though web content may change in the future.

Finally, a user may also want to save the e-mail or http heading fields
of an incoming message. This is sometimes the same as "Save as
aggregate", but may include additional body parts before or outside of
the multipart/related aggregate.

To indicate whether such a saved document was received by e-mail or
http, it might be saved with an additional surrounding body part of
content-type message/rfc822 or message/http.

Example: Suppose you receive by e-mail the following message:

   MAIL FROM:<alice@bar.net>
   RCPT TO:<bob@foo.net>
   DATA
   From: Alice <alice@bar.net>
   To: Bob <bob2@foo.net>
   Date: 23 Jan 1998 10:51
   Subject: A simple example
   Mime-Version: 1.0
   Content-Type: multipart/related; boundary="boundary-example-1";
                 type="text/html"; start=<foo3@foo1@bar.net>

   --boundary-example-1
      Content-Type: text/html;charset=US-ASCII
      Content-ID: <foo3@foo1@bar.net>

      Here is the IETF logo with white background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
       ALT="IETF logo with white background">
      And here is the IETF logo with transparent background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo2e.gif"
       ALT="IETF logo with transparent background">

   --boundary-example-1
       Content-Location: ietflogo.gif
       Content-Base: http://www.ietf.cnri.reston.va.us/images/
      Content-Type: IMAGE/GIF
      Content-Transfer-Encoding: BASE64

      R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
      NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
      etc...

   --boundary-example-1--
   .

Saving the above message as text might give the following file:

   From: Alice <alice@bar.net>
   To: Bob <bob2@foo.net>
   Date: 23 Jan 1998 10:51
   Subject: A simple example

      Here is the IETF logo with white background:
      IETF logo with white background
      And here is the IETF logo with transparent background:
      IETF logo with transparent background

Saving the same text as html source might give the following file:

      Here is the IETF logo with white background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
       ALT="IETF logo with white background">
      And here is the IETF logo with transparent background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo2e.gif"
       ALT="IETF logo with transparent background">

Saving the same text as aggregate might give the following file:

   From: Alice <alice@bar.net>
   To: Bob <bob2@foo.net>
   Date: 23 Jan 1998 10:51
   Subject: A simple example
   Mime-Version: 1.0
   Content-Type: multipart/related; boundary="boundary-example-1";
                 type="text/html"; start=<foo3@foo1@bar.net>

   --boundary-example-1
      Content-Type: text/html;charset=US-ASCII
      Content-ID: <foo3@foo1@bar.net>

      Here is the IETF logo with white background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
       ALT="IETF logo with white background">
      And here is the IETF logo with transparent background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo2e.gif"
       ALT="IETF logo with transparent background">

   --boundary-example-1
       Content-Location: ietflogo.gif
       Content-Base: http://www.ietf.cnri.reston.va.us/images/
      Content-Type: IMAGE/GIF
      Content-Transfer-Encoding: BASE64

      R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
      NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
      etc...

   --boundary-example-1--

Saving the same text as archiving aggregate might give the following
file (where the missing body part is fetched through http and added to
the saved file):

   From: Alice <alice@bar.net>
   To: Bob <bob2@foo.net>
   Date: 23 Jan 1998 10:51
   Subject: A simple example
   Mime-Version: 1.0
   Content-Type: multipart/related; boundary="boundary-example-1";
                 type="text/html"; start=<foo3@foo1@bar.net>

   --boundary-example-1
      Content-Type: text/html;charset=US-ASCII
      Content-ID: <foo3@foo1@bar.net>

      Here is the IETF logo with white background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
       ALT="IETF logo with white background">
      And here is the IETF logo with transparent background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo2e.gif"
       ALT="IETF logo with transparent background">

   --boundary-example-1
       Content-Location: ietflogo.gif
       Content-Base: http://www.ietf.cnri.reston.va.us/images/
      Content-Type: IMAGE/GIF
      Content-Transfer-Encoding: BASE64

      R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
      NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
      etc...

   --boundary-example-1
     Content-Location: ietflogo2e.gif
     Content-Base: http://www.ietf.cnri.reston.va.us/images/
     Content-Type: IMAGE/GIF
     Content-Transfer-Encoding: BASE64

     R0lGODlhGAGgANX/ACkpKTExMTk5OUJCQkpKSlJSUlpaWmNjY2tra3Nzc3t7e4
     SEhIyMjJSUlJycnKWlpa2trbW1tcDAwM7Ozv/eQnNzjHNzlGtrjGNjhFpae1pa
      etc...

   --boundary-example-1--

Saving the same message as message might give the following file:

   from:<alice@bar.net>
   To:<bob@foo.net>
   Mime-Version: 1.0
   Content-Type: Message/rfc822; boundary="boundary-example-2"

   --boundary-example-2
   From: Alice <alice@bar.net>
   To: Bob <bob2@foo.net>
   Date: 23 Jan 1998 10:51
   Subject: A simple example
   Mime-Version: 1.0
   Content-Type: multipart/related; boundary="boundary-example-1";
                 type="text/html"; start=<foo3@foo1@bar.net>

   --boundary-example-1
      Content-Type: text/html;charset=US-ASCII
      Content-ID: <foo3@foo1@bar.net>

      Here is the IETF logo with white background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo.gif"
       ALT="IETF logo with white background">
      And here is the IETF logo with transparent background:
      <IMG SRC="http://www.ietf.cnri.reston.va.us/images/ietflogo2e.gif"
       ALT="IETF logo with transparent background">

   --boundary-example-1
       Content-Location: ietflogo.gif
       Content-Base: http://www.ietf.cnri.reston.va.us/images/
      Content-Type: IMAGE/GIF
      Content-Transfer-Encoding: BASE64

      R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
      NSBJRVRGLiBVbmF1dGhvcml6ZWQgZHVwbGljYXRpb24gcHJvaGliaXRlZC4A
      etc...

   --boundary-example-1--
   --boundary-example-2--
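
The "archiving aggregate" variant described in this chapter might be
produced along the lines of the following Python sketch. The function
name archive_aggregate and its fetch argument (a callable which
returns the octets for a URL) are hypothetical. The sketch assumes
that the first body part is the HTML start object, that only IMG SRC
values need to be considered, and that every missing object is a GIF
image.

```python
import re
from email import message_from_string
from email.mime.image import MIMEImage
from urllib.parse import urljoin


def archive_aggregate(raw_message, fetch):
    """Return the message text with missing inline images added."""
    msg = message_from_string(raw_message)
    parts = msg.get_payload()
    html = parts[0].get_payload(decode=True).decode("ascii", "replace")
    # Absolute URLs of the objects already enclosed in the message.
    present = set()
    for part in parts[1:]:
        location = part.get("Content-Location")
        if location:
            present.add(urljoin(part.get("Content-Base", ""), location))
    for url in re.findall(r'SRC="([^"]+)"', html, re.IGNORECASE):
        if url not in present:
            image = MIMEImage(fetch(url), "gif")  # assumption: GIF data
            image["Content-Location"] = url
            msg.attach(image)
            present.add(url)
    return msg.as_string()
```

Passing the retrieval function as an argument keeps the sketch
independent of any particular HTTP library.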


8.    Recipients which cannot handle the Multipart/related Content-Type

A message sent according to the specifications in [MHTML] may have
recipients, whose mailers cannot handle the Multipart/related
Content-Type in the way specified in [MHTML].

According to [MIME1], a mailer which encounters an unknown subtype of
Multipart should handle it as Multipart/mixed.

To improve this, Multipart/alternative can be used as discussed in
section 9 of this memo.

Content-Disposition, as specified in [CONDISP] and in [MHTML], section
10, can also be used as an aid to mailers which do not understand
Multipart/related.

Captions on images, which are included in the HTML text, might for
non-HTML-capable recipients be found in the Content-Description header
[CONDISP]. Do not assume, however, that HTML-capable user agents will
display the Content-Description header; they may assume that this
information is included in the HTML text instead.


9.    Use of the Content-Type: Multipart/alternative

If the message is sent to recipients, not all of whom may have mailers
capable of handling the Text/HTML content-type, then the "Content-Type:
Multipart/Alternative" [MIME1] can be used in two ways:

9.1   Multipart/alternative inside Multipart/related

The Multipart/alternative is put inside the "Content-Type:
Multipart/related". Body parts can be specified with "Content-Type:
Text/plain" as the first choice, and "Content-Type: Text/HTML" as the
second choice.

Example:

   Content-Type: Multipart/related; boundary="boundary-example-1";
                 type=MULTIPART/ALTERNATIVE

      --boundary-example-1
      Content-Type: MULTIPART/ALTERNATIVE;
                    boundary="boundary-example-2"

         --boundary-example-2
         Content-Type: Text/plain

         ... plain text version of the document for recipients
         whose mailers cannot handle Text/HTML ...

         --boundary-example-2
         Content-Type: Text/HTML; charset=US-ASCII
         Content-ID: <content-id-example@example.host>

         ... text of the HTML document ...

         --boundary-example-2--
      --boundary-example-1
      Content-Type: Image/GIF

      ... a body part, to which the HTML document has a link  ...
      --boundary-example-1--

Note that the type parameter of Multipart/related in this case should be
Multipart/alternative and not Text/HTML.
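
A message with this structure might be generated as in the following
Python sketch, which uses the standard email package. The function
name build_message and its parameters are hypothetical. The library
chooses the boundary strings itself, so they will differ from those
in the example above.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(plain, html, gif_bytes, image_cid):
    """Build Multipart/related containing a Multipart/alternative."""
    # The type parameter names the start object, here the alternative.
    related = MIMEMultipart("related", type="multipart/alternative")
    alternative = MIMEMultipart("alternative")
    # Plain text first, HTML second, as described in this section.
    alternative.attach(MIMEText(plain, "plain", "us-ascii"))
    alternative.attach(MIMEText(html, "html", "us-ascii"))
    related.attach(alternative)
    image = MIMEImage(gif_bytes, "gif")
    image["Content-ID"] = "<%s>" % image_cid
    related.attach(image)
    return related
```

The HTML text would then refer to the image with a URL of the form
cid:<image_cid value>, and build_message(...).as_string() gives the
text to be sent.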

9.2   Multipart/alternative outside Multipart/related

The multipart/alternative is put outside the Multipart/Related, with
Multipart/Related as one alternative and Multipart/Mixed as the other
alternative. Note however that [MHTML] does not recommend links from
inside Multipart/Related to objects outside of the Multipart/Related, so
putting inline images outside the Multipart/Related is not suitable.
Instead, such inline images may have to be repeated in both branches of
the multipart/alternative with this method.

Example:

   Content-Type: MULTIPART/ALTERNATIVE;
                 boundary="boundary-example-1"

   --boundary-example-1
      Content-Type: Multipart/mixed; boundary="boundary-example-3"

      --boundary-example-3
         Content-Type: Text/plain; charset=US-ASCII

         ... plain text version of the message for recipients
         whose mailers cannot handle Text/HTML ...

      --boundary-example-3
      Content-Type: Image/GIF

         ... A picture associated with the plain text message  ...
      --boundary-example-3--

   --boundary-example-1
      Content-Type: Multipart/related; boundary="boundary-example-2";
                    type=Text/HTML

      --boundary-example-2
         Content-Type: Text/HTML; charset=US-ASCII
         Content-ID: <content-id-example@example.host>

         ... text of the HTML document ...

      --boundary-example-2
      Content-Type: Image/GIF

         ... a body part, to which the HTML document has a link  ...
      --boundary-example-2--
   --boundary-example-1--

9.3   Comparing the two methods

When choosing between these two methods of employing
multipart/alternative, note the following:

 (1) Clients which do not support Multipart/related, and which thus will
     interpret it as Multipart/mixed, will with choice 9.1 display
     the inline objects. Thus, a recipient whose mailer can handle
     image/gif but not multipart/related will still be shown the images;
     they will not be suppressed by being inside a suppressed branch of
     the Multipart/alternative.

 (2) Choice 9.2 will not show inline images in the Multipart/Related,
     unless this information is repeated in both branches of the
     Multipart/Alternative.

A general warning: Some mailers do not support "Content-Type:
Multipart/alternative", and may then interpret it as Multipart/mixed,
even though support of multipart/alternative is required for MIME
conformance.


10.   Recipient may not have full Internet connectivity

The recipient of a message sent by email may not always have full
Internet connectivity. The recipient may be behind a gateway or firewall
which prohibits or restricts Internet connectivity.

This means that the recipient may not be able to resolve URI-s in an
email message, unless the referred-to documents are included in the
email message itself. Thus, it is often suitable to include in an email
message all documents which are referred to (directly or indirectly) by
URI-s in the message. This may of course not always be possible; in some
cases the set of referred-to documents (directly or indirectly) may be
the whole WWW document space, i.e. millions of documents. A choice must
then be made how much to include. Of course, it is most important to
include all inline objects, i.e. objects linked by such hyperlinks as
IMG, etc., which specify that the linked objects are to be shown to the
user immediately.

In the case of HTML forms, by making the ACTION attribute of the FORM
element a "mailto:" URL, rather than an "http:" URL, you will also
enable recipients without full Internet connectivity to fill in and
send in your forms. The HTML specification [HTML2] allows a default
action when no ACTION attribute is included, but this default action
may not be suitable when sending the HTML document via email. Thus, it
is better to always put an explicit ACTION attribute into HTML forms
sent by email.


11.   Encoding of non-ascii characters

         Displayed text                        Displayed text
               |                                     ^
               V                                     |
         +-------------+                       +----------------+
         | HTML editor |                       | HTML viewer    |
         |             |                       | or Web browser |
         +-------------+                       +----------------+
             |                                       ^
             V                                       |
         HTML markup                             HTML markup
             |                                       ^
             V                                       |
  +---------+ +---------------+       +-------------+ +---------------+
  | MIME    | | MIME content- |       | MIME        | | MIME content- |
  | encap-  | | transfer-     |       | heading     | | transfer-     |
  | sulator | | encoder       |       | interpreter | | decoder       |
  +---------+ +---------------+       +-------------+ +---------------+
    |              |                            ^              ^
    V              V         +-----------+      |              |
MIME heading + MIME content->| Transport |->MIME heading + MIME content
                             +-----------+

                               Figure 5

Definitions (see Figure 5):

Displayed text   A visual representation of the intended text.

HTML markup      A sequence of characters formatted according to the
                 HTML specification [HTML2].

MIME content     A sequence of octets physically forwarded via email,
                 may use MIME content-transfer-encoding as specified
                 in [MIME1].

HTML editor      Software used to produce HTML markup.

MIME content-    Software used to encode non-US-ASCII characters
transfer-encoder as specified in [MIME1].

MIME content-    Software used to decode non-US-ASCII characters
transfer-decoder as specified in [MIME1].

MIME heading     Software used to interpret the information in MIME
interpreter      headings.

HTML viewer      Software used to display HTML documents to recipients.

Some implementations may have a choice of whether to represent non-ASCII
characters at the HTML layer (using "&" entity references or numeric
character references as defined in [HTML2] section 3.2.1) or at the MIME
layer (using Content-Transfer-Encoding as defined in [MIME1] section 5).
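
The two representations can be illustrated with a minimal Python
sketch. It is not taken from any MHTML implementation: the function
names are hypothetical, ISO-8859-1 is an assumed charset, and
quoted-printable is chosen only as one possible
Content-Transfer-Encoding.

```python
import quopri


def encode_at_html_layer(text: str) -> bytes:
    """Rewrite the markup: each non-ASCII character becomes a numeric
    character reference, so the result is pure US-ASCII."""
    return text.encode("ascii", errors="xmlcharrefreplace")


def encode_at_mime_layer(text: str, charset: str = "iso-8859-1") -> bytes:
    """Leave the markup untouched and quoted-printable-encode its octets.
    The charset is an assumption; it would be named in the MIME heading."""
    return quopri.encodestring(text.encode(charset))


# Hypothetical sample markup containing two non-ASCII characters.
markup = "<P>Sm\u00f6rg\u00e5sbord</P>"

html_layer = encode_at_html_layer(markup)   # markup itself is modified
mime_layer = encode_at_mime_layer(markup)   # only the transfer form changes

# Decoding the MIME layer gives back the markup exactly as written.
restored = quopri.decodestring(mime_layer).decode("iso-8859-1")
```

With the HTML-layer variant the markup changes, which matters if
checksums were computed on the original markup. With the MIME-layer
variant the markup is recovered unchanged after decoding.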

In choosing between these two representation methods, note the following
effects:

(1) Modifying the HTML markup may invalidate content integrity
    checksums used for security. If the checksums are computed between
    the HTML editor and the MIME encapsulator, then performing the
    encoding in the MIME encapsulator will not break the checksums.

(2) Encoding in the HTML markup may be more suitable for recipients
    whose mailers do not support MIME.

(3) Using MIME Content-Transfer-Encoding may be more suitable for
    recipients who have MIME-compliant mailers but do pass the text over
    to a document viewer (web browser).


12.   Conversion from HTTP to MIME

Information received or retrieved using HTTP cannot always be sent
unchanged as email using the "Content-Type: Text/HTML", because of the
restrictions which MIME places on the format of "Content-Type:
Text/HTML". The same problem may occur for documents retrieved via HTTP,
which are in other textual formats than HTML. In particular, note the
following:

(a) Content-encodings allowed in HTTP, but not allowed in MIME, must be
removed.

(b) HTTP allows line breaks as bare CRs or bare LFs or something else,
while MIME only allows line breaks as CRLF in subtypes of the Text
content-type.

(c) HTTP allows character sets like Unicode-1-1, which do not represent
line breaks as CRLFs. Such text may have to be rewritten to character
sets like Unicode-1-1-UTF-7, in which line breaks are represented as
CRLFs.
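
Points (a) and (b) can be sketched in Python as follows. This is only
an illustration under stated assumptions: the function name is
hypothetical, gzip is the only content-encoding handled, and the body
is assumed to use a character set in which CR and LF are single octets
(so it does not address point (c)).

```python
import gzip
import re

# CRLF is listed first so that an existing CRLF is matched as one unit.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def http_body_to_mime_text(body: bytes, content_encoding: str = "") -> bytes:
    """Prepare a textual HTTP entity body for use as a MIME text body."""
    encoding = content_encoding.strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # (a) Remove the HTTP content-encoding.
        body = gzip.decompress(body)
    elif encoding:
        raise ValueError("unsupported content-encoding: " + content_encoding)
    # (b) Rewrite bare CR and bare LF line breaks as CRLF.
    return _LINE_BREAK.sub(b"\r\n", body)
```

Note that this rewriting changes the octets of the document, so any
checksum computed over the original HTTP body will no longer match.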

A good overview of the differences, with regard to the use of
"Content-Type: Text", between MIME and HTTP, can be found in [HTTP]
appendix C.

If you want to provide web documents which can be sent through e-mail
without modification (which might break integrity checksums), then you
SHOULD provide them in canonical form, with line breaks as CRLF, and
avoid lines longer than 76 characters.
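
A document can be checked against these two conditions before it is
published. The following Python sketch uses a hypothetical function
name and counts octets rather than characters, which is an assumption
that only holds for single-octet character sets.

```python
def is_mail_safe(document: bytes, max_len: int = 76) -> bool:
    """Return True if every line break is CRLF and no line is too long."""
    # Any CR or LF left after removing the CRLF pairs is a bare one.
    remainder = document.replace(b"\r\n", b"")
    if b"\r" in remainder or b"\n" in remainder:
        return False
    # Line lengths are measured without the terminating CRLF.
    return all(len(line) <= max_len for line in document.split(b"\r\n"))
```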

If you want to send HTTP unchanged via email, you might consider using
the "Content-Type: Message/HTTP" instead of the "Content-Type:
Text/HTML". Note that with this Content-Type, the whole object, as sent
through HTTP, can be encoded as a single object with, for example,
BASE64 encoding. After the BASE64 is decoded, the resulting object can
retain formats peculiar to HTTP, such as a single LF or a single CR
between lines.
However, some mailers may not be capable of handling the Message/HTTP
Content-Type.

Example: the base64-encoded body of the following message

   Content-Type: message/http
   Content-Transfer-Encoding: base64

   SFRUUC8xLjEgMjAwIE9LDURhdGU6IFNhdCwgMTQgRmViIDE5OTggMTM6MDM6MzggR01U
   DVNlcnZlcjogQXBhY2hlLzEuMi40DUxhc3QtTW9kaWZpZWQ6IFdlZCwgMjMgSnVsIDE5
   ... ... ...

might, when the base64 encoding above is decoded, yield:

   HTTP/1.1 200 OK
   Date: Sat, 14 Feb 1998 13:03:38 GMT
   ETag: "43788-124-33d658c5"
   Content-Length: 292
   Accept-Ranges: bytes
   Content-Type: text/html

   ... <HTML data with only LF between lines> ...
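
Wrapping and unwrapping of this kind can be sketched in Python. The
function names are hypothetical and the sketch builds only the two MIME
heading fields shown in the example above; a real mailer would add
further fields.

```python
import base64


def wrap_http_response(raw_response: bytes) -> bytes:
    """Encode a complete HTTP response as one message/http body part."""
    heading = (b"Content-Type: message/http\r\n"
               b"Content-Transfer-Encoding: base64\r\n"
               b"\r\n")
    # encodebytes() emits lines of at most 76 characters ending in LF;
    # rewrite those line ends as CRLF for transport by mail.
    body = base64.encodebytes(raw_response).replace(b"\n", b"\r\n")
    return heading + body


def unwrap_http_response(part: bytes) -> bytes:
    """Recover the HTTP response octets from a part built as above."""
    _heading, _sep, body = part.partition(b"\r\n\r\n")
    # b64decode() ignores the CRLFs between the encoded lines.
    return base64.b64decode(body)


# Hypothetical response using bare LF line breaks, as HTTP permits.
raw = b"HTTP/1.1 200 OK\nContent-Type: text/html\n\n<P>one\n<P>two\n"
part = wrap_http_response(raw)
```

Because the whole response is treated as opaque octets, the bare LF
line breaks survive the round trip unchanged.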


13.   Acknowledgments

Harald Tveit Alvestrand, Richard Baker, Dave Crocker, Martin J. Duerst,
Roy Fielding, Lewis Geer, Al Gilman, Paul Hoffman, Alexander Hopmann,
Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed
Levinson, Jay Levitt, Albert Lunde, Larry Masinter, Keith Moore, Gavin
Nicol, Pete Resnick, Jon Smirl, Einar Stefferud, Jamie Zawinski and
several other people have helped me prepare this memo. I alone take
responsibility for any errors which may still be in the memo.


14.   References

Temporary note: This list contains some references to Internet-Drafts.
It is anticipated that these Internet-Drafts will become RFCs before
this memo does. The references in this memo will then be changed to
refer to the corresponding RFCs instead. This list also includes some
RFCs which are not up to date, and which will be replaced by new memos
presently in IETF draft status.

Ref.            Author, title
---------       -------------------------------------------------------

[CONDISP]       R. Troost, S. Dorner: "Communicating Presentation
                Information in Internet Messages: The Content-
                Disposition Header", RFC 1806, June 1995.

[HOSTS]         R. Braden (editor): "Requirements for Internet Hosts --
                Application and Support", STD 3, RFC 1123, October
                1989.

[HTML2]         T. Berners-Lee, D. Connolly: "Hypertext Markup Language
                - 2.0", RFC 1866, November 1995.

[HTTP]          T. Berners-Lee, R. Fielding, H. Frystyk: "Hypertext
                Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

[MHTML]         J. Palme & A. Hopmann: "Packaging Aggregate HTML
                Objects in MIME Email", draft-ietf-mhtml-rev-02.txt,
                October 1997.

[MIDCID]        E. Levinson: "Message/External-Body Content-ID Access
                Type", draft-ietf-mhtml-cid-v2-00.txt, July 1997.

[MIME1]         N. Freed & N. Borenstein: "MIME (Multipurpose Internet
                Mail Extensions) Part One: Mechanisms for Specifying
                and Describing the Format of Internet Message Bodies",
                RFC 2045, November 1996.

[MIME2]         N. Freed & N. Borenstein: "Multipurpose Internet Mail
                Extensions (MIME) Part Two: Media Types", RFC 2046,
                November 1996.

[NEWS]          M.R. Horton, R. Adams: "Standard for interchange of
                USENET messages", RFC 1036, December 1987.

[REL]           Harald Tveit Alvestrand, Edward Levinson: "The MIME
                Multipart/Related Content-type", <draft-mhtml-
                related-02.txt>, August 1997.

[RELURL]        R. Fielding: "Relative Uniform Resource Locators", RFC
                1808, June 1995.

[RFC822]        D. Crocker: "Standard for the format of ARPA Internet
                text messages", STD 11, RFC 822, August 1982.

[SMTP]          J. Postel: "Simple Mail Transfer Protocol", STD 10, RFC
                821, August 1982.

[URL]           T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
                Resource Locators (URL)", RFC 1738, December 1994.

[URLBODY]       N. Freed and Keith Moore: "Definition of the URL MIME
                External-Body Access-Type", RFC 2017, October 1996.


15.   Author's Address

Jacob Palme                          Phone: +46-8-16 16 67
Stockholm University and KTH         Fax: +46-8-783 08 29
Electrum 230                         Email: jpalme@dsv.su.se
S-164 40 Kista, Sweden

Working group chairman:

Einar Stefferud <stef@nma.com>