Network Working Group                                       Jacob Palme
Internet Draft                                 Stockholm University/KTH
draft-ietf-mhtml-info-04.txt
Category-to-be: Informational
Expires: March 1997                                        October 1996



Sending HTML in E-mail, an informational supplement to RFC ???:
MIME E-mail Encapsulation of Aggregate HTML Documents (MHTML)


Status of this Memo


This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups.  Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
``work in progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of this
memo is unlimited.


1.    Abstract

The memo "MIME E-mail Encapsulation of Aggregate HTML Documents (MHTML)"
(draft-ietf-mhtml-spec-04.txt) specifies how to send packaged aggregate
HTML objects in MIME e-mail. This memo is an accompanying informational
document, intended to be an aid to developers. This document is not an
Internet standard.

Issues discussed are implementation methods, caching strategies,
problems with rewriting of URIs, making messages suitable for mailers
which can and which cannot handle Multipart/related, and handling
recipients which do not have full Internet connectivity.








Sending HTML in E-mail                                      October 1996
draft-ietf-mhtml-info-04.txt                                    [Page 1]


2.    Table of Contents

1. Abstract
2. Table of Contents
3. Introduction
4. Implementation methods
4.1 Method 1: Combining web browser and e-mail program
4.2 Method 2: Rewriting the HTML
4.3 Method 3: Using a translation table
4.4 Method 4: Using a proxy HTTP server to retrieve referenced body
parts
4.5 Method 5: Putting the mail client into a proxy HTTP server
4.6 Other methods
4.7 Communication between web browser and mail client
5. Problems with rewriting URIs when copying HTML documents
6. Caching of body parts
7. Recipients which cannot handle the Multipart/related Content-Type
8. Use of the Content-Type: Multipart/alternative
9. Recipient may not have full Internet connectivity
10. Encoding of non-ascii characters
11. Conversion from HTTP to e-mail
12. Acknowledgments
13. References
14. Author's Address


Mailing List Information

Further discussion on this document should be done through the mailing
list MHTML@SEGATE.SUNET.SE.

To subscribe to this list, send a message to
   LISTSERV@SEGATE.SUNET.SE
which contains the text
SUB MHTML <your name (not your e-mail address)>

Archives of this list are available by anonymous ftp from
   FTP://SEGATE.SUNET.SE/lists/mHTML/
The archives are also available by e-mail. Send a message to
LISTSERV@SEGATE.SUNET.SE with the text "INDEX MHTML" to get a list of
the archive files, and then a new message "GET <file name>" to retrieve
the archive files.

Comments on less important details may also be sent to the editor, Jacob
Palme <jpalme@dsv.su.se>.

More information may also be available at URL:
HTTP://www.dsv.su.se/~jpalme/ietf/jp-ietf-home.HTML


3.    Introduction

[MHTML] specifies how to send packaged aggregate HTML objects in MIME
e-mail. This memo is an accompanying informational document, intended to
be an aid to developers. This document is not an Internet standard.


4.    Implementation methods

The [MHTML] standard has been intentionally written to be implementable
both when the web browser and the e-mail program are combined and when
they are separate programs. Implementation is, of course, easier if the
web browser is combined with the e-mail client.

4.1   Method 1: Combining web browser and e-mail program

This is the architecturally simplest approach. A web browser with a
built-in e-mail program can use its own web browser capabilities to
display HTML-formatted messages. Since a single program handles both the
message and its display, it can more easily connect a URL in the HTML
text to a body part in the message.

4.2   Method 2: Rewriting the HTML

    +---------+                           +--------+
    | Web     |                           | Mail   |
    | browser |                           | client |
    +-------+-+                           +-+------+
            |                               |
         +--+-------------------------------+--+
         | +----------+  +--+  +--+            |
         | | Start    |  |  |  |  | Related    |        Figure 1
         | | HTML     |  |  |  |  | body       |
         | | document |  |  |  |  | parts      |
         | +----------+  +--+  +--+            |
         +-------------------------------------+

If the web browser is separate from the e-mail client, the e-mail client
might turn over the HTML body part to the web browser and ask it to
display it (Figure 1). One way of doing this is to store the HTML body
part in a file, and ask the web browser to display this file. If
multipart/related is used, this can be implemented by storing all the
body parts within the multipart/related in an otherwise empty
folder/directory.
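A minimal Python sketch of this storing step, using the standard email package. It assumes the message is available as raw bytes, names each file after the last path segment of the part's Content-Location (falling back to a hypothetical "partN" name), and returns a table from Content-Locations and "cid:" URLs to the file names used.

```python
import email
import os
import tempfile


def store_related_parts(raw_message: bytes):
    """Store every leaf body part in a fresh, otherwise empty directory.

    Returns (directory, table), where table maps each part's
    Content-Location and "cid:" URL to the file name used.
    """
    directory = tempfile.mkdtemp(prefix="mhtml-")
    msg = email.message_from_bytes(raw_message)
    table = {}
    for number, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # containers have no content of their own
        location = (part.get("Content-Location") or "").strip().strip('"')
        content_id = (part.get("Content-ID") or "").strip().strip("<>")
        # Keep only the last path segment so that a location such as
        # "../x" cannot name a file outside the directory.
        name = os.path.basename(location)
        if name in ("", ".", ".."):
            name = f"part{number}"  # hypothetical fallback name
        with open(os.path.join(directory, name), "wb") as f:
            f.write(part.get_payload(decode=True) or b"")
        if location:
            table[location] = name
        if content_id:
            table["cid:" + content_id] = name
    return directory, table
```

The start HTML file can then be handed to the web browser from that directory.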

When turning the HTML document over to the Web browser, the mail client
may have to rewrite the HTML, replacing URIs with (possibly relative)
URLs which the Web browser can resolve as file names in the
directory/folder where the HTML document itself is stored. Problems with
such rewriting of URIs are discussed in chapter 5 below.
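A sketch of such rewriting in Python, limited to the simple case where each reference is a "cid:" URL in a quoted attribute value. The table argument is assumed to map "cid:" URLs to the file names under which the parts were stored. Unquoted attributes, scripts and applet parameters are deliberately not handled, which is exactly the weakness discussed in chapter 5.

```python
import re

# A "cid:" URL enclosed in double or single quotes, as in src="cid:...".
QUOTED_CID_URL = re.compile(r"""(["'])(cid:[^"'\s>]+)\1""", re.IGNORECASE)


def rewrite_cid_urls(html: str, table: dict) -> str:
    """Replace quoted cid: URLs by the local file names in table.

    References not found in table are left unchanged.
    """
    def replace(match):
        quote, url = match.group(1), match.group(2)
        return quote + table.get(url, url) + quote

    return QUOTED_CID_URL.sub(replace, html)
```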


4.3   Method 3: Using a translation table

    +---------+                          +--------+
    | Web     |                          | Mail   |
    | browser |                          | client |
    +-------+-+                          +-+------+
            |                              |
         +--+------------------------------+-+
         | +--------+  +--+  +--+            |
         | | Trans- |  |  |  |  | Related    |        Figure 2
         | | lation |  |  |  |  | body       |
         | | table  |  |  |  |  | parts      |
         | +--------+  +--+  +--+            |
         +-----------------------------------+

An alternative to rewriting the HTML file before turning it over to the
Web browser is to use a translation table, provided the Web browser can
use such a table to rewrite URLs on the fly while displaying the
document (Figure 2). This requires that the Web browser can accept CID
URLs and resolve them through this translation table in the same way as
other URLs.
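A sketch of such a translation table in Python. The class and method names are hypothetical; the point is that the HTML text stays untouched, and each URL is looked up as the browser requests it, after relative URLs have been resolved against the message's Content-Base with urllib.parse.urljoin.

```python
from urllib.parse import urljoin


class TranslationTable:
    """Hypothetical URL -> local file table consulted while displaying."""

    def __init__(self, content_base=None):
        self.content_base = content_base  # from the Content-Base heading, if any
        self.entries = {}

    def _absolute(self, url: str) -> str:
        # CID URLs are already absolute; other relative URLs are
        # resolved against the Content-Base of the message.
        if url.lower().startswith("cid:") or self.content_base is None:
            return url
        return urljoin(self.content_base, url)

    def add(self, url: str, local_file: str) -> None:
        self.entries[self._absolute(url)] = local_file

    def resolve(self, url: str):
        """Local file for url, or None if the browser should fetch it."""
        return self.entries.get(self._absolute(url))
```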

4.4   Method 4: Using a proxy HTTP server to retrieve referenced body
parts

    +--------+       +-----------+       +--------+
    | Proxy  |       | Data base |       | Mail   |
    | web    |-------| of cached |-------| server |
    | server |       | objects   |       |        |
    +----+---+       +-----------+       +----+---+
         |                                    |
    +----+----+                          +----+---+   Figure 3
    | Web     |                          | Mail   |
    | browser |                          | client |
    +-------+-+                          +-+------+
            |                              |
         +--+------------------------------+-+
         |         Start HTML object         |
         +-----------------------------------+

Yet another method is to use a proxy web server, to which the web
browser sends its requests, and which serves the cached body parts
instead of retrieving them from the network (Figure 3). If the Web
browser is set to use this proxy server for all URLs, including CID
URLs, no rewriting of the HTML is necessary.
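A minimal sketch of such a proxy using Python's http.server. It assumes that the browser sends the complete URL as the request target, as browsers normally do when talking to a proxy (for a CID URL this would be "cid:..."), and that the mail client has filled the hypothetical CACHED_PARTS table for the message being read. Anything else gets a 404 here, where a real proxy would forward the request to the network.

```python
import http.server

# Hypothetical table filled by the mail client for the message being
# displayed: absolute URL -> (content type, body octets).
CACHED_PARTS = {}


class MessagePartProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Via a proxy, the request line carries the full URL, e.g.
        # "GET http://www.ietf.cnri.reston.va.us/image.gif HTTP/1.0".
        entry = CACHED_PARTS.get(self.path)
        if entry is None:
            # Sketch only: a real proxy would fetch the URL normally.
            self.send_error(404, "Not a body part of this message")
            return
        content_type, body = entry
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_proxy(port: int = 8080) -> http.server.HTTPServer:
    """Create the proxy; call serve_forever() on the result to run it."""
    return http.server.HTTPServer(("127.0.0.1", port), MessagePartProxy)
```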


4.5   Method 5: Putting the mail client into a proxy HTTP server

     +--------+--------+
     | Proxy  | Mail   |
     |  HTTP  | client |
     | server |        |
     +--------+--------+
              |
        HTTP protocol              Figure 4
              |
         +----+----+
         | Web     |
         | browser |
         +---------+

A mail client can also be included in an HTTP server (Figure 4). The
user then does not have to install any mail client software on his
personal computer; all the mail functionality is mapped onto HTTP and
HTML elements.

4.6   Other methods

The mail client and the web browser can of course communicate in other
ways, such as using inter-process communication.

4.7   Communication between web browser and mail client

Many web browsers have APIs which allow other programs to communicate
with them. There is, however, no accepted formal or de facto standard
for such APIs, which means that a mail program which relies on them will
only be able to use those Web browsers whose API it supports.

Note, however, that most of the methods described above can be
implemented with a very minimal API of this kind. The only API function
needed is the ability to tell a Web browser, when it is started, to open
a particular file, and this function is a standard part of the operating
system on most platforms. In particular, methods 1 and 3 above rely on
the rule that a relative URL is resolved using the location of the base
document as its base. Thus, if the base document is a file, relative
URLs will be resolved as FILE URLs in the same directory/folder where the
HTML document itself is placed.
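Both pieces of functionality relied on here can be illustrated with Python's standard library: webbrowser.open() asks the platform's browser to open a given file or URL, and urllib.parse.urljoin() applies the relative URL resolution rule [RELURL] that makes a part stored as "image.gif" resolve to a file next to the start document. The directory name in the example is illustrative only.

```python
import webbrowser
from urllib.parse import urljoin


def resolve_reference(start_document_url: str, reference: str) -> str:
    """Resolve reference relative to the start document's location."""
    return urljoin(start_document_url, reference)


def show_in_browser(start_document_url: str) -> bool:
    """The single "API" call needed: ask the browser to open a file."""
    return webbrowser.open(start_document_url)


# resolve_reference("file:///tmp/msg123/start.html", "image.gif")
# gives "file:///tmp/msg123/image.gif"; absolute URLs such as
# "cid:..." are returned unchanged.
```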

The Web page also needs buttons which the user can use to return to the
mail program after reading the mail with the Web browser. A common
technique is to define a new MIME content type for such a button. The
Web browser is then configured to transfer control to the mail client
when the user pushes the button, i.e. downloads a file of this new MIME
type.


5.    Problems with rewriting URIs when copying HTML documents

Sending HTML-formatted messages is based on the assumption that an HTML
document, together with in-line objects like images, applets and frames,
can be copied into an e-mail message. Such copying may require rewriting
of URIs containing references between the different message parts. The
MHTML standard [MHTML] has been carefully prepared to allow existing web
pages to be copied without such rewriting, through the use of the
Content-Base and Content-Location MIME content heading fields.

There is however a problem if the source HTML document contains relative
URIs in parameters to objects and applets, such as in the example below:

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: multipart/related; boundary="boundary-example-1";
                 type=Text/HTML
Content-Base: "http://www.ietf.cnri.reston.va.us"

--boundary-example-1
Content-Type: Text/HTML; charset=US-ASCII

  ... text of the HTML document...
<OBJECT
   CLASSID = "clsid:5220cb21-c88d-11cf-b347-00aa00a28331">
   <PARAM NAME="imageurl" VALUE="image.gif">
</OBJECT>
...etc...

--boundary-example-1
Content-Location: "image.gif"
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64

R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
...etc...

--boundary-example-1--

Only the object itself can know that the imageurl parameter is a
relative URI; it is nearly impossible for the HTML parser to determine
this. Simply searching for "image.gif" is not robust, as the string
"image.gif" may also be used elsewhere. URIs in scripts can cause similar
problems.
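The following Python fragment illustrates why. A rewriter that simply replaces every occurrence of "image.gif" (here by an illustrative local file URL) does rewrite the PARAM value, but also changes the visible text of the document.

```python
html = (
    "<P>The attached image.gif shows the network layout.</P>\n"
    '<OBJECT CLASSID="clsid:5220cb21-c88d-11cf-b347-00aa00a28331">\n'
    '   <PARAM NAME="imageurl" VALUE="image.gif">\n'
    "</OBJECT>\n"
)

# Naive rewriting: replace every occurrence of the file name.
local_url = "file:///tmp/msg123/image.gif"  # illustrative location only
rewritten = html.replace("image.gif", local_url)

# Both the PARAM value and the prose were changed, so the reader now
# sees "The attached file:///tmp/msg123/image.gif shows ...".
print(rewritten.count(local_url))  # prints 2
```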

One might envisage even more difficult cases. An applet might take a
parameter "subject" and another parameter "range", and when
subject="auto" and range="1-5", it could compute and try to use
auto1.gif, auto2.gif ... auto5.gif as relative URLs.
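The applet logic in this example might look like the following Python function, which is hypothetical and only mirrors the two parameters described above. The names it produces exist only at run time, so no tool scanning the HTML markup can discover or rewrite them.

```python
def applet_image_names(subject: str, range_param: str) -> list:
    """Derive image file names the way the hypothetical applet would."""
    first, last = (int(n) for n in range_param.split("-"))
    return [f"{subject}{i}.gif" for i in range(first, last + 1)]


names = applet_image_names("auto", "1-5")
print(names)  # ['auto1.gif', 'auto2.gif', 'auto3.gif', 'auto4.gif', 'auto5.gif']
```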


Some implementation methods described in chapter 4 above, for example
method 2 described in chapter 4.2, may require rewriting of the URIs in
the HTML document.

There is no perfect solution to this problem.

One way of alleviating the problem is to produce the original document
using only absolute URIs, preferably of the CID type, since they are
more easily identifiable.

Another way of alleviating the problem is to make all URIs and
Content-Locations simple relative URIs containing file names only
(without paths, and preferably using a file name format common to most
platforms, i.e. 1-6 ASCII letters or digits, a period, and 1-3 ASCII
letters or digits as extension). An implementation using method 2,
described in chapter 4.2 above, can then simply store the parts as files
in an empty directory on the recipient computer, using the
Content-Locations as file names, and turn the start HTML file over to a
web browser without rewriting any URIs. This simple variant of use of
the MHTML standard is probably the most robust, and implementors who can
control the production of the HTML documents to be sent as e-mail are
thus recommended to use it.
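A sender following this recommendation might check each Content-Location before sending. A minimal Python sketch of such a check, encoding exactly the file name format described above:

```python
import re

# 1-6 ASCII letters or digits, a period, then 1-3 ASCII letters or
# digits, with no path component.
SIMPLE_FILE_NAME = re.compile(r"[A-Za-z0-9]{1,6}\.[A-Za-z0-9]{1,3}")


def is_simple_location(location: str) -> bool:
    return SIMPLE_FILE_NAME.fullmatch(location) is not None


print(is_simple_location("image.gif"))      # True
print(is_simple_location("images/a.gif"))   # False: contains a path
print(is_simple_location("picture1.jpeg"))  # False: name and extension too long
```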


6.    Caching of body parts

Suppose a message contains body parts with the Content-Location header
as defined in [MHTML]. A receiving agent might then put these body parts
into a web cache, with the URI in the Content-Location as their names,
so that later retrievals of these URIs use the cached body parts. There
is, however, no guarantee that such a cached item is correct: a sender
can label any content with any URI, and the object at that URI may have
changed since the message was sent. Such caching is thus not
recommended for any purpose other than resolving links within one
particular e-mail message.
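One way to honour this restriction is to key the cache on the message as well as on the URI, so that a cached part is never used for a link outside the message it arrived in. A Python sketch, with hypothetical class and method names:

```python
class PerMessageCache:
    """Body parts cached by Content-Location, scoped to one message."""

    def __init__(self):
        self._parts = {}  # (Message-ID, URI) -> body octets

    def store(self, message_id: str, uri: str, body: bytes) -> None:
        self._parts[(message_id, uri)] = body

    def lookup(self, message_id: str, uri: str):
        """Return the cached part, or None if this message lacks it.

        A None result means the URI must be retrieved in the normal way;
        parts from other messages are never used.
        """
        return self._parts.get((message_id, uri))
```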


7.    Recipients which cannot handle the Multipart/related Content-Type

A message sent according to the specifications in [MHTML] may have
recipients whose mailers cannot handle the Multipart/related
Content-Type in the way specified in [MHTML].

According to [MIME1], a mailer which encounters an unknown subtype of
Multipart should treat it as Multipart/mixed.

To improve the presentation for such recipients, Multipart/alternative
can be used as discussed in section 8 of this memo.

Content-Disposition, as specified in [CONDISP] and in [MHTML], section
10, can also be used as an aid to mailers which do not understand
Multipart/related.


Captions on images, which are included in the HTML text, can for
non-HTML-capable recipients also be given in the Content-Description
header [MIME1]. Do not assume that HTML-capable user agents will display
the Content-Description header; they may expect this information to be
included in the HTML text instead.
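As an illustration, a sender using Python's email package might build an image body part carrying both kinds of hint: a Content-Disposition [CONDISP] with a file name, and a Content-Description holding the caption. The caption text and file name are examples only.

```python
from email.mime.image import MIMEImage


def captioned_image_part(gif_bytes: bytes, location: str, caption: str) -> MIMEImage:
    part = MIMEImage(gif_bytes, _subtype="gif")  # base64-encoded by default
    part["Content-Location"] = location
    # Hints for mailers which treat Multipart/related as Multipart/mixed:
    # show the part inline, under a sensible file name, with a caption.
    part.add_header("Content-Disposition", "inline", filename=location)
    part["Content-Description"] = caption
    return part
```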


8.    Use of the Content-Type: Multipart/alternative

If the message is sent to recipients, not all of whose mailers can
handle the Text/HTML content-type, then "Content-Type:
Multipart/Alternative" [MIME1] can be used in two ways:

(a) Inside the "Content-Type: Multipart/related", body parts can be
specified with "Content-Type: Text/plain" as the first choice, and
"Content-Type: Text/HTML" as the second choice.

Example:

   Content-Type: Multipart/related; boundary="boundary-example-1";
                 type=MULTIPART/ALTERNATIVE

      --boundary-example-1
      Content-Type: MULTIPART/ALTERNATIVE; boundary="boundary-example-2"

         --boundary-example-2
         Content-Type: Text/plain

         ... plain text version of the document for recipients
         whose mailers cannot handle Text/HTML ...

         --boundary-example-2
         Content-Type: Text/HTML; charset=US-ASCII
         Content-ID: <content-id-example@example.host>

         ... text of the HTML document ...

         --boundary-example-2--
      --boundary-example-1
      Content-Type: Image/GIF

      ... a body part, to which the HTML document has a link  ...
      --boundary-example-1--

Note that the type parameter of Multipart/related in this case should be
Multipart/alternative and not Text/HTML.
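The structure of this example can be produced with Python's email package, as in the following sketch. The text, the Content-ID value and the image octets are placeholders. The type parameter of Multipart/related is set to Multipart/alternative, and the plain text alternative comes first, since [MIME1] orders alternatives from least to most preferred.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def related_with_alternative(plain: str, html: str, gif_bytes: bytes) -> MIMEMultipart:
    # Choice (a): the root of Multipart/related is a Multipart/alternative,
    # so the type parameter names Multipart/alternative.
    related = MIMEMultipart("related", type="multipart/alternative")

    alternative = MIMEMultipart("alternative")
    alternative.attach(MIMEText(plain, "plain", "us-ascii"))  # least preferred
    html_part = MIMEText(html, "html", "us-ascii")            # most preferred
    html_part["Content-ID"] = "<content-id-example@example.host>"
    alternative.attach(html_part)
    related.attach(alternative)

    image = MIMEImage(gif_bytes, _subtype="gif")  # the linked body part
    image["Content-Location"] = "image.gif"
    related.attach(image)
    return related
```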

(b) Outside the Multipart/Related, with Multipart/Related as one
alternative and Multipart/Mixed as the other alternative.


When choosing between these two methods of employing
multipart/alternative, note the following:

 (1) With choice (a), clients which do not support Multipart/related,
     and which thus interpret it as Multipart/mixed, will still display
     the inline objects. Thus, a recipient whose mailer can handle
     Image/gif but not Multipart/related will still be shown the images;
     they will not be suppressed by being inside an unselected branch of
     the Multipart/alternative.

 (2) Choice (b) will not show inline images in the Multipart/Related,
     unless this information is repeated in both branches of the
     Multipart/Alternative.

A general warning: Many mailers do not support "Content-Type:
Multipart/alternative", and may then interpret it as Multipart/mixed.


9.    Recipient may not have full Internet connectivity

The recipient of a message sent by e-mail may not always have full
Internet connectivity. The recipient may be behind a gateway or firewall
which prohibits or restricts Internet connectivity.

This means that the recipient may not be able to resolve URIs in an
e-mail message unless the referred-to documents are included in the
e-mail message itself. Thus, it is often advisable to include in an
e-mail message all documents which are referred to (directly or
indirectly) by URIs in the message. This may, of course, not always be
possible; in some cases the set of documents referred to (directly or
indirectly) may be the whole WWW document space, i.e. millions of
documents. A choice must then be made of how much to include. It is most
important to include all inline objects, i.e. objects linked by
hyperlinks such as IMG, which specify that the linked objects are to be
shown to the user immediately.
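A sender deciding what to include could start by collecting the inline objects. A Python sketch using the standard html.parser module; it only looks at IMG SRC, and other inline elements would need similar rules:

```python
from html.parser import HTMLParser


class InlineObjectCollector(HTMLParser):
    """Collect the SRC of every IMG element, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Only IMG is handled in this sketch; other inline elements
        # (for example frames or applets) would need rules of their own.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def inline_objects(html: str) -> list:
    collector = InlineObjectCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

Ordinary hyperlinks such as A HREF are not collected, since they need not be shown immediately.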

In the case of HTML forms, making the ACTION attribute a "mailto:" URL
rather than an "http:" URL enables recipients without full Internet
connectivity to fill in and submit your forms. The HTML specification
[HTML2] allows the ACTION attribute to be omitted, in which case the
action defaults to the base URI of the document; this default may not be
suitable when the HTML document is sent via e-mail. Thus, it is better
to always put an explicit ACTION attribute into HTML forms sent by
e-mail.
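A sending agent might check its forms before sending. A minimal Python sketch with html.parser, flagging FORM elements which lack an explicit ACTION or whose ACTION is not a "mailto:" URL (the warning texts are illustrative):

```python
from html.parser import HTMLParser


class FormActionChecker(HTMLParser):
    """Collect warnings about FORM elements unsuitable for e-mail."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag != "form":  # html.parser reports tag names in lower case
            return
        action = dict(attrs).get("action")
        if not action:
            self.warnings.append("FORM without explicit ACTION")
        elif not action.lower().startswith("mailto:"):
            self.warnings.append("ACTION is not a mailto: URL: " + action)


def check_forms(html: str) -> list:
    checker = FormActionChecker()
    checker.feed(html)
    checker.close()
    return checker.warnings
```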


10.   Encoding of non-ascii characters

Displayed text                        Displayed text
               |                                     ^
               V                                     |
         +-------------+                       +----------------+
         | HTML editor |                       | HTML viewer    |
         |             |                       | or Web browser |
         +-------------+                       +----------------+
             |                                       ^
             V                                       |
         HTML markup                             HTML markup
             |                                       ^
             V                                       |
  +---------+ +---------------+       +-------------+ +---------------+
  | MIME    | | MIME content- |       | MIME        | | MIME content- |
  | encap-  | | transfer-     |       | heading     | | transfer-     |
  | sulator | | encoder       |       | interpreter | | decoder       |
  +---------+ +---------------+       +-------------+ +---------------+
    |              |                            ^              ^
    V              V         +-----------+      |              |
MIME heading + MIME content->| Transport |->MIME heading + MIME content
                             +-----------+

                               Figure 5

Definitions (see Figure 5):

Displayed text   A visual representation of the intended text.

HTML markup      A sequence of characters formatted according to the
                 HTML specification [HTML2].

MIME content     A sequence of octets physically forwarded via e-mail,
                 which may use MIME content-transfer-encoding as
                 specified in [MIME1].

HTML editor      Software used to produce HTML markup.

MIME content-    Software used to encode non-US-ASCII characters
transfer-encoder as specified in [MIME1].

MIME content-    Software used to decode non-US-ASCII characters
transfer-decoder as specified in [MIME1].

MIME heading     Software used to interpret the information in MIME
interpreter      headings.

HTML viewer      Software used to display HTML documents to recipients.


Some implementations may have a choice of whether to represent non-ASCII
characters at the HTML layer (using character entity references such as
"&auml;" or numeric character references such as "&#228;", as defined in
[HTML2] section 3.2.1) or at the MIME layer (using
Content-Transfer-Encoding as defined in [MIME1] section 5).

In choosing between these two representation methods, note the following
effects:

(1) Modifying HTML markup may invalidate security content integrity
    checks, such as signatures computed over the original markup.

(2) Modifying HTML markup may be more suitable for recipients whose
    mailers do not support MIME.

(3) Using MIME Content-Transfer-Encoding may be more suitable for
    recipients who have MIME-compliant mailers which pass the text over
    to a web browser.
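The two representations can be compared with a short Python sketch for the word "Gärdet" (an arbitrary example). At the HTML layer the markup itself becomes pure ASCII; at the MIME layer the markup is unchanged and would be sent, for example, with "charset=ISO-8859-1" and "Content-Transfer-Encoding: quoted-printable".

```python
import quopri

text = "Gärdet"  # example text containing one non-ASCII character

# HTML layer: numeric character references, markup becomes pure ASCII.
html_layer = text.encode("ascii", "xmlcharrefreplace")
print(html_layer)  # b'G&#228;rdet'

# MIME layer: markup unchanged; octets in ISO-8859-1, encoded as
# quoted-printable for transport.
mime_layer = quopri.encodestring(text.encode("iso-8859-1"))
print(mime_layer)  # b'G=E4rdet'

# The recipient's content-transfer-decoder restores the original text.
assert quopri.decodestring(mime_layer).decode("iso-8859-1") == text
```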


11.   Conversion from HTTP to e-mail

Information received or retrieved using HTTP cannot always be sent
unchanged as e-mail using "Content-Type: Text/HTML", because of the
restrictions which MIME places on the format of "Content-Type:
Text/HTML". The same problem may occur for documents retrieved via HTTP
in textual formats other than HTML. In particular, note the following:

(a) Content-encodings allowed in HTTP but not in MIME must be removed.

(b) HTTP allows line breaks in text to be represented as CRLF, bare CR or
bare LF (and, in some character sets, in other ways), while MIME only
allows line breaks as CRLF in subtypes of the Text content-type.

(c) HTTP allows character sets like Unicode-1-1, which do not represent
line breaks as CRLF octets. Such text may have to be rewritten into
character sets like Unicode-1-1-UTF-7, in which line breaks are
represented as CRLFs.
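For point (b), a gateway can normalize line breaks once the text has been decoded into characters. A minimal Python sketch; it does not address point (c), where the octets of a line break depend on the character set:

```python
import re


def to_mime_line_breaks(text: str) -> str:
    """Convert CRLF, bare CR and bare LF line breaks to CRLF."""
    # CRLF is tried first, so an existing CRLF is not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


print(repr(to_mime_line_breaks("one\ntwo\rthree\r\nfour")))
# 'one\r\ntwo\r\nthree\r\nfour'
```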

A good overview of the differences between MIME and HTTP with regard to
the use of "Content-Type: Text" can be found in [HTTP] appendix C.

If you want to send HTTP unchanged via e-mail, you might consider using
the "Content-Type: Message/HTTP" instead of the "Content-Type:
Text/HTML".


12.   Acknowledgments

Harald Tveit Alvestrand, Richard Baker, Dave Crocker, Martin J. Duerst,
Roy Fielding, Lewis Geer, Al Gilman, Paul Hoffman, Alexander Hopmann,
Mark K. Joseph, Greg Herlihy, Valdis Kletnieks, Daniel LaLiberte, Ed
Levinson, Jay Levitt, Albert Lunde, Larry Masinter, Keith Moore, Gavin
Nicol, Pete Resnick, Jon Smirl, Einar Stefferud, Jamie Zawinski and
several other people have helped with preparing this memo. I alone
take responsibility for any errors which may still be in the memo.


13.   References

Temporary note: This list contains some references to Internet drafts.
It is anticipated that these Internet drafts will become RFC-s before
this memo. The references will then in this memo be changed to refer to
the corresponding RFC instead. This list also includes some RFC-s which
are not up to date, and which will be replaced by new memos presently in
ietf draft status.

Ref.            Author, title
---------       -------------------------------------------------------

[CONDISP]       R. Troost, S. Dorner: "Communicating Presentation
                Information in Internet Messages: The Content-
                Disposition Header", RFC 1806, June 1995.

[HOSTS]         R. Braden (editor): "Requirements for Internet Hosts --
                Application and Support", STD 3, RFC 1123, October
                1989.

[HTML2]         T. Berners-Lee, D. Connolly: "Hypertext Markup Language
                - 2.0", RFC 1866, November 1995.

[HTTP]          T. Berners-Lee, R. Fielding, H. Frystyk: "Hypertext
                Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.

[MHTML]         J. Palme & A. Hopmann: "Packaging Aggregate HTML
                Objects in MIME E-mail", <draft-ietf-mhtml-spec-
                03.txt>, August 1996.

[MIDCID]        E. Levinson: "Message/External-Body Content-ID Access
                Type", RFC 1873, December 1995.

[MIME1]         N. Borenstein & N. Freed: "MIME (Multipurpose Internet
                Mail Extensions) Part One: Mechanisms for Specifying
                and Describing the Format of Internet Message Bodies",
                RFC 1521, September 1993.

[MIME2]         N. Borenstein & N. Freed: "Multipurpose Internet Mail
                Extensions (MIME) Part Two: Media Types",
                <draft-ietf-822ext-mime-imt-02.txt>, December 1995.

[NEWS]          M.R. Horton, R. Adams: "Standard for interchange of
                USENET messages", RFC 1036, December 1987.

[REL]           Harald Tveit Alvestrand, Edward Levinson: "The MIME
                Multipart/Related Content-type", <draft-levinson-
                multipart-related-00.txt>, January 1995.

[RELURL]        R. Fielding: "Relative Uniform Resource Locators", RFC
                1808, June 1995.

[RFC822]        D. Crocker: "Standard for the format of ARPA Internet
                text messages", STD 11, RFC 822, August 1982.

[SMTP]          J. Postel: "Simple Mail Transfer Protocol", STD 10, RFC
                821, August 1982.

[URL]           T. Berners-Lee, L. Masinter, M. McCahill: "Uniform
                Resource Locators (URL)", RFC 1738, December 1994.

[URLBODY]       N. Freed, K. Moore: "Definition of the URL MIME
                External-Body Access-Type", <draft-ietf-mailext-acc-url-
                01.txt>, November 1995.


14.   Author's Address

Jacob Palme                          Phone: +46-8-16 16 67
Stockholm University and KTH         Fax: +46-8-783 08 29
Electrum 230                         E-mail: jpalme@dsv.su.se
S-164 40 Kista, Sweden

Working group chairman:
Einar Stefferud <stef@nma.com>
