draft-ietf-smtpext-8bittransport-06

Network Working Group                                J. C. Klensin
INTERNET DRAFT                                       July 15, 1992
Updates: RFC-821                        Expires:  January 27, 1993


          SMTP Extensions for Transport of Enhanced Messages


Abstract

    A series of extensions and clarifications are provided for the
    Simple Mail Transfer Protocol specified by RFC-821.  In combination,
    they provide for the transport of "8 bit mail", i.e., data
    characters with all bits of the octets used for information, for
    more robust and efficient handling of large messages, and for an
    improved foundation for any future extensions to SMTP.


Status of this Memo
   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  Internet Drafts may be updated, replaced, or obsoleted
   by other documents at any time.  It is not appropriate to use
   Internet Drafts as reference material or to cite them other than
   as a "working draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet Draft
   directory to learn the current status of this or any other Internet
   Draft.

   This document is a working draft as part of the development of an
   extension to the SMTP protocol.  A subsequent version will be
   submitted to the RFC editor as a proposed standard.  Distribution
   is unlimited.  Comments are solicited and should be sent to the
   editor at Klensin@MIT.EDU or, preferably, by joining in discussions
   on the ietf-smtp mailing list (subscription requests to
   ietf-smtp-request@dimacs.rutgers.edu, postings to
   ietf-smtp@dimacs.rutgers.edu).


1. Introduction and Background

1.1 Introduction

RFC-821 [RFC821] defines a protocol, SMTP, to transfer mail reliably
and efficiently. It is largely independent of the transmission
subsystem used.  It requires only a reliable ordered data stream, of at
least 7-bit units, that consists of "lines" and "characters".  It also
makes some implied assumptions about end-to-end virtual circuit
connections as the primary model for transporting and delivering mail.

SMTP, as described in RFC-821, is restricted to the transport of data in
7-bit ASCII [ANSI-X3.4] encoding.  The term "ASCII", as used in this
document, refers to ANSI X3.4, and not to any national language
variations on ISO 646; the use of "US" with "ASCII" is merely to add
additional emphasis when that appears useful.  Strictly speaking,
incorporation of any non-ASCII character encoding, whether 7 or 8 bits,
or the assumption of a special interpretation for any control character
other than ASCII CR and LF is an extension from RFC-821 that may not
be compatible in subtle ways with existing conforming implementations.
Such extensions require either changes to RFC-821 itself, or prior
agreement among all parties and hosts which will transport or handle the
mail.  A strict reading of RFC-821 would permit the receiver of a
message to assume that it contained only ASCII characters.

MIME [RFC1341], Multipurpose Internet Mail Extensions, provides for
identifying the use and encoding of character sets other than [US]
ASCII within a structured message body, using extended headers. Because
MIME does not require an 8-bit transport mechanism, its use
with 7-bit transport is likely to provide better interoperability than
the use of an 8-bit transport mechanism in situations where mail must
be passed through one or more unknown mail relays, gateways, or
exploders between the sender and the receiver.

At the same time, most electronic mail messages do not pass through such
mechanisms, but are simple textual messages sent within a small, "local"
community of users.  Within such local communities, sending characters
represented using other than 7-bit coding with a transport mechanism
that logically reflects the length of the character codes, without
additional encoding, provides considerable simplification.  Such a
system has been much in demand for 8-bit characters.  The consequences,
within a community that has decided to use this protocol extension, of
discovering that a receiving host will not accept transport of
extended-length characters, are also not severe, since that problem will
presumably rarely arise.  Nonetheless, this document provides a
framework for conversion of enhanced transport forms into the line- and
character-oriented 7-bit form permitted by the original SMTP and
outlines mechanisms by which the acceptability of 8-bit transport to a
given server can be inferred without first opening a mail connection.

In addition to the issues of 8-bit transport and general extensibility,
a number of trends, not least of which is the introduction of the
extended "multipart/multimedia" design of MIME, have contributed to
steady increases in the average and maximum sizes of messages that
people wish to transport over electronic mail facilities.  When hosts
imposed limits on mail message sizes in the early days of the ARPANET,
limits of four, or even one, kilocharacters were
considered reasonable.  The applications-level host requirements RFC
[RFC1123] specified 64K characters as the minimum size at which it was
reasonable to reject mail messages for excessive length.

Under RFC-821, there is no mechanism for rejecting a message as being
too long without actually having that message transmitted.  There is
also no provision for checkpointing or otherwise salvaging the portion
of the message transmitted before the size limit of the server is
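encountered.  Since hosts can only be assumed to accept messages of up
to 64 kilobytes, attempting to send longer messages may waste
considerable bandwidth when they are rejected after most of their
content has been transmitted.  This may be a major consideration on
slow or expensive links.  This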
document provides a model for determining whether a large message will
be accepted without actually transmitting most of it.


1.2.  Background, History, and Context of this Draft

The strongest evidence for the importance of 8-bit transport is that
many vendors and implementors already support it over the usual SMTP
channels and many report that they have done so in response to intense
customer pressure.  Since the mechanisms that have been chosen have not
been standardized, messages containing octets with the high bit set may
"escape" the local environment.  Difficulties of varying degrees of
severity may arise when they do so, including information loss as
characters are "bit stripped", which may be considered a severe
violation of user expectations about reliable mail transport and
delivery.

This document has two primary purposes.  One is to specify a clear
extension model for SMTP, so potential problems with further extensions
can be avoided.  The second, and the original goal of the working
group, is to provide for 8-bit character-oriented transport via SMTP
when that is deemed necessary.  A critical secondary purpose is to
standardize mechanisms and clarify procedures in ways that prevent
destructive "escape" of improperly-identified 8-bit characters and
potentially even more severe problems which could otherwise result from
the transport of characters that comprise multiple octets or of data
not organized into character form.

In other words, transport of 8-bit characters is occurring, will
continue to occur, and is perceived as desirable under many
circumstances.  For it to coexist with older, more restricted
implementations, it must be used in a coordinated way and only
when both parties are able and willing to use it.  That, in turn,
requires a clear mechanism as to how coordination will occur and
agreement be verified.  This protocol extension provides that
mechanism.

2.  Notation and terminology

There are several situations in this document in which the bit pattern
associated with the code for a character is, in the event of possible
ambiguity, more significant than the character itself.  In those
situations, the bit pattern is cited (in hexadecimal notation) as the
value of the octet, and the referenced ASCII characters are then
indicated in parentheses.  When characters, or character names, are
mentioned, they are to be construed strictly in accord with ASCII, that
is, from American National Standard ANSI X3.4-1986.  However, for the
purposes of this specification, the "international reference version"
table in ISO 646 [ISO646] and that in ASCII are identical.

  <>  Discussion:  ISO 646 has traditionally contained two character
      tables.  One is called the "International Reference Version"
      (often referred to as ISO646/IRV) and has been identical to
      ASCII except for the substitution of "universal currency
      symbol" for "dollar sign".  The other is called the "Basic
      Version" (often referred to as ISO646/BV).  National Language
      Variants on ISO 646 (often referred to as ISO646/NLV-language)
      are derived from ISO646/BV by the substitution of national
      characters into positions that ISO646/BV designates as reserved
      for this purpose.  So-called "invariant ISO 646" is a large
      subset of the non-reserved characters in ISO646/BV.

Except in a few situations where the distinction is important, the
terms "8-bit characters", "8-bit text", and "8-bit transport" are used
interchangeably to refer to messages that might contain octets with the
high bit set to 1.  As above, when the distinction is important, the
term "octet" is used rather than "character" or "byte".  While this
document provides a framework that could be extended to the logical
transport of characters longer than 8 bits, it does not specify or
permit such transport.

 <> Discussion: Character codings in which individual characters occupy
    more than one octet (e.g., 16- or 32-bit character codes) may pose
    special problems for SMTP-style transport that 8-bit characters do
    not.  In particular, with some possible codings, some of the octets
    of some characters might have the same bit patterns as, e.g., CR
    LF.  Such character sets can always be handled by an encoding above
    the mail transport level, so that only conventional 7-bit or 8-bit
    characters are actually seen by the mail transport mechanisms.
    However, if they are to be transported in "native" form, transport
    extensions beyond those specified here will be required to ensure
    the unambiguous recognizability of CR, LF, and "." or to avoid the
    necessity of recognizing those characters.

For many years, we have used the term "gateway" in discussions of mail
to refer to something that operates at the applications level,
translating between different mail protocols or environments.  Such
gateways are quite distinct from gateways at the IP and routing level.
The term "mail gateway" should perhaps be used consistently to avoid any
possible confusion, but that confusion rarely occurs.  A similar
situation applies when we discuss "transport" in a mail context,
referring to "mail transport" and not the underlying transport layer.
With RFC-821, we have had a "mail transport" mechanism that logically
deals only with 7 bit characters, even though most of the underlying
transport layers deal in octets.  This document extends the logical mail
transport environment to full octet width, again largely independent of
the underlying mechanisms.  "Transport" in this document should always
be read as "mail transport" and not in terms of how the network carries
octets or packets.

This document uses the terms "byte" and "kilobytes" in several contexts.
"Byte", as used here, is taken to be an 8-bit quantity, not one of
variable size.  In particular, "kilobyte" is intended to be construed as
1024 octets, not as 1024 "characters".  Similarly, "length"
terminology in this document is always associated with bit and octet
storage units and not with the lengths of strings in character units.

The terms "client" and "sender" are used together or interchangeably to
indicate the source of a particular mail transaction and "server" is
used to describe the target.  This usage is consistent with that in
RFC-821.

This document uses the term "mail transaction" to indicate a use of
SMTP, with or without the extensions specified here, with the intent of
sending mail.  Mail transactions always start with a HELO or EHLO
command with the intent of following it with a FROM command and one or
more RCPT verbs.  Use of VRFY, EXPN, or the new EHLO, SIZE, or EVFY
commands without prior FROM commands in the SMTP session does not
constitute a mail transaction.  A mail transaction ends when a QUIT or
RSET is sent.  It may also be considered to end when the CRLF.CRLF that
terminates the message data following the DATA command is received;
in that case, a second mail transaction in the same session (i.e., on
the same connection) will normally start with a FROM command rather
than with HELO or EHLO.

  <>  Discussion: From the standpoint of the sender/client, a mail
      transaction is a matter of intent.  From the standpoint of the
      server, the state is "not a mail transaction" until a FROM
      command is received.
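
 <> Example: The following Python fragment is a non-normative sketch
    of how a server might track the "mail transaction" state described
    above.  The class and method names are hypothetical, and the sketch
    assumes each command line has already been read and stripped of its
    CR LF.

```python
class SessionState:
    """Hypothetical server-side tracker for the "mail transaction" state.

    The state is "not a mail transaction" until a FROM verb is received.
    It ends on RSET or QUIT, or when the CRLF.CRLF that terminates the
    message data is received.
    """

    # The FROM forms of RFC-821 plus EMAL FROM from this document.
    FROM_VERBS = {"MAIL", "SEND", "SOML", "SAML", "EMAL"}

    def __init__(self) -> None:
        self.in_transaction = False

    def on_command(self, line: str) -> None:
        verb = line.split(" ", 1)[0].upper()
        if verb in self.FROM_VERBS:
            self.in_transaction = True
        elif verb in ("RSET", "QUIT"):
            self.in_transaction = False
        # All other verbs (HELO, EHLO, VRFY, EVFY, EXPN, SIZE, RCPT,
        # DATA, ...) leave the state unchanged.

    def on_end_of_data(self) -> None:
        # After CRLF.CRLF, a further transaction on the same connection
        # will normally begin with another FROM verb.
        self.in_transaction = False
```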

This protocol provides the framework for several features that require
specification in additional RFCs before they can be validly used.  For
those features, which are clearly identified, this document provides
only syntax and, in most cases, an overview of the characteristics that
must be defined in feature-specific supplemental RFCs.  The major
feature that neither this document nor RFC-821 provides is a
specification for the transport of information that is not structured
into "characters" and "lines".  However, this document provides a
framework around which such a definition might be developed in the
future if that were desired.  Language in this document that implies
forms of enhanced transport other than that specified with the EMAL verb
has been retained to allow for compatible extension for additional
features.

Finally, this document contains many subsections that are identified
with the terms "discussion", "implementation note", or "example".
While not strictly part of the specification, these subsections
provide context for the features, guidance for implementors, and other
"folklore" about the intent of the working group that produced the
specification to aid understanding and the generation of interoperable
implementations.


3. Organization and summary

This document contains ten major components:

  (i) Definition of a new SMTP verb, EMAL FROM, as an alternative to
      MAIL FROM where 8 bit transport is desired.

 (ii) Provision of a framework for additional transport type extensions
      via the addition of additional "FROM" verbs.

(iii) Definition of a new SMTP verb, EVFY, which can be used to
      determine whether an enhanced transport request is likely to be
      accepted for a particular address.

 (iv) Definition of a new SMTP verb, EHLO, which can be used by
      a client as an alternative to HELO.  Either HELO or EHLO may be
      used when the client intends to use enhanced mail facilities, but
      the server response to EHLO provides the client with a structured
      listing of the mail features supported by the server.

  (v) Definition of a new SMTP verb, SIZE, which can be used by the
      client to inform the server of the approximate size of the data
      to be transferred.  The semantics of this verb and its
      interaction with maximum message size limitations are also
      specified.

 (vi) A discussion of the interaction of enhanced transport with message
      formats (i.e., RFC-822 material).

(vii) A description of enhancements to "trace" header fields to permit
      more efficient isolation of problems in today's more complex
      world.

(viii) Clarification, and additional specification, of the RFC-821
      description of the application and semantics of the RSET command.

 (ix) A discussion of failure and error conditions when server and
      client conform to this protocol.

  (x) A discussion of the handling of failure conditions in general with
      specific discussion of the alternatives in the "unverified eight
      bit encountered" error.  This problem might be encountered when
      the server conforms to this specification but the client does not.

Most of these sections impose requirements which are mandatory if the
protocol specified here is implemented.

Under some circumstances, such as the management of large mailing lists
or relays with large aggregate message traffic, the costs of opening a
mail connection and determining whether the destination host will accept
the enhanced features specified here may be considered excessive.
Additional specifications will be needed to provide methods for making
that determination using the domain name system or other tables or
caching methods.  This protocol does not require that those facilities
be used, nor does the use of those facilities change the actual mail
transport command sequence specified here.  Advice as to when
supplemental facilities are permitted or required to be used may appear
in future applicability statements.


4. The EMAL FROM verb

The SMTP protocol, as specified in RFC-821, is extended to permit the
use of a new verb that supplements the "handling" components of what we
shall refer to as the "FROM" verbs.  In other words, this specification
adds "EMAL FROM", as defined below, to the "MAIL FROM", "SEND FROM",
"SOML FROM", and "SAML FROM" forms specified in RFC-821.  This addition
also provides an explicit extension model for future transport
variations as needed.

If this new verb is to be taken as an acronym, "E" should be read as
"enhanced" or as "eight".

This is an extension in the traditional sense.  An implementation MUST
NOT be so constructed that it is possible for it to accept EMAL FROM in
a context in which it would not accept MAIL FROM.

 <>   Discussion: Mailing list discussion seems to indicate that this
      should be explicitly stated.  It is a statement for which
      conformance is easily tested "on the wire".

As specified in RFC-821, DATA is treated as introducing a stream of
ASCII (and therefore 7-bit) characters, divided into lines that are
delimited by the ASCII control characters CR followed by LF, with
potential restrictions on line lengths, and terminated with the
sequence "CR LF . CR LF".

If 8 bit transport is desired, the MAIL FROM verb is replaced by EMAL
FROM.  If the receiver does not recognize that verb (which will be the
case with all SMTP servers that conform to RFC-821 alone), or will not
accept enhanced mail features, it gives a fatal negative reply (500 if
the verb is not recognized, 556 if the verb is recognized but not
accepted). Such a reply would indicate that the sender MUST NOT send
octets with the high bit turned on.  Otherwise the receiver sends the
positive 250 reply identical to that normally returned in response to
MAIL FROM.  The sender can then proceed with the rest of the mail
transaction, sending a message containing 8-bit text after the DATA
verb.
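
 <> Example: A client's interpretation of the reply to EMAL FROM can be
    sketched in Python as follows.  This illustration assumes that only
    the reply code matters; the enumeration and function names are
    hypothetical, and codes other than 250, 500, and 556 are simply
    reported as outside the scope of this section.

```python
from enum import Enum


class EmalOutcome(Enum):
    SEND_8BIT = "octets with the high bit set may follow DATA"
    NO_8BIT = "sender MUST NOT send octets with the high bit set"
    UNEXPECTED = "reply not covered by this section"


def interpret_emal_reply(code: int) -> EmalOutcome:
    """Map the reply code to EMAL FROM onto the sender's next step."""
    if code == 250:
        # Same positive reply as for MAIL FROM: 8-bit text is acceptable.
        return EmalOutcome.SEND_8BIT
    if code in (500, 556):
        # 500: verb not recognized (e.g., an RFC-821-only server).
        # 556: verb recognized, but enhanced mail will not be accepted.
        return EmalOutcome.NO_8BIT
    return EmalOutcome.UNEXPECTED
```

    What the sender does after a NO_8BIT outcome is not shown here.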

All SMTP command verbs, including the enhanced FROM verbs, are written
in ASCII characters.  Nothing in this specification provides for any
character or coding other than those of ASCII to be used in SMTP
transactions ("the envelope").  It does provide for such characters in
the message body initiated by the DATA verb and terminated with CR LF .
CR LF.  The format of such messages is as specified in other RFCs,
e.g., RFC-822 [RFC822] and MIME [RFC1341].
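
 <> Example: The split above, an ASCII-only envelope and a possibly
    8-bit body, suggests two simple checks a sender might make before
    choosing between MAIL FROM and EMAL FROM.  This Python sketch uses
    hypothetical function names and assumes the body is already held
    as octets.

```python
def envelope_is_ascii(command_line: str) -> bool:
    """Envelope commands (FROM verbs, RCPT TO, etc.) must be pure ASCII."""
    return command_line.isascii()


def body_needs_enhanced_transport(body: bytes) -> bool:
    """True if any body octet has its high bit set, so EMAL FROM is needed."""
    return any(octet & 0x80 for octet in body)
```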


5.  Further extensions to SMTP

5.1 Provisions for adding additional FROM forms

Specifications may be written to add additional FROM forms which
should then be registered with the Internet Assigned Numbers Authority
(IANA) in accord with the provisions of section 5.3.

5.2 Provisions for other new verbs

The general approach used in this specification assumes that further
extensions to support different forms of transport that preserve a
character-and-line-oriented model (e.g., direct transport of character
sets that must be handled in special ways due to multiple-octet
encodings or transport-level data compression) will be handled by
adding additional forms of the FROM command, as discussed above.
Other transport arrangements (such as data "streaming" or transport of
true binary data), if introduced, may require additional or variant
commands and verbs.  Such new verbs may be registered with IANA, as
described below.  In addition, those that are intended for general use
should be documented in RFCs and submitted for standardization.

In order to permit experimentation, verbs starting with the character
"X" are reserved for use between consenting systems by mutual
agreement.  Any command without a rigorous and public definition must
be given a name starting in "X", and public (registered) values shall
never begin with "X".

 <> Discussion: All commands defined by RFC-821 and by this
    specification have precisely four characters in their first (or
    only) token.  It is likely that some mail systems depend on this
    property, which should be preserved unless there is reason for
    doing otherwise.
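
 <> Example: The naming rules of this section can be checked
    mechanically.  The Python sketch below, with hypothetical names,
    reports whether a command's first token is private (begins with
    "X") and whether it keeps the four-character property noted in the
    discussion above.  It does not consult any IANA registry.

```python
from typing import NamedTuple


class VerbInfo(NamedTuple):
    private: bool     # starts with "X": mutual agreement only, never registered
    four_chars: bool  # first token has exactly four characters


def classify_verb(command_line: str) -> VerbInfo:
    """Classify the first token of an SMTP command line."""
    verb = command_line.split(" ", 1)[0].upper()
    return VerbInfo(private=verb.startswith("X"), four_chars=len(verb) == 4)
```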


5.3 Specifications and Registration Procedures

Even when new features or verbs are not intended for general use, it
is undesirable that two different sets of systems use the same verb
in different ways.  The introduction of the EHLO functionality (see
section 7), which permits a client to interrogate a server about the
features supported by the latter, may exacerbate the potential
problems of identical verbs being used for different purposes, since a
client may discover a server-supported feature when no prior agreement
exists between the two hosts.  To avoid these problems, and to keep
the specification of EHLO useful, unambiguous, and meaningful, any
verbs used in SMTP processing must either be registered or must be
explicitly private.

Registration must be with the Internet Assigned Numbers Authority
(IANA) and may occur in either of the following forms:

(i) Commands intended for general use: These commands should be
developed and documented using IETF standards-track procedures.  The
RFCs or working drafts leading to them are expected to specify any and
all special treatment that these new verbs imply for transport.  Such
documents must also specify any deviations or exceptions to the rules
of section 9.1.  If the transport extensions being proposed have
implications for conversions at gateways, those conversions must be
discussed and, preferably, completely specified.  The verbs should be
registered when serious testing begins, but not later than approval of
the extension as a proposed standard.

(ii) Commands intended for use in special communities: the names of
these commands should be registered, along with a short description of
applicability, before the commands are placed into use.

In either case, verbs must be registered before any server announces
them in the response to an EHLO inquiry.

Completely private verbs -- those starting in "X" as discussed above
-- do not require registration and will not be registered.   Except
for short-term experimentation, use of such verbs is discouraged.


6. EVFY command

A new command verb, EVFY is defined, corresponding to VRFY (as defined
in RFC-821) and with the same argument, but requesting information as to
whether the address is acceptable for 8 bit transport.  EVFY has the
same reply codes as VRFY, but the successful 250 or 251 codes are
returned only if enhanced transport will be accepted for that address.
Code 556 must be returned if the address is acceptable, but enhanced
transport will not be accepted for it.  EVFY without an argument MUST be
treated as a syntax error.

Circumstances in which the address appears to be valid but is remote
or cannot be exactly verified for other reasons should be treated as
specified for VRFY in RFC-1123 [RFC1123].
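
 <> Example: A server's choice of EVFY reply might be sketched as
    follows.  The mailbox tables are hypothetical stand-ins for local
    configuration and the reply texts are illustrative only; the use
    of 501 (RFC-821's "Syntax error in parameters or arguments") and of
    550 for an unknown mailbox are assumptions of this sketch.  Remote
    or unverifiable addresses, which follow RFC-1123, are not modeled.

```python
from typing import Set


def evfy_reply(argument: str, known: Set[str], eight_bit_ok: Set[str]) -> str:
    """Choose an EVFY reply from hypothetical local mailbox tables."""
    address = argument.strip()
    if not address:
        # EVFY without an argument MUST be treated as a syntax error.
        return "501 Syntax error in parameters or arguments"
    if address not in known:
        return "550 Requested action not taken: mailbox unavailable"
    if address in eight_bit_ok:
        # 250 only when enhanced transport will be accepted for the address.
        return f"250 <{address}>"
    # The address is acceptable, but enhanced transport is not.
    return "556 Address valid, but enhanced transport not accepted"
```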


7. EHLO command

Clients may be able to act more intelligently if they can determine the
characteristics and capabilities of servers to which they expect to send
messages.  It is desirable for a client to be able to determine which of
the SMTP extensions defined herein are supported on a particular host.
Similarly, a client might wish to be able to determine whether other
optional features of RFC-821 such as SEND, SOML, or SAML are provided by
a given server.

 <>   Discussion:  Under some circumstances, it may be desirable to
      make some of these determinations in an out-of-band way.  This
      protocol does not prohibit such mechanisms and anticipates at
      least one of them.  See the discussion at the end of section 3
      above.

A new command verb, EHLO, is defined.  In order to minimize the number
of commands issued, if the special capabilities of EHLO are needed, it
is used as an alternative to "HELO", not as a separate command.  If a
server implementation provides support for any of the features
specified in this document or subsequently defined as specified in
section 5, it must accept and process EHLO.  The argument syntax for
EHLO is identical to that for HELO, i.e., the primary fully-qualified
domain name of the sender-SMTP.

Except for verbs starting in "X" (see the discussion in section 5),
all verbs supported in a particular server must be listed.  In
addition, LIMIT information must be provided as specified in section
7.3.  Other than verbs starting in "X", no verb may be listed that is
not specified in RFC-821 or this document, or registered as provided
for in section 5.  Verbs starting in "X" may be listed or not
depending on the particular needs of the server or private agreements
between server and client.

The term "supported", as used above, implies that the server provides
meaningful support for the capability implied by the command, rather
than just recognizing the verb.  For example, SEND FROM is not
"supported" if the command would be refused with all possible
predicates or if there are no possible addresses (in RCPT TO) that
will be accepted.  The response is also expected to reflect actual
system configuration and operation, e.g., if a server implementation
provides support for VRFY, but the command is disabled for security
reasons as provided for in RFC-1123, EHLO should not list that verb.

 <> Discussion: If VRFY is meaningfully supported--i.e., the server
    expects to actually confirm accessibility of addresses--then it
    should be listed even if some (or under some circumstances most or
    all) addresses that the server supports cannot be confirmed in
    real time.

The information provided by EHLO will usually be static for most
servers (at least once they are configured for a particular site).
However, since it might change from session to session in some cases,
clients should, in general, not cache the information between mail
connections.

 <> Discussion: There are several SMTP servers in use on the Internet
    that support and use verbs that are not specified in this document
    or in RFC-821.  Since RFC-821 takes no position on command
    extensions, these may be conforming implementations.  This
    specification does take such a position (above and in section 5)
    and therefore has much stronger conformance implications than
    RFC-821.  The intent with the EHLO verb and other enhanced
    capabilities is not to invalidate any existing RFC-821
    implementations that are valid in the absence of this
    specification. It is, however, intended to provide an "all or
    nothing" approach: if enhanced capabilities are supported (e.g.,
    EHLO is accepted at all) then all of the stricter requirements of
    this specification apply.  In particular, if these enhanced
    capabilities are supported, then any (non "X") verbs that do not
    appear either here or in RFC-821 must be registered with IANA and
    reported by EHLO.


7.1 Server considerations

If an EHLO command is received, the server must return a formatted
message that consists of a multiple-line 255 reply, using the
continuation convention specified in RFC-821.  The first line of this
response will be the primary fully-qualified domain name of the
receiver-SMTP.  Each subsequent line will name one verb supported by
the server, except for the last line, which carries a special LIMIT
indicator with two values (see section 7.3, below).  All verbs
supported by the server must be included in the reply, including those
specified in RFC-821, in this document, and in extensions provided for
in section 5.

For example, a typical receiving server supporting this protocol might
respond to an EHLO command with:
            255-foo.domain
            255-HELO
            255-MAIL FROM
            255-VRFY
            255-EXPN
            255-RCPT TO
            255-EHLO
            255-EMAL FROM
            255-EVFY
            255-SIZE
            255-RSET
            255-QUIT
            255-DATA
            255 LIMIT 64 3000
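
 <> Example: A client might parse a reply like the one above with the
    following Python sketch.  The names are hypothetical; it assumes
    the reply lines have already been read from the connection with
    their CR LF removed, and it treats every line other than the first
    and the LIMIT line as a supported verb.

```python
from typing import List, NamedTuple, Optional, Tuple


class EhloInfo(NamedTuple):
    domain: str
    verbs: frozenset                  # e.g. {"MAIL FROM", "EMAL FROM", ...}
    limit: Optional[Tuple[int, int]]  # (<low>, <high>) in kilo-octets


def parse_ehlo_reply(lines: List[str]) -> EhloInfo:
    """Parse a multiple-line 255 reply to EHLO."""
    if not lines:
        raise ValueError("empty EHLO reply")
    domain = ""
    verbs = set()
    limit = None
    for i, line in enumerate(lines):
        code, sep, text = line[:3], line[3:4], line[4:].strip()
        is_last = i == len(lines) - 1
        # RFC-821 continuation: "255-" on every line but the last,
        # "255 " on the last one.
        if code != "255" or sep != (" " if is_last else "-"):
            raise ValueError(f"malformed EHLO reply line: {line!r}")
        if i == 0:
            domain = text
        elif text.startswith("LIMIT "):
            _, low, high = text.split()
            limit = (int(low), int(high))
        else:
            verbs.add(text)
    return EhloInfo(domain, frozenset(verbs), limit)
```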

 <>   Discussion: 255 was chosen using the "positive completion" and
      "mail system" model of RFC-821.  This is really a system response
      in context, rather than a purely informational (x1z) one.  The
      terminal "5" is arbitrary, motivated by a desire to leave a
      little space after 251.  Note that the normal response to HELO is
      250, not 255.

Servers supporting EHLO MUST NOT return "502 Command not implemented"
for any verb that they list in their response to EHLO.

Servers that support any enhanced verb (i.e., a verb that appears in
this specification) that does not appear in RFC-821 MUST NOT return
"500 Syntax error" in response to EHLO.

 <>  Discussion: The statements above are part of the process of
     binding all of the enhanced commands specified here together on a
     basis in which implementations are expected to support all of
     them together, rather than picking and choosing.  At least one
     example of these rules could be stated as "EMAL support does not
     conform to this specification if the server that purports to
     support EMAL does not support EHLO; EHLO support does not conform
     if it does not return all of the 'supported' verbs (as defined
     above); and an EHLO implementation does not conform if it returns
     any (non-'X') verb strings that are not in this specification, in
     RFC-821, or registered with IANA."


7.2 Sender (client) considerations

A client that chooses to send an EHLO command to a server will receive
either host identification and a capability list or a "500 Syntax error"
indication (if the EHLO verb is not supported).  If a capability list is
received, the client MUST NOT send verbs not listed.  If a 500 reply is
received, the client MUST NOT attempt to send EMAL or other verbs that
do not appear in RFC-821.
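
 <> Example: The two rules above can be expressed as a small predicate.
    In this hypothetical Python sketch, ehlo_verbs is the set of verbs
    from a 255 reply, or None if EHLO drew a 500 reply.  A client that
    never sent EHLO is not constrained by these rules and is not
    modeled.

```python
from typing import FrozenSet, Optional

# Command verbs (first tokens) defined in RFC-821.
RFC821_VERBS = frozenset({
    "HELO", "MAIL", "RCPT", "DATA", "RSET", "SEND", "SOML",
    "SAML", "VRFY", "EXPN", "HELP", "NOOP", "QUIT", "TURN",
})


def client_may_send(verb: str, ehlo_verbs: Optional[FrozenSet[str]]) -> bool:
    """verb is written as an EHLO listing would show it, e.g. "EMAL FROM"."""
    if ehlo_verbs is None:
        # EHLO got 500: only verbs that appear in RFC-821 may be used.
        return verb.split(" ", 1)[0].upper() in RFC821_VERBS
    # EHLO got a capability list: only listed verbs may be sent.
    return verb in ehlo_verbs
```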

A client that chooses not to send an EHLO command may, in parallel with
the RFC-821 model, attempt to use any desired facility and determine
its availability based on the response codes.

 <> Discussion:  EHLO Response Caching.
      A client is permitted to send whatever commands it likes if it does
      not send EHLO.  The worst that can happen is that it will get a
      rejection reply, and that can happen regardless.  With the
      exception of the size limits, the other capabilities can be
      thought of as optimizations--if you can use them in a particular
      situation, then things will work a little better, or smoother, or
      faster, or...  And it will be a rare case indeed that a host will
      start supporting a given enhanced feature and then stop supporting
      it.

      So caching EHLO responses, and consequently occasionally making a
      "wrong" inference, is not a serious threat, unlike picking up a
      bad address from a DNS cache.  It would be sensible (although not
      necessarily ideal) to operate on the basis that, if you don't
      like the answer you get, you don't cache it for very long at all;
      if you do like the answer, you keep it cached until you get a
      rejection message.
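 <> Implementation sketch: A minimal, non-normative illustration of
    the caching policy just described.  The class name EhloCache and
    the five-minute lifetime for unfavorable answers are hypothetical
    choices, not values taken from this specification; a favorable
    answer is kept until a rejection is reported for that host.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical lifetime, in seconds, for an unfavorable EHLO result.
UNFAVORABLE_TTL = 300.0


class EhloCache:
    """Remembers, per host, whether EHLO indicated enhanced support."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def store(self, host: str, enhanced: bool) -> None:
        self._entries[host] = (enhanced, self._clock())

    def lookup(self, host: str) -> Optional[bool]:
        """Return the cached answer, or None if nothing usable is cached."""
        entry = self._entries.get(host)
        if entry is None:
            return None
        enhanced, stored_at = entry
        if not enhanced and self._clock() - stored_at > UNFAVORABLE_TTL:
            # An answer we "don't like" is only kept briefly.
            del self._entries[host]
            return None
        return enhanced

    def note_rejection(self, host: str) -> None:
        """A rejection invalidates even a favorable cached answer."""
        self._entries.pop(host, None)
```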

 <> Discussion: In practical terms, if a client sends "EHLO" to a
    server and receives a response starting in 5, it is explicitly
    justified in assuming that the server does not support 8-bit
    transport according to this specification.  If it gets a sequence
    of 255 responses, it is explicitly justified in assuming that the
    server does support 8-bit transport as specified here, as well as
    the additional features (such as SIZE verification) that this
    specification requires be supported if EHLO is specified.

7.3 The LIMIT Reply of EHLO

As message sizes grow it becomes progressively more useful for
connecting clients to know whether or not a server will be able to
accept a message of a given size.  The LIMIT reply of EHLO returns two
values that are set by the administrator of the server to guide the
client in the handling of large messages.  The format of the reply is
as follows:

   255 LIMIT <low> <high>

Both <low> and <high> are non-negative integer counts referring to
message sizes (length during transport) in kilo-octets.  <low> is the
size below which the server will accept a message under normal
conditions.  <high> is the size above which the server will not accept
a message.  <high> must be greater than or equal to <low>, which must,
in turn, be greater than or equal to zero.  Messages with sizes between
<low> and <high> may be accepted by the server, depending on available
resources.
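 <> Implementation sketch: The following non-normative Python
    fragment shows one way a client might parse a LIMIT line and check
    the ordering constraint stated above.  The function name
    parse_limit, and the decision to raise ValueError for a malformed
    or inconsistent reply, are assumptions made for illustration.

```python
from typing import Tuple


def parse_limit(line: str) -> Tuple[int, int]:
    """Parse a '255 LIMIT <low> <high>' line into (low, high).

    Both values are in kilo-octets; ValueError is raised if the line is
    malformed or violates 0 <= <low> <= <high>.
    """
    parts = line.split()
    if len(parts) != 4 or parts[0] != "255" or parts[1].upper() != "LIMIT":
        raise ValueError("not a LIMIT reply: %r" % line)
    low, high = int(parts[2]), int(parts[3])
    if not 0 <= low <= high:
        raise ValueError("LIMIT values must satisfy 0 <= low <= high")
    return low, high
```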

 <> Implementation note: Servers may, for various administrative
    reasons, not want to give out exact limits.  In practice, limits may
    also depend on the designated recipient, with some users able to
    receive larger messages than others even on a given host.  For these
    and other reasons, the values <low> and <high> should be taken as
    general guidance, but not as absolute figures.  Note that existing
    provisions of RFC-1123 imply minimum values of
                               LIMIT 64 64
    for Internet hosts.


 <> Discussion: A client can follow up this information by using the
    SIZE verb (see below) to determine if the server is willing to
    accept messages between <low> and <high>.

                      <low>|                  |<high>
      |---------------------<================>---------------------|
       Server will accept       Ask Server         Server will reject



8. SIZE verb

A new command verb, SIZE, is defined.  Server implementations that
provide support for the enhanced mail verbs must accept and process
SIZE.  SIZE does not specify an exact message length, but an upper
bound, in kilobytes, on what will be transmitted as the "message",
i.e., the number of octets to be placed on the wire after the DATA verb
and up to the terminating "CR LF . CR LF" sequence.  It is intended for
server capability verification purposes and not as an alternative
means of delimiting the end of the message body.

 <> Discussion: This type of estimated size has two characteristics
    that exact sizes (byte or bit lengths) do not.  First, it may be
    convenient to estimate this type of size from crude file system
    measures (e.g., "number of records" or "number of pages"), while a
    specific length may require careful examination of the data stream
    for, e.g., "." characters appearing at the beginning of a line.
    Second, it is not unusual to change internal end of line conventions
    to SMTP CRLF, to remap character sets, or to perform encodings to
    different transport conventions dynamically, rather than storing the
    transport-encoded mail file prior to transport (see "client
    considerations", below, for additional discussion of this estimating
    process).  Various widely-practiced transport behaviors (e.g.,
    deletion of trailing blanks), while undesirable, also can distort
    exact sizes.  It is possible with a crude upper-bound size to
    statistically estimate the effects of these transformations, while
    exact sizes require creation (or careful simulation) of the file to
    be transported and possibly simulation of the transport mechanism.

The argument to SIZE is a numeral specifying the predicted maximum
message length in kilobytes of the message that is part of the current
mail transaction.  A SIZE agreement (i.e., sending of the command by
the client and positive reply by the server) extends only through the
end of the next DATA command.

 <>   Discussion: Kilobytes, rather than bytes, were chosen to stress
      the fact that this is an estimate, rather than a precise
      value, and to prevent anyone from trying to infer end-of-message
      from it.   The use of a maximum-kilobyte estimate is also intended
      to smooth over most of the differences among file systems in terms
      of representation of end of line, widths of characters, and so on.
      However, the intent is that this be an estimate of the only
      length that has a canonical meaning, that is, the number of
      octets actually being transported, rather than the length in
      either the sending or receiving file system.

SIZE should be sent, if at all, after the FROM command and prior to
any RCPT commands.  SIZE is not meaningful outside a mail transaction;
EHLO should be used to obtain similar information.
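 <> Implementation sketch: A minimal, non-normative Python fragment
    showing how a client might compute the SIZE argument (rounding an
    octet count up to whole kilobytes, since SIZE is an upper bound)
    and place it between the FROM command and the first RCPT.  It
    assumes that one kilobyte is 1024 octets and that EMAL FROM takes
    its argument in the same "FROM:<address>" form as RFC-821 MAIL; the
    function names are hypothetical.

```python
from typing import List

KILO = 1024  # assumption: one kilobyte = 1024 octets


def size_kb(max_octets: int) -> int:
    """Round an upper bound in octets up to whole kilobytes."""
    return (max_octets + KILO - 1) // KILO


def transaction_commands(sender: str, recipients: List[str],
                         max_octets: int) -> List[str]:
    """Order commands: enhanced FROM, then SIZE, then RCPTs, then DATA."""
    commands = ["EMAL FROM:<%s>" % sender, "SIZE %d" % size_kb(max_octets)]
    commands += ["RCPT TO:<%s>" % rcpt for rcpt in recipients]
    commands.append("DATA")
    return commands
```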


8.1 Server considerations

If a SIZE command is received as part of a mail transaction, the server
SHOULD make one of three types of replies:

(1) An acceptance reply, normally "250 OK", indicating that a message of
up to that size may be accepted.  A server MUST NOT make this reply and
then reject, for reasons under its control, a message whose transport
size is less than the limit specified.

(2) A temporary rejection reply, normally "452 insufficient system
storage", indicating that a message of the specified size cannot be
accepted at present, but that this is a temporary restriction.  This
response means that the requested size is acceptable to the server
at other times.

(3) A fatal error reply, normally "552 message size too large",
indicating that a message of the specified size is not acceptable to
this host.

 <>   Discussion:  This distinction is made for those cases where the
      size limitation may be quite transient and consistent with the
      sender's requeuing the message for retry and delivery later.
      Examples of such limitations would be such traditional problems
      as "system disk full", but not "we expect a new system release
      next week that has higher limits".  Of course, if it is feasible,
      it would be better for systems in these transient situations to
      accept the message and queue or store it locally, but these could
      be very large messages, at least in principle.

 <>   Implementations of support for SIZE should take care to ensure
      that SIZE does not become a conduit for denial-of-service attacks.

Use of SIZE outside a mail transaction is discouraged, and no semantics
are defined for it there.  Enhanced servers should reject it as a
syntax error or, preferably, with "503 SIZE not accepted without a
FROM verb".
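 <> Implementation sketch: The server's choice among the replies
    described in this section might look like the non-normative Python
    fragment below.  The parameters hard_limit_kb (the largest message
    the host will ever accept) and free_kb (what current resources
    allow) are hypothetical; how a real server derives them is a local
    matter.

```python
def size_reply(in_transaction: bool, size_kb: int,
               hard_limit_kb: int, free_kb: int) -> str:
    """Choose the server's reply to 'SIZE <size_kb>'.

    hard_limit_kb: largest message this host will ever accept.
    free_kb: what current resources (e.g. spool space) allow right now.
    """
    if not in_transaction:
        return "503 SIZE not accepted without a FROM verb"
    if size_kb > hard_limit_kb:
        return "552 message size too large"        # fatal
    if size_kb > free_kb:
        return "452 insufficient system storage"   # temporary
    # Accepting is a promise not to reject, for reasons under the
    # server's control, a message whose size is below this value.
    return "250 OK"
```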


8.2 Sender (client) considerations

As part of a mail transaction, a sender MAY send the SIZE command for
messages whose expected length is below 64 kilobytes.  Senders are
encouraged to send the SIZE command or use EHLO or out-of-band
information to verify normal capacities for messages whose expected
length is larger than 64 kilobytes.  Server rejection of the SIZE
command as a syntax error (not permitted from enhanced servers within
a mail transaction) SHOULD be construed by the sender as "no
information", and the sender should behave as it would have behaved
had the SIZE verb not been sent.

If the server accepts the SIZE command but rejects the particular size
requested with a temporary or fatal reply code, the sender may either
abandon the mail transaction (sending QUIT or RSET) or may continue
with it.  However, it is not intended that SIZE become a subject for
iterative negotiation between sender and receiver; senders MUST NOT
send a second SIZE command within the same mail transaction.
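 <> Implementation sketch: A minimal, non-normative Python helper for
    the client side of these rules.  It refuses to emit a second SIZE
    within one mail transaction and classifies the server's reply;
    whether to abandon or continue after a refusal is left to the
    caller, since the text permits either.  The class name SizeTracker
    and its method names are hypothetical.

```python
class SizeTracker:
    """Client-side bookkeeping for SIZE within one mail transaction."""

    def __init__(self) -> None:
        self.size_sent = False

    def new_transaction(self) -> None:
        """Call after RSET or after a message has been completed."""
        self.size_sent = False

    def size_command(self, kb: int) -> str:
        if self.size_sent:
            raise RuntimeError("SIZE already sent in this transaction")
        self.size_sent = True
        return "SIZE %d" % kb

    @staticmethod
    def interpret(code: int) -> str:
        """Classify the server's reply to SIZE."""
        if code == 250:
            return "agreed"
        if code == 500:
            return "no-information"   # behave as if SIZE was never sent
        if 400 <= code < 500:
            return "temporary-refusal"
        if 500 <= code < 600:
            return "permanent-refusal"
        return "unexpected"
```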

 <>   Discussion: Nothing other than good sense prevents a client from
      wildly overestimating a SIZE and, for obvious reasons,
      overestimating is better than underestimating.  Overestimated
      sizes may, of course, result in unnecessary rejections. It seems
      unreasonable to require that servers enforce the limits to which
      they earlier agreed, although most will presumably enforce some
      limit at or above the accepted size.  This is consistent with
      the general model of SIZE as specifying a sloppy value.

 <> Discussion and implementation note: One of the difficulties in
    estimating the amount of data to be transmitted, and hence the
    value to be sent with SIZE, arises when the internal storage
    conventions of the originating host use a single character end of
    line convention, or some other marking or counting convention,
    rather than CR LF.  In this situation, some implementations have
    historically created a file in Internet canonical mail transfer
    form, i.e., with doubled leading periods and CR LF line delimiters,
    while others have converted to the Internet form as lines are read
    in and actually transmitted.  In the latter case, while the file
    size on the local host may be readily determined from the file
    system, the actual number of octets to be transmitted is not known
    until after all of them have been sent.  If such an implementation
    does not wish to scan the file and count line delimiters for
    performance reasons, the worst-case estimate of SIZE for systems
    using single-character line delimiters is twice the number of
    characters in the file (expressed in kilo-octets).  This worst case
    would be reached if the file consisted either only of line
    delimiters or of alternating periods and line delimiters.

    Since the optimal value to be sent with SIZE for files with no
    line-starting periods is the internal length of the file plus the
    number of lines it contains (that is, adding one extra character per
    line), a considerably better estimate than one for the worst
    case may be obtained by knowing the average or typical line length
    in the file and dividing it into the file size to obtain the
    number of lines.  In some cases, such as those in which a message
    composing agent performs line wrapping and filling functions,
    typical line length information might be obtained from that agent.
    In others, a much better-than-worst-case estimate may be obtained
    statistically by sampling the lengths of lines in the file,
    preferably by probing at random or by examining lines at several
    different points in the file, or, if that is not feasible, by
    examining the first several lines.  Similar logic applies when
    "lines" in the internal file system are denoted in some fashion
    that does not involve an end-of-line character sequence, e.g., by
    carrying character counts for each line.
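 <> Implementation sketch: Both estimates discussed above can be
    illustrated in a few lines of Python.  This non-normative sketch
    assumes a file stored with a single LF as its line delimiter and
    1024-octet kilobytes.  It probes lines at evenly spaced points
    (one of the sampling strategies mentioned, chosen here so results
    are repeatable) and, like the "optimal value" described above,
    ignores leading periods in the sampled estimate.  All function
    names are hypothetical.

```python
import math

KILO = 1024  # assumption: one kilo-octet = 1024 octets


def kb_ceiling(octets: int) -> int:
    return (octets + KILO - 1) // KILO


def worst_case_size_kb(internal_length: int) -> int:
    """Every stored character may become two octets on the wire."""
    return kb_ceiling(2 * internal_length)


def sampled_line_length(data: bytes, probes: int = 8) -> float:
    """Mean length (delimiter included) of lines found at evenly
    spaced points in the file."""
    lengths = []
    for i in range(probes):
        pos = len(data) * i // probes
        if pos == 0:
            start = 0
        else:
            newline = data.find(b"\n", pos)
            if newline == -1:
                break
            start = newline + 1          # first full line after pos
        end = data.find(b"\n", start)
        if end == -1:
            break
        lengths.append(end - start + 1)
    if not lengths:
        return float(max(len(data), 1))  # treat the file as one line
    return sum(lengths) / len(lengths)


def estimated_size_kb(data: bytes, probes: int = 8) -> int:
    """Internal length plus one octet per line (LF becomes CR LF)."""
    lines = math.ceil(len(data) / sampled_line_length(data, probes))
    return kb_ceiling(len(data) + lines)
```

    For a file of 100 lines of 80 octets each (8000 octets), the
    sampled estimate is 8 kilobytes, matching the 8100 octets actually
    sent, while the worst-case rule gives 16.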


8.3  Server replies to RCPT in an implementation supporting SIZE.

A relay (or post office host) that can not accept a message of some
specified size may provide the client with the next hop information, in
the hope that the next hop is either the final destination or can relay
a message of this size.  If this information is to be supplied, it
should be provided via the message
      "559 Too big, deliver to user at ..."
which parallels the RFC-821 message code 551.

 <> Example: Many sites implement (typically via DNS MX records) a
    single host as the normal receiver of all mail for the site or
    organization.  This host may, however, have limited resources
    relative to overall demands on mail flow into the organization. On
    the other hand, particular users may have powerful workstations
    that do not have the same resource constraints, so that sending
    large messages directly to them might permit larger messages to be
    delivered.

 <> Discussion: Server designers contemplating this strategy should be
    aware that few sending systems have the capability of dealing with
    551 codes automatically; these codes typically cause messages to be
    rejected and "bounced" to the user.  Presumably the new 559 code in
    this case will get much the same treatment.

Even if a SIZE command is sent and accepted, a server is permitted to
reject messages based on size for individual addresses (i.e., after
receipt of RCPT TO) by responding to the delivery address with code 552.
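 <> Implementation sketch: A non-normative Python fragment showing
    how a relay might choose its reply to RCPT TO on size grounds.  The
    parameters limit_kb and next_hop are hypothetical; next_hop is
    treated as an opaque string because the exact form of the text
    following "deliver to user at" is not specified here.  When no SIZE
    was sent, the size can still be enforced after DATA, as described
    below.

```python
from typing import Optional


def rcpt_size_reply(size_kb: Optional[int], limit_kb: int,
                    next_hop: Optional[str] = None) -> str:
    """Reply to RCPT TO when a SIZE may have been given.

    size_kb is None when no SIZE was sent; limit_kb is this host's limit
    for the recipient; next_hop is an optional host known to accept more.
    """
    if size_kb is None or size_kb <= limit_kb:
        return "250 OK"
    if next_hop is not None:
        return "559 Too big, deliver to user at " + next_hop
    return "552 message size too large"
```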

Since a client may send large messages without first sending SIZE, or
may, in principle, send messages larger than the size it specified, a
server may reject a message as being too long if it exceeds a specified
size (or if size is unspecified) as provided for in RFC-821, i.e., by
returning a 552 (preferred) or 554 code after the data are received.


9. Interaction with the message format and headers.

Both RFC-821 and RFC-822 explicitly reference "ASCII" as the character code
in which all text is written and with which it is interpreted.  The
introduction of an enhanced transport mechanism introduces a potential
ambiguity, since, while there is only one ASCII, there are many
character sets and mechanisms using 8-bit and longer coding.  This has
two implications:


9.1 Message format.

When sending a message using EMAL FROM, the message format MUST conform
to MIME and, in particular, to its provisions for specifying message
body types and character sets.  Hence, message body-parts that contain
8-bit data may do so only in a fashion consistent with MIME.


9.2 Header character set.

With the exception of the "trace" or "time stamp" fields specified in
RFC-821 and RFC-822 and elaborated upon below, this specification
imposes no requirements on mail header fields other than those in
section 9.1 above.  Trace fields must be entirely in ASCII, using the
leading zero (high-order bit cleared) form specified in RFC-821 if 8
bit underlying transport is in use.
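 <> Implementation sketch: A relay or gateway could check the trace
    fields of a header for octets with the high-order bit set, as in
    the non-normative Python fragment below.  It assumes CR LF line
    delimiters, treats "Received:" and "Return-Path:" as the RFC-822
    trace fields, and unfolds continuation lines before checking; the
    function names are hypothetical.

```python
from typing import List

# RFC-822 trace fields.
TRACE_FIELDS = (b"received:", b"return-path:")


def unfold(header: bytes) -> List[bytes]:
    """Split a CR LF delimited header into logical (unfolded) fields."""
    fields: List[bytes] = []
    for line in header.split(b"\r\n"):
        if line[:1] in (b" ", b"\t") and fields:
            fields[-1] += b"\r\n" + line   # continuation line
        elif line:
            fields.append(line)
    return fields


def bad_trace_fields(header: bytes) -> List[bytes]:
    """Trace fields containing any octet with the high-order bit set."""
    return [f for f in unfold(header)
            if f.lower().startswith(TRACE_FIELDS)
            and any(octet & 0x80 for octet in f)]
```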

 <>   Discussion: Additional requirements about other header fields
      do appear in RFC-822 and RFCs that supplement it.  This
      specification neither relaxes nor increases those requirements.


10. Trace fields

RFC-821 specifies that mail transport agents add time stamps as trace
information to messages they are processing.  RFC-822 specifies, in
section 4.3 (especially 4.3.2), the format of these ("Received")
fields for relayed messages.  RFC-822 indicates that additional "via"
and "with" values should be registered but none have been as of the
date of this document.

While the tracing information specified in these earlier documents has
proven useful, the Internet and its mail handling have evolved so that
an audit trail that only documents relay and delivery activities has
become inadequate.  In particular, messages may be converted from one
character set to another, formats may be altered, and address strings
may be changed at gateways.  These transformations, and information
about where and how they were performed, should be included in the
audit trail.  The extensions to the transport mechanism contemplated
here involve further complications, since gateways may be called upon
to convert between one transport format and another, an activity that
may require significant analysis and transformation of the message
itself.

The principle of providing and maintaining trace and audit trail
information is reaffirmed and extended.  Any mail transport facility,
including gateways within the Internet and gateways from other mail
systems, that relays, converts, translates, or otherwise modifies an
enhanced mail message MUST add one or more "Received" fields to the
message to document these changes.  Mail transport facilities that
relay, convert, or translate traditional SMTP mail are encouraged to
do so.  The intent here is to insist that any change to a message as
it passes through a transport, other than adding the Received line, be
documented, and documented fairly explicitly.


The list of "Received" parameters in RFC-822 is extended to include

  ["convert" atom "to" atom ["to" atom]...]

These represent, respectively, the character set and/or transport form
received by the relay or gateway and the character set and/or
transport form produced by the relay or gateway.  "ASCII" and
"EBCDIC", the keyword "8-bit", all of the transport encodings
permitted in MIME, the keywords "7-bit-MIME" and "8-bit-MIME"
(designating MIME over 7 and 8 bit paths respectively), and the
keyword "unknown" (discussed below) are explicitly permitted for use
with "convert" and "to".

 <> Discussion: "MIME7" and "MIME8", while obvious and more attractive
    alternatives, almost guarantee future confusion with, e.g., "version
    seven of MIME".

Servers providing "Received" lines of this sort are explicitly
encouraged to supplement the atoms associated with "convert" and "to"
with parenthesized comments that provide prose descriptions of
decisions made and actions performed when those might be helpful in
subsequent understanding or debugging.

When structured messages are converted from one MIME format to another,
or from another format to structured MIME messages, the conversion will
typically occur on individual body parts, not homogeneously for the
entire message.  These cases should be documented using body part
conversion trace fields embedded in the message according to MIME
conventions.  "to 7-bit-MIME" is to be used in conjunction with such
per-body-part conversion trace fields, to indicate that such fields
appear and that the specific conversion information appears in them.

 <>  Discussion: It is assumed that "convert 8-bit to 7-bit-MIME"
     will appear only if the message entering the gateway was
     determined to be in 8-bit form, but was not compliant with this
     specification in terms of verification of capabilities or use of
     MIME formats.  See section 13 below.  "Convert 8-bit-MIME to
     7-bit-MIME" or "convert 7-bit-MIME to 7-bit-MIME" would both
     indicate, and imply the presence of, conversion trace information
     on a per-body-part basis in the message body.

As is the case for "with", multiple "to" parameters may be specified in
a single "Received" header to denote multiple transformations.
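 <> Implementation sketch: Building the "convert ... to ..." clause,
    including multiple "to" parameters and an optional parenthesized
    comment, might look like the non-normative Python fragment below.
    The MIME transport encodings listed are those defined by MIME
    (7bit, 8bit, binary, quoted-printable, base64); matching atoms
    case-insensitively and rejecting anything else are assumptions of
    this sketch, and the function name is hypothetical.

```python
from typing import Iterable, Optional

# Atoms this section explicitly permits with "convert" and "to"; other
# atoms would have to be registered first.
PERMITTED_ATOMS = {a.lower() for a in (
    "ASCII", "EBCDIC", "8-bit", "7-bit-MIME", "8-bit-MIME", "unknown",
    "7bit", "8bit", "binary", "quoted-printable", "base64")}


def convert_clause(source: str, targets: Iterable[str],
                   comment: Optional[str] = None) -> str:
    """Build 'convert <atom> to <atom> [to <atom>]... [(comment)]'."""
    targets = list(targets)
    if not targets:
        raise ValueError("at least one 'to' atom is required")
    for atom in [source] + targets:
        if atom.lower() not in PERMITTED_ATOMS:
            raise ValueError("unregistered atom: %r" % atom)
    clause = "convert " + source + "".join(" to " + t for t in targets)
    if comment:
        clause += " (" + comment + ")"
    return clause
```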

 <> Discussion: If the relevant atoms were registered, this permits,
    e.g., "convert ASCII to PostScript to G3Fax..." although, under
    most circumstances, the starting and ending conversions within a
    given host are really all that is required.  In practice, a
    specification that detailed would normally appear as "convert
    7-bit-MIME to 7-bit-MIME", with additional information specified
    on a per-body-part basis within the message.

As provided elsewhere in this specification, servers may choose to
accept messages or protocol negotiations that are invalid in one or
more respects and transform them into an acceptable form (presumably
using external information) rather than returning them.  In these
situations, at least some of the information about the format of the
incoming message cannot be known with certainty or specified with
registered keywords.  "Convert unknown to..." should be used to denote
this situation, and the clause should be supplemented with a comment
that indicates what was assumed about the incoming message, what
actions were applied to it, or both.

  <> While there may be other cases, it is explicitly intended that
     "convert unknown" be used when the incoming message is invalid in
     the opinion of the server and the server attempts to "fix" the
     message before relaying it or passing it through a gateway.  A
     parenthesized comment should be used to describe the fix applied.

For the purposes of the "with" parameter, the original protocol
specified by RFC-821 should be designated by "smtp", as indicated
there.  If the extensions of this protocol are used, "esmtp" should be
used.


11. RSET and related RFC-821 issues

RFC-821 is not specific about exactly what the RSET verb resets.  This
has apparently not been a problem in the past because of the simplicity
of the protocol.  This enhanced protocol includes additional commands
and state information, making a more precise definition desirable.  The
definition provided should not constrain any existing RFC-821
implementation since it is consistent with both the current practice
and the only two plausible interpretations.

RSET is to be interpreted by SMTP servers as clearing state information
present in a session.  In particular, it eliminates the effect of any
prior FROM commands, any DATA, and any delivery addresses.  It resets
the server's state to "not a mail transaction" (see section 2).

RSET has been interpreted by some SMTP servers as requiring that a new
HELO command be sent after RSET is acknowledged.  Other servers assume
that the previous HELO is not reset.  Servers SHOULD accept a HELO
command subsequent to RSET without special comment, overriding a
previous one if necessary.  Servers MUST NOT require a HELO command
after a RSET.

 <> Discussion: The description above summarizes the current situation
    with SMTP implementations based on a series of experiments.  No
    implementations have been identified that reject a second HELO, but
    it would not be surprising to find one.

While the SMTP protocol provides for multiple destination (RCPT)
commands, other state-inducing commands (e.g., the choice of MAIL,
EMAL, or SEND with FROM) provide exclusive information that it is not
meaningful to specify more than once in a given mail transaction.  If a
second instance of a state-inducing command appears in a given mail
transaction, the server MAY either accept it, overriding earlier
information, or may reject it as an out-of-sequence command with a "503
bad sequence of commands" code.  A client sending more than one of
these commands within a mail transaction MUST be prepared to send RSET
and start over, or to send QUIT and abandon the session, if 503 is
received in this case.  Clients SHOULD, if possible, behave in a way
that avoids this situation.

 <> Discussion: The issues above do not arise in the normal case of
    multiple successful message transmissions in the same session,
    since each successful message completion (i.e., server receipt of
    DATA, the message, CR LF . CR LF, and then sending a positive
    completion reply) results in terminating a mail transaction.
    Clients SHOULD NOT send RSET after receipt of a 250 response after
    DATA and the message; servers MUST reset their states after sending
    that 250 response and MUST NOT require clients to send RSET before
    the next xxxx FROM command, where "xxxx" is "MAIL", "SEND",
    "SOML", "SAML", "EMAL" or some future extension as specified in
    section 5.1.

 <> Discussion: This involves another nasty and intrusive bit of
    reality about which RFC-821 is vague.  Where something as
    meaning-laden as an enhanced FROM verb is involved, we
    can't leave this to chance. The discussion above prohibits the
    "use the first and ignore all the rest" and the "pick one to
    believe at random" cases.  Some SMTP servers have been observed
    experimentally to work in the "accept the last one" model
    outlined.
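 <> Implementation sketch: The server-side rules of this section can
    be modeled as a small state object, shown below as a non-normative
    Python fragment.  RSET and message completion both return the
    session to "not a mail transaction" without clearing HELO; a HELO
    is accepted at any time; and a repeated FROM is either accepted
    (the last one wins) or refused with 503, selected by the
    hypothetical accept_repeated_from flag.  Verbs other than those
    shown are deliberately not modeled, and all names are
    hypothetical.

```python
from typing import List, Optional, Tuple

FROM_VERBS = {"MAIL", "SEND", "SOML", "SAML", "EMAL"}


class Session:
    """Server-side transaction state for RSET, HELO, FROM and RCPT."""

    def __init__(self, accept_repeated_from: bool = True) -> None:
        # Policy choice: override a repeated FROM, or answer 503.
        self.accept_repeated_from = accept_repeated_from
        self.helo: Optional[str] = None
        self.reset()

    def reset(self) -> None:
        """Return to 'not a mail transaction'; HELO is left alone."""
        self.from_cmd: Optional[Tuple[str, str]] = None
        self.recipients: List[str] = []

    def command(self, verb: str, arg: str = "") -> str:
        verb = verb.upper()
        if verb == "RSET":
            self.reset()
            return "250 OK"
        if verb == "HELO":
            self.helo = arg                  # overrides any earlier HELO
            return "250 OK"
        if verb in FROM_VERBS:
            if self.from_cmd is not None and not self.accept_repeated_from:
                return "503 bad sequence of commands"
            self.from_cmd = (verb, arg)      # the last one wins
            return "250 OK"
        if verb == "RCPT":
            if self.from_cmd is None:
                return "503 bad sequence of commands"
            self.recipients.append(arg)
            return "250 OK"
        raise NotImplementedError("%s is not modeled here" % verb)

    def message_complete(self) -> str:
        """DATA, the message, and CR LF . CR LF have been received."""
        self.reset()                         # no RSET needed before next FROM
        return "250 OK"
```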

12. Failure and error conditions

12.1 RFC-821 behavior with unrecognized verbs.

While it is not quite explicit, RFC-821 appears to expect that, if a
verb is not recognized by the receiver, it will reject the command with
a "permanent error", 5yz, code, presumably 500 (Syntax error).
Similarly, it appears to specify that, if the sender receives such a
code, it must either abandon the mail message (sending QUIT or RSET,
presumably) or do something else involving the same or a different verb;
it may not simply ignore the 5yz error code and pretend it was a 2yz (or
354) code.  This specification depends on that behavioral model.

Consistent with RFC-821, we specify that existing SMTP servers are to
reply with a return code of 500 (Syntax error) when any unfamiliar verb
is received.

 <>   Discussion: The material above should probably have made it into
      RFC-1123, but some of the issues--particularly the fact that
      anyone could ever have believed that anything else (such as
      simply ignoring 5yz codes) was permitted--have emerged only in
      the process of the investigation leading to this specification.
      Nonetheless, this clarification is believed to be consistent
      with existing usage and implementations of SMTP.

 <>   That belief has been reinforced by fairly extensive testing-by-
      probing of existing implementations.  No implementations
      exhibited catastrophic failure upon receipt of an unknown verb
      and all of those probed responded to such verbs by returning a
      "500 Syntax error" response.  At the same time, it is impossible
      to verify that unknown commands will not cause subtle state
      changes in servers.  Consequently, SMTP clients SHOULD respond
      to a "syntax error" reply by sending RSET and starting over,
      rather than assuming the state of the remote machine.
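 <> Implementation sketch: The client behavior recommended above
    (never treating a 5yz reply as success, and sending RSET rather
    than assuming the server's state) is illustrated by the
    non-normative Python fragment below.  The send callable, which
    transmits one command and returns the numeric reply code, is a
    hypothetical stand-in for a real connection; reply classes not
    discussed in this section are simply not handled.

```python
from typing import Callable


def try_command(send: Callable[[str], int], command: str) -> bool:
    """Send `command`; on a 5yz reply, RSET and report failure."""
    code = send(command)
    if 200 <= code < 300 or code == 354:
        return True
    if 500 <= code < 600:
        # Do not ignore the error, and do not trust the remote state.
        send("RSET")
        return False
    raise RuntimeError("reply %d is not handled in this sketch" % code)
```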


12.2 Responses when EMAL is recognized.

An SMTP server which does implement this specification may nonetheless
respond to the EMAL verb or its variations with an error message.  The
new code 556 is assigned to this purpose, to be construed as "enhanced
transport not accepted" if it appears in response to EMAL FROM.
Presumably this would occur only if the originator address (the
parameter to EMAL FROM) was unacceptable for enhanced transport for
some reason.

556 may also be returned in response to one or more of the RCPT commands
if the refusal is destination-specific.  More specifically, a receiving
implementation that conforms to this specification MUST return 556
rather than 550 (or some other code) if it would accept 7-bit mail for a
particular address but would not accept enhanced transport for it.
Conversely, 550 must be returned when the recipient would be rejected in
either case.
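 <> Implementation sketch: The 556/550 rule for RCPT can be written
    as a small decision function.  In this non-normative Python
    fragment, the three boolean inputs (whether the address would take
    7-bit mail, whether it would take enhanced transport, and whether
    the current transaction is enhanced) are hypothetical stand-ins
    for whatever local knowledge a server has.

```python
def rcpt_transport_code(accepts_7bit: bool, accepts_enhanced: bool,
                        enhanced_transaction: bool) -> int:
    """Reply code for RCPT TO under the 556/550 rule."""
    if not enhanced_transaction:
        return 250 if accepts_7bit else 550
    if accepts_enhanced:
        return 250
    # Would take the mail in 7-bit form, but not via enhanced transport.
    return 556 if accepts_7bit else 550
```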

 <>   Discussion: Ideally, a server that can accept a particular
      enhanced transport option at all should be able to
      accept it for any destination for which it accepts mail.  In
      practice, that may sometimes not be the case.  In addition, the
      general design of RFC-821 permits a server to decline to accept a
      particular piece of mail for any particular destination for any
      reason.  Consequently, it is not possible to prohibit a server
      from accepting enhanced mail and subsequently rejecting a
      delivery address.  Our design choices in the matter are limited
      to whether to permit RCPT TO to deliver 556 (indicating that the
      particular transport type is not acceptable) or whether it
      should be restricted to one of existing (RFC-821) non-delivery
      codes.

From the sender's point of view, one could appropriately deduce that a 556
error in response to EMAL or some future enhanced FROM verb indicates
that enhanced transport is not accepted from the sending host.  A 556 in
response to a RCPT verb would indicate that enhanced transport is not
accepted for that particular address.

 <> Discussion: Server designers should be aware that accepting
    enhanced transport (e.g., 8-bit EMAL FROM) for mail to a given
    destination and then bouncing it is likely to be disruptive to the
    general mail environment, especially if the originating system was
    prepared to send mail in 7-bit form if necessary.  Consequently, it
    is desirable for servers not to accept such mail unless they can
    guarantee delivery if the address is otherwise acceptable.  This
    implies that servers should be prepared either to cause conversion
    of the message to an acceptable 7-bit MIME form (e.g., by sending
    it to an appropriate gateway) or to have out-of-band information
    available that permits them to determine the feasibility of
    enhanced delivery to a final destination without first accepting
    the mail.  Conversely, it implies that mail client
    systems and those setting up, e.g., DNS records for particular
    hosts, should endeavor to prevent rejections from arising.

If enhanced transport is accepted, and there is a subsequent delivery
failure that necessitates the generation of a notification message (see
RFC-1123, section 5.3.3), the error message text itself should be
prepared using only ASCII graphic characters.

 <> Discussion: If the notification message contains the original
    content text, that message will normally have to be returned using
    enhanced transport if it was received using enhanced transport.
    Other provisions of this specification imply that if this is not
    feasible (e.g., the notification message must be returned over a
    path that does not support enhanced transport), the server
    generating the notification must either be prepared to convert the
    message content in a loss-less way to a 7-bit form, or not attempt
    to return the content.

If the specified enhanced transport verb is acceptable for the context
specified in the mail transaction, then, when the DATA command is
received, the server should return the same "354 Go ahead, terminating
with CR LF . CR LF" message normally produced in response to that
command.  Any of the other codes returned by DATA may be returned also.


12.3 Sender action in response to fatal errors.

The action to be taken by the sender if 500, 556, or any other
500-series code is returned is not defined by this specification,
other than by the limitation imposed above that "something
else" must be done.  In other words, these codes MUST NOT be ignored,
and octets with the high bit turned on (or extended-length characters)
MUST NOT be transmitted unless an enhanced FROM command has been sent
and acknowledged with a 250 code, AND a 250 or 251 reply has been
received in response to at least one of the RCPT commands.
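 <> Implementation sketch: A sender could apply the rule above just
    before transmitting message data, as in the non-normative Python
    fragment below.  It treats EMAL as the only enhanced FROM verb
    (future ones would be added to the set) and checks only for octets
    with the high bit on; detecting extended-length characters is not
    attempted.  The function names are hypothetical.

```python
from typing import Iterable

# Enhanced FROM verbs; only EMAL is defined by this specification.
ENHANCED_FROM_VERBS = {"EMAL"}


def has_high_bit(body: bytes) -> bool:
    return any(octet & 0x80 for octet in body)


def high_bit_permitted(body: bytes, from_verb: str, from_code: int,
                       rcpt_codes: Iterable[int]) -> bool:
    """Whether `body` may be sent as far as high-bit octets go."""
    if not has_high_bit(body):
        return True   # the high-bit rule does not apply to 7-bit data
    return (from_verb.upper() in ENHANCED_FROM_VERBS
            and from_code == 250
            and any(code in (250, 251) for code in rcpt_codes))
```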

The sender may, however, send a RSET and renegotiate the transfer after
preparing to send data in a different form.  The transformations
permitted by the rules under "Interaction with the message format and
headers" (section 9) above are available to hosts providing
intra-Internet gateway services between transport types.  Of course,
the originating or destination system environments may make other
transformations in messages appropriate to their knowledge of their
own environments.

 <>   Discussion and pointer:  That paragraph contains the new version
      of the conversion weasel-words.  With the understanding that
      alternate text would be gratefully welcomed, what it intends to
      do is to incorporate the Freed compromise.  Restated very crudely
      into the authority model, the originating UA can do whatever it
      wants, and can delegate that authority to anything within its
      local system environment.  The definition of "local system
      environment", the mechanism for delegation, and whether that
      delegation can be assumed within a local system environment are
      administrative questions beyond the scope of this (or any other)
      specification.  Similarly, the ultimate destination UA can do
      whatever it wants, and can delegate that authority to anything
      within its local system environment (same definitions and
      qualifications).

 <>   Nothing is said about gateways into non-Internet environments:
      While this further points out the importance of a "mail gateway
      requirements and guidelines" RFC, we have agreed that is a
      separate problem and one that we must avoid trying to solve here
      if we ever want to converge.

The only conversions that are explicitly conformant to this
specification involve gateways providing loss-less conversions between
valid MIME formats, presumably by conversion to appropriate 7 bit
transport formats and adding Content-Transfer-Encoding fields to
reflect the result of the transformations.  In particular, an enhanced
mail gateway MUST NOT attempt to convert between character sets or
transport encodings by discarding high-order bits or octets.
Similarly, conversion from one character set to another requires
knowledge of both character sets and, as such, is not a transport
activity.
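The Python sketch below illustrates the difference between a lossless conversion and the forbidden discarding of high-order bits. It re-encodes a single-part body as quoted-printable using the standard quopri module, then contrasts that with bit stripping. It is only a sketch under stated assumptions: the function names are hypothetical, and quoted-printable is just one of the MIME transfer encodings a gateway might choose (base64 is the usual choice for binary content). Multipart messages would need each part to be handled separately.

```python
import quopri
from typing import Dict, Tuple


def to_7bit_mime(body: bytes) -> Tuple[Dict[str, str], bytes]:
    """Losslessly re-encode a single-part body for 7-bit transport.

    Returns the header field to add (or replace) and the encoded body.
    Any charset label is left untouched: changing character sets is not
    a transport activity.
    """
    encoded = quopri.encodestring(body)
    return {"Content-Transfer-Encoding": "quoted-printable"}, encoded


def strip_high_bits(body: bytes) -> bytes:
    """Bit stripping, which this specification forbids; shown for contrast."""
    return bytes(octet & 0x7F for octet in body)


original = b"caf\xe9\n"
headers, wire = to_7bit_mime(original)
assert all(octet < 0x80 for octet in wire)     # safe for 7-bit transport
assert quopri.decodestring(wire) == original   # nothing was lost
print(strip_high_bits(original))               # b'cafi\n': 0xE9 became 'i'
```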



12.4 Mail relays, mail gateways, and this protocol.

While it is not explicit in RFC-821, there is a general principle that
mail transport facilities should not alter, or even inspect, the
message itself.  There is already a small exception to this in the
requirement for receivers to add trace information and "time stamp"
("Received") lines (RFC-821, page 21; RFC-1123, section 5.2.8).
Although this document respecifies the trace information, it is
intended to avoid making further exceptions unless necessary and to be
specific about those that are necessary.

If a mail gateway is used to transform the message from an enhanced
transport form to a 7-bit transport form, the resulting message MUST
conform to the formats specified by MIME for 7-bit transport.  This may
require that it understand the content and structure of messages
written in that format, since mechanical translation (e.g., character
set encoding) of a message that uses extended-length characters in
conjunction with MIME may not produce a resulting message that is
compliant with that specification.

 <>  Discussion/Translation into plain English: Nothing in this
     document permits nested encodings if MIME does not permit them.

The responsibility for ensuring that a message transmitted with EMAL
FROM is in MIME format falls largely on the originator.  Section 13.2
discusses violations of this principle.


12.4.1 Review of present RFC-821 status and requirements.

Under a number of circumstances, an RFC-821 SMTP sender implementation
may be called upon to deliver mail, not to a final destination, but to
an intermediary (relay or gateway site) or to the address of a mailing
list exploder.  The sender may have no way to know that it is dealing
with an intermediary.  An intermediate mail system may not be able to
verify, e.g., addresses during the SMTP negotiation.  RFC-1123
explicitly provides (section 5.2.7) for intermediate systems to return
"ok" 250 codes for addresses that cannot be verified, only to send mail
messages with error indications back when addresses fail after the SMTP
connection is closed.  Such failure could occur on the local host
(e.g., local list expansion) or remotely (e.g., in a relay's SMTP
processing with the next host in sequence).  Consequently, a mechanism
that we might describe as "whoops, that isn't really something that can
be delivered as specified" already exists in many SMTP server
implementations, especially those that operate as relays or mail
gateways.  To the degree their implementation models require, clients
must be prepared to deal with such delayed responses as well as
immediate ones.  And, as discussed above, returning messages to users
as undeliverable is an acceptable (and normal) response to a receiver's
rejection of enhanced forms of transport.

At the same time, mail gateways are permitted to accept one address
from a sender for delivery and then carry out significant
transformations of that address (and even the message) before passing
it along to the actual delivery host, or the next host in sequence.
While RFC-821 provides for altering host names (section 3.6) and RFC-1123
provides for header, address, and protocol modifications (section
5.3.7), nothing in any Internet standard protocol to date attempts to
completely specify this behavior in the general case.


12.4.2 Relay behavior.

The basic model described above for RFC-821 is not changed by this
protocol.  While hosts that accept 8 bit messages for relaying may
be prepared to "downgrade" such messages to seven bit transport, they
cannot be required to do so.  A receiver may reject a request for
enhanced transport or for any specific transport type, regardless of
whether the request comes directly from the originating host or some
intermediary.  A relay host that accepts enhanced transport of a
particular type must be prepared for a host to which it attempts to
pass the message to reject that option.  Hosts may agree to enhanced
transport and specific types and then "bounce" messages by mailing
error indications as specified in RFC-1123 and above, just as they may
accept mailbox designations and then bounce those messages.
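A relay's choices when the next host declines enhanced transport might be sketched as follows. The policy and the enumeration are assumptions for illustration, as is the treatment of 4xx replies as grounds for queueing and retrying. This document requires only that the relay be prepared for the refusal. It permits the downgrade but does not require it.

```python
from enum import Enum


class NextHop(Enum):
    SEND_ENHANCED = "forward with EMAL FROM"
    RETRY_LATER = "temporary failure; queue and retry"
    DOWNGRADE = "convert to 7-bit MIME and forward with MAIL FROM"
    BOUNCE = "mail an error indication back to the originator"


def relay_next_step(emal_reply: int, can_downgrade: bool,
                    is_valid_mime: bool) -> NextHop:
    """Hypothetical relay policy after offering EMAL FROM to the next host."""
    if emal_reply == 250:
        return NextHop.SEND_ENHANCED
    if 400 <= emal_reply < 500:
        return NextHop.RETRY_LATER
    # Permanent refusal (for example 556, or 500 from a host that does
    # not recognise EMAL).  Downgrading is optional for the relay and is
    # possible only with gateway capability and a valid MIME message.
    if can_downgrade and is_valid_mime:
        return NextHop.DOWNGRADE
    return NextHop.BOUNCE
```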

13.  Treatment of other important protocol violations

13.1  Receipt of 8 bit characters without prior verification

Since it is known that some Internet hosts now send 8-bit characters
without performing the verification specified in this document (i.e.,
sending EMAL and getting a positive response), servers should be robust
enough to avoid self-destruction if non-compliant behavior of this type
is encountered.

As mentioned above, sending SMTPs MUST NOT transmit octets with the
high bit non-zero without first successfully negotiating 8-bit
transport with the receiver.  Receivers are not required to enforce
this requirement beyond the degree needed to prevent their destruction
if this rule is violated.  If a receiver encounters octets with the
high bit set after a DATA command, it MUST select one of the following
three alternatives to be conforming to this specification:

  (i) It may reject the message with a 520 error code, indicating an
attempt to send invalid data over the transmission channel.  This
reply SHOULD NOT be sent until the terminating CR LF . CR LF has been
received.

  (ii) It may deliver the message in 8 bit form if it knows that such
delivery can be made reliably and without loss of information (if it is
the destination MTA) and may transparently relay the message (using MAIL
FROM) as received (if it is a relay MTA).

  <> Discussion:  This option has the effect of [almost] encouraging and
     permitting a strategy that is otherwise taken as explicitly
     non-conforming: the sending of 8 bit data over a conventional 821
     connection with MAIL FROM.  The context for this option should be
     carefully understood.  To be in this situation, the delivery or
     relay host has received a non-conforming message from a
     non-conforming sender.  This rule exists to permit the relay to
     forward the message "without making things any worse", i.e., to
     cause the same non-conforming message to be delivered to the
     destination as would have been delivered had the relay not been
     involved.  And it permits the final delivery SMTP server to do
     whatever it decides to do for (or to) its users, a situation that
     is impossible to restrict anyway.

  (iii) If sufficient information is available to make the conversion
and it has gateway capabilities, it may convert the message to a valid
MIME form consistent with seven bit transport and forward or deliver
the message in that form.  This requires either that the received 8 bit
message be in MIME form, so that, e.g., character sets can be reliably
determined, or that the MTA have access to reliable out of band
information about the character set(s) present in the message.  MTAs
MUST NOT attempt to guess at information not explicitly supplied in
incoming messages in order to perform conversions of this type.

 <> Discussion: The options above deliberately and explicitly prohibit
    the practice of relays "bit stripping" messages (i.e., zeroing the
    high-order bit) as a conversion method to 7 bit transport.  This
    technique loses information; the severity of the information loss is
    a function of the actual message content and the perceptions of the
    user, but can be quite significant.
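The three alternatives, together with the requirement that a 520 reply wait for the end of the data, might be sketched as the server-side Python below. The order of preference among (i), (ii) and (iii) is hypothetical local policy, as are the flag names (can_convert, charset_known, may_pass_through) and the Disposition enumeration. The specification leaves the choice to the receiver. No branch strips bits.

```python
from enum import Enum
from typing import Iterable


class Disposition(Enum):
    NORMAL = "no unverified 8-bit data; process as usual"
    REJECT_520 = "(i) reply 520 once the end of data has been seen"
    PASS_THROUGH = "(ii) deliver or relay the message unchanged"
    CONVERT = "(iii) convert to 7-bit MIME, then deliver or relay"


def collect_data(lines: Iterable[bytes]) -> bytes:
    """Read DATA lines (CR LF already removed) through the lone "." line,
    undoing RFC-821 dot-stuffing, so no reply is sent before end of data."""
    body = []
    for line in lines:
        if line == b".":
            return b"".join(part + b"\r\n" for part in body)
        body.append(line[1:] if line.startswith(b".") else line)
    raise ConnectionError("connection closed before end of data")


def choose_disposition(body: bytes, used_emal: bool, *, can_convert: bool,
                       charset_known: bool,
                       may_pass_through: bool) -> Disposition:
    """Hypothetical local policy; bit stripping is deliberately absent."""
    if used_emal or not any(octet & 0x80 for octet in body):
        return Disposition.NORMAL
    # (iii) requires gateway capability and explicit charset information
    # (MIME labels or reliable out-of-band data); never guess.
    if can_convert and charset_known:
        return Disposition.CONVERT
    # (ii) requires local knowledge that nothing will be lost.
    if may_pass_through:
        return Disposition.PASS_THROUGH
    return Disposition.REJECT_520
```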

13.2. Receipt of non-MIME message bodies after receipt and acceptance
of EMAL.

This protocol requires that a message transmitted after EMAL is used in
a mail transaction conform to the MIME format (see section 9.1).  A
receiving SMTP server or relay is not required to detect failure to
conform to this requirement.  However, if the server does do so, it may
reject the message and should use a 558 error code in the rejection.  A
gateway which is otherwise inspecting or modifying the message body
assumes responsibility for the messages it forwards and, consequently,
must either reject invalid messages or transform them into valid form
without loss of information (paralleling the discussion in section
13.1).
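The optional check described above could be approximated as follows. This is a deliberately crude sketch. It treats the absence of a MIME-Version header field as evidence that the message is not in MIME format, which is an assumption rather than a complete validation. The function name is hypothetical, and the sketch uses only Python's standard email package.

```python
from email import message_from_bytes
from typing import Tuple


def check_body_after_emal(raw: bytes) -> Tuple[int, str]:
    """Crude, optional MIME check for a message received after EMAL FROM.

    Treats a missing MIME-Version header field as proof that the body is
    not MIME; a fuller check would also validate Content-Type and
    Content-Transfer-Encoding.
    """
    message = message_from_bytes(raw)
    if message["MIME-Version"] is None:
        return 558, "Invalid message format (not MIME)"
    return 250, "OK"
```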


14. Compliance summary

A server implementation supporting any of these verbs other than SIZE
must support all of them.  SIZE might plausibly be supported in
implementations that do not support the other verbs (which does not
make that implementation fully conform to this specification), but must
be supported in implementations that support the others.  A server must
not require that EHLO precede the use of other verbs specified here.
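The server-side rules above amount to a simple classification of the set of verbs an implementation offers. The Python sketch below assumes that EHLO, EMAL, EVFY and SIZE are the verbs in question (the ones named in Appendix A). The function name and the returned labels are hypothetical. The rule that EHLO need not come first is not modelled.

```python
from typing import Set

# Assumed to be the complete set of verbs this document adds
# (taken from the response-code summary in Appendix A).
EXTENSION_VERBS = {"EHLO", "EMAL", "EVFY", "SIZE"}


def classify_server(supported: Set[str]) -> str:
    """Classify a server's verb set against the compliance rules above."""
    offered = {verb.upper() for verb in supported} & EXTENSION_VERBS
    if not offered:
        return "RFC-821 only"
    if offered == {"SIZE"}:
        return "SIZE only: permitted, but not fully conforming"
    if offered == EXTENSION_VERBS:
        return "fully conforming"
    return "non-conforming: partial support"
```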

A client may attempt to use any of these verbs, but must observe
responses to ensure that the server verifies its willingness to accept
them.  Some of those responses constrain further action on the part of
the client, as discussed above.  For example, if the client asks for a
capabilities list (via EHLO), it must not send commands that are not
represented on the list received.  Similarly, if a receiving SMTP
rejects the EMAL FROM command, the client must not attempt to transport
8-bit information with the DATA command.
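The two client-side constraints above can be tracked with a small amount of session state. In the sketch below the ClientSession class and its methods are hypothetical. The EHLO reply is assumed to have been parsed into a list of verbs already, since parsing the reply text is not shown.

```python
from typing import Iterable, Optional, Set


class ClientSession:
    """Hypothetical client state enforcing the two constraints above."""

    def __init__(self) -> None:
        self.capabilities: Optional[Set[str]] = None  # set only after EHLO
        self.eight_bit_verified = False

    def record_ehlo(self, verbs: Iterable[str]) -> None:
        # 'verbs' is the already-parsed list from the EHLO reply.
        self.capabilities = {verb.upper() for verb in verbs}

    def may_issue(self, verb: str) -> bool:
        # Without an EHLO list the client may try any verb, but it must
        # still act on the reply it gets.
        return self.capabilities is None or verb.upper() in self.capabilities

    def record_emal_reply(self, code: int) -> None:
        # Only a 250 reply to EMAL FROM verifies 8-bit transport.
        self.eight_bit_verified = code == 250

    def may_send_8bit(self) -> bool:
        return self.eight_bit_verified
```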

Nothing in this specification imposes any requirements that clients wait
for responses to particular commands before issuing the next one(s) that
are not imposed by RFC-821 or the logic of the commands themselves.
However, several of the provisions above imply that a client must
synchronize with verifications (affirmative responses) from the server
before actually sending the message body.


15. References

[ANSI-X3.4] American National Standards Institute, "Coded Character
   Set--7-Bit American Standard Code for Information Interchange",
   ANSI X3.4-1986.

[Gianone] Gianone, Christine M., "A Kermit Protocol Extension for
   International Character Sets", Columbia University, 1990
   (unpublished paper).  Available via anonymous FTP from
   watsun.cc.columbia.edu:kermit/e/isok5.txt.

[RFC821] Postel, J., "Simple Mail Transfer Protocol", RFC-821,
   August 1982.

[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
   Messages", RFC-822, August 1982.

[RFC1123] Braden, R., "Requirements for Internet Hosts -- Application
   and Support", RFC-1123, October 1989.

[RFC1342] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
   Mail Extensions): Mechanisms for specifying and describing the
   format of internet message bodies", RFC-1341, June 1992.

[ISO646] International Organization for Standardization,
   "International Standard--Information Processing--ISO 7-bit coded
   character set for information interchange", ISO 646:1983.


16.   Acknowledgements

This document represents a synthesis of the ideas of many people and
reactions to the ideas and proposals of others.  Randall Atkinson,
Craig Everhart, Risto Kankkunen, and Greg Vaudreuil contributed ideas
and text sufficient to be considered co-authors.  Other important
suggestions, text, or encouragement came from Harald Alvestrand, Jim
Conklin, Mark Crispin, Frank da Cruz, Dave Crocker, Ned Freed, 'Olafur
Gudmundsson, Per Hedeland, Christian Huitema, Neil Katin, Eliot Lear,
Harold A. Miller, Dan Oscarsson, Einar Stefferud, Rayan Zachariassen,
and probably several others.  Of course, none of these people are
necessarily responsible for the combination of ideas represented here.
Indeed, in some cases, the response to a particular criticism was to
accept the problem identification but to include an entirely different
solution from the one originally proposed.


17. Security considerations.

  This RFC does not discuss security issues and is not believed to
raise any security issues not endemic in electronic mail and present
in fully conforming implementations of RFC-821.  It does provide, via
the EHLO verb and response, an announcement of system mail
capabilities, but all of the information provided can be readily
deduced by selective probing of the verbs required to transport and
deliver mail.  Similarly, as discussed above, capabilities such as those
provided by the SIZE verb might be used for crude attempts at denial of
service attacks, but, unless implementations are very weak, there is no
problem here that has not always existed with SMTP.


18.  Editor's Address

John C. Klensin
Department of Architecture
Room N52-457
Massachusetts Institute of Technology
Cambridge, MA 02139
USA
 tel: 617 253 1355 (international: +1 617 253 1355)
 fax: 617 491 6266 (international: +1 617 491 6266)
 email: Klensin@MIT.EDU

------------------------------
Appendix A

New response codes (or response codes used in new ways) introduced in
this document.

  255   EHLO: Normal informational response
  452   SIZE: insufficient system storage
  503   SIZE: not accepted without a FROM verb
  503   Redundant use of any state-inducing command: Use of, e.g., MAIL
        FROM twice in the same mail transaction, or both MAIL FROM and
        EMAL FROM.
  520   DATA: Invalid data on transmission channel (8-bit data
        encountered after MAIL FROM command or non-MIME data after EMAL
        FROM).
  552   SIZE: message size too large
  552   RCPT: message size too large for this specific address.
  556   EVFY: Address ok, enhanced transport not accepted.
  556   EMAL: Enhanced transport not accepted.
  556   RCPT: Enhanced transport not accepted (destination specific)
  558   DATA: Invalid message format (i.e., not MIME) encountered
        after EMAL FROM command.
  559   RCPT: user not local, please try... (if message can be delivered
        directly, but not to the originally-specified address)


**********
[Temporary] Appendix B:

Changes for the 15 July version arising from the Boston IETF and
subsequent fine-tuning discussions.
  (Changes to draft-ietf-smtpext-8bittransport-05.txt)
  -- Additional language for the expanded trace fields, including a
     discussion of bogus messages.
  -- More specific language about what should be returned when EHLO is
     sent and how the response should be construed.
  -- Change of title to remove "text-based".
  -- Clarification of the role of "discussion" sections.
  -- Several more typos and small clarifications.


Changes for the 24 June version arising from on-list discussion of the
draft produced after the San Diego IETF.
  (Changes to draft-ietf-smtpext-8bittransport-04.txt)
  -- Reinstated general extension mechanism in section 5.
  -- Cleaned up some section numbering and made several small editorial
     changes.
  -- Tightened definition of EHLO with regard to features and future
     extensions.


Changes from the 12 March version arising from discussion at the March
1992 (San Diego) IETF and discussion subsequent to that meeting:
 (Changes to draft-ietf-smtpext-8bittransport-03.txt)
  -- Consolidation of the "capabilities" notion (former CPBL verb) with an
     enhanced "hello" command (EHLO).
  -- Added discussion of a possible 551 response to RCPT TO in the
     context of a "too large" message announced by SIZE (section 8.3).
  -- Added some new text to the compliance summary (section 14).
  -- Replacement of "negotiate" with the more precise "verify"
  -- Removed several remaining textual artifacts of incomplete or
     rejected design ideas.
  -- Removed open issue in EVFY and replaced all placeholders with
     text.
  -- Removed explicit disclaimer that these features are not asserted
     to be necessary.
  -- Changed language about failure/reporting of inability to deliver
     in 8-bit format to a Discussion and made explicit provision for
     non-return of content.
  -- Replaced the "conformance summary" placeholder with text.
  -- Defined the reply syntax from EHLO/CPBL and the associated size
     model.
  -- Defined and clarified the use of SIZE.
  -- Added new subsection to discuss the error handling when non-MIME
     message bodies are received with EMAL.
  -- Fixed text to be less tedious about "ANSI X3.4".
  -- Addition of brief discussion/explanation about the issues
     associated with "long" (multiple octet) characters.

Changes from the 22 November version
(draft-ietf-smtpex-8bittransport-02.txt)
  -- Fixed several small typographical errors
  -- Removed a few residual vestiges of very wide transport and
     envelope character set specifications.
  -- Replaced several different types of references to what is now MIME
     with that designation.
  -- Removed several server requirements for the 8->7 boundary,
     adopting the "conformance on the wire" and "separate requirements
     doc" approach agreed to on the mailing list.  In particular, the
     "wretched solution" has been removed or, more exactly, downgraded
     to a discussion note as agreed upon on the mailing list.
  -- Improved rules for server when unnegotiated 8-bit is encountered,
     per mailing list.  These are a change in tone, but not a real
     change in requirements.
  -- Removed "tentative decision" identifiers in all areas in which no
     disagreement has been expressed on the list, since these tentative
     agreements were discussed in Santa Fe.
  -- Made the rule requiring all hosts that support EMAL to also
     support MAIL explicit as the result of mailing list discussion.
  -- Provided a thinly-disguised forward pointer to the MXE proposal.

**********
[Temporary] Appendix C.

Outstanding issues, RFC-ZZZZ-03.  12 March 1992.
 ** Items marked ** have been fixed or eliminated as problems in the
   first or second drafts of version 4. **

**(1) Should the remaining discussion paragraphs be retained somewhat longer
or removed at this time?  (general)

**(2) Is the extension mechanism for alternate "FROM" forms adequately
specified, and, if not, how should it be specified (Section 5, IESG/IANA
issue).

**(3) Does EVFY need to distinguish between "cannot verify remote address"
and "cannot verify remote address and whether 8-bit mail can be delivered
to it"?  (Section 6).

**(4) Syntax model for CPBL (Section 7.1)

**(5) (placeholder) Syntax for SIZE replies (Section 7.1)

**(6) (placeholder) Improved wording needed for the trace field requirement
(Section 10)

**(7) Need a pair of Received keywords to replace 7/8-bit-MIME. (section 10)

**(8) Granularity of "Received: ...convert...to" (Section 10)

**(9) Errors returned by different paths than messages were sent over and
non-return of content. (Section 12.2)

**(10) (placeholder) The compliance summary is still a placeholder
(section 14).


---------------------

The following additional items, left over from November, will go away as
indicated unless serious proposals appear early in the San Diego meeting
or sooner.

(11) Open issue: The CPBL functionality gives us a way to explicitly
specify how further extensions beyond those of this document (including
"private" ones) might be tested.  In addition to possibly the usual words
about "X"s, we could *require* that the attempted use of any verb not
specified in a standard or near-standard RFC must be preceded by the use
of CPBL to verify that the server supports it.  My bias that being
explicit reduces later problems makes a small argument for including some
text to this effect.  Anyone who feels strongly one way or the other
should speak up.
   Default decision: Defer (punt)

**(12) Extensions/ mechanisms for formatted error messages when such
messages are mailed back.  There are really two separate problems here:
an encapsulation model (MIME extension) for returning the content of
8-bit messages over 7-bit channels and a canonical representation and
taxonomy for mailed error responses.  Note that these are primarily
MIME problems; RFC-ZZZZ mostly just needs to point to the solution.
Also note that not solving the encapsulation problem implies non-return
of content in some cases.
  Default decision: if agreement cannot be reached, the language in
RFC-ZZZZ that permits non-return of content in some cases will be
strengthened.  The canonical message form problem is one we have been
living with since before RFC-821 and is not on the critical path for
RFC-ZZZZ.

**********