Network Working Group                                   A. Melnikov (Ed.)
Internet Draft                                             Isode Limited
Expires: February 2007                                       August 2006


                          SMTP Language Extension
                     draft-melnikov-smtp-lang-05.txt


   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2006).


Abstract

   The Simple Mail Transfer Protocol (RFC 2821) allows server responses to
   include human-readable text that in many cases needs to be presented to
   the user.  This document specifies a way for a client to negotiate which
   language the server should use when sending human-readable text. It also
   extends the DSN format to include a language field for the human-readable
   text.


0. Meta Information on this draft

   This information is intended to facilitate discussion.

   <<NOTE to RFC Editor: please remove this section>>

   The protocol discussed in this document is experimental and subject to
   change.  Persons planning on either implementing or using this protocol
   are STRONGLY URGED to get in touch with the author before embarking on
   such a project.

   Changes since -00

1). Corrected grammar error in LANG command description section

2). Included Mark Crispin's suggestion of allowing the server to substitute
    a primary language if the sublanguage asked for is not available.

3). Added section 5 that describes extended LANG reply

4). Corrected example, more examples

5). Added extension mechanism

6). Specified interaction with RFC-2034 ("SMTP Service Extension for
    Returning Enhanced Error Codes")

7). LANG command must always have language-tag as a parameter. Only EHLO
    response could be used to examine list of supported languages.


   Changes since -01

1). Corrected ABNF for CR

2). Updated Copyright section

3). Other minor bugfixes


   Changes since -02

1). Extended DSN format to include language tag

2). Fixed a few typos.


   Changes since -03

1). Changed DSN format to include language tag and translation of text part of
    diagnostic-code-field. Don't use diagnostic-code-field for a non English text.

2). Added LANG parameter to MAIL FROM.


   Changes since -04

1). Updated boilerplate.

2). Updated references. Split references into Normative and Informative.

3). Updated ABNF (ABNF for UTF-8 responses, allow for multiple language tags)

4). Clarified that LANGUAGE extension is applicable to both MTAs and MSAs.

5). Added '*' language (to match EAI POP3 draft).

6). Fixed EHLO capabilities in examples to match HELP output.

7). Added new section on line length limit.

8). Copied Security Considerations from EAI POP3 document.

9). Removed all mention of extensions to the LANG command.

10). Made extended data prefix optional (currently it is only used
     in response to the LANG command)

11). Many minor editorial changes.


   Open issues

0). Open issues are enclosed in <<>>.


1. Conventions used in this document

   In examples, "C:" and "S:" indicate lines sent by the client and server
   respectively.  If such lines are wrapped without a new "C:" or "S:"
   label, then the wrapping is for editorial clarity and is not part of the
   command.

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as defined in "Key words for
   use in RFCs to Indicate Requirement Levels" [KEYWORDS].


2. Framework for the Language SMTP service extension

   The Language SMTP service extension uses the SMTP service extension
   mechanism described in [RFC2821]. The following SMTP service extension is
   therefore defined:

  (1) The name of the SMTP service extension is "Language". This extension
      is applicable to regular SMTP [RFC2821], Message Submission
      Protocol [SUBMIT] and LMTP [LMTP].

  (2) The EHLO keyword value associated with this service extension is
      "LANGUAGE".

  (3) The LANGUAGE EHLO keyword will have zero or more space separated
      arguments, each of which is a supported language tag.
      If no arguments are specified, this means that the server is
      unable to enumerate the list of languages it supports.

  (4) A new SMTP verb "LANG" is defined by this document.

  (5) One optional parameter is added to the MAIL command:
      <<Is this actually useful?>>

      An optional parameter for the MAIL command, using the esmtp-keyword
      "LANG", is defined in section 6. It is used to propagate the language
      that should be used in the human readable part and/or the
      localized-diagnostic-text-field of the "message/delivery-status" part
      (see section 7) of a delivery status notification for the message.
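
   As an illustration of item (3), a client might extract the advertised
   language tags from the EHLO response roughly as follows. This is a
   minimal sketch, not part of the extension: the function name
   parse_language_keyword is hypothetical, and lower-casing the tags is
   an assumption based on language tags being case-insensitive.

```python
def parse_language_keyword(ehlo_line):
    """Return the language tags advertised on a LANGUAGE EHLO line.

    Returns None if the line does not carry the LANGUAGE keyword, and an
    empty list if LANGUAGE is advertised without arguments (the server
    cannot enumerate the languages it supports).
    """
    # A reply line starts with a 3-digit code followed by "-" or " ".
    if len(ehlo_line) < 4 or not ehlo_line[:3].isdigit():
        return None
    words = ehlo_line[4:].split()
    if not words or words[0].upper() != "LANGUAGE":
        return None
    # Assumption: normalise tags to lower case for later comparison.
    return [tag.lower() for tag in words[1:]]
```

   An empty list and None are deliberately distinct results: the first
   means "supported, but languages not enumerated", the second means the
   line does not advertise the extension at all.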


3. LANG Command

   LANG 1*(SP language-tag)

     Arguments:
         one or more language tags as defined in [LANG-TAGS].

     Restrictions:
         The LANG command is permitted throughout a mail connection.

     Reply Codes:
         Success:
            250 LANG command completed successfully
         Error:
            504 Language is not supported
            421 <domain> Service not available, closing transmission channel

     Discussion:
         The LANG command requests that human-readable text emitted by the
         server be localized to one of the languages specified in the
         arguments. If multiple language tags are specified, they are listed
         in decreasing order of preference.

         If a sublanguage was asked for and not available but the primary
         language is available, the server SHOULD switch to the primary
         language and MUST use an extended LANG reply containing the
         identifier of the primary language it switched to as described in
         section 5.

         <<Should the fallback to primary languages be removed?>>

         <<Need to describe that the server first picks a language from the list,
         if it can't find any, it should try to use sublanguages for the specified
         languages>>

         It is also recommended that the server recognize languages that
         have multiple different tags (for example "ru" and "rus").

         Any server that supports this extension MUST support the language
         "i-default".  It SHOULD <<MUST?>> use the language "i-default" as
         described in [CHARSET-POLICY] as its default language until another
         supported language is negotiated by the client.  If a server is able
         to enumerate supported languages it MUST include "i-default" in
         the EHLO response. Otherwise it MUST NOT return any language in the
         LANGUAGE EHLO response.

         The client MUST NOT use the MUL (Multiple languages) and UND
         (Undetermined) language tags, and the server MUST return error
         code 504 to a LANG command that is used with such a language tag.

         The special "*" language range argument indicates a request to use
         a language designated as preferred by the server administrator.
         The preferred language MAY vary based on the currently authenticated
         user.

         If the command succeeds, the server will return human-readable
         responses in the specified language starting with the successful
         250 response to the LANG command.  These responses will be in UTF-8
         [UTF-8]. In particular, the LANG command MAY affect the result of a
         HELP command. The successful 250 response to the LANG command
         MUST use the extended reply as described in section 5. This
         reply will communicate the selected language to the client.

         If the command fails, the server will continue to return human-
         readable responses in the language it was previously using.

     Example 1:

        < The server defaults to using responses in "i-default" language
          until the user explicitly changes the language. >

         S: 220 smtp.example.com ESMTP server ready
         C: EHLO main.example.com
         S: 250-smtp.example.com
         S: 250-AUTH CRAM-MD5 DIGEST-MD5
         S: 250-EXPN
         S: 250-VRFY
         S: 250-DSN
         S: 250 LANGUAGE EN FR RU i-default
         C: HELP
         S: 214-This is Bukamail version X.X.X
         S: 214-Topics:
         S: 214-    HELO    EHLO    MAIL    RCPT    DATA
         S: 214-    RSET    NOOP    QUIT    HELP    VRFY
         S: 214-    EXPN    VERB    AUTH    DSN
         S: 214-For more info use "HELP <topic>".
         S: 214 End of HELP info

        < Once the client changes the language, all responses will be in
          that language starting with 250 response to the LANG command. >

         C: LANG FR
         S: 250 La Language commande a ete execute avec success

         C: HELP
         S: 214-C'est le programme Bukamail version X.X.X
         S: 214-Topics:
         S: 214-    HELO    EHLO    MAIL    RCPT    DATA
         S: 214-    RSET    NOOP    QUIT    HELP    VRFY
         S: 214-    EXPN    VERB    AUTH    DSN
         S: 214-Pour obtenir l'information supplementaire utilisez "HELP <topic>".
         S: 214 La fin de l'information

        < If a server does not support the requested language, responses
          will continue to be returned in the current language the server is
          using. >

         C: LANG DE
         S: 504 Ce Language n'est pas supporte

     Example 2:

        < The client tries to select the MUL language, which cannot be used
          with this extension. >

         C: LANG MUL
         S: 504 It is not allowed to use MUL language.

     Example 3:

        < The client tries to use a LANG extension not supported by the
          server. >

         C: LANG i-default (blah blah)
         S: 504 LANG extension blah is not recognized.


         Note 1. [LANG-TAGS] warns that there is no guaranteed relationship
         between languages whose tags start out with the same series of
         subtags. However it is believed that for the purpose of this
         document it is safe to treat all languages whose tags start with
         a primary language described in ISO 639-1 or ISO 639-2 (i.e. all
         2 or 3 letter primary languages) as hierarchical.  For languages
         with any other primary tag, the fallback rule described above
         MUST NOT be used. In particular, language tags starting with 'i-'
         and 'x-' SHOULD NOT be treated as hierarchical.
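
   The selection rules in this section could be sketched on the server
   side as shown here. This is only one possible reading: the order
   "exact matches first, then primary-language fallback" follows the open
   issue noted in the Discussion and is not settled, and the names
   select_language and admin_default are hypothetical. A result of None
   stands for a 504 reply.

```python
import re

# Tags the client MUST NOT use; the server answers 504.
FORBIDDEN = {"mul", "und"}
# Per Note 1, only 2 or 3 letter primary tags are treated as hierarchical.
ISO639_PRIMARY = re.compile(r"^[a-z]{2,3}$")


def select_language(requested, supported, admin_default="i-default"):
    """Return the tag to switch to, or None if the reply would be 504."""
    requested = [tag.lower() for tag in requested]
    supported = [tag.lower() for tag in supported]
    if any(tag in FORBIDDEN for tag in requested):
        return None
    # First pass: exact matches, in the client's order of preference.
    for tag in requested:
        if tag == "*":
            # "*" asks for the language preferred by the administrator.
            return admin_default
        if tag in supported:
            return tag
    # Second pass: fall back from a sublanguage to its primary language.
    for tag in requested:
        primary, sep, _rest = tag.partition("-")
        if sep and ISO639_PRIMARY.match(primary) and primary in supported:
            return primary
    return None
```

   With the language list from Example 1, a request for "FR-ca" yields
   "fr", while "DE" yields None. Tags such as "i-default" and
   "x-anything" never trigger the fallback because their first subtag is
   a single letter.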


4. SMTP Line length limit

   This extension doesn't affect the maximum text line length as defined by
   [RFC2821] and extended by other ESMTP extensions. However the ESMTP maximum
   line length is defined in octets and *not* in Unicode characters.
   If a server implementation truncates UTF-8 response text (or splits it to
   create a multiline response) to conform to such ESMTP limits, it MUST
   truncate the response at a Unicode character boundary. (For example, if a
   response encoded in UTF-8 would be 1002 octets including the terminating
   CRLF and the last Unicode character before CRLF is represented as 3 UTF-8
   octets, the response must be truncated to 999 octets including the
   terminating CRLF.)
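
   The truncation rule can be illustrated with the following sketch. It
   assumes the input is well-formed UTF-8 terminated by CRLF and that the
   applicable limit is 1000 octets including CRLF; the function name
   truncate_reply_line is hypothetical.

```python
def truncate_reply_line(line, limit=1000):
    """Truncate a CRLF-terminated UTF-8 reply line to at most `limit` octets.

    `line` is a bytes object; `limit` counts the terminating CRLF.
    """
    if len(line) <= limit:
        return line
    body = line[:-2]          # text without the terminating CRLF
    cut = limit - 2           # octets available for the text
    # UTF-8 continuation octets have the bit pattern 10xxxxxx.  If the
    # octet at the cut point is one, the cut would split a character,
    # so step back to the octet that starts that character.
    while cut > 0 and (body[cut] & 0xC0) == 0x80:
        cut -= 1
    return body[:cut] + b"\r\n"
```

   Applied to the case described above (1002 octets, with a 3-octet
   character immediately before CRLF), the cut point moves back from
   octet 998 to octet 997 and the result is 999 octets long.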


5. "LANG" extended reply

   <<An extended reply is a reply that contains additional information in
   the text part. It allows the server to pass additional information to
   the client.  The client may choose to ignore additional information in
   an extended reply. Thus a client that doesn't recognize an extended reply
   would treat it as a regular SMTP reply.>>

     Example 4:

        < The client tries to select a sublanguage that is unavailable.
          However the primary language is available. >

         C: LANG FR-ca
         S: 250 [LANG FR]La Language commande a ete execute avec success

   A client that supports the LANGUAGE extension must recognize Enhanced
   Error Codes defined in [RFC-2034]. When a server supports both the
   LANGUAGE and ENHANCEDSTATUSCODES extensions, extended reply data MUST
   follow the Enhanced Error Code in the reply.

     Example 5:

        < The server supports both LANGUAGE and ENHANCEDSTATUSCODES>

         S: 220 smtp.example.com ESMTP server ready
         C: EHLO main.example.com
         S: 250-smtp.example.com
         S: 250-LANGUAGE EN FR RU i-default
         S: 250 ENHANCEDSTATUSCODES
         C: LANG FR-ca
         S: 250 2.0.0 [LANG FR]La Language commande a ete execute avec success
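
   A client could pick apart replies such as those in Examples 4 and 5
   along the following lines. This is a sketch under stated assumptions:
   the function name parse_lang_reply and its enhanced_status_codes flag
   are hypothetical, and the flag is expected to be set only when the
   server advertised ENHANCEDSTATUSCODES in its EHLO response.

```python
import re

_REPLY = re.compile(r"^(\d{3})(?:[ -](.*))?$")
# Enhanced status code as in RFC 2034: class "." subject "." detail
_STATUS = re.compile(r"([245]\.\d{1,3}\.\d{1,3}) ")
# Extended data prefix; ext-name is matched case-insensitively.
_EXT = re.compile(r"\[LANG ([A-Za-z0-9-]+)\]", re.IGNORECASE)


def parse_lang_reply(line, enhanced_status_codes=False):
    """Split one reply line into (code, status, language, text).

    `status` and `language` are None when the corresponding prefix is
    absent.  Returns None if the line is not a reply line at all.
    """
    m = _REPLY.match(line.rstrip("\r\n"))
    if m is None:
        return None
    code, text = m.group(1), m.group(2) or ""
    status = None
    if enhanced_status_codes:
        s = _STATUS.match(text)
        if s:
            status, text = s.group(1), text[s.end():]
    language = None
    e = _EXT.match(text)
    if e:
        # The extended data follows the enhanced status code, if any.
        language, text = e.group(1).lower(), text[e.end():]
    return code, status, language, text
```

   A client that does not understand the extended data can simply treat
   the whole remainder as human-readable text, which is what the function
   does when no "[LANG ...]" prefix is found.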


6. The LANG parameter of the ESMTP MAIL command

   The LANG esmtp-keyword on the extended MAIL command specifies what
   language should be used in the human readable part and/or the
   localized-diagnostic-text-field of the "message/delivery-status" part
   (see section 7) of a delivery status notification for the message.

   If the LANG esmtp-keyword is used, it MUST have an associated
   esmtp-value. The ABNF for the LANG parameter is:

     lang-parameter = "LANG=" language-tag

   If the message is relayed to another SMTP server that supports the
   LANGUAGE ESMTP extension, the MTA acting as the client MUST check whether
   the receiving MTA lists the language specified in lang-parameter (the
   "requested language") among the supported language tags in its LANGUAGE
   EHLO response.  If the receiving MTA either lists the requested language
   or doesn't list any language tag (i.e. the receiving MTA is unable to list
   the languages it supports), the client MTA MUST issue a LANG command for
   the requested language. After that, regardless of the result of the LANG
   command, the client MTA MUST specify the LANG parameter in the MAIL
   command.

   The receiving MTA SHOULD use the language specified in the LANG parameter
   if it has to generate a DSN for the message. The human readable part of
   the generated DSN SHOULD contain the description of the event in both
   i-default and the requested language. If the server MTA doesn't support
   the requested language, it MUST act as if the client didn't specify any
   LANG parameter in the MAIL command.

     Example 6:

        < Relaying of the message >

         S: 220 smtp.example.com ESMTP server ready
         C: EHLO main.example.com
         S: 250-smtp.example.com
         S: 250-DSN
         S: 250-8BITMIME
         S: 250 LANGUAGE
         C: LANG RU
         S: 504 Unsupported language
         C: MAIL FROM:<Katerina@example.ru> LANG=ru
         S: 250 <Katerina@example.ru> sender ok
         C: DATA
         S: 354 okay, send message
         C: (message goes here)
         C: .
         S: 250 message accepted
         C: QUIT
         S: 221 goodbye
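
   The relaying rules of this section, as exercised in Example 6, might
   be expressed by a client MTA as in the sketch below. The function name
   relay_commands is hypothetical. The branch for a server that does not
   offer LANGUAGE at all is an assumption derived from the general ESMTP
   rule that MAIL parameters require the corresponding extension; this
   section does not spell it out.

```python
def relay_commands(advertised, requested, sender):
    """Return the commands a relaying client sends for one message.

    `advertised` is the list of tags from the LANGUAGE EHLO line, an
    empty list if the keyword had no arguments, or None if the server
    did not offer the LANGUAGE extension.
    """
    if advertised is None:
        # Assumption: without the extension, send a plain MAIL command.
        return ["MAIL FROM:<%s>" % sender]
    commands = []
    tags = [tag.lower() for tag in advertised]
    # Issue LANG if the language is listed, or if nothing is listed.
    if not tags or requested.lower() in tags:
        commands.append("LANG %s" % requested)
    # The LANG parameter is sent regardless of the LANG command's result.
    commands.append("MAIL FROM:<%s> LANG=%s" % (sender, requested))
    return commands
```

   In a real client the LANG command and its reply would complete before
   MAIL is sent; the list returned here only records which commands are
   issued and in what order.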


7. Delivery status notifications and extension

   The format of delivery status notifications (DSNs) is specified in [DSN].
   This memo extends the per-recipient-fields of [DSN] with two new DSN
   fields: Localized-Diagnostic-Text, which is equivalent to the text part
   of Diagnostic-Code but contains text in a language other than English,
   and Language, which carries the language tag for the
   Localized-Diagnostic-Text field. In the augmented BNF [ABNF],
   per-recipient-fields is therefore extended as follows:

     per-recipient-fields =
          [ original-recipient-field CRLF ]
          final-recipient-field CRLF
          action-field CRLF
          status-field CRLF
          [ remote-mta-field CRLF ]
          [ [language-field CRLF
            localized-diagnostic-text-field CRLF ]
            diagnostic-code-field CRLF ]
          [ last-attempt-date-field CRLF ]
          [ will-retry-until-field CRLF ]
          *( extension-field CRLF )

    language-field = "Language" ":" language

    localized-diagnostic-text-field = "Localized-Diagnostic-Text" ":" *text

   where language is a language tag as described in [LANG-TAGS].

   An SMTP server that supports both the DSN and LANGUAGE extensions SHOULD
   include localized-diagnostic-text-field. If
   localized-diagnostic-text-field is present, language-field MUST be
   present too. diagnostic-code-field MUST NOT contain text in any language
   other than English.
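
   The field ordering and the pairing rule above can be illustrated by a
   small generator for per-recipient fields. This is a sketch only: the
   function name per_recipient_fields is hypothetical, the "rfc822" and
   "smtp" type labels are the usual [DSN] values chosen for illustration,
   and optional fields not discussed in this section are omitted.

```python
def per_recipient_fields(final_recipient, action, status,
                         diagnostic_text=None, language=None,
                         localized_text=None):
    """Build a per-recipient group in the order given by the ABNF above."""
    # Language and Localized-Diagnostic-Text only appear as a pair.
    if (language is None) != (localized_text is None):
        raise ValueError("Language and Localized-Diagnostic-Text "
                         "must be used together")
    # In the ABNF the pair is nested in the group that also holds
    # diagnostic-code-field, so it cannot appear without that field.
    if language is not None and diagnostic_text is None:
        raise ValueError("localized text requires a Diagnostic-Code")
    lines = [
        "Final-Recipient: rfc822; " + final_recipient,
        "Action: " + action,
        "Status: " + status,
    ]
    if language is not None:
        lines.append("Language: " + language)
        lines.append("Localized-Diagnostic-Text: " + localized_text)
    if diagnostic_text is not None:
        # The Diagnostic-Code text stays in English.
        lines.append("Diagnostic-Code: smtp; " + diagnostic_text)
    return "\r\n".join(lines) + "\r\n"
```

   The caller remains responsible for supplying English text for the
   Diagnostic-Code field; the sketch does not attempt to detect the
   language of its arguments.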


8. Formal Syntax

   The following syntax specification uses the augmented Backus-Naur Form
   (BNF) as described in [ABNF]. Non-terminals referenced but not defined
   below are as defined by [RFC2821] or [ABNF].

   Except as noted otherwise, all alphabetic characters are
   case-insensitive.  The use of upper or lower case characters to define
   token strings is for editorial clarity only.  Implementations MUST accept
   these strings in a case-insensitive fashion.

   LANG-Command = "LANG" 1*(SP language-tag) CRLF

   LANGUAGE-List = "LANGUAGE" *(SP <language-tag>) CRLF
      ; Note 1: When "i-default" is used, all responses MUST be entirely in
      ; ASCII.
      ;
      ; Note 2: Language tags MUL (Multiple languages) and UND
      ; (Undetermined) MUST NOT be used.

   language-tag =  <language-tag> as defined in [LANG-TAGS]

   Reply-line =/ Lang-Reply-line
      ; Reply-line is defined in [RFC2821]
      ; See section 5 for description of Lang-Reply-line

   Lang-Reply-line = Reply-code [ SP ext-text ] CRLF
      ; Reply line for LANG command

   ext-text = [ext-data] utf8-text

   utf8-text = 1*utf8-no-cr-lf

   utf8-no-cr-lf = %d1-9 /         ; Any Unicode character except for NUL, CR
                   %d11 /          ; and LF
                   %d12 /
                   %d14-127 /
                   UTFMB

   UTFMB   = UTF2 / UTF3 / UTF4

   UTF0    = %x80-BF

   UTF2    = %xC2-DF UTF0

   UTF3    = %xE0 %xA0-BF UTF0 / %xE1-EC 2(UTF0) /
             %xED %x80-9F UTF0 / %xEE-EF 2(UTF0)

   UTF4    = %xF0 %x90-BF 2(UTF0) / %xF1-F3 3(UTF0) /
             %xF4 %x80-8F 2(UTF0)

   ext-data = "[" ext-name SP ext-value "]"
      ; Note 1: In the case of multiline response the same ext-data SHOULD
      ; appear on every line.
      ;
      ; Note 2: When the server also supports "SMTP Service Extension
      ; for Returning Enhanced Error Codes" [RFC-2034], ext-data MUST follow
      ; the Enhanced Error Code.

   ext-name = "LANG"

   ext-value = Primary-tag
      ; Primary tag as defined by [LANG-TAGS]


9. Security Considerations

   It is possible for a man-in-the-middle attacker to insert a LANG
   command in the command stream thus making protocol-level diagnostic
   responses unintelligible to the user.  A mechanism to integrity
   protect the session, such as TLS [TLS], can be used to defeat such
   attacks.


10. References

10.1. Normative References

   [RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
   April 2001.

   [SUBMIT] Gellens, R. and J. Klensin, "Message Submission for
   Mail", RFC 4409, April 2006.

   [LMTP] Myers, J., "Local Mail Transfer Protocol", RFC 2033,
   October 1996.

   [LANG-TAGS] Alvestrand, H., "Tags for the Identification of
   Languages", BCP 47, RFC 3066, January 2001.

   [CHARSET-POLICY] Alvestrand, H., "IETF Policy on Character Sets and
   Languages", RFC 2277, January 1998.

   [UTF-8]  Yergeau, F., "UTF-8, a transformation format of ISO
   10646", STD 63, RFC 3629, November 2003.

   [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", RFC 2119, March 1997.

   [ABNF] Crocker, D. and P. Overell, "Augmented BNF for Syntax
   Specifications: ABNF", RFC 4234, October 2005.

   [RFC-2034] Freed, N., "SMTP Service Extension for Returning Enhanced
   Error Codes", RFC 2034, October 1996.

   [DSN] Moore, K. and G. Vaudreuil, "An Extensible Message Format for
   Delivery Status Notifications", RFC 3464, January 2003.


10.2. Informative References

   [IMAP-LANGUAGE] Gahrns, M. and A. Melnikov, "IMAP4 Language Extension",
   draft-gahrns-imap-language-xx.txt (work in progress).

   [TLS] Dierks, T. and E. Rescorla, "The Transport Layer
   Security (TLS) Protocol Version 1.1", RFC 4346, April 2006.


11.  Acknowledgments

   This document is based on the early version of [IMAP-LANGUAGE].
   Thus the work of Andrew McCown is appreciated.

   Many thanks to the following people who gave feedback on the document:
   Chris Newman, Brad Knowles and Paul Hoffman.


12. Authors' Addresses

    Mike Gahrns
    Microsoft
    One Microsoft Way
    Redmond, WA, 98072

    Phone: (425) 936-9833
    Email: mikega@microsoft.com

    Alexey Melnikov (Editor)
    Isode Limited
    5 Castle Business Village
    36 Station Road
    Hampton, Middlesex
    TW12 2BX,
    United Kingdom

    EMail: Alexey.Melnikov@isode.com


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.