INTERNET DRAFT                         E. J. Whitehead, Jr., UC Irvine
<draft-whitehead-mime-xml-02>      M. Murata, Fuji Xerox Info. Systems

Expires September, 1998                                    May 8, 1998


                            XML Media Types

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or made obsolete by other
   documents at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress".

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1998). All Rights Reserved.

Abstract

   This document proposes two new media subtypes, text/xml and
   application/xml, for use in exchanging network entities which
   conform to the Extensible Markup Language (XML). XML entities are
   currently exchanged via the HyperText Transfer Protocol on the World
   Wide Web, are an integral part of the WebDAV protocol for remote
   web authoring, and are expected to have utility in many domains.

















draft-whitehead-mime-xml-02                                   [Page 1]


INTERNET-DRAFT             XML Media Types                 May 8, 1998



Contents

STATUS OF THIS MEMO...................................................1
COPYRIGHT NOTICE......................................................1
ABSTRACT..............................................................1
CONTENTS..............................................................2
1 INTRODUCTION .......................................................3
2 XML MEDIA TYPES ....................................................3
2.1  Text/xml Registration ...........................................5
2.2  Application/xml Registration ....................................7
3 SECURITY CONSIDERATIONS ............................................9
4 REFERENCES ........................................................11
5 ACKNOWLEDGEMENTS ..................................................11
6 AUTHOR'S ADDRESS ..................................................12

1  Introduction

   The World Wide Web Consortium (W3C) has issued a Recommendation
   [REC-XML] which defines the Extensible Markup Language (XML),
   version 1.0. To enable the exchange of XML network entities, this
   document proposes two new media types, text/xml and application/xml.

   XML entities are currently exchanged on the World Wide Web, and XML
   is also used for property values and parameter marshalling by the
   WebDAV protocol for remote web authoring. Thus, there is a need for
   a media type to properly label the exchange of XML network entities.
   (Note that, as sometimes happens between two communities, both MIME
   and XML have defined the term entity, with different meanings.)

   Although XML is a subset of the Standard Generalized Markup Language
   (SGML) [ISO-8897], which currently is assigned the media types
   text/sgml and application/sgml, there are several reasons why use of
   text/sgml or application/sgml to label XML is inappropriate. First,
   there exist many applications which can process XML, but which
   cannot process SGML, due to SGML's larger feature set. Second, SGML
   applications cannot always process XML entities, because XML uses
   features of recent technical corrigenda to SGML.  Third, the
   definition of text/sgml and application/sgml [RFC-1874] includes
   parameters for SGML bit combination transformation format (SGML-
   bctf), and SGML boot attribute (SGML-boot). Since XML does not use
   these parameters, it would be ambiguous if such parameters were
   given for an XML entity. For these reasons, the best approach for
   labeling XML network entities is to provide a new media type for
   XML.

   XML is an integral part of the WebDAV Distributed Authoring
   Protocol, World Wide Web Consortium Recommendations have
   conventionally been assigned IETF tree media types, and similar
   media types (HTML, SGML) have been assigned IETF tree media types.
   For these reasons, the XML media types also belong in the IETF tree.

2  XML Media Types

   This document introduces two new media types for XML entities,
   text/xml and application/xml.  Registration information for these
   media types is given in the sections below.

   An XML network entity should be labeled as text/xml under the
   following circumstances:

       - it is encoded using the UTF-8 character set encoding,

       - it is encoded using a character set encoding which is
       compatible with the requirements for text media types as
       described in [RFC-2045] and [RFC-2046],

       - it is being transmitted via an 8-bit clean protocol such as
       HTTP [RFC-2068].

   If an XML network entity fails any of these criteria, then it should
   be labeled as application/xml.  For example, UTF-16 encoded XML sent
   in email should be labeled as application/xml.

   Some applications of XML will require security or runtime
   information specific to these applications.  This document does not
   prohibit future media types dedicated to such XML applications.
   However, developers of such media types are encouraged to use this
   document as a basis.  In particular, the charset parameter should
   determine the character encoding in the same way as it does here.

   Within the XML specification, XML entities can be classified into
   four types.  In the XML terminology, they are called "document
   entities", "external DTD subsets", "external parsed entities", and
   "external parameter entities".  The media types text/xml and
   application/xml can be used for any of these four types.

2.1 Text/xml Registration

   MIME media type name: text

   MIME subtype name: xml

   Mandatory parameters: none

   Optional parameters: charset

       Although listed as an optional parameter, the use of the charset
       parameter is STRONGLY RECOMMENDED, since this information can be
       used by XML processors to determine authoritatively the
       character set of the XML entity.
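
       As an illustration only, the following Python sketch shows how a
       recipient might read the media type and the charset parameter
       from a Content-Type value. The function name
       media_type_and_charset is hypothetical and is not defined by
       this document; the parsing relies on the standard email.message
       module.

```python
from email.message import Message


def media_type_and_charset(content_type: str):
    """Return (media type, charset or None) for a Content-Type value.

    Hypothetical helper: both values come back lowercased, and the
    charset is None when the parameter is absent.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_content_charset()


print(media_type_and_charset('text/xml; charset="UTF-16"'))
print(media_type_and_charset("application/xml"))
```

       When the charset comes back as None, a recipient would have to
       fall back on the information carried inside the XML entity
       itself.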

       "UTF-8" [RFC-2279] or "UTF-16" (Appendix C.3 of [UNICODE] and
       Amendment 1 of [ISO-10646]) are the recommended charset values,
       representing the UTF-8 and UTF-16 character set encodings. These
       two encodings are preferred since they are supported by all
       conformant XML processors [REC-XML].  UTF-16 should be sent in
       network byte order (big-endian), but recipients should be able
       to handle both big-endian and little-endian. The UTF-16 encoding
       for text/xml network entities should only be used when the
       entity is being transmitted via an 8-bit clean protocol such as
       HTTP.

       Note that if a character set encoding other than UTF-8 or UTF-16
       is used, and the character set is not declared within an XML
       entity by an XML "encoding declaration" (a non-conformant
       situation according to the XML specification), an XML processor
       will be unable to determine the character set of the XML entity
       if the charset parameter is not given.  The definition of XML
       encoding declarations is given in 4.3.3 of [REC-XML].

       Since the charset parameter is authoritative, the character set
       is not always declared within an XML encoding declaration.
       Thus, special care is needed when the recipient strips the MIME
       header and provides persistent storage of the received XML
       (e.g., in a file system).  Unless the character set is UTF-8 or
       UTF-16, the recipient should also persistently store information
       about the character set encoding, perhaps by embedding a correct
       XML encoding declaration within the XML entity.
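
       One way to embed such a declaration before storage is sketched
       below in Python. This is a minimal sketch under stated
       assumptions, not a definitive implementation: the function name
       with_encoding_declaration is hypothetical, the entity is assumed
       to be in an encoding other than UTF-16 (so no byte order mark is
       handled), any existing declaration is assumed to sit at the very
       start of the entity, and the declaration is located with a
       simplified regular expression rather than a full XML parser.

```python
import re

# Simplified patterns (assumptions; a real XML parser is more thorough).
_DECL = re.compile(r"^<\?xml\s[^?]*\?>")
_ENCODING = re.compile(r"""\sencoding\s*=\s*("[^"]*"|'[^']*')""")
_VERSION = re.compile(r"""\sversion\s*=\s*("[^"]*"|'[^']*')""")


def with_encoding_declaration(body: bytes, charset: str) -> bytes:
    """Return body with an encoding declaration naming charset.

    charset is the value taken from the MIME charset parameter.
    """
    text = body.decode(charset)
    new_attr = f' encoding="{charset}"'
    match = _DECL.match(text)
    if match is None:
        # No declaration at all: prepend a complete one.
        text = f'<?xml version="1.0"{new_attr}?>\n' + text
    else:
        decl = match.group(0)
        if _ENCODING.search(decl):
            # Replace a declaration that may disagree with the charset.
            decl = _ENCODING.sub(lambda _m: new_attr, decl, count=1)
        else:
            # The encoding goes after the version and before standalone.
            version = _VERSION.search(decl)
            at = version.end() if version else len("<?xml")
            decl = decl[:at] + new_attr + decl[at:]
        text = decl + text[match.end():]
    return text.encode(charset)
```

       The stored bytes keep their original encoding; only the
       declaration changes, so the file stays self-describing once the
       MIME header is gone.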

   Encoding considerations: May be encoded.

   Security considerations:

       See section 3 below.





   Interoperability considerations:

       XML has proven to be interoperable across WebDAV clients and
       servers, and for import and export from multiple XML authoring
       tools.

   Published specification: see [REC-XML]

   Applications which use this media type:

       XML is device-, platform-, and vendor-neutral and is supported
       by a wide range of Web user agents, WebDAV clients and servers,
       as well as XML authoring tools.

   Additional information:

       Magic number(s): none

       Although no byte sequences can be counted on to always be
       present, XML entities in ASCII-compatible encodings (including
       UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"),
       and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00
       3F 00 78 00 6D or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order
       Mark (BOM) followed by "<?xml").  For more information, see
       Annex F of [REC-XML].
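
       The byte sequences above can be checked with a few prefix
       comparisons. The Python sketch below is illustrative only; the
       function name sniff_xml_prefix and its return labels are
       hypothetical. Because no sequence is guaranteed to be present, a
       result of None does not mean the entity is not XML.

```python
import codecs

# Byte sequences quoted in the text above.
_ASCII_DECL = b"<?xml"                                          # 3C 3F 78 6D 6C
_UTF16_BE = codecs.BOM_UTF16_BE + "<?xm".encode("utf-16-be")    # FE FF 00 3C ...
_UTF16_LE = codecs.BOM_UTF16_LE + "<?xm".encode("utf-16-le")    # FF FE 3C 00 ...


def sniff_xml_prefix(data: bytes):
    """Return a hint about the encoding family, or None if no match."""
    if data.startswith(_UTF16_BE):
        return "utf-16-be"
    if data.startswith(_UTF16_LE):
        return "utf-16-le"
    if data.startswith(_ASCII_DECL):
        return "ascii-compatible"
    return None
```

       A result is only a hint; an authoritative charset parameter, when
       present, still takes precedence.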

       File extension(s): .xml
       Macintosh File Type Code(s): "TEXT"

   Person & email address for further information:

       Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp>
       Jim Whitehead <ejw@ics.uci.edu>
       Kurt Conrad <conrad@SagebrushGroup.com>

   Intended usage: COMMON

   Author/Change controller:

       The XML specification is a work product of the World Wide Web
       Consortium's XML Working Group, and was edited by:

       Tim Bray <tbray@textuality.com>
       Jean Paoli <jeanpa@microsoft.com>
       C. M. Sperberg-McQueen <cmsmcq@uic.edu>

       The W3C and the W3C XML Working Group have change control over
       the XML specification.





2.2 Application/xml Registration

   MIME media type name: application

   MIME subtype name: xml

   Mandatory parameters: none

   Optional parameters: charset

       Although listed as an optional parameter, the use of the charset
       parameter is STRONGLY RECOMMENDED, since this information can be
       used by XML processors to determine authoritatively the
       character set of the XML entity.

       "UTF-8" [RFC-2279] or "UTF-16" (Appendix C.3 of [UNICODE] and
       Amendment 1 of [ISO-10646]) are the recommended charset values,
       representing the UTF-8 and UTF-16 character set encodings. These
       two encodings are preferred since they are supported by all
       conformant XML processors [REC-XML].  UTF-16 should be sent in
       network byte order (big-endian), but recipients should be able
       to handle both big-endian and little-endian.

       Note that if a character set encoding other than UTF-8 or UTF-16
       is used, and the character set is not declared within an XML
       entity by an XML "encoding declaration" (a non-conformant
       situation according to the XML specification), an XML processor
       will be unable to determine the character set of the XML entity
       if the charset parameter is not given.  The definition of XML
       encoding declarations is given in 4.3.3 of [REC-XML].

       Since the charset parameter is authoritative, the character set
       is not always declared within an XML encoding declaration.
       Thus, special care is needed when the recipient strips the MIME
       header and provides persistent storage of the received XML
       (e.g., in a file system).  Unless the character set is UTF-8 or
       UTF-16, the recipient should also persistently store information
       about the character set encoding, perhaps by embedding a correct
       XML encoding declaration within the XML entity.

   Encoding considerations: May be encoded.

   Security considerations:

       See section 3 below.

   Interoperability considerations:

       XML has proven to be interoperable for import and export from
       multiple XML authoring tools.

   Published specification: see [REC-XML]

   Applications which use this media type:

       XML is device-, platform-, and vendor-neutral and is supported
       by a wide range of Web user agents and XML authoring tools.

   Additional information:

       Magic number(s): none

       Although no byte sequences can be counted on to always be
       present, XML entities in ASCII-compatible encodings (including
       UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"),
       and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00
       3F 00 78 00 6D or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order
       Mark (BOM) followed by "<?xml").  For more information, see
       Annex F of [REC-XML].

       File extension(s): .xml
       Macintosh File Type Code(s): "TEXT"

   Person & email address for further information:

       Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp>
       Jim Whitehead <ejw@ics.uci.edu>
       Kurt Conrad <conrad@SagebrushGroup.com>

   Intended usage: COMMON

   Author/Change controller:

       The XML specification is a work product of the World Wide Web
       Consortium's XML Working Group, and was edited by:

       Tim Bray <tbray@textuality.com>
       Jean Paoli <jeanpa@microsoft.com>
       C. M. Sperberg-McQueen <cmsmcq@uic.edu>

       The W3C and the W3C XML Working Group have change control over
       the XML specification.













3  Security Considerations

   XML, as a subset of SGML, has the same security considerations as
   specified in [RFC-1874].

   To paraphrase section 3 of [RFC-1874], XML entities contain
   information to be parsed and processed by the recipient's XML
   system.  These entities may contain, and such systems may permit,
   explicit system level commands to be executed while processing the
   data.  To the extent that an XML system will execute arbitrary
   command strings, recipients of XML entities may be at risk. In
   general, it may be possible to specify commands that perform
   unauthorized file operations or make changes to the display
   processor's environment that affect subsequent operations.

   Use of XML is expected to be varied and widespread.  XML is under
   scrutiny by a wide range of communities for use as a common syntax
   for community-specific metadata.  For example, the Dublin Core group
   is using XML for document metadata, and a new effort has begun which
   is considering use of XML for medical information.  Other groups
   view XML as a mechanism for marshalling parameters for remote
   procedure calls.  More uses of XML will undoubtedly arise.

   Security considerations will vary by domain of use.  For example,
   XML medical records will have much more stringent privacy and
   security considerations than XML library metadata. Similarly, use of
   XML as a parameter marshalling syntax necessitates a case by case
   security review.

   Since XML entities may contain explicit processing instructions for
   a presentation, composition, scripting, or remote procedure call
   language, use of such instructions presents concerns similar to those
   of Application/PostScript [RFC-2046]. Applications which interpret
   XML DTDs which allow the specification of unsafe file operations or
   other system-level commands should perform a review of the security
   considerations of supporting such DTDs, especially potential
   interactions between DTDs, and should provide mechanisms to users to
   increase the security of such systems.  These mechanisms may
   include, but are not limited to, provision of a "safe" mode which
   disables these commands in a fashion similar to many display
   PostScript and Java interpreters.

   XML may also have some of the same security concerns as plain text.
   Like plain text, XML can contain escape sequences which, when
   displayed, have the potential to change the display processor
   environment in ways that adversely affect subsequent operations.
   Possible effects include, but are not limited to, locking the
   keyboard, changing display parameters so subsequent displayed text
   is unreadable, or even changing display parameters to deliberately
   obscure or distort subsequent displayed material so that its meaning
   is lost or altered.  Display processors should either filter such
   material from displayed text or else make sure to reset all
   important settings after a given display operation is complete.

   Some terminal devices have keys whose output, when pressed, can be
   changed by sending the display processor a character sequence. If
   this is possible, the display of a text object containing such
   character sequences could reprogram keys to perform some illicit or
   dangerous action when the key is subsequently pressed by the user.
   In some cases not only can keys be programmed, they can be triggered
   remotely, making it possible for a text display operation to
   directly perform some unwanted action. As such, the ability to
   program keys should be blocked either by filtering or by disabling
   the ability to program keys entirely.

   Note that it is also possible to construct XML documents which make
   use of what XML terms "entity references" (using the XML meaning of
   the term "entity", which differs from the MIME definition of this
   term), to construct repeated expansions of text. Recursive
   expansions are prohibited [REC-XML] and XML processors are required
   to detect them.  However, even non-recursive expansions may exhaust
   the finite computing resources of the recipient if they are
   performed many times.
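
   To illustrate how quickly non-recursive expansions grow, the Python
   sketch below computes the expanded length of an entity without
   building the expanded text, and stops once a caller-chosen limit is
   exceeded. It is a sketch under stated assumptions: the function name
   expanded_length, the dictionary of entity definitions, and the limit
   are all hypothetical, the reference syntax is simplified, and every
   referenced name is assumed to be present in the dictionary.

```python
import re

# Matches general entity references such as "&name;" (simplified syntax).
_REF = re.compile(r"&([A-Za-z_:][\w:.-]*);")


def expanded_length(name, entities, limit, _memo=None, _active=None):
    """Length of the fully expanded replacement text of entity name.

    Raises OverflowError once the running total exceeds limit, and
    ValueError if a (prohibited) recursive reference is found.
    """
    memo = {} if _memo is None else _memo
    active = set() if _active is None else _active
    if name in memo:
        return memo[name]
    if name in active:
        raise ValueError(f"recursive reference to entity {name!r}")
    active.add(name)
    text = entities[name]
    # Count the literal characters, then add each reference's expansion.
    total = len(_REF.sub("", text))
    if total > limit:
        raise OverflowError(f"expansion of {name!r} exceeds {limit}")
    for ref in _REF.findall(text):
        total += expanded_length(ref, entities, limit, memo, active)
        if total > limit:
            raise OverflowError(f"expansion of {name!r} exceeds {limit}")
    active.discard(name)
    memo[name] = total
    return total


# Three levels of ten-fold repetition: 10 * 10 * 10 characters.
entities = {"e0": "x" * 10, "e1": "&e0;" * 10, "e2": "&e1;" * 10}
print(expanded_length("e2", entities, limit=10**6))
```

   Each additional level multiplies the expanded size by ten in this
   example, so a short entity declaration can demand far more memory
   than its own size suggests.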

4  References

   [ISO-10646] ISO/IEC, Information Technology - Universal Multiple-
   Octet Coded Character Set (UCS) - Part 1: Architecture and Basic
   Multilingual Plane, May 1993, with amendments 1 through 7.

   [ISO-8897] ISO (International Organization for Standardization) ISO
   8879:1986(E) Information Processing -- Text and Office Systems --
   Standard Generalized Markup Language (SGML). First edition --
   1986-10-15.

   [REC-XML] T. Bray, J. Paoli, C. M. Sperberg-McQueen, "Extensible
   Markup Language (XML)." World Wide Web Consortium Recommendation
   REC-xml-19980210. http://www.w3.org/TR/1998/REC-xml-19980210.

   [RFC-1874] E. Levinson. "SGML Media Types" Accurate Information
   Systems. RFC 1874. December, 1995.

   [RFC-2045] N. Freed, N. Borenstein. "Multipurpose Internet Mail
   Extensions (MIME) Part One: Format of Internet Message Bodies"
   Innosoft, First Virtual. RFC 2045. November, 1996.

   [RFC-2046] N. Freed, N. Borenstein. "Multipurpose Internet Mail
   Extensions (MIME) Part Two: Media Types" Innosoft, First Virtual.
   RFC 2046. November, 1996.

   [RFC-2068] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-
   Lee. "Hypertext Transfer Protocol -- HTTP/1.1" UC Irvine, DEC,
   MIT/LCS. RFC 2068. January, 1997.

   [RFC-2279] F. Yergeau, "UTF-8, a transformation format of ISO
   10646", January 1998.

   [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version
   2.0", Addison-Wesley, 1996.

5  Acknowledgements

   Chris Newman and Yaron Y. Goland both contributed content to the
   security considerations section of this document.  In particular,
   some text in the security considerations section is copied verbatim
   from draft-newman-mime-textpara-00, by permission of the author.

   Members of the W3C XML Working Group and XML Special Interest Group
   have made significant contributions to this document.








6  Author's Address

   E. James Whitehead, Jr.
   Dept. of Information and Computer Science
   University of California, Irvine
   Irvine, CA 92697-3425
   Email: ejw@ics.uci.edu

   Murata Makoto (Family Given)
   Fuji Xerox Information Systems,
   KSP 9A7, 2-1, Sakado 3-chome, Takatsu-ku,
   Kawasaki-shi, Kanagawa-ken,
   213 Japan
   Email: murata@fxis.fujixerox.co.jp
