INTERNET DRAFT E. J. Whitehead, Jr., UC Irvine
<draft-whitehead-mime-xml-01> M. Murata, Fuji Xerox Info. Systems
Expires September, 1998 May 3, 1998
The text/xml Media Type
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or made obsolete by other
documents at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress".
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
ftp.isi.edu (US West Coast).
Distribution of this document is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1998). All Rights Reserved.
Abstract
This document proposes a new media subtype, text/xml, for use in
exchanging network entities that conform to the Extensible Markup
Language (XML). XML entities are currently exchanged via the
HyperText Transfer Protocol on the World Wide Web, are an integral
part of the WebDAV protocol for remote web authoring, and are
expected to have utility in many domains.
draft-whitehead-mime-xml-01 [Page 1]
INTERNET-DRAFT The text/xml Media Type May 3, 1998
Contents
STATUS OF THIS MEMO
COPYRIGHT NOTICE
ABSTRACT
CONTENTS
1 INTRODUCTION
2 A MEDIA TYPE FOR XML
2.1 Text/xml Registration
3 SECURITY CONSIDERATIONS
4 REFERENCES
5 ACKNOWLEDGEMENTS
6 AUTHOR'S ADDRESS
1 Introduction
The World Wide Web Consortium (W3C) has issued a Recommendation
[REC-XML] which defines the Extensible Markup Language (XML),
version 1.0. To enable the exchange of XML network entities, this
document proposes a new media type, text/xml.
XML entities are currently exchanged on the World Wide Web, and XML
is also used for property values and parameter marshalling by the
WebDAV protocol for remote web authoring. Thus, there is a need for
a media type to properly label the exchange of XML network entities.
(Note that, as sometimes happens between two communities, both MIME
and XML have defined the term entity, with different meanings.)
Although XML is a subset of the Standard Generalized Markup Language
(SGML) [ISO-8897], which currently is assigned the media types
text/sgml and application/sgml, there are several reasons why use of
text/sgml or application/sgml to label XML is inappropriate. First,
there exist many applications which can process XML, but which
cannot process SGML, due to SGML's larger feature set. Second, SGML
applications cannot always process XML entities, because XML uses
features of recent technical corrigenda to SGML. Third, the
definition of text/sgml and application/sgml [RFC-1874] includes
parameters for SGML bit combination transformation format (SGML-
bctf), and SGML boot attribute (SGML-boot). Since XML does not use
these parameters, it would be ambiguous if such parameters were
given for an XML entity. For these reasons, the best approach for
labeling XML network entities is to provide a new media type for
XML.
Since XML is an integral part of the WebDAV Distributed Authoring
Protocol, and since World Wide Web Consortium Recommendations have
conventionally been assigned IETF tree media types, and since
similar media types (HTML, SGML) have been assigned IETF tree media
types, the XML media type also belongs in the IETF tree.
2 A Media Type for XML
This document introduces a new media type for XML entities,
text/xml.
Some applications of XML will require security or runtime
information specific to these applications. This document does not
prohibit future media types dedicated to such XML applications.
However, developers of such media types are encouraged to use this
document as a basis. In particular, the charset parameter should
determine the character encoding in the same way as it does here.
Within the XML specification, XML entities can be classified into
four types. In the XML terminology, they are called "document
entities", "external DTD subsets", "external parsed entities", and
"external parameter entities". The media type text/xml can be used
for any of these four types.
2.1 Text/xml Registration
MIME media type name: text
MIME subtype name: xml
Mandatory parameters: none
Optional parameters: charset
Although listed as an optional parameter, the use of the charset
parameter is STRONGLY RECOMMENDED, since this information can be
used by XML processors to determine authoritatively the
character set of the XML entity.
"UTF-8" [RFC-2279] or "UTF-16" (Appendix C.3 of [UNICODE] and
Amendment 1 of [ISO-10646]) are the recommended charset values,
representing the UTF-8 and UTF-16 character set encodings. These
two encodings are preferred since they are supported by all
conformant XML processors [REC-XML]. UTF-16 should be sent in
network byte order (big-endian), but recipients should be able
to handle both big-endian and little-endian.
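The byte-order guidance above can be illustrated with a minimal
Python sketch. The function names are hypothetical and are not part
of this document or of any XML processor. Treating input that lacks
a Byte Order Mark as big-endian is an assumption made for
illustration only.

```python
import codecs


def encode_utf16_network_order(text: str) -> bytes:
    """Sender side: emit a BOM followed by big-endian (network order) UTF-16."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def decode_utf16(data: bytes) -> str:
    """Recipient side: accept either byte order, using the BOM to decide."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # Assumption: with no BOM present, fall back to network byte order.
    return data.decode("utf-16-be")
```

A recipient written this way handles entities from senders that
follow the recommendation as well as from those that send
little-endian data.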
Note that if a character set encoding other than UTF-8 or UTF-16
is used, and the character set is not declared within an XML
entity by an XML "encoding declaration" (a non-conformant
situation according to the XML specification), an XML processor
will be unable to determine the character set of the XML entity
if the charset parameter is not given. The definition of XML
encoding declarations is given in 4.3.3 of [REC-XML].
Since the charset parameter is authoritative, the character set
is not always declared within an XML encoding declaration.
Thus, special care is needed when the recipient strips the MIME
header and provides persistent storage of the received XML
(e.g., in a file system). Unless the character set is UTF-8 or
UTF-16, the recipient should also persistently store information
about the character set encoding, perhaps by embedding a correct
XML encoding declaration within the XML entity.
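As a rough sketch of the storage advice above, the following Python
code reads the charset parameter from a Content-Type value and
rewrites the entity so that its encoding declaration agrees with
that parameter before the MIME header is discarded. Both function
names are hypothetical. The sketch assumes that the charset label is
also a codec name known to Python, that the entity does not begin
with a Byte Order Mark, and that a regular expression is adequate
for locating the declaration; a production system would need a more
careful approach.

```python
import re
from email.message import Message
from typing import Optional

_DECL = re.compile(r"^<\?xml(\s[^?]*)\?>")
_ENCODING = re.compile(r"""encoding\s*=\s*(["'])[^"']*\1""")
_VERSION = re.compile(r"""version\s*=\s*(["'])[^"']*\1""")


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("charset")
    return value if isinstance(value, str) else None


def embed_encoding_declaration(body: bytes, charset: str) -> bytes:
    """Return the entity with an encoding declaration naming `charset`."""
    text = body.decode(charset)
    new_encoding = f'encoding="{charset}"'
    match = _DECL.match(text)
    if match is None:
        # No declaration present: prepend one.
        return (f'<?xml version="1.0" {new_encoding}?>' + text).encode(charset)
    attrs = match.group(1)
    if _ENCODING.search(attrs):
        # Replace a declaration that may disagree with the charset parameter.
        attrs = _ENCODING.sub(new_encoding, attrs, count=1)
    else:
        # The encoding declaration must follow the version information.
        version = _VERSION.search(attrs)
        cut = version.end() if version else 0
        attrs = attrs[:cut] + " " + new_encoding + attrs[cut:]
    return ("<?xml" + attrs + "?>" + text[match.end():]).encode(charset)
```

The bytes returned by embed_encoding_declaration can be written to
persistent storage and later parsed without the original header.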
Encoding considerations: May be encoded.
Security considerations:
See section 3 below.
Interoperability considerations:
XML has proven to be interoperable across WebDAV clients and
servers, and for import and export from multiple XML authoring
tools.
Published specification: see [REC-XML]
Applications which use this media type:
XML is device-, platform-, and vendor-neutral and is supported
by a wide range of Web user agents, WebDAV clients and servers,
as well as XML authoring tools.
Additional information:
Magic number(s): none
Although no byte sequences can be counted on to always be
present, XML entities in ASCII-compatible encodings (including
UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"),
and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00
3F 00 78 00 6D or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order
Mark (BOM) followed by "<?xml"). For more information, see
Annex F of [REC-XML].
File extension(s): .xml
Macintosh File Type Code(s): "TEXT"
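The byte sequences listed under "Magic number(s)" above could be
checked with a short Python sketch such as the following. The
function name and the returned labels are hypothetical. Because no
byte sequence is guaranteed to be present, a result of None means
only that no hint was found, not that the data is not XML.

```python
from typing import Optional

# Byte sequences quoted in the registration text, written as hexadecimal.
_SIGNATURES = [
    (bytes.fromhex("FE FF 00 3C 00 3F 00 78 00 6D"), "utf-16-be"),
    (bytes.fromhex("FF FE 3C 00 3F 00 78 00 6D 00"), "utf-16-le"),
    (bytes.fromhex("3C 3F 78 6D 6C"), "ascii-compatible"),
]


def sniff_xml(prefix: bytes) -> Optional[str]:
    """Return a hint about the encoding family, or None if nothing matches."""
    for signature, hint in _SIGNATURES:
        if prefix.startswith(signature):
            return hint
    return None
```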
Person & email address for further information:
Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp>
Jim Whitehead <ejw@ics.uci.edu>
Kurt Conrad <conrad@SagebrushGroup.com>
Intended usage: COMMON
Author/Change controller:
The XML specification is a work product of the World Wide Web
Consortium's XML Working Group, and was edited by:
Tim Bray <tbray@textuality.com>
Jean Paoli <jeanpa@microsoft.com>
C. M. Sperberg-McQueen <cmsmcq@uic.edu>
The W3C and the W3C XML Working Group have change control over
the XML specification.
3 Security Considerations
XML, as a subset of SGML, has the same security considerations as
specified in [RFC-1874].
To paraphrase section 3 of [RFC-1874], XML entities contain
information to be parsed and processed by the recipient's XML
system. These entities may contain, and such systems may permit,
explicit system-level commands to be executed while processing the
data. To the extent that an XML system will execute arbitrary
command strings, recipients of XML entities may be at risk. In
general, it may be possible to specify commands that perform
unauthorized file operations or make changes to the display
processor's environment that affect subsequent operations.
Use of XML is expected to be varied and widespread. XML is under
scrutiny by a wide range of communities for use as a common syntax
for community-specific metadata. For example, the Dublin Core group
is using XML for document metadata, and a new effort has begun which
is considering use of XML for medical information. Other groups
view XML as a mechanism for marshalling parameters for remote
procedure calls. More uses of XML will undoubtedly arise.
Security considerations will vary by domain of use. For example,
XML medical records will have much more stringent privacy and
security considerations than XML library metadata. Similarly, use of
XML as a parameter marshalling syntax necessitates a case by case
security review.
Since XML entities may contain explicit processing instructions for
a presentation, composition, scripting, or remote procedure call
language, use of such instructions presents concerns similar to those
of Application/PostScript [RFC-2046]. Applications which interpret
XML DTDs which allow the specification of unsafe file operations or
other system-level commands should perform a review of the security
considerations of supporting such DTDs, especially potential
interactions between DTDs, and should provide mechanisms to users to
increase the security of such systems. These mechanisms may
include, but are not limited to, provision of a "safe" mode which
disables these commands in a fashion similar to many display
PostScript and Java interpreters.
XML may also have some of the same security concerns as plain text.
Like plain text, XML can contain escape sequences which, when
displayed, have the potential to change the display processor
environment in ways that adversely affect subsequent operations.
Possible effects include, but are not limited to, locking the
keyboard, changing display parameters so subsequent displayed text
is unreadable, or even changing display parameters to deliberately
obscure or distort subsequent displayed material so that its meaning
is lost or altered. Display processors should either filter such
material from displayed text or else make sure to reset all
important settings after a given display operation is complete.
Some terminal devices have keys whose output, when pressed, can be
changed by sending the display processor a character sequence. If
this is possible, the display of a text object containing such
character sequences could reprogram keys to perform some illicit or
dangerous action when the key is subsequently pressed by the user.
In some cases not only can keys be programmed, they can be triggered
remotely, making it possible for a text display operation to
directly perform some unwanted action. As such, the ability to
program keys should be blocked either by filtering or by disabling
the ability to program keys entirely.
Note that it is also possible to construct XML documents which make
use of what XML terms "entity references" (using the XML meaning of
the term "entity", which differs from the MIME definition of this
term), to construct repeated expansions of text. Recursive
expansions are prohibited [REC-XML] and XML processors are required
to detect them. However, even non-recursive expansions may cause
problems with the finite computing resources of computers, if they
are performed many times.
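To make the resource concern concrete, the following Python sketch
computes how large an entity would become after expansion without
actually expanding it, so that a processor could refuse input that
exceeds some limit. The function name and the dictionary
representation of entity declarations are hypothetical. Character
references are counted as literal text, which is a simplification.

```python
import re

_REF = re.compile(r"&([A-Za-z_:][\w.:-]*);")
_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}


def expanded_length(name, entities, _cache=None, _active=None):
    """Length in characters of entity `name` after full expansion.

    `entities` maps entity names to replacement text. Results are cached so
    that the calculation stays cheap even when the expansion would be huge.
    """
    cache = {} if _cache is None else _cache
    active = set() if _active is None else _active
    if name in _PREDEFINED:
        return 1  # each predefined entity expands to a single character
    if name in cache:
        return cache[name]
    if name in active:
        raise ValueError("recursive entity reference: " + name)
    active.add(name)
    text = entities[name]
    total = len(_REF.sub("", text))  # literal characters
    for ref in _REF.findall(text):   # plus each referenced entity
        total += expanded_length(ref, entities, cache, active)
    active.remove(name)
    cache[name] = total
    return total
```

For example, ten entities that each refer ten times to the previous
one occupy only a few hundred characters of declarations, yet the
last one expands to billions of characters. Comparing the computed
length against a configured ceiling before expansion is one possible
mitigation.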
4 References
[ISO-10646] ISO/IEC, Information Technology - Universal Multiple-
Octet Coded Character Set (UCS) - Part 1: Architecture and Basic
Multilingual Plane, May 1993, with amendments 1 through 7.
[ISO-8897] ISO (International Organization for Standardization) ISO
8879:1986(E) Information Processing -- Text and Office Systems --
Standard Generalized Markup Language (SGML). First edition --
1986-10-15.
[REC-XML] T. Bray, J. Paoli, C. M. Sperberg-McQueen, "Extensible
Markup Language (XML)." World Wide Web Consortium Recommendation
REC-xml-19980210. http://www.w3.org/TR/1998/REC-xml-19980210.
[RFC-1874] E. Levinson. "SGML Media Types." Accurate Information
Systems. RFC 1874. December, 1995.
[RFC-2046] N. Freed, N. Borenstein. "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types." Innosoft, First Virtual.
RFC 2046. November, 1996.
[RFC-2279] F. Yergeau. "UTF-8, a transformation format of ISO
10646." RFC 2279. January, 1998.
[UNICODE] The Unicode Consortium, "The Unicode Standard -- Version
2.0", Addison-Wesley, 1996.
5 Acknowledgements
Chris Newman and Yaron Y. Goland both contributed content to the
security considerations section of this document. In particular,
some text in the security considerations section is copied verbatim
from draft-newman-mime-textpara-00, by permission of the author.
Members of the W3C XML Working Group and XML Special Interest Group
have made significant contributions to this document.
6 Author's Address
E. James Whitehead, Jr.
Dept. of Information and Computer Science
University of California, Irvine
Irvine, CA 92697-3425
Email: ejw@ics.uci.edu
Murata Makoto (Family Given)
Fuji Xerox Information Systems,
KSP 9A7, 2-1, Sakado 3-chome, Takatsu-ku,
Kawasaki-shi, Kanagawa-ken,
213 Japan
Email: murata@fxis.fujixerox.co.jp