Network Working Group                                      S. Hollenbeck
Internet-Draft                                            VeriSign, Inc.
Expires: October 28, 2002                                        M. Rose
                                            Dover Beach Consulting, Inc.
                                                             L. Masinter
                                              Adobe Systems Incorporated
                                                          April 29, 2002


          Guidelines for the Use of XML within IETF Protocols
              draft-hollenbeck-ietf-xml-guidelines-02.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 28, 2002.

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract

   The Extensible Markup Language (XML) is a framework for structuring
   data.  While it evolved from SGML -- a markup language primarily
   focused on structuring documents -- XML has evolved to be a widely-
   used mechanism for representing structured data.

   There is a wide variety of Internet protocols being developed; many
   need a representation for structured data relevant to their
   application.  There has been much interest in the use of XML as a



Hollenbeck, et al.      Expires October 28, 2002                [Page 1]


Internet-Draft          XML Within IETF Protocols             April 2002


   representation method.  This document describes basic XML concepts,
   analyzes various alternatives in the use of XML, and provides
   guidelines for the use of XML within IETF standards-track protocols.

Intended Publication Status

   It is the goal of the authors that this draft (when completed and
   then approved by the IESG) be published as a Best Current Practice
   (BCP).

Conventions Used In This Document

   This document recommends, as policy, what specifications for Internet
   protocols -- and, in particular, IETF standards track protocol
   documents -- should include as normative language within them.  The
   capitalized keywords "SHOULD", "MUST", "REQUIRED", etc.  are used in
   the sense of how they would be used within other documents with the
   meanings as specified in RFC 2119 [1].

Discussion Venue

   The authors welcome discussion and comments relating to the topics
   presented in this document.  Though direct comments to the authors
   are welcome, public discussion is taking place on the
   "ietf-xml-use@imc.org" mailing list.  To join the list, send a
   message to "ietf-xml-use-request@imc.org" with the word "subscribe"
   in the body of the message.  There is a web site for the archives of
   the list at
   http://www.imc.org/ietf-xml-use/.

Table of Contents

   1.    Introduction and Overview  . . . . . . . . . . . . . . . . .  4
   1.1   Intended Audience  . . . . . . . . . . . . . . . . . . . . .  4
   1.2   Scope  . . . . . . . . . . . . . . . . . . . . . . . . . . .  4
   1.3   XML Evolution  . . . . . . . . . . . . . . . . . . . . . . .  4
   1.4   XML Users, Support Groups, and Additional Information  . . .  5
   2.    XML Selection Considerations . . . . . . . . . . . . . . . .  6
   3.    XML Alternatives . . . . . . . . . . . . . . . . . . . . . .  8
   4.    XML Use Considerations and Recommendations . . . . . . . . . 10
   4.1   XML Declarations . . . . . . . . . . . . . . . . . . . . . . 10
   4.2   XML Processing Instructions  . . . . . . . . . . . . . . . . 10
   4.3   Well-Formedness  . . . . . . . . . . . . . . . . . . . . . . 11
   4.4   Validity and Extensibility . . . . . . . . . . . . . . . . . 11
   4.5   Namespaces . . . . . . . . . . . . . . . . . . . . . . . . . 12
   4.5.1 Namespaces and Attributes  . . . . . . . . . . . . . . . . . 13
   4.6   Element and Attribute Design Considerations  . . . . . . . . 13
   4.7   Binary Data  . . . . . . . . . . . . . . . . . . . . . . . . 15
   4.8   Incremental Processing . . . . . . . . . . . . . . . . . . . 15
   5.    Internationalization Considerations  . . . . . . . . . . . . 16
   5.1   Character Sets and Encodings: UTF-8 and UTF-16 . . . . . . . 16
   5.2   Language Declaration . . . . . . . . . . . . . . . . . . . . 16
   5.3   Other Considerations . . . . . . . . . . . . . . . . . . . . 16
   6.    IANA Considerations  . . . . . . . . . . . . . . . . . . . . 18
   7.    Security Considerations  . . . . . . . . . . . . . . . . . . 19
   8.    Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 20
         Normative References . . . . . . . . . . . . . . . . . . . . 21
         Informative References . . . . . . . . . . . . . . . . . . . 22
         Authors' Addresses . . . . . . . . . . . . . . . . . . . . . 24
   A.    Appendix A: Change History . . . . . . . . . . . . . . . . . 26
         Full Copyright Statement . . . . . . . . . . . . . . . . . . 28

1. Introduction and Overview

   The Extensible Markup Language (XML) is a framework for structuring
   data.  While it evolved from the Standard Generalized Markup Language
   (SGML) [31] -- a markup language primarily focused on structuring
   documents -- XML has evolved to be a widely-used mechanism for
   representing structured data in protocol exchanges.  See [39] for an
   introduction to XML.

1.1 Intended Audience

   Many Internet protocol designers are considering using XML and XML
   fragments within the context of existing and new Internet protocols.
   This document is intended as a guide to XML usage and as IETF policy
   for standards track documents.  Experienced XML practitioners will
   likely already be familiar with the background material here, but the
   guidelines are intended to be appropriate for those readers as well.

1.2 Scope

   This document is intended to give guidelines for the use of XML
   content within a larger protocol.  The goal is not to suggest that
   XML is the "best" or "preferred" way to represent data; rather, the
   goal is to lay out the context for the use of XML within a protocol
   once other factors point to XML as a possible data representation
   solution.

   There are a number of protocol frameworks already in use or under
   development which focus entirely on "XML protocol": the exclusive use
   of XML as the data representation in the protocol.  For example, the
   World Wide Web Consortium (W3C) is developing an XML Protocol
   framework [41] based on the Simple Object Access Protocol (SOAP)
   [42].  The applicability of such protocols is not part of the scope
   of this document.

   In addition, there are higher-level representation frameworks, based
   on XML, that have been designed as carriers of certain classes of
   information; for example, the Resource Description Framework (RDF)
   [36] is an XML-based representation for logical assertions.  This
   document does not provide guidelines for the use of such frameworks.

1.3 XML Evolution

   Originally published in February 1998 [35], XML's popularity has led
   to several additions to the base specification.  Although these
   additions are designed to be consistent with version 1.0 of XML, they
   have varying levels of stability, consensus, and implementation.
   Accordingly, this document identifies the major evolutionary features
   of XML and makes suggestions as to the circumstances in which each
   feature should be used.

1.4 XML Users, Support Groups, and Additional Information

   There are many XML support groups, some devoted to the entire XML
   industry (e.g., http://xml.org/), some devoted to developers (e.g.,
   http://xmlhack.com/), some devoted to the business applications of
   XML (e.g., http://oasis-open.org/), and many, many groups devoted to
   the use of XML in a particular context.

   It is beyond the scope of this document to provide a comprehensive
   list of referrals.  Interested readers are directed to the three
   links above as starting points, as well as their favorite Internet
   search engine.

2. XML Selection Considerations

   XML is a tool that provides a means towards an end.  Choosing the
   right tool for a given task is an essential part of ensuring that the
   task can be completed in a satisfactory manner.  This section
   describes factors to be aware of when considering XML as a tool for
   use in IETF protocols:

   o  XML is a meta-markup language that can be used to define markup
      languages for specific domains and problem spaces.

   o  XML provides both logical structure and physical structure to
      describe data.  Data framing is built-in.

   o  XML includes features to support internationalization and
      localization.

   o  XML is extensible.  New tags (and thus new protocol elements) can
      be defined without requiring changes to XML itself.

   o  XML is still evolving.  The formal specifications are still being
      influenced and updated as use experience is gained and applied.

   o  XML is text-based, so XML fragments are easily created, edited,
      and managed using common utilities.  Further, being text-based
      means it more readily supports incremental development, debugging,
      and logging.  A simple "canned" XML fragment can be embedded
      within a program as a string constant, rather than constructed.

   o  Binary data has to be encoded into a text-based form to be
      represented in XML.

   o  XML is verbose when compared with many other structured data
      representation languages.  A representation with element
      extensibility and human readability typically requires more bits
      when compared to one optimized for efficient machine processing.

   o  XML implementations are still relatively new.  As designers and
      implementers gain experience, it is not uncommon to find defects
      in early and current products.

   o  XML support is available in a large number of software development
      utilities, available in both open source and proprietary products.

   o  XML processing speed can be an issue in some environments.  XML
      processing can be slower because XML data streams may be larger
      than other representations, and the use of general purpose XML
      parsers will add a software layer with its own performance costs
      (though these costs can be reduced through consistent use of an
      optimized parser).  Further, processing XML requires scanning the
      entire XML data stream; in some situations, this is the primary
      overhead.

3. XML Alternatives

   This document focuses on guidelines for the use of XML.  It is useful
   to consider why one might use XML as opposed to some other mechanism.
   This section considers some other commonly used representation
   mechanisms and compares XML to those alternatives.

   For many fundamental protocols, the extensibility requirements are
   modest, and the performance requirements are high enough that fixed
   binary data blocks are the appropriate representation; mechanisms
   such as XML merely add bloat [25].
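
   As a rough illustration of the size difference, the following Python
   sketch encodes the same two fields as a fixed binary block and as an
   XML fragment.  The record layout and the element names "endpoint",
   "port", and "address" are hypothetical and are not taken from any
   protocol.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record: a 16-bit port number and a 32-bit IPv4 address.
port = 8080
addr = bytes([10, 1, 2, 3])

# Fixed binary block in network byte order: 2 + 4 = 6 octets.
binary = struct.pack("!H4s", port, addr)

# The same information as an XML fragment (element names are made up).
elem = ET.Element("endpoint")
ET.SubElement(elem, "port").text = str(port)
ET.SubElement(elem, "address", type="ipv4").text = "10.1.2.3"
text = ET.tostring(elem)

print(len(binary), len(text))
```

   The XML form is many times larger than the six-octet block, but it
   carries its own field names and can be read without a separate
   layout description.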

   In addition, there are other representation and extensibility
   frameworks that have been used successfully within communication
   protocols.  For example, Abstract Syntax Notation 1 (ASN.1) [29]
   along with the corresponding Basic Encoding Rules (BER) [30] are part
   of the OSI communication protocol suite, and have been used in many
   subsequent communications standards (e.g., the ANSI Information
   Retrieval protocol [28] and the Simple Network Management Protocol
   (SNMP) [15]).  The External Data Representation (XDR) [16] and
   variations of it have been used in many other distributed network
   applications (e.g., the Network File System protocol [24]).  With
   ASN.1, data types are explicit in the representation, while with XDR,
   the data types of components are described externally as part of an
   interface specification.

   Many other protocols use data structures directly (without data
   encapsulation) by describing the data structure with Backus Normal
   Form (BNF) [26]; many IETF protocols use an Augmented Backus-Naur
   Form (ABNF) [18].  The Simple Mail Transfer Protocol [23] is an
   example of a protocol specified using ABNF.

   These representation methods differ from XML in several important
   ways:

   Specification encoding: XML schema are themselves represented in XML,
   and the specification itself can be written using arbitrary
   characters from the language.  The specifications of representations
   in other systems (ASN.1, XDR, ABNF) are generally in ASCII [27] text.

   Text Encoding and character sets: the character encoding used to
   represent a formal specification.  XML defines a consistent character
   model based on ISO 10646 [32], with a base that supports at least
   UTF-8 [4] and UTF-16 [22], and allows for other encodings.  While
   ASN.1 and XDR may carry strings in any encoding, there is no common
   mechanism for defining character encodings within them.  Typically,
   ABNF definitions tend to be defined in terms of octets or characters
   in ASCII.

   Data Encoding: XML is based on a character model.  XML Schema [11]
   includes mechanisms for representing some datatypes (integer, date,
   array, etc.) but other binary datatypes are encoded in Base64 [17].
   ASN.1 and XDR have rich mechanisms for encoding a wide variety of
   datatypes.

   Extensibility: XML has a rich extensibility model: XML
   representations can frequently be versioned independently.  Many XML
   representations can be extended by adding new element names and
   attributes (if done compatibly); other extensions can be added by
   defining new XML namespaces [9], though there is no standard
   mechanism in XML for indicating whether or not new extensions are
   mandatory to recognize.  ASN.1 is similarly extensible through the
   use of Object Identifiers (OIDs).  XDR representations tend to not be
   independently extensible by different parties because the framing and
   datatypes are implicit and not self-describing.  The extensibility of
   BNF-based protocol elements needs to be explicitly planned.

   Legibility of protocol elements: As noted above, XML is text-based,
   and thus carries the advantages (and disadvantages) of text-based
   protocol elements.  Typically this is shared with (A)BNF-defined
   protocol elements.  ASN.1 and XDR use binary encodings, which are
   not easily human-readable.

   ASN.1, XDR, and BNF are described here as examples of alternatives to
   XML for use in IETF protocols.  There are other alternatives, but a
   complete enumeration of all possible alternatives is beyond the scope
   of this document.

4. XML Use Considerations and Recommendations

   This section notes several aspects of XML and makes recommendations
   for use.  Since the 1998 publication of XML version 1 [35], an
   editorial second edition [8] was published in 2000; this section
   refers to the second edition.

4.1 XML Declarations

   An XML declaration (defined in section 2.8 of [8]) is a small header
   at the beginning of an XML data stream that indicates the XML version
   and the character encoding used.  For example,

   <?xml version="1.0" encoding="UTF-8"?>

   specifies the use of XML version 1 and UTF-8 character encoding.

   Protocol specifications must be clear about use of XML declarations.
   In some cases, the XML used is a small fragment in a larger context,
   where the XML version is fixed at "1.0" and the character encoding is
   known to be "UTF-8".  In those cases, the XML declaration might add
   extra overhead.  In other cases, the XML is a larger component which
   may find its way alone as an external entity body, transported as a
   MIME message.  In those cases, the XML declaration is an important
   marker and useful for reliability and extensibility.  The XML
   declaration is also an important marker for character set/encoding
   (see Section 5.1), if any encoding other than UTF-8 is allowed.  In
   general, an XML protocol element should either disallow XML
   declarations ("MUST NOT be used") or require one ("MUST have").  A
   design which allows but does not require an XML declaration leads to
   unreliable implementations.  When in doubt, require an XML
   declaration.
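
   One way a receiver might enforce whichever rule a specification
   adopts is sketched here in Python.  The function names and the
   simplified pattern are assumptions made for illustration; the
   pattern handles only ASCII-compatible encodings such as UTF-8 and is
   not a full implementation of the declaration syntax in [8].

```python
import codecs
import re

# Simplified pattern for an XML declaration at the start of a stream.
# Assumption: the octets use an ASCII-compatible encoding such as UTF-8.
XML_DECL = re.compile(
    rb"<\?xml[ \t\r\n]+version[ \t\r\n]*=[ \t\r\n]*"
    rb'(?:"1\.[0-9]+"' rb"|'1\.[0-9]+')"
    rb"[^>]*\?>"
)


def has_xml_declaration(data: bytes) -> bool:
    """Return True if the octets begin with an XML declaration."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # skip an optional UTF-8 BOM
    return XML_DECL.match(data) is not None


def check_declaration_policy(data: bytes, required: bool) -> None:
    """Raise ValueError if the stream violates the chosen policy."""
    present = has_xml_declaration(data)
    if required and not present:
        raise ValueError("XML declaration is required (MUST have)")
    if not required and present:
        raise ValueError("XML declaration is not allowed (MUST NOT be used)")
```

   A protocol that requires the declaration would call
   check_declaration_policy(data, required=True) before handing the
   octets to its parser.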

4.2 XML Processing Instructions

   An XML processing instruction (defined in section 2.6 of [8]) is a
   component of an XML document that signals extra "out of band"
   information to the receiver; a common use of XML processing
   instructions is in document applications.  For example, the XML2RFC
   application used to generate this document and described in [21]
   supports a "table of contents" processing instruction:

   <?rfc toc="yes"?>

   Again, protocol specifications must be clear about whether -- and if
   so, what kind of -- XML processing instructions are allowed.
   However, XML processing instructions appear to have rare
   applicability to XML fragments embedded in Internet protocols, and it
   is recommended that their use be explicitly disallowed ("MUST NOT
   use").  In cases where XML processing instructions are allowed, the
   nature of the allowable processing instructions should be specified
   explicitly.
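
   A receiver can enforce a "MUST NOT use" rule while parsing.  The
   following Python sketch uses the standard "xml.parsers.expat"
   module; the exception class and the function name are illustrative
   assumptions, not part of any specification.

```python
import xml.parsers.expat


class ProcessingInstructionNotAllowed(ValueError):
    """Raised when a protocol element contains a processing instruction."""


def reject_processing_instructions(data: bytes) -> None:
    """Parse the octets and fail on the first processing instruction."""

    def on_pi(target, pi_data):
        # The XML declaration is not reported through this handler.
        raise ProcessingInstructionNotAllowed(target)

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(data, True)
```

   A protocol that allows specific processing instructions could
   compare the reported target against its list of permitted targets
   instead of raising unconditionally.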

4.3 Well-Formedness

   A well-formed XML instance is one in which all character and markup
   data conforms to a specific set of structural rules defined in
   section 2.1 of [8].

   Character and markup data that is not well-formed is not XML; well-
   formedness is the basis for syntactic compatibility with XML.
   Without well-formedness, all of the advantages of using XML
   disappear.  For this reason, it is recommended that protocol
   specifications explicitly require XML well-formedness ("MUST be well-
   formed").

   The IETF has a long-standing tradition of "be liberal in what you
   accept" that might seem to be at odds with this recommendation.
   Given that XML requires well-formedness, XML parsers are typically
   intolerant of well-formedness errors.  Protocol designers need to
   recognize this limitation and provide specific guidelines for
   recovery when malformed data is encountered.
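
   Recovery guidance has to start from the fact that a typical parser
   stops at the first well-formedness error.  A minimal Python sketch
   follows; the function name and the shape of the returned error are
   assumptions, and a real protocol would map the failure to its own
   error response.

```python
import xml.etree.ElementTree as ET


def parse_protocol_element(data: bytes):
    """Return (root, None) on success or (None, message) on failure."""
    try:
        return ET.fromstring(data), None
    except ET.ParseError as err:
        # The parser reports where it gave up; nothing after that point
        # has been examined.
        line, column = err.position
        return None, f"not well-formed at line {line}, column {column}"
```

   Because parsing stops at the first error, the receiver cannot assume
   anything about the remainder of the data and should treat the whole
   protocol element as rejected.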

4.4 Validity and Extensibility

   XML has formal mechanisms for defining structural and data content
   constraints that restrict the identity of elements or attributes or
   the values contained within them:

   A "Document Type Definition" (DTD) is defined in section 2.8 of [8];
   the concept came from a similar mechanism for SGML.

   XML Schema (defined in [10] and [11]) provides additional features to
   allow a tighter and more precise specification of allowable protocol
   syntax and data type specifications.

   There are also a number of other mechanisms for describing XML
   instance validity; these include, for example, Schematron [44], RELAX
   NG [45], and the Document Schema Definition Language [33].

   There is ongoing discussion within the XML community on the use and
   applicability of various constraint mechanisms.  The choice of tool
   depends on the needs for extensibility or for a formal language and
   mechanism for constraining permissible values and validating
   adherence to the constraints.  An Internet protocol that uses XML
   must choose whether or not to describe "valid" XML protocol elements
   using an appropriate validity mechanism, and whether and how to
   require validity.  Many protocols have successfully used the DTD
   mechanism for describing validity, whether or not they insist that
   all XML elements are valid.  However, the features in XML Schema for
   data typing and constraining values seem very appropriate for many of
   the uses of XML.

   This document recommends that, in the absence of reasons to choose
   some other mechanism, protocol designs use W3C XML Schema as the
   language for describing validity.  Note, though, that there is still
   some controversy within the XML community relating to validity and
   XML Schema; the other mechanisms described above have largely been
   developed as a result of the ongoing debate.

   Whether protocol definitions also require the corresponding protocol
   elements to be valid according to the schema depends to some degree
   on the extensibility design; for example, on whether the protocol
   has its own versioning mechanism, a way of updating the schema, or a
   way of pointing to a new one.  The use of XML namespaces (Section
   4.5) allows other kinds of extensibility without compromising schema
   validity.

   Whatever formalism is chosen, there are often additional constraints
   that cannot be expressed in that formalism.  These additional
   requirements should be clearly called out in the specification.
   Ideally, a process model might first check for well-formedness; if
   OK, apply the primary formalism and, if the instance "passes", apply
   the other constraints so that the entire set (or as much as is
   machine processable) can be checked at the same time.
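
   The staged process model can be sketched in Python as follows.  The
   Python standard library has no XML Schema validator, so the second
   stage here is a hand-written stand-in; all function names are
   hypothetical, and the <address> element is borrowed from the example
   in Section 4.6.

```python
import ipaddress
import xml.etree.ElementTree as ET


def check_instance(data, validate, extra_checks):
    """Return a list of problems; an empty list means the instance passed."""
    try:
        root = ET.fromstring(data)       # stage 1: well-formedness
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = list(validate(root))      # stage 2: primary formalism
    if problems:
        return problems
    for check in extra_checks:           # stage 3: remaining constraints,
        problems.extend(check(root))     # all run so failures are reported together
    return problems


def validate_address(root):
    # Stand-in for a schema validator (assumption for this sketch).
    if root.tag != "address":
        yield "root element must be <address>"
    elif root.get("type") not in ("ipv4", "ipv6"):
        yield "type attribute must be 'ipv4' or 'ipv6'"


def address_matches_type(root):
    # A constraint that relates the content to an attribute value.
    if root.get("type") == "ipv4":
        parse = ipaddress.IPv4Address
    else:
        parse = ipaddress.IPv6Address
    try:
        parse((root.text or "").strip())
    except ValueError:
        yield "address value does not match its declared type"
```

   Each stage runs only if the previous one passed, so later checks can
   rely on the structure that earlier stages have already confirmed.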

4.5 Namespaces

   XML namespaces, defined in [9], provide a means of assigning markup
   to a specific vocabulary.  If two elements or attributes from
   different vocabularies have the same name, they can be distinguished
   unambiguously if they belong to different namespaces.  Additionally,
   namespaces provide significant support for protocol extensibility as
   they can be defined, reused, and processed dynamically.

   Markup vocabulary collisions are very possible when namespaces are
   not used to separate and uniquely identify vocabularies.  Protocol
   definitions should use existing XML namespaces where appropriate.
   When a new namespace is needed, the "namespace name" is a URI that is
   used to identify the namespace; it's also useful for that URI to
   point to a description of the namespace.  Typical practice (and
   recommended practice in W3C) is to assign namespace names using
   persistent http URIs.

   In the case of namespaces in IETF standards-track documents, it would
   be useful if there were some permanent part of the IETF's own web
   space that could be used for this purpose.  In lieu of such, other
   permanent URIs can be used, e.g., URNs in the IETF URN namespace (see
   [13] and [14]).

4.5.1 Namespaces and Attributes

   There is a frequently misunderstood aspect of the relationship
   between unprefixed attributes and the default XML namespace: the
   natural assumption is that an unprefixed attribute is qualified by
   the default namespace, but this is not true.  Rather, the unprefixed
   attribute belongs to a set of attributes that are defined
   specifically for the element to which it is applied.  Thus, in the
   following:

      <ns1:fox a="xxx" n:b="qqq"/>
      <ns1:bay a="yyy" n:b="rrr"/>
      <ns2:baz a="zzz" n:b="sss"/>

   The meaning of attribute "a=" is defined separately for each element.
   By comparison, a prefixed attribute name is defined independently of
   the element to which it is applied.  For details, see appendix A.2 of
   [9].
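
   The behaviour can be observed with a namespace-aware parser.  In the
   following Python sketch the namespace names "urn:example:default"
   and "urn:example:n" are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# The element is in the default namespace; attribute "a" is unprefixed
# and attribute "b" carries the prefix "n".
doc = (
    '<fox xmlns="urn:example:default" xmlns:n="urn:example:n" '
    'a="xxx" n:b="qqq"/>'
)
root = ET.fromstring(doc)

element_name = root.tag                 # qualified by the default namespace
attribute_names = sorted(root.attrib)   # "a" has no namespace at all

print(element_name)
print(attribute_names)
```

   ElementTree writes qualified names as "{namespace}local".  The
   element name and the prefixed attribute are reported in that form,
   while the unprefixed attribute is reported as plain "a", showing
   that the default namespace did not apply to it.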

   One practical way to deal with this is to always use a namespace
   prefix on any attribute that can be applied to elements from any
   namespace, even when that namespace is also the default namespace.

   As described in Section 3, there is no standard mechanism in XML for
   indicating whether or not new extensions are mandatory to recognize.
   XML-based protocol specifications should thus explicitly describe
   extension mechanisms and requirements to recognize or ignore
   extensions.

4.6 Element and Attribute Design Considerations

   XML provides much flexibility in allowing a designer to use either
   elements or element attributes to carry data.  Element attributes are
   generally intended to contain meta-data that describes the value of
   the element, and as such they are subject to the following
   restrictions:

   o  Attributes are unordered,

   o  There can be no more than one instance of a given attribute within
      a given element, and

   o  Attribute values can contain only simple XML data types.

   Consider the following example that describes an IP address using a
   "type" attribute to describe the address value:

      <address type="ipv4">10.1.2.3</address>

   XML allows the same information to be encapsulated using a <type>
   element instead of a "type" attribute:

      <address>
        <type>ipv4</type>
        <value>10.1.2.3</value>
      </address>

   The first example is preferable, in that the "type" attribute is used
   to describe the value of the <address> element.

   Another way of encoding the same information would be to use markup
   for the "type":

      <address>
        <type><ipv4/></type>
        <value>10.1.2.3</value>
      </address>

   This last form allows for extensibility: the "ipv4" space can be
   extended using other namespaces, and the <ipv4> element can include
   additional markup.

   Many protocols include parameters that are selected from an
   enumerated set of values.  As shown in the above examples, such
   enumerated values can be encoded as elements, attributes, or strings
   within element values.  Any protocol design should consider how the
   set of enumerated values is to be extended: by revising the protocol,
   by including different values in different XML namespaces, or by
   establishing an IANA registry (as per RFC 2434 [20]).  In addition, a
   common practice in XML is to use a URI as an XML attribute value or
   content.

   Languages that describe syntactic validity often provide a mechanism
   for specifying "default" values for an attribute.  If an element does
   not specify a value for the attribute, then the "default" value is
   used.  The use of default values for attributes is discouraged by
   this document.  Although the use of this feature can reduce both the
   size and clutter of XML documents, it has a negative impact on
   software which doesn't know the document's validity constraints
   (e.g., for packet tracing or digital signature).

   Consistent use of elements and element attributes is a characteristic
   of a sound design.  The choices depend on the likely extensibility
   needed.  Protocol designers are strongly urged to use elements as
   the primary XML data presentation structure.  Attributes, if used at
   all in protocol elements, should contain only meta-data that
   describes the value of the enclosing element.

4.7 Binary Data

   XML is defined as a character stream rather than a stream of octets.
   There is no way to embed raw binary data directly within an XML data
   stream; all binary data must be encoded as characters.  There are a
   number of possible encodings; for example, XML Schema [11] defines
   encodings using decimal digits for integers, Base64 [17], or
   hexadecimal digits.  In addition, binary data might be transmitted
   using some other communication channel, and referenced within the XML
   data itself using a URI.

   Protocols that need a container that can hold both structural data
   and large quantities of binary data should consider carefully whether
   XML is appropriate, since the Base64 and hex encodings are
   inefficient.  Otherwise, protocols should use the mechanisms of XML
   Schema to represent binary data; the Base64 encoding is best for
   larger quantities of data.
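
   The relative cost of the two encodings can be checked with a short
   Python sketch.  The element name "blob" and the sample payload are
   arbitrary choices made for illustration.

```python
import base64
import binascii
import xml.etree.ElementTree as ET

payload = bytes(range(256)) * 3          # 768 octets of binary data

b64 = base64.b64encode(payload)          # 4 characters per 3 octets
hexed = binascii.hexlify(payload)        # 2 characters per octet

# Carry the Base64 form as element content and recover the octets.
elem = ET.Element("blob")
elem.text = b64.decode("ascii")
wire = ET.tostring(elem)
decoded = base64.b64decode(ET.fromstring(wire).text)

print(len(payload), len(b64), len(hexed))
```

   For this payload the Base64 form needs 1024 characters and the
   hexadecimal form needs 1536, which is why Base64 is the better
   choice as the quantity of data grows.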

   Note that the XML character range does not include arbitrary
   "control" characters.  This means that strings that might be
   considered "text" within an ABNF-defined protocol element may need to
   be treated as binary data within an XML representation.
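
   A sender can test whether a string is representable as XML character
   data before deciding how to encode it.  The following Python sketch
   checks each character against the character ranges allowed by XML
   1.0; the function names are assumptions made for illustration.

```python
def is_xml_char(ch: str) -> bool:
    """True if the character is in the XML 1.0 character range."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def representable_as_xml_text(text: str) -> bool:
    """True if every character may appear in an XML document."""
    return all(is_xml_char(ch) for ch in text)
```

   A string that fails this test (for example, one containing a BEL or
   NUL control character) would need to be carried using one of the
   binary encodings described above.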

4.8 Incremental Processing

   In some situations, it is possible to incrementally process an XML
   document as each tag is received; this is analogous to the process by
   which browsers incrementally render HTML pages as they are received.
   Note that incremental processing is difficult to implement if
   interspersed across multiple interactions.  In other words, if a
   protocol requires incremental processing across both directions of a
   bidirectional stream, then it may place significant burden on
   protocol implementers.
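
   Incremental processing in one direction can be sketched in Python
   with the standard XMLPullParser class.  The function name and the
   sample element names are hypothetical; the chunk boundaries stand in
   for data arriving from the network at arbitrary points.

```python
import xml.etree.ElementTree as ET


def incremental_tags(chunks):
    """Yield (event, tag) pairs as data arrives in arbitrary chunks."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        # Report whatever the parser has been able to recognize so far.
        for event, elem in parser.read_events():
            yield event, elem.tag
    parser.close()
    # Pick up any events that were only completed at end of stream.
    for event, elem in parser.read_events():
        yield event, elem.tag


chunks = [b"<msg><it", b"em>1</item>", b"<item>2</item></msg>"]
events = list(incremental_tags(chunks))
```

   Exactly when each event becomes available relative to the chunk
   boundaries depends on the underlying parser, so a protocol should
   not rely on a particular timing.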

5. Internationalization Considerations

   This section describes internationalization considerations for the
   use of XML to represent data in IETF protocols.  In addition to the
   recommendations here, IETF policy on the use of character sets and
   languages [3] also applies.

5.1 Character Sets and Encodings: UTF-8 and UTF-16

   IETF protocols frequently speak of the "character set" or "charset"
   of a string, which is used to denote both the character repertoire
   and the encoding used to represent sequences of characters as
   sequences of bytes.

   XML performs all character processing in terms of the Universal
   Character Set (UCS, [32] and [34]).  XML requires all XML processors
   to support both the UTF-8 [4] and UTF-16 [22] encodings of UCS,
   although other encodings (charsets) compatible with UCS may be
   allowed.

   Protocols must allow both UTF-8 and UTF-16 (for XML compatibility).
   This document recommends allowing only those encodings.  In cases
   where there is a strong reason to allow others, protocols should
   require that the encoding be specified using an "encoding" attribute
   in the XML declaration (see Section 4.1).
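
   The following Python sketch illustrates the point: the same
   (hypothetical) document is serialized once as UTF-8 and once as
   UTF-16, each with a matching XML declaration, and a conforming
   parser recovers the same text from both.  The "greeting" element is
   an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Body containing non-ASCII characters (u-umlaut and sharp s).
BODY = "<greeting>Gr\u00fc\u00dfe</greeting>"

utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + BODY).encode("utf-8")
# Python's "utf-16" codec prepends a byte order mark, which XML
# requires for UTF-16 encoded entities.
utf16_doc = ('<?xml version="1.0" encoding="UTF-16"?>' + BODY).encode("utf-16")


def parse_text(doc: bytes) -> str:
    """Parse an encoded XML document and return the root element text."""
    return ET.fromstring(doc).text


for doc in (utf8_doc, utf16_doc):
    print(parse_text(doc))
```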

5.2 Language Declaration

   Text encapsulated in XML can be represented in many different human
   languages, and it is often useful to explicitly identify the language
   used to present the text.  XML version 1 defines a special attribute
   in the "xml" namespace, xml:lang, that can be used to specify the
   language used to represent data in an XML document.  The xml:lang
   attribute and the values it can assume are defined in section 2.12 of
   [8].

   It is strongly recommended that protocols representing data in a
   human language mandate use of an xml:lang attribute if the XML
   instance might be interpreted in language-dependent contexts.
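
   As a sketch of how a receiver might use the attribute, the Python
   fragment below computes the language in scope for each element,
   applying the rule that an xml:lang value is inherited by child
   elements unless overridden.  The "response" and "msg" element names
   and the helper languages_in_scope are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved "xml" namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages_in_scope(root):
    """Return (tag, language) pairs, inheriting xml:lang from ancestors."""
    result = []

    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        result.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result


doc = ET.fromstring(
    '<response xml:lang="en">'
    "<msg>Hello</msg>"
    '<msg xml:lang="fr">Bonjour</msg>'
    "</response>"
)
print(languages_in_scope(doc))
```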

5.3 Other Considerations

   There are standard mechanisms in the typography of some human
   languages that can be difficult to represent using merely XML
   character string data types.  For example, pronunciation clues can be
   provided using Ruby annotation [37], and embedding controls (such as
   those described in section 3.4 of [43]) or an XHTML [38] "dir"
   attribute can be used to note the proper display direction for
   bidirectional text.

   There are a number of tricky issues that can arise when using
   extended character sets with XML document formats.  For example:

   o  there are different ways of representing characters consisting of
      combining characters, and

   o  there has been some debate about whether URIs should be
      represented using a restricted US-ASCII subset or arbitrary
      Unicode (cf.  "URI character sequence" vs "original character
      sequence" in RFC 2396 [19]).

   Some of these issues are discussed, with recommendations, in [40].

   It is strongly recommended that protocols representing data in a
   human language reuse existing mechanisms as needed to ensure proper
   display of human-legible text.
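
   The combining character issue noted above can be seen in a short
   Python sketch: two strings that render identically compare as
   unequal until both are brought to a common normalization form.  The
   choice of Normalization Form C here is an assumption made for the
   example, not a requirement of this document.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A code point comparison sees two different strings ...
print(precomposed == decomposed)
# ... while a comparison after normalization treats them as equal.
print(same_text(precomposed, decomposed))
```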

6. IANA Considerations

   This section does not contain any specific directives for IANA.
   However, when XML is used in an IETF protocol there are multiple
   factors that might require IANA action, including:

   o  XML media types.  Some protocols have protocol elements that are
      MIME bodies, and allow MIME labeling.  In cases where a MIME label
      is used to identify a protocol element, the MIME labeling policies
      defined in RFC 3023 [5] should be followed and an XML declaration
      should be present.  The "application/xml" media type is most
      appropriate for general XML; if a new media type is expected, it
      should be registered.

   o  URI registration.  There is an ongoing effort ([13], [14]) to
      create a URN namespace explicitly for defining URIs for namespace
      names and other URI-designated protocol elements for use within
      IETF standards track documents; it might also establish IETF
      policy for such use.
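
   As an illustration of the media type labeling described in the first
   item above, the following Python sketch builds a MIME entity labeled
   "application/xml" whose body begins with an XML declaration.  The
   "status" element is hypothetical, and the explicit charset parameter
   reflects one reading of the recommendations in RFC 3023 [5].

```python
from email.message import EmailMessage

# The protocol element itself, with an XML declaration present.
xml_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<status>ok</status>"
).encode("utf-8")

msg = EmailMessage()
# Label the body with the general-purpose XML media type.
msg.set_content(xml_bytes, maintype="application", subtype="xml")
# Add a charset parameter that agrees with the encoding declaration.
msg.set_param("charset", "utf-8")

print(msg["Content-Type"])
```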

7. Security Considerations

   Being text-based, protocols built with XML face significant threats,
   including unintended disclosure, modification, and replay.  Simple
   passive attacks, such as packet sniffing, allow an attacker to
   capture and view information intended for someone else.  Captured
   data can be modified and replayed to the original intended recipient,
   leaving the recipient with no way to know that the information has
   been compromised, to detect modifications, to be assured of the
   sender's identity, or to confirm which protocol instance is
   legitimate.

   Several security service options are available to mitigate these
   risks.  Though XML does not include any built-in security services,
   other protocols and protocol layers provide services that can be used
   to protect XML protocols.  XML encryption [12] provides privacy
   services to prevent unintended disclosure.  Canonical XML [6] and XML
   digital signatures [7] provide integrity services to detect
   modification and authentication services to confirm the identity of
   the data source.
   Other IETF security protocols (e.g., the Transport Layer Security
   (TLS) protocol [2]) are also available to protect data and service
   endpoints as appropriate.  Given the lack of security services in
   XML, it is imperative that protocol specifications REQUIRE additional
   security services to counter common threats and attacks; the specific
   required services will depend on the protocol's threat model.

8. Acknowledgements

   The authors would like to thank the following people who have
   provided significant contributions to the development of this
   document:

   Tim Bray, Josh Cohen, Alan Crouch, Martin Duerst, Yaron Goland,
   Graham Klyne, Chris Lilley, Murata Makoto, Andrew Newton, Julian
   Reschke, Jonathan Rosenberg, and Simon St.Laurent.

Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [2]   Dierks, T., Allen, C., Treese, W., Karlton, P., Freier, A. and
         P. Kocher, "The TLS Protocol Version 1.0", RFC 2246, January
         1999.

   [3]   Alvestrand, H., "IETF Policy on Character Sets and Languages",
         BCP 18, RFC 2277, January 1998.

   [4]   Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC
         2279, January 1998.

   [5]   Murata, M., St.Laurent, S. and D. Kohn, "XML Media Types", RFC
         3023, January 2001.

   [6]   Boyer, J., "Canonical XML Version 1.0", RFC 3076, March 2001.

   [7]   Eastlake, D., Reagle, J. and D. Solo, "(Extensible Markup
         Language) XML-Signature Syntax and Processing", RFC 3275, March
         2002.

   [8]   Bray, T., Paoli, J., Sperberg-McQueen, C. and E. Maler,
         "Extensible Markup Language (XML) 1.0 (2nd ed)", W3C REC-xml,
         October 2000, <http://www.w3.org/TR/2000/REC-xml-20001006>.

   [9]   Bray, T., Hollander, D. and A. Layman, "Namespaces in XML", W3C
         REC-xml-names, January 1999, <http://www.w3.org/TR/REC-xml-
         names>.

   [10]  Thompson, H., Beech, D., Maloney, M. and N. Mendelsohn, "XML
         Schema Part 1: Structures", W3C REC-xmlschema-1, May 2001,
         <http://www.w3.org/TR/xmlschema-1/>.

   [11]  Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes", W3C
         REC-xmlschema-2, May 2001, <http://www.w3.org/TR/xmlschema-2/>.

   [12]  Imamura, T., Dillaway, B., Schaad, J. and E. Simon, "XML
         Encryption Syntax and Processing", W3C REC-xmlenc-core, October
         2001, <http://www.w3.org/TR/xmlenc-core/>.


Informative References

   [13]  Mealling, M., "An IETF URN Sub-namespace for Registered
         Protocol Parameters", draft-mealling-iana-urn-02 (work in
         progress), October 2001.

   [14]  Mealling, M., "The IETF XML Registry", draft-mealling-iana-
         xmlns-registry-03 (work in progress), November 2001.

   [15]  Case, J., Fedor, M., Schoffstall, M. and J. Davin, "Simple
         Network Management Protocol (SNMP)", STD 15, RFC 1157, May
         1990.

   [16]  Srinivasan, R., "XDR: External Data Representation Standard",
         RFC 1832, August 1995.

   [17]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies",
         RFC 2045, November 1996.

   [18]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
         Specifications: ABNF", RFC 2234, November 1997.

   [19]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
         Resource Identifiers (URI): Generic Syntax", RFC 2396, August
         1998.

   [20]  Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
         Considerations Section in RFCs", BCP 26, RFC 2434, October
         1998.

   [21]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, June
         1999.

   [22]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
         RFC 2781, February 2000.

   [23]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April
         2001.

   [24]  Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame,
         C., Eisler, M. and D. Noveck, "NFS version 4 Protocol", RFC
         3010, December 2000.

   [25]  Kennedy, H., "Binary Lexical Octet Ad-hoc Transport", RFC 3252,
         1 April 2002.

   [26]  Backus, J., "The syntax and semantics of the proposed
         international algebraic language of the Zurich ACM-GAMM
         conference", June 1959.

   [27]  American National Standards Institute, "Code Extension
         Techniques for Use with the 7-bit Coded Character Set of
         American National Standard Code (ASCII) for Information
         Interchange", ANSI X3.41, FIPS PUB 35, 1974.

   [28]  American National Standards Institute, "Information Retrieval:
         Application Service Definition and Protocol Specification",
         ANSI Z39.50, ISO Standard 23950, 1995.

   [29]  International Organization for Standardization, "Information
         Processing Systems - Open Systems Interconnection -
         Specification of Abstract Syntax Notation One (ASN.1)", ISO
         Standard 8824, December 1990.

   [30]  International Organization for Standardization, "Information
         Processing Systems - Open Systems Interconnection -
         Specification of Basic Encoding Rules for Abstract Syntax
         Notation One (ASN.1)", ISO Standard 8825, December 1990.

   [31]  International Organization for Standardization, "Information
         processing - Text and office systems - Standard Generalized
         Markup Language (SGML)", ISO Standard 8879, 1988.

   [32]  International Organization for Standardization, "Information
         Technology - Universal Multiple-octet coded Character Set (UCS)
         - Part 1: Architecture and Basic Multilingual Plane", ISO
         Standard 10646-1, May 1993.

   [33]  International Organization for Standardization, "Document
         Description and Processing Languages", December 2001, <http://
         www.jtc1.org/FTP/Public/SC34/DOCREG/0275.htm/>.

   [34]  Unicode Consortium, "Unicode 3.2", UAX 28, March 2002, <http://
         www.unicode.org/unicode/reports/tr28/>.

   [35]  Bray, T., Paoli, J. and C. Sperberg-McQueen, "Extensible Markup
         Language (XML) 1.0", W3C REC-xml-1998, February 1998, <http://
         www.w3.org/TR/1998/REC-xml-19980210/>.

   [36]  Lassila, O. and R. Swick, "Resource Description Framework (RDF)
         Model and Syntax Specification", W3C REC-rdf-syntax, February
         1999, <http://www.w3.org/TR/REC-rdf-syntax>.

   [37]  Suignard, M., Ishikawa, M., Duerst, M. and T. Texin, "Ruby
         Annotation", W3C REC-RUBY, May 2001, <http://www.w3.org/TR/
         ruby/>.

   [38]  Pemberton, S., "XHTML 1.0: The Extensible HyperText Markup
         Language", W3C REC-XHTML, January 2000, <http://www.w3.org/TR/
         xhtml1/>.

   [39]  W3C Communications Team, "XML in 10 points", November 2001,
         <http://www.w3.org/XML/1999/XML-in-10-points/>.

   [40]  Duerst, M., Yergeau, F., Ishida, R., Wolf, M., Freytag, A. and
         T. Texin, "Character Model for the World Wide Web 1.0",
         February 2002, <http://www.w3.org/TR/charmod/>.

   [41]  Williams, S. and M. Jones, "XML Protocol Abstract Model", July
         2001, <http://www.w3.org/TR/xmlp-am/>.

   [42]  Gudgin, M., Hadley, M., Moreau, J. and H. Nielsen, "SOAP
         Version 1.2 Part 1: Messaging Framework", December 2001,
         <http://www.w3.org/TR/soap12-part1/>.

   [43]  Duerst, M. and A. Freytag, "Unicode in XML and other Markup
         Languages", February 2002, <http://www.w3.org/TR/unicode-xml/>.

   [44]  Jelliffe, R., "The Schematron", November 2001, <http://
         www.ascc.net/xml/resource/schematron/schematron.html/>.

   [45]  OASIS Technical Committee: RELAX NG, "RELAX NG Specification",
         December 2001, <http://www.oasis-open.org/committees/relax-ng/
         spec-20011203.html/>.


Authors' Addresses

   Scott Hollenbeck
   VeriSign, Inc.
   21345 Ridgetop Circle
   Dulles, VA  20166-6503
   US

   Phone: +1 703 948 3257
   EMail: shollenbeck@verisign.com


   Marshall T. Rose
   Dover Beach Consulting, Inc.
   POB 255268
   Sacramento, CA  95865-5268
   US

   Phone: +1 916 483 8878
   EMail: mrose@dbc.mtview.ca.us


   Larry Masinter
   Adobe Systems Incorporated
   Mail Stop W14
   345 Park Ave.
   San Jose, CA  95110
   US

   Phone: +1 408 536 3024
   EMail: LMM@acm.org
   URI:   http://larry.masinter.net

Appendix A. Change History

   The following changes were made to produce version -02 from -01:

   o  Changed the title slightly ("in IETF" to "within IETF") to help
      clarify the scope.

   o  Changed the abstract slightly (added "being developed" to the
      first sentence).

   o  Changed the "conventions" paragraph slightly.

   o  Added text to the introduction/scope to clarify that the document
      is not intended as an endorsement to use XML.

   o  Removed TBD from Section 1.

   o  Added an additional list element on binary data encoding in
      Section 2, added another sentence to the "text based" list
      element, and modified the "processing speed" list element.

   o  Rewrote the first paragraphs of Section 3, adding a reference to
      RFC 3252.

   o  Rewrote Section 4.1.

   o  Reworded and added text to Section 4.3.

   o  Changed "in lieu of" to "in the absence of" in old paragraph 7 of
      Section 4.4.

   o  Restructured Section 4.4 to acknowledge that there is still some
      controversy surrounding XML Schema.

   o  Added paragraph on default attributes to Section 4.6, added a new
      paragraph to address value enumeration, reworked the example, and
      changed the last paragraph slightly.

   o  Rewrote Section 4.7.

   o  Added Section 4.8 to address incremental processing.

   o  Rewrote portions of Section 5, adding references to Unicode 3.2
      and ISO 10646.

   The following changes were made to produce version -01 from -00:

   o  Changed "eXtensible" to "Extensible" throughout.

   o  Fixed the discussion mailing list name in the front matter.

   o  Changed use of "data encapsulation" to "structured data
      representation" (or similar) throughout.

   o  Added namespace reference and text to discussion of extensibility
      in Section 3.

   o  Rewrote Section 4.4 and added needed references.

   o  Added text to address extension recognition and attributes in
      Section 4.5.

   o  Added another attribute restriction in Section 4.6.

   o  Added reference to the "An IETF URN Sub-namespace for Registered
      Protocol Parameters" I-D in Section 6.

   o  Added reference to RFC 2396 and W3C character model in Section 5.

Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.