Digest Values for DOM (DOMHASH)
RFC 2803

 
Document Type RFC - Informational (April 2000; No errata)
Last updated 2013-03-02
Stream IETF
Network Working Group                                         H. Maruyama
Request for Comments: 2803                                      K. Tamura
Category: Informational                                        N. Uramoto
                                                                      IBM
                                                               April 2000

                    Digest Values for DOM (DOMHASH)

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

Abstract

   This memo gives a clear and unambiguous definition of digest (hash)
   values of XML objects regardless of the surface string variation
   of XML. This definition can be used for XML digital signatures as
   well as efficient replication of XML objects.

Table of Contents

   1. Introduction............................................2
   2. Digest Calculation......................................3
   2.1. Overview..............................................3
   2.2. Namespace Considerations..............................4
   2.3. Definition with Code Fragments........................5
   2.3.1. Text Nodes..........................................5
   2.3.2. Processing Instruction Nodes........................6
   2.3.3. Attr Nodes..........................................6
   2.3.4. Element Nodes.......................................7
   2.3.5. Document Nodes......................................9
   3. Discussion..............................................9
   4. Security Considerations.................................9
   References................................................10
   Authors' Addresses........................................10
   Full Copyright Statement..................................11

Maruyama, et al.             Informational                      [Page 1]
RFC 2803            Digest Values for DOM (DOMHASH)           April 2000

1. Introduction

   The purpose of this document is to give a clear and unambiguous
   definition of digest (hash) values of the XML objects [XML].  Two
   subtrees are considered identical if their hash values are the same,
   and different if their hash values are different.

   There are at least two usage scenarios of DOMHASH. One is as a basis
   for digital signatures for XML. Digital signature algorithms normally
   require hashing the content before signing it.  DOMHASH provides a
   concrete definition of the hash value calculation.

   The other is to use DOMHASH when synchronizing two DOM structures
   [DOM]. Suppose that a server program generates a DOM structure which
   is to be rendered by clients. If the server makes frequent small
   changes to a large DOM tree, it is desirable that only the modified
   parts are sent to the client. A client can initiate a request by
   sending the root hash value of the structure in its cache. If
   it matches the root hash value of the current server structure,
   nothing needs to be sent. If not, then the server compares the client
   hash with the older versions in the server's cache. If it finds one
   that matches the client's version of the structure, then it locates
   differences with the current version by recursively comparing the
   hash values of each node. This way, the client can receive only an
   updated portion of a large structure without requesting the whole
   thing.
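
   The recursive comparison described above can be sketched as follows.
   This is a minimal illustration and not part of this specification:
   the Node class, node_hash, and changed_paths are hypothetical names,
   and node_hash uses a simplified encoding with SHA-1 rather than the
   DOMHASH definition given in Section 2.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node: a name, text, and children."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def node_hash(node):
    """Illustrative digest of a subtree (NOT the DOMHASH encoding).

    A node's digest covers its own name and text plus the digests of its
    children in order, so any change below a node changes that node's digest.
    A real implementation would cache digests instead of recomputing them.
    """
    h = hashlib.sha1()
    h.update(node.name.encode("utf-8") + b"\x00")
    h.update(node.text.encode("utf-8") + b"\x00")
    for child in node.children:
        h.update(node_hash(child))
    return h.digest()


def changed_paths(old, new, path=""):
    """Return paths of the smallest subtrees of `new` that differ from `old`."""
    if node_hash(old) == node_hash(new):
        return []  # identical subtrees: nothing to send
    own_changed = (old.name, old.text) != (new.name, new.text)
    if own_changed or len(old.children) != len(new.children):
        return [path or "/"]  # send this whole subtree
    result = []
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        result.extend(changed_paths(o, n, f"{path}/{n.name}[{i}]"))
    return result


old_tree = Node("doc", children=[
    Node("title", "Report"),
    Node("body", children=[Node("p", "one"), Node("p", "two")]),
])
new_tree = Node("doc", children=[
    Node("title", "Report"),
    Node("body", children=[Node("p", "one"), Node("p", "2")]),
])

print(changed_paths(old_tree, new_tree))  # ['/body[1]/p[1]']
```

   In this sketch only the second "p" subtree would need to be sent;
   the "title" subtree is skipped after a single hash comparison.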

   One way of defining digest values is to take a surface string as the
   input for a digest algorithm. However, this approach has several
   drawbacks. The same internal DOM structure may be represented in many
   different ways as surface strings even if they strictly conform to
   the XML specification.  Treatment of white space, selection of
   character encodings, entity references (i.e., use of ampersands), and
   so on all affect the generation of a surface string. If the
   implementations of surface string generation are different, the hash
   values would be different, resulting in digital signatures that
   cannot be validated and failure to detect identical DOM structures.
   Therefore, it is desirable that the digest of a DOM be defined in DOM
   terms -- that is, as an unambiguous algorithm operating on a DOM
   tree.  This is the approach we take in this specification.
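
   The surface-string problem can be demonstrated with a short sketch.
   It assumes Python's standard xml.dom.minidom parser and SHA-1 purely
   for illustration; surface_digest and dom_summary are hypothetical
   helpers, and dom_summary is only a crude structural summary, not the
   DOMHASH algorithm.

```python
import hashlib
from xml.dom import minidom

# Two surface strings for the same element: different quote characters,
# attribute order, white space inside the tag, and a character reference
# in place of the predefined entity.
doc_a = b'<item id="7" lang="en">A &amp; B</item>'
doc_b = b"<item lang='en'   id='7'>A &#38; B</item>"


def surface_digest(data):
    """Digest computed over the raw bytes of the surface string."""
    return hashlib.sha1(data).hexdigest()


def dom_summary(data):
    """Crude structural summary of the root element (illustration only)."""
    root = minidom.parseString(data).documentElement
    attrs = sorted(root.attributes.items())  # attribute order is not significant
    text = "".join(
        n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE
    )
    return (root.tagName, attrs, text)


print(surface_digest(doc_a) == surface_digest(doc_b))  # False
print(dom_summary(doc_a) == dom_summary(doc_b))        # True
```

   The byte-level digests differ although both strings parse to the same
   element name, attributes, and text, which is why the digest needs to
   be computed over the DOM rather than over the serialized form.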

   The introduction of namespaces is another source of variation in the
   surface string, because different namespace prefixes can be used to
   represent the same namespace URI [URI]. In the following example,
   the namespace prefix "edi" is bound to the URI
   "http://ecommerce.org/schema" but this prefix can be arbitrarily chosen