Internet-Draft                                                G. Staykov
Intended status: Standards Track                                  VMware
Expires: May 07, 2013                                              J. Hu
                                                                  VMware
                                                       November 07, 2012




                           JSON Canonical Form
                 draft-staykov-hu-json-canonical-form-00

Abstract

  A single JSON document can have multiple logically equivalent
  physical representations. While convenient for human interaction, this
  flexibility is inconvenient for cases where a machine is used to
  assess the logical equivalence of documents. In cases where logical
  equivalence is useful, an encoder should produce a canonical form of a
  JSON document. For example, since digital signatures demand the same
  physical representation for logically equivalent documents, a
  canonical physical representation would allow the signature to apply
  to the logical document. This internet draft has the goal to define a
  canonical form of JSON documents. Two logically equivalent documents
  should have same canonical form.

Requirements Language

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in [RFC2119].

Status of this Memo

  This Internet-Draft is submitted in full conformance with the
  provisions of BCP 78 and BCP 79. Internet-Drafts are working
  documents of the Internet Engineering Task Force (IETF). Note that
  other groups may also distribute working documents as
  Internet-Drafts. The list of current Internet-Drafts is at
  http://datatracker.ietf.org/drafts/current.

  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time. It is inappropriate to use Internet-Drafts as reference
  material or to cite them other than as "work in progress."

Copyright Notice

  Copyright (c) 2012 IETF Trust and the persons identified as the
  document authors. All rights reserved.

  This document is subject to BCP 78 and the IETF Trust's Legal
  Provisions Relating to IETF Documents
  (http://trustee.ietf.org/license-info) in effect on the date of
  publication of this document. Please review these documents
  carefully, as they describe your rights and restrictions with respect
  to this document. Code Components extracted from this document must
  include Simplified BSD License text as described in Section 4.e of
  the Trust Legal Provisions and are provided without warranty as
  described in the Simplified BSD License.

1. Introduction

  JSON [JSON] is a lightweight data-interchange text format that is
  suitable for both humans and machines. It allows multiple physical
  representations that are logically equivalent. For example, a
  formatting change to add whitespaces and line endings to make a
  document more human readable will result in a different representation
  when doing a byte for byte comparison. There are cases however where
  it is essential to have a single physical representation of a data
  document. For example when a cryptographic hash is applied over a JSON
  document, a single physical representation allows the hash to
  represent the logical content of the document by removing variation in
  how that content is encoded in JSON. Thus a common physical
  representation of logically equivalent JSON documents should be
  defined. It is called canonical form.

2. JSON canonical form

  The canonical form is defined by the following rules:
  *  The document MUST be encoded in UTF-8 [UTF-8]
  *  Non-significant(1) whitespace characters MUST NOT be used
  *  Non-significant(1) line endings MUST NOT be used
  *  Entries (set of name/value pairs) in JSON objects MUST be sorted
     lexicographically(2) by their names
  *  Arrays MUST preserve their initial ordering

  (1)As defined in JSON data-interchange format [JSON], JSON objects
     consists of multiple "name"/"value" pairs and JSON arrays consists
     of multiple "value" fields. Non-significant means not part of
     "name" or "value".


  (2)Lexicographic comparison, which orders strings from least to
     greatest alphabetically based on the UCS (Unicode Character Set)
     codepoint values.

2.1 Canonical representation of data types

2.1.1 Double

  The double data type is represented as specified in the XML schema
  standard [XML]
  *  The canonical representation of the double data type consists of
     mantissa followed by "E", followed by exponent.
  *  Mantissa
     *  MUST be represented as a decimal. The decimal point is mandatory
     *  There MUST be a single non zero digit on the left of the decimal
        point (unless a zero is represented).
     *  There MUST be at least single digit on the right of the decimal
        point.
  *  Exponent
     *  Zero exponent is represented by "E0".
  *  "+" sign is prohibited in both the mantissa and the exponent.
  *  Leading zeroes are prohibited from the left side of the decimal
     point in the mantissa and from the exponent.
  *  Special values (NaN, INF) MUST not be used.

3. Applications

  The JSON canonical form can be used when digitally signing JSON
  documents generated from a serialization library.  Because
  serialization and deserialization libraries might tolerate variation
  in physical representation, different physical representations may
  result after several serialization / deserialization cycles.  This
  could result in false signature verification failures as the hash
  digest of the same document differs from the hash digest used when
  signing.  A way to avoid this problem is to use canonical form when
  signing and verifying hash digests.

4. Examples

4.1. Example 1

  Input:
  {
    "foo" : "foo bar"
  }

  Canonical form:
  {"foo":"foo bar"}

  Demonstrates:
  *  Non-significant whitespace characters and line endings are removed.
  *  Whitespaces inside name/value object entities are preserved.

4.2. Example 2

  Input:
  {
    "foo":"bar",
    "abc":"def",
    "zoo" :
      [
        "def",
        "abc"
      ]
  }

  Canonical Form:
  {"abc":"def","foo":"bar","zoo":["def","abc"]}

  Demonstrates:
  *  Non-significant whitespaces and line endings are removed.
  *  Name/value pairs in JSON objects are lexicographically sorted by
   "name" key.
  *  Array order is preserved.

4.3. Example 3

  Input:
  {
    "d1":-12.34e4,
    "d2":1E-130,
    "d3":0.0E-0,
    "d4":1.2
  }

  Canonical Form:
  {"d1":-1.234E5,"d2":1.0E-130,"d3":0.0E0,"d4":1.2E0}

  Demonstrates:
  *  Various canonical representations of double data types.

5. Security Considerations

  This document provides a groundwork needed for providing data
  integrity by using digital signatures over JSON messages.

6. IANA Considerations

  This document has no actions for IANA

7. References

7.1. Normative References

  [JSON]     http://www.json.org/

  [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

  [UTF-8]    UTF-8, a transformation format of ISO 10646, IETF RFC 3629.
             F. Yergeau. January 1998.
             http://www.ietf.org/rfc/rfc3629.txt

  [XML]      http://www.w3.org/TR/xmlschema-2

Authors' Addresses

  Georgi Staykov
  VMware
  Email: gstaykov@vmware.com

  Jeff Hu
  VMware
  Email: jhu@vmware.com