Skip to main content

Binary Uniform Language Kit 1.0
draft-thierry-bulk-01

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Active".
Author Pierre Thierry
Last updated 2013-07-30
RFC stream (None)
Formats
Stream Stream state (No stream defined)
Consensus boilerplate Unknown
RFC Editor Note (None)
IESG IESG state I-D Exists
Telechat date (None)
Responsible AD (None)
Send notices to (None)
draft-thierry-bulk-01
Network Working Group                                         P. Thierry
Internet-Draft                                      Thierry Technologies
Intended status: Experimental                             August 1, 2013
Expires: February 2, 2014

                    Binary Uniform Language Kit 1.0
                         draft-thierry-bulk-01

Abstract

   This specification describes a uniform, decentrally extensible and
   efficient format for data serialization.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 2, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

Thierry                 Expires February 2, 2014                [Page 1]
Internet-Draft                    BULK1                      August 2013

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
     1.1.  Rationale  . . . . . . . . . . . . . . . . . . . . . . . .  4
       1.1.1.  Definitions  . . . . . . . . . . . . . . . . . . . . .  4
       1.1.2.  State of the art . . . . . . . . . . . . . . . . . . .  5
     1.2.  Format overview  . . . . . . . . . . . . . . . . . . . . .  6
     1.3.  Conventions and Terminology  . . . . . . . . . . . . . . .  7
   2.  BULK syntax  . . . . . . . . . . . . . . . . . . . . . . . . .  7
     2.1.  Parsing algorithm  . . . . . . . . . . . . . . . . . . . .  8
     2.2.  Forms  . . . . . . . . . . . . . . . . . . . . . . . . . .  8
       2.2.1.  starting marker byte . . . . . . . . . . . . . . . . .  8
       2.2.2.  ending marker byte . . . . . . . . . . . . . . . . . .  9
     2.3.  Atoms  . . . . . . . . . . . . . . . . . . . . . . . . . .  9
       2.3.1.  nil  . . . . . . . . . . . . . . . . . . . . . . . . .  9
       2.3.2.  Array  . . . . . . . . . . . . . . . . . . . . . . . .  9
       2.3.3.  32 bits word . . . . . . . . . . . . . . . . . . . . .  9
       2.3.4.  64 bits word . . . . . . . . . . . . . . . . . . . . .  9
       2.3.5.  128 bits word  . . . . . . . . . . . . . . . . . . . . 10
       2.3.6.  Unsigned 8 bits integer  . . . . . . . . . . . . . . . 10
       2.3.7.  Unsigned 32, 64 and 128 bits integer . . . . . . . . . 10
       2.3.8.  Signed 32, 64 and 128 bits integer . . . . . . . . . . 10
       2.3.9.  Reserved marker bytes  . . . . . . . . . . . . . . . . 11
       2.3.10. Reference  . . . . . . . . . . . . . . . . . . . . . . 11
   3.  Standard namespaces  . . . . . . . . . . . . . . . . . . . . . 11
     3.1.  BULK core namespace  . . . . . . . . . . . . . . . . . . . 11
       3.1.1.  Version  . . . . . . . . . . . . . . . . . . . . . . . 12
       3.1.2.  Required namespace . . . . . . . . . . . . . . . . . . 12
       3.1.3.  Optional namespace . . . . . . . . . . . . . . . . . . 12
       3.1.4.  Strings encoding . . . . . . . . . . . . . . . . . . . 12
       3.1.5.  IANA registered character set  . . . . . . . . . . . . 13
       3.1.6.  Windows code page  . . . . . . . . . . . . . . . . . . 13
       3.1.7.  true . . . . . . . . . . . . . . . . . . . . . . . . . 13
       3.1.8.  false  . . . . . . . . . . . . . . . . . . . . . . . . 14
     3.2.  arithmetic namespace . . . . . . . . . . . . . . . . . . . 14
       3.2.1.  fraction . . . . . . . . . . . . . . . . . . . . . . . 14
       3.2.2.  Arbitrary precision signed integer . . . . . . . . . . 14
       3.2.3.  Binary floating-point number . . . . . . . . . . . . . 14
       3.2.4.  Decimal floating-point number  . . . . . . . . . . . . 15
   4.  Extension namespaces . . . . . . . . . . . . . . . . . . . . . 15
     4.1.  Official namespaces  . . . . . . . . . . . . . . . . . . . 15
     4.2.  User-defined namespaces  . . . . . . . . . . . . . . . . . 15
   5.  Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
     5.1.  Profile redundancy . . . . . . . . . . . . . . . . . . . . 16
     5.2.  Standard profile . . . . . . . . . . . . . . . . . . . . . 16
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 16
     6.1.  Parsing  . . . . . . . . . . . . . . . . . . . . . . . . . 16
     6.2.  Forwarding . . . . . . . . . . . . . . . . . . . . . . . . 16
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 17
   8.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18

Thierry                 Expires February 2, 2014                [Page 2]
Internet-Draft                    BULK1                      August 2013

   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 18
     9.1.  Normative References . . . . . . . . . . . . . . . . . . . 18
     9.2.  Informative references . . . . . . . . . . . . . . . . . . 18

Thierry                 Expires February 2, 2014                [Page 3]
Internet-Draft                    BULK1                      August 2013

1.  Introduction

1.1.  Rationale

   This specification aims at finding an original trade-off between
   uniformity, generality, extensibility, decentralization, compactness
   and processing speed for a data format.  It is our opinion that every
   widely used existing format occupy a different position in the
   solution space for formats, hence this new design.  It is also our
   opinion that most of those existing formats constitute an optimal
   solution for their specific use case, either in a absolute sense, or
   at least at the time of their design.  But the ever-changing field of
   IT now faces new challenges that call for a new approach.

   In particular, whereas the previous trend for Internet and Web
   standards and programming tools has been to create human-readable
   syntaxes for data and protocols, the advent of technologies like
   protocol buffers [protobuf], Thrift [Thrift], the various binary
   serializations for JSON like Avro [Avro] or Smile [Smile], or the
   binary HTTP/2.0 [HTTP2] currently in development seem to indicate
   that the time is ripe for a generalized use of binary, reserved until
   now for the low-level protocols.  The lessons about flexibility
   learnt in the previous switch from binary to plain text can now be
   applied to efficient binary syntaxes.

1.1.1.  Definitions

   By uniformity, we mean the property of a syntax that can be parsed
   even by an application that doesn't understand the semantics of every
   part of the processed data.  Of course, almost all syntaxes that
   feature uniformity contain a limited number of non uniform elements.
   Also, uniformity really only has value in the face of extension, as a
   fixed syntax doesn't need uniformity (it only makes the
   implementation simpler).

   Almost all extensible syntaxes have their extensible part uniform to
   a great degree.  In this specification, uniformity is hence evaluated
   on two criteria: first, the number of non uniform elements (and,
   incidentally, their diversity), second, the fact that the uniformity
   of the extensible part is not a limitation to the users (i.e. that
   the temptation to extend the language in a non-uniform way is as
   absent as possible).

   A good counter-example is found in most programming languages.
   Adding a new branching construct cannot be done in a terse way
   without modifying the underlying implementation.  Such a construct
   either cannot be defined by user code (because of evaluation rules)
   or can in a terribly verbose and inconvenient way (with lot of

Thierry                 Expires February 2, 2014                [Page 4]
Internet-Draft                    BULK1                      August 2013

   boilerplate code).  Notable exceptions to this limitation of
   programming languages are Lisp, Haskell and stack programming
   languages.

   On the other hand, a stack programming language is the canonical
   example of a non-uniform language.  Each operator takes a number of
   operands from the stack.  Not knowing the arity of an operator makes
   it impossible to continue parsing, even when its evaluation was
   optional to the final processing.  In the design space, stack
   programming languages completely sacrifice uniformity to achieve one
   the highest combination of extensibility, compactness and speed of
   processing.

   By generality, we mean the ability of a syntax to lend itself to
   describe any kind of data with a reasonable (or better yet, high)
   level of compactness and simplicity.  For example, although both
   arrays and linked lists could be considered very general as they are
   both able to store any kind of data, they actually are at the
   respective cost of complexity (arrays need the embedding of data
   structure in the data or in the processing logic) and size (in-memory
   linked lists can waste as much as half or two third of the space for
   the overhead of the data structure).

   By decentralization, we mean the ability to extend the syntax in a
   way that avoid naming collisions without the use of a central
   registry.  Note that the DNS is NOT decentralized in this sense, it
   is distributed (it cannot work without its root servers and not even
   without prior knowledge of their location).

1.1.2.  State of the art

   Uniformity, generality and extensibility are usually highly-valued
   traits in formats design.  Programming languages obviouly feature
   them foremost, although their generality usually stops at what they
   are supposed to express: procedures.  Most of them are ill-suited to
   represent arbitrary data, but notable exceptions include Lisp (where
   "code is data") and Javascript, from which a subset has been
   extracted to exchange data, JSON, which has seen a tremendous success
   for this purpose.  JSON may lack in generality and compactness, but
   its design makes its parsing really straightforward and fast.  All of
   them, though, lack decentralization.  Some of them make it possible
   to extend them in a relatively decentralized way if some discipline
   is followed (for example, by naming modules after domaine names), but
   the discipline is not mandatory.

   The SGML/XML family of formats also feature these traits and actually
   fare much better than programming languages on the three fronts.  XML
   namespaces also make it fairly decentralized and there have been

Thierry                 Expires February 2, 2014                [Page 5]
Internet-Draft                    BULK1                      August 2013

   attempts at making it compact (Efficient XML Interchange).

   But all the previously cited formats clearly lack compactness,
   although just applying standard compression techniques would
   sacrifice only very little processing time to gain huge size
   reductions on most of their intended use cases.

   So-called binary formats pretty much exhibit the opposite trade-offs.
   Most of them are not uniform to achieve better compactness.  Some are
   specifically designed for a great generality, but many lack
   extensibility.  When they are extensible, it's never in a
   decentralized way, again for reasons that have to do with
   compactness.  They are usually extremely fast to parse.

   Actually, many binary formats are not so much formats but formats
   frameworks, and exclude extensibility by design.  For each use case,
   an IDL compiler creates a brand new format that is essentially
   incompatible with all other formats created by the same compiler.  If
   the IDL compiler and framework are correctly designed, such a format
   usually represent an optimum in compactness and speed of processing,
   as the compiler can also automatically generate an ad-hoc optimized
   parser.

1.2.  Format overview

   A BULK stream is a stream of 8-bit bytes, in big-endian order.
   Parsing a BULK stream yields a sequence of expressions, which can be
   either atoms or forms, composed of atoms.  The syntax of forms is
   entirely uniform, without a single exception: a starting byte marker,
   a sequence of atoms and an ending byte marker.  Among atoms, only nil
   (the null byte), arrays, words and fixed-sized integers have a
   special syntax, for efficiency purposes.  Even booleans and floating-
   point numbers follow the uniform syntax that every other expression
   follow.

   Non uniform atoms start with a marker byte, followed by a static or
   dynamic number of bytes, depending on the type (none for nil, static
   for words and fixed-size integers, dynamic for arrays).

   Any other atom is a reference, which consists of a namespace marker
   (in most of the cases, a single byte) followed by a identifier within
   this namespace (a single byte).  All in all, a very little sacrifice
   is made in compactness for the benefit of a very simple syntax: apart
   from nil, nothing is smaller than 2 bytes, and as most forms involve
   a reference followed by some content, a form is 4 bytes + its
   content.

   A namespace marker in a BULK stream is associated by either of two

Thierry                 Expires February 2, 2014                [Page 6]
Internet-Draft                    BULK1                      August 2013

   simple forms to a namespace identified by a UUID, thus ensuring
   decentralized extension.  One of the two forms declares that the
   stream can be processed even if the application doesn't reckognize
   the namespace.  Parsing remains possible thanks to the uniform
   syntax.

1.3.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Literal numerical values are provided in decimal or hexadecimal as
   appropriate.  Hexadecimal literals are prefixed with "0x" to
   distinguish them from decimal literals.

   In the grammar, types are named by a capitalized word, like "Foo".
   The text representation of the BULK stream uses mnemonics for some
   bytes sequences.  Mnemonics are series of characters, excluding all
   capital letters and white space, like "this-is-one-mnemonic" or
   "what-the-%S.!?#-is-that?".  They are always separated by white
   space.

   In the grammar, a shape is a pattern of bytes, following the rules of
   the text representation for a BULK stream.  Apart from mnemonics and
   types, a shape can contain:

   o  a fixed byte, represented by its value as an unsigned integer
      (i.e. 0x3F)

   o  an arbitrary sequence of a fixed number of bytes, represented by
      the number of bytes in decimal immediately followed by a B
      uppercase letter (i.e. 4B)

   An arbitrary byte or sequence of bytes can be preceded by a name to
   make its description clearer.  The name is a series of character
   between { and } immediately followed by a semi-colon (i.e.
   {quux}:4B).

2.  BULK syntax

   A BULK stream is a sequence of 8-bit bytes.  Bits and bytes are in
   big-endian order.  The result of parsing a BULK stream is a sequence
   of abstract data, called the abstract yield.  BULK parsing is
   injective: a BULK stream has only one abstract yield, but different
   BULK streams can have the same abstract yield.

   A processing application is not expected to actually produce the

Thierry                 Expires February 2, 2014                [Page 7]
Internet-Draft                    BULK1                      August 2013

   abstract yield, but an adaptation of the abstract yield to its own
   implementation, called the concrete yield.  Also, some expressions in
   a BULK stream may have the semantics of a transformation of the
   abstract yield.  A processing application may thus not produce or
   retain the concrete yield but the result of its transformation.  This
   specification deals mainly with the byte sequence and the abstract
   yield and occasionnally provide guidelines about the concrete yield.

   The abstract yield is a sequence of expressions.  Expressions can be
   atoms or forms.  Forms are sequences of expressions.  If a byte
   sequence is parsed as an expression, this byte sequence is said to
   denote this expression.

2.1.  Parsing algorithm

   The parser operates with a context, which is a sequence of
   expressions.  Each time an expression is parsed, it is appended at
   the end of the context.  The initial context is the abstract yield.

   At the beginning of a BULK stream and after having consumed the byte
   sequence denoting a complete expression, the parser is at the
   dispatch stage.  At this stage, the next byte is a marker byte, which
   tells the parser what kind of expression comes next (the marker byte
   is the first byte of the sequence that denotes an expression).

   The 0x1 and 0x2 marker bytes are special cases.  When the parser
   reads 0x1, it immediately appends an empty sequence to the current
   context.  This sequence becomes the new context.  This new context
   has the previous context as parent.  Then the parser returns to its
   dispatch stage.  When the parser reads 0x2, it appends nothing to the
   context, but instead the parent of the current context becomes the
   new context and the parser returns to the dispatch stage.  Thus it is
   a parsing error to read 0x2 when the context is the abstract yield.

   Whenever a parsing error is encountered, parsing of the BULK stream
   MUST stop.

2.2.  Forms

2.2.1.  starting marker byte

   value  0x1

   mnemonic  (

Thierry                 Expires February 2, 2014                [Page 8]
Internet-Draft                    BULK1                      August 2013

2.2.2.  ending marker byte

   value  0x2

   mnemonic  )

2.3.  Atoms

2.3.1.  nil

   marker  0x0 (mnemonic: nil)

   shape  nil

   Apart from being a possible short marker value, the fact that the 0x0
   byte represents a valid atom means that a sequence of null bytes is a
   valid part of a BULK stream, thus making the format less fragile.  In
   a network communication, nil atoms can also be sent to keep the
   channel open.

2.3.2.  Array

   marker  0x3 (mnemonic: #)

   shape  # Int ...

   Arrays have a special parsing rule.  After consuming the marker byte,
   the parser returns to the dispatch stage.  It is a parser error if
   the parsed expression is not of type Int or if its value cannot be
   reckognized.  This integer is not added to any context, but the
   parser consumes as many bytes as this integer and they constitute the
   content of this array.

   Type: Array

2.3.3.  32 bits word

   marker  0x4 (mnemonic: word32)

   shape  word32 4B

   Types: Word, Word32

2.3.4.  64 bits word

Thierry                 Expires February 2, 2014                [Page 9]
Internet-Draft                    BULK1                      August 2013

   marker  0x5 (mnemonic: word64)

   shape  word64 8B

   Types: Word, Word64

2.3.5.  128 bits word

   marker  0x6 (mnemonic: word128)

   shape  word128 16B

   Types: Word, Word128

2.3.6.  Unsigned 8 bits integer

   marker  0x7 (mnemonic: int8)

   shape  int8 1B

   The value of this integer is the value of its contained byte.

   Type: Number, Int

2.3.7.  Unsigned 32, 64 and 128 bits integer

   marker  0x8 (mnemonic: uint)

   shape  uint Word

   The value of this integer is the value of its contained word.

   Type: Number, Int

2.3.8.  Signed 32, 64 and 128 bits integer

   marker  0x9 (mnemonic: sint)

   shape  sint Word

   The value of its contained word is the value of this integer in
   two's-complement notation.

   Type: Number, Int

Thierry                 Expires February 2, 2014               [Page 10]
Internet-Draft                    BULK1                      August 2013

2.3.9.  Reserved marker bytes

   Marker bytes 0xA-0xF are reserved for future major versions of BULK.
   It is a parser error if a BULK stream with major version 1 contains
   such a marker byte.

2.3.10.  Reference

   marker  0x10-0xFF

   shape  {ns}:1B {name}:1B

   The {ns} byte is a value associated with a namespace.  Values 0x10-
   0x1F are reserved for namespaces defined by BULK specifications.
   Greater values can be associated with namespaces identified by a
   UUID.

   The {name} byte is the name within the namespace.  Vocabularies with
   more than 256 names thus need to be spread accross several
   namespaces.

   The specification of a namespace SHOULD include a mnemonic for the
   namespace and for each defined name.  When descriptions use several
   namespaces, the mnemonic of a reference SHOULD be the concatenation
   of the namespace mnemonic, ":" and the name mnemonic.  For example,
   the fp name in namespace math becomes math:fp.

2.3.10.1.  Speciale case

   References have a special parsing rule.  In case a BULK stream needs
   an important number of namespaces, if the marker byte is 0xFF, the
   parser continues to read bytes until it finds a byte different than
   0xFF.  The value of this sequence of bytes is the value associated
   with a namespace.  For example, the reference denoted by the bytes
   0xFF 0xFF 0x8C 0x1A is the name 26 in the namespace associated with
   16777100.

3.  Standard namespaces

   Standard namespaces have a fixed namespace value and are not
   identified by a UUID.

3.1.  BULK core namespace

Thierry                 Expires February 2, 2014               [Page 11]
Internet-Draft                    BULK1                      August 2013

   marker  0x10 (mnemonic: bulk)

3.1.1.  Version

   name  0x1 (mnemonic: version)

   shape  ( version {major}:Int {minor}:Int )

   This form MUST be the first in a BULK stream.  It is a parsing error
   otherwise.  This specification defines BULK 1.0.  When writing a BULK
   stream, a processing application MUST denote {major} and {minor} by
   the smallest byte sequence possible.

   Two BULK versions with the same major version MUST share the same
   parsing rules and the same definitions of marker bytes.  Changing the
   syntax or semantics of existing marker bytes and using marker bytes
   in the reserved interval warrants a new major version.  Changing the
   syntax or semantics of existing names in standard namespaces also.

   Adding standard namespaces or adding names in existing standard
   namespaces warrants a new minor version.

3.1.2.  Required namespace

   name  0x2 (mnemonic: ns)

   shape  ( ns {mark}:Int {uuid}:Word128 )

   This associates the namespace identified by {uuid} to the value
   {mark}.  It is a parsing error if a processing application doesn't
   reckognise the designated namespace.

3.1.3.  Optional namespace

   name  0x3 (mnemonic: ns*)

   shape  ( ns* {mark}:Int {uuid}:Word128 )

   This associates the namespace identified by {uuid} to the value
   {mark}.

3.1.4.  Strings encoding

   name  0x4 (mnemonic: stringenc)

Thierry                 Expires February 2, 2014               [Page 12]
Internet-Draft                    BULK1                      August 2013

   shape  ( stringenc {enc}:Encoding )

   This tells the processing application that all following expressions
   in the current context that are understood by the application as
   character strings will be encoded with the encoding designated by
   {enc}.

   As the abstract yield doesn't contains strings but expressions that
   will be used as strings by the application, it is not a parsing error
   if the application doesn't reckognize {enc}.  In this situation, it
   is a parsing error when the application actually needs to decode a
   byte sequence as a string.  It is not a parsing error when a
   processing application only transmits a byte sequence encoding a
   string, if it can accurately convey the encoding to the receiving
   application.

3.1.5.  IANA registered character set

   name  0x5 (mnemonic: iana-charset)

   shape  ( iana-charset {id}:Int )

   This designates the string encoding registered among the IANA
   Character Sets [IANA-Charsets] whose MIBenum is {id}.

   Type: Encoding.

3.1.6.  Windows code page

   name  0x6 (mnemonic: code-page)

   shape  ( code-page {id}:Int )

   This designates the string encoding among Windows code pages whose
   identifier is {id}.

   Type: Encoding.

3.1.7.  true

   name  0x7 (mnemonic: true)

   shape  true

   Type: Boolean.

Thierry                 Expires February 2, 2014               [Page 13]
Internet-Draft                    BULK1                      August 2013

3.1.8.  false

   name  0x8 (mnemonic: false)

   shape  false

   Type: Boolean.

3.2.  arithmetic namespace

   marker  0x11 (mnemonic: math)

   A processing application must reckognize the type of all expressions
   defined in this specification that have the type Number, but an
   application MAY consider a number as having an unknown value if it
   has no adequate data type to store it.

3.2.1.  fraction

   name  0x1 (mnemonic: frac)

   shape  ( frac {num}:Int {div}:Int )

   This is the number {num}/{div}.

   Type: Number.

3.2.2.  Arbitrary precision signed integer

   name  0x2 (mnemonic: bigint)

   shape  ( bigint {bits}:Array )

   The bits contained in {bits} is the value of this integer in two's-
   complement notation.

   Type: Number, Int.

3.2.3.  Binary floating-point number

   name  0x3 (mnemonic: binary)

   shape  ( binary {bits}:Word )

   shape  ( binary {bits}:Array )

   This is a floating-point number expressed in IEEE 754-2008 binary
   interchange format.  If {bits} is an Array, the size of its contents

Thierry                 Expires February 2, 2014               [Page 14]
Internet-Draft                    BULK1                      August 2013

   must be a multiple of 32 bits, as per IEEE 754-2008 rules.

   Type: Number.

3.2.4.  Decimal floating-point number

   name  0x4 (mnemonic: decimal)

   shape  ( decimal {bits}:Word )

   shape  ( decimal {bits}:Array )

   This is a floating-point number expressed in IEEE 754-2008 decimal
   interchange format.  If {bits} is an Array, the size of its contents
   must be a multiple of 32 bits, as per IEEE 754-2008 rules.

   Type: Number.

4.  Extension namespaces

   Extension namespaces are defined with an identifier UUID, to be
   associated by a ns or ns* form.

4.1.  Official namespaces

   Extension namespaces defined as part of the official BULK suite MUST
   be identified by a v5 UUID.  The namespace UUID used to generate it
   MUST be <urn:uuid:abaddeed-face-11e2-9605-74de2b4102f1>.  The name
   used to generate it SHOULD be a URI designating the described
   vocabulary when one exists.

4.2.  User-defined namespaces

   User-defined namespaces are actually no different than official
   namespaces, apart from the choice of UUID.

5.  Profiles

   A profile is a sequence of expressions used as the initial abstract
   yield by a processing application.  With respect to the BULK stream,
   the profile is an out-of-band information, usually implicit.

   A processing application doesn't need to include the profile in the
   concrete yield, as long as the semantics of the abstract yield are
   maintained.

   The same BULK stream might be processed with different profiles.

Thierry                 Expires February 2, 2014               [Page 15]
Internet-Draft                    BULK1                      August 2013

   A processing application MUST NOT deduce the profile from the content
   of a BULK stream.

5.1.  Profile redundancy

   A processing application should only rely on the use of a profile
   when it is a safe assumption that the profile is known, for example
   within a communication where the protocol dictates the profile.

   In particular, long-term storage of a BULK stream should preserve
   profile information, for example with a media type that dictates the
   profile.

   Otherwise, an application creating a BULK stream SHOULD include the
   profile after the version form.  For this reason, the expressions in
   a profile SHOULD have idempotent semantics.

5.2.  Standard profile

   This specification defines the default profile that a processing
   application MUST use when it is not using a specific profile:

   ( bulk:stringenc 106 )

   This means that the default string encoding in a BULK stream is
   UTF-8.

6.  Security Considerations

6.1.  Parsing

   Parsing a BULK stream is designed to be free of side-effects for the
   processing application, apart from storing the parsed results.

   Arrays in BULK carry their size, so as for the application to know in
   advance the size of the data to read and store, thus making it easier
   to build robust code.  A malicious software, however, may announce an
   array with a size choosen to get an application to exhaust its
   available memory.  When a BULK stream has been completely received,
   an array bigger than the remaining data SHOULD trigger an error.
   When a BULK stream's size is not known in advance, the application
   SHOULD use a growable data structure.

6.2.  Forwarding

   When a processing application forwards all or part of the data in a
   BULK stream to another application, care must be taken if part of the
   forwarded data was not entirely reckognized, as it could be used by

Thierry                 Expires February 2, 2014               [Page 16]
Internet-Draft                    BULK1                      August 2013

   an attacker to benefit from the authority the forwarding application
   has on the recipient of the data.

7.  IANA Considerations

   This specification defines a new media type, application/bulk.  Here
   are the informations for its registration to IANA:

   Type name  application

   Subtype name  bulk

   Required parameters  none

   Optional parameters  none

   Encoding considerations  none, content is self-describing

   Security considerations  cf. Section 6

   Interoperability considerations  the constraint to start any BULK
      stream with a version form has the side-effect that classes of
      BULK streams can be identified by a sequence of bytes acting as
      "magic number":

      0x011001  any BULK stream

      0x01100107  a BULK stream of any major version beneath 256

      0x0110010701  a BULK stream of major version 1

      0x0110010701070202  a BULK stream of version 1.2

   Published specification  this document

   Applications that use this media type  none so far

   Fragment identifier considerations  this specification defines no
      semantics for addressing the data with a fragment identifier; a
      future specification could define fragment identifier syntaxes to
      address the content by byte offset or the parsed results by their
      number in the yielded sequence

   Additional information  a future specification may define a naming
      convention for media types based on bulk with a +bulk suffix, as
      for XML with +xml

Thierry                 Expires February 2, 2014               [Page 17]
Internet-Draft                    BULK1                      August 2013

8.  Acknowledgements

   The original author of this specification read Erik Naggum's famous
   rant about XML [1] several years before, and it may well have
   unconsciouly influenced this design.  He happened to stumble upon it
   again while writing the earliest draft of this specification and it
   struck him how much it embodies Erik's ideas.  In any case, this
   format is dedicated to Erik.

9.  References

9.1.  Normative References

   [IANA-Charsets]  "IANA Charset Registry (archived at):",
                    <http://www.iana.org/assignments/character-sets>.

   [RFC2119]        Bradner, S., "Key words for use in RFCs to Indicate
                    Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative references

   [Avro]           Cutting, D., "Apache Avro[TM] 1.7.4 Specification",
                    February 2013,
                    <http://avro.apache.org/docs/1.7.4/spec.html>.

   [HTTP2]          Belshe, M., Peon, R., Thomson, M., Ed., and A.
                    Melnikov, Ed., "Hypertext Transfer Protocol version
                    2.0", draft-ietf-httpbis-http2-04 (work in
                    progress), July 2013.

   [Smile]          Saloranta, T., "Smile Data Format", September 2010,
                    <http://wiki.fasterxml.com/SmileFormat>.

   [Thrift]         Slee, M., Agarwal, A., and M. Kwiatkowski, "Thrift:
                    Scalable Cross-Language Services Implementation",
                    April 2007, <http://thrift.apache.org/static/files/
                    thrift-20070401.pdf>.

   [protobuf]       "Protocol Buffers", July 2008,
                    <https://developers.google.com/protocol-buffers/>.

URIs

   [1]  <http://www.schnada.de/grapt/eriknaggum-xmlrant.html>

Thierry                 Expires February 2, 2014               [Page 18]
Internet-Draft                    BULK1                      August 2013

Author's Address

   Pierre Thierry
   Thierry Technologies

   EMail: pierre@nothos.net

Thierry                 Expires February 2, 2014               [Page 19]