INTERNET-DRAFT                                      Greg Hudson
       Expires: October 22, 1999                       ghudson@mit.edu
                                                                   MIT
       
                   Simple Protocol Application Data Encoding
                           draft-hudson-spade-00.txt
       
       1. Status of this Memo
       
       This document is an Internet-Draft and is in full
       conformance with all provisions of Section 10 of RFC2026.
       
       Internet-Drafts are working documents of the Internet
       Engineering Task Force (IETF), its areas, and its working
       groups.  Note that other groups may also distribute working
       documents as Internet-Drafts.
       
       Internet-Drafts are draft documents valid for a maximum of
       six months and may be updated, replaced, or obsoleted by
       other documents at any time.  It is inappropriate to use
       Internet-Drafts as reference material or to cite them other
       than as "work in progress."
       
       The list of current Internet-Drafts can be accessed at
       http://www.ietf.org/ietf/1id-abstracts.txt
       
       The list of Internet-Draft Shadow Directories can be
       accessed at http://www.ietf.org/shadow.html.
       
       Please send comments to ghudson@mit.edu.
       
       2. Abstract
       
       This document describes a simple scheme for encoding network
       protocol data, and a simple notation for describing protocol
       data elements.  All encodings are self-terminating (a
       decoder can tell where an encoding ends without any
       external framing) and assume that the decoder knows what
       type of protocol element it is expecting.
       
       3. Encoding
       
       This encoding scheme uses the ASCII translation of
       characters into bytes except when otherwise noted.  Protocol
       elements are encoded as follows:
       
       An integer: a sequence of decimal digits followed by a
       colon.  For instance, the number 27 encodes as "27:".
       Negative integers are preceded by a minus sign, so -27
       encodes as "-27:".  Leading zeroes are not allowed, so that
       each integer has a unique encoding; zero itself encodes as
       "0:".
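
       A minimal sketch of integer encoding and decoding in
       Python follows.  The names encode_int and decode_int are
       hypothetical, and rejecting "-0:" is an assumption drawn
       from the uniqueness requirement rather than an explicit
       rule.

```python
def encode_int(n: int) -> bytes:
    """Encode an integer as ASCII decimal digits followed by a colon."""
    # str() never produces leading zeroes, and str(0) == "0", so every
    # integer gets exactly one encoding.
    return str(n).encode("ascii") + b":"


def decode_int(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an integer at buf[pos]; return (value, position after colon)."""
    end = buf.index(b":", pos)  # ValueError if the terminator is missing
    text = buf[pos:end].decode("ascii")  # non-ASCII input raises ValueError
    digits = text[1:] if text.startswith("-") else text
    if not digits.isdigit():
        raise ValueError("malformed integer")
    # Enforce the unique encoding: no leading zeroes, and (an assumption
    # in this sketch) no "-0:" either.
    if (len(digits) > 1 and digits[0] == "0") or text == "-0":
        raise ValueError("non-canonical integer")
    return int(text), end + 1
```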
       
       A byte string: an integer giving the length of the string,
       followed by the bytes of the string itself.  For instance,
       the string "foo" encodes as "3:foo".  A string of wide
       characters must first be encoded as a byte string (using
       either 16-bit character values or UTF-8, for instance); how
       that is done is up to the protocol.
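
       A sketch of byte string encoding and decoding in Python,
       with hypothetical function names.  Because the decoder
       takes the payload by length, the string itself may contain
       colons.  The UTF-8 conversion in the usage note is only
       one of the choices a protocol might make.

```python
def encode_bytes(data: bytes) -> bytes:
    """Encode a byte string as its length, a colon, then the raw bytes."""
    return str(len(data)).encode("ascii") + b":" + data


def decode_bytes(buf: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Decode a byte string at buf[pos]; return (data, next position)."""
    colon = buf.index(b":", pos)  # the first colon terminates the length
    text = buf[pos:colon].decode("ascii")
    if not text.isdigit() or (len(text) > 1 and text[0] == "0"):
        raise ValueError("malformed string length")
    start = colon + 1
    end = start + int(text)
    if end > len(buf):
        raise ValueError("truncated string")
    # The payload is taken by length, so it may itself contain colons.
    return buf[start:end], end
```

       For example, encode_bytes("h\u00e9llo".encode("utf-8"))
       gives b"6:h\xc3\xa9llo", since the accented letter takes
       two bytes in UTF-8.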
       
       A symbol: a sequence of letters, digits, and dashes,
       beginning with a letter, followed by a colon.  Case is
       significant.  For instance, the symbol "foo" encodes as
       "foo:".
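
       A sketch of symbol handling in Python.  The names are
       hypothetical, and the sketch assumes that "letters" and
       "numbers" mean ASCII letters and digits; the text does not
       say whether other characters qualify.

```python
import re

# Assumption: "letters" and "numbers" mean the ASCII ranges A-Z, a-z, 0-9.
SYMBOL_RE = re.compile(rb"[A-Za-z][A-Za-z0-9-]*")


def encode_symbol(name: str) -> bytes:
    """Encode a symbol as its ASCII characters followed by a colon."""
    raw = name.encode("ascii")  # UnicodeEncodeError is a ValueError
    if not SYMBOL_RE.fullmatch(raw):
        raise ValueError(f"invalid symbol: {name!r}")
    return raw + b":"


def decode_symbol(buf: bytes, pos: int = 0) -> tuple[str, int]:
    """Decode a symbol at buf[pos]; return (name, next position)."""
    colon = buf.index(b":", pos)
    raw = buf[pos:colon]
    if not SYMBOL_RE.fullmatch(raw):
        raise ValueError("malformed symbol")
    # Case is significant, so the name is returned exactly as sent.
    return raw.decode("ascii"), colon + 1
```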
       
       A list of <type>: an integer giving the number of elements in
       the list, followed by the elements of the list.  For instance,
       the list of strings "a", "b", and "c" encodes as
       "3:1:a1:b1:c".
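
       A sketch of list encoding and decoding in Python,
       parameterized by an element codec.  All names are
       hypothetical, and the byte string helpers omit
       canonical-form checks for brevity.  The decoder reads the
       element count and then decodes exactly that many elements;
       it learns where the list ends only by decoding each one.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def encode_bytes(data: bytes) -> bytes:
    """String element: length, colon, raw bytes."""
    return str(len(data)).encode("ascii") + b":" + data


def decode_bytes(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Decode one string element; canonical-form checks are omitted."""
    colon = buf.index(b":", pos)
    start = colon + 1
    end = start + int(buf[pos:colon])
    if end > len(buf):
        raise ValueError("truncated string")
    return buf[start:end], end


def encode_list(items: list[T], encode_item: Callable[[T], bytes]) -> bytes:
    """Element count, colon, then each element's encoding in order."""
    count = str(len(items)).encode("ascii") + b":"
    return count + b"".join(encode_item(x) for x in items)


def decode_list(buf: bytes, pos: int,
                decode_item: Callable[[bytes, int], tuple[T, int]]
                ) -> tuple[list[T], int]:
    """Read the element count, then decode exactly that many elements."""
    colon = buf.index(b":", pos)
    count = int(buf[pos:colon])
    if count < 0:
        raise ValueError("negative list length")
    pos = colon + 1
    items = []
    for _ in range(count):
        item, pos = decode_item(buf, pos)
        items.append(item)
    return items, pos
```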
       
       A structure: the encodings of its elements, concatenated in
       order.  For instance, a structure containing the number 3
       and the byte string "a" encodes as "3:1:a".
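
       The following Python sketch encodes and decodes the
       structure example above.  The helper names are
       hypothetical.  Since nothing on the wire separates fields,
       the decoder works only because it knows the field order in
       advance.

```python
def encode_int(n: int) -> bytes:
    return str(n).encode("ascii") + b":"


def encode_bytes(data: bytes) -> bytes:
    return str(len(data)).encode("ascii") + b":" + data


def decode_int(buf: bytes, pos: int) -> tuple[int, int]:
    colon = buf.index(b":", pos)
    return int(buf[pos:colon]), colon + 1


def decode_bytes(buf: bytes, pos: int) -> tuple[bytes, int]:
    colon = buf.index(b":", pos)
    start = colon + 1
    end = start + int(buf[pos:colon])
    if end > len(buf):
        raise ValueError("truncated string")
    return buf[start:end], end


def encode_pair(number: int, text: bytes) -> bytes:
    """A structure is just its fields' encodings, concatenated in order."""
    return encode_int(number) + encode_bytes(text)


def decode_pair(buf: bytes, pos: int = 0) -> tuple[tuple[int, bytes], int]:
    """Decode an (Integer, String) structure; the order is known in advance."""
    number, pos = decode_int(buf, pos)
    text, pos = decode_bytes(buf, pos)
    return (number, text), pos
```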
       
       A union: a symbol giving the type of element, an integer
       giving the length of the encoding of the element's data, and
       the data itself.  For instance, an element of type "foo"
       with the same data as in the structure example above would
       be encoded as "foo:5:3:1:a".  If there is no data to be
       encoded, a data length of 0 should be given, e.g. "bar:0:".
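
       A sketch of the union envelope in Python, with
       hypothetical names.  The encoder takes the already-encoded
       element data; the decoder returns the tag and the raw data,
       leaving interpretation to code that knows the tag's type.
       Tag validation is omitted.

```python
def encode_union(tag: str, data: bytes) -> bytes:
    """Tag symbol, length of the element's encoding, then that encoding.

    `data` must already be the encoding of the element's value
    (b"" when the tag carries no data).
    """
    length = str(len(data)).encode("ascii")
    return tag.encode("ascii") + b":" + length + b":" + data


def decode_union(buf: bytes, pos: int = 0) -> tuple[str, bytes, int]:
    """Split a union into (tag, raw encoded data, next position)."""
    colon = buf.index(b":", pos)
    tag = buf[pos:colon].decode("ascii")
    pos = colon + 1
    colon = buf.index(b":", pos)
    start = colon + 1
    end = start + int(buf[pos:colon])
    if end > len(buf):
        raise ValueError("truncated union data")
    return tag, buf[start:end], end
```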
       
       4. Notation
       
       This notation gives a scheme for describing protocol element
       types and giving them names for the purpose of semantic
       descriptions.
       
       A variable declaration associates a name to be used in
       semantic descriptions with a type.  Variable names are valid
       symbols beginning with a lowercase letter.  Variable
       declarations end with a line break, and are written as
       follows:
       
       An integer: "Integer <name>"
       A byte string: "String <name>"
       A symbol: "Symbol <name>"
       A list of <type>: "List[<type>] <name>"
       A structure named <structurename>: "<structurename> <name>"
       A union named <unionname>: "<unionname> <name>"
       
       Structure and union names are valid symbols beginning with a
       capital letter.  A structure definition is written as:
       
                structure <structurename> {
                        <variable declaration>
                        .
                        .
                        .
                }
       
       Unions are defined as:
       
                union <unionname> {
                        <symbol>: <variable declaration>
                        .
                        .
                        .
                }
       
       As a special case, if there is no data for a particular
       union tag, "Null" can be written in place of a variable
       declaration.
       
       Here is an example of two structure definitions which might
       be used to describe a mail message:
       
                structure Header {
                        String name
                        String value
                }
       
                structure Message {
                        List[Header] headers
                        String body
                }
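
       One possible mapping of these definitions onto Python
       dataclasses, shown as a sketch.  The class layout and the
       encode methods are illustrative assumptions, not part of
       the notation; String fields are held as bytes.

```python
from dataclasses import dataclass, field


def encode_bytes(data: bytes) -> bytes:
    """String: length, colon, raw bytes."""
    return str(len(data)).encode("ascii") + b":" + data


@dataclass
class Header:
    name: bytes
    value: bytes

    def encode(self) -> bytes:
        # Fields are concatenated in declaration order: name, then value.
        return encode_bytes(self.name) + encode_bytes(self.value)


@dataclass
class Message:
    headers: list[Header] = field(default_factory=list)
    body: bytes = b""

    def encode(self) -> bytes:
        # List[Header]: element count, then each Header; then the body.
        count = str(len(self.headers)).encode("ascii") + b":"
        encoded_headers = b"".join(h.encode() for h in self.headers)
        return count + encoded_headers + encode_bytes(self.body)
```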
       
       Here is an example of a union definition which might be used
       together with the above structure definitions to describe a
       command set:
       
                union Command {
                        send: Message m
                        help: Null
                        quit: Null
                }
       
       A quit command would be encoded as "quit:0:".  If I have a
       message with two headers, one with name "From" and value
       "Greg" and another with name "To" and value "Bob", and the
       message body is "Test", then I would encode a command to
       send this message as:
       
                send:29:2:4:From4:Greg2:To3:Bob4:Test
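
       A sketch of a decoder for the Command union in Python.
       The function names and the dict representation of a
       Message are hypothetical.  Counting the bytes of the
       message encoding "2:4:From4:Greg2:To3:Bob4:Test" gives 29,
       and this decoder checks that the union's length field
       agrees with the length of the data it actually decodes.

```python
from typing import Optional


def read_length(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a non-negative integer terminated by a colon."""
    colon = buf.index(b":", pos)
    return int(buf[pos:colon]), colon + 1


def read_string(buf: bytes, pos: int) -> tuple[bytes, int]:
    n, pos = read_length(buf, pos)
    if pos + n > len(buf):
        raise ValueError("truncated string")
    return buf[pos:pos + n], pos + n


def decode_message(buf: bytes, pos: int) -> tuple[dict, int]:
    """Message = List[Header] headers, String body; Header = name, value."""
    count, pos = read_length(buf, pos)
    headers = []
    for _ in range(count):
        name, pos = read_string(buf, pos)
        value, pos = read_string(buf, pos)
        headers.append((name, value))
    body, pos = read_string(buf, pos)
    return {"headers": headers, "body": body}, pos


def decode_command(buf: bytes) -> tuple[str, Optional[dict]]:
    """Decode one Command union: tag, data length, then the data."""
    colon = buf.index(b":")
    tag = buf[:colon].decode("ascii")
    length, start = read_length(buf, colon + 1)
    data = buf[start:start + length]
    if len(data) != length:
        raise ValueError("truncated command")
    if tag == "send":
        message, used = decode_message(data, 0)
        if used != length:
            raise ValueError("length field disagrees with message encoding")
        return tag, message
    if tag in ("help", "quit"):
        return tag, None  # Null: no data for these tags
    raise ValueError(f"unknown command {tag!r}")
```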
       
       5. Rationale
       
       The primary goal of this encoding scheme is simplicity.  For
       want of a simple encoding scheme, protocols have been
       turning to ASN.1's basic encoding rules, which are highly
       complicated and which have presented a barrier to
       implementation in practice.
       
       Two secondary goals of this encoding scheme are human
       readability and space efficiency.  These goals are of course
       at odds; numbers could be encoded more compactly by using
       more than ten values per byte, for instance, at the expense
       of making it more difficult to examine ASCII translations of
       protocol data.
       
       The tagged union encoding makes most protocols easy to
       extend.  A decoder can find the end of a tagged union
       element even if it does not know the data type associated
       with the element's tag, so it can skip tags it does not
       recognize.
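
       A sketch of how an older implementation might skip tags it
       does not know.  The stream of back-to-back unions, the tag
       "frob", and the function names are hypothetical; the point
       is that the length field alone locates the next element.

```python
def read_int(buf: bytes, pos: int) -> tuple[int, int]:
    colon = buf.index(b":", pos)
    return int(buf[pos:colon]), colon + 1


def read_union(buf: bytes, pos: int) -> tuple[str, bytes, int]:
    """Return (tag, raw data, next position) without interpreting the data."""
    colon = buf.index(b":", pos)
    tag = buf[pos:colon].decode("ascii")
    length, start = read_int(buf, colon + 1)
    end = start + length
    if end > len(buf):
        raise ValueError("truncated union")
    return tag, buf[start:end], end


def known_tags(stream: bytes, known: set[str]) -> list[str]:
    """Walk back-to-back unions, keeping only tags in `known`.

    Unknown elements are skipped using their length field alone.
    """
    pos, seen = 0, []
    while pos < len(stream):
        tag, _data, pos = read_union(stream, pos)
        if tag in known:
            seen.append(tag)
    return seen
```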
       
       This encoding does not include a length field for structures
       or an overall length field for lists.  Thus, it is
       impossible to skip to the end of a structure or list without
       decoding it.  This decision was a tradeoff; it simplifies
       encoding and uses space more efficiently in return for
       making certain decoding situations more complicated.
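
       To illustrate the tradeoff, here is a sketch (with
       hypothetical names) of finding the end of a List[String]
       in Python.  The list's leading integer is an element count,
       not a byte length, so each element's length must be read
       in turn.

```python
def read_int(buf: bytes, pos: int) -> tuple[int, int]:
    colon = buf.index(b":", pos)
    return int(buf[pos:colon]), colon + 1


def skip_string(buf: bytes, pos: int) -> int:
    """Return the position just past one encoded byte string."""
    n, pos = read_int(buf, pos)
    if pos + n > len(buf):
        raise ValueError("truncated string")
    return pos + n


def skip_string_list(buf: bytes, pos: int) -> int:
    """Return the position just past an encoded List[String].

    There is no shortcut: every element header must be visited.
    """
    count, pos = read_int(buf, pos)
    for _ in range(count):
        pos = skip_string(buf, pos)
    return pos
```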
       
       6. Security Considerations
       
       For maximum generality, this encoding scheme places no
       limits on the length of any data type.  This could lead to
       denial of service attacks against implementations of
       protocols using this encoding ("here follows a string of
       length two gazillion").  It does not seem appropriate to
       choose limits in the wire encoding to prevent this sort of
       attack, so guarding against these attacks will have to be
       the responsibility of particular protocols or their
       implementations.
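
       As one example of an implementation-level guard, the
       following Python sketch caps both the number of digits in
       a length field and the size of a byte string.  MAX_DIGITS
       and MAX_STRING are hypothetical values an implementation
       might choose; a real network decoder would also need to
       apply them while data is still arriving.

```python
MAX_DIGITS = 9          # hypothetical cap on digits in a length field
MAX_STRING = 1 << 20    # hypothetical cap: 1 MiB per byte string


def read_bounded_string(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Decode a byte string, enforcing implementation-chosen limits."""
    # Look for the colon only within a short window, so a huge run of
    # digits is rejected without scanning all of it.
    colon = buf.find(b":", pos, pos + MAX_DIGITS + 1)
    if colon < 0:
        raise ValueError("length field too long or unterminated")
    digits = buf[pos:colon]
    if not digits.isdigit():
        raise ValueError("malformed length")
    length = int(digits)
    # Refuse oversized strings before allocating or waiting for the data.
    if length > MAX_STRING:
        raise ValueError(f"string of {length} bytes exceeds limit")
    start = colon + 1
    if start + length > len(buf):
        raise ValueError("truncated string")
    return buf[start:start + length], start + length
```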